Figure 2 - uploaded by Jean Alain Trejaut
Phylogenetic tree of the 47 Y-chromosome haplogroups seen in this study (shown in boldface), hierarchically defined using 81 slowly evolving binary markers (68 in the Figure). Marker names are shown along the branches, and haplogroup names are shown on the right according to the ISOGG Y-DNA Haplogroup Tree 2011. Potentially paraphyletic undefined subgroups are distinguished from recognized haplogroups by an asterisk. Haplogroups tested for but not seen in this study are shown in italics. See Additional file 1: Table S2 for a more detailed frequency table.

Source publication
Background: The resolution of data on the haploid non-recombining Y chromosome (NRY) haplogroup O in East Asia is still rudimentary, which could be an explanatory factor for current debates on the settlement history of Island Southeast Asia (ISEA). Here, 81 slowly evolving markers (mostly SNPs) and 17 Y-chromosomal short tandem repeats were used...

Contexts in source publication

Context 1
... of 5), DYS389I (4), DYS389II (3), DYS390 (2), DYS391 (2), DYS392 (20), and DYS393 (20) [57-60]. The frequency distribution of Y-chromosome SNP haplogroups detected in Taiwan, ISEA and Indochina is shown in Figure 2, reported in detail in Additional file 1: Table S2, and summarized in Additional file 1: Figures S1 and S2. Additional file 1: Figure S3 displays the variation of diversity measures according to latitude among TwMtA. The interpolation contour maps resulting from applying the Kriging method both to the frequency distributions of Y-SNP clades and to their internal STR diversity are shown side by side in Figure 3. Forty-seven of the 81 genotyped Y-SNPs were observed in the derived state [61], thus defining the 47 haplogroups observed in our samples, which belong to major clades C, D, F*, G, H, J, K*, L, N, O, P*, Q and R. MJ networks of major Y-SNP clades in SEA are shown in Figure 4. Age estimates of the diversity of these clades, based on the assumption that mutations in haplotypes accumulate in situ (i.e. no gene flow), are reported in Table 2 and Additional file 1: Table S3. The most prominent clade in Taiwan, O-M175 as a whole, represents almost 90% of Y chromosomes among the TwHan, about 95% among the TwPlt and more than 99% among the TwMtA (Additional file 1: Figures S1 and S2). Only one representative of the basal O*/O1*-M175 (xM119, P31, M122) was seen in each of the Luzon (Philippines), Fujian and TwPlt samples (Figure 2 and Additional file 1: Table S2). All other haplogroups of the O clade were observed in the derived state for the M119, P31 and M122 markers (Figure 2, Additional file 1: Table S2, and Additional file 1: Figures S1 and S2). Haplogroup O1a*-M119 (Figure 2 and Additional file 1: Table S2) is seen throughout Batan (42%), the Philippines (4% to 33%) and Indonesia (4% to 18%). It has a patchy distribution among TwMtA (3% to 33%), and only some southern TwMtA (i.e. Puyuma, Paiwan and Yami) show frequencies greater than 10%.
O1a*-M119 was not observed in our Amis sample, nor in the Bunun, Saisiyat and Thao, and it has a low frequency among TwHan (1.4% to 2%). Outside Taiwan, O1a*-M119 was observed neither among the Kalimantan in Borneo nor in Sumatra. However, the contour map interpolation (which includes published data, see Additional file 1: Table S1) indicates its presence in Western Sumatra, towards the Indian Ocean (Figure 3). The internal STR diversity of O1a*-M119 decreases gradually from mainland China towards the south, although a second, somewhat lower peak of diversity is observed around the Moluccas. The MJ network of O1a*-M119 (Figure 4) clearly differentiates TwMtA and TwPlt from Indonesia (Indonesian STR haplotypes are all included in the lower left reticulation of the O1a*-M119 network), whereas most Filipino O1a*-M119 haplotypes are shared with or very similar to those found among TwMtA and TwPlt. Age estimates based on the amount of molecular variation for O1a*-M119 were higher among TwMtA (19.96 ± 5.93 Kya) and in the Philippines (20.36 ± 5.65 Kya) than in Indonesia (4.14 ± 2.67 Kya) (Table 2 and Additional file 1: Table S3). O1a1*-P203, derived from O1a*-M119 (Figure 2 and Additional file 1: Table S2), is the most common haplogroup among the northern TwMtA and the Tsou (90%), while among southern TwMtA it represents about half of the Y chromosomes observed. It is also quite common in several TwPlt (e.g. Pazeh, Ketagalan and Siraya in various locations), but less frequent in the Philippines (13.7%) and Indonesia (16.3%), although it is observed in 36% of the Kalimantan in Borneo. It is also commonly seen in Han (22% and 14.2% in Fujian and TwHan, respectively), but is uncommon in Thailand (2.7%) and was not observed in the Vietnamese (Hanoi) or the Akha.
Accordingly, the contour map of the frequency variation of O1a1*-P203 shows two main locations of high frequency, in Taiwan and southwestern Borneo, and decreasing frequencies from these locations towards the Philippines, whereas the contour map of its internal diversity is more complex (Figure 3). The O1a1*-P203 MJ network has a marked star-like shape, with a central frequent haplotype detected throughout mainland and insular SEA (Figure 4). Interestingly, haplotypes observed in the Atayal branch off from this central node to form a distinct network, suggesting a founding event and a period of isolation in this population. The intriguing observation that Tsou O1a1*-P203 haplotypes derive from those of the Atayal, evidenced by the seven-STR MJ network, is no longer sustained when using more than 13 Y-STRs, as the two groups then split at an earlier founding level (13 Y-STR MJ network, data not shown). Age estimates of O1a1*-P203 in TwMtA (16.3 ± 5.9 Kya) and the Philippines (16.05 ± 3.6 Kya) were higher than estimates obtained for TwHan, Fujian or Western Indonesia (8.59 ± 3.3, 11.6 ± 6.1 and 7.28 ± 4.0 Kya, respectively) (Table 2 and Additional file 1: Table S3). Haplogroup O1a1a-M101 [46] is a para-group of O1a1*-P203 (Figure 2). It was tested on all O1a1*-P203 chromosomes in our dataset but was not seen. Haplogroup O1a2-M50 (or O1a2-M110, Figure 2 and Additional file 1: Table S2), a sister haplogroup of O1a1*-P203, is frequently observed among southern TwMtA (from 18% to 28%) and in the Bunun (61%), is variable in the Philippines (from 3% to 16.7%), but is rather rare in Western Indonesia (Java and Sumatra, ~5.3%). Note, however, that a high frequency of this haplogroup has been reported in South Nias (~80%), an island in the Indian Ocean off the coast of Sumatra [26,63,64]. O1a2-M50 was not seen in the Yami in our dataset and was rare in TwHan, Fujian, Thailand, Vietnam and Malaysia.
However, it has been reported as prevalent among a few Daic-speaking groups from southern China and Hainan (3% to 25%) [26,65], thus explaining the decrease in frequency and molecular variation of O1a2-M50 from Taiwan to the Philippines and Indonesia (Figure 3). Age estimates of the diversity of this haplogroup decrease from Taiwan (17.54 ± 3.56 Kya) to the Philippines (12.5 ± 5.9 Kya) and Western Indonesia (7.06 ± 2.63 Kya) (Table 2 and Additional file 1: Table S3). The star-like O1a2-M50 MJ network places the STR haplotypes observed among TwMtA, TwPlt, Filipinos and Indonesians in its central nodes (Figure 4). We also note that the haplotype continuity observed along the Taiwan-Philippines-Indonesia pathway can be traced further, towards Madagascar and the Solomon Islands [7,66,67]. Haplogroup O2 comprises two derived branches (Figure 2). The first branch includes subtype O2*-P31, observed in Han (4%), TwPlt (5.7%), the Philippines (Luzon, 4.7%) and Indonesia (3.25%), and subtypes O2a*-PK4, O2a1*-M95 and O2a1a-M88, mostly seen among groups speaking Daic languages in South China and Indochina, and in Indonesia (Additional file 1: Table S2). The second O2 branch includes subtypes of O2b-SRY465 that are only found among Korean, Japanese and northern Chinese populations [11,68]. The distribution of the recently defined O2a*-PK4 [42] was reanalyzed by two separate laboratories (P. Underhill, Stanford University School of Medicine, CA, USA, personal communication, and this study) using a total of 105 individuals (data not shown). The A to T transversion of PK4 [8,42] was confirmed to be ancestral to the M95 SNP (Figure 2). In our dataset, haplogroups O2a1-M95 and O2a1a-M88 occurred principally in Indochina (~20% and 20% for O2a1-M95 and O2a1a-M88, respectively, in Thailand, and 8% and 25% in Vietnam) and Indonesia (29% and 3%), and had a scattered distribution across Fujian, TwHan and TwPlt, with rather low frequencies.
Among the TwMtA, the Yami (10% and 3.33% for O2a1-M95 and O2a1a-M88, respectively) and Bunun (0% and 37.5%) were the only Mountain tribes bearing haplogroup O2. While a previous study reported the presence of both O2a1-M95 and O2a1a-M88 in Mindanao [7], only O2a1a-M88 (3%) was seen in our Philippines dataset (6.45% in Visayas, 3.42% on average for all the Philippines). We note that the most frequent O2a1a-M88 Y-STR haplotype in the Bunun is also observed in China, among Daic-speaking populations, in Indochina and in Indonesia, but not among Solomon islanders, TwPlt or Han (Figure 4). In fact, Bunun O2a1a-M88 lineages do not belong to the MJ branch observed in the Solomon Islands or Madagascar. Eleven O3 lineages, out of the 19 described by Karafet et al. [8,26,39], were observed in this study (Figure 2 and Additional file 1: Table S2). The most widely distributed sub-haplogroups of O3 in our dataset are O3a1c*-IMS-JST002611, O3a2*-P201, O3a2b*-M7 and O3a2c*-P164. A rather high prevalence of haplogroup O3a1c*-IMS-JST002611 is observed in Fujian Han (~25%) and in the Taiwanese Minnan and Hakka (13% and 21%), as well as among the TwPlt (12%), whereas it occurs at an extremely low frequency in TwMtA (it was only observed in the Yami, at 3%), in the Philippines (0.8%) and in Indonesia (2%) (Figure 2 and Additional file 1: Table S2). Its high frequency among the Ivatan (25%) of the Batan archipelago north of the Philippines is, however, associated with a low age estimate (0.9 Kya) (Table 2 and Additional file 1: Table S3). The MJ network of this haplogroup is characterized by the presence of Han Y chromosomes at most founding nodes (Figure 4). Accordingly, a location of high frequency of haplogroup O3a1c*-IMS-JST002611 emerges from the contour map, centered on the southeastern Chinese coast, with no clear clinal pattern of frequency variation across SEA (Figure 3).
The contour map of O3a1c*-IMS-JST002611 diversity displays a high peak in Indonesia, but this is due to a few (only 4) molecularly distant STR haplotypes, which translate into an older but less precise age estimate (28.47 ± 13.76 Kya, Table 2 and Additional file 1: Table S3). Other commonly observed haplogroups in the O3 series are the paragroups of the O3a2*-P201 branch (Figure 2 and Additional file 1: Table S2). Note that mutation P201 has not been tested in the Daic-speaking groups of China and Hainan ...
Context 2
Note that mutation P201 has not been tested in the Daic-speaking groups of China and Hainan [27], in which the resolution of O3 included O3*, O3a*, O3a1*, O3a1b, O3a1c, O3a2*, O3a2a, O3a2c*, O3a3 and O3a4 [42]. Here, O3a2*-P201 did not exceed 2% in any pooled population group of our dataset, and reached 4.35% in the Puyuma. These low frequencies and the high age estimates seen in Taiwan (i.e. Puyuma) (15 ± 3 Kya) and Western Indonesia (17 ± 2 Kya; Additional file 1: Table S3) hint at late to recent gene flow. The Kriging contour maps suggest two paths linking the patterns of diversity of O3a2*-P201, namely from mainland SEA towards Taiwan and from mainland SEA to Indonesia through the Indochinese peninsula (Figure 3). However, these latter results must be viewed with caution given the generally low frequency of O3a2*-P201 over the whole region. Haplogroup O3a2b*-M7 is uncommon in the Fujian Han (less than 2%), in Taiwan (only observed at low frequencies in TwHan and TwPlt, not observed among TwMtA) and in the Philippines, but was more frequent in Indonesia (8.54%) (Figure 2 and Additional file 1: Table S2). In this study we also observe it in Indochina (12.5% and 6.7% in Vietnam and Thailand, respectively), as well as in Malaysia (although this latter region is represented by a sample of only 8 Y chromosomes). Age estimates of O3a2b*-M7 diversity for Indonesia (22.18 ± 6.27 Kya) and TwPlt (16.56 ± 4.67 Kya) are somewhat older than that for Thailand (12.42 ± 3.45 Kya) (Additional file 1: Table S3). As already noted, however, these estimates assume that no diversity is introduced into a clade by gene flow between populations. Indeed, the scattered and generally derived location of Indonesian, TwPlt and Filipino haplotypes in the O3a2b*-M7 MJ network is consistent with repeated introductions of these haplotypes by gene flow from the mainland into the islands (Figure 4).
Haplogroup O3a2c*-P164 is frequently observed among TwMtA on the east coast of Taiwan (Amis 35.9%, Puyuma 13%, Taroko 5%), in the Philippines (from 8% to 50%) and in western Indonesia (17.8%, Kalimantan 20%, and Sulawesi 11.8%), while it is rather uncommon in Indochina and among the Han (Figures 2 and 3, and Additional file 1: Table S2). Interestingly, the star-like MJ network of O3a2c*-P164 shows core nodes mostly composed of Han, Korean/Japanese and Indochinese groups (Figure 4) and radiating sectors mainly composed of either Korean/Japanese and Tibetan groups, or Indochinese groups, or TwMtA. This structure supports the dispersal paths put forward in the contour map (Figure 3). In turn, derived haplogroups O3a2c1*-M134 and O3a2c1a-M133 are rarely seen among TwMtA. They often occur together and are more frequent among Han and TwPlt as well as in Sumatra, Borneo and the Visayas in the Philippines. Age estimates of the diversity of these two haplogroups all point to the early Neolithic period (i.e. < 10 Kya), with the notable exception of estimates for the Han groups (Table 2 and Additional file 1: Table S3). Accordingly, the contour maps of O3a2c1*-M134 display a general location of higher frequency and diversity related to Han populations, in mainland SEA (Figure 3). Finally, haplogroups C, D, F*, H, K*, N, P*, Q and R were observed at low frequency in the region, with patchy distributions (Figure 2 and Additional file 1: Table S2). The plot of the multidimensional scaling (MDS) analysis of pairwise Reynolds genetic distances obtained from high-definition Y-SNP haplogroup frequencies in our dataset is shown in Figure 5.
Most population samples are located in the center of the plot, with Austronesian-speaking groups from Indonesia and the Philippines surrounded by populations from Vietnam, Thailand and Malaysia (differentiating towards the upper-left part of the plot), most TwMtA differentiating towards the bottom-right, and the Han, both from the mainland and from Taiwan, differentiating towards the upper-left along the x axis. The heavily sinicized TwPlt surround Han populations, with Yulin and Papora to the left of the Han, and the Pazeh and Siraya (from Tainan, Pingtung, and Hwalian) lying close to southern TwMtA and most likely representing less sinicized groups than other TwPlt. A relationship with the frequencies of O clades in these populations (Additional file 1: Figures S1 and S2) is evidenced by the fact that O2 haplogroups prevail in Indochinese populations located in the upper-left quadrant of the MDS plot, whereas O1 haplogroups become more and more frequent in Indonesian, Filipino, southern TwMtA and northern TwMtA populations, which differentiate towards the bottom of the plot. Bunun, Akha, Atayal and Tsou are clearly four distinct outliers in the plot, due to their particularly low diversity. Indeed, only 3 haplogroups were observed among 56 Bunun chromosomes, including O1a2-M50 at 61% and O2a1a-M88 at 37.5%, which translates into a gene diversity level of only 0.49 (Additional file 1: Table S2). Likewise, only 3 haplogroups were observed among 41 Tsou and 52 Atayal chromosomes, including O1a1*-P203 at 90% (amongst the highest frequencies observed in ISEA for this haplogroup), with a gene diversity of 0.18 for the two groups. As for the Akha population, its differentiation from all other SEA populations is driven by the unique presence of haplogroup Q-M242 (56%), which was otherwise not observed in our dataset, except at a very low frequency (<1%) in TwHan.
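The gene diversity values quoted above can be illustrated with a minimal sketch of Nei's unbiased gene diversity, h = n/(n-1) × (1 - Σ p_i²), where p_i is the frequency of haplogroup i among n chromosomes. The function name and the sample below are hypothetical and purely illustrative; the counts are not taken from Additional file 1: Table S2.

```python
from collections import Counter

def gene_diversity(haplogroups):
    """Nei's unbiased gene diversity: h = n/(n-1) * (1 - sum(p_i^2))."""
    n = len(haplogroups)
    if n < 2:
        raise ValueError("need at least two chromosomes")
    counts = Counter(haplogroups)
    # Sum of squared haplogroup frequencies in the sample.
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)

# Hypothetical sample of 10 chromosomes (illustrative counts only).
sample = ["O1a2-M50"] * 6 + ["O2a1a-M88"] * 3 + ["other"] * 1
h = gene_diversity(sample)
print(round(h, 3))
```

A sample dominated by one or two haplogroups yields a low h, which is why populations such as the Bunun, Tsou and Atayal stand out as outliers.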
The axis of differentiation formed by TwMtA in the lower-right part of the MDS plot is mostly driven by the northern tribal groups (Taroko, Thao, Saisiyat and Atayal) and underlines their low gene diversity (h is 0.10, 0.23, 0.23 and 0.18, respectively) due to the high frequency of haplogroup O1a1*-P203 (Additional file 1: Figure S1 and Additional file 1: Table S2). In a second MDS analysis, we included earlier published data (Additional file 1: Table S1) to investigate the genetic relationships of populations represented in our dataset to other East Asian groups. The resulting 2-dimensional plot is ...
Context 3
... haplogroups (clades and sub-clades) in the population samples by mere counting, and the unbiased gene diversity index, h, and its standard error were calculated using the formulas given by Nei [47]. Contour maps of interpolated spatial frequency variations of the most relevant clades in East Asia were constructed by applying the Kriging algorithm in Surfer 8.0 (Golden Software). Similarly, the internal diversity of each relevant Y-SNP clade was estimated on the basis of its STR variation by computing the average variance in repeat size over STR loci (the rho statistic), following the method of Zhivotovsky et al. [48], and the spatial variation of this statistic was also interpolated with Surfer 8.0. Thirteen Y-STR loci (DYS19, DYS389I/II, DYS390, DYS391, DYS392, DYS393, DYS385a/b, DYS437, DYS438, DYS439, DYS635 and Y GATA-H4) were used to estimate the age of the variation within each SNP haplogroup following the modified coalescence method of Zhivotovsky et al. [48-50], assuming a generation time of 25 years and a single mutation rate of 0.00069 per locus per generation. However, the results were very similar to those obtained with only the seven STRs most commonly used in the data retrieved from the literature. Accordingly, the same seven Y-STR loci, or only five of them in the worst cases (noted in italic hereafter), were used to accommodate the literature data (DYS19, DYS389I, DYS389II, DYS390, DYS391, DYS392, and DYS393). Age estimates of STR variation for haplogroups comprising fewer than 10 individuals were also calculated, but these results are to be considered with caution.
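As a rough illustration of how an age can be derived from STR variation, the sketch below uses the mutation rate (0.00069 per locus per generation) and generation time (25 years) stated above. It assumes one common reading of this kind of method: the founder haplotype is approximated by the per-locus median, and the age in generations is the average squared difference from the founder divided by the mutation rate. The function name, the founder choice and the example haplotypes are hypothetical; the exact procedure used in the study may differ.

```python
from statistics import median

MUTATION_RATE = 0.00069  # per locus per generation (value stated in the text)
GENERATION_YEARS = 25    # generation time in years (value stated in the text)

def age_estimate_kya(haplotypes, mu=MUTATION_RATE, gen_years=GENERATION_YEARS):
    """Age of STR variation in thousands of years (Kya).

    haplotypes: list of equal-length lists of repeat counts, one per chromosome.
    Assumption: the founder haplotype is the per-locus median repeat count.
    """
    n_loci = len(haplotypes[0])
    founder = [median(h[i] for h in haplotypes) for i in range(n_loci)]
    # Average squared difference from the founder, locus by locus.
    per_locus = [
        sum((h[i] - founder[i]) ** 2 for h in haplotypes) / len(haplotypes)
        for i in range(n_loci)
    ]
    asd = sum(per_locus) / n_loci  # averaged over loci
    generations = asd / mu
    return generations * gen_years / 1000.0

# Hypothetical two-locus haplotypes (illustrative repeat counts only).
haps = [[14, 10], [14, 10], [14, 10], [15, 10]]
age = age_estimate_kya(haps)
print(round(age, 2))
```

With so few chromosomes the estimate is dominated by a single mutational step, which illustrates why estimates for haplogroups with fewer than 10 individuals call for caution.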
Gene contribution estimates between populations were inferred by two methods: first, the coalescent approach of Admix version 2.0, which infers contributions from parental populations according to STR haplotype frequencies, considering each haplotype as an allele of the same locus [51,52]; and second, the analysis of shared STR lineages (LS) between populations [53], used to infer those contributions as well as to determine the unshared gene portion in the hybrid population. The Fujian and Taiwan Han samples were pooled to constitute a putative Han parental contributor, and most samples of Austronesian-speaking groups, namely TwMtA, Filipinos and Western Indonesians (i.e. all samples from Borneo, Sumatra and Java), were pooled as the other putative parental contributor. Seven STRs were used in these analyses (i.e. DYS19, DYS389I, DYS389II, DYS390, DYS391, DYS392 and DYS393). Both the SNP and STR datasets were used to perform AMOVA analyses of population structure using the Arlequin 3.5.1.2 software [54]. Multidimensional scaling (MDS) analysis was performed to represent patterns of genetic relationships between all groups in our dataset. A Reynolds distance matrix was obtained from frequency distributions of SNP haplogroups with Arlequin, and used as input for MDS analysis. MDS plots were constructed using XLSTAT software version 7.5.2 [55]. Median joining (MJ) networks of Y-STR haplotypes (defined by DYS19, DYS389I, DYS389II, DYS390, DYS391, DYS392 and DYS393) for relevant SNP haplogroups and sub-haplogroups of the O clade were constructed from reduced median joining networks using the NETWORK 4.1.0.6 software [56]. We used a reduction of one and locus-specific weights based on the relative mutation rates of the following Y-STR loci: DYS19 (weight of 5), DYS389I (4), DYS389II (3), DYS390 (2), DYS391 (2), DYS392 (20), and DYS393 (20) [57-60].
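To make the effect of the locus-specific weights concrete, the following sketch computes a weighted stepwise distance between two seven-locus haplotypes using the weights listed above. This is only an illustration of how such weights change the relative cost of a one-repeat difference at each locus; it is not the NETWORK software's algorithm, and the function name and example haplotypes are hypothetical.

```python
# Locus weights as listed in the text; slowly mutating loci get higher weights.
WEIGHTS = {
    "DYS19": 5,
    "DYS389I": 4,
    "DYS389II": 3,
    "DYS390": 2,
    "DYS391": 2,
    "DYS392": 20,
    "DYS393": 20,
}

def weighted_distance(hap_a, hap_b, weights=WEIGHTS):
    """Sum over loci of weight * absolute difference in repeat count."""
    return sum(w * abs(hap_a[locus] - hap_b[locus]) for locus, w in weights.items())

# Hypothetical haplotypes (illustrative repeat counts only).
hap_a = {"DYS19": 15, "DYS389I": 13, "DYS389II": 29, "DYS390": 24,
         "DYS391": 10, "DYS392": 13, "DYS393": 12}
hap_b = dict(hap_a, DYS390=26, DYS392=14)  # differs by 2 steps and 1 step

d = weighted_distance(hap_a, hap_b)
print(d)
```

Under these weights a single step at DYS392 or DYS393 costs as much as ten steps at DYS390 or DYS391, so differences at the slower loci weigh far more heavily when haplotypes are linked.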
Context 4
... diversity index, h , and its standard error were calculated using the formulas given by Nei [47]. Contour maps of interpolated spatial frequency variations of the most relevant clades in East Asia were constructed by applying the Kriging algorithm in Surfer 8.0 (Golden software). Similarly, the internal diversity of each relevant Y-SNP clade was estimated on the basis of its STR variation by computing the average variance in repeat size over STR loci (the rho statistic), following the method of Zhivotovsky et al. [48], and the spatial variation of this statistic was also interpolated with Surfer 8.0. Thirteen Y-STR loci (DYS19, DYS389 I/II, DYS390, DYS391, DYS392, DYS393, DYS385a/b, DYS437, DY438, DYS439, DYS635 and Y GATA-H4) were used to estimate the age of the variation within each SNP haplogroup following the modified coalescence method of Zhivotovsky et al. [48-50], assuming a generation time of 25 years and a single mutation rate of 0.00069 per locus per generation. However, the results were very similar to those obtained only with the seven most commonly used STRs in the data retrieved from the literature. Accordingly, the same seven Y-STR loci, or only five of them in the worst cases (noted in italic here after), were used to accommodate the literature data (DYS19, DYS389 I , DYS389 II, DYS390, DYS391, DYS392 , and DYS393 ). Age estimates of STR variation of haplogroups comprised less than 10 individuals were also calculated but results are to be considered with caution. Gene contribution estimates between populations were inferred by two methods; firstly the coalescent approach of Admix version 2.0, which infers contributions from parental populations according to STR haplotype frequencies considering each haplotype as an allele of the same locus [51,52], and secondly using the analysis of shared STR lineages (LS) between populations [53] to infer those contributions as well as to determine the unshared gene portion in the hybrid population. 
The Fujian and Taiwan Han samples were pooled to constitute a putative Han parental contributor, and most samples of Austronesian-speaking groups, namely TwMtA, Filipinos and Western Indonesians (i.e. all samples from Borneo, Sumatra and Java), were pooled as the other putative parental contributor. Seven STRs were used in these analyses (i.e. DYS19, DYS389I, DYS389II, DYS390, DYS391, DYS392 and DYS393). Both the SNP and STR datasets were used to perform AMOVA analyses of population structure using the Arlequin 3.5.1.2 software [54]. Multidimensional scaling (MDS) analysis was performed to represent patterns of genetic relationships between all groups in our data set. A Reynolds distance matrix was obtained from frequency distributions of SNP haplogroups with Arlequin, and used as input for MDS analysis. MDS plots were constructed using XLSTAT software version 7.5.2 [55]. Median joining (MJ) networks of Y-STR haplotypes (defined by DYS19, DYS389I, DYS389II, DYS390, DYS391, DYS392 and DYS393) for relevant SNP haplogroups and sub-haplogroups of the O clade were constructed from reduced median joining networks using the NETWORK 4.1.0.6 software [56]. We used a reduction of one and locus-specific weights based on the relative mutation rates of the following Y-STR loci: DYS19 (weight of 5), DYS389I (4), DYS389II (3), DYS390 (2), DYS391 (2), DYS392 (20), and DYS393 (20) [57-60]. The frequency distribution of Y-chromosome SNP haplogroups detected in Taiwan, ISEA and Indochina is shown in Figure 2, reported in detail in Additional file 1: Table S2, and summarized in Additional file 1: Figures S1 and S2. Additional file 1: Figure S3 displays the variation of diversity measures according to latitude among TwMtA. The interpolation contour maps resulting from applying the Kriging method both to the frequency distributions of Y-SNP clades and their internal STR diversity are shown side-by-side in Figure 3. 
Forty-seven of the 81 genotyped Y-SNPs were observed in the derived state [61], thus defining 47 haplogroups observed in our samples, which belong to major clades C, D, F*, G, H, J, K*, L, N, O, P*, Q and R. MJ networks of major Y-SNP clades in SEA are shown in Figure 4. Age estimates of the diversity of these clades, based on the assumption that mutations in haplotypes accumulate in situ (i.e. no gene flow), are reported in Table 2 and Additional file 1: Table S3. The most prominent clade in Taiwan, O-M175 as a whole, represents almost 90% of Y chromosomes among the TwHan, about 95% among the TwPlt and more than 99% among the TwMtA (Additional file 1: Figures S1 and S2). Only one representative of the basal O*/O1*-M175 (xM119, P31, M122) was seen in each of the Luzon (Philippines), Fujian and TwPlt samples (Figure 2 and Additional file 1: Table S2). All other haplogroups of the O clade were observed in the derived state for the M119, P31 and M122 markers (Figure 2, Additional file 1: Table S2, and Additional file 1: Figures S1 and S2). Haplogroup O1a*-M119 (Figure 2 and Additional file 1: Table S2) is seen throughout Batan (42%), the Philippines (4% to 33%) and Indonesia (4% to 18%). It has a patchy distribution among TwMtA (3-33%) and only some southern TwMtA show frequencies greater than 10% (i.e. Puyuma, Paiwan and Yami). O1a*-M119 was not observed in our Amis sample, nor in the Bunun, Saisiyat and Thao, and has a low frequency among TwHan (1.4% to 2%). Outside Taiwan, O1a*-M119 was observed neither among the Kalimantan in Borneo nor in Sumatra. However, the contour map interpolation (which includes published data, see Additional file 1: Table S1) indicates its presence in Western Sumatra, towards the Indian Ocean (Figure 3). The internal STR diversity of O1a*-M119 decreases gradually from mainland China towards the south, although a second, somewhat lower peak of diversity is observed around the Moluccas. 
The MJ network of O1a*-M119 (Figure 4) clearly differentiates TwMtA and TwPlt from Indonesia (Indonesian STR haplotypes are all included in the lower left reticulation of the O1a*-M119 network), whereas most Filipino O1a*-M119 haplotypes are shared with or very similar to those found among TwMtA and TwPlt. Age estimates based on the amount of molecular variation for O1a*-M119 were higher among TwMtA (19.96 ± 5.93 Kya) and in the Philippines (20.36 ± 5.65 Kya) than in Indonesia (4.14 ± 2.67 Kya) (Table 2 and Additional file 1: Table S3). O1a1*-P203, derived from O1a*-M119 (Figure 2 and Additional file 1: Table S2), is the most common haplogroup among the northern TwMtA and the Tsou (90%), while among southern TwMtA it represents about half of the Y chromosomes observed. It is also quite common in several TwPlt (e.g. Pazeh, Ketagalan and Siraya in various locations), but less frequent in the Philippines (13.7%) and Indonesia (16.3%), although it is observed in 36% of the Kalimantan in Borneo. It is also commonly seen in Han (22% and 14.2% in Fujian and TwHan, respectively), but is uncommon in Thailand (2.7%) and was not observed in the Vietnamese (Hanoi) and the Akha. Accordingly, the contour map of the frequency variation of O1a1*-P203 shows two main locations of high frequency, in Taiwan and southwestern Borneo, and decreasing frequencies from these locations towards the Philippines, whereas the contour map of its internal diversity is more complex (Figure 3). The O1a1*-P203 MJ network has a marked star-like shape, with a central frequent haplotype detected throughout mainland and insular SEA (Figure 4). Interestingly, haplotypes observed in the Atayal branch off from this central node to form a distinct network, suggesting a founding event and a period of isolation in this population. 
The intriguing observation that Tsou O1a1*-P203 haplotypes derive from those of the Atayal, as evidenced by the seven-STR MJ network, is no longer sustained when using more than 13 Y-STRs, as the two groups then split at an earlier founding level (13 Y-STR MJ network, data not shown). Age estimates of O1a1*-P203 in TwMtA (16.3 ± 5.9 Kya) and the Philippines (16.05 ± 3.6 Kya) were higher than estimates obtained for TwHan, Fujian or Western Indonesia (8.59 ± 3.3, 11.6 ± 6.1 and 7.28 ± 4.0 Kya, respectively) (Table 2 and Additional file 1: Table S3). Haplogroup O1a1a-M101 [46] is a para-group of O1a1*-P203 (Figure 2). It was tested on all O1a1*-P203 in our dataset but was not seen. Haplogroup O1a2-M50 (or O1a2-M110, Figure 2 and Additional file 1: Table S2), a sister haplogroup of O1a1*-P203, is frequently observed among southern TwMtA (from 18% to 28%) and in the Bunun (61%), variable in the Philippines (from 3% to 16.7%), but rather rare in Western Indonesia (Java and Sumatra, ~5.3%). Note however that a high frequency of this haplogroup has been reported in South Nias (~80%), an island in the Indian Ocean off the coast of Sumatra [26,63,64]. O1a2-M50 was not seen in our dataset in the Yami and was rare in TwHan, Fujian, Thailand, Vietnam and Malaysia. However, it has been reported as prevalent among a few Daic-speaking groups from southern China and Hainan (3% to 25%) [26,65], thus explaining the decrease in frequency and molecular variation of O1a2-M50 from Taiwan to the Philippines and Indonesia (Figure 3). Age estimates of diversity of this haplogroup decrease from Taiwan (17.54 ± 3.56 Kya) to the Philippines (12.5 ± 5.9 Kya) and Western Indonesia (7.06 ± 2.63 Kya) (Table 2 and Additional file 1: Table S3). The star-like shape of the O1a2-M50 MJ network places in the central nodes the STR haplotypes observed among TwMtA, TwPlt, Filipinos and Indonesians (Figure 4). 
We also notice that the haplotype continuity observed along the Taiwan-Philippines-Indonesian pathway can be traced further away toward Madagascar and the Solomon Islands [7,66,67]. Haplogroup O2 is comprised of two derived branches (Figure 2). The first branch includes subtypes O2*-P31 observed in Han (4%), TwPlt (5.7%), the Philippines (Luzon, 4.7%) and Indonesia (3.25%), and O2a*-PK4, O2a1*-M95 and O2a1a-M88 mostly seen among ...
Context 5
... Haplogroup O2 is comprised of two derived branches (Figure 2). The first branch includes subtypes O2*-P31 observed in Han (4%), TwPlt (5.7%), the Philippines (Luzon, 4.7%) and Indonesia (3.25%), and O2a*-PK4, O2a1*-M95 and O2a1a-M88 mostly seen among groups speaking Daic languages in South China and Indochina, and in Indonesia (Additional file 1: Table S2). 
The second O2 branch includes subtypes of O2b-SRY465 that are only found among Korean, Japanese and northern Chinese [11,68]. The distribution of the recently defined O2a*-PK4 [42] was reanalyzed by two separate laboratories (P. Underhill, Stanford University School of Medicine, CA, USA, personal communication and this study) using a total of 105 individuals (data not shown). The A to T transversion of PK4 [8,42] was confirmed to be ancestral to the M95 SNP (Figure 2). In our dataset, haplogroups O2a1-M95 and O2a1a-M88 occurred principally in Indochina (~20% and 20% for O2a1-M95 and O2a1a-M88, respectively, in Thailand, and 8% and 25% in Vietnam) and Indonesia (29% and 3%), and had a scattered distribution between Fujian, TwHan and TwPlt, with rather low frequencies. Among the TwMtA, the Yami (10% and 3.33% for O2a1-M95 and O2a1a-M88, respectively) and Bunun (0% and 37.5%) were the only Mountain tribes bearing haplogroup O2. While a study reported the presence of O2a1-M95 and O2a1a-M88 in Mindanao [7], only O2a1a-M88 (3%) was seen in our Philippines dataset (6.45% in Visayas, 3.42% on average for all the Philippines). We notice that the most frequent O2a1a-M88 Y-STR haplotype in the Bunun is also observed in China, among Daic-speaking populations, in Indochina and in Indonesia, but not among Solomon islanders, TwPlt or Han (Figure 4). In fact, Bunun O2a1a-M88 lineages do not belong to the MJ branch observed in the Solomon Islands or Madagascar. Eleven O3 lineages, out of the 19 described by Karafet et al. [8,26,39], were observed in this study (Figure 2 and Additional file 1: Table S2). The most widely distributed sub-haplogroups of O3 in our dataset are O3a1c*-IMS-JST002611, O3a2*-P201, O3a2b*-M7 and O3a2c*-P164. 
A rather high prevalence of haplogroup O3a1c*-IMS-JST002611 is observed in Fujian Han (~25%) and in the Taiwanese Minnan and Hakka (13% and 21%), as well as among the TwPlt (12%), whereas it occurs at an extremely low frequency in TwMtA (it was only observed in the Yami, at 3%), in the Philippines (0.8%) and in Indonesia (2%) (Figure 2 and Additional file 1: Table S2). Its high frequency among the Ivatan (25%) of the Batan archipelago north of the Philippines is however associated with a low age estimate (0.9 Kya) (Table 2 and Additional file 1: Table S3). The MJ network of this haplogroup is characterized by the presence of Han Y-chromosomes at most founding nodes (Figure 4). Accordingly, a location of high frequency of haplogroup O3a1c*-IMS-JST002611 emerges from the contour map, centered on the southeastern Chinese coast, with no clear clinal pattern of frequency variation across SEA (Figure 3). The contour map of O3a1c*-IMS-JST002611 diversity displays a high peak in Indonesia, but this is due to a few (only 4) molecularly distant STR haplotypes, which translate into an older but less precise age estimate (28.47 ± 13.76 Kya, Table 2 and Additional file 1: Table S3). Other commonly observed haplogroups in the O3 series are the paragroups of the O3a2*-P201 branch (Figure 2 and Additional file 1: Table S2). Note that mutation P201 has not been tested in the Daic-speaking groups of China and Hainan [27], in which the resolution of O3* included O3*, O3a*, O3a1*, O3a1b, O3a1c, O3a2*, O3a2a, O3a2c*, O3a3 and O3a4 [42]. Here O3a2*-P201 did not exceed 2% in any pooled population group of our data set, and reached 4.35% in the Puyuma. These low frequencies and the high age estimates seen in Taiwan (i.e. Puyuma) (15 ± 3 Kya) and Western Indonesia (17 ± 2 Kya; Additional file 1: Table S3) hint at late to recent gene flow. 
The Kriging contour maps suggest two paths linking the patterns of diversity of O3a2*-P201, namely from mainland SEA towards Taiwan and from mainland SEA to Indonesia through the Indochinese peninsula (Figure 3). However, these latter results must be viewed with caution given the generally low frequency of O3a2*-P201 over the whole region. Haplogroup O3a2b*-M7 is uncommon in the Fujian Han (less than 2%), in Taiwan (only observed at low frequencies in TwHan and TwPlt, not observed among TwMtA) and the Philippines, but was more frequent in Indonesia (8.54%) (Figure 2 and Additional file 1: Table S2). In this study we also observe it in Indochina (12.5% and 6.7% in Vietnam and Thailand, respectively), as well as in Malaysia (although this latter region is represented by a sample of only 8 Y chromosomes). Age estimates of O3a2b*-M7 diversity for Indonesia (22.18 ± 6.27 Kya) and TwPlt (16.56 ± 4.67 Kya) are somewhat older than that for Thailand (12.42 ± 3.45 Kya) (Additional file 1: Table S3). However, as already noted, these estimates are based on the assumption that no diversity is introduced in a clade by gene flow between populations. Indeed, the scattered and generally derived location of Indonesian, TwPlt and Filipino haplotypes in the O3a2b*-M7 MJ network is consistent with repeated introductions of these haplotypes by gene flow from the mainland into the islands (Figure 4). Haplogroup O3a2c*-P164 is frequently observed among TwMtA on the east coast of Taiwan (Amis 35.9%, Puyuma 13%, Taroko 5%), in the Philippines (from 8% to 50%) and in western Indonesia (17.8%, Kalimantan 20%, and Sulawesi 11.8%), while it is rather uncommon in Indochina and among the Han (Figures 2 and 3, and Additional file 1: Table S2). Interestingly, the star-like MJ network of O3a2c*-P164 shows core nodes mostly comprised of Han, Korean/Japanese and Indochinese groups (Figure 4) and radiating sectors mainly comprised of either Korean/Japanese and Tibet, or Indochinese, or delineating TwMtA. 
This structure supports the dispersal paths put forward in the contour map (Figure 3). In turn, derived haplogroups O3a2c1*-M134 and O3a2c1a-M133 are rarely seen among TwMtA. They often occur together and are more frequent among Han and TwPlt as well as in Sumatra, Borneo and the Visayas in the Philippines. Age estimates of the diversity of these two haplogroups all point to the early Neolithic period (i.e. < 10 Kya), with the notable exception of estimates for the Han groups (Table 2 and Additional file 1: Table S3). Accordingly, the contour maps of O3a2c1*-M134 display a general location of higher frequency and diversity related to Han populations, in mainland SEA (Figure 3). Finally, haplogroups C, D, F*, H, K*, N, P*, Q and R were observed at low frequency in the region, with patchy distributions (Figure 2 and Additional file 1: Table S2). The plot of the multidimensional scaling (MDS) analysis of pairwise Reynolds genetic distances obtained from high definition Y-SNP haplogroup frequencies in our dataset is shown in Figure 5. Most population samples are located in the center of the plot, with Austronesian-speaking groups from Indonesia and the Philippines surrounded by populations from Vietnam, Thailand and Malaysia (differentiating towards the upper-left part of the plot), most TwMtA differentiating towards the bottom-right, and the Han, both from the mainland and from Taiwan, differentiating towards the upper-left along the x axis. The heavily sinicized TwPlt surround Han populations, with Yulin and Papora on the left part of the Han, and the Pazeh and Siraya (from Tainan, Pingtung, and Hwalian) getting close to southern TwMtA and most likely representing less sinicized groups than other TwPlt. 
A relationship with the frequencies of O clades in these populations (Additional file 1: Figures S1 and S2) is evidenced by the fact that O2 haplogroups prevail in Indochinese populations located in the upper-left quadrant of the MDS plot, whereas O1 haplogroups become more and more frequent in Indonesian, Filipino, south TwMtA and north TwMtA populations, which differentiate towards the bottom of the plot. Bunun, Akha, Atayal and Tsou are clearly four distinct outliers in the plot, due to their particularly low diversity. Indeed, only 3 haplogroups were observed among 56 Bunun chromosomes, including O1a2-M50 at 61% and O2a1a-M88 at 37.5%, and this translates into a gene diversity level ...
Context 6
... In brief, PCR products were mixed with GeneScan 500LIZ (Applied Biosystems) as internal size standard and analyzed by capillary electrophoresis with an ABI Prism 310 genetic analyzer (Applied Biosystems) using the standard fragment analysis protocol. Genetyper 2.5.2 (Applied Biosystems) was used for allele scoring. For all statistical and network analyses, we used DYS389II values obtained by subtracting DYS389I from DYS389II [46]. The Y-chromosome SNP dataset was used to obtain frequency distributions of haplogroups (clades and sub-clades) in the population samples by mere counting, and the unbiased gene diversity index, h, and its standard error were calculated using the formulas given by Nei [47]. Contour maps of interpolated spatial frequency variations of the most relevant clades in East Asia were constructed by applying the Kriging algorithm in Surfer 8.0 (Golden Software). Similarly, the internal diversity of each relevant Y-SNP clade was estimated on the basis of its STR variation by computing the average variance in repeat size over STR loci (the rho statistic), following the method of Zhivotovsky et al. [48], and the spatial variation of this statistic was also interpolated with Surfer 8.0. Thirteen Y-STR loci (DYS19, DYS389I/II, DYS390, DYS391, DYS392, DYS393, DYS385a/b, DYS437, DYS438, DYS439, DYS635 and Y GATA-H4) were used to estimate the age of the variation within each SNP haplogroup following the modified coalescence method of Zhivotovsky et al. [48-50], assuming a generation time of 25 years and a single mutation rate of 0.00069 per locus per generation. However, the results were very similar to those obtained with only the seven STRs most commonly used in the data retrieved from the literature. Accordingly, the same seven Y-STR loci, or only five of them in the worst cases (noted in italics hereafter), were used to accommodate the literature data (DYS19, DYS389I, DYS389II, DYS390, DYS391, DYS392 and DYS393). 
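To make the age calculation described above more concrete, here is a simplified Python sketch of an STR-variance age estimate. It assumes that the founder haplotype can be approximated by the per-locus median repeat count and that the age in generations is the average squared difference from that founder divided by the mutation rate. Only the mutation rate (0.00069 per locus per generation) and the generation time (25 years) come from the text; the function name and the example haplotypes are hypothetical, and the sketch does not reproduce the exact modified procedure the authors followed.

```python
from statistics import median

MU = 0.00069    # mutation rate per locus per generation (from the text)
GEN_YEARS = 25  # generation time in years (from the text)


def asd_age_years(haplotypes, mu=MU, gen_years=GEN_YEARS):
    """Approximate age (in years) of the STR variation within one haplogroup.

    haplotypes: list of equal-length lists of repeat counts, one per chromosome.
    Assumption: the founder is approximated by the per-locus median.
    """
    n_loci = len(haplotypes[0])
    n_chrom = len(haplotypes)
    founder = [median(h[i] for h in haplotypes) for i in range(n_loci)]
    # Mean squared difference from the founder allele, computed per locus.
    per_locus = [
        sum((h[i] - founder[i]) ** 2 for h in haplotypes) / n_chrom
        for i in range(n_loci)
    ]
    asd0 = sum(per_locus) / n_loci  # average over loci
    return asd0 / mu * gen_years


# Hypothetical two-locus haplotypes (illustration only).
example = [[14, 12], [14, 12], [15, 12], [14, 13]]
print(f"{asd_age_years(example) / 1000:.2f} Kya")
```

Because the estimate scales linearly with the observed variance, a handful of molecularly distant haplotypes in a small sample can inflate both the age and its uncertainty.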
Context 7
... TwPlt (e.g. Pazeh, Ketagalan and Siraya in various locations), but less frequent in the Philippines (13.7%) and Indonesia (16.3%), although it is observed in 36% of the Kalimantan in Borneo. It is also commonly seen in Han (22% and 14.2% in Fujian and TwHan, respectively), but uncommon in Thailand (2.7%), and was not observed in the Vietnamese (Hanoi) and the Akha. Accordingly, the contour map of the frequency variation of O1a1*-P203 shows two main locations of high frequency, in Taiwan and southwestern Borneo, and decreasing frequencies from these locations towards the Philippines, whereas the contour map of its internal diversity is more complex (Figure 3). The O1a1*-P203 MJ network has a marked star-like shape, with a central frequent haplotype detected throughout mainland and insular SEA (Figure 4). Interestingly, haplotypes observed in the Atayal branch off from this central node to form a distinct network, suggesting a founding event and a period of isolation in this population. The intriguing observation of Tsou O1a1*-P203 haplotypes deriving from those of Atayal evidenced by the seven STRs MJ network is no longer sustained when using more than 13 Y-STRs, as the two groups then split at an earlier founding level (13 Y-STR MJ network, data not shown). Age estimates of O1a1*-P203 in TwMtA (16.3 ± 5.9 Kya) and the Philippines (16.05 ± 3.6) were higher than estimates obtained for TwHan, Fujian or Western Indonesia (8.59 ± 3.3, 11.6 ± 6.1 and 7.28 ± 4.0, respectively) (Table 2 and Additional file 1: Table S3). Haplogroup O1a1a-M101 [46] is a para-group of O1a1*-P203 (Figure 2). It was tested on all O1a1*-P203 in our dataset but was not seen. Haplogroup O1a2-M50 (or O1a2-M110, Figure 2 and Additional file 1: Table S2), a sister haplogroup of O1a1*-P203, is frequently observed among southern TwMtA (from 18% to 28%) and in the Bunun (61%), variable in the Philippines (from 3% to 16.7%), but rather rare in Western Indonesia (Java and Sumatra, ~5.3%). 
Note however that a high frequency of this haplotype has been reported in South Nias (~80%), an island of the Indian Ocean in front of Sumatra [26,63,64]. O1a2-M50 was not seen in our dataset in the Yami and was rare in TwHan, Fujian, Thailand, Vietnam and Malaysia. However, it has been reported as prevalent among a few Daic-speaking groups from southern China and Hainan (3% to 25%) [26,65], thus explaining the decrease in frequency and molecular variation of O1a2-M50 from Taiwan to the Philippines and Indonesia (Figure 3). Age estimates of diversity of this haplogroup decrease from Taiwan (17.54 ± 3.56 Kya) to the Philippines (12.5 ± 5.9 Kya) and Western Indonesia (7.06 ± 2.63 Kya) (Table 2 and Additional file 1: Table S3). The star-like shape of the O1a2-M50 MJ network places in the central nodes the STR haplotypes observed among TwMtA, TwPlt, Filipinos and Indonesians (Figure 4). We also notice that the haplotype continuity observed along the Taiwan-Philippines-Indonesian pathway can be traced further away toward Madagascar and the Solomon Islands [7,66,67]. Haplogroup O2 is comprised of two derived branches (Figure 2). The first branch includes subtypes O2*-P31 observed in Han (4%), TwPlt (5.7%), the Philippines (Luzon, 4.7%) and Indonesia (3.25%), and O2a*-PK4, O2a1*-M95 and O2a1a-M88 mostly seen among groups speaking Daic languages in South China and Indochina, and in Indonesia (Additional file 1: Table S2). The second O2 branch includes subtypes of O2b-SRY 465 that are only found among Korean, Japanese and northern Chinese [11,68]. The distribution of the recently defined O2a*-PK4 [42] was reanalyzed by two separate laboratories (P. Underhill, Stanford University School of Medicine, CA, USA, personal communication and this study) using a total of 105 individuals (data not shown). The A to T transversion of PK4 [8,42] was confirmed to be ancestral to the M95 SNP (Figure 2). 
In our dataset, haplogroups O2a1-M95 and O2a1a-M88 occurred principally in Indochina (~20% and 20% for O2a1-M95 and O2a1a-M88, respectively, in Thailand, and 8% and 25% in Vietnam) and Indonesia (29% and 3%), and had a scattered distribution between Fujian, TwHan and TwPlt, with rather low frequencies. Among the TwMtA, the Yami (10% and 3.33% for O2a1-M95 and O2a1a-M88, respectively) and Bunun (0% and 37.5%) were the only Mountain tribes bearing haplogroup O2. While a study reported the presence of O2a1-M95 and O2a1a-M88 in Mindanao [7], only O2a1a-M88 (3%) was seen in our Philippines dataset (6.45% in Visayas, 3.42% on average for all the Philippines). We notice that the most frequent O2a1a-M88 Y-STR haplotype in the Bunun is also observed in China, among Daic-speaking populations, in Indochina and in Indonesia, but not among Solomon islanders, TwPlt or Han (Figure 4). In fact, Bunun O2a1a-M88 lineages do not belong to the MJ branch observed in the Solomon Islands or Madagascar. Eleven O3 lineages, out of the 19 described by Karafet et al. [8,26,39] were observed in this study (Figure 2 and Additional file 1: Table S2). The most widely distributed sub-haplogroups of O3 in our dataset are O3a1c*-IMS- JST002611, O3a2*-P201, O3a2b*-M7 and O3a2c*-P164. A rather high prevalence of haplogroup O3a1c*-IMS- JST002611 is observed in Fujian Han (~25%) and in the Taiwanese Minnan and Hakka (13% and 21%), as well as among the TwPlt (12%), whereas it occurs at an extremely low frequency in TwMtA (it was only observed in the Yami, at 3%), in the Philippines (0.8%) and in Indonesia (2%) (Figure 2 and Additional file 1: Table S2). Its high frequency among the Ivatan (25%) of the Batan archipelago north of the Philippines is however associated with a low age estimate (0.9 Kya) (Table 2 and Additional file 1: Table S3). The MJ network of this haplogroup is characterized by the presence of Han Y-chromosomes at most founding nodes (Figure 4). 
Accordingly, a location of high frequency of haplogroup O3a1c*-IMS-JST002611 emerges from the contour map, centered on the southeastern Chinese coast, with no clear clinal pattern of frequency variation across SEA (Figure 3). The contour map of O3a1c*-IMS-JST002611 diversity displays a high peak in Indonesia, but this is due to only four molecularly distant STR haplotypes, which translate into an older but less precise age estimate (28.47 ± 13.76 Kya, Table 2 and Additional file 1: Table S3). Other commonly observed haplogroups in the O3 series are the paragroups of the O3a2*-P201 branch (Figure 2 and Additional file 1: Table S2). Note that mutation P201 has not been tested in the Daic-speaking groups of China and Hainan [27], in which the resolution of O3* included O3*, O3a*, O3a1*, O3a1b, O3a1c, O3a2*, O3a2a, O3a2c*, O3a3 and O3a4 [42]. Here O3a2*-P201 did not exceed 2% in any pooled population group of our data set, and reached 4.35% in the Puyuma. These low frequencies and the high age estimates seen in Taiwan (i.e. Puyuma) (15 ± 3 Kya) and Western Indonesia (17 ± 2 Kya; Additional file 1: Table S3) hint at late to recent gene flow. The Kriging contour maps suggest two paths linking the patterns of diversity of O3a2*-P201, namely from mainland SEA towards Taiwan and from mainland SEA to Indonesia through the Indochinese peninsula (Figure 3). However, these latter results must be viewed with caution given the generally low frequency of O3a2*-P201 over the whole region. Haplogroup O3a2b*-M7 is uncommon in the Fujian Han (less than 2%), in Taiwan (only observed at low frequencies in TwHan and TwPlt, not observed among TwMtA) and the Philippines, but was more frequent in Indonesia (8.54%) (Figure 2 and Additional file 1: Table S2). In this study we also observe it in Indochina (12.5% and 6.7% in Vietnam and Thailand, respectively), as well as in Malaysia (although this latter region is represented by a sample of only 8 Y chromosomes). 
Age estimates of O3a2b*-M7 diversity for Indonesia (22.18 ± 6.27 Kya) and TwPlt (16.56 ± 4.67 Kya) are somewhat older than that for Thailand (12.42 ± 3.45 Kya) (Additional file 1: Table S3). However, as already noted, these estimates are based on the assumption that no diversity is introduced in a clade by gene flow between populations. In fact, the scattered and generally derived location of Indonesian, TwPlt and Filipino haplotypes in the O3a2b*-M7 MJ network is consistent with repeated introductions of these haplotypes by gene flow from the mainland into the islands (Figure 4). Haplogroup O3a2c*-P164 is frequently observed among TwMtA on the east coast of Taiwan (Amis 35.9%, Puyuma 13%, Taroko 5%), in the Philippines (from 8% to 50%) and in western Indonesia (17.8%, Kalimantan 20%, and Sulawesi 11.8%), while it is rather uncommon in Indochina and among the Han (Figures 2 and 3, and Additional file 1: Table S2). Interestingly, the star-like MJ network of O3a2c*-P164 shows core nodes mostly composed of Han, Korean/Japanese and Indochinese groups (Figure 4) and radiating sectors mainly composed of either Korean/Japanese and Tibetan groups, or Indochinese groups, or TwMtA. This structure supports the dispersal paths put forward in the contour map (Figure 3). In turn, derived haplogroups O3a2c1*-M134 and O3a2c1a-M133 are rarely seen among TwMtA. They often occur together and are more frequent among Han and TwPlt as well as in Sumatra, Borneo and the Visayas in the Philippines. Age estimates of the diversity of these two haplogroups all point to the early Neolithic period (i.e. < 10 Kya), with the notable exception of estimates for the Han groups (Table 2 and Additional file 1: Table S3). Accordingly, the contour maps of O3a2c1*-M134 display a general location of higher frequency and diversity related to Han populations, in mainland SEA (Figure 3). 
Finally, haplogroups C, D, F*, H, K*, N, P*, Q and R were observed at low frequency in the region, with patchy distributions (Figure 2 and Additional file 1: Table S2). The plot of the multidimensional scaling (MDS) analysis of pairwise Reynolds genetic distances obtained from ...
Context 10
... we used data from DYS389II by subtracting DYS389I from DYS389II [46]. The Y-chromosome SNP dataset was used to obtain frequency distributions of haplogroups (clades and sub-clades) in the population samples by direct counting, and the unbiased gene diversity index, h, and its standard error were calculated using the formulas given by Nei [47]. Contour maps of interpolated spatial frequency variations of the most relevant clades in East Asia were constructed by applying the Kriging algorithm in Surfer 8.0 (Golden Software). Similarly, the internal diversity of each relevant Y-SNP clade was estimated on the basis of its STR variation by computing the average variance in repeat size over STR loci (the rho statistic), following the method of Zhivotovsky et al. [48], and the spatial variation of this statistic was also interpolated with Surfer 8.0. Thirteen Y-STR loci (DYS19, DYS389I/II, DYS390, DYS391, DYS392, DYS393, DYS385a/b, DYS437, DYS438, DYS439, DYS635 and Y GATA-H4) were used to estimate the age of the variation within each SNP haplogroup following the modified coalescence method of Zhivotovsky et al. [48-50], assuming a generation time of 25 years and a single mutation rate of 0.00069 per locus per generation. However, the results were very similar to those obtained with only the seven STRs most commonly used in the data retrieved from the literature. Accordingly, the same seven Y-STR loci, or only five of them in the worst cases (noted in italic hereafter), were used to accommodate the literature data (DYS19, DYS389I, DYS389II, DYS390, DYS391, DYS392, and DYS393). Age estimates of STR variation for haplogroups comprising fewer than 10 individuals were also calculated, but these results are to be considered with caution. 
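The two summary statistics described above can be sketched in a few lines of Python. This is only an illustrative simplification, not the authors' pipeline: `gene_diversity` applies Nei's unbiased estimator h = n/(n-1) · (1 - Σ p_i²) without its standard error, and `age_estimate_kya` assumes (as a stand-in for the modified coalescence method cited in the text) that the founder haplotype can be approximated by the per-locus median and that age equals the average squared difference from that founder divided by the mutation rate. The function names and the dictionary layout of a haplotype are hypothetical; only the mutation rate (0.00069), the generation time (25 years) and the DYS389 subtraction come from the text.

```python
from collections import Counter
from statistics import median

MUTATION_RATE = 0.00069   # per locus per generation (value given in the text)
GENERATION_YEARS = 25     # generation time given in the text


def gene_diversity(haplogroups):
    """Nei's unbiased gene diversity, h = n/(n-1) * (1 - sum(p_i^2))."""
    n = len(haplogroups)
    if n < 2:
        raise ValueError("need at least two chromosomes")
    freqs = [count / n for count in Counter(haplogroups).values()]
    return n / (n - 1) * (1 - sum(p * p for p in freqs))


def correct_dys389(hap):
    """Return a copy in which DYS389II holds DYS389II minus DYS389I."""
    fixed = dict(hap)
    fixed["DYS389II"] = hap["DYS389II"] - hap["DYS389I"]
    return fixed


def age_estimate_kya(haplotypes):
    """Age (in Kya) from the mean squared distance to a median founder.

    Assumption: the founder repeat count at each locus is the median
    of the observed repeat counts (a simplification for illustration).
    """
    haps = [correct_dys389(h) for h in haplotypes]
    per_locus = []
    for locus in haps[0]:
        values = [h[locus] for h in haps]
        founder = median(values)
        per_locus.append(sum((v - founder) ** 2 for v in values) / len(values))
    mean_asd = sum(per_locus) / len(per_locus)
    generations = mean_asd / MUTATION_RATE
    return generations * GENERATION_YEARS / 1000
```

With so few loci, a single one-step mutant in a small sample shifts the estimate by thousands of years, which is consistent with the caution the text attaches to haplogroups with fewer than 10 individuals.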
Gene contribution estimates between populations were inferred by two methods: first, the coalescent approach of Admix version 2.0, which infers contributions from parental populations according to STR haplotype frequencies, considering each haplotype as an allele of the same locus [51,52]; and second, the analysis of shared STR lineages (LS) between populations [53], used to infer those contributions as well as to determine the unshared gene portion in the hybrid population. The Fujian and Taiwan Han samples were pooled to constitute a putative Han parental contributor, and a pool of most samples of Austronesian-speaking groups, namely TwMtA, Filipinos and Western Indonesians (i.e. all samples from Borneo, Sumatra and Java), served as the other putative parental contributor. Seven STRs were used in these analyses (i.e. DYS19, DYS389I, DYS389II, DYS390, DYS391, DYS392 and DYS393). Both the SNP and STR datasets were used to perform AMOVA analyses of population structure using the Arlequin 3.5.1.2 software [54]. Multidimensional scaling (MDS) analysis was performed to represent patterns of genetic relationships between all groups in our data set. A Reynolds distance matrix was obtained from frequency distributions of SNP haplogroups with Arlequin, and used as input for MDS analysis. MDS plots were constructed using XLSTAT software version 7.5.2 [55]. Median joining (MJ) networks of Y-STR haplotypes (defined by DYS19, DYS389I, DYS389II, DYS390, DYS391, DYS392 and DYS393) for relevant SNP haplogroups and sub-haplogroups of the O clade were constructed from reduced median joining networks using the NETWORK 4.1.0.6 software [56]. We used a reduction of one and locus-specific weights based on the relative mutation rates of the following Y-STR loci: DYS19 (weight of 5), DYS389I (4), DYS389II (3), DYS390 (2), DYS391 (2), DYS392 (20), and DYS393 (20) [57-60]. 
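To make the role of the locus-specific weights concrete, the following Python sketch computes a weighted stepwise distance between two seven-STR haplotypes. It is an illustration only and does not reproduce the internals of the NETWORK software: the function name, the dictionary layout and the choice of a weighted sum of absolute repeat differences are assumptions. The weights themselves are those listed in the text; the loci with the largest weights (DYS392 and DYS393) contribute most to the distance per repeat step.

```python
# Locus-specific weights as listed in the text.
LOCUS_WEIGHTS = {
    "DYS19": 5,
    "DYS389I": 4,
    "DYS389II": 3,
    "DYS390": 2,
    "DYS391": 2,
    "DYS392": 20,
    "DYS393": 20,
}


def weighted_distance(hap_a, hap_b, weights=LOCUS_WEIGHTS):
    """Sum over loci of weight * |difference in repeat count|.

    Hypothetical helper: a simple weighted stepwise distance, used here
    only to show how per-locus weights change the relative cost of a
    one-repeat difference.
    """
    return sum(w * abs(hap_a[locus] - hap_b[locus]) for locus, w in weights.items())


# Example: two haplotypes differing by one repeat at DYS390 and at DYS392.
hap_1 = {"DYS19": 15, "DYS389I": 12, "DYS389II": 16, "DYS390": 24,
         "DYS391": 10, "DYS392": 13, "DYS393": 12}
hap_2 = dict(hap_1, DYS390=25, DYS392=14)
distance = weighted_distance(hap_1, hap_2)  # 2 * 1 + 20 * 1 = 22
```

Under this scheme a single step at DYS392 costs ten times as much as a single step at DYS390, so differences at heavily weighted loci dominate how haplotypes are separated.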
The frequency distribution of Y-chromosome SNP haplogroups detected in Taiwan, ISEA and Indochina is shown in Figure 2, reported in detail in Additional file 1: Table S2, and summarized in Additional file 1: Figures S1 and S2. Additional file 1: Figure S3 displays the variation of diversity measures according to latitude among TwMtA. The interpolation contour maps resulting from applying the Kriging method both to the frequency distributions of Y-SNP clades and to their internal STR diversity are shown side by side in Figure 3. Forty-seven of the 81 genotyped Y-SNPs were observed in the derived state [61], thus defining the 47 haplogroups observed in our samples, which belong to major clades C, D, F*, G, H, J, K*, L, N, O, P*, Q and R. MJ networks of major Y-SNP clades in SEA are shown in Figure 4. Age estimates of the diversity of these clades, based on the assumption that mutations in haplotypes accumulate in situ (i.e. no gene flow), are reported in Table 2 and Additional file 1: Table S3. The most prominent clade in Taiwan, O-M175 as a whole, represents almost 90% of Y chromosomes among the TwHan, about 95% among the TwPlt and more than 99% among the TwMtA (Additional file 1: Figures S1 and S2). Only one representative of the basal O*/O1*-M175 (xM119, P31, M122) was seen in each of the Luzon (Philippines), Fujian and TwPlt samples (Figure 2 and Additional file 1: Table S2). All other haplogroups of the O clade were observed in the derived state for the M119, P31 and M122 markers (Figure 2, Additional file 1: Table S2, and Additional file 1: Figures S1 and S2). Haplogroup O1a*-M119 (Figure 2 and Additional file 1: Table S2) is seen throughout Batan (42%), the Philippines (4% to 33%) and Indonesia (4% to 18%). It has a patchy distribution among TwMtA (3-33%) and only some southern TwMtA show frequencies greater than 10% (i.e. Puyuma, Paiwan and Yami). 
O1a*-M119 was not observed in our Amis sample, nor in the Bunun, Saisiyat or Thao, and has a low frequency among TwHan (1.4% to 2%). Outside Taiwan, O1a*-M119 was observed neither among the Kalimantan in Borneo nor in Sumatra. However, the contour map interpolation (which includes published data, see Additional file 1: Table S1) indicates its presence in Western Sumatra, towards the Indian Ocean (Figure 3). The internal STR diversity of O1a*-M119 decreases gradually from mainland China towards the south, although a second, somewhat lower peak of diversity is observed around the Moluccas. The MJ network of O1a*-M119 (Figure 4) clearly differentiates TwMtA and TwPlt from Indonesia (Indonesian STR haplotypes are all included in the lower left reticulation of the O1a*-M119 network), whereas most Filipino O1a*-M119 haplotypes are shared with or very similar to those found among TwMtA and TwPlt. Age estimates based on the amount of molecular variation for O1a*-M119 were higher among TwMtA (19.96 ± 5.93 Kya) and the Philippines (20.36 ± 5.65 Kya) than in Indonesia (4.14 ± 2.67 Kya) (Table 2 and Additional file 1: Table S3). O1a1*-P203, derived from O1a*-M119 (Figure 2 and Additional file 1: Table S2), is the most common haplogroup among the northern TwMtA and the Tsou (90%), while among southern TwMtA it represents about half of the Y chromosomes observed. It is also quite common in several TwPlt (e.g. Pazeh, Ketagalan and Siraya in various locations), but less frequent in the Philippines (13.7%) and Indonesia (16.3%), although it is observed in 36% of the Kalimantan in Borneo. It is also commonly seen in Han (22% and 14.2% in Fujian and TwHan, respectively), but uncommon in Thailand (2.7%), and was not observed in the Vietnamese (Hanoi) or the Akha. 
Accordingly, the contour map of the frequency variation of O1a1*-P203 shows two main locations of high frequency, in Taiwan and southwestern Borneo, and decreasing frequencies from these locations towards the Philippines, whereas the contour map of its internal diversity is more complex (Figure 3). The O1a1*-P203 MJ network has a marked star-like shape, with a central frequent haplotype detected throughout mainland and insular SEA (Figure 4). Interestingly, haplotypes observed in the Atayal branch off from this central node to form a distinct network, suggesting a founding event and a period of isolation in this population. The intriguing observation of Tsou O1a1*-P203 haplotypes deriving from those of the Atayal, evidenced by the seven-STR MJ network, is no longer sustained when using more than 13 Y-STRs, as the two groups then split at an earlier founding level (13 Y-STR MJ network, data not shown). Age estimates of O1a1*-P203 in TwMtA (16.3 ± 5.9 Kya) and the Philippines (16.05 ± 3.6 Kya) were higher than estimates obtained for TwHan, Fujian or Western Indonesia (8.59 ± 3.3, 11.6 ± 6.1 and 7.28 ± 4.0 Kya, respectively) (Table 2 and Additional file 1: Table S3). Haplogroup O1a1a-M101 [46] is a para-group of O1a1*-P203 (Figure 2). It was tested on all O1a1*-P203 in our dataset but was not seen. Haplogroup O1a2-M50 (or O1a2-M110, Figure 2 and Additional file 1: Table S2), a sister haplogroup of O1a1*-P203, is frequently observed among southern TwMtA (from 18% to 28%) and in the Bunun (61%), variable in the Philippines (from 3% to 16.7%), but rather rare in Western Indonesia (Java and Sumatra, ~5.3%). Note, however, that a high frequency of this haplogroup has been reported in South Nias (~80%), an island in the Indian Ocean off the coast of Sumatra [26,63,64]. O1a2-M50 was not seen in our dataset in the Yami and was rare in TwHan, Fujian, Thailand, Vietnam and Malaysia. 
However, it has been reported as prevalent among a few Daic-speaking groups from southern China and Hainan (3% to 25%) [26,65], thus explaining the decrease in frequency and molecular variation of O1a2-M50 from Taiwan to the Philippines and Indonesia (Figure 3). Age estimates of diversity of this haplogroup decrease from Taiwan (17.54 ± 3.56 Kya) to the Philippines (12.5 ± 5.9 Kya) and Western Indonesia (7.06 ± 2.63 Kya) (Table 2 and Additional file 1: Table S3). The star-like shape of the O1a2-M50 MJ network places in the central nodes the STR haplotypes observed among TwMtA, TwPlt, Filipinos and Indonesians (Figure 4). We also notice that the haplotype continuity observed along the Taiwan-Philippines-Indonesian pathway can be traced further away toward Madagascar and the Solomon Islands ...
Context 11
... but results are to be considered with caution. Gene contribution estimates between populations were inferred by two methods: first, the coalescent approach of Admix version 2.0, which infers contributions from parental populations according to STR haplotype frequencies, considering each haplotype as an allele of the same locus [51,52]; and second, the analysis of shared STR lineages (LS) between populations [53], used to infer those contributions as well as to determine the unshared gene portion in the hybrid population. The Fujian and Taiwan Han samples were pooled to constitute a putative Han parental contributor, and most samples of Austronesian-speaking groups, namely TwMtA, Filipinos and Western Indonesians (i.e. all samples from Borneo, Sumatra and Java), were pooled as the other putative parental contributor. Seven STRs were used in these analyses (i.e. DYS19, DYS389I, DYS389II, DYS390, DYS391, DYS392 and DYS393). Both the SNP and STR datasets were used to perform AMOVA analyses of population structure using the Arlequin 3.5.1.2 software [54]. Multidimensional scaling (MDS) analysis was performed to represent patterns of genetic relationships between all groups in our dataset. A Reynolds distance matrix was obtained from frequency distributions of SNP haplogroups with Arlequin, and used as input for MDS analysis. MDS plots were constructed using XLSTAT software version 7.5.2 [55]. Median joining (MJ) networks of Y-STR haplotypes (defined by DYS19, DYS389I, DYS389II, DYS390, DYS391, DYS392 and DYS393) for relevant SNP haplogroups and sub-haplogroups of the O clade were constructed from reduced median joining networks using the NETWORK 4.1.0.6 software [56]. We used a reduction of one and locus-specific weights based on the relative mutation rates of the following Y-STR loci: DYS19 (weight of 5), DYS389I (4), DYS389II (3), DYS390 (2), DYS391 (2), DYS392 (20), and DYS393 (20) [57-60].
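As an illustration of how the locus-specific weights listed above change the distance between two Y-STR haplotypes, the following minimal Python sketch computes a weighted stepwise distance (weight times the absolute difference in repeat count, summed over loci). The weights are those given in the text; the function name, the two example haplotypes and the assumption that the weighting acts as a simple weighted sum are ours, and the internal computations of NETWORK are not reproduced here.

```python
# Locus weights as listed in the text for the seven Y-STR loci.
LOCUS_WEIGHTS = {
    "DYS19": 5,
    "DYS389I": 4,
    "DYS389II": 3,
    "DYS390": 2,
    "DYS391": 2,
    "DYS392": 20,
    "DYS393": 20,
}


def weighted_distance(hap_a, hap_b, weights=LOCUS_WEIGHTS):
    """Weighted stepwise distance: sum of weight * |repeat difference| per locus.

    Assumption: a plain weighted sum, used here only to show the effect of
    the weights; it is not claimed to be NETWORK's exact algorithm.
    """
    return sum(w * abs(hap_a[locus] - hap_b[locus]) for locus, w in weights.items())


# Hypothetical haplotypes (repeat counts invented for illustration only).
hap_1 = {"DYS19": 15, "DYS389I": 13, "DYS389II": 29, "DYS390": 24,
         "DYS391": 10, "DYS392": 13, "DYS393": 12}
hap_2 = {"DYS19": 16, "DYS389I": 13, "DYS389II": 29, "DYS390": 23,
         "DYS391": 10, "DYS392": 14, "DYS393": 12}

# One-step differences at DYS19 (5), DYS390 (2) and DYS392 (20).
print(weighted_distance(hap_1, hap_2))
```

With these weights, a single repeat change at a slowly mutating locus such as DYS392 counts ten times more than one at DYS390, so fast-mutating loci contribute less to the network structure.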
The frequency distribution of Y-chromosome SNP haplogroups detected in Taiwan, ISEA and Indochina is shown in Figure 2, reported in detail in Additional file 1: Table S2, and summarized in Additional file 1: Figures S1 and S2. Additional file 1: Figure S3 displays the variation of diversity measures according to latitude among TwMtA. The interpolation contour maps resulting from applying the Kriging method both to the frequency distributions of Y-SNP clades and to their internal STR diversity are shown side-by-side in Figure 3. Forty-seven of the 81 genotyped Y-SNPs were observed in the derived state [61], thus defining 47 haplogroups observed in our samples, which belong to major clades C, D, F*, G, H, J, K*, L, N, O, P*, Q and R. MJ networks of major Y-SNP clades in SEA are shown in Figure 4. Age estimates of the diversity of these clades, based on the assumption that mutations in haplotypes accumulate in situ (i.e. no gene flow), are reported in Table 2 and Additional file 1: Table S3. The most prominent clade in Taiwan, O-M175 as a whole, represents almost 90% of Y chromosomes among the TwHan, about 95% among the TwPlt and more than 99% among the TwMtA (Additional file 1: Figures S1 and S2). Only one representative of the basal O*/O1*-M175 (xM119, P31, M122) was seen in each of the Luzon (Philippines), Fujian and TwPlt samples (Figure 2 and Additional file 1: Table S2). All other haplogroups of the O clade were observed in the derived state for the M119, P31 or M122 markers (Figure 2, Additional file 1: Table S2, and Additional file 1: Figures S1 and S2). Haplogroup O1a*-M119 (Figure 2 and Additional file 1: Table S2) is seen throughout Batan (42%), the Philippines (4% to 33%) and Indonesia (4% to 18%). It has a patchy distribution among TwMtA (3-33%) and only some southern TwMtA show frequencies greater than 10% (i.e. Puyuma, Paiwan and Yami).
O1a*-M119 was not observed in our Amis sample, nor in the Bunun, Saisiyat or Thao, and has a low frequency among TwHan (1.4% to 2%). Outside Taiwan, O1a*-M119 was observed neither among the Kalimantan in Borneo nor in Sumatra. However, the contour map interpolation (which includes published data, see Additional file 1: Table S1) indicates its presence in Western Sumatra, towards the Indian Ocean (Figure 3). The internal STR diversity of O1a*-M119 decreases gradually from mainland China towards the south, although a second, somewhat lower peak of diversity is observed around the Moluccas. The MJ network of O1a*-M119 (Figure 4) clearly differentiates TwMtA and TwPlt from Indonesia (Indonesian STR haplotypes are all included in the lower left reticulation of the O1a*-M119 network), whereas most Filipino O1a*-M119 haplotypes are shared with or very similar to those found among TwMtA and TwPlt. Age estimates based on the amount of molecular variation for O1a*-M119 were higher among TwMtA (19.96 ± 5.93 Kya) and the Philippines (20.36 ± 5.65 Kya) than in Indonesia (4.14 ± 2.67 Kya) (Table 2 and Additional file 1: Table S3). O1a1*-P203, derived from O1a*-M119 (Figure 2 and Additional file 1: Table S2), is the most common haplogroup among the northern TwMtA and the Tsou (90%), while among southern TwMtA it represents about half of the Y chromosomes observed. It is also quite common in several TwPlt (e.g. Pazeh, Ketagalan and Siraya in various locations), but less frequent in the Philippines (13.7%) and Indonesia (16.3%), although it is observed in 36% of the Kalimantan in Borneo. It is also commonly seen in Han (22% and 14.2% in Fujian and TwHan, respectively), but is uncommon in Thailand (2.7%) and was not observed in the Vietnamese (Hanoi) or the Akha.
Accordingly, the contour map of the frequency variation of O1a1*-P203 shows two main locations of high frequency, in Taiwan and southwestern Borneo, and decreasing frequencies from these locations towards the Philippines, whereas the contour map of its internal diversity is more complex (Figure 3). The O1a1*-P203 MJ network has a marked star-like shape, with a central frequent haplotype detected throughout mainland and insular SEA (Figure 4). Interestingly, haplotypes observed in the Atayal branch off from this central node to form a distinct network, suggesting a founding event and a period of isolation in this population. The intriguing observation that Tsou O1a1*-P203 haplotypes derive from those of the Atayal, as suggested by the seven-STR MJ network, is no longer supported when using more than 13 Y-STRs, as the two groups then split at an earlier founding level (13 Y-STR MJ network, data not shown). Age estimates of O1a1*-P203 in TwMtA (16.3 ± 5.9 Kya) and the Philippines (16.05 ± 3.6 Kya) were higher than estimates obtained for TwHan, Fujian or Western Indonesia (8.59 ± 3.3, 11.6 ± 6.1 and 7.28 ± 4.0 Kya, respectively) (Table 2 and Additional file 1: Table S3). Haplogroup O1a1a-M101 [46] is a para-group of O1a1*-P203 (Figure 2). It was tested in all O1a1*-P203 samples in our dataset but was not seen. Haplogroup O1a2-M50 (or O1a2-M110, Figure 2 and Additional file 1: Table S2), a sister haplogroup of O1a1*-P203, is frequently observed among southern TwMtA (from 18% to 28%) and in the Bunun (61%), variable in the Philippines (from 3% to 16.7%), but rather rare in Western Indonesia (Java and Sumatra, ~5.3%). Note, however, that a high frequency of this haplogroup has been reported in South Nias (~80%), an island in the Indian Ocean off the coast of Sumatra [26,63,64]. O1a2-M50 was not seen in the Yami in our dataset and was rare in TwHan, Fujian, Thailand, Vietnam and Malaysia.
However, it has been reported as prevalent among a few Daic-speaking groups from southern China and Hainan (3% to 25%) [26,65], thus explaining the decrease in frequency and molecular variation of O1a2-M50 from Taiwan to the Philippines and Indonesia (Figure 3). Age estimates of the diversity of this haplogroup decrease from Taiwan (17.54 ± 3.56 Kya) to the Philippines (12.5 ± 5.9 Kya) and Western Indonesia (7.06 ± 2.63 Kya) (Table 2 and Additional file 1: Table S3). The star-like O1a2-M50 MJ network places the STR haplotypes observed among TwMtA, TwPlt, Filipinos and Indonesians in its central nodes (Figure 4). We also note that the haplotype continuity observed along the Taiwan-Philippines-Indonesia pathway can be traced further, toward Madagascar and the Solomon Islands [7,66,67]. Haplogroup O2 comprises two derived branches (Figure 2). The first branch includes subtype O2*-P31, observed in Han (4%), TwPlt (5.7%), the Philippines (Luzon, 4.7%) and Indonesia (3.25%), and subtypes O2a*-PK4, O2a1*-M95 and O2a1a-M88, mostly seen among groups speaking Daic languages in South China and Indochina, and in Indonesia (Additional file 1: Table S2). The second O2 branch includes subtypes of O2b-SRY465 that are only found among Korean, Japanese and northern Chinese [11,68]. The distribution of the recently defined O2a*-PK4 [42] was reanalyzed by two separate laboratories (P. Underhill, Stanford University School of Medicine, CA, USA, personal communication and this study) using a total of 105 individuals (data not shown). The A to T transversion of PK4 [8,42] was confirmed to be ancestral to the M95 SNP (Figure 2). In our dataset, haplogroups O2a1-M95 and O2a1a-M88 occurred principally in Indochina (~20% and 20% for O2a1-M95 and O2a1a-M88, respectively, in Thailand, and 8% and 25% in Vietnam) and Indonesia (29% and 3%), and had a scattered distribution among Fujian, TwHan and TwPlt, with rather low frequencies.
Among the TwMtA, the Yami (10% and 3.33% for O2a1-M95 and O2a1a-M88, respectively) and Bunun (0% and 37.5%) were the only mountain tribes bearing haplogroup O2. While a previous study reported the presence of O2a1-M95 and O2a1a-M88 in Mindanao [7], only O2a1a-M88 (3%) was seen in our Philippines dataset (6.45% in Visayas, 3.42% on average for all the Philippines). We note that the most frequent O2a1a-M88 Y-STR haplotype in the Bunun is also observed in China, among Daic-speaking populations, in Indochina and in Indonesia, but not among Solomon islanders, TwPlt ...
Context 13
Among the TwMtA, the Yami (10% and 3.33% for O2a1-M95 and O2a1a-M88, respectively) and Bunun (0% and 37.5%) were the only mountain tribes bearing haplogroup O2. While a previous study reported the presence of O2a1-M95 and O2a1a-M88 in Mindanao [7], only O2a1a-M88 (3%) was seen in our Philippines dataset (6.45% in Visayas, 3.42% on average for all the Philippines). We note that the most frequent O2a1a-M88 Y-STR haplotype in the Bunun is also observed in China, among Daic-speaking populations, in Indochina and in Indonesia, but not among Solomon islanders, TwPlt or Han (Figure 4). In fact, Bunun O2a1a-M88 lineages do not belong to the MJ branch observed in the Solomon Islands or Madagascar. Eleven O3 lineages, out of the 19 described by Karafet et al. [8,26,39], were observed in this study (Figure 2 and Additional file 1: Table S2). The most widely distributed sub-haplogroups of O3 in our dataset are O3a1c*-IMS-JST002611, O3a2*-P201, O3a2b*-M7 and O3a2c*-P164. A rather high prevalence of haplogroup O3a1c*-IMS-JST002611 is observed in Fujian Han (~25%) and in the Taiwanese Minnan and Hakka (13% and 21%), as well as among the TwPlt (12%), whereas it occurs at an extremely low frequency in TwMtA (it was only observed in the Yami, at 3%), in the Philippines (0.8%) and in Indonesia (2%) (Figure 2 and Additional file 1: Table S2). Its high frequency among the Ivatan (25%) of the Batan archipelago north of the Philippines is, however, associated with a low age estimate (0.9 Kya) (Table 2 and Additional file 1: Table S3). The MJ network of this haplogroup is characterized by the presence of Han Y chromosomes at most founding nodes (Figure 4). Accordingly, a location of high frequency of haplogroup O3a1c*-IMS-JST002611 emerges from the contour map, centered on the southeastern Chinese coast, with no clear clinal pattern of frequency variation across SEA (Figure 3).
The contour map of O3a1c*-IMS-JST002611 diversity displays a high peak in Indonesia, but this is due to a few (only 4) molecularly distant STR haplotypes, which translate into an older but less precise age estimate (28.47 ± 13.76 Kya, Table 2 and Additional file 1: Table S3). Other commonly observed haplogroups in the O3 series are the paragroups of the O3a2*-P201 branch (Figure 2 and Additional file 1: Table S2). Note that mutation P201 has not been tested in the Daic-speaking groups of China and Hainan [27], in which the resolution of O3* included O3*, O3a*, O3a1*, O3a1b, O3a1c, O3a2*, O3a2a, O3a2c*, O3a3 and O3a4 [42]. Here O3a2*-P201 did not exceed 2% in any pooled population group of our dataset, and reached 4.35% in the Puyuma. These low frequencies and the high age estimates seen in Taiwan (i.e. Puyuma) (15 ± 3 Kya) and Western Indonesia (17 ± 2 Kya; Additional file 1: Table S3) hint at late to recent gene flow. The Kriging contour maps suggest two paths linking the patterns of diversity of O3a2*-P201, namely from mainland SEA towards Taiwan and from mainland SEA to Indonesia through the Indochinese peninsula (Figure 3). However, these latter results must be viewed with caution given the generally low frequency of O3a2*-P201 over the whole region. Haplogroup O3a2b*-M7 is uncommon in the Fujian Han (less than 2%), in Taiwan (only observed at low frequencies in TwHan and TwPlt, not observed among TwMtA) and in the Philippines, but was more frequent in Indonesia (8.54%) (Figure 2 and Additional file 1: Table S2). In this study we also observe it in Indochina (12.5% and 6.7% in Vietnam and Thailand, respectively), as well as in Malaysia (although this latter region is represented by a sample of only 8 Y chromosomes). Age estimates of O3a2b*-M7 diversity for Indonesia (22.18 ± 6.27 Kya) and TwPlt (16.56 ± 4.67 Kya) are somewhat older than that for Thailand (12.42 ± 3.45 Kya) (Additional file 1: Table S3).
As already noted, however, these estimates rest on the assumption that no diversity is introduced into a clade by gene flow between populations. The scattered and generally derived location of Indonesian, TwPlt and Filipino haplotypes in the O3a2b*-M7 MJ network is instead consistent with repeated introductions of these haplotypes by gene flow from the mainland into the islands (Figure 4). Haplogroup O3a2c*-P164 is frequently observed among TwMtA on the east coast of Taiwan (Amis 35.9%, Puyuma 13%, Taroko 5%), in the Philippines (from 8% to 50%) and in western Indonesia (17.8%, Kalimantan 20%, and Sulawesi 11.8%), while it is rather uncommon in Indochina and among the Han (Figures 2 and 3, and Additional file 1: Table S2). Interestingly, the star-like MJ network of O3a2c*-P164 shows core nodes mostly composed of Han, Korean/Japanese and Indochinese groups (Figure 4) and radiating sectors mainly composed of either Korean/Japanese and Tibetan groups, or Indochinese groups, or delineating TwMtA. This structure supports the dispersal paths put forward in the contour map (Figure 3). In turn, derived haplogroups O3a2c1*-M134 and O3a2c1a-M133 are rarely seen among TwMtA. They often occur together and are more frequent among Han and TwPlt as well as in Sumatra, Borneo and the Visayas in the Philippines. Age estimates of the diversity of these two haplogroups all point to the early Neolithic period (i.e. < 10 Kya), with the notable exception of estimates for the Han groups (Table 2 and Additional file 1: Table S3). Accordingly, the contour maps of O3a2c1*-M134 display a general location of higher frequency and diversity related to Han populations, in mainland SEA (Figure 3). Finally, haplogroups C, D, F*, H, K*, N, P*, Q and R were observed at low frequency in the region, with patchy distributions (Figure 2 and Additional file 1: Table S2).
The plot of the multidimensional scaling (MDS) analysis of pairwise Reynolds genetic distances obtained from high-definition Y-SNP haplogroup frequencies in our dataset is shown in Figure 5. Most population samples are located in the center of the plot, with Austronesian-speaking groups from Indonesia and the Philippines surrounded by populations from Vietnam, Thailand and Malaysia (differentiating towards the upper-left part of the plot), most TwMtA differentiating towards the bottom-right, and the Han, both from the mainland and from Taiwan, differentiating towards the upper-left along the x axis. The ...
Context 14
... speaking populations. In turn, TwPlt Y-SNP haplogroups are predominantly shared with Han populations (Figure 2 and Additional file 1: Table S2) and, as expected, the “ancestral Han” contribution estimated with Admix and LS is high (62% and 79%, respectively). However, the effective relative frequency of Y-STR haplotypes contributed by each parental group, as estimated through the LS method (shown in brackets in Table 3), reached a lower level than anticipated, with only 12% of TwPlt Y-STR variation attributed to the “ancestral Han” gene pool and only 3% to the “ancestral Austronesian” gene pool. This indicates that 79% (i.e. 44/209) of the remaining Y-STR variation is unique to the TwPlt. Such a large amount of unshared variation could only have been acquired over a long period of settlement in isolation from other groups, much longer than 400 years, the time elapsed since Han Chinese (Minnan and Hakka) migrated to Taiwan from southeast China [1]. In fact, on the basis of MJ networks constructed using only the unshared haplotypes showing continuity in the background of their respective haplogroups, we tentatively estimated this period at ~3 to 8 Kya. Analysis of molecular variance (AMOVA) was performed first using Y-SNP haplogroups and then Y-STR haplotypes (Table 4). The TwMtA group shows the lowest SNP variance within populations (70.3%), consistent with the low gene diversity observed in this group of populations, ranging from 0.10 to 0.7 and averaging 0.60 ± 0.02 overall (Additional file 1: Table S2). Accordingly, the TwMtA also display the highest SNP variance due to differentiation between tribes (29.7%), thus explaining the scattered location of the Taiwan tribes observed in the MDS plot (Figure 5 and Additional file 1: Figure S4).
In contrast to TwMtA, high SNP variation between individuals within populations is found for Western Indonesia (94.30%), the Philippines (98.10%) and TwPlt (96.24%), and the levels of differentiation among populations within groups are correspondingly low, ranging from non-significant for Filipino populations to less than 6% among western Indonesians, even though populations in Western Indonesia and the Philippines are broadly dispersed over many isolated regions of ISEA. On the other hand, the high variance seen within TwPlt populations (96.24%) was expected, as they are heavily admixed groups. When testing genetic differentiation among four separate geographical groups, namely TwMtA, Philippines, Western Indonesia and Han (representing mainland China), or three distinct language family assortments (Formosan, Malayo-Polynesian and Sino-Tibetan), we observe that the variation between groups is large (18% and 18.55% respectively, P < 0.001) and does not differ much between geographic and linguistic groupings. This pattern remains the same when including Indochina as a fifth geographic region or as a fourth linguistic group. Very little change from the Y-SNP results was observed when using Y-STRs to perform AMOVA computations on the individual groups of populations, with the variance due to differences between individuals within populations being extremely high for TwPlt, Philippines and Western Indonesia (above 96%) and much lower for TwMtA (68%). However, with Y-STRs we observe that the component of the variance due to differences between groups of populations, although significant, is always lower than that due to differences among populations within these groups, thus indicating that, contrary to Y-SNPs, Y-STRs fail to detect a population structure associated with geography or linguistics. 
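The within-population and among-population percentages quoted above can be illustrated with a minimal single-level AMOVA sketch. It assumes that any two different haplogroups are at distance 1 and identical ones at distance 0, which is only one possible distance choice and not necessarily the one used for Table 4. The function name `amova_one_level` and the toy counts are hypothetical and are not data from this study.

```python
def amova_one_level(pops):
    """Single-level AMOVA on haplogroup counts (sketch).

    pops: list of dicts mapping haplogroup name -> count, one dict per population.
    Assumes a 0/1 distance between haplogroups (different = 1, identical = 0).
    Returns the percentage of variance among and within populations and Phi_ST.
    """
    sizes = [sum(p.values()) for p in pops]
    n_total = sum(sizes)
    n_groups = len(pops)

    # Pool all populations to get the total sum of squared deviations.
    pooled = {}
    for p in pops:
        for hap, count in p.items():
            pooled[hap] = pooled.get(hap, 0) + count

    def ssd(counts, n):
        # With 0/1 distances, SSD = (n / 2) * (1 - sum of squared frequencies).
        return 0.5 * n * (1.0 - sum((c / n) ** 2 for c in counts))

    ssd_total = ssd(pooled.values(), n_total)
    ssd_within = sum(ssd(p.values(), n) for p, n in zip(pops, sizes))
    ssd_among = ssd_total - ssd_within

    msd_among = ssd_among / (n_groups - 1)
    msd_within = ssd_within / (n_total - n_groups)

    # Average sample size corrected for unequal population sizes.
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (n_groups - 1)

    var_among = (msd_among - msd_within) / n0
    var_within = msd_within
    var_total = var_among + var_within  # zero if every chromosome is identical
    return {
        "among_pct": 100.0 * var_among / var_total,
        "within_pct": 100.0 * var_within / var_total,
        "phi_st": var_among / var_total,
    }


if __name__ == "__main__":
    # Hypothetical toy populations, not data from the study.
    toy = [{"O1a1": 18, "O1a2": 2}, {"O1a1": 3, "O1a2": 12, "O2a1a": 5}]
    print(amova_one_level(toy))
```

In this sketch the two percentages sum to 100 by construction, so a within-population share of 70.3% corresponds to an among-population share of 29.7%. Significance testing by permutation, as usually done in AMOVA, is left out.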
Our data, obtained through the genotyping of 81 high-definition Y-SNP markers to determine the fine distribution of Y-chromosome haplogroup O in ISEA, revealed a high level of population structure in the region including Taiwan, the Philippines and Indonesia, mainly defined by variable distributions of haplogroups belonging to the O clade [9,27,70]. We also genotyped 17 Y-specific STR markers, in order to gain insights into the distribution of the Y-chromosome variation in this region of the world. Since only one population from the mainland east coast of China (Fujian Han) was analyzed, data from other populations of SEA were obtained from the literature (Additional file 1: Table S1). The O clade, to which 95% of Y chromosomes in Taiwan belong, reflects an ancestral relationship to the early modern human settlement in East Asia [28-30], with haplogroup O1 being mostly seen in Southeast China, Taiwan and ISEA, haplogroup O2a predominantly confined to southeast China and Indochina, and haplogroup O3 broadly present in mainland China. The Y-SNP gene diversity among TwMtA was found to be generally low (from 0.1 to 0.7, Additional file 1: Table S2). Except for the presence of haplogroups O2a1a-M88, predominantly seen in Bunun (37.5%), and O3a2c-P164 in Puyuma and Amis, the low diversity found in TwMtA is principally associated with the molecular variation of haplogroups within O1 (Figure 2, Additional file 1: Table S2 and Additional file 1: Figure S1). In sharp contrast with the TwMtA, non-aboriginal groups in Taiwan, namely Minnan, Hakka and the general mixed Han Taiwanese (MiscHan), as well as the TwPlt groups, all present high levels of gene diversity, and a total of 26 distinct branches of the O clade (including the O1, O2 and O3 clades) were observed in these groups at variable frequencies. These results suggest substantial admixture among plain tribe groups in Taiwan, but not among mountain tribes. 
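To make the gene diversity values quoted above concrete, here is a minimal sketch of Nei's gene diversity computed from haplogroup counts. The counts used for the Bunun (34, 21 and 1 out of 56) are an assumption, back-calculated from the frequencies reported in this article (about 61% O1a2-M50 and 37.5% O2a1a-M88 among 56 chromosomes, with a third haplogroup making up the rest). Which estimator the study used, with or without the n/(n-1) sample-size correction, is not stated in this excerpt, so both are offered.

```python
def gene_diversity(counts, unbiased=True):
    """Nei's gene diversity h from a list of haplogroup counts (sketch).

    h = 1 - sum(p_i^2), optionally multiplied by n / (n - 1)
    to correct for sample size.
    """
    n = sum(counts)
    if n < 2:
        raise ValueError("need at least two chromosomes")
    sum_sq = sum((c / n) ** 2 for c in counts)
    h = 1.0 - sum_sq
    return h * n / (n - 1) if unbiased else h


if __name__ == "__main__":
    # Assumed counts, inferred from the reported percentages (hypothetical).
    bunun = [34, 21, 1]
    print(round(gene_diversity(bunun, unbiased=False), 3))
    print(round(gene_diversity(bunun, unbiased=True), 3))
```

Under these assumed counts, both estimators give values close to the 0.49 reported for the Bunun. A population fixed for a single haplogroup gives h = 0 under either estimator.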
Fast genetic drift in TwMtA, due to small population size and isolation, is a likely explanation for these observations, and a founder event linked to the initial settlement of the island is also plausible. On the other hand, larger population sizes in TwPlt, possibly related to gene flow and admixture events, would have resulted in the higher levels of diversity observed in TwPlt nowadays. Alternatively, the possibility exists that our village-focused sampling method for TwMtA might have biased our results towards underestimation of the actual diversity present in these groups. Similarly, we cannot exclude that very recent migration to the main urban centers, from which most of our samples come, may have contributed to the greater diversity observed in the Philippines and western Indonesia. However, in support of our interpretation, the scattered location of TwMtA groups observed in the MDS plot (Figure 5 and Additional file 1: Figure 3), which strongly suggests fast genetic drift, matches similar patterns reported elsewhere for Taiwan and also for Melanesia [67,71,72]. Furthermore, a similar scattered pattern was also observed with polymorphisms in HLA loci ([73], see Figure 3 therein), as well as with a classical marker [74], thus further supporting the idea that the genetic history of TwMtA was characterized by drift occurring in small isolated groups. In a recent study of the diversity of Taiwan mtDNA complete genomes, Ko et al. [75] observed a decreasing pattern of diversity in TwMtA populations from north to south, although the significance of the decrease was not reported. A similar tendency for diversity to decrease towards southern Taiwan was also found for the classical GM genetic polymorphism but did not reach statistical significance [74]. Interestingly, for the non-recombining Y chromosome, we find here an opposite pattern in that gene diversity increases significantly from north to south (Additional file 1: Figure S3, first plot, P = 0.0142). 
This increase is concomitant with a tendency for decreasing frequency of haplogroup O1a1*-P203 and increasing Y-STR diversity in this haplogroup, but neither of these patterns is statistically significant (Additional file 1: Figure S3). Although the contrasting results between mtDNA and autosomal markers, on one hand, and the Y chromosome on the other are for the time being inconclusive, they nevertheless could hint at a differentiated demographic history of men and women in TwMtA. Haplogroup O1a1*-P203 was seen at high frequencies, ranging from 40% to 60% in all southern and eastern TwMtA (Paiwan, Puyuma, Rukai, Amis and Yami), and was above 87% in all northern and central mountain tribes (Atayal, Taroko, Saisiyat, Thao and Tsou). Conversely, O1a1*-P203 rarely exceeds 20% in the Philippines and Indonesia (Figures 2 and 3, and Additional file 1: Table S2 and Additional file 1: Figure S2), but it has been reported as common in south and east China, where it most likely originated [42]. We note also that the para-haplogroup O1a1a-M101 was not observed in our dataset covering ISEA, which extends previous reports indicating that it must be rare in most regions of continental East Asia [30,42,62,65]. With respect to O1a1*-P203, it is possible that this haplogroup reached Indonesia through gene flow from the mainland, via the Indochinese peninsula. We note however that O1a1*-P203 is uncommon in Indochina. Moreover, the presence of O1a1*-P203 among Koreans [76], the Han from Fujian and TwHan (Additional file 1: Table S2) rather argues for a common origin in an ancient dispersal from the mainland (Figure 3). A second major O-clade haplogroup observed in Taiwan is O1a2-M50, which is frequent in several Aborigine populations, especially so among southern TwMtA, but rather rare in the Han populations of the island. 
As this haplogroup has been reported as frequent (from 3% to 25%) among some Daic-speaking groups from southern China and Hainan, it was suggested that its expansion throughout ISEA followed an OOT model of migration, sequentially reaching Taiwan, the Philippines and Indonesia [26,65]. Our results support this hypothesis by demonstrating a decrease in frequency and molecular variation of O1a2-M50 from Taiwan to the Philippines and Indonesia (Figure 4, Table 4 and Additional file 1: Table S2). The pattern of STR variation is compatible with a late Paleolithic origin of this haplogroup in ...
Context 15
... observed in the Solomon Islands or Madagascar. Eleven O3 lineages, out of the 19 described by Karafet et al. [8,26,39], were observed in this study (Figure 2 and Additional file 1: Table S2). The most widely distributed sub-haplogroups of O3 in our dataset are O3a1c*-IMS-JST002611, O3a2*-P201, O3a2b*-M7 and O3a2c*-P164. A rather high prevalence of haplogroup O3a1c*-IMS-JST002611 is observed in Fujian Han (~25%) and in the Taiwanese Minnan and Hakka (13% and 21%), as well as among the TwPlt (12%), whereas it occurs at an extremely low frequency in TwMtA (it was only observed in the Yami, at 3%), in the Philippines (0.8%) and in Indonesia (2%) (Figure 2 and Additional file 1: Table S2). Its high frequency among the Ivatan (25%) of the Batan archipelago north of the Philippines is however associated with a low age estimate (0.9 Kya) (Table 2 and Additional file 1: Table S3). The MJ network of this haplogroup is characterized by the presence of Han Y chromosomes at most founding nodes (Figure 4). Accordingly, a location of high frequency of haplogroup O3a1c*-IMS-JST002611 emerges from the contour map, centered on the southeastern Chinese coast, with no clear clinal pattern of frequency variation across SEA (Figure 3). The contour map of O3a1c*-IMS-JST002611 diversity displays a high peak in Indonesia, but this is due to a few (only 4) molecularly distant STR haplotypes, which translate into an older but less precise age estimate (28.47 ± 13.76 Kya, Table 2 and Additional file 1: Table S3). Other commonly observed haplogroups in the O3 series are the paragroups of the O3a2*-P201 branch (Figure 2 and Additional file 1: Table S2). Note that mutation P201 has not been tested in the Daic-speaking groups of China and Hainan [27], in which the resolution of O3* included O3*, O3a*, O3a1*, O3a1b, O3a1c, O3a2*, O3a2a, O3a2c*, O3a3 and O3a4 [42]. Here O3a2*-P201 did not exceed 2% in any pooled population group of our dataset, and reached 4.35% in the Puyuma. 
These low frequencies and the high age estimates seen in Taiwan (i.e. Puyuma) (15 ± 3 Kya) and Western Indonesia (17 ± 2 Kya; Additional file 1: Table S3) hint at late to recent gene flow. The Kriging contour maps suggest two paths linking the patterns of diversity of O3a2*-P201, namely from mainland SEA towards Taiwan and from mainland SEA to Indonesia through the Indochinese peninsula (Figure 3). However, these latter results must be viewed with caution given the generally low frequency of O3a2*-P201 over the whole region. Haplogroup O3a2b*-M7 is uncommon in the Fujian Han (less than 2%), in Taiwan (only observed at low frequencies in TwHan and TwPlt, not observed among TwMtA) and the Philippines, but was more frequent in Indonesia (8.54%) (Figure 2 and Additional file 1: Table S2). In this study we observe it also in Indochina (12.5% and 6.7% in Vietnam and Thailand, respectively), as well as in Malaysia (although this latter region is represented by a sample of only 8 Y chromosomes). Age estimates of O3a2b*-M7 diversity for Indonesia (22.18 ± 6.27 Kya) and TwPlt (16.56 ± 4.67 Kya) are somewhat older than that for Thailand (12.42 ± 3.45 Kya) (Additional file 1: Table S3). However, as already noted, these estimates are based on the assumption that no diversity is introduced in a clade by gene flow between populations. Indeed, the scattered and generally derived location of Indonesian, TwPlt and Filipino haplotypes in the O3a2b*-M7 MJ network is consistent with repeated introductions of these haplotypes by gene flow from the mainland into the islands (Figure 4). Haplogroup O3a2c*-P164 is frequently observed among TwMtA on the east coast of Taiwan (Amis 35.9%, Puyuma 13%, Taroko 5%), in the Philippines (from 8% to 50%) and in western Indonesia (17.8%, Kalimantan 20%, and Sulawesi 11.8%), while it is rather uncommon in Indochina and among the Han (Figures 2 and 3, and Additional file 1: Table S2). 
Interestingly, the star-like MJ network of O3a2c*-P164 shows core nodes mostly comprised of Han, Korean/Japanese and Indochinese groups (Figure 3) and radiating sectors mainly comprised of either Korean/Japanese and Tibet, or Indochinese, or delineating TwMtA. This structure supports the dispersal paths put forward in the contour map (Figure 4). In turn, derived haplogroups O3a2c1*-M134 and O3a2c1a-M133 are rarely seen among TwMtA. They often occur together and are more frequent among Han and TwPlt as well as in Sumatra, Borneo and the Visayas in the Philippines. Age estimates of the diversity of these two haplogroups all point to the early Neolithic period (i.e. < 10 Kya), with the notable exception of estimates for the Han groups (Table 2 and Additional file 1: Table S3). Accordingly, the contour maps of O3a2c1*-M134 display a general location of higher frequency and diversity related to Han populations, in mainland SEA (Figure 3). Finally, haplogroups C, D, F*, H, K*, N, P*, Q and R were observed at low frequency in the region, with patchy distributions (Figure 2 and Additional file 1: Table S2). The plot of the multidimensional scaling (MDS) analysis of pairwise Reynolds genetic distances obtained from high-definition Y-SNP haplogroup frequencies in our dataset is shown in Figure 5. Most population samples are located in the center of the plot, with Austronesian-speaking groups from Indonesia and the Philippines surrounded by populations from Vietnam, Thailand and Malaysia (differentiating towards the upper-left part of the plot), most TwMtA differentiating towards the bottom-right, and the Han, both from the mainland and from Taiwan, differentiating towards the upper-left along the x axis. 
The heavily sinicized TwPlt surround Han populations, with Yulin and Papora on the left part of the Han, and the Pazeh and Siraya (from Tainan, Pingtung, and Hwalian) getting close to southern TwMtA and most likely representing less sinicized groups than other TwPlt. A relationship with the frequencies of O clades in these populations (Additional file 1: Figures S1 and S2) is evidenced by the fact that O2 haplogroups prevail in Indochinese populations located in the upper-left quadrant of the MDS plot, whereas O1 haplogroups become more and more frequent in Indonesian, Filipino, south TwMtA and north TwMtA populations, which differentiate towards the bottom of the plot. Bunun, Akha, Atayal and Tsou are clearly four distinct outliers in the plot, due to their particularly low diversity. Indeed, only 3 haplogroups were observed among 56 Bunun chromosomes, including O1a2-M50 at 61% and O2a1a-M88 at 37.5%, and this translates into a gene diversity level of only 0.49 (Additional file 1: Table S2). Also, only 3 haplogroups were observed among 41 Tsou and 52 Atayal chromosomes, including O1a1*-P203 at 90% (amongst the highest frequencies observed in ISEA for this haplogroup), with a gene diversity of 0.18 for the two groups. As for the Akha population, its differentiation from all other SEA populations is driven by the unique presence of haplogroup Q-M242 (56%), which was mainly unobserved in our dataset, except for a very low frequency (<1%) in TwHan. The axis of differentiation constituted by TwMtA in the lower-right part of the MDS plot is mostly driven by the northern tribal groups (Taroko, Thao, Saisiyat and Atayal) and underlines their low gene diversity (h is 0.10, 0.23, 0.23 and 0.18, respectively) due to the high frequency of haplogroup O1a1*-P203 (Additional file 1: Figure S1 and Additional file 1: Table S2). 
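The MDS plot discussed above starts from a matrix of pairwise Reynolds distances between populations. As a minimal sketch, the commonly used single-locus approximation of the Reynolds distance, D = Σ(p_k − q_k)² / (2(1 − Σ p_k q_k)), can be computed from haplogroup frequency vectors as shown below. The population names and frequencies are hypothetical, and the exact estimator and software settings used for Figure 5 are not given in this excerpt.

```python
def reynolds_distance(p, q):
    """Approximate Reynolds distance between two populations (sketch).

    p, q: haplogroup frequency lists in the same haplogroup order,
    each summing to 1.
    """
    numerator = sum((a - b) ** 2 for a, b in zip(p, q))
    denominator = 2.0 * (1.0 - sum(a * b for a, b in zip(p, q)))
    # Denominator is zero only when both populations are fixed
    # for the same haplogroup, in which case the distance is zero.
    return numerator / denominator if denominator > 0 else 0.0


def distance_matrix(freqs):
    """Symmetric matrix of pairwise Reynolds distances.

    freqs: dict mapping population name -> frequency list.
    Returns (names, matrix) with matrix[i][j] the distance between
    names[i] and names[j].
    """
    names = list(freqs)
    matrix = [[reynolds_distance(freqs[a], freqs[b]) for b in names] for a in names]
    return names, matrix


if __name__ == "__main__":
    # Hypothetical frequencies over three haplogroups (not study data).
    toy = {
        "PopA": [0.9, 0.1, 0.0],
        "PopB": [0.5, 0.5, 0.0],
        "PopC": [0.1, 0.2, 0.7],
    }
    names, matrix = distance_matrix(toy)
    for name, row in zip(names, matrix):
        print(name, [round(d, 3) for d in row])
```

A matrix of this kind would then be passed to an MDS routine to obtain two-dimensional coordinates. Populations dominated by a single haplogroup, such as those with O1a1*-P203 near 90%, end up far from populations with a more even haplogroup distribution.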
In a second MDS analysis, we included earlier published data (Additional file 1: Table S1) to investigate the genetic relationships of populations represented in our dataset to other East Asian groups. The resulting 2-dimensional plot is provided in Additional file 1: Figure S4. The range and variation of Y chromosome diversity present in Taiwan is well exposed in this second MDS, which echoes the first one reported in Figure 5. Taiwanese tribes, starting first with the Amis, then the southern TwMtA tribes and finally the northern TwMtA tribes, spread away towards the lower-right part of the plot, from a lower central cluster principally comprising TwPlt, the Philippines, Western Indonesia, some Han from southern China and a few Daic-speaking groups. The latter also form a second axis of differentiation, including mainly Daic-speaking groups from China as well as Indochinese populations, that spreads towards the lower-right part of the plot. With respect to Figure 5, the outlier location of the Bunun is again indicative of a high level of genetic drift in this population, consistent with its low level of gene diversity. Interestingly enough, however, the position of the Bunun in the MDS, midway between the axis of differentiation of TwMtA and that of the Daic-speaking groups from southern China, is explained by the high frequencies of haplogroups O1a2-M50 (67%, the highest frequency seen in Taiwan) and O2a1a-M88 (37%), both of which are commonly found among Daic populations in south China [27]. Contributions from two putative parental groups, namely “ancestral Han” and “ancestral Austronesians”, to each hybrid population (TwMtA, TwPlt, Filipinos and Indonesians), estimated by using Admix 2.0 and through the STR lineage sharing method (LS), are reported in Table 3. Frequencies of Y-STR lineages for Admix and LS were obtained in the background of their respective Y-SNP haplogroup. 
Results from Admix and LS (Table 3) correlate well and suggest that all putatively hybrid groups but the TwPlt received an important contribution from “ancestral Austronesian” speaking populations. In turn, TwPlt Y-SNP haplogroups are predominantly shared with Han populations (Figure 2 and Additional file 1: Table S2) and, as expected, the “ancestral Han” contribution estimated with Admix and LS is high (62% and 79% respectively). However, the effective relative frequency of parental Y-STR haplotypes contributed by each parental group, as estimated through the LS ...
Context 16
... decrease towards southern Taiwan was also found for the classical GM genetic polymorphism but did not reach statistical significance [74]. Interestingly, for the non-recombining Y chromosome, we find here an opposite pattern in that gene diversity increases significantly from north to south (Additional file 1: Figure S3, first plot, P = 0.0142). This increase is concomitant with a tendency for decreasing frequency of haplogroup O1a1*-P203 and increasing Y-STR diversity in this haplogroup, but neither of these patterns is statistically significant (Additional file 1: Figure S3). Although the contrasting results between mtDNA and autosomal markers, on one hand, and the Y chromosome on the other are for the time being inconclusive, they nevertheless could hint at a differentiated demographic history of men and women in TwMtA. Haplogroup O1a1*-P203 was seen at high frequencies, ranging from 40% to 60% in all southern and eastern TwMtA (Paiwan, Puyuma, Rukai, Amis and Yami), and was above 87% in all northern and central mountain tribes (Atayal, Taroko, Saisiyat, Thao and Tsou). Conversely, O1a1*-P203 rarely exceeds 20% in the Philippines and Indonesia (Figures 2 and 3, and Additional file 1: Table S2 and Additional file 1: Figure S2), but it has been reported as common in south and east China, where it most likely originated [42]. We note also that the para-haplogroup O1a1a-M101 was not observed in our dataset covering ISEA, which extends previous reports indicating that it must be rare in most regions of continental East Asia [30,42,62,65]. With respect to O1a1*-P203, it is possible that this haplogroup reached Indonesia through gene flow from the mainland, via the Indochinese peninsula. We note however that O1a1*-P203 is uncommon in Indochina. 
Moreover, the presence of O1a1*-P203 among Koreans [76], the Han from Fujian and TwHan (Additional file 1: Table S2) rather argues for a common origin in an ancient dispersal from the mainland (Figure 3). A second major O-clade haplogroup observed in Taiwan is O1a2-M50, which is frequent in several Aborigine populations, especially so among southern TwMtA, but rather rare in the Han populations of the island. As this haplogroup has been reported as frequent (from 3% to 25%) among some Daic-speaking groups from southern China and Hainan, it was suggested that its expansion throughout ISEA followed an OOT model of migration, sequentially reaching Taiwan, the Philippines and Indonesia [26,65]. Our results support this hypothesis by demonstrating a decrease in frequency and molecular variation of O1a2-M50 from Taiwan to the Philippines and Indonesia (Figure 4, Table 4 and Additional file 1: Table S2). The pattern of STR variation is compatible with a late Paleolithic origin of this haplogroup in SEA with migration and expansion in Taiwan (~17 Kya), followed by pre-Holocene passages to the Philippines (~12 Kya) and Indonesia (~7 Kya) (Figure 3). A remarkable result emerging from our dataset is the high frequency of the O2 clade, more specifically of haplogroup O2a1a-M88, found in the Bunun (37.5%, Additional file 1: Table S2). Except for the presence of O2a1a-M88, as well as O2a1*-M95 in the Yami, this is the only detected occurrence of the O2 clade in a TwMtA group. Of course, the absence of O2 in a sample does not exclude its presence in the population, but it does suggest that it is likely less frequent. We observe that the Bunun share their most common Y-STR haplotype of O2a1a-M88 with the Daic-speaking groups and with the Indochinese and Indonesian populations, while they share none of the haplotypes found among Solomon islanders, TwPlt and Han (Figure 4). 
In fact, the Bunun STR haplotypes do not belong to the same branch as those of the Solomon Islands or Madagascar [7,66,67]. In our opinion, these results, together with the observed scarcity of the O2 clade among TwMtA and in the Philippines, preclude the TwMtA as a plausible contender for the paternal origin of O2 haplogroups in Madagascar and the Solomon Islands, although the possibility that the O2 clade was lost by drift in most Austronesians from Taiwan and in the Philippines remains. Haplogroup O3 constitutes the molecularly most diversified clade observed in continental East Asia [7,8,26-31,40,68]. The introduction of new Y-SNP markers for better assignment of O3 subtypes allowed us to demonstrate several characteristics that were not seen before [8,43,44]. Haplogroup O3a1c*-IMS-JST002611 has been reported in Japan, Korea, Tibet, south China and Indochina [62,65,76,77]. Its high prevalence observed in this study in Fujian Han and Taiwan Minnan and Hakka (14% to 26%), but very low occurrence among TwMtA (Yami, 3%) and in the Philippines (0.8%), likely represents a signature of Han-mediated gene flow. Indeed, the lack of a clear clinal pattern of frequency variation of this haplogroup between regions, the low frequency but high diversity observed in Indonesia (Figure 2 and Table 2) and the rather faint star-like shape of the MJ network showing numerous long branches and low frequency nodes (Figure 4) all suggest a recent spread of O3a1c* from mainland SEA. Thus the high frequency of O3a1c* observed for the Ivatan (25%) of the Batan archipelago, north of the Philippines, associated with a low age estimate (0.9 Kya, Table 2 and Additional file 1: Table S3), is consistent with a recent introduction of this haplogroup by gene flow with TwHan or TwPlt. The presence of Han lineages at most nodes of the star-like MJ network supports this hypothesis (Figure 4). 
Both the frequency distribution of haplogroup O3a2b*-M7 (which is present at low frequencies in Taiwan, among the Han and TwPlt) and its MJ network are consistent with repeated introductions of this haplogroup by gene flow from mainland Southeast Asia into the islands (Figure 4). As already stated, we observe this haplogroup at frequencies of 7% or higher in mainland SEA. Previous studies also reported the presence of this haplogroup in SEA, in Yunnan, Fujian, and among the She and Yao ethnic groups [30,40]. On the other hand, haplogroup O3a2c*-P164, whose origins also likely trace back to mainland SEA, is quite frequent in Taiwan (especially so among the Amis), whereas the derived O3a2c1*-M134 and O3a2c1a-M133 are rather rare among TwMtA. In our dataset we observed that they often occurred together (Additional file 1: Table S2), and given the recent dates estimated for their STR diversification (~8 to 6 ± 4 Kya) (Figure 3, Table 2 and Additional file 1: Table S2), it is likely that their spread southwards to Vietnam, Laos and Thailand, as well as to Taiwan, the Philippines and Indonesia, concomitant with a northward spread to Japan [26,77], occurred during Neolithic times. It has been proposed that, after the initial colonization of SEA and Indochina approximately 50 Kya by modern humans bringing along haplogroups C, M and S, the origin and spread of most haplogroups seen today in east Asia, Taiwan and ISEA could be retraced according to four stages (A to D) of a paternal demographic model [26]. Stages B, C and D correspond to gene dispersals originating in mainland Southeast Asia or Indochina, whose directions of flow form a pincer model, with a northern branch spreading through Taiwan and a southern branch through Indonesia. In this pincer model, the Philippines appear as a confluent region of genes acquired separately from Taiwan or Indonesia and more recently from the Asian mainland. 
The patterns of genetic diversity observed in this study are consistent with this scenario in that several haplogroups displaying a cline with lower Y-STR diversity over Western Indonesia (i.e. O1a*-M119, O1a1*-P203 and O1a2-M50) or lower diversity over Taiwan (O2a1*-M95) show diversity higher than expected in the Philippines, as attested by the corresponding age estimates (Table 2 and Additional file 1: Table S3). The northern branch postulated by the pincer model would have first reached Taiwan, thus introducing to the island haplogroups O1a*-M119, O1a1*-P203 and O1a2-M50. Further, these haplogroups also display a decreasing cline in frequency from Taiwan towards the Philippines and Western Indonesia (Figure 3). The MJ networks associated with the O1 clade are the only ones in which Y-STR haplotypes of TwMtA are observed among the f0 founder nodes (Figure 4). Along with the very low frequency of non-O1 clades among TwMtA, these results suggest that haplogroups O1a1 and O1a2 represent the earliest traces of the Austronesian-agriculturist dispersal to Taiwan. Furthermore, TwMtA haplotypes of other haplogroups (O2-P31/O2a-Pk4/O2a1-M95; O2a1a-M88; O3a2b-M7 and O3a2c-P164) were rarely seen among the f1 founder nodes, consistent with the hypothesis that later direct east-west sea passages occurred between mainland East Asia and Taiwan. Thus, the Y-SNP haplogroup frequency distributions and their STR diversity observed today in TwMtA populations give support to the northern branch of the pincer model (the Taiwanese branch). This Austronesian-agriculturist dispersal most likely expanded principally within the boundaries of present-day Taiwan and the Philippines before reaching Western Indonesia, which was already populated by indigenous hunter-gatherers, possibly of Asiatic origin [64]. 
Populations that first went south from Southeast Asia along or from the Indochinese peninsula, Malaysia, western Indonesia (Sumatra, Java, and Borneo) and the Philippines represent the southern branch of the pincer model. This dispersal would include haplogroups O1a1*-P203, O2a1-M95/M88, O3a*-M324, O3a1a-M121, O3a1c*-IMS-JST002611, O3a2*-P201, O3a2a-M159, O3a2b*-M7, O3a2c*-P164 and O3a2c1a-M133 [14,26,27,67,78]. For most of these haplogroups, it is currently held that they first expanded and diversified within the boundaries of present-day southeast China, Indochina and Indonesia, and they are considered as involved in a ...
Context 17
... the contour maps of O3a2c1*-M134 display a general location of higher frequency and diversity related to Han populations, in mainland SEA (Figure 3). Finally, haplogroups C, D, F*, H, K*, N, P*, Q and R haplogroups were observed at low frequency in the region, with patchy distributions (Figure 2 and Additional file 1: Table S2). The plot of the multidimensional scaling (MDS) analysis of pairwise Reynolds genetic distances obtained from high definition Y-SNP haplogroup frequencies in our dataset is shown in Figure 5. Most population samples are located in the center of the plot, with Austronesian speaking groups from Indonesia and the Philippines surrounded by populations from Vietnam, Thailand and Malaysia (differentiating towards the upper-left part of the plot), most TwMtA differentiating towards the bottom- right, and the Han, both from the mainland and from Taiwan, differentiating towards the upper-left along the x axis. The heavily sinisized TwPlt surround Han populations with Yulin and Papora on the left part of the Han, and the Pazeh and Siraya (from Tainan, Pingtung, and Hwalian) getting close to southern TwMtA and most likely representing less sinisized groups than other TwPlt. A relationship with the frequencies of O clades in these populations (Additional file 1: Figures S1 and S2) is evidenced by the fact that O2 haplogroups prevail in Indochinese populations located in the upper-left quad- rant of the MDS plot, whereas O1 haplogroups become more and more frequent in Indonesian, Filipino, south TwMtA and north TwMtA populations, which differentiate towards the bottom of the plot. Bunun, Akha, Atayal and Tsou are clearly four distinct outliers in the plot, due to their particularly low diversity. Indeed, only 3 haplogroups were observed among 56 Bunun chromosomes, of which O1a2-M50 at 61% and O2a1a-M88 at 37.5%, and this translates in a gene diversity level of only 0.49 (Additional file 1: Table S2). 
Also, only 3 haplogroups were observed among 41 Tsou and 52 Atayal chromosomes, of which O1a1*-P203 at 90% (amongst the highest frequencies observed in ISEA for this haplogroup), with a gene diversity of 0.18 for the two groups. As for the Akha population, its differentiation from all other SEA populations is driven by the unique presence of haplogroup Q-M242 (56%), which was mainly unobserved in our dataset, except for a very low frequency (<1%) in TwHan. The axis of differentiation constituted by TwMtA in the lower-right part of the MDS plot is mostly driven by the northern tribal groups (Taroko, Thao, Saisiyat and Atayal) and underlines their low gene diversity ( h is 0.10, 0.23, 0.23 and 0.18, respectively) due to the high frequency of haplogroup O1a1*-P203 (Additional file 1: Figure S1 and Additional file 1: Table S2). In a second MDS analysis, we included earlier published data (Additional file 1: Table S1) to investigate the genetic relationships of populations represented in our dataset to other East Asian groups. The resulting 2-dimensional plot is provided in Additional file 1: Figure S4. The range and variation of Y chromosome diversity present in Taiwan is well exposed in this second MDS, which echoes the first one reported in Figure 5. Taiwanese tribes, starting first with the Amis, then the southern TwMtA tribes and fi- nally the northern TwMtA tribes spread away towards the lower-right part of the plot, from a lower central cluster principally comprising TwPlt, the Philippines, Western Indonesia, some Han from southern China and a few Daic-speaking groups. The latter also form a second axis of differentiation including mainly Daic-speaking groups from China, as well as Indochinese populations that spreads towards the lower right-part of the plot. With respect to Figure 5, the outlier location of the Bunun is again indicative of a high level of genetic drift in this population, consistent with its low level of gene diversity. 
Interestingly enough, however, the position of the Bunun in the MDS, midway between the axis of differentiation of TwMtA and that of the Daic-speaking groups from southern China, is explained by the high frequencies of haplogroups O1a2-M50 (67%, the highest frequency seen in Taiwan) and O2a1a-M88 (37%), both of which are commonly found among Daic populations in south China [27]. The contributions of two putative parental groups, namely “ancestral Han” and “ancestral Austronesians”, to each hybrid population (TwMtA, TwPlt, Filipinos and Indonesians), estimated using Admix 2.0 and the STR lineage sharing (LS) method, are reported in Table 3. Frequencies of Y-STR lineages for Admix and LS were obtained in the background of their respective Y-SNP haplogroup. Results from Admix and LS (Table 3) correlate well and suggest that all putatively hybrid groups except the TwPlt received an important contribution from “ancestral Austronesian” speaking populations. In turn, TwPlt Y-SNP haplogroups are predominantly shared with Han populations (Figure 2 and Additional file 1: Table S2) and, as expected, the “ancestral Han” contribution estimated with Admix and LS is high (62% and 79% respectively). However, the effective relative frequency of parental Y-STR haplotypes contributed by each parental group, as estimated through the LS method (shown in brackets in Table 3), reached a lower level than anticipated, with only 12% of TwPlt Y-STR variation attributed to the “ancestral Han” gene pool and only 3% to the “ancestral Austronesian”. This indicates that 79% (i.e. 44/209) of the remaining Y-STR variation is unique to the TwPlt. Such a large amount of unshared variation could only have been acquired after a long period of settlement in isolation from other groups, much longer than 400 years, the date at which Han Chinese (Minnan and Hakka) migrated to Taiwan from southeast China [1]. 
Actually, on the basis of MJ networks constructed using only the unshared haplotypes showing continuity in the background of their respective haplogroups, we tentatively estimated this period at ~3 to 8 Kya. Analysis of molecular variance (AMOVA) was performed first using Y-SNP haplogroups and then Y-STR haplotypes (Table 4). The TwMtA group shows the lowest SNP variance within populations (70.3%), consistent with the low gene diversity observed in this group of populations, ranging from 0.10 to 0.7 and averaging 0.60 ± 0.02 overall (Additional file 1: Table S2). Accordingly, the TwMtA also display the highest SNP variance due to differentiation between tribes (29.7%), thus explaining the scattered location of the Taiwan tribes observed in the MDS plot (Figure 5 and Additional file 1: Figure S4). In contrast to TwMtA, high SNP variation between individuals within populations is found for Western Indonesia (94.30%), the Philippines (98.10%) and TwPlt (96.24%), and the levels of differentiation among populations within groups are correspondingly low, ranging from non-significant for Filipino populations to less than 6% among western Indonesians, even though populations in Western Indonesia and the Philippines are broadly dispersed over many isolated regions of ISEA. On the other hand, the high variance seen within TwPlt populations (96.24%) was expected, as they are heavily admixed groups. When testing genetic differentiation among four separate geographical groups, namely TwMtA, Philippines, Western Indonesia and Han (representing mainland China), or three distinct language family assortments (Formosan, Malayo-Polynesian and Sino-Tibetan), we observe that the variation between groups is large (18% and 18.55% respectively, P < 0.001) and does not differ much between geographic and linguistic groupings. This pattern remains the same when including Indochina as a fifth geographic region or as a fourth linguistic group. 
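The within-population and among-population percentages quoted for a single group (for example 70.3% and 29.7% for TwMtA) correspond to a one-level AMOVA. The sketch below shows how such a partition can be computed from haplogroup counts, assuming a simple 0/1 distance between haplogroups (identical or different); the function name, the distance choice and the toy counts are assumptions for illustration, and the study's own settings may differ.

```python
from typing import Dict, Iterable, List


def amova_one_level(pops: List[Dict[str, int]]) -> Dict[str, float]:
    """Partition variance among and within populations from haplogroup counts."""
    sizes = [sum(p.values()) for p in pops]
    n_total, n_pops = sum(sizes), len(pops)
    if n_pops < 2 or min(sizes) < 1 or n_total <= n_pops:
        raise ValueError("need >= 2 non-empty populations and spare degrees of freedom")

    def ssd(counts: Iterable[int], n: int) -> float:
        # Sum of squared 0/1 distances over all ordered pairs, divided by 2n.
        return (n * n - sum(c * c for c in counts)) / (2.0 * n)

    pooled: Dict[str, int] = {}
    for p in pops:
        for hg, c in p.items():
            pooled[hg] = pooled.get(hg, 0) + c

    ssd_within = sum(ssd(p.values(), n) for p, n in zip(pops, sizes))
    ssd_among = ssd(pooled.values(), n_total) - ssd_within
    var_within = ssd_within / (n_total - n_pops)
    # n0 is the average population size adjusted for unequal sampling.
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (n_pops - 1)
    var_among = (ssd_among / (n_pops - 1) - var_within) / n0
    total = var_among + var_within
    if total == 0:
        raise ValueError("no variation in the data")
    return {
        "pct_among": 100.0 * var_among / total,
        "pct_within": 100.0 * var_within / total,
        "phi_st": var_among / total,
    }


# Hypothetical counts for three populations (not taken from the study).
toy = [{"O1a1": 45, "O1a2": 5}, {"O1a2": 30, "O2a1a": 20}, {"O1a1": 25, "O3a2c": 25}]
result = amova_one_level(toy)
print({k: round(v, 2) for k, v in result.items()})
```

In this formulation a group of strongly drifted populations, each dominated by a different haplogroup, yields a large among-population share, which is the pattern described for the TwMtA.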
Very few changes from the Y-SNP results were observed when using Y-STRs to perform AMOVA computations on the individual groups of populations, with the variance due to differences between individuals within populations being extremely high for TwPlt, Philippines and Western Indonesia (above 96%) and much lower for TwMtA (68%). However, with Y-STRs we observe that the component of the variance due to differences between groups of populations, although significant, is always lower than that due to differences among populations within these groups, thus indicating that, contrary to Y-SNPs, Y-STRs fail to detect a population structure associated with geography or linguistics. Our data, obtained through the genotyping of 81 high-definition Y-SNP markers to determine the fine distribution of Y-chromosome haplogroup O in ISEA, revealed a high level of population structure in the region including Taiwan, the Philippines and Indonesia, mainly defined by variable distributions of haplogroups belonging to the O clade [9,27,70]. We also genotyped 17 Y-specific STR markers in order to gain insights into the distribution of Y-chromosome variation in this region of the world. Since only one population from the mainland east coast of China (Fujian Han) was analyzed, data from other populations of SEA were obtained from the literature (Additional file 1: Table S1). The O clade, to which 95% of Y chromosomes in Taiwan belong, reflects an ancestral relationship to the early modern human settlement in East Asia [28-30], with haplogroup O1 being mostly seen in Southeast China, Taiwan and ISEA, haplogroup O2a predominantly confined to southeast China and Indochina, and haplogroup O3 broadly present in mainland China. The Y-SNP gene diversity among TwMtA was found to be generally low (from 0.1 to 0.7, Additional file 1: Table S2). 
Except for the presence of haplogroup O2a1a-M88, predominantly seen in Bunun (37.5%), and O3a2c-P164 in Puyuma and Amis, the low diversity found in TwMtA is principally associated with the molecular variation of haplogroups within O1 (Figure 2, Additional file 1: Table S2 and Additional file 1: Figure S1). In sharp contrast with ...

Citations

... Based on data from Trejaut et al. (2014), the O2a-M133 mutation dates to around nine thousand years for the Taiwanese aboriginals and thirteen thousand years for the Taiwanese Han. Additionally, Ning et al. (2016) date O2a-F114 to around eight thousand years. ...
Preprint
In the last five years the quest to explain the correlation between genetic and linguistic diversity has employed a methodology called palaeogenomic modeling. Such models were published in prestigious science journals including Nature. They have also been reported in mainstream media such as the BBC. Furthermore, the metrics data reflect that they are cited frequently in scholarly journals. These models, however, are flagrantly inconsistent with the archaeological record. They employ the wrong genetic marker and not enough data. I strongly believe that the palaeogenomic modeling “fad” will soon dissipate because of these deficiencies. My research stands ready to yield desperately needed models of language prehistory that are highly reliable. Researchers will have, for the first time, a robust methodology for exploring the correlation between genetic and linguistic diversity.
... In populations from Polynesia and Remote Oceania, O3-M122, C2a-M208, S-M230, and M-M4 are the four most frequent paternal lineages [26,27]. O1a-M119, O2a-M95, O3-M122, and K*-M9 are the predominant lineages in western Austronesians [25,27]. The haplogroup O3-M122 only represents a minor portion of the paternal gene pool of Taiwan's aborigines [28]. The general distribution of haplogroup O3-M122 in Austronesian populations suggests a strong founder effect in this lineage during the diffusion of Austronesian populations, especially in Remote Oceania [27]. According to previous studies, the major components of haplogroup O3-M122 in Austronesian populations were shown to be O3*-M122+, M324+, P201+, P164+, 002611-, M7-, and M134 [26][27][28][29][30][31][32][33][34][35]. ...
... The general distribution of haplogroup O3-M122 in Austronesian populations suggests a strong founder effect in this lineage during the diffusion of Austronesian populations, especially in Remote Oceania [27]. According to previous studies, the major components of haplogroup O3-M122 in Austronesian populations were shown to be O3*-M122+, M324+, P201+, P164+, 002611-, M7-, and M134 [26][27][28][29][30][31][32][33][34][35]. In this paper, we named this haplogroup as "O3a2b*-P164(xM134)". ...
... However, no unique Y-chromosome polymorphic markers have been found for O3-M122 in Austronesian populations until 2015 [36]. Haplogroup O3a2b*-P164(xM134) is one of the predominant paternal lineages in the Ami population from Taiwan and most Austronesian populations from Remote Oceania [28,31]. Hence, we can conclude that haplogroup O3a2b*-P164(xM134) is an important lineage to understand the origin of Austronesian populations. However, few Y-chromosome single nucleotide polymorphisms (Y-SNP) have been determined for this haplogroup and the internal structure of this haplogroup remains unknown. ...
Article
Austronesian diffusion is considered one of the greatest dispersals in human history; it led to the peopling of an extremely vast region, ranging from Madagascar in the Indian Ocean to Easter Island in Remote Oceania. The Y-chromosome haplogroup O3a2b*-P164(xM134), a predominant paternal lineage of Austronesian populations, is found at high frequencies in Polynesian populations. However, the internal phylogeny of this haplogroup remains poorly investigated. In this study, we analyzed seventeen Y-chromosome sequences of haplogroup O3a2b*-P164(xM134) and generated a revised phylogenetic tree of this lineage based on 310 non-private Y-chromosome polymorphisms. We discovered that all available O3a2b*-P164(xM134) samples belong to the newly defined haplogroup O3a2b2-N6 and samples from Austronesian populations belong to the sublineage O3a2b2a2-F706. Additionally, we genotyped a series of Y-chromosome polymorphisms in a large collection of samples from China. We confirmed that the sublineage O3a2b2a2b-B451 is unique to Austronesian populations. We found that O3a2b2-N6 samples are widely distributed on the eastern coastal regions of Asia, from Korea to Vietnam. Furthermore, we propose that the O3a2b2a2b-B451 lineage represents a genetic connection between ancestors of Austronesian populations and ancient populations in North China, where foxtail millet was domesticated about 11,000 years ago. The large number of newly defined Y-chromosome polymorphisms and the revised phylogenetic tree of O3a2b2-N6 will be helpful to explore the origin of proto-Austronesians and the early diffusion process of Austronesian populations.
... Additionally, it is interesting to observe how much admixture took place over the past 100 years among Han Chinese and other groups. Y-chromosomal short tandem repeats (Y-STRs) are a useful tool for inferring genetic genealogy evolution [4] and ancient human migration trajectories and timing [5, 6]. The nonrecombinant region of the Y-chromosome may play a potential role in revealing the ethnic and regional representation of the Han Chinese population owing to its significant phylogeographic information content [7, 8]. ...
Article
In the present study, we investigated the genetic characteristics of 25 Y-chromosomal and 15 autosomal short tandem repeat (STR) loci in 305 unrelated Han Chinese male individuals from Liaoning Province using AmpFlSTR® Yfiler® Plus and Identifiler™ PCR amplification kits. Population comparison was performed between the Liaoning Han population and different ethnic groups to better understand the genetic background of the Liaoning Han population. For Y-STR loci, the overall haplotype diversity was 0.9997 and the discrimination capacity was 0.9607. Gene diversity values ranged from 0.4525 (DYS391) to 0.9617 (DYS385). Rst and two multi-dimensional scaling plots showed that minor differences were observed when the Liaoning Han population was compared to the Jilin Han Chinese, Beijing Han Chinese, Liaoning Manchu, Liaoning Mongolian, Liaoning Xibe, Shandong Han Chinese, Jiangsu Han Chinese, Anhui Han Chinese, Guizhou Han Chinese and Liaoning Hui populations; by contrast, major differences were observed when it was compared to the Shanxi Han Chinese, Yunnan Bai, Jiangxi Han Chinese, Guangdong Han Chinese, Liaoning Korean, Hunan Tujia, Guangxi Zhuang, Gansu Tibetan, Xishuangbanna Dai, South Korean, Japanese and Hunan Miao populations. For autosomal STR loci, DP ranged from 0.9621 (D2S1338) to 0.8177 (TPOX), with PE ranging from 0.7521 (D18S51) to 0.2988 (TH01). A population comparison was performed and no statistically significant differences were detected at any STR loci between Liaoning Han, China Dong, and Shaanxi Han populations. The results showed that the 25 Y-STR and 15 autosomal STR loci in the Liaoning Han population were valuable for forensic applications and human genetics, and Liaoning Han was an independent endogenous ethnicity with a unique subpopulation structure.
... This only serves to confound the already complicated task of reconstructing the colonization routes and timing of the dispersal of the earliest AMH across Southeast Asia and Australasia as well as their possible relationships to recent populations. Still, with recent advances in dating methods, several new field discoveries, the reexamination of existing but poorly characterized remains and genetic investigations of contemporary populations (e.g., Capelli et al., 2001; Karafet et al., 2001; Hill et al., 2007; Soares et al., 2008, 2016; Tumonggor et al., 2013; Trejaut et al., 2014) it is becoming clear that the earliest AMH settled East Asia by at least 45 ka (Mijares et al., 2010; Demeter et al., 2012; Barker et al., 2013; Hunt and Barker, 2014), and probably in the range of 80–120 ka (Hill et al., 2007; Soares et al., 2008, 2016; Bae et al., 2014; Liu et al., 2015; Curnoe et al., 2016). Superimposed on the Late Pleistocene history of the region are more recent prehistoric migrations by agriculturalists such as Sino-Tibetan, Tai and Austroasiatic speaking people into mainland Southeast Asia and Austronesian speakers across oceanic Southeast Asia (Bellwood, 1997). ...
... However, as no evidence has been found for the Negrito occupation of Borneo, the former hypothesis seems a more plausible explanation for its affinities at present (Jinam et al., 2012; Aghakhanian et al., 2015). Moreover, while the Austronesian speaking Indigenous people of Borneo today are suggested to have descended from agriculturalists that settled the area from Taiwan only 4–3 kyr (Bellwood, 1997; Xu et al., 2012), this has been challenged by a wide range of genetic studies covering mtDNA, Y-chromosome and other genomic markers (Capelli et al., 2001; Karafet et al., 2001; Hill et al., 2007; Soares et al., 2008, 2016; Tumonggor et al., 2013; Trejaut et al., 2014). Instead, it seems more likely that Austronesian speakers from Taiwan and island Southeast Asia share a common origin going back to the Late Pleistocene with only a limited signal of the "Out-of-Taiwan" expansion during the Neolithic period. ...
Article
The Deep Skull from Niah Cave in Sarawak (Malaysia) is the oldest anatomically modern human recovered from island Southeast Asia. For more than 50 years its relevance to tracing the prehistory of the region has been controversial. The most widely held view, originating with Brothwell's 1960 description and analysis, is that the Niah individual is related to Indigenous Australians. Here we undertake a new assessment of the Deep Skull and consider its bearing on this question. In doing so, we provide a new and comprehensive description of the cranium including a reassessment of its ontogenetic age, sex, morphology, and affinities. We conclude that this individual was most likely to have been of advanced age and female, rather than an adolescent male as originally proposed. The morphological evidence strongly suggests that the Deep Skull samples the earliest modern humans to have settled Borneo, most likely originating on mainland East Asia. We also show that the affinities of the specimen are most likely to be with the contemporary indigenous people of Borneo, although, similarities to the population sometimes referred to as Philippine Negritos cannot be excluded. Finally, our research suggests that the widely supported “two-layer” hypothesis for the Pleistocene peopling of East/Southeast Asia is unlikely to apply to the earliest inhabitants of Borneo, in-line with the picture emerging from genetic studies of the contemporary people from the region.
... Significantly, they suggest that only few paternal lineages are associated with the Austronesian dispersal, and that the other major lineages date to earlier population movements. These results have been corroborated in other recent studies (Trejaut et al. 2014; Soares et al. 2016). In terms of mtDNA, although studies showed the existence of mtDNA lineages shared between Austronesian speakers of Formosan, Filipino and other ISEA populations (Trejaut et al. 2005; Tabbada et al. 2010), many have contradicted a demic "out-of-Taiwan" expansion due to the time frame (Trejaut et al. 2005; Hill et al. 2007; Soares et al. 2008, 2016). ...
Article
In the original article, one of the co-author’s (Ken Khong Eng) given name has been published incorrectly. The correct given name should be Ken Khong. The original article has been corrected.
... Significantly, they suggest that only few paternal lineages are associated with the Austronesian dispersal, and that the other major lineages date to earlier population movements. These results have been corroborated in other recent studies (Trejaut et al. 2014; Soares et al. 2016). In terms of mtDNA, although studies showed the existence of mtDNA lineages shared between Austronesian speakers of Formosan, Filipino and other ISEA populations (Trejaut et al. 2005; Tabbada et al. 2010), many have contradicted a demic "out-of-Taiwan" expansion due to the time frame (Trejaut et al. 2005; Hill et al. 2007; Soares et al. 2008, 2016). ...
Article
There has been a long-standing debate concerning the extent to which the spread of Neolithic ceramics and Malay-Polynesian languages in Island Southeast Asia (ISEA) were coupled to an agriculturally-driven demic dispersal out of Taiwan 4000 years ago (4 ka). We previously addressed this question by using founder analysis of mitochondrial DNA (mtDNA) control-region sequences to identify major lineage clusters most likely to have dispersed from Taiwan into ISEA, proposing that the dispersal had a relatively minor impact on the extant genetic structure of ISEA, and that the role of agriculture in the expansion of the Austronesian languages was therefore likely to have been relatively minor. Here we test these conclusions by sequencing whole mtDNAs from across Taiwan and ISEA, using their higher chronological precision to resolve the overall proportion that participated in the “out-of-Taiwan” mid-Holocene dispersal as opposed to earlier, postglacial expansions in the Early Holocene. We show that, in total, about 20% of mtDNA lineages in the modern ISEA pool result from the “out-of-Taiwan” dispersal, with most of the remainder signifying earlier processes, mainly due to sea-level rises after the Last Glacial Maximum. Notably, we show that every one of these founder clusters previously entered Taiwan from China, 6-7 ka, where rice-farming originated, and remained distinct from the indigenous Taiwanese population after the subsequent dispersal into ISEA.
... But, for ISEA, too, the picture is far from consistent with an "out-of-Taiwan" demic expansion. The largest surveys consistently suggest a far more complex picture than the two-layer model (Capelli et al. 2001; Hill et al. 2007; Karafet et al. 2010; Trejaut et al. 2014; Tumonggor et al. 2013). Sea-level rises probably shaped much of the genetic structure of ISEA (Hill et al. 2007; Karafet et al. 2010), with major dispersals originating in what is now the mainland [including mtDNA haplogroup B4a1a (Soares et al. 2011)] as well as across what is now ISEA [including haplogroup E (Soares et al. 2008)]. ...
Article
There are two very different interpretations of the prehistory of Island Southeast Asia (ISEA), with genetic evidence invoked in support of both. The “out-of-Taiwan” model proposes a major Late Holocene expansion of Neolithic Austronesian speakers from Taiwan. An alternative, proposing that Late Glacial/postglacial sea-level rises triggered largely autochthonous dispersals, accounts for some otherwise enigmatic genetic patterns, but fails to explain the Austronesian language dispersal. Combining mitochondrial DNA (mtDNA), Y-chromosome and genome-wide data, we performed the most comprehensive analysis of the region to date, obtaining highly consistent results across all three systems and allowing us to reconcile the models. We infer a primarily common ancestry for Taiwan/ISEA populations established before the Neolithic, but also detected clear signals of two minor Late Holocene migrations, probably representing Neolithic input from both Mainland Southeast Asia and South China, via Taiwan. This latter may therefore have mediated the Austronesian language dispersal, implying small-scale migration and language shift rather than large-scale expansion. Electronic supplementary material The online version of this article (doi:10.1007/s00439-015-1620-z) contains supplementary material, which is available to authorized users.
Article
Human Y-chromosomes are characterized by nonrecombination and uniparental inheritance, carrying traces of human history evolution and admixture. Large-scale population-specific genomic sources based on advanced sequencing technologies have revolutionized our understanding of human Y chromosome diversity and its anthropological and forensic applications. Here, we reviewed and meta-analyzed the Y chromosome genetic diversity of modern and ancient people from China and summarized the patterns of founding lineages of spatiotemporally different populations associated with their origin, expansion, and admixture. We emphasized the strong association between our identified founding lineages and language-related human dispersal events correlated with the Sino-Tibetan, Altaic, and southern Chinese multiple-language families related to the Hmong-Mien, Tai-Kadai, Austronesian, and Austro-Asiatic languages. We subsequently summarize the recent advances in translational applications in forensic and anthropological science, including paternal biogeographical ancestry inference (PBGAI), surname investigation, and paternal history reconstruction. Whole-Y sequencing or high-resolution panels with high coverage of terminal Y chromosome lineages are essential for capturing the genomic diversity of ethnolinguistically diverse East Asians. Generally, we emphasized the importance of including more ethnolinguistically diverse, underrepresented modern and spatiotemporally different ancient East Asians in human genetic research for a comprehensive understanding of the paternal genetic landscape of East Asians with a detailed time series and for the reconstruction of a reference database in the PBGAI, even including new technology innovations of Telomere-to-Telomere (T2T) for new genetic variation discovery.
Article
China is located in East Asia. With a high genetic and cultural diversity, human migration in China has always been a hot topic of genetics research. To explore the origins and migration routes of Chinese males, 3333 Chinese individuals (Han, Hui, Mongolia, Yi and Kyrgyz) with 27 Y-STRs and 143 Y-SNPs from published literature were analysed. Our data showed that there are five dominant haplogroups (O2-M122, O1-F265, C-M130, N-M231, R-M207) in China. Combining analysis of haplogroup frequencies, geographical positions and time with the most recent common ancestor (TMRCA), we found that haplogroups C-M130, N-M231 and R1-M173 and O1a-M175 probably migrated into China via the northern route. Interestingly, we found that haplogroup C*-M130 in China may originate in South Asia, whereas the major subbranches C2a-L1373 and C2b-F1067 migrated from northern China. The results of BATWING showed that the common ancestry of Y haplogroup in China can be traced back to 17 000 years ago, which was concurrent with global temperature increases after the Last Glacial Maximum.
Article
We analysed the forensic characteristics and substructure of the Handan Han population based on 36 Y-STR (short tandem repeat) and Y-SNP (single nucleotide polymorphism) markers. The two most dominant haplogroups in Handan Han, O2a2b1a1a1-F8 (17.95%) and O2a2b1a2a1a (21.51%), and their abundant downstream branches, reflected the strong expansion of the precursor of the Hans in Handan. The present results enrich the forensic database and explore the genetic relationships between Handan Han and other neighbouring and/or linguistically close populations, which suggests that the current concise overview of the Han intricate substructure remains oversimplified.