
Y-chromosome diversity suggests southern origin and Paleolithic backwave migration of Austro-Asiatic speakers from eastern Asia to the Indian subcontinent

October 2015
Scientific Reports 5(1):1-8


DOI:10.1038/srep15486

License
CC BY 4.0

Authors:

Xiaoming Zhang

Kunming Institute of Zoology CAS

Shiyu Liao

Hong Kong Baptist University

Xuebin Qi

Kunming Institute of Zoology CAS




SCIENTIFIC RepoRts | 5:15486 | DOI: 10.1038/srep15486

www.nature.com/scientificreports

Y-chromosome diversity suggests southern origin and Paleolithic backwave migration of Austro-Asiatic speakers from eastern Asia to the Indian subcontinent

Xiaoming Zhang1,*, Shiyu Liao2,*, Xuebin Qi1,*, Jiewei Liu1,8, Jatupol Kampuansai5, Hui Zhang1, Zhaohui Yang3,4, Bun Serey6, Tuot Sovannary6, Long Bunnath6, Hong Seang Aun6, Ham Samnom7, Daoroong Kangwanpong5, Hong Shi3,4 & Bing Su1,4

Analyses of an Asian-specific Y-chromosome lineage (O2a1-M95), the dominant paternal lineage in Austro-Asiatic (AA) speaking populations, who are found on both sides of the Bay of Bengal, have led to two competing hypotheses about this group's geographic origin and migratory routes. One hypothesis posits an origin of the AA speakers in India and an eastward dispersal to Southeast Asia, while the other places the origin in Southeast Asia with a westward dispersal to India. Here, we collected samples of AA-speaking populations from mainland Southeast Asia (MSEA) and southern China, and genotyped 16 Y-STRs in 343 males who belong to the O2a1-M95 lineage. Combining our samples with previous data, we analyzed both Y-chromosome and mtDNA diversity and generated a comprehensive picture of the O2a1-M95 lineage in Asia. We demonstrated that the O2a1-M95 lineage originated in southern East Asia among the Daic-speaking populations ~20–40 thousand years ago and then dispersed southward to Southeast Asia after the Last Glacial Maximum before moving westward to the Indian subcontinent. This migration resulted in the current distribution of this Y-chromosome lineage in the AA-speaking populations. Further analysis of mtDNA diversity showed a different pattern, supporting a previously proposed sex-biased admixture of the AA-speaking populations in India.

1 State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China.
2 School of Life Sciences, Anhui University, Hefei 230039, China.
3 Institute of Primate Translational Medicine, Kunming University of Science and Technology, Kunming 650500, China.
4 Yunnan Key Laboratory of Primate Biomedical Research, Kunming 650500, China.
5 Department of Biology, Faculty of Science, Chiang Mai University, Chiang Mai 50200, Thailand.
6 Department of Geography and Land Management, Royal University of Phnom Penh, Phnom Penh 12000, Cambodia.
7 Capacity Development Facilitator for Handicap International Federation and Freelance Research, Battambang 02358, Cambodia.
8 Kunming College of Life Science, University of Chinese Academy of Sciences, Beijing 100101, China.
* These authors contributed equally to this work. Correspondence and requests for materials should be addressed to H.S. (email: shih@kmust.edu.cn) or B.Su. (email: sub@mail.kiz.ac.cn)

Received: 18 May 2015

Accepted: 28 September 2015

Published: 20 October 2015

There is a broad consensus that modern humans originated in Africa and then migrated to Asia along a coastal route by way of the Indian subcontinent as early as 60 thousand years ago (KYA) [1–7]. However, the later dispersion of this ancestral population across Asia is far less clear. Linguistic analyses have grouped Asian populations into eight language families in eastern Asia and South Asia: Altaic, Sino-Tibetan (ST, split into Han and Tibeto-Burman (TB) sub-branches), Daic, Hmong-Mien (HM), Austro-Asiatic (AA), Austronesian (AU), Dravidian (DR) and Indo-European (IE). Widely distributed in mainland China and Siberia, Altaic and ST form the two northern language families; DR and IE comprise the two main language families of the Indian subcontinent; and Daic, HM, AA and AU make up the southern language families, which are primarily distributed in southern China and Southeast Asia.

Using linguistic families to map out the origin and migration patterns of human populations in Asia has resulted in far less consensus. For example, among the southern language families, AA has a somewhat unique geographic distribution, being widespread not only in southern China and Southeast Asia but also in India. AA is the eighth largest language family in the world in terms of population size (104 million) [8], with two major branches: Munda in eastern, northeastern and central India, and Mon-Khmer, which stretches from northeastern India to the Andaman-Nicobar islands, the Malay Peninsula and the vast Mekong delta in MSEA. AA is the first language of many ethnic groups in Cambodia, Vietnam, Laos, Thailand, Burma and Malaysia, and serves as the main official language in Cambodia and Vietnam. Against this background, decades of research have resulted in a long-standing debate about the geographic origin and prehistoric migratory route of the AA-speaking populations.

Similarly, analysis of genetic data to characterize the origin and migration history of AA-speaking populations has led to two rival hypotheses [9–15]. Data from the maternal lineage (mtDNA) make a clear distinction between Munda speakers in India and Mon-Khmer speakers in Southeast Asia, with a lack of shared mtDNA haplogroups [9,15–17]. By contrast, data from the paternal lineage (Y-chromosome) indicate a shared Asian-specific haplogroup (O2a1-M95) between the AA speakers from India (66.44% on average) and from Southeast Asia (56.55% on average) [9,10,12,13,18]. Given the relatively young age (< 10 KYA) of the O2a1-M95 lineage estimated from Y-chromosome short tandem repeat (Y-STR) variation in India, the migratory route of the AA speakers would likely begin in Southeast Asia and then move to India [11,12]. However, the high mtDNA haplotype diversity in Munda-speaking populations [14,15] and an independent estimate of an old coalescence age (~65 KYA) of the O2a1-M95 lineage in the Indian AA-speaking populations [10] suggest an Indian origin followed by a dispersal to Southeast Asia, possibly before the Last Glacial Maximum (LGM, 19.0–26.5 KYA) [19]. This latter hypothesis seems to fit better with the more widely accepted coastal migration of modern humans from Africa to Asia by way of the Indian subcontinent.

While both hypotheses have merits, neither has dealt well with the large discrepancy in the estimated ages of the O2a1-M95 lineage from different studies. One explanation for the marked differences in the estimates may be limited sampling of the AA speakers in India and/or different genotyping approaches [10,12]. Fortunately, a recent study with more extensive sampling of the AA speakers in India and a few samples from Southeast Asia [9] has clarified some of these inconsistencies. Through a genome-wide screening of 610K autosomal sequence variations and uniparental loci, Chaubey et al. demonstrated an older coalescent time of the O2a1-M95 lineage in Southeast Asia (average 22.4 ± 4.9 KYA) than in India (average 15.9 ± 1.6 KYA), lending greater credence to the proposed westward migration of the AA speakers from Southeast Asia to India. Chaubey et al. also proposed a sex-specific admixture of the AA-speaking immigrants with local Indian populations by showing a different pattern in the mtDNA lineage [9].

Despite the data contributions from Chaubey et al. and numerous other studies on AA speakers, AA populations from MSEA and southern China continue to be under-sampled and under-represented. Similarly, no other southern populations have been included in these analyses to date, in spite of the high frequency of O2a1-M95 in certain populations, such as the Daic-speaking populations, which have a ~45% frequency [20–23]. Compounding these oversights, existing genomic analysis also suffers from some deficiencies. For example, the Illumina Human Hap 610K chips were developed to cover sequence variations identified in a limited set of world populations, which in turn limits their power to detect genetic relationships among the hypothetically ancient AA populations. Given the sampling, methodological and technical limitations inherent in the existing literature, basic questions remain unanswered: where did the O2a1-M95 carrying AA speakers originally emerge, and when did the lineage begin expanding into Asia?

In this study, we collected samples of 21 AA-speaking populations from Cambodia, Thailand and southern China (646 males in total) (Fig. 1). For individuals belonging to the O2a1-M95 lineage (343 of the 646) [18], we genotyped 16 Y-STRs. We also collected published data on O2a1-M95 carriers from 107 populations (2,510 O2a1-M95 carriers out of 7,693 male individuals in total), covering the full geographic distribution of the AA speakers as well as the other major language families in eastern Asia and India. To date, this is the most comprehensive collection of data on O2a1-M95 diversity. Our analysis showed that the O2a1-M95 lineage originated in the southern part of eastern Asia among the Daic-speaking populations around 20–40 KYA, followed by a southward dispersal to the heartland of MSEA ~16 KYA, and then a westward migration to India ~10 KYA. Furthermore, analysis of more than 20,000 mtDNA sequences, including these AA populations and other Asian populations, demonstrated that the maternal lineage has a different pattern from the Y-chromosome for these AA populations, supporting the proposed sex-biased admixture of the AA immigrants with local people in the Indian subcontinent.


Results

High O2a1-M95 frequencies in the AA populations from MSEA and southern China. The O2a1-M95 lineage was reported to be highly prevalent in some AA populations in India, e.g., as high as 67.53% and 74.00% in Munda and Mon-Khmer populations, respectively [9,10]. We observed high O2a1-M95 frequencies in AA populations not analyzed in previous studies, from Cambodia (70.67%), Thailand (52.51%) and southern China (30.00%) (Fig. 2A, Table 1 and supplementary Table S2) [10–12,15]. In the Andaman-Nicobar islands, O2a1-M95 was also widespread (~45.18% on average) and is fixed (100%) in several populations, such as the Shompen and Onge [9,10], likely due to a strong bottleneck effect in these island populations, which is also reflected in other major Y-chromosome lineages (e.g. DE-YAP and O3-M22) [24–26] (Fig. 2A and supplementary Table S2). Consistent with previous results, the collective data show that the O2a1-M95 lineage is dominant in almost all AA populations, including those from MSEA and southern China, making it an informative genetic marker for tracing the patrilineal prehistory of the AA populations.

Dating the O2a1-M95 lineages of different Asian populations based on Y-STR variation. Previous studies have sampled few AA populations from MSEA and southern China [9,10,12]. To fill this sampling gap, we sampled a wide range of AA-speaking populations from Cambodia, Thailand and southern China [18] and genotyped 16 Y-STR loci for those samples belonging to the O2a1-M95 lineage (Fig. 1, Table 1 and supplementary Table S1). Integrating these samples with the previous data, we dated the O2a1-M95 lineages among different regional populations (Fig. 3, supplementary Tables S3 and S4) and observed that the O2a1-M95 lineage has the oldest time to the most recent common ancestor (TMRCA) among the populations in the southern part of mainland China and Taiwan (~20–40 KYA), most of which are Daic speaking (Fig. 3, supplementary Tables S3 and S4). The average TMRCA for these Daic and Austronesian populations from southern China is ~30 KYA, markedly older than those in MSEA (~16 KYA), India (~10 KYA) or Island Southeast Asia (ISEA, ~11 KYA) (Fig. 3, supplementary Tables S3 and S4). The estimated coalescence ages for the AA speakers from MSEA, ISEA and India are similar to those reported by Chaubey et al. [9]. At the same time, the estimated ages of the O2a1-M95 lineages in the Daic populations were consistent with the estimated ages of its sister lineages (O3-M122 and C-M130) in the same geographic regions [3,27], supporting the proposed antiquity of the Daic populations. These lines of evidence suggest that the O2a1-M95 lineage originated in the Daic populations living in southern China, prior to a southward expansion to MSEA and later migrations to India and ISEA after the LGM (19.0–26.5 KYA) [19].
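The dating above can be illustrated with a minimal sketch. It assumes an average-squared-distance (ASD) style estimator, in which the variance of repeat counts around a founder haplotype is divided by the mutation rate; the paper only states that ages were computed "as described previously", so the estimator, the use of the per-locus median as the founder, and the function name `tmrca_years` are assumptions made for illustration. The 25-year generation time and the rate of 6.9 × 10⁻⁴ come from Materials and Methods and are interpreted here as per locus per generation.

```python
from statistics import median

GENERATION_YEARS = 25    # generation time stated in Materials and Methods
MUTATION_RATE = 6.9e-4   # rate stated in Materials and Methods (assumed per locus per generation)


def tmrca_years(haplotypes, mutation_rate=MUTATION_RATE,
                generation_years=GENERATION_YEARS):
    """Estimate a coalescence age (in years) from Y-STR repeat counts.

    haplotypes: list of equal-length sequences of repeat counts, one per male.
    Assumption: the founder haplotype is approximated by the per-locus median.
    """
    if not haplotypes:
        raise ValueError("at least one haplotype is required")
    n_loci = len(haplotypes[0])
    asd_per_locus = []
    for locus in range(n_loci):
        alleles = [h[locus] for h in haplotypes]
        founder = median(alleles)
        # mean squared distance (in repeat units) from the assumed founder allele
        asd = sum((a - founder) ** 2 for a in alleles) / len(alleles)
        asd_per_locus.append(asd)
    mean_asd = sum(asd_per_locus) / n_loci
    generations = mean_asd / mutation_rate
    return generations * generation_years


# Toy data (hypothetical, not from the paper): four males typed at two loci
toy = [[14, 10], [14, 10], [15, 10], [13, 10]]
age = tmrca_years(toy)
```

For the toy input the mean squared distance is 0.25, which this estimator converts to roughly 9,058 years. Regional ages such as those in Fig. 3 would then be averages of such per-population values.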

Figure 1. Geographic locations of the studied populations in Asia that contain the O2a1-M95 lineage. Populations are color-coded based on their language families. The figure was modified from our previous report [43] using Microsoft PowerPoint 2011 (Microsoft Corporation, USA).

Comparison of haplotype diversity of the O2a1-M95 lineages among different geographic populations. In line with the estimated TMRCAs, the unbiased Y-STR haplotype diversity of the O2a1-M95 lineage is highest in populations from southern China (~0.5017 on average), particularly among the Daic populations, followed by those in MSEA (~0.3858), ISEA (~0.3680) and then India (~0.3168) (Fig. 2B and supplementary Table S5), a gradient that matches the proposed migratory routes from southern China to MSEA, and then to ISEA and India. We further calculated the pairwise genetic distances measured by Fst (supplementary Table S6) and constructed an unrooted neighbor-joining (NJ) tree based on the Y-STR variation, in which populations clustered primarily by language family rather than by geographic region. This tree structure suggests genetic affinity within language families, though there were several interesting exceptions (Fig. 4). The AA populations from India clustered with the AA populations from Cambodia, not with the Dravidian and Indo-European speakers from India. This grouping strongly supports the hypothesized shared genetic ancestry among the AA populations, consistent with the previous observation by Chaubey et al. [9]. We also observed a lack of clear geographic clustering in the Y-STR based phylogenetic network of O2a1-M95 (Fig. 2C), likely due to continuous gene flow among the regional AA speakers [9].

Interestingly, our analysis departed from previous observations that had found a clear divergence of the Andaman-Nicobar Island populations from the other AA speakers [9]. Here, we detected Y-STR haplotypes shared between these isolated island populations and some MSEA populations (Fig. 2C), which may not have been apparent in earlier studies due to a general under-representation of MSEA populations.

mtDNA diversity suggests a sex-biased migration of AA speakers from MSEA to the Indian subcontinent. To examine the maternal side of the AA populations, we collected 21,470 mtDNA sequences from 545 populations distributed in East Asia, Southeast Asia and South Asia (supplementary Table S7) and analyzed the patterns of mtDNA diversity. Compared with the dominant occurrence of the O2a1-M95 lineage (65.53% on average) and the high frequency (e.g., 44.57%) of other East Asian specific lineages (NO, N, O, P and Q, supplementary Table S8) in the South Asian populations, we found that only ~16.46% of mtDNA sequences in South Asia belong to the East Asian specific lineages (A, B, C, D, F, G, M9 and M12) (supplementary Figures S1, S2 and supplementary Table S9). PCA using mtDNA haplotype frequencies indicated clustering by geographic location and not by language family, which differs from the pattern in the Y-chromosome data. For example, AA and TB populations from India clustered with Dravidian and Indo-European populations from India and not with the other AA populations from southern China and Southeast Asia (Fig. 5). This discrepancy supports the notion that the prehistoric migration of the AA speakers from MSEA to India was likely sex-biased, confirming the sex-biased admixture of the Indian AA populations hypothesized by Chaubey et al. [9].

Figure 2. Frequency distribution, Uh diversity and phylogenetic structure of the O2a1-M95 lineages among Asian populations. Contour maps show the frequency (A) and Y-STR Uh diversity (B) of lineage O2a1-M95 in Asia. Colored dots indicate the geographic locations of the analysed populations and correspond to those in Fig. 1; bars indicate the frequency and Uh diversity spectrum, respectively. (C) Phylogenetic network of Y-STR haplotypes among O2a1-M95 populations generated from the following 14 Y-STRs: DYS19, DYS389 I, DYS389 II, DYS390, DYS391, DYS392, DYS393, DYS437, DYS438, DYS439, DYS448, DYS458, DYS635 and GATA H4. Circle size is proportional to the number of samples. The contour maps were generated using Surfer10 (Golden Software Inc., Golden, USA), and the network was constructed using the Network package 4.6.1.3 (www.fluxus-engineering.com).


Discussion

Throughout previous studies, the two competing hypotheses on the origin and prehistoric migratory pattern of the AA populations have left considerable debate, in part because geographic sampling was too narrow. Here, we tested these rival hypotheses by systematically collecting AA samples from MSEA and southern China, and observed high frequencies of the O2a1-M95 lineage across all the studied AA populations. This broader survey confirmed that this Y-chromosome lineage represents a genetic signature of all AA populations and can serve as an effective genetic marker for tracing the prehistoric movements and origins of these populations.

| No. | Population | Region | Location | Linguistic family | Sub-branch | N | O2a1-M95 count | O2a1-M95 (%) |
|---|---|---|---|---|---|---|---|---|
| 1 | Brao | Cambodia | Ratanakri | Austro-Asiatic | West Bahnaric | 37 | 24 | 64.86 |
| 2 | Jarai | Cambodia | Ratanakri | Austronesian | Chamic | 45 | 34 | 75.56 |
| 3 | Kachac | Cambodia | Ratanakri | Austro-Asiatic | North Bahnaric | 17 | 13 | 76.47 |
| 4 | Khmer | Cambodia | Kratie | Austro-Asiatic | Khmer | 34 | 18 | 52.94 |
| 5 | Kravet | Cambodia | Ratanakri | Austro-Asiatic | West Bahnaric | 24 | 12 | 50.00 |
| 6 | Kreung | Cambodia | Ratanakri | Austro-Asiatic | West Bahnaric | 22 | 14 | 63.64 |
| 7 | Kuy | Cambodia | Stung Treng | Austro-Asiatic | Katuic | 37 | 34 | 91.89 |
| 8 | Lao | Cambodia | Stung Treng | Daic | Kadai | 27 | 14 | 51.85 |
| 9 | Lun | Cambodia | Ratanakri | Austro-Asiatic | West Bahnaric | 13 | 12 | 92.31 |
| 10 | Mel | Cambodia | Kratie | Austro-Asiatic | Monic | 19 | 15 | 78.95 |
| 11 | Phnong | Cambodia | Kratie | Austro-Asiatic | South Bahnaric | 26 | 20 | 76.92 |
| 12 | Stieng | Cambodia | Kratie | Austro-Asiatic | South Bahnaric | 12 | 8 | 66.67 |
| 13 | Tompoun | Cambodia | Ratanakri | Austro-Asiatic | South Bahnaric | 51 | 37 | 72.55 |
| 14 | Kraol | Cambodia | Ratanakri | Austro-Asiatic | South Bahnaric | 1 | 1 | 100.00 |
| 14 | Blang | Thailand | Chiang Rai | Austro-Asiatic | Waic | 7 | 5 | 71.43 |
| 15 | Htin | Thailand | Nan | Austro-Asiatic | Mal-Phrai | 35 | 30 | 85.71 |
| 16 | Lawa | Thailand | Chiang Mai | Austro-Asiatic | Waic | 41 | 14 | 34.15 |
| 17 | Palaung | Thailand | Chiang Mai | Austro-Asiatic | Palaung-Riang | 16 | 3 | 18.75 |
| 18 | Mon | Thailand | Chiang Mai | Austro-Asiatic | Monic | 2 | 0 | 0.00 |
| 19 | Bulang | China | Yunnan | Austro-Asiatic | Waic | 55 | 17 | 30.91 |
| 20 | Wa | China | Yunnan | Austro-Asiatic | Palaung-Riang | 57 | 5 | 8.77 |
| 21 | De'ang | China | Yunnan | Austro-Asiatic | Waic | 68 | 13 | 19.12 |
| Total | | | | | | 646 | 343 | 53.10 |

Table 1. Sampled populations from MSEA and southern China.
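The percentage column of Table 1 is simply the O2a1-M95 count divided by the sample size N. A minimal sketch that recomputes it for a few rows and for the pooled total is shown below; the row selection and the helper name `frequency_percent` are illustrative choices, not part of the original analysis.

```python
# (population, N, O2a1-M95 count): a few rows copied from Table 1
ROWS = [
    ("Brao", 37, 24),
    ("Kuy", 37, 34),
    ("Wa", 57, 5),
]


def frequency_percent(count, n):
    """Return count / n as a percentage rounded to two decimals."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return round(100.0 * count / n, 2)


# Per-population frequencies for the selected rows
per_population = {name: frequency_percent(count, n) for name, n, count in ROWS}

# Pooled frequency over all sampled males (totals from the last row of Table 1)
pooled_total = frequency_percent(343, 646)
```

Note that a pooled frequency (total carriers over total males) generally differs from the average of per-population frequencies, because small populations such as Kraol (N = 1) weigh as much as large ones in a simple average.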

Figure 3. Comparison of coalescence ages of the O2a1-M95 lineages among different geographic populations. The age of each geographic or linguistic group was calculated by taking the average of the respective populations from supplementary Table S3.


The Y-chromosome data collected in this study do not support an Indian origin of the O2a1-M95 lineage, but instead show that O2a1-M95 carriers in India originated in southern China and then migrated from MSEA to India around 10 KYA, after the LGM. Moreover, our broader analysis, which included Daic-speaking populations from southern China, showed that these populations possess the most diversified O2a1-M95 lineage, with an average coalescence age of ~30 KYA, making them the oldest of all known O2a1-M95 carrying populations. This supports an initial origin of this Y-chromosome lineage in Daic speakers, followed by a southward migration to MSEA and a later westward migration to India ~10 KYA. During the preparation of this manuscript, Arunkumar et al. published a similar analysis of O2a1-M95 in Asian populations [28], and their data also favored an east-to-west migration, although their estimated age of migration was much younger than ours owing to the different mutation rates and methods used for age estimation. Our analysis of mtDNA diversity suggests that after dispersal to India, the O2a1-M95 carrying populations widely absorbed the local maternal gene pool. In contrast to the well-known earliest migration of modern humans from Africa to eastern Asia by way of the Indian subcontinent, our data illustrate a backwave, sex-biased migration of the AA speakers from MSEA to India after the LGM, hinting at a far more complex prehistory of Paleolithic human populations.

Figure 4. NJ tree constructed from Y-STR variation among populations of different language families. Different linguistic families are shown in different colors. Branch length values are indicated above the branches.

Figure 5. Map of principal component analysis (PCA) among Asian populations. Populations of East Asia and South Asia were grouped by geographic region and language family, respectively. AA- and TB-speaking populations clustered closely with DR and IE populations in the lower left. The first and second components explain 15.25% and 7.10% of the genetic variance, respectively.


Materials and Methods

Genotyping and data collection. For the 343 male samples that belong to the O2a1-M95 lineage from our previous study (Table 1) [18], we genotyped 16 Y-STRs (DYS19/394, DYS388, DYS389 I, DYS389 II, DYS390, DYS391, DYS392, DYS393, DYS437, DYS438, DYS439, DYS448, DYS458, DYS461, DYS635 and GATA H4) with the methods described previously [29,30]. DYS389 I (DYS389cd) was subtracted from DYS389 II and the difference renamed 389ab, because DYS389 II contains the repeat number of DYS389 I. To dissect the origin and migratory patterns of the O2a1-M95 lineage, we collected all available O2a1-M95 Y-STR data, which cover 107 geographic populations (up to 2,510 samples carrying O2a1-M95) from East Asia, Southeast Asia and South Asia [9,12,20–24,26,31–38] (Fig. 1, supplementary Tables S1 and S2).
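The DYS389 correction described above is a single subtraction per haplotype. A minimal sketch follows, assuming each haplotype is stored as a dictionary of repeat counts; the key names (`DYS389I`, `DYS389II`, `DYS389ab`) and the function name `split_dys389` are hypothetical and only serve the illustration.

```python
def split_dys389(haplotype):
    """Return a copy in which DYS389II is replaced by DYS389ab.

    DYS389ab = DYS389II - DYS389I, because the DYS389II repeat count
    already contains the DYS389I repeats.
    """
    corrected = dict(haplotype)  # copy so the caller's data is left untouched
    dys389ii = corrected.pop("DYS389II")
    corrected["DYS389ab"] = dys389ii - corrected["DYS389I"]
    return corrected


# Toy haplotype (hypothetical values, not from the paper)
raw = {"DYS19": 15, "DYS389I": 13, "DYS389II": 29}
clean = split_dys389(raw)
```

Without this step, a single mutation in the DYS389 I segment would be counted at both loci and inflate variance-based estimates.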

Data analysis. We estimated the time to the most recent common ancestor (TMRCA) of the O2a1-M95 lineage using Y-STR variation in each population as described previously, with a 25-year generation time and a mutation rate of 6.9 × 10⁻⁴ [12,39] (supplementary Table S3). For comparison, when calculating the ages we used three sets of loci for each population: (a) the actual number of loci in the corresponding references, (b) a 7-locus set (DYS19, DYS389 I, DYS389 II, DYS390, DYS391, DYS392 and DYS393) and (c) a 6-locus set (DYS19, DYS389 I, DYS390, DYS391, DYS392 and DYS393); the results from the different calculations are very similar for most populations (supplementary Table S4). The mean TMRCA of a geographic region is the average over its populations (Fig. 3). We also estimated the unbiased haplotype diversity of every population using GenAlEx 6.3. When estimating age and diversity, O2a1-M95 populations with fewer than 10 samples were either excluded or merged with other closely related populations. In total, the coalescence ages and diversity of the O2a1-M95 lineages from 105 Asian populations were calculated (supplementary Tables S3 and S5).
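The unbiased diversity statistic can be sketched as follows. This assumes the standard unbiased estimator h = n/(n − 1) × (1 − Σ pᵢ²) and averages it across loci; whether GenAlEx 6.3 computes its Uh value in exactly this way for these data is an assumption, and the function names are hypothetical.

```python
from collections import Counter


def unbiased_diversity(observations):
    """Unbiased diversity n/(n-1) * (1 - sum(p_i^2)) for a list of alleles."""
    n = len(observations)
    if n < 2:
        raise ValueError("at least two samples are required")
    counts = Counter(observations)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


def mean_locus_diversity(haplotypes):
    """Average the unbiased diversity over all loci of a set of haplotypes."""
    if not haplotypes:
        raise ValueError("at least one haplotype is required")
    n_loci = len(haplotypes[0])
    per_locus = [
        unbiased_diversity([h[locus] for h in haplotypes])
        for locus in range(n_loci)
    ]
    return sum(per_locus) / n_loci


# Toy data (hypothetical): four males typed at two loci
toy = [(14, 10), (14, 10), (15, 10), (13, 11)]
uh = mean_locus_diversity(toy)
```

The n/(n − 1) factor corrects for small sample sizes, which is one reason very small populations (fewer than 10 samples in the paper) are better excluded or merged before comparison.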

A median-joining network, resolved with the MP algorithm, was constructed using the Network package 4.6.1.3 (www.fluxus-engineering.com). The O2a1-M95 contour maps based on frequency and unbiased haplotype diversity were generated using Surfer10 (Golden Software Inc., Golden, USA), following the Kriging procedure. The average number of pairwise differences of Y-STRs for the studied populations was calculated using Arlequin 3.5 [40], and the NJ tree was constructed with MEGA 6.0 [41]. We performed principal component analysis (PCA) based on the frequencies of mtDNA haplogroups according to the method developed by Richards et al. [42] with MVSP 3.13.

To compare the paternal and maternal gene pools between populations from East Asia and South Asia, we analyzed ~21,470 previously published mtDNA sequences from these populations (supplementary Table S7).
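A PCA on haplogroup frequencies can be sketched as below. This is a generic PCA via singular value decomposition of the column-centred frequency matrix, written with NumPy; it is not claimed to reproduce the procedure of Richards et al. or the MVSP 3.13 implementation, and the function name and toy matrix are hypothetical.

```python
import numpy as np


def pca_from_frequencies(freqs):
    """PCA of a (populations x haplogroups) frequency matrix.

    Returns (scores, explained), where scores holds the population
    coordinates on each component and explained holds the fraction of
    total variance carried by each component.
    """
    x = np.asarray(freqs, dtype=float)
    centred = x - x.mean(axis=0)           # centre each haplogroup column
    u, s, _vt = np.linalg.svd(centred, full_matrices=False)
    total = np.sum(s ** 2)
    if total == 0:
        raise ValueError("all populations have identical frequencies")
    scores = u * s                         # coordinates of each population
    explained = s ** 2 / total             # variance fraction per component
    return scores, explained


# Toy matrix (hypothetical): 3 populations x 2 haplogroups
toy = [[0.8, 0.2],
       [0.6, 0.4],
       [0.2, 0.8]]
scores, explained = pca_from_frequencies(toy)
```

In the toy matrix the two columns are perfectly anti-correlated, so the first component carries essentially all of the variance. Real data, such as the 15.25% and 7.10% reported for Fig. 5, spread variance over many components.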

References

1. Macaulay, V. et al. Single, rapid coastal settlement of Asia revealed by analysis of complete mitochondrial genomes. Science 308,

1034–1036, doi: 10.1126/science.1109792 (2005).

2. Su, B. et al. Y-Chromosome evidence for a northward migration of modern humans into Eastern Asia during the last Ice Age.

Am. J. Hum. Genet. 65, 1718–1724, doi: 10.1086/302680 (1999).

3. Shi, H. et al. Y-chromosome evidence of southern origin of the East Asian-specic haplogroup O3-M122. Am. J. Hum. Genet.

77, 408–419, doi: 10.1086/444436 (2005).

4. Consortium, H. P.-A. S. et al. Mapping human genetic diversity in Asia. Science 326, 1541–1545, doi: 10.1126/science.1177074

(2009).

5. Su, B. et al. Polynesian origins: insights from the Y chromosome. Proc Natl Acad Sci USA 97, 8225–8228 (2000).

6. Jin, L. & Su, B. Natives or immigrants: modern human origin in east Asia. Nat. ev. Genet. 1, 126–133, doi: 10.1038/35038565

(2000).

7. Zhang, X. et al. Analysis of mitochondrial genome diversity identies new and ancient maternal lineages in Cambodian

aborigines. Nat Commun 4, 2599, doi: 10.1038/ncomms3599 (2013).

8. Lewis, M. P. Ethnologue: languages of the world. Dallas (TX): SIL International. [Internet; cited 2010 Sep],< http://www.

ethnologue.com/> (2009).

9. Chaubey, G. et al. Population genetic structure in Indian Austroasiatic speaers: the role of landscape barriers and sex-specic

admixture. Mol. Biol. Evol. 28, 1013–1024, doi: 10.1093/molbev/msq288 (2011).

10. umar, V. et al. Y-chromosome evidence suggests a common paternal heritage of Austro-Asiatic populations. BMC Evol. Biol. 7,

47, doi: 10.1186/1471-2148-7-47 (2007).

11. Sahoo, S. et al. A prehistory of Indian Y chromosomes: evaluating demic diusion scenarios. Proc Natl Acad Sci USA 103,

843–848, doi: 10.1073/pnas.0507714103 (2006).

12. Sengupta, S. et al. Polarity and temporality of high-resolution y-chromosome distributions in India identify both indigenous and

exogenous expansions and reveal minor genetic inuence of Central Asian pastoralists. Am. J. Hum. Genet. 78, 202–221, doi:

10.1086/499411 (2006).

13. ivisild, T. et al. e genetic heritage of the earliest settlers persists both in Indian tribal and caste populations. Am. J. Hum.

Genet. 72, 313–332, doi: 10.1086/346068 (2003).

14. Charavarti, A. Human genetics: Tracing India’s invisible threads. Nature 461, 487–488, doi: 10.1038/461487a (2009).

15. Basu, A. et al. Ethnic India: a genomic view, with special reference to peopling and structure. Genome es. 13, 2277–2290, doi:

10.1101/gr.1413403 (2003).

16. Metspalu, M. et al. Most of the extant mtDNA boundaries in south and southwest Asia were liely shaped during the initial

settlement of Eurasia by anatomically modern humans. BMC Genet. 5, 26, doi: 10.1186/1471-2156-5-26 (2004).

17. eddy, B. M. et al. Austro-Asiatic tribes of Northeast India provide hitherto missing genetic lin between South and Southeast

Asia. PLoS One 2, e1141, doi: 10.1371/journal.pone.0001141 (2007).

18. Zhang, X. et al. An updated phylogeny of the human Y-chromosome lineage O2a-M95 with novel SNPs. PLoS One 9, e101020, doi: 10.1371/journal.pone.0101020 (2014).

19. Clar, P. U. et al. e Last Glacial Maximum. Science 325, 710–714, doi: 10.1126/science.1172873 (2009).

20. Cai, X. et al. Human migration through bottlenecks from Southeast Asia into East Asia during Last Glacial Maximum revealed by Y chromosomes. PLoS One 6, e24282, doi: 10.1371/journal.pone.0024282 (2011).


21. Li, H. et al. Paternal genetic affinity between Western Austronesians and Daic populations. BMC Evol. Biol. 8, 146, doi: 10.1186/1471-2148-8-146 (2008).

22. Li, D. et al. Paternal genetic structure of Hainan aborigines isolated at the entrance to East Asia. PLoS One 3, e2168, doi: 10.1371/journal.pone.0002168 (2008).

23. Gan, . J. et al. Pinghua population as an exception of Han Chinese’s coherent genetic structure. J. Hum. Genet. 53, 303–313,

doi: 10.1007/s10038-008-0250-x (2008).

24. angaraj, . et al. Genetic anities of the Andaman Islanders, a vanishing human population. Curr. Biol. 13, 86–93 (2003).

25. Trivedi, . et al. Molecular insights into the origins of the Shompen, a declining population of the Nicobar archipelago. J. Hum.

Genet. 51, 217–226, doi: DOI 10.1007/s10038-005-0349-2 (2006).

26. Chandrasear, A. et al. YAP insertion signature in South Asia. Ann Hum Biol 34, 582–586, doi: 10.1080/03014460701556262

(2007).

27. Zhong, H. et al. Global distribution of Y-chromosome haplogroup C reveals the prehistoric migration routes of African exodus and early settlement in East Asia. J. Hum. Genet. 55, 428–435, doi: 10.1038/jhg.2010.40 (2010).

28. GaneshPrasad, A. et al. A late Neolithic expansion of Y chromosomal haplogroup O2a1-M95 from east to west. Journal of Systematics and Evolution. 1–15, doi: 10.1111/jse.12147 (2015).

29. Butler, J. M. et al. A novel multiplex for simultaneous amplification of 20 Y chromosome STR markers. Forensic Sci. Int. 129, 10–24 (2002).

30. Kayser, M. et al. Evaluation of Y-chromosomal STRs: a multicenter study. Int J Legal Med 110, 125–133, 141–129 (1997).

31. Deln, F. et al. e Y-chromosome landscape of the Philippines: extensive heterogeneity and varying genetic anities of Negrito

and non-Negrito groups. Eur J Hum Genet 19, 224–230, doi: 10.1038/ejhg.2010.162 (2011).

32. Hammer, M. F. et al. Dual origins of the Japanese: common ground for hunter-gatherer and farmer Y chromosomes. J. Hum. Genet. 51, 47–58, doi: 10.1007/s10038-005-0322-0 (2006).

33. He, J. D. et al. Patrilineal perspective on the Austronesian diffusion in Mainland Southeast Asia. PLoS One 7, e36437, doi: 10.1371/journal.pone.0036437 (2012).

34. Karafet, T. M. et al. Major east-west division underlies Y chromosome stratification across Indonesia. Mol. Biol. Evol. 27, 1833–1844, doi: 10.1093/molbev/msq063 (2010).

35. Kutanan, W. et al. Genetic structure of the Mon-Khmer speaking groups and their affinity to the neighbouring Tai populations in Northern Thailand. BMC Genet. 12, 56, doi: 10.1186/1471-2156-12-56 (2011).

36. Nonaa, I., Minaguchi, . & Taezai, N. Y-chromosomal binary haplogroups in the Japanese population and their relationship

to 16 Y-ST polymorphisms. Ann Hum Genet 71, 480–495, doi: 10.1111/j.1469-1809.2006.00343.x (2007).

37. Xue, Y. et al. Male demography in East Asia: a north-south contrast in human population expansion times. Genetics 172, 2431–2439, doi: 10.1534/genetics.105.054270 (2006).

38. Trivedi, . et al. Molecular insights into the origins of the Shompen, a declining population of the Nicobar archipelago. J. Hum.

Genet. 51, 217–226, doi: 10.1007/s10038-005-0349-2 (2006).

39. Zhivotovsy, L. A. et al. e eective mutation rate at Y chromosome short tandem repeats, with application to human

population-divergence time. Am. J. Hum. Genet. 74, 50–61, doi: Doi 10.1086/380911 (2004).

40. Excoer, L., Laval, G. & Schneider, S. Arlequin (version 3.0): an integrated soware pacage for population genetics data

analysis. Evol Bioinform Online 1, 47–50 (2005).

41. Tamura, ., Stecher, G., Peterson, D., Filipsi, A. & umar, S. MEGA6: Molecular Evolutionary Genetics Analysis version 6.0.

Mol. Biol. Evol. 30, 2725–2729, doi: 10.1093/molbev/mst197 (2013).

42. Richards, M., Macaulay, V., Torroni, A. & Bandelt, H. J. In search of geographical patterns in European mitochondrial DNA. Am. J. Hum. Genet. 71, 1168–1174, doi: 10.1086/342930 (2002).

43. Shi, H. et al. Genetic evidence of an East Asian origin and paleolithic northward migration of Y-chromosome haplogroup N. PLoS One 8, e66102, doi: 10.1371/journal.pone.0066102 (2013).

Acknowledgements

We are grateful to all the volunteers for providing blood samples, and to Andrew Willden for editing the manuscript. This study was supported by the National 973 Program of China (2012CB518202 to X.Q.), the National Natural Science Foundation of China (31130051 and 91231203 to B.S., 31371268 and 91131001 to H.S. and 31371269 to X.Q.) and the Natural Science Foundation of Yunnan Province (2010CI044 to H.S.).

Author Contributions

B.S. and H.S. designed the experiment; X.M.Z., X.B.Q., J.K., Z.H.Y., B.S., T.S., L.B., H.S.A., H.S., D.K. and H.S. collected the samples; X.M.Z., S.Y.L. and X.B.Q. collected the data and conducted data analysis; J.W.L. and H.Z. provided technical assistance in the experiments; X.M.Z., X.B.Q., H.S. and B.S. wrote the manuscript.

Additional Information

Supplementary information accompanies this paper at http://www.nature.com/srep

Competing nancial interests: e authors declare no competing nancial interests.

How to cite this article: Zhang, X. et al. Y-chromosome diversity suggests southern origin and Paleolithic backwave migration of Austro-Asiatic speakers from eastern Asia to the Indian subcontinent. Sci. Rep. 5, 15486; doi: 10.1038/srep15486 (2015).

This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/



