
Genetic Structure of the Paternal Lineage of the Roma People


Horolma Pamjav,1* Andrea Zalán,1 Judit Béres,2 Melinda Nagy,3 and Yuet Meng Chang4
1 Institute of Forensic Medicine, Network of Forensic Science Institutes, Ministry of Administration and Justice, Budapest, Hungary
2 National Centre for Healthcare Audit and Improvement, Budapest, Hungary
3 Department of Biology, J. Selye University, Komárno, Slovakia
4 Victoria Police Forensic Services Department, Melbourne, Victoria, Australia
KEY WORDS Y-STRs; Y-SNPs; Roma; human demographic history
ABSTRACT According to written sources, Roma (Romanies, Gypsies) arrived in the Balkans around 1,000 years ago from India and have subsequently spread through several parts of Europe. Genetic data, particularly from the Y chromosome, have supported this model, and can potentially refine it. We now provide an analysis of Y-chromosomal markers from five Roma and two non-Roma populations (N = 787) in order to investigate the genetic relatedness of the Roma population groups to one another, and to gain further understanding of their likely Indian origins, the genetic contribution of non-Roma males to the Roma populations, and the early history of their splits and migrations in Europe. The two main sources of the Roma paternal gene pool were identified as South Asian and European. The reduced diversity and expansion of H1a-M82 lineages in all Roma groups imply shared descent from a single paternal ancestor in the Indian subcontinent. The Roma paternal gene pool also contains a specific subset of E1b1b1a-M78 and J2a2-M67 lineages, implying admixture during early settlement in the Balkans and the subsequent influx into the Carpathian Basin. Additional admixture, evident in the low and moderate frequencies of typical European haplogroups I1-M253, I2a-P37.2, I2b-M223, R1b1-P25, and R1a1-M198, has occurred in a more population-specific manner. Am J Phys Anthropol 000:000–000, 2011. © 2011 Wiley-Liss, Inc.
The Roma (also known as Gypsies or Romanies) are, according to historical sources, a population of Indian origin who migrated to Europe about 1,000 years ago, when they first arrived in the Balkans. Eight to ten million Roma live in Europe today, with the largest number concentrated in Central and Southeastern Europe (Fraser, 1992). The Roma subsequently expanded to the Carpathian Basin in two large waves. The Carpathian Roma arrived in the 15th–16th centuries; the Vlach Roma came in the 19th century. The Carpathian Roma now speak Hungarian, while the Vlach Roma speak both Hungarian and Romani languages. There is a firm boundary between these groups in terms of marriage: individuals belonging to each group can only marry within that group (Szuhay, 2005). Roma are the largest minority group in Hungary and constitute about 6–8% of the Hungarian population (Vajda, 1997). On the basis of these historical and social data, they thus form a genetic isolate expected to experience limited inter-population gene flow, to have a relatively small population size, and perhaps to undergo strong genetic drift. These predictions can be tested using a genetic approach.
Genetic studies based on mitochondrial DNA (mtDNA) and Y-chromosomal haplotypes/haplogroups have been carried out on a number of European groups, including Hungarian and Roma groups (Füredi et al., 1999; Gresham et al., 2001; Irwin et al., 2007; Peričić et al., 2005a; Egyed et al., 2007; Nagy et al., 2007; Gusmão et al., 2008). These studies have demonstrated the particular informativeness of Y-chromosomal markers. Nevertheless, they have been limited in size and phylogenetic resolution.
We have therefore undertaken a larger survey of Y-chromosomal variation in the Roma people in Europe. We genotyped 12 Y-STRs and 51 Y-SNPs in five Roma and two non-Roma populations to investigate the following areas: i) to examine Y-chromosomal data in the context of the other European Roma and Malaysian Indian populations, another migrant group originating in India, to explore common Indian origins; ii) to identify paternal lineages of the Roma people marking a genetic contribution of non-Roma males to the present-day Roma gene pool; and iii) to try to disentangle the early history of splits and migrations of the Romani population groups in Europe. This study is expected to provide insights relevant to migration into and within the Carpathian Basin, and to identify how Roma population groups have evolved in recent times as well as how the genetic variation is distributed in the contemporary Roma gene pool.
Additional Supporting Information may be found in the online
version of this article.
*Correspondence to: Horolma Pamjav, Budapest, Hungary, PO
Box. 216., H-1536. E-mail: phorolma@hotmail.com
Received 1 April 2010; accepted 19 October 2010
DOI 10.1002/ajpa.21454
Published online in Wiley Online Library
(wileyonlinelibrary.com).
© 2011 WILEY-LISS, INC.
AMERICAN JOURNAL OF PHYSICAL ANTHROPOLOGY 000:000–000 (2011)
MATERIALS AND METHODS
Ethnic composition of the samples
The samples for this study were all male and include Hungarian Roma (HuRo) (N = 107) from different regions of Hungary, who represent “mixed” Hungarian Roma. In addition, we tested the Romungro Roma group (N = 19) from Taktaköz (TaRo) and two Vlachian Roma groups from Tiszavasvári (TiRo; N = 29) and Tokaj (ToRo; N = 39) in Eastern Hungary, and the Carpathian Roma group (SloRo; N = 62) from Slovakia. As a reference population, published Hungarian host population data (Hu; N = 230) were used (Völgyi et al., 2009); these samples had not been selected according to subpopulation, and so are expected to consist of 6–8% Roma and 92–94% non-Roma like the general population. To investigate Asian origins, Malaysian Indians (MaIn; N = 301) were included. Malaysian Indians are descendants of migrants from southern India during the British colonization of Malaysia. Published data from additional populations were included in some analyses as described below.
DNA isolation and PCR amplification
Genomic DNA was extracted from blood samples using the QIAamp Blood Mini Kit (Qiagen GmbH, Germany). For saliva samples, an organic extraction method was used for DNA isolation as described previously by Comey et al. (1994).
Testing of Y-STR and Y-SNP markers
DNA was amplified with the PowerPlex Y (Promega, USA) amplification kit including 12 Y-STR loci according to the manufacturer’s instructions. Fragment sizes and allele designations were determined with a 310 Genetic Analyzer (Applied Biosystems, USA) using GeneScan 3.1.2 and Genotyper 2.5.2 software.
Y-SNP markers were tested to identify haplogroups as described in Bíró et al. (2009). A complete list of primers and Taqman probes for binary markers is included in Supporting Information (Table S1). The nomenclature of haplogroups followed Jobling and Tyler-Smith (2003) and Karafet et al. (2008).
Statistical analysis
Haplotype and haplogroup frequencies and their diversity values were calculated as before (Nei, 1973). For most purposes, haplogroups were combined into the groups Y(xC-T), C, DE, F(xH,J2,K-T), H, J2, K(xL,O,P-R), L, O, P(xR1a1), R1a1 and R2, so that published sources could be used for comparison (Ramana et al., 2001; Wells et al., 2001; Kivisild et al., 2003; Cinnioglu et al., 2004; Cordaux et al., 2004; Peričić et al., 2005a; Sengupta et al., 2006; Thanseem et al., 2006; Firasat et al., 2007; Gusmão et al., 2008; Klarić et al., 2009; Sharma et al., 2009). AMOVA analysis and population pairwise genetic distances (Fst and Rst) were calculated using Arlequin 2.0 software (Schneider et al., 2000). A multidimensional scaling (MDS) plot was constructed with ViSta 7.9.2.4 software. Networks were constructed using the Network 4.5.1.0 program (Bandelt et al., 1999). Repeats of the locus DYS389I were subtracted from the locus DYS389II, so that DYS389I and DYS389b were used, according to common practice. The rho statistic within the Network program was used to estimate the time to the most recent common ancestor (TMRCA) of haplotypes within haplogroup H1a-M82. The STR mutation rate was assumed to be 6.9 × 10⁻⁴/locus/25 years (Zhivotovsky et al., 2004).
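The diversity calculation cited to Nei (1973) can be illustrated with a short sketch. It assumes the unbiased estimator h = n/(n − 1) × (1 − Σp²), which the text does not spell out. The function name `nei_diversity` is introduced here for illustration only. With the Taktaköz Roma haplogroup counts from Table 1, this estimator gives the haplogroup diversity reported there for that group (0.92398).

```python
def nei_diversity(counts):
    """Unbiased gene diversity: h = n/(n-1) * (1 - sum(p_i^2)).

    The n/(n-1) correction is an assumption; the paper only cites Nei (1973).
    """
    n = sum(counts)
    if n < 2:
        raise ValueError("need at least two chromosomes")
    sum_sq = sum((c / n) ** 2 for c in counts)
    return n / (n - 1) * (1.0 - sum_sq)


# Haplogroup counts for the Taktaköz Roma (TaRo, n = 19), taken from Table 1
taro_counts = {
    "E1b1b1a": 2, "G2a": 1, "H1a": 2, "I1": 2, "I2a": 4,
    "I2b": 1, "J2a2": 3, "O3": 1, "R1a1": 1, "R1b1": 2,
}

h_taro = nei_diversity(list(taro_counts.values()))
print(round(h_taro, 5))  # 0.92398
```

The same function applies to STR haplotype counts if each distinct haplotype is treated as one class.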
RESULTS
Y-STR and Y-SNP analysis
Y-chromosomal haplotype and haplogroup frequencies, and diversity values for the populations studied, are shown in Table 1. Twelve Y-STR loci were analyzed in a total of 787 unrelated males, belonging to five Roma and two non-Roma populations. The most common Roma haplotype was found 57 times and belonged to haplogroup H1a-M82. Fifty-one binary Y markers were also analyzed in all samples. For the Hungarian reference population, these data were already available (Völgyi et al., 2009); for the other 557 samples, additional typing was performed. In all, 33 haplogroups were detected in the populations investigated (Table 1). The haplotype and haplogroup of each individual are listed in Supporting Information Table S2. All these data have been submitted to YHRD.
Phylogenetic analysis
Based on 10 Y-STR loci (DYS385 excluded), networks were constructed within each haplogroup. We now consider each haplogroup, and the variation within it.
Haplogroup H1a-M82. Figure 1A shows an MJ network of 169 H1a-M82 haplotypes from the seven populations investigated here and Iberian Roma from Gusmão et al. (2008). It includes a core haplotype cluster shared by 94 individuals (55.6%) from eight populations. Figure 1B depicts the H1a-M82 network of the Roma groups alone. It has a star-like structure, where a large number of individuals belong to a modal haplotype that is shared by all Roma population groups. This modal haplotype includes 79 individuals, 78% of all H1a-M82 chromosomes examined. The remaining haplotypes differ from the modal haplotype by one or two mutational steps. This cluster thus represents a set of closely related haplotypes of the Roma, in which each haplotype is likely to have descended from the modal haplotype, irrespective of its current geographical location.
Based on our calculation, the age of accumulated STR variation within the H1a-M82 lineage for Malaysian Indians was estimated as 8,707 ± 1,760 years ago (95% CI 6,947–10,467 years), considering the most frequent haplotype to be the founding lineage (Fig. 1A). The TMRCA was 968 ± 336 years ago for the same haplogroup in the Roma investigated and for Iberian Roma. These coalescent times were in agreement with reported data (Gresham et al., 2001; Sengupta et al., 2006) and the Roma TMRCA corresponds approximately to the date of the migration from India to Europe.
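The conversion from the rho statistic to a date can be sketched as follows. This is a simplified illustration, not the Network program's implementation: rho is approximated as the mean number of repeat-unit steps between each sampled haplotype and the assumed founder, and the haplotypes shown are hypothetical. The mutation rate (6.9 × 10⁻⁴ per locus per 25 years) is the one stated in the Methods. The function names are introduced here for illustration.

```python
def rho_statistic(haplotypes, founder):
    """Mean number of single-repeat steps separating haplotypes from the founder.

    Simplification: steps are counted as summed absolute repeat differences,
    which matches a network count only when no intermediate nodes are shared.
    """
    total_steps = 0
    for hap in haplotypes:
        total_steps += sum(abs(a - b) for a, b in zip(hap, founder))
    return total_steps / len(haplotypes)


def tmrca_years(rho, n_loci, mu_per_locus=6.9e-4, years_per_unit=25):
    """Convert rho to years: rho / (n_loci * mu) is the age in 25-year units."""
    return rho / (n_loci * mu_per_locus) * years_per_unit


# Hypothetical three-locus sample: 8 founder copies and 2 one-step neighbours
founder = (14, 12, 22)
sample = [founder] * 8 + [(15, 12, 22), (14, 11, 22)]

rho = rho_statistic(sample, founder)                 # 2 steps / 10 chromosomes
age = tmrca_years(rho, n_loci=len(founder))
print(rho, round(age))
```

Read in reverse, and assuming the ten loci used for the networks, a TMRCA of 968 years would correspond to a rho of roughly 0.27 under this conversion.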
Haplogroup J2a2-M67. The MJ network of the J2a2-M67 lineage within the Roma groups (Fig. 1C) included sixty-five J2a2-M67 chromosomes; out of those, 32 belonged to a core haplotype cluster. The core cluster represents 49% of the J2a2-M67 chromosomes and provides further evidence for haplotype sharing between Roma populations and a common origin of the Roma groups.
The MJ network constructed from 37 haplotypes
within the J2*-M172 haplogroup originating from the
seven populations displayed a complex topology, where
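TABLE 1. Y-chromosomal haplogroup distributions and diversities in the seven populations investigated
Populations (sample sizes): Hu (n = 230), HuRo (n = 107), TiRo (n = 29), ToRo (n = 39), TaRo (n = 19), SloRo (n = 62), MaIn (n = 301). Each population has a count (n) and a percentage (%) column.
Haplogroup Mutation Hu n Hu % HuRo n HuRo % TiRo n TiRo % ToRo n ToRo % TaRo n TaRo % SloRo n SloRo % MaIn n MaIn %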
C (xC1, C2, C3, C4a) M216 0 0.00 0 0.00 0 0.00 0 0.00 0 0.00 0 0.00 8 2.66
C3 M217 0 0.00 1 0.93 0 0.00 0 0.00 0 0.00 0 0.00 1 0.33
E1b1b1a M78 9 3.91 8 7.48 2 6.90 6 15.38 2 10.53 8 12.90 0 0.00
E1b1b1b M81 1 0.43 0 0.00 0 0.00 0 0.00 0 0.00 0 0.00 0 0.00
E1b1b1c M123 3 1.30 0 0.00 0 0.00 0 0.00 0 0.00 0 0.00 0 0.00
FT (xG, H, I, J, KT) M89 0 0.00 0 0.00 0 0.00 0 0.00 0 0.00 0 0.00 40 13.29
G (xG1a, G2a) M201 0 0.00 1 0.93 0 0.00 0 0.00 0 0.00 0 0.00 2 0.66
G1a P20 0 0.00 0 0.00 0 0.00 0 0.00 0 0.00 0 0.00 1 0.33
G2a P15 11 4.78 3 2.80 1 3.45 1 2.56 1 5.26 0 0.00 3 1.00
H1(xH1a) M52 0 0.00 0 0.00 0 0.00 0 0.00 0 0.00 0 0.00 2 0.66
H1a M82 11 4.78 34 31.78 17 58.62 8 20.51 2 10.53 19 30.65 57 18.94
I (xI1, I2a, I2b) M170 2 0.87 1 0.93 0 0.00 0 0.00 0 0.00 0 0.00 0 0.00
I1 M253 17 7.39 10 9.35 3 10.34 1 2.56 2 10.53 6 9.68 0 0.00
I2a P37.2 37 16.09 7 6.54 0 0.00 0 0.00 4 21.05 1 1.61 0 0.00
I2b M223 8 3.48 0 0.00 0 0.00 1 2.56 1 5.26 0 0.00 0 0.00
J M304 8 3.48 0 0.00 0 0.00 0 0.00 0 0.00 0 0.00 4 1.33
J2 M172 10 4.35 3 2.80 2 6.90 1 2.56 0 0.00 10 16.13 11 3.65
J2b M12 0 0.00 0 0.00 0 0.00 0 0.00 0 0.00 0 0.00 18 5.98
J2a2 M67 4 1.74 13 12.15 3 10.34 9 23.08 3 15.79 11 17.74 0 0.00
J2a3 M68 0 0.00 0 0.00 0 0.00 0 0.00 0 0.00 0 0.00 3 1.00
J2a5 M158 0 0.00 0 0.00 0 0.00 0 0.00 0 0.00 0 0.00 1 0.33
K M9 0 0.00 0 0.00 0 0.00 0 0.00 0 0.00 0 0.00 2 0.66
L M11 0 0.00 0 0.00 0 0.00 0 0.00 0 0.00 0 0.00 54 17.94
N1c Tat 2 0.87 1 0.93 0 0.00 0 0.00 0 0.00 0 0.00 0 0.00
O M175 0 0.00 0 0.00 0 0.00 0 0.00 0 0.00 0 0.00 5 1.66
O1a M119 0 0.00 0 0.00 0 0.00 0 0.00 0 0.00 0 0.00 2 0.66
O3 M122 0 0.00 0 0.00 0 0.00 0 0.00 1 5.26 0 0.00 3 1.00
P M45 0 0.00 2 1.87 0 0.00 0 0.00 0 0.00 0 0.00 2 0.66
R1 M173 3 1.30 0 0.00 0 0.00 0 0.00 0 0.00 0 0.00 0 0.00
R1a SRY10831 3 1.30 0 0.00 0 0.00 0 0.00 0 0.00 0 0.00 0 0.00
R1a1(xR1a1a) M198 57 24.78 15 14.02 0 0.00 8 20.51 1 5.26 5 8.06 56 18.60
R1b1 P25 43 18.70 8 7.48 1 3.45 3 7.69 2 10.53 2 3.23 0 0.00
R2 M124 1 0.43 0 0.00 0 0.00 1 2.56 0 0.00 0 0.00 26 8.64
Haplogroup diversity 0.86348 0.84606 0.62813 0.85223 0.92398 0.83406 0.86848
No. of STR haplotypes 194 58 11 30 17 27 272
Haplotype diversity 0.99686 0.97337 0.78876 0.91498 0.98830 0.93601 0.99898
Hungarian: Hu; Hungarian Roma: HuRo; Tiszavasvari Roma: TiRo; Tokaj Roma: ToRo; Taktaköz Roma: TaRo; Slovakian Roma: SloRo; Malaysian Indian: MaIn.
the Roma chromosomes represented a very limited subset of closely related haplotypes in a cluster including a core haplotype and its two one-step neighbors (Fig. 1D shows only the branch of the Romani groups).
Haplogroup E1b1b1a-M78. The E1b1b1a-M78 network included 37 haplotypes from the Roma groups investigated and one from the Iberian Roma (Fig. 1E). There were two haplotype clusters separated by one mutational step including only the investigated Roma individuals (eight males in each). Other haplotypes were dispersed across the network.
Haplogroups I1-M253 and I2a-P37.2. Twenty-two
haplotypes from the five Roma groups and twelve from
the Iberian Roma were included in the network of
haplogroup I1-M253 (Fig. 1F). The modal haplotype was
shared by all six Roma groups, while most Iberian
haplotypes lay within two steps of the modal haplotype
(Fig. 1F).
The I2a-P37.2 network was constructed from 12 haplo-
types from three Roma groups investigated here and one
from the Iberian Roma (Fig. 1G). The network topology
was different and more divergent than the previous
cases. Only one haplotype was shared by two individuals
(TaRo and HuRo) and all other haplotypes were dis-
persed across the network.
Haplogroup R1a1-M198. The R1a1-M198 network was constructed from 29 haplotypes from four Roma population groups; this haplogroup was absent from the TiRo group (Fig. 1H). The most frequent haplotype was observed in seven males from three Romani groups. Haplogroup R1a1 was the most common type of R1 chromosomes in the Hungarian reference and Malaysian Indian populations (Table 1).
Haplogroup R1b1-P25. The R1b1-P25 network included 16 haplotypes from the present study and 35 haplotypes from the Iberian Roma (Fig. 1I). The network topology was again different from the previous networks. There were two small haplotype clusters related to chromosomes of Hungarian and Iberian Roma, but the remaining haplotypes were divergent and confined to different Roma population groups. Besides the two small clusters, no shared haplotypes were found in the Roma population groups. This lineage is almost absent in India (Sengupta et al., 2006).
Other lineages. Haplogroups C3-M217, G*-M201, I*-M170, I2b-M223, N1c-Tat, R2-M124, and P*-M45 were observed at very low frequencies in Roma groups. Haplogroups F*-M89, L-M11 and R2-M124 were found at relatively high frequencies in the Indians (Table 1). In contrast, the F*-M89 and L-M11 haplogroups were not detected in the Hungarian and Roma populations.
Genetic structure
Genetic distances between the populations investigated here (Table 2) are displayed as an MDS plot (see Fig. 2). Three Roma groups cluster together (ToRo, HuRo, and SloRo); the Hungarian reference and Malaysian Indian populations are separate (see Fig. 2), and the Tiszavasvari Roma appear as an outlier, probably because of their low haplogroup diversity.
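As a rough numerical companion to the MDS plot, the Fst values from the upper triangle of Table 2 can be assembled into a symmetric matrix and each population ranked by its mean distance to the others. This is only a sketch: it does not reproduce the ViSta MDS procedure, and the helper names are introduced here for illustration. On these values the Tiszavasvari Roma have the largest mean Fst, in line with their outlier position.

```python
# Populations in the order used in Table 2
POPS = ["Hu", "HuRo", "TiRo", "ToRo", "TaRo", "SloRo", "MaIn"]

# Upper-triangle Fst values copied from Table 2
UPPER_FST = {
    ("Hu", "HuRo"): 0.06099, ("Hu", "TiRo"): 0.18517, ("Hu", "ToRo"): 0.05833,
    ("Hu", "TaRo"): 0.01891, ("Hu", "SloRo"): 0.09334, ("Hu", "MaIn"): 0.08035,
    ("HuRo", "TiRo"): 0.03899, ("HuRo", "ToRo"): 0.00979, ("HuRo", "TaRo"): 0.01815,
    ("HuRo", "SloRo"): 0.00701, ("HuRo", "MaIn"): 0.05982,
    ("TiRo", "ToRo"): 0.10377, ("TiRo", "TaRo"): 0.13358, ("TiRo", "SloRo"): 0.04039,
    ("TiRo", "MaIn"): 0.12894,
    ("ToRo", "TaRo"): 0.01830, ("ToRo", "SloRo"): 0.01305, ("ToRo", "MaIn"): 0.06828,
    ("TaRo", "SloRo"): 0.03479, ("TaRo", "MaIn"): 0.07861,
    ("SloRo", "MaIn"): 0.07761,
}


def symmetric_matrix(labels, upper):
    """Build a full symmetric distance matrix (dict of dicts) with a zero diagonal."""
    matrix = {a: {b: 0.0 for b in labels} for a in labels}
    for (a, b), value in upper.items():
        matrix[a][b] = value
        matrix[b][a] = value
    return matrix


def mean_distance(matrix, label):
    """Average Fst between one population and all the others."""
    others = [v for other, v in matrix[label].items() if other != label]
    return sum(others) / len(others)


fst = symmetric_matrix(POPS, UPPER_FST)
ranked = sorted(POPS, key=lambda p: mean_distance(fst, p), reverse=True)
for pop in ranked:
    print(f"{pop:6s} {mean_distance(fst, pop):.5f}")
```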
To place these findings in a wider context, genetic distances between 41 populations, including the groups investigated here, published Roma, and populations from India, were calculated (Supporting Information Table S3), and are again presented as an MDS plot (see Fig. 3). The Upper Castes from India, together with the Hungarian reference and Croatian populations, formed a fairly compact cluster, whereas the Indian tribal groups and Lower Castes were more widely scattered, but distinct from the Upper Caste/European cluster. The Roma populations clustered together among the tribal/Lower Caste Indians but near the Balkan and Anatolian samples. The Malaysian Indian population lay between the Upper Caste cluster and the mainly tribal groups of India.

Fig. 1. Median joining (MJ) networks of Y-STR haplotypes within haplogroups E1b1b1a, H1, I1, I2a, J2*, J2a2, R1a1, and R1b1 of Hungarian, Roma (present study) and Iberian Gypsies (Gusmão et al., 2008) and Malaysian Indian populations. The circle sizes are proportional to the haplotype frequencies. The smallest area is equivalent to one individual.

Fig. 2. Multidimensional scaling (MDS) plot of Fst genetic distances of the investigated populations.
Affinities between populations may result from their common origin or from recent admixture resulting from geographic proximity. In particular, genetic distances between populations can generally be related to geographic distances, according to a model of isolation by distance (Cavalli-Sforza and Bodmer, 1971). Genetic distances were therefore plotted against geographic distances. Rst distances (Supporting Information Table S4) were calculated between the Roma and Indian populations in the present study, including some Roma data from previous studies (Füredi et al., 1999, 2004; Gresham et al., 2001; Peričić et al., 2005b; Roewer et al., 2005; Nagy et al., 2007; Gusmão et al., 2008; Petrejcikova et al., 2009), and the results are shown in Figure 4. It is striking that, despite the general correlation of genetic variation with geographical proximity, the opposite relationship is seen here. The Roma were genetically closer to the geographically distant Malaysian Indians than to some geographically close Roma groups, as highlighted in Figure 4. This observation supports the model of their origins involving long-distance migration.
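For the geographic axis of such a comparison, great-circle distances between sampling locations are a common choice. The sketch below uses the haversine formula. The paper does not state how geographic distances were obtained, so the formula, the function name, and the city coordinates (Budapest and Kuala Lumpur as stand-ins for the Hungarian and Malaysian sampling areas) are assumptions made for illustration.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


# Assumed stand-in coordinates (latitude, longitude); not taken from the paper
budapest = (47.4979, 19.0402)
kuala_lumpur = (3.1390, 101.6869)

distance = haversine_km(*budapest, *kuala_lumpur)
print(f"{distance:.0f} km")
```

Pairing such distances with the corresponding Rst values gives the kind of scatter summarized in Figure 4.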
DISCUSSION
The main objectives of the present study were to create a male phylogeny of the Roma groups and to compare them with populations from their likely ancestral region, India, and also with previously studied European Roma groups. The European Roma have been described as a mosaic of founder populations with a shared Indian origin, and this model has been supported by linguistic and cultural, as well as genetic, evidence (Fraser, 1992; Gresham et al., 2001; Gusmão et al., 2008).
To examine the genetic variation within the Roma groups, we used evolutionarily stable binary markers (SNPs) to define the haplogroup of each Y chromosome, and then examined the STR-defined variation within each haplogroup. The established geographical specificity of Y haplogroups meant that the haplogroups observed in the Roma could be assigned to four main geographical origins, albeit with some uncertainty. These were ancestral Indian (H1a-M82), Middle Eastern/West Asian (J2a2-M67, J2*-M172 and E1b1b1a-M78), indigenous European (I1-M253 and I2a-P37.2), and Central Asian/West Eurasian (R1a1-M198 and R1b1-P25) regions. The presence of these lineages in the Roma gene pool could then be interpreted in the light of their migration route from India via the Balkans to the Carpathian Basin and admixture with other populations. We now discuss the likely contributions from each of these sources.
Ancestral Indian lineage
The earliest settlers of the Indian subcontinent have
been postulated to be the ancestors of the current tribal
groups (Majumder, 1998). These aboriginal inhabitants
carry some characteristic Y lineages that include
haplogroups F, H, C (without M217) and O (Cordaux
et al., 2004). H*-M69 is the most frequent Y-haplogroup
TABLE 2. Fst values (upper triangle) and Fst P values (lower triangle) of the populations investigated
        Hu                  HuRo                TiRo                ToRo                TaRo                SloRo               MaIn
Hu      *                   0.06099             0.18517             0.05833             0.01891             0.09334             0.08035
HuRo    0.00000 ± 0.0000    *                   0.03899             0.00979             0.01815             0.00701             0.05982
TiRo    0.00000 ± 0.0000    0.01300 ± 0.0003    *                   0.10377             0.13358             0.04039             0.12894
ToRo    0.00007 ± 0.0000    0.14190 ± 0.0011    0.00092 ± 0.0001    *                   0.01830             0.01305             0.06828
TaRo    0.10004 ± 0.0010    0.11963 ± 0.0010    0.00108 ± 0.0001    0.15229 ± 0.0012    *                   0.03479             0.07861
SloRo   0.00000 ± 0.0000    0.14281 ± 0.0011    0.02685 ± 0.0005    0.13962 ± 0.0011    0.04889 ± 0.0007    *                   0.07761
MaIn    0.00000 ± 0.0000    0.00000 ± 0.0000    0.00000 ± 0.0000    0.00000 ± 0.0000    0.00031 ± 0.0001    0.00000 ± 0.0000    *
Hungarian: Hu; Hungarian Roma: HuRo; Tiszavasvari Roma: TiRo; Tokaj Roma: ToRo; Taktaköz Roma: TaRo; Slovakian Roma: SloRo; Malaysian Indian: MaIn. The nonsignificant (P = 0.05) Fst values are highlighted bold.
* Separates the Fst values (upper triangle) from the Fst P values (lower triangle).
in tribal groups (30%) and Lower Castes (25%) (Sengupta et al., 2006; Thanseem et al., 2006). It is generally rare outside of South Asia but is strikingly common among the Roma, as the H1a-M82 subgroup. This provides strong evidence in support of an Indian origin. Median-joining networks of H1a-M82 demonstrate the sharing of Y-chromosomal haplotypes between all Roma groups investigated in the present study (Fig. 1A,B). This common lineage accounted for more than 50% of haplogroup H1a-M82 chromosomes, and so descent from a single shared Indian ancestor is suggested. The low presence of the H1a-M82 lineage in the Hungarian reference population (4.8%) can be explained by the unselected sample collection which, on the basis of population frequencies, is expected to contain 6–8% Roma.
Middle Eastern/West Asian lineages
The origin of Haplogroups J and E is an open question. The differentiation of Haplogroup J, observed both with binary and microsatellite markers, points to the Middle East as its likely homeland, but haplogroup E (E-M35) probably originated in Eastern Africa (Semino et al., 2004). According to previous studies the observed decreasing frequency gradients of haplogroups E and J reached southwestern Europe as a result of demic expansions of Neolithic agriculturalists from the Middle East, and the markers represent the male contribution of a demic diffusion of farmers (Semino et al., 2000). Haplogroup J is common in India, but E is very rare.
The J2a2-M67 lineages displayed relatively high frequencies in all Roma population groups, a lower frequency in the host Hungarian population and an absence in Indians. This finding suggests genetic admixture with a population outside India during Roma migrations to the Carpathian Basin. On the basis of the present study, taking into account the sharing of haplotypes with the Iberian Roma, we suggest that the J2a2-M67 chromosomes (Fig. 1C), at least in part, were incorporated into the Roma gene pool before their exodus from the Balkans.
The E1b1b1a-M78 chromosome was observed in all the Roma groups studied and in the Hungarian reference population, but it was absent from Malaysian Indians and almost all Indian populations previously investigated. The majority of the haplotypes found (55%) in Roma form two distinct clusters in network analysis, indicating a close relationship between these groups (Fig. 1E). Previous studies noted that some E1b1b1a-M78 chromosomes might have been incorporated into the Roma gene pool during their migration, probably in the Balkans (Gresham et al., 2001; Peričić et al., 2005b; Gusmão et al., 2008), since its frequency is considerable in these regions (Cruciani et al., 2004), and our findings are consistent with this possibility.

Fig. 3. Multidimensional scaling plot of Fst genetic distances of the investigated and compared populations. 1. Koraga, Tribe, South India (Cordaux et al., 2004). 2. Yerava, Tribe, South India (Cordaux et al., 2004). 3. Koya, Tribe, South India (Kivisild et al., 2003). 4. Lambadi, Upper Caste, South India (Kivisild et al., 2003). 5. Konka, Upper Caste, West India (Kivisild et al., 2003). 6. Punjab, Upper Caste, North India (Kivisild et al., 2003). 7. Naikpod, Tribe, South India (Thanseem et al., 2006). 8. Bagata, Tribe, South India (Ramana et al., 2001). 9. Poroja, Tribe, South India (Ramana et al., 2001). 10. Víz Brahmin, Upper Caste, South India (Ramana et al., 2001). 11. Telega, Lower Caste (Thanseem et al., 2006). 12. Kallar, Lower Caste (Wells et al., 2001). 13. Sourashtran, Upper Caste, North India (Wells et al., 2001). 14. Austro-Asiatic, Tribe, mixed (Sengupta et al., 2006). 15. Dravidian, Tribe, mixed (Sengupta et al., 2006). 16. Tibeto-Burman, Tribe, mixed (Sengupta et al., 2006). 17. Indo-European, Upper Caste, mixed (Sengupta et al., 2006). 18. Anatolia (Cinnioglu et al., 2004). 19. Croatia (Peričić et al., 2005a). 20. Herzegovina (Peričić et al., 2005a). 21. Albania, Kosovo (Peričić et al., 2005a). 22. Macedonia (Peričić et al., 2005a). 23. Macedonian Romani (Peričić et al., 2005a). 24. Bajasi, Medimurje (Klarić et al., 2009). 25. Bajasi, Baranja (Klarić et al., 2009). 26. Greek (Firasat et al., 2007). 27. Hungarian (present study). 28. Hungarian Roma (present study). 29. Tiszavasvari Roma (present study). 30. Tokaji Roma (present study). 31. Taktakozi Romungros (present study). 32. Slovakian Roma (present study). 33. Malaysian Indian (present study). 34. Iberian Gypsy (Gusmão et al., 2008). 35. J&K Kashmiri Pandits, Upper Caste, North India (Sharma et al., 2009). 36. Uttar Pradesh (South) Kols, Tribe, North India (Sharma et al., 2009). 37. Bihar Brahmins, Upper Caste, East India (Sharma et al., 2009). 38. West Bengal Brahmins, Upper Caste, North East India (Sharma et al., 2009). 39. Maharashtra Brahmins, Upper Caste, West India (Sharma et al., 2009). 40. Korean (Wells et al., 2001). 41. British (Wells et al., 2001).
Indigenous European lineages
Haplogroup I (M170) is widespread over Europe but virtually absent elsewhere, including the Near East, suggesting that it arose in Europe (Rootsi et al., 2004). I1-M253 chromosomes were detected at relatively high frequencies in all six Roma groups compared, and 61.8% of these males share a modal haplotype (Fig. 1F), suggesting a common origin of the Roma as well. According to Gusmão et al. (2008), the I1-M253 marker occurred at different proportions in Roma groups compared to their host populations, but this can be explained by genetic drift. The dispersed network pattern observed for I2a-P37.2 chromosomes in the present study (Fig. 1G) suggests that they might have been incorporated into the Roma gene pool by admixture at different times in different geographical regions.
Central Asian/West Eurasian lineage
Haplogroup R1a is particularly common in a large region extending from South Asia to Central Europe and Scandinavia (Underhill et al., 2010). R1a1-M198 was present in all the population groups, except the Tiszavasvari Roma. This lineage was absent from the Bulgarian and Iberian Roma as well (Gresham et al., 2001; Gusmão et al., 2008). Thus this lineage might have been introduced into their gene pool by host population admixture in the Carpathian Basin later, after fragmentation of the Roma groups from the Balkan regions. The network analysis of the Roma groups supports this possibility (Fig. 1H). Alternatively, some Roma groups might have lost this haplogroup by genetic drift.
Haplogroup R1b is most common in Western Europe, and it is present at lower frequencies throughout Eastern Europe, Western Asia, and Central Asia (Cruciani et al., 2010). The R1b1-P25 haplogroup was present in all Roma and the Hungarian reference populations, but it was absent from the Indians. The network analysis showed a diverse pattern (Fig. 1I), which suggests that most of the R1b1-P25 chromosomes might have been incorporated into the Roma gene pool by different admixture events at different times.
Other lineages
Haplogroups C3-M217, N1c-Tat, R2-M124, and P*-M45 observed in the Roma groups are uncommon in European populations and this supports the idea that they were acquired by the Roma before their entry into Europe (Gusmão et al., 2008).
Haplogroups F and L are common in South Asian populations. F and its paraphyletic subgroup F*-M89 might share a common demographic history in India with H, R2, and L (Sengupta et al., 2006). The presence of the F*-M89, L-M11, and R2-M124 haplogroups in the Malaysian Indians is thus expected, and the presence of R2 at low frequency in one Roma population provides some additional support for an Indian origin.
Genetic structure
To evaluate genetic relationships between the Roma
groups, two different methods were used to establish
genetic distances: calculation of Fst values for
haplogroups and Rst values for haplotypes. The general
structure of the distance matrix was depicted by an
MDS plot, where Roma and non-Roma groups investigated
are clearly separated (see Fig. 2). This segregation
was statistically significant between some pairs of
population groups (Table 2).
Fig. 4. Phylogeographical analysis based on Rst distances of the Roma and Malaysian Indian populations (Füredi et al., 1999;
Gresham et al., 2001; Füredi et al., 2004; Pericić et al., 2005b; Nagy et al., 2007; Gusmão et al., 2008; Petrejcikova et al., 2009).
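As an illustration of a haplogroup-based distance, the following minimal sketch computes Nei's G_ST, an Fst analogue built on gene diversity (Nei, 1973), for two populations. It is a simplified stand-in and not necessarily the estimator applied by the authors' software. The function names and the frequencies are hypothetical and are not data from this study.

```python
def gene_diversity(freqs):
    """Nei's gene diversity: 1 minus the sum of squared frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def pairwise_gst(pop_a, pop_b):
    """G_ST between two populations, weighting both equally.

    pop_a and pop_b map haplogroup names to frequencies that sum to 1.
    """
    haplogroups = sorted(set(pop_a) | set(pop_b))
    freqs_a = [pop_a.get(h, 0.0) for h in haplogroups]
    freqs_b = [pop_b.get(h, 0.0) for h in haplogroups]
    pooled = [(a + b) / 2 for a, b in zip(freqs_a, freqs_b)]

    h_within = (gene_diversity(freqs_a) + gene_diversity(freqs_b)) / 2
    h_total = gene_diversity(pooled)
    if h_total == 0:
        return 0.0  # both populations fixed for the same haplogroup
    return (h_total - h_within) / h_total


# Hypothetical frequencies, for illustration only (not data from this study).
group_1 = {"H1a-M82": 0.5, "E1b1b1a-M78": 0.3, "J2a2-M67": 0.2}
group_2 = {"R1a1-M198": 0.4, "I2a-P37.2": 0.3, "E1b1b1a-M78": 0.3}

print(round(pairwise_gst(group_1, group_2), 3))
```

Under this definition the value is 0 for identical frequency profiles and 1 for populations fixed for different haplogroups. A matrix of such pairwise values is the kind of input an MDS plot summarizes.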
In a wider comparison of 41 populations, all Roma
groups in the MDS analysis were affiliated with Balkan
and Anatolian populations (see Fig. 3). This observation
is consistent with the possibility that these areas
provided an important geographic link between the Middle
East, Asia, and Europe during the genetic journey of
the Roma. The Hungarian reference population shows
closer genetic affinity to the Upper Castes from India
than to the Balkan populations, which may reflect the
relatively high frequency of the R1a1 lineage in Hungarians
as well as in the Indian Upper Castes (Sharma et al.,
2009; Völgyi et al., 2009). This may support the hypothesis
that the R1a1 haplogroup originated from common
ancestors in Central Asia. The surprisingly low pairwise
genetic distances (Rst) between Roma groups and
Indians (see Fig. 4) also support the assumption of a
common Indian origin.
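To make the haplotype-based measure concrete, the sketch below computes a simplified Rst-style statistic for a single Y-STR locus: the share of the total variance in repeat number that lies between two populations. It omits the sample-size corrections and multi-locus averaging that dedicated population genetics software applies, so it should be read as an assumption-laden illustration only. The function name and the repeat counts are hypothetical.

```python
from statistics import pvariance


def simple_rst(sizes_a, sizes_b):
    """Between-population share of the variance in allele size (one locus).

    sizes_a and sizes_b are lists of STR repeat counts, one per sampled male.
    """
    pooled = list(sizes_a) + list(sizes_b)
    total_var = pvariance(pooled)
    if total_var == 0:
        return 0.0  # every chromosome carries the same allele
    n_a, n_b = len(sizes_a), len(sizes_b)
    # Within-population variance, weighted by sample size.
    within_var = (n_a * pvariance(sizes_a) + n_b * pvariance(sizes_b)) / (n_a + n_b)
    return (total_var - within_var) / total_var


# Hypothetical repeat counts at one locus (not data from this study).
population_1 = [14, 14, 15, 15, 15, 16]
population_2 = [15, 15, 15, 16, 16, 17]

print(round(simple_rst(population_1, population_2), 3))
```

Because the statistic is built on differences in repeat number, two populations whose haplotypes differ by only a few mutational steps yield a low value, which is the pattern a close ancestral relationship would be expected to produce.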
What do these results tell us about the evolutionary
factors that have shaped the structure of the Roma
paternal gene pool? The genetic distances observed
among Roma and non-Roma populations can be inter-
preted as reflecting common ancestry, genetic drift, and
gene flow. The latter two processes could have increased
genetic distances among Roma populations, whereas
admixture could also have decreased
genetic distances between Roma and non-Roma populations.
In all, these findings provide a coherent genetic
history of the Roma, broadly consistent with historical
sources.
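The effect of admixture on distance can be sketched numerically. In the toy example below, a recipient population receives a fraction m of its paternal lineages from a donor population, and the G_ST distance to the donor is recomputed. The frequency vectors, the names roma_like and host_like, and the single-pulse admixture model are all illustrative assumptions, not estimates from this study.

```python
def gene_diversity(freqs):
    """Nei's gene diversity: 1 minus the sum of squared frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def gst(freqs_a, freqs_b):
    """G_ST for two aligned frequency vectors, weighting both equally."""
    pooled = [(a + b) / 2 for a, b in zip(freqs_a, freqs_b)]
    h_within = (gene_diversity(freqs_a) + gene_diversity(freqs_b)) / 2
    h_total = gene_diversity(pooled)
    return 0.0 if h_total == 0 else (h_total - h_within) / h_total


def admix(recipient, donor, m):
    """Frequencies after a single admixture pulse of proportion m."""
    return [(1 - m) * r + m * d for r, d in zip(recipient, donor)]


# Hypothetical frequencies over four haplogroups (not data from this study).
roma_like = [0.6, 0.3, 0.1, 0.0]
host_like = [0.0, 0.1, 0.5, 0.4]

for m in (0.0, 0.1, 0.3, 0.5):
    distance = gst(admix(roma_like, host_like, m), host_like)
    print(m, round(distance, 3))
```

In this toy setting the distance to the donor shrinks as m grows and reaches zero at m = 1. Drift is not modeled here; it would act in the opposite direction by letting isolated groups diverge at random.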
ACKNOWLEDGMENTS
The authors thank Dr. Eva Susa (General Director of
the Network of Forensic Science Institutes) for her sup-
port, and Chris Tyler-Smith for English editing. They
also thank all the sample donors and the laboratory
assistants.
LITERATURE CITED
Bandelt HJ, Forster P, Röhl A. 1999. Median-joining networks
for inferring intraspecific phylogenies. Mol Biol Evol 16:37–48.
Bíró AZ, Zalán A, Völgyi A, Pamjav H. 2009. A Y-chromosomal
comparison of the Madjars (Kazakhstan) and the Magyars
(Hungary). Am J Phys Anthropol 139:305–310.
Cavalli-Sforza LL, Bodmer WF. 1971. The genetics of human
populations. San Francisco: Freeman.
Cinnioglu C, King R, Kivisild T, Kalfoglu E, Atasoy S, Cavalleri
GL, Lillie AS, Roseman CC, Lin AA, Prince K, Oefner PJ,
Shen P, Semino O, Cavalli-Sforza LL, Underhill PA. 2004.
Excavating Y-chromosome haplotype strata in Anatolia. Hum
Genet 114:127–148.
Comey CT, Koons BW, Presley KW, Smerick JB, Sobieralski CA,
Stanley DM. 1994. DNA extraction strategies for amplified
fragment length polymorphism analysis. J Forensic Sci
39:1254–1269.
Cordaux R, Aunger R, Bentley G, Nasidze I, Sirajuddin SM,
Stoneking M. 2004. Independent origins of Indian caste and
tribal paternal lineages. Curr Biol 14:231–235.
Cruciani F, La Fratta R, Santolamazza P, Sellitto D, Pascone R,
Moral P, Watson E, Guida V, Colomb EB, Zaharova B, Lavi-
nha J, Vona G, Aman R, Cali F, Akar N, Richards M, Torroni
A, Novelletto A, Scozzari R. 2004. Phylogeographic analysis of
haplogroup E3b (E-M215) Y chromosomes reveals multiple
migratory events within and out of Africa. Am J Hum Genet
74:1014–1022.
Cruciani F, Trombetta B, Sellitto D, Massaia A, Destro-Bisol G,
Watson E, Beraud Colomb E, Dugoujon JM, Moral P, Scozzari
R. 2010. Human Y chromosome haplogroup R-V88: a paternal
genetic record of early mid Holocene trans-Saharan connec-
tions and the spread of Chadic languages. Eur J Hum Genet
18:800–807.
Egyed B, Brandstätter A, Irwin JA, Pádár Z, Parsons TJ,
Parson W. 2007. Mitochondrial control region sequence variations
in the Hungarian population: analysis of population
samples from Hungary and from Transylvania (Romania).
Forensic Sci Int Genet 1:158–162.
Firasat S, Khaliq S, Mohyuddin A, Papaioannou M, Tyler-Smith
C, Underhill PA, Ayub Q. 2007. Y-chromosomal evidence for a
limited Greek contribution to the Pathan population of
Pakistan. Eur J Hum Genet 15:121–126.
Fraser A. 1992. The Gypsies. Oxford: Blackwell Publishers.
Füredi S, Egyed B, Csikai M, Osztrovics A, Woller J, Padar Z.
2004. Y-STR haplotyping in seven Hungarian (speaking) populations.
IV. International Forensic Y-User Workshop, Berlin,
November 18–20, 2004; Available at: http://www.ystr.org/.
Füredi S, Woller J, Pádár Z, Angyal M. 1999. Y-STR haplotyping
in two Hungarian populations. Int J Legal Med 113:38–42.
Gresham D, Morar B, Underhill PA, Passarino G, Lin AA, Wise
C, Angelicheva D, Calafell F, Oefner PJ, Shen P, Tournev I,
de Pablo R, Kucinskas V, Perez-Lezaun A, Marushiakova E,
Popov V, Kalaydjieva L. 2001. Origins and divergence of the
Roma (Gypsies). Am J Hum Genet 69:1314–1331.
Gusmão A, Gusmão L, Gomes V, Alves C, Calafell F, Amorim A,
Prata MJ. 2008. A perspective on the history of the Iberian
Gypsies provided by phylogeographic analysis of Y-chromosome
lineages. Ann Hum Genet 72:215–227.
Irwin J, Egyed B, Saunier J, Szamosi G, O’Callaghan J, Padar
ZS, Parsons TJ. 2007. Hungarian mtDNA population data-
bases from Budapest and the Baranya county Roma. Int J
Legal Med 121:377–383.
Jobling MA, Tyler-Smith C. 2003. The human Y chromosome: an
evolutionary marker comes of age. Nat Rev Genet 4:598–612.
Karafet TM, Mendez FL, Meilerman MB, Underhill PA, Zegura
SL, Hammer MF. 2008. New binary polymorphisms reshape
and increase resolution of the human Y chromosomal hap-
logroup tree. Genome Res 18:830–838.
Kivisild T, Rootsi S, Metspalu M, Mastana S, Kaldma K, Parik
J, Metspalu E, Adojaan M, Tolk HV, Stepanov V, Gölge M,
Usanga E, Papiha SS, Cinnioğlu C, King R, Cavalli-Sforza L,
Underhill PA, Villems R. 2003. The genetic heritage of the
earliest settlers persists both in Indian tribal and caste popu-
lations. Am J Hum Genet 72:313–332.
Klarić IM, Salihović MP, Lauc LB, Zhivotovsky LA, Rootsi S,
Janićijević B. 2009. Dissecting the molecular architecture and
origin of Bayash Romani patrilineages: genetic influences
from South-Asia and the Balkans. Am J Phys Anthropol
138:333–342.
Majumder PP. 1998. People of India: biological diversity and
affinities. Evol Anthropol 6:100–110.
Nagy M, Henke L, Henke J, Chatthopadhyay PK, Völgyi A,
Zalán A, Peterman O, Bernasovská J, Pamjav H. 2007.
Searching for the origin of Romanies: Slovakian Romani, Jats
of Haryana and Jat Sikhs Y-STR data in comparison with different
Romani populations. Forensic Sci Int 169:19–26.
Nei M. 1973. Analysis of gene diversity in subdivided popula-
tions. Proc Natl Acad Sci USA 70:3321–3323.
Pericić M, Klarić IM, Lauc LB, Janićijevic B, Dordević D,
Efremovska L, Rudan P. 2005b. Population genetics of 8 Y
chromosome STR loci in Macedonians and Macedonian
Romani (Gypsy). Forensic Sci Int 154:257–261.
Pericić M, Lauc LB, Klarić IM, Rootsi S, Janićijevic B, Rudan I,
Terzić R, Colak I, Kvesić A, Popović D, Sijacki A, Behluli I,
Dordevic D, Efremovska L, Bajec DD, Stefanović BD, Villems
R, Rudan P. 2005a. High-resolution phylogenetic analysis of
southeastern Europe traces major episodes of paternal
gene flow among Slavic populations. Mol Biol Evol 22:1964–1975.
Petrejcikova E, Sotak M, Bernasovska J, Bernasovsky I, Sovi-
cova A, Bozikova A, Boronova I, Svickova P, Gabrikova D,
Macekova S. 2009. Y-Haplogroup frequencies in the Slovak
Romany population. Anthropol Sci 117:89–94.
Ramana GV, Su B, Jin L, Singh L, Wang N, Underhill P, Chak-
raborty R. 2001. Y-chromosome SNP haplotypes suggest evi-
dence of gene flow among caste, tribe, and the migrant Siddi
populations of Andhra Pradesh, South India. Eur J Hum
Genet 9:695–700.
Roewer L, Croucher PJ, Willuweit S, Lu TT, Kayser M, Lessig R,
de Knijff P, Jobling MA, Tyler-Smith C, Krawczak M. 2005. Sig-
nature of recent historical events in the European Y-chromo-
somal STR haplotype distribution. Hum Genet 116:279–291.
Rootsi S, Magri C, Kivisild T, Benuzzi G, Help H, Bermisheva
M, Kutuev I, Barać L, Pericić M, Balanovsky O, Pshenichnov
A, Dion D, Grobei M, Zhivotovsky LA, Battaglia V, Achilli A,
Al-Zahery N, Parik J, King R, Cinnioğlu C, Khusnutdinova E,
Rudan P, Balanovska E, Scheffrahn W, Simonescu M, Brehm
A, Goncalves R, Rosa A, Moisan JP, Chaventre A, Ferak V,
Füredi S, Oefner PJ, Shen P, Beckman L, Mikerezi I, Terzić
R, Primorac D, Cambon-Thomsen A, Krumina A, Torroni A,
Underhill PA, Santachiara-Benerecetti AS, Villems R, Semino
O. 2004. Phylogeography of Y-chromosome haplogroup I
reveals distinct domains of prehistoric gene flow in Europe.
Am J Hum Genet 75:128–137.
Schneider S, Roessli D, Excoffier L. 2000. Arlequin: a
software for population genetics data analysis. Ver 2.000,
Genetics and Biometry Lab, Dept. of Anthropology, University
of Geneva.
Semino O, Magri C, Benuzzi G, Lin AA, Al-Zahery N, Battaglia
V, Maccioni L, Triantaphyllidis C, Shen P, Oefner PJ, Zhivotov-
sky LA, King R, Torroni A, Cavalli-Sforza LL, Underhill PA,
Santachiara-Benerecetti AS. 2004. Origin, diffusion, and dif-
ferentiation of Y-chromosome haplogroups E and J: inferences
on the neolithization of Europe and later migratory events in
the Mediterranean area. Am J Hum Genet 74:1023–1034.
Semino O, Passarino G, Oefner PJ, Lin AA, Arbuzova S,
Beckman LE, De Benedictis G, Francalacci P, Kouvatsi A,
Limborska S, Marcikiae M, Mika A, Mika B, Primorac D,
Santachiara-Benerecetti AS, Cavalli-Sforza LL, Underhill PA.
2000. The genetic legacy of Paleolithic Homo sapiens sapiens
in extant Europeans: a Y chromosome perspective. Science
290:1155–1159.
Sengupta S, Zhivotovsky LA, King R, Mehdi SQ, Edmonds
CA, Chow CE, Lin AA, Mitra M, Sil SK, Ramesh A, Usha
Rani MV, Thakur CM, Cavalli-Sforza LL, Majumder PP,
Underhill PA. 2006. Polarity and temporality of high-resolu-
tion Y-chromosome distributions in India identify both indig-
enous and exogenous expansions and reveal minor genetic
influence of central Asian pastoralists. Am J Hum Genet
78:202–221.
Sharma S, Rai E, Sharma P, Jena M, Singh S, Darvishi K, Bhat
AK, Bhanwer AJ, Tiwari PK, Bamezai RN. 2009. The Indian
origin of paternal haplogroup R1a1* substantiates the autoch-
thonous origin of Brahmins and the caste system. J Hum
Genet 54:47–55.
Szuhay P. 2005. The self-definitions of Roma ethnic groups and
their perceptions of other Roma groups. In: Kemény I, editor.
Roma of Hungary. New York: Columbia University Press. p
237–247.
Thanseem I, Thangaraj K, Chaubey G, Singh VK, Bhaskar LV,
Reddy BM, Reddy AG, Singh L. 2006. Genetic affinities among
the lower castes and tribal groups of India: inference from Y
chromosome and mitochondrial DNA. BMC Genet 7:42–52.
Underhill PA, Myres NM, Rootsi S, Metspalu M, Zhivotovsky
LA, King RJ, Lin AA, Chow CE, Semino O, Battaglia V,
Kutuev I, Järve M, Chaubey G, Ayub Q, Mohyuddin A, Mehdi
SQ, Sengupta S, Rogaev EI, Khusnutdinova EK, Pshenichnov
A, Balanovsky O, Balanovska E, Jeran N, Augustin DH, Bal-
dovic M, Herrera RJ, Thangaraj K, Singh V, Singh L,
Majumder P, Rudan P, Primorac D, Villems R, Kivisild T.
2010. Separating the post-Glacial coancestry of European and
Asian Y chromosomes within haplogroup R1a. Eur J Hum
Genet 18:479–484.
Vajda I. 1997. Roma szociológiai tanulmányok: Periférián.
Budapest: Ariadne Alapítvány.
Völgyi A, Zalán A, Szvetnik E, Pamjav H. 2009. Hungarian population
data for 11 Y-STR and 49 Y-SNP markers. Forensic
Sci Int Genet 3:27–28.
Wells RS, Yuldasheva N, Ruzibakiev R, Underhill PA, Evseeva
I, Blue-Smith J, Jin L, Su B, Pitchappan R, Shanmuga-
lakshmi S, Balakrishnan K, Read M, Pearson NM, Zerjal T,
Webster MT, Zholoshvili I, Jamarjashvili E, Gambarov S, Nik-
bin B, Dostiev A, Aknazarov O, Zalloua P, Tsoy I, Kitaev M,
Mirrakhimov M, Chariev A, Bodmer WF. 2001. The Eurasian
heartland: a continental perspective on Y-chromosome diver-
sity. Proc Natl Acad Sci USA 98:10244–10249.
Zhivotovsky LA, Underhill PA, Cinnioğlu C, Kayser M, Morar
B, Kivisild T, Scozzari R, Cruciani F, Destro-Bisol G, Spedini
G, Chambers GK, Herrera RJ, Yong KK, Gresham D, Tournev
I, Feldman MW, Kalaydjieva L. 2004. The effective mutation
rate at Y chromosome short tandem repeats, with application
to human population-divergence time. Am J Hum Genet
74:50–61.
... Wie in Kapitel 8 erläutert, weisen die H1a-M69-Mutation und die historischen Aufzeichnungen auf Indien als Urheimatland der Roma hin. Überraschenderweise berichten mehrere Studien über eine unerwartete Häufigkeit von I1-M253 bei Roma Populationen in Europa (Gusmão et al. 2008;Petrejčíková et al. 2009;Pamjav et al. 2011). Dies ist überraschend, da Mutationen der Haplogruppen I-M170 in Südasien praktisch nicht vorkommen. ...
... Die H1a-M69-Mutation kommt bei vielen Roma-Gruppen in Europa in signifikanter Häufigkeit vor, z. B. zu 17 Prozent bei den iberischen Roma(Gusmão et al. 2008) und zu 32 Prozent bei den ungarischen Roma(Pamjav et al. 2011). Darüber hinaus wurden in einer 2012 veröffentlichten Studie vonRai et al. (2012) Daten der Haplogruppe H1a-M69 analysiert, die aus 10.000 globalen Proben stammten. ...
Book
Full-text available
Die englische Originalausgabe dieser Monografie erschien 2021 unter den Titel The Prehistory of Language: A Triangulated Y-Chromosome-Based Perspective. Ich bin Linguist und habe diese Übersetzung für meine Kollegen aus dem Sprachbereich angefertigt. Dennoch hoffe ich, dass andere akademische Forscher sich für diese Arbeit interessieren werden, insbesondere Genetiker, Archäologen, Anthropologen und Geowissenschaftler. Diejenigen, die ein allgemeines Interesse an Sprache und Genetik haben, sind ebenfalls herzlich eingeladen, meine Monografie zu lesen. In den letzten vierzig Jahren haben Forscher dank der Sequenzierungstechnologie die molekulargenetische Variation genutzt, um die menschliche Evolutionsgeschichte zu erforschen. Einige haben versucht, diese neue Forschungsrichtung noch weiter auszudehnen mit der Idee, dass genetische Werkzeuge die Vorgeschichte der Sprache erklären können. Da wir unsere Gene und unsere Muttersprache von unseren Eltern geerbt haben, sollten genetische und sprachliche Variationen gut miteinander korrelieren. Die Entschlüsselung der sprachlichen Vorgeschichte anhand genetischer Daten erfordert jedoch die Klärung mehrerer Fragen. Sollen wir die heutige DNA oder die alte DNA oder beides verwenden? Sollen wir mitochondriale, Y-Chromosomen- oder autosomale Marker verwenden? Sollten wir Modelle der Sprachvorgeschichte mit statistischen Methoden erstellen? Oder sollten wir Modelle mit einer Synthese aus archäologischen und paläoklimatologischen Daten erstellen? Ich schlage vor, dass wir eine triangulierte Y-Chromosom-basierte Modellierung als methodische Lösung für die Entschlüsselung der Vorgeschichte der Sprache mit genetischen Werkzeugen verwenden. In meiner Forschung wurden mindestens 110 sprachlich informative Y-Chromosom-Mutationen identifiziert. Die Evolutionsgeschichte dieser Mutationen deutet darauf hin, dass die Geschichte der Sprache vor etwa 100 000 Jahren begann, als der Homo sapiens aus Afrika auswanderte. 
Nachfolgende Migrationen sowie kulturelle und evolutionäre Anpassungen erklären dann die Ausbreitung der Sprache in alle Teile der Welt. Zu dieser Ausbreitung gehören der Mungo-See-Mensch in Australien, die Mammutsteppen Eurasiens, die feuchte Phase der Sahara-Wüste, die bidirektionale Migration von Rentierzüchtern entlang des Polarkreises, der Ackerbau entlang der Flüsse des Amazonas-Regenwaldes, die Einführung des Reisanbaus in Südasien, Malaria in den Tropen und Hypoxie auf dem tibetischen Plateau.
... Finally, the Y-chromosome data identify India as the putative homeland of the Romani. The H1a-M69 mutation attains a significant frequency among many of the Roma groups in Europe, such as 17% among the Iberian Roma (Gusmão et al. 2008), and 32% among the Hungarian Roma (Pamjav et al. 2011). Moreover, a 2012 study published by Rai et al. 2012 analyzed haplogroup H1a-M69 data that was taken from 10 thousand global samples. ...
... As explained in Chapter 8, the H1a-M69 mutation and the historical record point to India as the putative homeland of the Romani people. Surprisingly, several studies report an unexpected frequency of I1-M253 found among Romani populations in Europe (Gusmão et al. 2008;Petrejčíková et al. 2009;Pamjav et al. 2011. This is surprising because haplogroups I-M170 mutations are virtually absent in South Asia. ...
Book
Full-text available
To deciphering the prehistory of language, I take advantage of a new research direction that arose roughly 40 years ago in the field of genetics. Researchers utilize molecular genetic variation to explore human evolutionary history. Some have attempted to extend this new research direction even further with the idea that genetic tools can explain the prehistory of language. Genetic and linguistic variation should have a good correlation as we inherit our genes and the mother tongue from our parents. Nevertheless, deciphering language prehistory with genetic data required resolution of several questions: Contemporary DNA, or ancient DNA, or both? Mitochondrial, Y-chromosome, or autosomal markers? Should we build models of language prehistory with statistical methods? Or should we build models with a synthesis of archaeological and paleo-climatological data? With this monograph, I suggest that we employ triangulated Y-chromosome-based modeling. My research had identified at least 110 linguistically informative Y-chromosome mutations. The evolutionary history of these mutations suggests that the story of language begins over 100 thousand years ago when Homo sapiens migrated out of Africa. Subsequent migrations as well as cultural and evolutionary adaptations then explain the expansion of language to the four corners of the globe. A discussion of this expansion includes Lake Mungo man in Australia, the mammoth steppes of Eurasia, the humid phase of the Sahara Desert, the bidirectional migration of reindeer herders along the Arctic Circle, raised field agriculture along the rivers of the Amazon rain forest, the arrival of rice agriculture in South Asia, malaria in the tropics, and hypoxia on the Tibetan Plateau.
... Castella et al. 2011;Varszegi et al. 2014), "indigenous" (e.g. Pamjav et al. 2011), or "non-Roma." "Non-Roma" and "Caucasian" are sometimes used interchangeably (e.g. ...
Article
Full-text available
Argument Moreau (2019) has raised concerns about the use of DNA data obtained from vulnerable populations, such as the Uighurs in China. We discuss another case, situated in Europe and with a research history dating back 100 years: genetic investigations of Roma. In our article, we focus on problems surrounding representativity in these studies. We claim that many of the circa 440 publications in our sample neglect the methodological and conceptual challenges of representativity. Moreover, authors do not account for problematic misrepresentations of Roma resulting from the conceptual frameworks and sampling schemes they use. We question the representation of Roma as a “genetic isolate” and the underlying rationales, with a strong focus on sampling strategies. We discuss our results against the optimistic prognosis that the “new genetics” could help to overcome essentialist understandings of groups.
... Both of these mutations differed from those found in Japan and Korea. The authors speculated that the high frequency of URAT1 mutant alleles in these populations might be explained by the migration of a fairly small group of around 1000 individuals from India approximately 1000 years ago [31][32][33]. Thus, founder effects likely contributed to the high frequency of URAT1 mutant alleles among the Roma. ...
Article
Full-text available
A genetic defect in urate transporter 1 (URAT1) is the major cause of renal hypouricemia (RHUC). Although RHUC is detected using a serum uric acid (UA) concentration <2.0 mg/dL, the relationship between the genetic state of URAT1 and serum UA concentration is not clear. Homozygosity and compound heterozygosity with respect to mutant URAT1 alleles are associated with a serum UA concentration of <1.0 mg/dL and are present at a prevalence of ~0.1% in Japan. In heterozygous individuals, the prevalence of a serum UA of 1.1–2.0 mg/dL is much higher in women than in men. The frequency of mutant URAT1 alleles is as high as 3% in the general Japanese population. The expansion of a specific mutant URAT1 allele derived from a single mutant gene that occurred in ancient times is reflected in modern Japan at a high frequency. Similar findings were reported in Roma populations in Europe. These phenomena are thought to reflect the ancient migration history of each ethnic group (founder effects). Exercise-induced acute kidney injury (EI-AKI) is mostly observed in individuals with homozygous/compound heterozygous URAT1 mutation, and laboratory experiments suggested that a high UA load on the renal tubules is a plausible mechanism for EI-AKI.
... Roma people are mainly referred to in the literature as Asian, regarding their ancestry. Recent studies argue that the ancestors of Roma people presently living in Europe can be traced back to their ancestral geographical origins in Northwestern India (Pamjav et al. 2011;Martínez-Cruz et al. 2016). In opposition to this position, other geneticists argue that this type of differentiation is not tenable or useful. ...
Article
Full-text available
Racial/ethnic categorization in medicine presents challenges for clinicians and patients alike. Challenges arise because racial/ethnic identities do not match with objective biological traits, and at the same time, these identities do have medical consequences in a racially and ethnically stratified society. Three major epistemological approaches – biological realism, eliminativism, and constructivism – dominate scientific theorization on the consequences of racial/ethnic categorization in medicine. In this paper, I present a case study of Hungarian medical genetic discourse that focuses on the possible applications of race/ethnicity regarding Roma and non-Roma patients. In applying the methods of constructivist grounded theory, I recorded and analysed 34 expert interviews with human geneticists between 2011 and 2015. In this paper, I argue that the constructivist understanding of medical diagnoses must be complemented with materialist sensitivity, thus making sense of the contingent nature of race/ethnicity as factors that contribute to medical understanding.
... The reason for the two analyses was because of the increased number of markers in the analyzed population and the reduced number of markers in the populations that were available for comparison. The first group of populations selected for comparison with the population of Croatia using the Yfiler™ Plus marker set included: Croatia (n = 507, present study), Slovenia (n = 194, (22)), Belgium (n = 160, (23)(24)(25)(26)), Hungary (n = 218, (27)(28)(29)(30)(31)(32)(33)), Austria (n =392, (34)(35)(36)(37)(38)), Germany (n = 495, (39)(40)(41)(42)(43)(44)(45)(46)(47)(48)(49)), Italy (n = 689, (50)(51)(52)(53)(54)(55)(56)(57)(58)(59)(60)(61)(62)(63)(64)(65)(66)(67)), North Macedonia (n = 295, (68)(69)(70)(71)), Serbia (n = 183, (72-74)), Denmark (n=177, (75)), Ethiopia (n=290, (76)), French Polynesia (n=81), (77,78)), Ghana (n=584, (79)), India (n=541, (80)(81)(82)(83)(84)(85)(86)(87)(88)(89)(90)(91)(92)(93)(94)(95)(96)(97)(98)), Lithuania (n=251, (99)(100)(101)), Mexico (n=354, (102)(103)(104)(105)(106)(107)), Nigeria (n=337, (108)), Pakistan (n=280, (109)(110)(111)(112)(113)(114)(115)(116)(117)(118)(119)(120)(121)), Poland (n=612, (122)(123)(124)(125)(126)(127)(128)(129)(130)), Russian Federation (n=958, (131)(132)(133)(134)(135)(136)(137)(138)(139)(140)(141)(142)(143)), Saudi Arabia (n=156, (144,145)), Spain (n=316, (146)(147)(148)(149)(150)(151)(152)(153)(154)(155)(156)(157)(158)(159)(160)(161)(162)(163)(164)), Switzerland (n=724, (19,165,166)), and United Kingdom (n=115, (167,168)). ...
Preprint
Full-text available
Aim To analyze additional set of Y-Chromosome genetic markers to acquire a more detailed insight into the diversity of the Croatian population. Methods The total number of 518 Yfiler™ Plus profiles was genotyped. Allele, haplotype frequencies and haplotype diversity, were calculated using the STRAF software package v2.0.4. Genetic distances were quantified by R st using AMOVA online tool from the YHRD. The evolutionary history was inferred using the neighbor-joining method of phylogenetic tree construction in MEGAX software. Whit Athey’s Haplogroup Predictor v5 was used for additional comparison with selected European populations. Results The total of 507 haplotypes were used for genetic STR analysis. The interpopulation comparison with the original 27 Y-STR markers shows the lowest genetic diversity between Croatian and Serbian population, and the highest between Croatian and Spanish population. Interpopulation study on 17 Y-STR markers shows the lowest genetic diversity between Croatian and Bosnian-Herzegovinian population, and the highest between Croatian and Irish population. Total of 518 haplotypes were used in the determination of haplogroup diversity. Haplogroup I with its sublineage I2a expressed the highest prevalence. Haplogroup R, with its major sublineage R1a, is the second most abundant in the studied Croatian population, except for the subpopulation of Hvar, where E1b1b is the second most abundant haplogroup. Rare haplogroups also confirmed in this study are L, T and Q. G1 is detected for the very first time in Croatian population. Conclusion New insight into differences between examined subpopulations of Croatia and their possible (dis)similarities with neighboring abroad populations was notified.
... Genetic studies based on uniparental markers, such as mitochondrial DNA and Ychromosome, established the south Asian origin of the Roma population. The prevalence of Y-chromosomal haplogroup H1a [13,14] and mtDNA haplogroups M18, M35b and M5a1 in Roma populations [15,16] gives us a hint of their south Asian ancestry. Nevertheless, studies based on the Y-chromosome and mtDNA contradict each other. ...
Article
Full-text available
Gypsies are a separate ethnic group living in Pakistan and some other countries as well. They are mostly known as 'Roma' and 'untouchables'. They have different types of lifestyles as compared to other common people, as they always keep migrating from one place to another. They do not have proper houses; they live in tent houses and most probably work on daily wages to earn their living. Gypsies cannot be specified according to the place of residence and can only be classified according to their migration route. Previous historical and linguistic research showed the north Indian origin of Roma people. The present study collected 285 unrelated Roma individuals living in Punjab and typed with the Goldeneye Y20 system. Allelic frequencies ranged between 0.0035 and 0.5266, with haplotype diversity (HD) of 0.9999 and discrimination capacity (DC) of 0.8790. Gene diversity (GD) ranged from 0.6489 (DYS391) to 0.9764 (DYS391) (DY385ab). A total of 223 unique alleles were observed. Interestingly, the haplogroup R accounted for 40.56% and J for 22.06%. In MDS analysis, Pakistani Roma formed a close cluster with Roma from Constanta, Romania. The migration pattern of the Roma population from Pakistan, India and Europe was inferred using co-alescence theory in the Migrate-n program. Overlapping Y-STR data were used to test different migration models. These migration models showed us the dominant gene flow from Pakistan to India and Europe to Pakistan. The results of our study showed that Y STRs provided substantially stronger discriminatory power in the Pakistani Roma population.
... Genetic studies of the European Roma suggest that over time their genetic structure was influenced by drift, bottleneck effects, different levels of endogamy and significant admixture with the non-Roma populations, whilst also preserving a genetic Emir Halilovi c and Adisa Ahmi c should be considered joint first author. signature of their the Indian origin (Gresham et al., 2001;Kalaydjieva et al., 2005;Gusmão et al., 2008;Martinovi c-Klari c et al., 2009;Regueiro et al., 2011;Pamjav et al., 2011;Rai et al., 2012;Mendizabal et al., 2012;Martínez-Cruz et al., 2016;Font-Porterias et al., 2019;Bianco et al., 2020). Apart from linguistic and cultural anthropological data (Achim, 2004;Fraser, 1992;Iovita & Schurr, 2004;Marushiakova & Popov, 2001a, 2001b, numerous genetic studies based on the analysis of Y-chromosome, and mitochondrial DNA (mtDNA) data (G omez-Carballa et al., 2013;Martínez-Cruz et al., 2016;Moorjani et al., 2013;Rai et al., 2012) and genome-wide data (Font-Porterias et al., 2019;Mendizabal et al., 2012) indicate that the ancestral homeland of the European Roma is North-West India. ...
Article
Objectives Studies indicate the complex nature of the genetic structure of the European Roma which has been shaped by different effects of their demographic history, while preserving their ancestral Indian origin. The primary aims of this study were to present for the first time the paternal profiles of the Roma from Bosnia and Herzegovina based on the data from Y-chromosome STR loci, identify the components of non-Roma paternal gene flow into the Roma, and evaluate the genetic relationships with other European Roma populations. Materials and methods In this study, 110 DNA samples of unrelated males from Roma populations residing in different regions of Bosnia and Herzegovina were genotyped using the 23 Y-STR loci included in the PowerPlex Y23 system. Results The analysis of the genetic structure of the Bosnian-Herzegovinian Roma revealed intra-country population substructuring and indicated differing genetic affinities between the Bosnian-Herzegovinian Roma and other European Roma populations. The paternal genetic structure of the Bosnian-Herzegovinian Roma has two components: an ancestral component represented by haplogroup H1a1a-M82, and European component presented by haplogroups I1-M253, I2a1a2b-L621, J2a1a-L26, J2a1a1a2b2a3~Z7671, J2b2a-M241, G2a2b2a1a1b-L497, and E1b1b-M215. Conclusion Genetic relations between the Bosnian-Herzegovinian Roma and other European Roma are shaped by different influences on their demographic history. The data suggest that the paternal gene pool of the Roma from Bosnia and Herzegovina might be a consequence of an early separation of the proto-Roma population and the later gene flow as well as factors of the isolation that accompany the Roma populations in some Bosnian-Herzegovinian regions.
Article
Full-text available
The Roma are a group of populations with a common origin that share the Romani identity and cultural heritage. Their genetic history has been inferred through multiple studies based on uniparental and autosomal markers, and current genomic data have provided novel insights into their genetic background. This review was prompted by two factors: (i) new developments to estimate the genetic structure of the Roma at a fine-scale resolution have precisely identified the ancestral components and traced migrations that were previously documented only in historical sources, clarifying and solving debates on the origins and the diaspora of the Roma; (ii) while there has been an effort to review the health determinants of the Roma, the increasing literature on their population genetics has not been subjected to a dedicated review in the last two decades. We believe that a summary on the state of the art will benefit both the public and scholars that are approaching the subject.
Data
Full-text available
European 'gypsies', commonly referred to as Romanies, are represented by a large number of groups spread across many countries. We performed a population genetic study on 200 unrelated Romany males to reveal the genetic origin of the Slovak Romany population. On the basis of Y-chro-mosome haplotypes, we determined the corresponding Y-haplogroups using Whit Athey's Haplogroup Predictor. The obtained distribution of haplogroups provided strong evidence of Asian origins, espe-cially Indian. The Indian Y-haplogroup H was the most prevalent and represented 40% of all the sam-ples. The distribution of haplogroups was: E1b1b, 21%; J2, 16.5%; I1a, 14%. Haplogroups R1a, R1b, I2a, and N1 were observed in small frequencies. The obtained genetic structure indicated that the en-dogamous Romany population has been shaped by a genetic drift and differential admixture, and cor-relates with the migratory history of the Romanies in Europe.
Human Y-chromosome haplogroup structure is largely circumscribed by continental boundaries. One notable exception to this general pattern is the young haplogroup R1a that exhibits post-Glacial coalescent times and relates the paternal ancestry of more than 10% of men in a wide geographic area extending from South Asia to Central East Europe and South Siberia. Its origin and dispersal patterns are poorly understood as no marker has yet been described that would distinguish European R1a chromosomes from Asian. Here we present frequency and haplotype diversity estimates for more than 2000 R1a chromosomes assessed for several newly discovered SNP markers that introduce the onset of informative R1a subdivisions by geography. Marker M434 has a low frequency and a late origin in West Asia bearing witness to recent gene flow over the Arabian Sea. Conversely, marker M458 has a significant frequency in Europe, exceeding 30% in its core area in Eastern Europe and comprising up to 70% of all M17 chromosomes present there. The diversity and frequency profiles of M458 suggest its origin during the early Holocene and a subsequent expansion likely related to a number of prehistoric cultural developments in the region. Its primary frequency and diversity distribution correlates well with some of the major Central and East European river basins where settled farming was established before its spread further eastward. Importantly, the virtual absence of M458 chromosomes outside Europe speaks against substantial patrilineal gene flow from East Europe to Asia, including to India, at least since the mid-Holocene.
Keywords: Y chromosome; haplogroup R1a; human evolution; population genetics
A method is presented by which the gene diversity (heterozygosity) of a subdivided population can be analyzed into its components, i.e., the gene diversities within and between subpopulations. This method is applicable to any population without regard to the number of alleles per locus, the pattern of evolutionary forces such as mutation, selection, and migration, and the reproductive method of the organism used. Measures of the absolute and relative magnitudes of gene differentiation among subpopulations are also proposed.
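The partition described in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the cited paper's own procedure in full: it assumes the standard single-locus formulation in which total gene diversity H_T splits into the average within-subpopulation diversity H_S and the between-subpopulation component D_ST = H_T - H_S, with the relative differentiation G_ST = D_ST / H_T. The function name, the equal default weights, the choice to return 0.0 when H_T is zero, and the haplogroup labels in the usage lines are all assumptions made for the example, and sample-size corrections are omitted.

```python
from typing import Dict, List, Optional, Tuple


def gene_diversity_components(
    subpops: List[Dict[str, float]],
    weights: Optional[List[float]] = None,
) -> Tuple[float, float, float, float]:
    """Return (H_T, H_S, D_ST, G_ST) for a single locus.

    subpops: one dict per subpopulation mapping allele (or haplogroup)
             label to its frequency; each dict should sum to 1.
    weights: relative subpopulation weights summing to 1
             (assumption: equal weights when omitted).
    """
    if not subpops:
        raise ValueError("at least one subpopulation is required")
    n = len(subpops)
    if weights is None:
        weights = [1.0 / n] * n
    if len(weights) != n or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must match subpops and sum to 1")

    # H_S: weighted mean of within-subpopulation diversity 1 - sum(p^2).
    h_s = sum(
        w * (1.0 - sum(p * p for p in freqs.values()))
        for w, freqs in zip(weights, subpops)
    )

    # Pooled allele frequencies across the whole population.
    pooled: Dict[str, float] = {}
    for w, freqs in zip(weights, subpops):
        for allele, p in freqs.items():
            pooled[allele] = pooled.get(allele, 0.0) + w * p

    # H_T: diversity of the total population from pooled frequencies.
    h_t = 1.0 - sum(p * p for p in pooled.values())

    d_st = h_t - h_s  # between-subpopulation component
    # G_ST is undefined when H_T is 0; returning 0.0 is a convention
    # chosen for this sketch only.
    g_st = d_st / h_t if h_t > 0 else 0.0
    return h_t, h_s, d_st, g_st


# Two hypothetical groups, each fixed for a different haplogroup:
# all diversity lies between the groups.
print(gene_diversity_components([{"H1a": 1.0}, {"I1": 1.0}]))
# prints (0.5, 0.0, 0.5, 1.0)
```

With identical frequencies in every subpopulation, H_S equals H_T and G_ST falls to zero, which is the opposite extreme of the usage example.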
A polymerase chain reaction-based DNA typing method, amplified fragment length polymorphism (AMP-FLP) analysis, has shown promise as a means of analyzing forensic biological evidence. A variety of DNA extraction methods were evaluated for their suitability for AMP-FLP analysis. Factors that were considered in the evaluation included DNA yield, ability of DNA to be amplified, the presence of DNA fragments other than those expected for the alleles in the sample, and differential amplification of different sized alleles for a sample. An initial screen of eight extraction methods was conducted on bloodstains deposited on cotton sheeting. These methods included Chelex® 100, organic extraction followed by Centricon 100® (Amicon, Inc., Beverly, MA) dialysis and concentration, Geneclean(TM) (Bio 101, La Jolla, CA), GlassMax(TM) columns (Gibco BRL, Gaithersburg, MD), GlasPac(TM) (National Scientific Supply Co., Inc., San Rafael, CA), Qiaex (Qiagen Inc., Chatsworth, CA), Elu-Quik(TM) (Schleicher and Schuell, Keene, NH), and DNA Capture Reagent (Gibco BRL, Gaithersburg, MD). Then, four methods, Chelex® 100 extraction, organic extraction followed by ethanol precipitation, organic extraction followed by Centricon 100® (Amicon, Inc., Beverly, MA) dialysis and concentration, and Geneclean were evaluated on blood and semen stains. These stains were deposited on a variety of substrates, including cotton sheeting, denim, wallboard, nylon, wood, and carpet. The effect of addition of bovine serum albumin (BSA) to the amplification reaction was also examined. The method judged most suitable for AMP-FLP analysis was organic extraction followed by Centricon 100® dialysis and concentration, with BSA added to the amplification reaction. Additionally, a modification of an existing differential extraction procedure for separating non-sperm from sperm DNA was developed.
The Indian subcontinent comprises a vast collection of peoples with different morphological, genetic, cultural, and linguistic characteristics. While much of this variability is indigenous, a considerable fraction of it has been introduced through large-scale immigrations into India in historical times. From an evolutionary standpoint, it is of immense interest to quantify biological diversity in contemporary human populations, to study biological affinities and to relate observed patterns of affinities with cultural, linguistic and demographic histories of populations. Such efforts are intended to shed light on the peopling of India. The purpose of this paper is to present a broad overview of the physical (anthropometric) and genetic diversities and affinities of the peoples of India. I shall also attempt to examine how well biological, particularly genetic, diversities and affinities correlate with geographical, socio-cultural, and linguistic diversities and affinities. © 1998 Wiley-Liss, Inc.