Genome-wide patterns of selection in 230 ancient Eurasians

Abstract

Ancient DNA makes it possible to observe natural selection directly by analysing samples from populations before, during and after adaptation events. Here we report a genome-wide scan for selection using ancient DNA, capitalizing on the largest ancient DNA data set yet assembled: 230 West Eurasians who lived between 6500 and 300 bc, including 163 with newly reported data. The new samples include, to our knowledge, the first genome-wide ancient DNA from Anatolian Neolithic farmers, whose genetic material we obtained by extracting from petrous bones, and who we show were members of the population that was the source of Europe's first farmers. We also report a transect of the steppe region in Samara between 5600 and 300 bc, which allows us to identify admixture into the steppe from at least two external sources. We detect selection at loci associated with diet, pigmentation and immunity, and two independent episodes of selection on height.
00 MONTH 2015 | VOL 000 | NATURE | 1
© 2015 Macmillan Publishers Limited. All rights reserved
ARTICLE doi:10.1038/nature16152
Iain Mathieson1, Iosif Lazaridis1,2, Nadin Rohland1,2, Swapan Mallick1,2,3, Nick Patterson2, Songül Alpaslan Roodenberg4,
Eadaoin Harney1,3, Kristin Stewardson1,3, Daniel Fernandes5, Mario Novak5,6, Kendra Sirak5,7, Cristina Gamba5,8,
Eppie R. Jones8, Bastien Llamas9, Stanislav Dryomov10,11, Joseph Pickrell1, Juan Luís Arsuaga12,13,
José María Bermúdez de Castro14, Eudald Carbonell15,16, Fokke Gerritsen17, Aleksandr Khokhlov18, Pavel Kuznetsov18,
Marina Lozano15,16, Harald Meller19, Oleg Mochalov18, Vyacheslav Moiseyev20, Manuel A. Rojo Guerra21, Jacob Roodenberg22,
Josep Maria Vergès15,16, Johannes Krause23,24, Alan Cooper9, Kurt W. Alt19,25,26, Dorcas Brown27, David Anthony27,
Carles Lalueza-Fox28, Wolfgang Haak9,23*, Ron Pinhasi5* & David Reich1,2,3*
The arrival of farming in Europe around 8,500 years ago necessitated
adaptation to new environments, pathogens, diets and social organi-
zations. While indirect evidence of this adaptation can be detected in
patterns of genetic variation in present-day people1, these patterns are
only echoes of past events, which are difficult to date and interpret,
and are often confounded by neutral processes. Ancient DNA provides
a direct way to study these patterns, and should be a transformative
technology for studies of selection, just as it has transformed studies of
human prehistory. Until now, however, the large sample sizes required
to detect selection have meant that studies of ancient DNA have con-
centrated on characterizing effects at parts of the genome already
believed to have been affected by selection2–5.
Genome-wide ancient DNA from West Eurasia
We assembled genome-wide data from 230 ancient individuals from
West Eurasia dated to between 6500 and 300 BC (Fig. 1a, Extended Data
Table 1, Supplementary Data Table 1 and Supplementary Information
section 1). To obtain this data set, we combined published data from
67 samples from relevant periods and cultures4–6, with 163 samples
for which we report new data, of which 83 have, to our knowledge,
never previously been analysed (the remaining 80 samples include
67 whose targeted single nucleotide polymorphism (SNP) coverage we
tripled from 390,000 (‘390k capture’) to 1,240,000 (‘1240k capture’)7;
and 13 with shotgun data for which we generated new data using our
targeted enrichment strategy3,8). The 163 samples for which we report
new data are drawn from 270 distinct individuals who we screened for
evidence of authentic DNA7. We used in-solution hybridization with
synthesized oligonucleotide probes to enrich promising libraries for the
targeted SNPs (Methods). The targeted sites include nearly all SNPs on
the Affymetrix Human Origins and Illumina 610-Quad arrays, 49,711
SNPs on chromosome X, 32,681 SNPs on chromosome Y, and 47,384
SNPs with evidence of functional importance. We merged libraries
from the same individual and filtered out samples with low coverage
or evidence of contamination to obtain the final set of individuals. The
1240k capture gives access to genome-wide data from ancient samples
with small fractions of human DNA and increases efficiency by tar-
geting sites in the human genome that will actually be analysed. The
effectiveness of the approach can be seen by comparing our results
to the largest previously published ancient DNA study, which used a
shotgun sequencing strategy5. Our median coverage on analysed SNPs
1Department of Genetics, Harvard Medical School, Boston, Massachusetts 02115, USA. 2Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA. 3Howard Hughes Medical
Institute, Harvard Medical School, Boston, Massachusetts 02115, USA. 4Independent researcher, Santpoort-Noord, The Netherlands. 5School of Archaeology and Earth Institute, Belfield, University
College Dublin, Dublin 4, Ireland. 6Institute for Anthropological Research, Zagreb 10000, Croatia. 7Department of Anthropology, Emory University, Atlanta, Georgia 30322, USA. 8Smurfit Institute of
Genetics, Trinity College Dublin, Dublin 2, Ireland. 9Australian Centre for Ancient DNA, School of Biological Sciences & Environment Institute, University of Adelaide, Adelaide, South Australia 5005,
Australia. 10Laboratory of Human Molecular Genetics, Institute of Molecular and Cellular Biology, Siberian Branch of the Russian Academy of Sciences, Novosibirsk 630090, Russia. 11Department
of Paleolithic Archaeology, Institute of Archaeology and Ethnography, Siberian Branch of the Russian Academy of Sciences, Novosibirsk 630090, Russia. 12Centro Mixto UCM-ISCIII de Evolución y
Comportamiento Humanos, 28040 Madrid, Spain. 13Departamento de Paleontología, Facultad Ciencias Geológicas, Universidad Complutense de Madrid, 28040 Madrid, Spain. 14Centro Nacional
de Investigacíon sobre Evolución Humana (CENIEH), 09002 Burgos, Spain. 15IPHES. Institut Català de Paleoecologia Humana i Evolució Social, Campus Sescelades-URV, 43007 Tarragona, Spain.
16Area de Prehistoria, Universitat Rovira i Virgili (URV), 43002 Tarragona, Spain. 17Netherlands Institute in Turkey, Istiklal Caddesi, Nur-i Ziya Sokak 5, Beyoğlu 34433, Istanbul, Turkey. 18Volga
State Academy of Social Sciences and Humanities, Samara 443099, Russia. 19State Office for Heritage Management and Archaeology Saxony-Anhalt and State Museum of Prehistory, D-06114
Halle, Germany. 20Peter the Great Museum of Anthropology and Ethnography (Kunstkamera) RAS, St Petersburg 199034, Russia. 21Department of Prehistory and Archaeology, University of
Valladolid, 47002 Valladolid, Spain. 22The Netherlands Institute for the Near East, Leiden RA-2300, the Netherlands. 23Max Planck Institute for the Science of Human History, D-07745 Jena,
Germany. 24Institute for Archaeological Sciences, University of Tübingen, D-72070 Tübingen, Germany. 25Danube Private University, A-3500 Krems, Austria. 26Institute for Prehistory and
Archaeological Science, University of Basel, CH-4003 Basel, Switzerland. 27Anthropology Department, Hartwick College, Oneonta, New York 13820, USA. 28Institute of Evolutionary Biology
(CSIC-Universitat Pompeu Fabra), 08003 Barcelona, Spain. Present addresses: Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade 5–7,
1350 Copenhagen, Denmark (C.G.); New York Genome Center, New York, New York 10013, USA (J.P.).
*These authors contributed equally to this work.
is approximately fourfold higher even while the mean number of reads
generated per sample is 36-fold lower (Extended Data Fig. 1).
Insight into population transformations
To learn about the genetic affinities of the archaeological cultures
for which genome-wide data are reported for the first time here, we
studied either 1,055,209 autosomal SNPs when analysing 230 ancient
individuals alone, or 592,169 SNPs when co-analysing them with 2,345
present-day individuals genotyped on the Human Origins array4. We
removed 13 samples either as outliers in ancestry relative to others of
the same archaeologically determined culture, or first-degree relatives
(Supplementary Data Table 1).
Our sample of 26 Anatolian Neolithic individuals represents the first
genome-wide ancient DNA data from the eastern Mediterranean. Our
success at analysing such a large number of samples is due to the fact
that in the case of 21 of the successful samples, we obtained DNA from
the inner ear region of the petrous bone9, which has been shown to
increase the amount of DNA obtained by up to two orders of magnitude
relative to teeth3. Principal component analysis (PCA) and ADMIXTURE10
analyses show that the Anatolian Neolithic samples do not resemble any
present-day near-Eastern populations but are shifted towards Europe,
clustering with early European farmers (EEF) from Germany, Hungary
and Spain7 (Fig. 1b and Extended Data Fig. 2). Further evidence that
the Anatolian Neolithic and EEF were related comes from the high
frequency (47%; n = 15) of Y-chromosome haplogroup G2a typical of
ancient EEF samples7 (Supplementary Data Table 1), and the low fixation
index (FST; 0.005–0.016) between Neolithic Anatolians and EEF
(Supplementary Data Table 2). These results support the hypothesis7
of a common ancestral population of EEF before their dispersal along
distinct inland/central European and coastal/Mediterranean routes.
The EEF are slightly more shifted to Europe in the PCA than are the
Anatolian Neolithic (Fig. 1b) and have significantly more admixture
from Western hunter-gatherers (WHG), as shown by f4-statistics
(|Z| > 6 standard errors from 0) and negative f3-statistics (|Z| > 4)11
(Extended Data Table 2). We estimate that the EEF have 7–11% more
WHG admixture than their Anatolian relatives (Extended Data Fig. 2,
Supplementary Information section 2).
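The f4-statistics and Z scores quoted above can be illustrated with a minimal sketch. It computes f4(A, B; C, D) as the mean over SNPs of (a − b)(c − d) and attaches a standard error from a delete-one block jackknife. This is not the implementation used in this study (ref. 11 describes the actual statistics); the function names, the equal-sized contiguous blocks and the unweighted jackknife are simplifying assumptions, and the frequencies in the usage lines are toy values.

```python
import math


def f4(a, b, c, d):
    """f4(A, B; C, D): mean over SNPs of (a - b) * (c - d).

    Each argument is a list of allele frequencies, one entry per SNP.
    """
    products = [(ai - bi) * (ci - di) for ai, bi, ci, di in zip(a, b, c, d)]
    return sum(products) / len(products)


def f4_block_jackknife(a, b, c, d, n_blocks):
    """Return (estimate, standard error, Z score) for f4(A, B; C, D).

    Assumption: SNPs are ordered along the genome, so contiguous blocks
    approximate independent chunks. Blocks are (near) equal-sized and the
    jackknife is unweighted, which is a simplification.
    """
    products = [(ai - bi) * (ci - di) for ai, bi, ci, di in zip(a, b, c, d)]
    n = len(products)
    total = sum(products)
    estimate = total / n

    # Block boundaries: block k covers products[bounds[k]:bounds[k + 1]].
    bounds = [round(i * n / n_blocks) for i in range(n_blocks + 1)]

    # Re-estimate f4 with each block removed in turn.
    leave_one_out = []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        kept = n - (hi - lo)
        leave_one_out.append((total - sum(products[lo:hi])) / kept)

    mean_loo = sum(leave_one_out) / n_blocks
    variance = (n_blocks - 1) / n_blocks * sum(
        (x - mean_loo) ** 2 for x in leave_one_out
    )
    se = math.sqrt(variance)
    z = estimate / se if se > 0 else float("nan")
    return estimate, se, z


# Toy usage with made-up frequencies for four populations at four SNPs.
pop_a = [0.5, 0.5, 0.5, 0.5]
pop_b = [0.3, 0.3, 0.3, 0.3]
pop_c = [0.6, 0.4, 0.6, 0.4]
pop_d = [0.4, 0.4, 0.4, 0.4]
estimate, se, z = f4_block_jackknife(pop_a, pop_b, pop_c, pop_d, n_blocks=4)
```

A value of |Z| well above 3 is conventionally read as a significant deviation from zero; real analyses use many more SNPs and far larger blocks than this toy example.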
The Iberian Chalcolithic individuals from El Mirador cave are genet-
ically similar to the Middle Neolithic Iberians who preceded them
(Fig. 1b and Extended Data Fig. 2), and have more WHG ancestry
than their Early Neolithic predecessors7 (|Z| > 10) (Extended Data
Table 2). However, they do not have a significantly different proportion
of WHG ancestry (we estimate 23–28%) than the Middle Neolithic
Iberians (Extended Data Fig. 2). Chalcolithic Iberians have no evi-
dence of steppe ancestry (Fig. 1b and Extended Data Fig. 2), in contrast
to central Europeans of the same period5,7. Thus, the steppe-related
ancestry that is ubiquitous across present-day Europe4,7 arrived
in Iberia later than in central Europe (Supplementary Information
section 2).
To understand population transformations in the Eurasian steppe,
we analysed a time transect of 37 samples from the Samara region
spanning ~5600–1500 BC and including the Eastern hunter-gatherer
(EHG), Eneolithic, Yamnaya, Poltavka, Potapovka and Srubnaya
cultures. Admixture between populations of Near Eastern ancestry
and the EHG7 began as early as the Eneolithic (5200–4000 BC), with
some individuals resembling EHG and some resembling Yamnaya
(Fig. 1b and Extended Data Fig. 2). The Yamnaya from Samara and
Kalmykia, the Afanasievo people from the Altai (3300–3000 BC), and
the Poltavka Middle Bronze Age (2900–2200 BC) population that followed
the Yamnaya in Samara are all genetically homogeneous, forming
a tight ‘Bronze Age steppe’ cluster in PCA (Fig. 1b), sharing predom-
inantly R1b Y chromosomes5,7 (Supplementary Data Table 1), and
having 48–58% ancestry from an Armenian-like Near Eastern source
(Extended Data Table 2) without additional Anatolian Neolithic or EEF
ancestry7 (Extended Data Fig. 2). After the Poltavka period, popula-
tion change occurred in Samara: the Late Bronze Age Srubnaya have
~17% Anatolian Neolithic or EEF ancestry (Extended Data Fig. 2).
Previous work documented that such ancestry appeared east of the Urals
beginning at least by the time of the Sintashta culture, and suggested
that it reflected an eastward migration from the Corded Ware peoples
of central Europe5. However, the fact that the Srubnaya also had such
ancestry indicates that the Anatolian Neolithic or EEF ancestry could
have come into the steppe from a more eastern source. Further evidence
that migrations originating as far west as central Europe may not have
had an important impact on the Late Bronze Age steppe comes from the
fact that the Srubnaya possess exclusively (n = 6) R1a Y chromosomes
(Supplementary Data Table 1), and four of them (and one Poltavka
male) belonged to haplogroup R1a-Z93, which is common in central/
south Asians12, very rare in present-day Europeans, and absent in all
ancient central Europeans studied to date.
Twelve signals of selection
To study selection, we created a data set of 1,084,781 autosomal SNPs
in 617 samples by merging 213 ancient samples with genome-wide
sequencing data from four populations of European ancestry from
the 1000 Genomes Project13. Most present-day Europeans can be
modelled as a mixture of three ancient populations related to Western
hunter-gatherers (WHG), early European farmers (EEF) and steppe
pastoralists (Yamnaya)4,7, and so to scan for selection, we divided our
Figure 1 | Population relationships of samples. a, Locations colour-
coded by date, with a random jitter added for visibility (8 Afanasievo
and Andronovo samples lie further east and are not shown). b, Principal
component analysis of 777 modern West Eurasian samples (grey), with
221 ancient samples projected onto the first two principal component
axes and labelled by culture. E/M/LN, Early/Middle/Late Neolithic; LBK,
Linearbandkeramik; E/WHG, Eastern/Western hunter-gatherer; EBA,
Early Bronze Age; IA, Iron Age; LNBA, Late Neolithic and Bronze Age.
[Figure 1 graphic. a, Sample locations with a colour scale for date BC (7,000 to 0). b, PCA plot with sample labels: WHG, Motala HG, EHG, Samara Eneolithic, Yamnaya Kalmykia, Yamnaya Samara, Afanasievo, Poltavka, Poltavka outlier, Potapovka, Russia EBA, Srubnaya, Srubnaya outlier, Sintashta, Andronovo, Andronovo outlier, Anatolia Neolithic, Anatolia Neolithic outlier, Hungary EN, LBK EN, Iberia EN, Central MN, Iberia MN, Iceman, Remedello, Iberia Chalcolithic, Central LNBA, Central LNBA outlier, Bell Beaker LN, Northern LNBA, Hungary BA, Scythian IA. Cluster annotations: Western European hunter-gatherers; Scandinavian hunter-gatherers; Eastern European hunter-gatherers; Eneolithic Samara; Bronze Age (steppe); Srubnaya; Sintashta/Andronovo; Anatolia Neolithic; Early Neolithic; Middle Neolithic; Chalcolithic Iberia.]
samples into three groups based on which of these populations they
clustered with most closely (Fig. 1b and Extended Data Table 1). We
estimated mixture proportions for the present-day European ancestry
populations and tested every SNP to evaluate whether its present-day
frequencies were consistent with this model. We corrected for test
statistic inflation by applying a genomic control correction analogous
to that used to correct for population structure in genome-wide association
studies14. Of approximately one million non-monomorphic
autosomal SNPs, the ~50,000 in the set of potentially functional SNPs
were significantly more inconsistent with the model than neutral
SNPs (Fig. 2), suggesting pervasive selection on polymorphisms of
functional importance. Using a conservative significance threshold
of P = 5.0 × 10⁻⁸, and a genomic control correction of 1.38, we identified
12 loci that contained at least three SNPs achieving genome-wide
significance within 1 Mb of the most associated SNP (Fig. 2,
Extended Data Table 3, Extended Data Fig. 3 and Supplementary Data
Table 3).
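The scan described above can be illustrated in two steps: the expected present-day frequency of an allele under the three-population mixture model, and the genomic control rescaling of the resulting test statistics. The sketch below is illustrative only. It assumes that each SNP yields a test statistic that is chi-square distributed with an even number of degrees of freedom under neutrality; the function names, the value of `df` and the toy numbers are hypothetical, and the likelihood actually used is described in the Methods.

```python
import math
from statistics import median


def expected_frequency(mixture, ancient_freqs):
    """Expected present-day frequency: sum of mixture proportion * ancient frequency."""
    return sum(m * p for m, p in zip(mixture, ancient_freqs))


def chi2_sf(x, df):
    """Chi-square survival function, closed form for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("this sketch supports positive even df only")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k          # (x/2)^k / k!
        total += term
    return math.exp(-half) * total


def chi2_median(df):
    """Median of the chi-square distribution, found by bisection on chi2_sf."""
    lo, hi = 0.0, 10.0 * df
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def genomic_control(stats, df):
    """Return (inflation factor, corrected P values).

    The inflation factor is the observed median statistic divided by the
    median expected under the null; every statistic is divided by it.
    """
    lam = median(stats) / chi2_median(df)
    p_values = [chi2_sf(s / lam, df) for s in stats]
    return lam, p_values


# Toy usage: hypothetical mixture proportions and ancient allele frequencies.
freq = expected_frequency([0.2, 0.5, 0.3], [0.1, 0.4, 0.8])

# With the correction factor of 1.38 reported in the text, a raw statistic
# of 50 (made-up value, df assumed to be 4) would be rescaled before testing.
p_corrected = chi2_sf(50.0 / 1.38, df=4)
passes_threshold = p_corrected < 5.0e-8
```

Dividing by the inflation factor makes the test more conservative whenever the factor exceeds 1, which is the case for the value of 1.38 reported here.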
The strongest signal of selection is at the SNP (rs4988235) responsible
for lactase persistence in Europe15,16. Our data (Fig. 3) strengthen
previous reports that an appreciable frequency of lactase persistence
in Europe only dates to the last 4,000 years3,5,17. The allele’s earliest
appearance in the data set is in a central European Bell Beaker sample
(individual I0112) dated to between 2450 and 2140 BC. Two other
independent signals related to diet are located on chromosome 11
near FADS1 and DHCR7. FADS1 and FADS2 are involved in fatty
acid metabolism, and variation at this locus is associated with plasma
lipid and fatty acid concentration18. The selected allele of the most
significant SNP (rs174546) is associated with decreased triglyceride
levels18. This locus has experienced independent selection in non-European
populations13,19,20 and is likely to be a critical component of
adaptation to different diets. Variants at DHCR7 and NADSYN1 are
associated with circulating vitamin D levels21 and the most associ-
ated SNP in our analysis, rs7940244, is highly differentiated across
closely related northern European populations22,23, suggesting
selection related to variation in dietary or environmental sources of
vitamin D.
Two signals have a potential link to coeliac disease. One occurs at the
ergothioneine transporter SLC22A4 that is hypothesized to have expe-
rienced a selective sweep to protect against ergothioneine deficiency in
agricultural diets24. Common variants at this locus are associated with
increased risk for ulcerative colitis, coeliac disease, and irritable bowel
disease and may have hitchhiked to high frequency as a result of this
sweep24–26. However, the specific variant (rs1050152, L503F) that was
thought to be the target did not reach high frequency until relatively
recently (Extended Data Fig. 4). The signal at ATXN2/SH2B3, also
associated with coeliac disease25, shows a similar pattern (Extended
Data Fig. 4).
The second strongest signal in our analysis is at the derived
allele of rs16891982 in SLC45A2, which contributes to light skin
pigmentation and is almost fixed in present-day Europeans but
occurred at much lower frequency in ancient populations. In con-
trast, the derived allele of SLC24A5 that is the other major deter-
minant of light skin pigmentation in modern Europe (and that is
not significant in the genome-wide scan for selection) appears
fixed in the Anatolian Neolithic, suggesting that its rapid increase
in frequency to around 0.9 in Early Neolithic Europe was mostly
due to migration (Extended Data Fig. 4). Another pigmenta-
tion signal is at GRM5, where SNPs are associated with pigmen-
tation possibly through a regulatory effect on nearby TYR27.
We also find evidence of selection for the derived allele of rs12913832
at HERC2/OCA2, which is at 100% frequency in the European hunter-
gatherers we analysed, and is the primary determinant of light eye
colour in present-day Europeans28,29. In contrast to the other loci, the
range of frequencies in modern populations is within that of ancient
populations (Fig. 3). The frequency increases with higher latitude,
suggesting a complex pattern of environmental selection.
The TLR1–TLR6–TLR10 gene cluster is a known target of selec-
tion in Europe, possibly related to resistance to leprosy, tuberculosis
or other mycobacteria30–32. There is also a strong signal of selection
at the major histocompatibility complex (MHC) on chromosome 6.
The strongest signal is at rs2269424 near the genes PPT2 and EGFL8,
but there are at least six other apparently independent signals in the
MHC (Extended Data Fig. 3); and the entire region is significantly
more associated than the genome-wide average (residual inflation
of 2.07 in the region on chromosome 6 between 29–34 Mb after
genome-wide genomic control correction). This could be the result
of multiple sweeps, balancing selection, or increased drift as a result
of background selection reducing effective population size in this
gene-rich region.
We find a surprising result in six Scandinavian hunter-gatherers
(SHG) from Motala in Sweden. In three of six samples, we observe
the haplotype carrying the derived allele of rs3827760 in the EDAR
Figure 2 | Genome-wide scan for selection. GC-corrected –log10 P value
for each marker (Methods). The red dashed line represents a genome-wide
significance level of 5.0 × 10⁻⁸. Genome-wide significant points filtered
because there were fewer than two other genome-wide significant points
within 1 Mb are shown in grey. Inset, quantile–quantile plots for corrected
–log10 P values for different categories of potentially functional SNPs
(Methods). Truncated at –log10[P value] = 30. All curves are significantly
different from neutral expectation. CMS, composite of multiple signals
selection hits; HiDiff, highly differentiated between HapMap populations;
Immune, immune-related; HLA, human leukocyte antigen type tag SNPs;
eQTL, expression quantitative trait loci (see Methods).
[Figure 2 graphic. Main panel: –log10[P value] (0 to 50) against chromosome (1 to 22), with labelled peaks at LCT, TLR1-6-10, SLC45A2, SLC22A4, MHC, ZKSCAN3, DHCR7, FADS1-2, GRM5, ATXN2, Chr13:38.8 and HERC2. Inset: observed against expected –log10[P value] for the SNP categories Neutral, GWAS, CMS, HiDiff, Immune, HLA and eQTL.]
gene (Extended Data Fig. 5), which affects tooth morphology and
hair thickness33,34, has been the target of a selective sweep in East
Asia35, and today is at high frequency in East Asians and Native
Americans. The EDAR derived allele is largely absent in present-day
Europe, except in Scandinavia, plausibly owing to Siberian move-
ments into the region millennia after the date of the Motala samples.
The SHG have no evidence of East Asian ancestry4,7, suggesting that
the EDAR derived allele may not have originated in the main ancestral
population of East Asians as previously suggested35. A second
surprise is that, unlike closely related WHGs, the Motala samples
have predominantly derived pigmentation alleles at SLC45A2 and
SLC24A5.
Evidence of selection on height
We also tested for selection on complex traits. The best-documented
example of this process in humans is height, for which the differ-
ences between northern and southern Europe have been driven by
selection36. To test for this signal in our data, we used a statistic that
tests whether trait-affecting alleles are both highly correlated and
more differentiated, compared to randomly sampled alleles37. We
predicted genetic heights for each population and applied the test to
all populations together, as well as to pairs of populations (Fig. 4).
Using 180 height-associated SNPs38 (restricted to 169 for which we
successfully obtained genotypes from at least two individuals from
each population), we detect a significant signal of directional selection
on height (P = 0.002). Applying this to pairs of populations allows us
to detect two independent signals. First, the Iberian Neolithic and
Chalcolithic samples show selection for reduced height relative to
both the Anatolian Neolithic (P = 0.042) and the central European
Early and Middle Neolithic (P = 0.003). Second, we detect a signal
for increased height in the steppe populations (P = 0.030 relative
to the central European Early and Middle Neolithic). These results
suggest that the modern South–North gradient in height across
Europe is due to both increased steppe ancestry in northern popula-
tions, and selection for decreased height in Early Neolithic migrants
to southern Europe. We did not observe any other significant signals
of polygenic selection in five other complex traits we tested: body
mass index39 (P = 0.20), waist-to-hip ratio40 (P = 0.51), type 2 diabetes41
(P = 0.37), inflammatory bowel disease26 (P = 0.17) and lipid
levels18 (P = 0.50).
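The predicted genetic heights used in this test can be sketched as a weighted sum of allele frequencies. The example below assumes the common convention of a genetic value equal to twice the sum of effect size times frequency over the trait-associated SNPs; the function names, population labels and numbers are hypothetical, and the significance test itself (ref. 37) is not reproduced here.

```python
def genetic_value(effect_sizes, frequencies):
    """Population-level genetic value: 2 * sum(beta_l * p_l) over trait SNPs.

    effect_sizes: per-SNP GWAS effect of the counted allele.
    frequencies:  per-SNP frequency of that same allele in the population.
    """
    if len(effect_sizes) != len(frequencies):
        raise ValueError("effect sizes and frequencies must align by SNP")
    return 2.0 * sum(beta * p for beta, p in zip(effect_sizes, frequencies))


def pairwise_differences(effect_sizes, freqs_by_population):
    """Map (row, column) -> genetic value of column minus genetic value of row.

    A positive entry means the column population has the larger predicted
    value, matching the sign convention of the pairwise comparison in Fig. 4b.
    """
    values = {
        pop: genetic_value(effect_sizes, freqs)
        for pop, freqs in freqs_by_population.items()
    }
    return {
        (row, col): values[col] - values[row]
        for row in values
        for col in values
        if row != col
    }


# Toy usage with two made-up SNPs and two made-up populations.
effects = [0.1, -0.2]
toy_freqs = {"pop_A": [0.5, 0.25], "pop_B": [1.0, 0.0]}
differences = pairwise_differences(effects, toy_freqs)
```

A raw difference of this kind is only descriptive; whether it exceeds what drift alone would produce is what the test of ref. 37 evaluates.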
Future studies of selection with ancient DNA
Our results, which take advantage of the massive increase in sample
size enabled by optimized techniques for sampling from the inner-
ear regions of the petrous bone, as well as in-solution enrichment
methods for targeted SNPs, show how ancient DNA can be used to
perform a genome-wide scan for selection. Our results also directly
document selection on loci related to pigmentation, diet and immu-
nity, painting a picture of populations adapting to settled agricultural
life at high latitudes. For most of the signals, allele frequencies of
modern Europeans are outside the range of any ancient populations,
indicating that phenotypically, Europeans of 4,000 years ago were
different in important respects from Europeans today, despite having
overall similar ancestry. An important direction for future research
is to increase the sample size for European selection scans (Extended
Data Fig. 6), and to apply this approach to regions beyond Europe
and to other species.
[Figure 4 graphic. a, Genetic height (GWAS effect size, −0.2 to 0.4) against years before present (8,000 to 0) for the populations HG, AN, CEM, INC, CLB, STP, CEU and IBS, grouped as hunter-gatherers, early farmers and steppe ancestry, with arrows marked as increasing or decreasing height. b, Matrix of pairwise Z scores (−4 to 4) for the same populations.]
Figure 4 | Polygenic selection on height. a, Estimated genetic heights.
Boxes show 0.05–0.95 posterior densities for population mean genetic
height (Methods). Dots show the maximum likelihood point estimate.
Arrows show major population relationships, dashed lines represent
ancestral populations. The symbols < and > label potentially independent
selection events resulting in an increase or decrease in height. b, Z scores
for the pairwise polygenic selection test. Positive if the column population
is taller than the row population.
[Figure 3 graphic. Five panels of derived allele frequency (0.0 to 1.0): LCT (rs4988235), FADS1−2 (rs174546), SLC45A2 (rs16891982), HERC2/OCA2 (rs12913832) and TLR1−6−10 (rs4833103). Legend: hunter-gatherer (HG); early farmer (AN, CEM, INC); steppe ancestry (CLB, STP); Northwest Europe (CEU); Great Britain (GBR); Spain (IBS); Tuscan (TSI).]
Figure 3 | Allele frequencies for five genome-wide significant signals
of selection. Dots and solid lines show maximum likelihood frequency
estimates and a 1.9-log-likelihood support interval for the derived allele
frequency in each ancient population. Horizontal dashed lines show
allele frequencies in the four modern 1000 Genomes populations. AN,
Anatolian Neolithic; HG, hunter-gatherer; CEM, central European Early
and Middle Neolithic; INC, Iberian Neolithic and Chalcolithic; CLB,
central European Late Neolithic and Bronze Age; STP, steppe; CEU, Utah
residents with northern and western European ancestry; IBS, Iberian
population in Spain. The hunter-gatherer, early farmer and steppe ancestry
classifications correspond approximately to the three populations used in
the genome-wide scan with some differences (See Extended Data Table 1
for details).
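The frequency estimates and support intervals in Fig. 3 can be illustrated with a simplified calculation. The sketch below assumes that allele counts are observed directly (k derived alleles out of n sampled chromosomes) and uses a binomial likelihood; the study works with low-coverage sequence data, so the likelihood in the Methods differs. The function names and the example counts are hypothetical. The interval contains every frequency whose log-likelihood lies within 1.9 units of the maximum, which roughly corresponds to a 95% confidence interval under the usual chi-square approximation.

```python
import math


def log_likelihood(p, k, n):
    """Binomial log-likelihood (up to a constant) of frequency p given k of n."""
    ll = 0.0
    if k > 0:
        if p <= 0.0:
            return float("-inf")
        ll += k * math.log(p)
    if n - k > 0:
        if p >= 1.0:
            return float("-inf")
        ll += (n - k) * math.log(1.0 - p)
    return ll


def support_interval(k, n, drop=1.9):
    """Return (MLE, lower, upper) for the derived allele frequency.

    The bounds are the frequencies at which the log-likelihood falls
    `drop` units below its maximum, located by bisection.
    """
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n and n > 0")
    mle = k / n
    target = log_likelihood(mle, k, n) - drop

    def bisect(inside, outside):
        # `inside` satisfies the support criterion, `outside` does not.
        for _ in range(200):
            mid = (inside + outside) / 2.0
            if log_likelihood(mid, k, n) >= target:
                inside = mid
            else:
                outside = mid
        return inside

    lower = 0.0 if log_likelihood(0.0, k, n) >= target else bisect(mle, 0.0)
    upper = 1.0 if log_likelihood(1.0, k, n) >= target else bisect(mle, 1.0)
    return mle, lower, upper


# Toy usage: 5 derived alleles among 10 chromosomes, and 0 among 10.
mle_half, lower_half, upper_half = support_interval(5, 10)
mle_zero, lower_zero, upper_zero = support_interval(0, 10)
```

When no derived alleles are observed, the estimate is zero but the upper bound is not, which is why small ancient samples can remain compatible with appreciable allele frequencies.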
Online Content Methods, along with any additional Extended Data display items and
Source Data, are available in the online version of the paper; references unique to
these sections appear only in the online paper.
Received 12 March; accepted 30 October 2015.
Published online 23 November 2015.
1. Grossman, S. R. et al. Identifying recent adaptations in large-scale genomic
data. Cell 152, 703–713 (2013).
2. Wilde, S. et al. Direct evidence for positive selection of skin, hair, and eye
pigmentation in Europeans during the last 5,000 y. Proc. Natl Acad. Sci. USA
111, 4832–4837 (2014).
3. Gamba, C. et al. Genome flux and stasis in a five millennium transect of
European prehistory. Nature Commun. 5, 5257 (2014).
4. Lazaridis, I. et al. Ancient human genomes suggest three ancestral populations
for present-day Europeans. Nature 513, 409–413 (2014).
5. Allentoft, M. E. et al. Population genomics of Bronze Age Eurasia. Nature 522,
167–172 (2015).
6. Keller, A. et al. New insights into the Tyrolean Iceman’s origin and
phenotype as inferred by whole-genome sequencing. Nature Commun. 3,
698 (2012).
7. Haak, W. et al. Massive migration from the steppe was a source for Indo-
European languages in Europe. Nature 522, 207–211 (2015).
8. Olalde, I. et al. Derived immune and ancestral pigmentation alleles in a
7,000-year-old Mesolithic European. Nature 507, 225–228 (2014).
9. Pinhasi, R. et al. Optimal ancient DNA yields from the inner ear part of the
human petrous bone. PLoS ONE 10, e0129102 (2015).
10. Alexander, D. H., Novembre, J. & Lange, K. Fast model-based estimation of
ancestry in unrelated individuals. Genome Res. 19, 1655–1664 (2009).
11. Patterson, N. et al. Ancient admixture in human history. Genetics 192,
1065–1093 (2012).
12. Underhill, P. A. et al. The phylogenetic and geographic structure of
Y-chromosome haplogroup R1a. Eur. J. Hum. Genet. 23, 124–131 (2015).
13. The 1000 Genomes Project Consortium. A global reference for human genetic
variation. Nature 526, 68–74 (2015).
14. Devlin, B. & Roeder, K. Genomic control for association studies. Biometrics 55,
997–1004 (1999).
15. Enattah, N. S. et al. Identification of a variant associated with adult-type
hypolactasia. Nature Genet. 30, 233–237 (2002).
16. Bersaglieri, T. et al. Genetic signatures of strong recent positive selection at the
lactase gene. Am. J. Hum. Genet. 74, 1111–1120 (2004).
17. Burger, J., Kirchner, M., Bramanti, B., Haak, W. & Thomas, M. G. Absence of the
lactase-persistence-associated allele in early Neolithic Europeans. Proc. Natl
Acad. Sci. USA 104, 3736–3741 (2007).
18. Teslovich, T. M. et al. Biological, clinical and population relevance of 95 loci for
blood lipids. Nature 466, 707–713 (2010).
19. Fumagalli, M. et al. Greenlandic Inuit show genetic signatures of diet and
climate adaptation. Science 349, 1343–1347 (2015).
20. Mathias, R. A. et al. Adaptive evolution of the FADS gene cluster within Africa.
PLoS ONE 7, e44926 (2012).
21. Wang, T. J. et al. Common genetic determinants of vitamin D insufficiency:
a genome-wide association study. Lancet 376, 180–188 (2010).
22. Price, A. L. et al. The impact of divergence time on the nature of
population structure: an example from Iceland. PLoS Genet. 5, e1000505
(2009).
23. Wellcome Trust Case Control Consortium. Genome-wide association study of
14,000 cases of seven common diseases and 3,000 shared controls. Nature
447, 661–678 (2007).
24. Huff, C. D. et al. Crohn’s disease and genetic hitchhiking at IBD5. Mol. Biol.
Evol. 29, 101–111 (2012).
25. Hunt, K. A. et al. Newly identified genetic risk variants for celiac disease related
to the immune response. Nature Genet. 40, 395–402 (2008).
26. Jostins, L. et al. Host–microbe interactions have shaped the genetic
architecture of inflammatory bowel disease. Nature 491, 119–124 (2012).
27. Beleza, S. et al. Genetic architecture of skin and eye color in an African–
European admixed population. PLoS Genet. 9, e1003372 (2013).
28. Sturm, R. A. et al. A single SNP in an evolutionary conserved region within
intron 86 of the HERC2 gene determines human blue-brown eye color.
Am. J. Hum. Genet. 82, 424–431 (2008).
29. Eiberg, H. et al. Blue eye color in humans may be caused by a perfectly
associated founder mutation in a regulatory element located within the HERC2
gene inhibiting OCA2 expression. Hum. Genet. 123, 177–187 (2008).
30. Barreiro, L. B. et al. Evolutionary dynamics of human Toll-like receptors and
their dierent contributions to host defense. PLoS Genet. 5, e1000562 (2009).
31. Uciechowski, P. et al. Susceptibility to tuberculosis is associated with TLR1
polymorphisms resulting in a lack of TLR1 cell surface expression. J. Leukoc.
Biol. 90, 377–388 (2011).
32. Wong, S. H. et al. Leprosy and the adaptation of human toll-like receptor 1.
PLoS Pathog. 6, e1000979 (2010).
33. Fujimoto, A. et al. A scan for genetic determinants of human hair morphology:
EDAR is associated with Asian hair thickness. Hum. Mol. Genet. 17, 835–843
(2008).
34. Kimura, R. et al. A common variation in EDAR is a genetic determinant of
shovel-shaped incisors. Am. J. Hum. Genet. 85, 528–535 (2009).
35. Kamberov, Y. G. et al. Modeling recent human evolution in mice by expression
of a selected EDAR variant. Cell 152, 691–702 (2013).
36. Turchin, M. C. et al. Evidence of widespread selection on standing variation in
Europe at height-associated SNPs. Nature Genet. 44, 1015–1019 (2012).
37. Berg, J. J. & Coop, G. A population genetic signal of polygenic adaptation.
PLoS Genet. 10, e1004412 (2014).
38. Lango Allen, H. et al. Hundreds of variants clustered in genomic loci and
biological pathways aect human height. Nature 467, 832–838 (2010).
39. Speliotes, E. K. et al. Association analyses of 249,796 individuals reveal 18
new loci associated with body mass index. Nature Genet. 42, 937–948
(2010).
40. Heid, I. M. et al. Meta-analysis identifies 13 new loci associated with waist–hip
ratio and reveals sexual dimorphism in the genetic basis of fat distribution.
Nature Genet. 42, 949–960 (2010).
41. Morris, A. P. et al. Large-scale association analysis provides insights into the
genetic architecture and pathophysiology of type 2 diabetes. Nature Genet. 44,
981–990 (2012).
Supplementary Information is available in the online version of the paper.
Acknowledgements We thank P. de Bakker, J. Burger, C. Economou,
E. Fornander, Q. Fu, F. Hallgren, K. Kirsanow, A. Mittnik, I. Olalde, A. Powell,
P. Skoglund, S. Tabrizi and A. Tandon for discussions, suggestions about
SNPs to include, or contribution to sample preparation or data curation.
We thank S. Pääbo, M. Meyer, Q. Fu and B. Nickel for collaboration in
developing the 1240k capture reagent. We thank J. M. V. Encinas and M. E.
Prada for allowing us to resample La Braña 1. I.M. was supported by
the Human Frontier Science Program LT001095/2014-L. C.G. was
supported by the Irish Research Council for Humanities and Social
Sciences (IRCHSS). F.G. was supported by a grant of the Netherlands
Organization for Scientific Research, no. 380-62-005. A.K., P.K. and O.M. were
supported by RFBR no. 15-06-01916 and RFH no. 15-11-63008 and O.M.
by a state grant of the Ministry of Education and Science of the Russian
Federation no. 33.1195.2014/k. J.K. was supported by ERC starting grant
APGREID and DFG grant KR 4015/1-1. K.W.A. was supported by DFG grant
AL 287 / 14-1. C.L.-F. was supported by a BFU2015-64699-P grant from
the Spanish government. W.H. and B.L. were supported by Australian
Research Council DP130102158. R.P. was supported by ERC starting grant
ADNABIOARC (263441), and an Irish Research Council ERC support grant.
D.R. was supported by US National Science Foundation HOMINID grant
BCS-1032255, US National Institutes of Health grant GM100233, and the
Howard Hughes Medical Institute.
Author Contributions W.H., R.P. and D.R. supervised the study. S.A.R., J.L.A.,
J.M.B., E.C., F.G., A.K., P.K., M.L., H.M., O.M., V.M., M.A.R., J.R., J.M.V., J.K., A.C.,
K.W.A., D.B., D.A., C.L., W.H., R.P. and D.R. assembled archaeological material.
I.M., I.L., N.R., S.M., N.P., S.D., J.P., W.H. and D.R. analysed genetic data. N.R., E.H.,
K.St., D.F., M.N., K.Si., C.G., E.R.J., B.L., C.L. and W.H. performed wet laboratory
ancient DNA work. I.M., I.L. and D.R. wrote the manuscript with input from all
co-authors.
Author Information The aligned sequences are available through the
European Nucleotide Archive under accession number PRJEB11450. The
Human Origins genotype datasets including ancient individuals can be
found at (http://genetics.med.harvard.edu/reich/Reich_Lab/Datasets.html).
Reprints and permissions information is available at www.nature.com/
reprints. The authors declare no competing financial interests. Readers are
welcome to comment on the online version of the paper. Correspondence
and requests for materials should be addressed to I.M. (iain_mathieson@hms.
harvard.edu), W.H. (haak@shh.mpg.de), R.P. (ron.pinhasi@ucd.ie) or
D.R. (reich@genetics.med.harvard.edu).
Genome-wide scan for selection. For most ancient samples, we did not have
sufficient coverage to make reliable diploid calls. We therefore used the counts of
sequences covering each SNP to compute the likelihood of the allele frequency in
each population. Suppose that at a particular site, for each population we have M
samples with sequence level data, and N samples with full diploid genotype calls
(Loschbour, Stuttgart and the 1000 Genomes samples). For samples i = 1…N, with
diploid genotype data, we observe X copies of the reference allele out of 2N total
chromosomes. For each of samples i = (N+1)…(N+M), with sequence level data,
we observe $R_i$ sequences with the reference allele out of $T_i$ total sequences. Then,
the likelihood of the population reference allele frequency, p, given data
$D = \{X, N, R_i, T_i\}$ is given by

$$L(p; D) = B(X, 2N, p) \times \prod_{i=N+1}^{N+M} \left\{ p^{2} B(R_i, T_i, 1-\varepsilon) + 2p(1-p) B(R_i, T_i, 0.5) + (1-p)^{2} B(R_i, T_i, \varepsilon) \right\}$$

where $B(k, n, p) = \binom{n}{k} p^{k} (1-p)^{n-k}$ is the binomial probability distribution and
ε is a small probability of error, which we set to 0.001. We write $\ell(p; D)$ for the
log-likelihood. To estimate allele frequencies, for example in Fig. 3 or for the polygenic
selection test, we maximized this likelihood numerically for each population.
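The likelihood above can be sketched in a few lines of Python. This is a minimal illustration, not the published code: the function names are hypothetical, and a plain grid search stands in for the numerical optimiser, which the text does not specify.

```python
import math

EPS = 0.001  # error probability, the value stated in the text


def binom_pmf(k, n, p):
    """Binomial probability B(k, n, p)."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


def log_likelihood(p, x, n_diploid, reads, eps=EPS):
    """Log-likelihood of the reference allele frequency p (0 < p < 1).

    x         -- reference allele copies among the 2 * n_diploid called chromosomes
    n_diploid -- number of samples with diploid genotype calls (N)
    reads     -- list of (R_i, T_i) pairs for the samples with sequence data
    """
    ll = math.log(binom_pmf(x, 2 * n_diploid, p))
    for r, t in reads:
        # Sum over the three possible genotypes of the sample.
        site = (
            p * p * binom_pmf(r, t, 1.0 - eps)
            + 2.0 * p * (1.0 - p) * binom_pmf(r, t, 0.5)
            + (1.0 - p) ** 2 * binom_pmf(r, t, eps)
        )
        ll += math.log(site)
    return ll


def mle_frequency(x, n_diploid, reads, grid=10000):
    """Grid-search maximum likelihood estimate of p (assumption: grid, not an optimiser)."""
    candidates = [(i + 0.5) / grid for i in range(grid)]
    return max(candidates, key=lambda p: log_likelihood(p, x, n_diploid, reads))
```

For a sample with a single read carrying the reference allele, the per-sample term reduces to p(1 − 2ε) + ε, so one read carries roughly the information of one sampled chromosome.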
To scan for selection across the genome, we used the following test. Consider a
single SNP. Assume that we can model the allele frequencies $p_{\mathrm{mod}}$ in A modern
populations as a linear combination of allele frequencies in B ancient populations
$p_{\mathrm{anc}}$. That is, $p_{\mathrm{mod}} = C p_{\mathrm{anc}}$, where C is an A by B matrix with rows summing to 1.
We have data $D_j$ from population j which is some combination of sequence counts
and genotypes as described above. Then, writing
$\vec{p} = [p_{\mathrm{mod}}, p_{\mathrm{anc}}] = [p_1, \ldots, p_{A+B}]$, the
log-likelihood of the allele frequencies equals the sum of the log-likelihoods for
each population:

$$\ell(\vec{p}; D) = \sum_{j=1}^{A+B} \ell(p_j; D_j)$$

To detect deviations in allele frequency from expectation, we test the null
hypothesis $H_0$: $p_{\mathrm{mod}} = C p_{\mathrm{anc}}$ against the alternative $H_1$: $p_{\mathrm{mod}}$ unconstrained. We
numerically maximize this likelihood in both the constrained and unconstrained
model and use the fact that twice the difference in log-likelihood is approximately
$\chi^2_A$ distributed to compute a test statistic and P value.
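As an illustration of this likelihood ratio test, the sketch below uses several simplifications that are assumptions rather than the authors' implementation: each population is summarised by allele counts (k reference alleles out of n) with a plain binomial likelihood, the constrained model is fitted by coordinate ascent with a ternary search, and the chi-squared tail uses a closed form that holds only for an even number of modern populations A. All function names are hypothetical.

```python
import math


def binom_loglik(p, k, n):
    """Binomial log-likelihood of frequency p for k of n alleles (constant dropped)."""
    p = min(max(p, 1e-9), 1.0 - 1e-9)
    return k * math.log(p) + (n - k) * math.log(1.0 - p)


def chi2_sf_even(stat, df):
    """Upper tail of a chi-squared distribution; closed form valid for even df only."""
    if df % 2 != 0:
        raise ValueError("closed form requires even df")
    half = stat / 2.0
    return math.exp(-half) * sum(half**i / math.factorial(i) for i in range(df // 2))


def maximise_1d(f, lo=0.0, hi=1.0, iters=60):
    """Ternary search for the maximiser of a unimodal function on [lo, hi]."""
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2.0


def selection_test(mod_counts, anc_counts, C, sweeps=50):
    """Test H0: p_mod = C p_anc against H1: p_mod unconstrained.

    mod_counts, anc_counts -- lists of (k, n) allele counts per population
    C                      -- A x B mixture matrix with rows summing to 1
    """
    A, B = len(mod_counts), len(anc_counts)

    def total_loglik(p_anc, p_mod):
        ll = sum(binom_loglik(p, k, n) for p, (k, n) in zip(p_anc, anc_counts))
        return ll + sum(binom_loglik(p, k, n) for p, (k, n) in zip(p_mod, mod_counts))

    def constrained(p_anc):
        p_mod = [sum(C[a][b] * p_anc[b] for b in range(B)) for a in range(A)]
        return total_loglik(p_anc, p_mod)

    # H1: every population is free, so each MLE is the sample frequency.
    ll_free = total_loglik([k / n for k, n in anc_counts],
                           [k / n for k, n in mod_counts])

    # H0: coordinate ascent over the ancient frequencies only.
    p_anc = [0.5] * B
    for _ in range(sweeps):
        for b in range(B):
            def along(v, b=b):
                return constrained(p_anc[:b] + [v] + p_anc[b + 1:])
            p_anc[b] = maximise_1d(along)
    ll_null = constrained(p_anc)

    stat = max(0.0, 2.0 * (ll_free - ll_null))
    return stat, chi2_sf_even(stat, A)
```

When the modern sample frequencies equal the mixture of the ancient sample frequencies, the two fits coincide and the statistic is close to zero; a modern frequency far from its predicted value inflates the statistic.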
We defined the ancient source populations by the ‘Selection group 1’ label in
Extended Data Table 1 and Supplementary Table 1 and used the 1000 Genomes
CEU, GBR, IBS and TSI as the present-day populations. We removed SNPs that
were monomorphic in all four of these modern populations as well as in 1000
Genomes Yoruba (YRI). We do not use FIN as one of the modern populations,
because they do not fit this three-population model well. We estimated the pro-
portions of (HG, EF, SA) to be CEU = (0.196, 0.257, 0.547), GBR = (0.362, 0.229,
0.409), IBS = (0, 0.686, 0.314) and TSI = (0, 0.645, 0.355). In practice, we found
that there was substantial inflation in the test statistic, most likely due to unmod-
elled ancestry or additional drift. To address this, we applied a genomic control
correction14, dividing all the test statistics by a constant, λ, chosen so that the
median Pvalue matched the median of the null
4
2
χ
distribution. Excluding sites
in the potentially functional set, we estimated λ = 1.38 and used this value as a
correction throughout. One limitation of this test is that, although it identifies likely
signals of selection, it cannot provide much information about the strengt h or date
of selection. If the ancestral populations in the model are, in fact, close to the real
ancestral populations, then any selection must have occurred after the first admix-
ture event (in this case, after 6500 ), but if the ancestral p opulations are mis-spec-
ified, even this might not be true.
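The genomic control step can be sketched as follows, assuming the usual construction in which λ is the ratio of the observed median statistic to the median of the null distribution with four degrees of freedom. The function names are hypothetical. To follow the text, the statistics passed in for estimating λ would exclude the potentially functional sites.

```python
import math
import statistics


def chi2_4_sf(x):
    """Upper tail probability of a chi-squared distribution with 4 degrees of freedom."""
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)


def chi2_4_median(tol=1e-12):
    """Median of the chi-squared(4) distribution, by bisection on the tail probability."""
    lo, hi = 0.0, 20.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if chi2_4_sf(mid) > 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def genomic_control(stats):
    """Return (lambda, corrected statistics, corrected P values)."""
    lam = statistics.median(stats) / chi2_4_median()
    corrected = [s / lam for s in stats]
    return lam, corrected, [chi2_4_sf(s) for s in corrected]
```

After correction the median statistic sits at the null median (about 3.36), so the median P value is 0.5 by construction.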
To estimate power, we randomly sampled allele counts from the full data set,
restricting to polymorphic sites with a mean frequency across all populations of <0.1.
We then simulated what would happen if the allele had been under selection
in all of the modern populations by simulating a Wright–Fisher trajectory with
selection for 50, 100 or 200 generations, starting at the observed frequency. We
took the final frequency from this simulation, sampled observations to replace the
actual observations in that population, and counted the proportion of simulations
that gave a genome-wide significant result after GC correction (Extended Data
Fig. 6a). We resampled sequence counts for the observed distribution for each
population to simulate the effect of increasing sample size, assuming that the
coverage and distribution of the sequences remained the same (Extended Data
Fig. 6b).
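A Wright–Fisher trajectory with selection, followed by resampling of observations at the final frequency, might look like the sketch below. The population size, the genic fitness model (relative fitness 1 + s for the selected allele) and all function names are assumptions made for illustration; the text does not state them.

```python
import random


def wright_fisher(p0, s, generations, pop_size, rng):
    """Simulate an allele frequency trajectory under selection and drift.

    Each generation the frequency is shifted deterministically by selection
    and then resampled binomially among 2 * pop_size chromosomes.
    """
    p = p0
    trajectory = [p]
    for _ in range(generations):
        p_sel = p * (1.0 + s) / (1.0 + s * p)
        copies = sum(1 for _ in range(2 * pop_size) if rng.random() < p_sel)
        p = copies / (2 * pop_size)
        trajectory.append(p)
    return trajectory


def resample_counts(final_freq, totals, rng):
    """Draw new reference allele counts, one per sample, keeping each sample's coverage."""
    return [sum(1 for _ in range(t) if rng.random() < final_freq) for t in totals]
```

A power estimate would repeat this for many sampled SNPs, rerun the selection test on the resampled counts, and record the fraction of runs that pass the genome-wide threshold after genomic control correction.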
We investigated how the genomic control correction responded when we sim-
ulated small amounts of admixture from a highly diverged population (Yoruba;
METHODS
No statistical methods were used to predetermine sample size. The experiments
were not randomized and the investigators were not blinded to allocation during
experiments and outcome assessment.
Ancient DNA analysis. We screened 433 next-generation sequencing libraries from
270 distinct samples for authentic ancient DNA using previously reported protocols7.
All libraries that we included in nuclear genome analysis were treated with
uracil-DNA-glycosylase (UDG) to reduce characteristic errors of ancient DNA42.
We performed in-solution enrichment for a targeted set of 1,237,207 SNPs using
previously reported protocols4,7,43. The targeted SNP set merges 394,577 SNPs first
reported in ref. 7 (390k capture), and 842,630 SNPs first reported in ref. 44 (840k
capture). For 67 samples for which we newly report data in this study, there was
pre-existing 390k capture data7. For these samples, we only performed 840k capture
and merged the resulting sequences with previously generated 390k data. For the
remaining samples, we pooled the 390k and 840k reagents together to produce a
single enrichment reagent, which we called 1240k. We attempted to sequence each
enriched library up to the point where we estimated that it was economically inefficient
to sequence further. Specifically, we iteratively sequenced more and more
from each sample and only stopped when we estimated that the expected increase
in the number of targeted SNPs hit at least once would be less than about one for
every 100 new read pairs generated. After sequencing, we filtered out samples
with <30,000 targeted SNPs covered at least once, with evidence of contamination
based on mitochondrial DNA polymorphism43, a high rate of heterozygosity on
chromosome X despite being male45, or an atypical ratio of X to Y sequences.
Of the targeted SNPs, 47,384 are ‘potentially functional’ sites chosen as follows
(with some overlap): 1,290 SNPs identified as targets of selection in Europeans by
the Composite of Multiple Signals (CMS) test1; 21,723 SNPs identified as significant
hits by genome-wide association studies, or with known phenotypic effect
(GWAS); 1,289 SNPs with extremely differentiated frequencies between HapMap
populations46 (HiDiff); 9,116 ‘Immunochip’ SNPs chosen for study of immunity-related
phenotypes (Immune); 347 SNPs phenotypically relevant to South
America (mostly altitude adaptation SNPs in EGLN1 and EPAS1); 5,387 SNPs
which tag HLA haplotypes; and 13,672 expression quantitative trait loci47 (eQTL).
Population history analysis. We used two data sets for population history analysis.
‘HO’ consists of 592,169 SNPs, taking the intersection of the SNP targets and the
Human Origins SNP array4; we used this data set for co-analysis of present-day
and ancient samples. ‘HOIll’ consists of 1,055,209 SNPs that additionally includes
sites from the Illumina genotype array48; we used this data set for analyses only
involving the ancient samples.
On the HO data set, we carried out principal components analysis in smartpca49
using a set of 777 West Eurasian individuals4, and projected the ancient individuals
with the option ‘lsqproject: YES’. We carried out admixture analysis on a set of 2,345
present-day individuals and the ancient samples after pruning for LD in PLINK
1.9 (https://www.cog-genomics.org/plink2)50 with parameters ‘--indep-pairwise
200 25 0.4’. We varied the number of ancestral populations between K = 2 and
K = 20, and used cross-validation (--cv) to identify the value of K = 17 to plot in
Extended Data Fig. 2f.
We used ADMIXTOOLS11 to compute f-statistics, determining standard errors
with a block jackknife and default parameters. We used the option ‘inbreed: YES’
when computing f3-statistics of the form f3(ancient; Ref1, Ref2) as the ancient samples
are represented by randomly sampled alleles rather than by diploid genotypes. For
the same reason, we estimated F_ST genetic distances between populations on the HO
data set with at least two individuals in smartpca also using the ‘inbreed: YES’ option.
We estimated ancestral proportions as in Supplementary Information section 9
of ref. 7, using a method that fits mixture proportions on a ‘test’ population as a
mixture of n ‘reference’ populations by using f4-statistics of the form f4(test or ref,
O1; O2, O3) that exploit allele frequency correlations of the test or reference populations
with triples of outgroup populations. We used a set of 15 world outgroup
populations4,7. In Extended Data Fig. 2, we added WHG and EHG as outgroups
for those analyses in which they are not used as reference populations. We plot
the squared 2-norm of the residuals, $\mathrm{resnorm} = \lVert t - R\hat{a}^{T} \rVert_2^2$, where $\hat{a}$ is a vector
of n estimated mixture proportions (summing to 1), t is a vector of $m\binom{m-1}{2}$
f4-statistics of the form f4(test, O1; O2, O3) for m outgroups, and R is
a $m\binom{m-1}{2} \times n$ matrix of the form f4(ref, O1; O2, O3) (Supplementary
Information section 9 of ref. 7).
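To make the dimensions concrete, the sketch below counts the f4-statistics for a given number of outgroups and evaluates resnorm for a candidate vector of mixture proportions. The two-reference grid search is a stand-in chosen for brevity and is an assumption; the actual fitting procedure is the one described in ref. 7. Function names are hypothetical.

```python
import math


def n_f4_statistics(m):
    """Number of statistics f4(test, O1; O2, O3) for m outgroups: m * C(m - 1, 2)."""
    return m * math.comb(m - 1, 2)


def resnorm(t, R, a):
    """Squared 2-norm of t - R a for statistics t, reference matrix R and proportions a."""
    if abs(sum(a) - 1.0) > 1e-9:
        raise ValueError("mixture proportions must sum to 1")
    residuals = [ti - sum(rij * aj for rij, aj in zip(row, a)) for ti, row in zip(t, R)]
    return sum(r * r for r in residuals)


def fit_two_way(t, R, steps=10000):
    """Grid search for a two-reference mixture (a, 1 - a) minimising resnorm."""
    best = min((i / steps for i in range(steps + 1)),
               key=lambda a: resnorm(t, R, [a, 1.0 - a]))
    return [best, 1.0 - best]
```

With the 15 outgroups used here, each test population contributes 15 × 91 = 1,365 statistics.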
We determined sex by examining the ratio of aligned reads to the sex chro-
mosomes51. We assigned Y-chromosome haplogroups to males using version
9.1.129 of the nomenclature of the International Society of Genetic Genealogy
(http://www.isogg.org), restricting analysis using samtools52 to sites with map
quality and base quality of at least 30, and excluding two bases at the ends of each
sequenced fragment.
1000 Genomes YRI) into a randomly chosen modern population. The genomic
inflation factor increases from around 1.38 to around 1.51 with 10% admixture,
but there is little reduction in power (Extended Data Fig. 6c). Finally, we investigated
how robust the test was to misspecification of the mixture matrix C. We
re-ran the power simulations using a matrix C′ = (1 − x)C + xR for x ∈ [0,1], where
R was a random matrix chosen so that for each modern population the mixture
proportions of the three ancient populations were jointly uniformly distributed on
[0,1]. Increasing x increases the genomic inflation factor and reduces power,
demonstrating the advantage of explicitly modelling the ancestries of the modern
populations (Extended Data Fig. 6d).
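The perturbed mixture matrix can be sketched as below, following the convention of Extended Data Fig. 6d in which x = 0 corresponds to the proportions used and x = 1 to a random matrix. Reading "jointly uniformly distributed" as uniform on the simplex is an assumption, as are the function names; the values of C are the (HG, EF, SA) proportions reported in the text.

```python
import random

# Mixture proportions (HG, EF, SA) reported in the text.
C = {
    "CEU": (0.196, 0.257, 0.547),
    "GBR": (0.362, 0.229, 0.409),
    "IBS": (0.0, 0.686, 0.314),
    "TSI": (0.0, 0.645, 0.355),
}


def random_proportions(k, rng):
    """Draw k proportions uniformly from the simplex (normalised exponential variates)."""
    draws = [rng.expovariate(1.0) for _ in range(k)]
    total = sum(draws)
    return tuple(d / total for d in draws)


def blend(C, x, rng):
    """Return (1 - x) * C + x * R, drawing a fresh random row of R per modern population."""
    out = {}
    for pop, row in C.items():
        r = random_proportions(len(row), rng)
        out[pop] = tuple((1.0 - x) * c + x * ri for c, ri in zip(row, r))
    return out
```

Because both C and R have rows summing to 1, every blended row is again a valid set of mixture proportions.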
Test for polygenic selection. We implemented the test for polygenic selection
described by ref. 37. This evaluates whether trait-associated alleles, weighted by
their effect size, are over-dispersed compared to randomly sampled alleles, in the
directions associated with the effects measured by genome-wide association stud-
ies (GWAS). For each trait, we obtained a list of significant SNP associations and
effect estimates from GWAS data, and then applied the test both to all populations
combined and to selected pairs of populations. For height, we restricted the list
of GWAS associations to 169 SNPs where we observed at least two chromosomes
in all tested populations (Selection population 2). We estimated frequencies in
each population by computing the maximum likelihood estimate (MLE), using the
likelihood described above. For each test we sampled SNPs, frequency-matched in
20 bins, computed the test statistic Q_X and for ease of comparison converted these
to Z scores, signed according to the direction of the genetic effects. Theoretically
Q_X has a χ² distribution but in practice, it is over-dispersed. Therefore, we report
bootstrap P values computed by sampling 10,000 sets of frequency-matched
SNPs.
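The frequency-matched bootstrap can be sketched as follows. The statistic itself (Q_X in ref. 37) is not reproduced; `statistic` is a hypothetical caller-supplied function that takes a list of background SNP indices. Equal-width bins and matching on a single frequency per SNP are assumptions, since the text gives only the number of bins.

```python
import random


def frequency_bin(freq, n_bins=20):
    """Index of the equal-width frequency bin containing freq."""
    return min(int(freq * n_bins), n_bins - 1)


def matched_sample(trait_freqs, background_freqs, rng, n_bins=20):
    """Pick one background SNP index per trait SNP from the same frequency bin.

    Raises KeyError if a trait SNP falls in a bin with no background SNPs.
    """
    by_bin = {}
    for idx, f in enumerate(background_freqs):
        by_bin.setdefault(frequency_bin(f, n_bins), []).append(idx)
    return [rng.choice(by_bin[frequency_bin(f, n_bins)]) for f in trait_freqs]


def bootstrap_p_value(observed, statistic, trait_freqs, background_freqs,
                      n_sets=10000, seed=0):
    """Fraction of frequency-matched random SNP sets with a statistic >= observed."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sets):
        sample = matched_sample(trait_freqs, background_freqs, rng)
        if statistic(sample) >= observed:
            hits += 1
    return hits / n_sets
```

Comparing against matched random sets, rather than the theoretical distribution, is what guards against the over-dispersion noted in the text.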
To estimate population-level genetic height in Fig. 4a, we assumed a uniform
prior on [0,1] for the frequency of all height-associated alleles, and then sampled
from the posterior joint frequency distribution of the alleles, assuming they were
independent, using a Metropolis–Hastings sampler with a N(0,0.001) proposal
density. We then multiplied the sampled allele frequencies by the effect sizes to
get a distribution of genetic height.
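A random-walk Metropolis–Hastings sampler of this kind might be sketched as below. Several choices are assumptions: the likelihood is simplified to binomial allele counts rather than the read-based likelihood, N(0, 0.001) is read as a variance (standard deviation about 0.032), a burn-in is added, and genetic height is taken as the plain sum of effect size times frequency. Function names are hypothetical.

```python
import math
import random


def log_posterior(p, k, n):
    """Log posterior of frequency p under a uniform prior, given k of n observed alleles."""
    if not 0.0 < p < 1.0:
        return -math.inf
    return k * math.log(p) + (n - k) * math.log(1.0 - p)


def sample_frequency(k, n, n_samples, rng, proposal_sd=math.sqrt(0.001), burn_in=1000):
    """Random-walk Metropolis-Hastings samples from the posterior of one allele frequency."""
    p = (k + 1.0) / (n + 2.0)  # start at the posterior mean
    current = log_posterior(p, k, n)
    samples = []
    for step in range(burn_in + n_samples):
        candidate = p + rng.gauss(0.0, proposal_sd)
        cand_lp = log_posterior(candidate, k, n)
        # Symmetric proposal, so the acceptance ratio is the posterior ratio.
        if math.log(rng.random() or 1e-300) < cand_lp - current:
            p, current = candidate, cand_lp
        if step >= burn_in:
            samples.append(p)
    return samples


def genetic_height_draws(counts, effects, n_samples, seed=0):
    """Posterior draws of sum_j effect_j * p_j, treating the SNPs as independent."""
    rng = random.Random(seed)
    chains = [sample_frequency(k, n, n_samples, rng) for k, n in counts]
    return [sum(b * chain[i] for b, chain in zip(effects, chains))
            for i in range(n_samples)]
```

Proposals outside (0, 1) have zero posterior density under the uniform prior and are always rejected, which keeps every sampled frequency inside the unit interval.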
Code availability. Code implementing the selection analysis is available at
https://github.com/mathii/europe_selection.
42. Briggs, A. W. et al. Removal of deaminated cytosines and detection of in vivo
methylation in ancient DNA. Nucleic Acids Res. 38, e87 (2010).
43. Fu, Q. et al. DNA analysis of an early modern human from Tianyuan Cave,
China. Proc. Natl Acad. Sci. USA 110, 2223–2227 (2013).
44. Fu, Q. et al. An early modern human from Romania with a recent Neanderthal
ancestor. Nature 524, 216–219 (2015).
45. Korneliussen, T. S., Albrechtsen, A. & Nielsen, R. ANGSD: analysis of next
generation sequencing data. BMC Bioinformatics 15, 356 (2014).
46. International HapMap Consortium. A second generation human haplotype
map of over 3.1 million SNPs. Nature 449, 851–861 (2007).
47. Lappalainen, T. et al. Transcriptome and genome sequencing uncovers
functional variation in humans. Nature 501, 506–511 (2013).
48. Li, J. Z. et al. Worldwide human relationships inferred from genome-wide
patterns of variation. Science 319, 1100–1104 (2008).
49. Loh, P. R. et al. Inferring admixture histories of human populations using
linkage disequilibrium. Genetics 193, 1233–1254 (2013).
50. Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger
and richer datasets. GigaScience 4 (2015).
51. Skoglund, P., Storå, J., Götherström, A. & Jakobsson, M. Accurate sex
identication of ancient human remains using DNA shotgun sequencing.
J. Archaeol. Sci. 40, 4477–4482 (2013).
52. Li, H. et al. The sequence alignment/map format and SAMtools. Bioinformatics
25, 2078–2079 (2009).
53. Norton, H. L. et al. Genetic evidence for the convergent evolution of light skin in
Europeans and East Asians. Mol. Biol. Evol. 24, 710–722 (2007).
54. Bokor, S. et al. Single nucleotide polymorphisms in the FADS gene cluster are
associated with delta-5 and delta-6 desaturase activities estimated by serum
fatty acid ratios. J. Lipid Res. 51, 2325–2333 (2010).
55. Tanaka, T. et al. Genome-wide association study of plasma polyunsaturated
fatty acids in the InCHIANTI Study. PLoS Genet. 5, e1000338 (2009).
56. Ahn, J. et al. Genome-wide association study of circulating vitamin D levels.
Hum. Mol. Genet. 19, 2739–2745 (2010).
57. Gründemann, D. et al. Discovery of the ergothioneine transporter. Proc. Natl
Acad. Sci. USA 102, 5256–5261 (2005).
58. Chauhan, S. et al. ZKSCAN3 is a master transcriptional repressor of autophagy.
Mol. Cell 50, 16–28 (2013).
59. Soler Artigas, M. et al. Genome-wide association and large-scale follow up identifies
16 new loci influencing lung function. Nature Genet. 43, 1082–1090 (2011).
60. Pruim, R. J. et al. LocusZoom: regional visualization of genome-wide
association scan results. Bioinformatics 26, 2336–2337 (2010).
6KRWJXQGDWD
/LWHUDWXUH
&DSWXUHGDWD
1HZO\UHSRUWHG
$YHUDJHQXPEHURIUDZUHDGSDLUVKXQGUHGVRI
WKRXVDQGVVLPLODUWRFRVWLQGROODUV 
0HGLDQFRYHUDJHRQDQDO\]HG613V 


1XPEHURIVHTXHQFHVLQKXQGUHGVRIWKRXVDQGV
0HDQFRYHUDJHRQDQDO\VHG613V
         











 6KRWJXQGDWDQ 
&DSWXUHGDWDQ 
Extended Data Figure 1 | Efficiency and cost-effectiveness of 1240k
capture. We plot the number of raw sequences against the mean coverage
of analysed SNPs after removal of duplicates, comparing the 163 samples
for which capture data are newly reported in this study, against the 102
samples analysed by shotgun sequencing in ref. 5. We caution that the true
cost is more than that of sequencing alone.
Extended Data Figure 2 | Early isolation and later admixture between
farmers and steppe populations. a, Mainland European populations later
than 3000 bc are better modelled with steppe ancestry as a third ancestral
population (closer correspondence between empirical and estimated
f4-statistics as estimated by resnorm; Methods). b, Later (post-Poltavka)
steppe populations are better modelled with Anatolian Neolithic as a
third ancestral population. c, Estimated mixture proportions of mainland
European populations without steppe ancestry. d, Estimated mixture
proportions of Eurasian steppe populations without Anatolian Neolithic
ancestry. e, Estimated mixture proportions of later populations with
both steppe and Anatolian Neolithic ancestry. f, Admixture plot at K = 17
showing population differences over time and space. EN, Early Neolithic;
MN, Middle Neolithic; LN, Late Neolithic; BA, Bronze Age; LNBA, Late
Neolithic and Bronze Age.
[Figure panels a–f. a, resnorm for 12 mainland European populations under the N = 2 (Anatolia_Neolithic + WHG) and N = 3 (Anatolia_Neolithic + WHG + Yamnaya_Samara) models. b, resnorm for 9 steppe populations under the N = 2 (Armenian + EHG) and N = 3 (Anatolia_Neolithic + Armenian + EHG) models. c–e, mixture proportion plots. f, admixture plot, K = 17.]
[Figure: regional association plots, one panel per locus. Axes: –log10(P value) on the left, recombination rate (cM/Mb) on the right and position on the chromosome (Mb) along the bottom; gene names are listed below each panel.]
Extended Data Figure 3 | Regional association plots. LocusZoom60
plots for genome-wide significant signals. Points show the –log10
P value for each SNP, coloured according to their linkage disequilibrium
(LD; units of r2) with the most associated SNP. The blue line shows the
recombination rate, with scale on right hand axis in centimorgans per
megabase (cM/Mb). Genes are shown in the lower panel of each subplot.
[Figure panels a and b. a, PCA plot. b, per-locus allele frequency panels, each labelled with a gene name and rs identifier; the legend includes Northwest Europe (CEU).]
Extended Data Figure 4 | PCA of selection populations and derived
allele frequencies for genome-wide significant signals. a, Ancient
samples projected onto principal components of modern samples, as
in Fig. 1, but labelled according to selection populations defined in
Extended Data Table 1. b, Allele frequency plots as in Fig. 3. Six signals
not included in Fig. 3—for SLC22A4 we show both rs272872, which is our
strongest signal, and rs1050152, which was previously hypothesized to be
under selection, and we also show SLC24A5, which is not genome-wide
significant but is discussed in the main text.
Extended Data Figure 5 | Motala haplotypes carrying the derived,
selected EDAR allele. This figure compares the genotypes at all sites
within 150kb of rs3827760 (in blue) for the 6 Motala samples and 20
randomly chosen CHB (Chinese from Beijing) and CEU (Utah residents
with northern and western European ancestry) samples. Each row is a
sample and each column is a SNP. Grey means homozygous for the major
(in CEU) allele. Pink denotes heterozygous and red indicates homozygous
for the other allele. For the Motala samples, an open circle means that
there is only a single sequence, otherwise the circle is coloured according
to the number of sequences observed. Three of the Motala samples
are heterozygous for rs3827760 and the derived allele lies on the same
haplotype background as in present-day East Asians. The only other
ancient samples with evidence of the derived EDAR allele in this data set
are two Afanasievo samples dating to 3300–3000 bc, and one Scythian
dating to 400–200 bc (not shown).
[Figure panels a–d. a, power against selection coefficient (s) for different numbers of generations of selection. b, power against selection coefficient (s) for different amounts of data relative to this study. c, genomic inflation factor and power against admixture proportion. d, genomic inflation factor and power against random proportion.]
Extended Data Figure 6 | Estimated power of the selection scan.
a, Estimated power for different selection coefficients (s) for a SNP
that is selected in all populations for either 50, 100 or 200 generations.
b, Effect of increasing sample size, showing estimated power for a SNP
selected for 100 generations, with different amounts of data, relative to
the main text. c, Effect of admixture from Yoruba (YRI) into one of the
modern populations, showing the effect on the genomic inflation
factor (blue, left axis) and the power to detect selection on a SNP
selected for 100 generations with a selection coefficient of 0.02. d, Effect
of mis-specification of the mixture proportions. Here 0 on the x axis
corresponds to the proportions we used, and 1 corresponds to a random
mixture matrix.
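To give a feel for the parameter ranges in panel a, the following minimal sketch shows how far a favoured allele moves in 50, 100 or 200 generations for a given s. It assumes a simple genic selection model in which the allele's odds are multiplied by (1 + s) each generation; that model, the starting frequency of 0.1, the example values of s and the function name are illustrative assumptions, not the simulation procedure used for this figure.

```python
def allele_frequency_after(p0: float, s: float, generations: int) -> float:
    """Frequency of a favoured allele after `generations` of genic selection.

    Assumed model (not taken from the paper): in each generation the odds
    p / (1 - p) of the favoured allele are multiplied by (1 + s).
    """
    if not 0.0 < p0 < 1.0:
        raise ValueError("p0 must lie strictly between 0 and 1")
    odds = p0 / (1.0 - p0) * (1.0 + s) ** generations
    return odds / (1.0 + odds)


if __name__ == "__main__":
    # Hypothetical starting frequency and selection coefficients.
    for generations in (50, 100, 200):
        for s in (0.005, 0.01, 0.02):
            p = allele_frequency_after(0.1, s, generations)
            print(f"s={s:<6} generations={generations:<4} frequency={p:.3f}")
```

Under this assumed model, weak selection over few generations changes the frequency only slightly, which is one intuitive reason why power rises with both s and the duration of selection.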
Extended Data Table 1 | 230 ancient individuals analysed in this study
Population, samples grouped by a combination of date, source, archaeology and genetics; Date range, approximate date range of samples in this group; N, number of individuals sampled; Out,
number of PCA outliers (marked with an asterisk if used in selection analysis); Rel, number of related individuals removed; Eff N Chr, average over sites of the effective number of chromosomes when
we use genotype likelihoods, computed as 2 per called site for samples with genotype calls, or $2 - 0.5^{(c-1)}$ for samples with read depth c; Selection population 1, coarse population labels (marked
with a caret if not used in genome-wide scan); Selection population 2, fine population labels. E/M/LN, Early/Middle/Late Neolithic; LBK, Linearbandkeramik; E/S/WHG, Eastern/Scandinavian/Western
hunter-gatherer; EBA, Early Bronze Age; IA, Iron Age.
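The effective number of chromosomes defined in this legend can be sketched as follows. The per-sample rule (2 for a called genotype, 2 − 0.5^(c−1) for read depth c) comes from the legend; the function names, the treatment of sites with no reads as contributing zero, and the example depths are assumptions made for illustration.

```python
from typing import Iterable, Optional


def effective_chromosomes(read_depth: Optional[int], genotype_called: bool = False) -> float:
    """Effective number of chromosomes one sample contributes at one site."""
    if genotype_called:
        return 2.0  # a called diploid genotype counts as two chromosomes
    if read_depth is None or read_depth < 1:
        return 0.0  # assumption: a site with no reads contributes nothing
    # One read samples one chromosome; more reads approach, but never reach, two.
    return 2.0 - 0.5 ** (read_depth - 1)


def site_total(depths: Iterable[int]) -> float:
    """Sum of effective chromosomes at one site over samples without genotype calls."""
    return sum(effective_chromosomes(c) for c in depths)


if __name__ == "__main__":
    for c in (1, 2, 3, 10):
        print(c, effective_chromosomes(c))
    print("site total:", site_total([1, 2, 3, 0]))
```

A single read gives exactly one effective chromosome, and the value approaches two as depth grows, reflecting the increasing chance that both chromosomes have been sampled.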


    





 ,7# 

04#-#" ,7# 

/#40-+#"'0-+4*+% ,7#    

5/)#27",7#    

",7#    

'/42#-",7# 

$'2+#" ,7# 

$'2+#" ,7# 

$'2+#"*#-%0-+4*+% ,7#    

'.'&'--0,7# 

%'.#/,7# 

'/42#-" ,7#    

#./#7#"#.#2# ,7# 

#./#7#"#-.7,+# ,7# 

(#/#3+'60,7# 

0-4#6,#,7#   

+/4#3*4# ,7# 

04#106,#,7# 

25$/#7#,7#  

/&20/060,7#   

533+#",7# 

024*'2/" ,7#    

'--"'#,'2",7#   !

5/)#27",7#   !

 ,7#  

#.#2#"/'0-+4*+% ,7#  

%74*+#/" ,7#  
 



    
 ,7#   #2-7#2.'2 
 ,7# 5/4'2)#4*'2'2 
 ,7#   4'11'/%'3427 


    
 ,7#   /#40-+#/'0-+4*+% 
  ,7#   '/42#-5201'#/#2-7#/&+&&-''0-+4*+%
  ,7#   $'2+#/'0-+4*+%#/&*#-%0-+4*+%
 ,7# 5/4'2)#4*'2'2 
  ,7#   '/42#-5201'#/#4''0-+4*+%#/&20/8')'
  ,7#   4'11'
Article reSeArcH
© 2015 Macmillan Publishers Limited. All rights reserved
Extended Data Table 2 | Key f-statistics used to support claims about population history
    
!
    
1%62/-%$)2/-6,-' $!,-03 
%4/;7423)%1%40)45,%(024)!
%1')564;6,%11%62/-%1)2/-6,-'
1%62/-%$)2/-6,-' 71+%4;$!,-03
1%62/-%$)2/-6,-' &)4-%$ ! ,-03  
&)4-%$ &)4-%$,%/'2/-6,-' ! ,-03 &)4-%1,%/'2/-6,-',%(024)!
%1')564;6,%1&)4-%1%4/;)2/-6,-'
&)4-%$ &)4-%$,%/'2/-6,-' ! ,-03 
&)4-%1,%/'2/-6,-'(-(126,%8)024)
!%1')564;6,%1&)4-%1-((/)
)2/-6,-'
 %0%4%$1)2/-6,-'  ,-03 -456(-/76-212*1'-)16246,74%5-%1
%1')564;34-24626,)421<)+)
"%01%;%'7/674)
 "%01%;%$ %0%4%  ,-03  
%0%4%$1)2/-6,-' "%01%;%$ %0%4%  ,-03 
 "%01%;%$ %0%4% 40)1-%1,-03 
2164-&76-212*)%4%56)41%1')564;62
6,)421<)+)"%01%;%'7/674)
 "%01%;%$%/0;.-% 40)1-%1,-03 
%0%4%$1)2/-6,-' "%01%;%$ %0%4% 40)1-%1,-03 
 2/6%8.%40)1-%1 ,-03  
"%01%;%$ %0%4% "%01%;%$%/0;.-%  ,-03   6%&-/-6;2*1'-)16246,74%5-%1
%1')564;&)69))1%4/;421<)+)
"%01%;%*420%/0;.-%%1( %0%4%
%1(6,)-((/)421<)+)2/6%8.%
"%01%;%$ %0%4% 2/6%8.%,-03 
"%01%;%$%/0;.-% 2/6%8.%,-03
"%01%;%$ %0%4% 47&1%;%  ,-03  )'21((-/76-212*1'-)16246,
74%5-%1%1')564;34-24626,)%6)
421<)+) 47&1%;%'7/674)
"%01%;%$%/0;.-% 47&1%;%  ,-03 
2/6%8.%47&1%;%  ,-03 
"%01%;%$ %0%4% 47&1%;% $,-0344-8%/2*%4/;7423)%1%40)44)/%6)(
%1')564;34-24626,)%6)421<)+)
47&1%;%'7/674) 6%6-56-'59-6,
1%62/-%$)2/-6,-'-156)%(2*$
%4)5-0-/%4#1265,291
"%01%;%$%/0;.-% 47&1%;% $,-03
2/6%8.%47&1%;% $,-03 
 "%01%;%$ %0%4% 40)1-%1$  
-**)4)165274')2*(-/76-212*1'-)16
246,74%5-%1%1')564;34-24626,)
"%01%;%)%4%56)418534-24626,)
47&1%;%%4/;7423)%1%40)4
4)/%6)(
 "%01%;%$%/0;.-% 40)1-%1$  
 2/6%8.%40)1-%1 $ 
"%01%;%$ %0%4% 47&1%;% 40)1-%1$  
"%01%;%$%/0;.-% 47&1%;% 40)1-%1$  
2/6%8.%47&1%;% 40)1-%1$  
 



 


!
   
! 1%62/-%$)2/-6,-' 71+%4;$ %4/;7423)%1*%40)459)4)*240)(&;
%(0-:674)&)69))11%62/-%)2/-6,-'%1(
!6,)1215-+1-*-'%165-+1%/-16,)
&)4-%1$0%;&)(7)62+)1)6-'(4-*6
53)'-*-'626,-53237/%6-21
! 1%62/-%$)2/-6,-' $ 
! 1%62/-%$)2/-6,-' &)4-%$ 
 40)1-%12/6%8.% %4/;%1(-((/)421<)+)56)33)
3%5624%/-5659)4)*240)(&;%(0-:674)
&)69))1%1(%3237/%6-212*)%4
%56)41%1')564;
 40)1-%1"%01%;%$%/0;.-%  
 40)1-%1"%01%;%$ %0%4%  
$"%01%;%$ %0%4% 47&1%;% 
47&1%;%9%5*240)(&;%(0-:674)
&)69))13237/%6-2154)/%6)(62"%01%;%
%1(%4/;7423)%1%40)45
Article
reSeArcH
© 2015 Macmillan Publishers Limited. All rights reserved
Extended Data Table 3 | Twelve genome-wide significant signals of selection



[Table body not reproduced: SNP identifiers, coordinates and gene names are not recoverable from the extracted text. Potential functions of the twelve signals, in row order: lactase persistence; skin pigmentation; immunity; fatty acid metabolism; immunity; unknown; vitamin D metabolism; skin pigmentation; ergothioneine transport; autophagy and lung function; unknown; eye color.]
Chromosome/Position/Range, co-ordinates (hg19) of the SNP with the most significant signal, and the approximate range in which genome-wide significant SNPs are found. Genes, genes in which
the top SNP is located, and selected nearby genes. Potential function, function of the gene, or specific trait under selection. Marked with an asterisk if the signal was still genome-wide significant in an
analysis that used only the populations that correspond best to the three ancestral populations (WHG, Anatolian Neolithic and Bronze Age steppe), resulting in a less powerful test with the effective
number of chromosomes analysed at the average SNP reduced from 125 to 50, a genomic control correction of 1.32, and five genome-wide significant loci that are a subset of the original twelve.
Refs 53–59 are cited in this table.
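The genomic control correction mentioned in this legend can be illustrated with the conventional recipe: estimate an inflation factor as the ratio of the observed median test statistic to the median expected under the null, then divide every statistic by that factor. The sketch below is not the authors' code. It assumes chi-square statistics with 1 degree of freedom by default, which this excerpt does not confirm for the scan, so the expected median is left as a parameter; the function names and example values are hypothetical.

```python
from statistics import NormalDist, median
from typing import List, Sequence

# Median of a chi-square distribution with 1 degree of freedom (about 0.455),
# obtained as the squared 75th percentile of the standard normal.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_inflation_factor(
    chi2_stats: Sequence[float], expected_median: float = CHI2_1DF_MEDIAN
) -> float:
    """Ratio of the observed median statistic to its expected null median."""
    return median(chi2_stats) / expected_median


def genomic_control(
    chi2_stats: Sequence[float], expected_median: float = CHI2_1DF_MEDIAN
) -> List[float]:
    """Divide every statistic by the estimated inflation factor."""
    lam = genomic_inflation_factor(chi2_stats, expected_median)
    return [x / lam for x in chi2_stats]


if __name__ == "__main__":
    # Hypothetical statistics whose median is inflated by a factor of 1.32.
    stats = [CHI2_1DF_MEDIAN * 1.32 * k for k in (0.5, 1.0, 2.0)]
    print("lambda:", round(genomic_inflation_factor(stats), 3))
    print("corrected:", [round(x, 3) for x in genomic_control(stats)])
```

A factor of 1.32 would mean that the median statistic is about a third larger than expected under the null, so each statistic is scaled down by that amount before significance is assessed.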
... Palaeogenomic studies have revealed that the genetic trajectory of modern Europeans has been strongly shaped by the expansion into Europe of early farmers from the Aegean Basin and western Anatolia starting around 8,600 YBP (1)(2)(3), introducing agriculture in regions that were previously dominated by hunter-gatherer lifestyles (4). The question of whether the Neolithic transition was primarily a cultural process, in which hunter-gatherer groups adopted farming from neighboring communities (5), or instead a demic process, in which farmers migrated from Southwest Asia into Europe (6), has long been discussed, with studies highlighting the regional role of both processes (e.g. ...
... Subsequent genomic studies supported these findings (e.g. 9), successfully tracing the ancestral roots of early Neolithic populations in Europe to the area encompassing Western Anatolia and the Aegean basin (2,3). It was further established that the Early Neolithic inhabitants of present-day Iran differed genetically from Aegean and European farmers, demonstrating that the Neolithic migration chain did not begin in the Fertile Crescent as some researchers have long assumed (10). ...
... [19][20][21]. Some palaeogenomic data support the assertion that hunter-gatherers and farmers only admixed sporadically at the outset of the Neolithic (3,22). Other studies indicate that admixture intensified at later stages of the Neolithic (starting approximately 7,000 YBP, 17,18,21,23), possibly after the so-called 'crisis' of the LBK (Linearbandkeramik). ...
Preprint
Full-text available
The replacement of hunter-gatherer lifestyles by agriculture represents a pivotal change in human history. The initial stage of this Neolithic transition in Europe was instigated by the migration of farmers from Anatolia and the Aegean basin. In this study, we modeled the expansion of Neolithic farmers into Central Europe from Anatolia, along the Continental route of dispersal. We employed spatially explicit simulations of palaeogenomic diversity and high-quality palaeogenomic data from 67 prehistoric individuals to assess how population dynamics between indigenous European hunter-gatherers and incoming farmers varied across space and time. Our results demonstrate that admixture between the two groups increased locally over time at each stage of the Neolithic expansion along the Continental route. We estimate that the effective population size of farmers was about five times that of the hunter-gatherers. Additionally, we infer that sporadic long distance migrations of early farmers contributed to their rapid dispersal, while competitive interactions with hunter-gatherers were limited. Teaser The first farmers of continental Europe increasingly admixed over time with indigenous hunter-gatherers.
... Therefore, selection might have acted against the epigenetic inactivation of the lactase gene during infancy. Indeed, some authors report evidence of intensive selective pressure favoring lactase persistence [4][5][6][7][8][9][10][11][12]. It is most likely the cause for many populations throughout Europe, Asia, and Africa, showing a high frequency of lactase persistence. ...
... Amplified libraries were purified using SPRI technology 61 as described in Gansauge et al. (2020). Libraries were enriched using a bait set targeting approximately 1.2 M SNP positions (in-solution hybridization capture) 32,62,63 . Libraries that were prepared at the Core Unit were subject to two consecutive rounds of capture as described in Fu et al. (2013). ...
Preprint
Full-text available
Archaeological and archaeogenetic studies have highlighted the pivotal role of the Caucasus region throughout prehistory, serving as a central hub for cultural, technological, and linguistic innovations. However, despite its dynamic history, the critical area between the Greater and Lesser Caucasus mountain ranges, mainly corresponding to modern-day Georgia, has received limited attention. Here, we generated an ancient DNA time transect consisting of 219 individuals with genome-wide data from 47 sites in this region, supplemented by 97 new radiocarbon dates. Spanning from the Early Bronze Age 5000 years ago to the so-called "Migration Period" that followed the fall of the Western Roman Empire, we document a largely persisting local gene pool that continuously assimilated migrants from Anatolia/Levant and the populations of the adjacent Eurasian steppe. More specifically, we observe these admixture events as early as the Middle Bronze Age. Starting with Late Antiquity (late first century AD), we also detect an increasing number of individuals with more southern ancestry, more frequently associated with urban centers - landmarks of the early Christianization in eastern Georgia. Finally, in the Early Medieval Period starting 400 AD, we observe genetic outlier individuals with ancestry from the Central Eurasian steppe, with artificial cranial deformations (ACD) in several cases. At the same time, we reveal that many individuals with ACD descended from native South Caucasus groups, indicating that the local population likely adopted this cultural practice.
... Captured libraries were sequenced either on a HiSeq4000 with 75 bp single-end sequencing chemistry (1 × 76 + 8 + 8 cycles) or on a NextSeq500 with 75 bp paired-end sequencing chemistry (2 × 76 + 8 + 8 cycles). A further 41 libraries from the sites of St. Rombout's cemetery in Mechelen and Laguna de los Cóndores were captured using the 1,240k SNP capture array 72,81,82 , which is widely used in genome-wide studies of human ancient DNA (Supplementary Methods 3). ...
Article
Full-text available
Malaria-causing protozoa of the genus Plasmodium have exerted one of the strongest selective pressures on the human genome, and resistance alleles provide biomolecular footprints that outline the historical reach of these species¹. Nevertheless, debate persists over when and how malaria parasites emerged as human pathogens and spread around the globe1,2. To address these questions, we generated high-coverage ancient mitochondrial and nuclear genome-wide data from P. falciparum, P. vivax and P. malariae from 16 countries spanning around 5,500 years of human history. We identified P. vivax and P. falciparum across geographically disparate regions of Eurasia from as early as the fourth and first millennia bce, respectively; for P. vivax, this evidence pre-dates textual references by several millennia³. Genomic analysis supports distinct disease histories for P. falciparum and P. vivax in the Americas: similarities between now-eliminated European and peri-contact South American strains indicate that European colonizers were the source of American P. vivax, whereas the trans-Atlantic slave trade probably introduced P. falciparum into the Americas. Our data underscore the role of cross-cultural contacts in the dissemination of malaria, laying the biomolecular foundation for future palaeo-epidemiological research into the impact of Plasmodium parasites on human history. Finally, our unexpected discovery of P. falciparum in the high-altitude Himalayas provides a rare case study in which individual mobility can be inferred from infection status, adding to our knowledge of cross-cultural connectivity in the region nearly three millennia ago.
... We report genome-wide SNP data for nine individuals from the ST site Rostovka, new data for two BOO individuals, and shotgun genome data for five already published BOO individuals (Table 1). We performed 1240k SNP 22,23 and mitochondrial genome captures on the nine individuals from ROT, and the two new BOO individuals, as well as Y-chromosomal capture 24 on just the males. Lastly, we generated shotgun sequence data for five published BOO individuals, including one 40× covered individual (Fig. 1a . ...
Article
Full-text available
The Eurasian Bronze Age (BA) has been described as a period of substantial human migrations, the emergence of pastoralism, horse domestication, and development of metallurgy. This study focuses on two north Eurasian sites sharing Siberian genetic ancestry. One of the sites, Rostovka, is associated with the Seima-Turbino (ST) phenomenon (~2200-1900 BCE) that is characterized by elaborate metallurgical objects found throughout Northern Eurasia. The genetic profiles of Rostovka individuals vary widely along the forest-tundra Siberian genetic cline represented by many modern Uralic-speaking populations, and the genetic heterogeneity observed is consistent with the current understanding of the ST being a transcultural phenomenon. Individuals from the second site, Bolshoy Oleni Ostrov in Kola, in comparison form a tighter cluster on the Siberian ancestry cline. We further explore this Siberian ancestry profile and assess the role of the ST phenomenon and other contemporaneous BA cultures in the spread of Uralic languages and Siberian ancestry.
... In the last decade, advancements in molecular biology technology have facilitated the isolation and sequencing of DNA extracted from ancient remains, thereby deepening our understanding of past populations and revealing intriguing and sometimes contradictory insights, especially regarding the most common SEE Y haplogroup-I2a. Ancient DNA proof confirmed haplogroup I as one of the oldest European haplogroups-it has been detected in Palaeolithic hunter-gatherers from Switzerland, Hungary, and Scandinavia, as well as in Neolithic and Bronze Age samples from Hungary, Germany, and Iberia [8,12,14,[52][53][54][55][56]. For example, complete NRY sequences of five male individuals (one from Luxemburg and four from Sweden), dated to 8 kya, showed that all five belonged to the I haplogroup [53]. ...
Article
Full-text available
Due to its turbulent demographic history, marked by extensive settlement and gene flow from diverse regions of Eurasia, Southeastern Europe (SEE) has consistently served as a genetic crossroads between East and West and a junction for the migrations that reshaped Europe’s population. SEE, including modern Croatian territory, was a crucial passage from the Near East and even more distant regions and human populations in this region, as almost any other European population represents a remarkable genetic mixture. Modern humans have continuously occupied this region since the Upper Paleolithic era, and different (pre)historical events have left a distinctive genetic signature on the historical narrative of this region. Our views of its history have been mostly renewed in the last few decades by extraordinary data obtained from Y-chromosome studies. In recent times, the international research community, bringing together geneticists and archaeologists, has steadily released a growing number of ancient genomes from this region, shedding more light on its complex past population dynamics and shaping the genetic pool in Croatia and this part of Europe.
Article
The Merovingian period (5th to 8th cc AD) was a time of demographic, socioeconomic, cultural, and political realignment in Western Europe. Here, we report the whole-genome shotgun sequence data of 30 human skeletal remains from a coastal Late Merovingian site of Koksijde (675 to 750 AD), alongside 18 remains from two Early to Late Medieval sites in present-day Flanders, Belgium. We find two distinct ancestries, one shared with Early Medieval England and the Netherlands, while the other, minor component, reflecting likely continental Gaulish ancestry. Kinship analyses identified no large pedigrees characteristic to elite burials revealing instead a high modularity of distant relationships among individuals of the main ancestry group. In contrast, individuals with >90% Gaulish ancestry had no kinship links among sampled individuals. Evidence for population structure and major differences in the extent of Gaulish ancestry in the main group, including in a mother–daughter pair, suggests ongoing admixture in the community at the time of their burial. The isotopic and genetic evidence combined supports a model by which the burials, representing an established coastal nonelite community, had incorporated migrants from inland populations. The main group of burials at Koksijde shows an abundance of >5 cM long shared allelic intervals with the High Medieval site nearby, implying long-term continuity and suggesting that similarly to Britain, the Early Medieval ancestry shifts left a significant and long-lasting impact on the genetic makeup of the Flemish population. We find substantial allele frequency differences between the two ancestry groups in pigmentation and diet-associated variants, including those linked with lactase persistence, likely reflecting ancestry change rather than local adaptation.
Article
Full-text available
The third millennium BCE was a pivotal period of profound cultural and genomic transformations in Europe associated with migrations from the Pontic-Caspian steppe, which shaped the ancestry patterns in the present-day European genome. We performed a high-resolution whole-genome analysis including haplotype phasing of seven individuals of a collective burial from ~2500 cal BCE and of a Bell Beaker individual from ~2300 cal BCE in the Paris Basin in France. The collective burial revealed the arrival in real time of steppe ancestry in France. We reconstructed the genome of an unsampled individual through its relatives’ genomes, enabling us to shed light on the early-stage admixture patterns, dynamics, and propagation of steppe ancestry in Late Neolithic Europe. We identified two major Neolithic/steppe-related ancestry admixture pulses around 3000/2900 BCE and 2600 BCE. These pulses suggest different population expansion dynamics with striking links to the Corded Ware and Bell Beaker cultural complexes.
Article
Full-text available
The ancient city of Chichén Itzá in Yucatán, Mexico, was one of the largest and most influential Maya settlements during the Late and Terminal Classic periods (ad 600–1000) and it remains one of the most intensively studied archaeological sites in Mesoamerica1–4. However, many questions about the social and cultural use of its ceremonial spaces, as well as its population’s genetic ties to other Mesoamerican groups, remain unanswered². Here we present genome-wide data obtained from 64 subadult individuals dating to around ad 500–900 that were found in a subterranean mass burial near the Sacred Cenote (sinkhole) in the ceremonial centre of Chichén Itzá. Genetic analyses showed that all analysed individuals were male and several individuals were closely related, including two pairs of monozygotic twins. Twins feature prominently in Mayan and broader Mesoamerican mythology, where they embody qualities of duality among deities and heroes⁵, but until now they had not been identified in ancient Mayan mortuary contexts. Genetic comparison to present-day people in the region shows genetic continuity with the ancient inhabitants of Chichén Itzá, except at certain genetic loci related to human immunity, including the human leukocyte antigen complex, suggesting signals of adaptation due to infectious diseases introduced to the region during the colonial period.
Preprint
Full-text available
Scalable methods for estimating marginal coalescent trees across the genome present new opportunities for studying evolution and have generated considerable excitement, with new methods extending scalability to thousands of samples. Benchmarking of the available methods has revealed general tradeoffs between accuracy and scalability, but performance in downstream applications has not always been easily predictable from general performance measures, suggesting that specific features of the ARG may be important for specific downstream applications of estimated ARGs. To exemplify this point, we benchmark ARG estimation methods with respect to a specific set of methods for estimating the historical time course of a population-mean polygenic score (PGS) using the marginal coalescent trees encoded by the ancestral recombination graph (ARG). Here we examine the performance in simulation of six ARG estimation methods: ARGweaver, RENT+, Relate, tsinfer+tsdate, ARG-Needle/ASMC-clust, and SINGER, using their estimated coalescent trees and examining bias, mean squared error (MSE), confidence interval coverage, and Type I and II error rates of the downstream methods. Although it does not scale to the sample sizes attainable by other new methods, SINGER produced the most accurate estimated PGS histories in many instances, even when Relate, tsinfer+tsdate, and ARG-Needle/ASMC-clust used samples ten times as large as those used by SINGER. In general, the best choice of method depends on the number of samples available and the historical time period of interest. In particular, the unprecedented sample sizes allowed by Relate, tsinfer+tsdate, and ARG-Needle/ASMC-clust are of greatest importance when the recent past is of interest---further back in time, most of the tree has coalesced, and differences in contemporary sample size are less salient.
Article
Full-text available
Most common human traits and diseases have a polygenic pattern of inheritance: DNA sequence variants at many genetic loci influence the phenotype. Genome-wide association (GWA) studies have identified more than 600 variants associated with human traits, but these typically explain small fractions of phenotypic variation, raising questions about the use of further studies. Here, using 183,727 individuals, we show that hundreds of genetic variants, in at least 180 loci, influence adult height, a highly heritable and classic polygenic trait. The large number of loci reveals patterns with important implications for genetic studies of common human diseases and traits. First, the 180 loci are not random, but instead are enriched for genes that are connected in biological pathways (P = 0.016) and that underlie skeletal growth defects (P?<?0.001). Second, the likely causal gene is often located near the most strongly associated variant: in 13 of 21 loci containing a known skeletal growth gene, that gene was closest to the associated variant. Third, at least 19 loci have multiple independently associated variants, suggesting that allelic heterogeneity is a frequent feature of polygenic traits, that comprehensive explorations of already-discovered loci should discover additional variants and that an appreciable fraction of associated loci may have been identified. Fourth, associated variants are enriched for likely functional effects on genes, being over-represented among variants that alter amino-acid structure of proteins and expression levels of nearby genes. Our data explain approximately 10% of the phenotypic variation in height, and we estimate that unidentified common variants of similar effect sizes would increase this figure to approximately 16% of phenotypic variation (approximately 20% of heritable variation). 
Although additional approaches are needed to dissect the genetic architecture of polygenic human traits fully, our findings indicate that GWA studies can identify large numbers of loci that implicate biologically relevant genes and pathways.
Article
Full-text available
The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.
Article
There is increasing evidence that genome-wide association (GWA) studies represent a powerful approach to the identification of genes involved in common human diseases. We describe a joint GWA study (using the Affymetrix GeneChip 500K Mapping Array Set) undertaken in the British population, which has examined similar to 2,000 individuals for each of 7 major diseases and a shared set of similar to 3,000 controls. Case-control comparisons identified 24 independent association signals at P < 5 X 10(-7): 1 in bipolar disorder, 1 in coronary artery disease, 9 in Crohn's disease, 3 in rheumatoid arthritis, 7 in type 1 diabetes and 3 in type 2 diabetes. On the basis of prior findings and replication studies thus-far completed, almost all of these signals reflect genuine susceptibility effects. We observed association at many previously identified loci, and found compelling evidence that some loci confer risk for more than one of the diseases studied. Across all diseases, we identified a large number of further signals (including 58 loci with single-point P values between 10(-5) and 5 X 10(-7)) likely to yield additional susceptibility loci. The importance of appropriately large samples was confirmed by the modest effect sizes observed at most loci identified. This study thus represents a thorough validation of the GWA approach. It has also demonstrated that careful use of a shared control group represents a safe and effective approach to GWA analyses of multiple disease phenotypes; has generated a genome-wide genotype database for future studies of common diseases in the British population; and shown that, provided individuals with non-European ancestry are excluded, the extent of population stratification in the British population is generally modest. Our findings offer new avenues for exploring the pathophysiology of these important disorders. 
We anticipate that our data, results and software, which will be widely available to other investigators, will provide a powerful resource for human genetics research.
Article
North East Europe harbors a high diversity of cultures and languages, suggesting a complex genetic history. Archaeological, anthropological, and genetic research has revealed a series of influences from Western and Eastern Eurasia in the past. While genetic data from modern-day populations is commonly used to make inferences about their origins and past migrations, ancient DNA provides a powerful test of such hypotheses by giving a snapshot of the past genetic diversity. In order to better understand the dynamics that have shaped the gene pool of North East Europeans, we generated and analyzed 34 mitochondrial genotypes from the skeletal remains of three archaeological sites in northwest Russia. These sites were dated to the Mesolithic and the Early Metal Age (7,500 and 3,500 uncalibrated years Before Present). We applied a suite of population genetic analyses (principal component analysis, genetic distance mapping, haplotype sharing analyses) and compared past demographic models through coalescent simulations using Bayesian Serial SimCoal and Approximate Bayesian Computation. Comparisons of genetic data from ancient and modern-day populations revealed significant changes in the mitochondrial makeup of North East Europeans through time. Mesolithic foragers showed high frequencies and diversity of haplogroups U (U2e, U4, U5a), a pattern observed previously in European hunter-gatherers from Iberia to Scandinavia. In contrast, the presence of mitochondrial DNA haplogroups C, D, and Z in Early Metal Age individuals suggested discontinuity with Mesolithic hunter-gatherers and genetic influx from central/eastern Siberia. We identified remarkable genetic dissimilarities between prehistoric and modern-day North East Europeans/Saami, which suggests an important role of post-Mesolithic migrations from Western Europe and subsequent population replacement/extinctions. 
This work demonstrates how ancient DNA can improve our understanding of human population movements across Eurasia. It contributes to the description of the spatio-temporal distribution of mitochondrial diversity and will be of significance for future reconstructions of the history of Europeans.
Article
In Europe, the Neolithic transition (8,000–4,000 b.c.) from hunting and gathering to agricultural communities was one of the most important demographic events since the initial peopling of Europe by anatomically modern humans in the Upper Paleolithic (40,000 b.c.). However, the nature and speed of this transition are a matter of continuing scientific debate in archaeology, anthropology, and human population genetics. To date, inferences about the genetic makeup of past populations have mostly been drawn from studies of modern-day Eurasian populations, but increasingly ancient DNA studies offer a direct view of the genetic past. We genetically characterized a population of the earliest farming culture in Central Europe, the Linear Pottery Culture (LBK; 5,500–4,900 calibrated b.c.) and used comprehensive phylogeographic and population genetic analyses to locate its origins within the broader Eurasian region, and to trace potential dispersal routes into Europe. We cloned and sequenced the mitochondrial hypervariable segment I and designed two powerful SNP multiplex PCR systems to generate new mitochondrial and Y-chromosomal data from 21 individuals from a complete LBK graveyard at Derenburg Meerenstieg II in Germany. These results considerably extend the available genetic dataset for the LBK (n = 42) and permit the first detailed genetic analysis of the earliest Neolithic culture in Central Europe (5,500–4,900 calibrated b.c.). We characterized the Neolithic mitochondrial DNA sequence diversity and geographical affinities of the early farmers using a large database of extant Western Eurasian populations (n = 23,394) and a wide range of population genetic analyses including shared haplotype analyses, principal component analyses, multidimensional scaling, geographic mapping of genetic distances, and Bayesian Serial SimCoal analyses.
The results reveal that the LBK population shared an affinity with the modern-day Near East and Anatolia, supporting a major genetic input from this area during the advent of farming in Europe. However, the LBK population also showed unique genetic features including a clearly distinct distribution of mitochondrial haplogroup frequencies, confirming that major demographic events continued to take place in Europe after the early Neolithic.
Article
Environmental disorders associated with vitamin D deficiency include musculoskeletal disorders (childhood rickets, osteomalacia, and fractures), and may include extraskeletal disorders (diabetes, cardiovascular disease, risk of falls, and cancer). There is high interindividual variability in the occurrence of both musculoskeletal and extraskeletal disorders. Previous twin and family studies suggested that genetic factors play a significant role in this variability. Little data exist on the possible effects of common genetic variation on vitamin D status; the available studies have been small and examined only small numbers of variants. The SUNLIGHT consortium (study of underlying genetic determinants of vitamin D and highly related traits) was a multicenter genome-wide association study designed to identify common genetic variants that affect vitamin D concentrations and increase the risk of vitamin D insufficiency. Concentrations of vitamin D were determined in 33,996 individuals of European descent from 15 epidemiologic cohorts. Of these cohorts, 5 were designated as discovery cohorts (n = 16,125), 5 as in-silico replication cohorts (n = 9367), and 5 as de novo replication cohorts (n = 8504). Genome-wide analyses were conducted in all cohorts. Methods used to measure 25-hydroxyvitamin D concentrations varied between cohorts and included radioimmunoassay, chemiluminescent assay, enzyme-linked immunosorbent assay, or mass spectrometry. Vitamin D insufficiency was defined as a concentration lower than 75 nmol/L or lower than 50 nmol/L. Combined effect estimates from the logistic regression analysis across cohorts were calculated by meta-analysis using a weighted Z-score-based approach. A genotype score was constructed by taking a weighted average of the confirmed variants.
Genome-wide significance for association with 25-hydroxyvitamin D concentration was reached for variants at 3 loci in the discovery cohorts and was confirmed in the replication cohorts: the first locus was 4p12, within or near the GC gene (overall P = 1.9 × 10^−109 for rs2282679); the second was 11q12, near DHCR7 (P = 2.1 × 10^−27 for rs12785878); and the third was 11p15, near CYP2R1 (P = 3.3 × 10^−20 for rs10741657). The first locus (GC) encodes a liver-synthesized protein that transports vitamin D and its metabolites, the second (DHCR7) encodes an enzyme involved in cholesterol synthesis, and the third (CYP2R1) encodes an enzyme implicated in 25-hydroxylation of vitamin D in the liver. Genome-wide significance in the pooled sample was reached for variants at a fourth locus, 20q13, near CYP24A1 (P = 6.0 × 10^−10 for rs6013897). A genotype score in the highest quartile for the 3 confirmed variants, compared with a score in the lowest quartile, was associated with a substantial increase in the risk of vitamin D insufficiency (25-hydroxyvitamin D concentrations <75 nmol/L or <50 nmol/L): the odds ratio for <75 nmol/L was 2.47 (95% confidence interval, 2.20–2.78; P = 2.3 × 10^−48), and the odds ratio for <50 nmol/L was 1.92 (95% confidence interval, 1.70–2.16; P = 1.0 × 10^−26). These findings identify genetic variants at 3 confirmed loci involved in regulation of circulating 25-hydroxyvitamin D concentrations that substantially increase the risk of vitamin D insufficiency.
Article
Roughly half the world's population speaks languages derived from a shared linguistic source known as Proto-Indo-European. But who were the early speakers of this ancient mother tongue, and how did they manage to spread it around the globe? Until now their identity has remained a tantalizing mystery to linguists, archaeologists, and even Nazis seeking the roots of the Aryan race. The Horse, the Wheel, and Language lifts the veil that has long shrouded these original Indo-European speakers, and reveals how their domestication of horses and use of the wheel spread language and transformed civilization. David Anthony identifies the prehistoric peoples of central Eurasia's steppe grasslands as the original speakers of Proto-Indo-European, and shows how their innovative use of the ox wagon, horseback riding, and the warrior's chariot turned the Eurasian steppes into a thriving transcontinental corridor of communication, commerce, and cultural exchange. He explains how they spread their traditions and gave rise to important advances in copper mining, warfare, and patron-client political institutions, thereby ushering in an era of vibrant social change. Anthony describes his discovery of how the wear from bits on ancient horse teeth reveals the origins of horseback riding. And he introduces a new approach to linking prehistoric archaeological remains with the development of language. The Horse, the Wheel, and Language solves a puzzle that has vexed scholars for two centuries--the source of the Indo-European languages and English--and recovers a magnificent and influential civilization from the past.