Adaptive evolution during an ongoing range expansion:
the invasive bank vole (Myodes glareolus) in Ireland
THOMAS A. WHITE,*† SARAH E. PERKINS,‡ GERALD HECKEL†§ and JEREMY B. SEARLE*
*Department of Ecology and Evolutionary Biology, Cornell University, Corson Hall, Ithaca, NY 14853-2701, USA,
†Computational and Molecular Population Genetics (CMPG), Institute of Ecology and Evolution, University of Bern,
Baltzerstrasse 6, CH-3012, Bern, Switzerland, ‡School of Biosciences, Cardiff University, Sir Martin Evans Building, Museum
Avenue, Cardiff, CF10 3AX, UK, §Swiss Institute of Bioinformatics, Genopode, CH 1015 Lausanne, Switzerland
Abstract
Range expansions are extremely common, but have only recently begun to attract
attention in terms of their genetic consequences. As populations expand, demes at the
wave front experience strong genetic drift, which is expected to reduce genetic diver-
sity and potentially cause ‘allele surfing’, where alleles may become fixed over a wide
geographical area even if their effects are deleterious. Previous simulation models
show that range expansions can generate very strong selective gradients on dispersal,
reproduction, competition and immunity. To investigate the effects of range expansion
on genetic diversity and adaptation, we studied the population genomics of the bank
vole (Myodes glareolus) in Ireland. The bank vole was likely introduced in the late
1920s and is expanding its range at a rate of ~2.5 km/year. Using genotyping-by-
sequencing, we genotyped 281 bank voles at 5979 SNP loci. Fourteen sample sites were
arranged in three transects running from the introduction site to the wave front of the
expansion. We found significant declines in genetic diversity along all three transects.
However, there was no evidence that sites at the wave front had accumulated more
deleterious mutations. We looked for outlier loci with strong correlations between
allele frequency and distance from the introduction site, where the direction of correla-
tion was the same in all three transects. Amongst these outliers, we found significant
enrichment for genic SNPs, suggesting the action of selection. Candidates for selection
included several genes with immunological functions and several genes that could
influence behaviour.
Keywords: allele frequency cline, genotyping-by-sequencing, nonmodel, outlier, population
genomics, RAD
Received 21 November 2012; accepted 3 April 2013
Introduction
Many empirical studies of the genetic consequences of
species introductions have tended to focus on the intro-
duction event itself (e.g. Tsutsui et al. 2000; Kolbe et al.
2004; Bossdorf et al. 2005) and fail to consider the
genetic consequences of the subsequent range expan-
sion—an integral part of successful establishment of
any invasive species—in an explicitly spatial context.
The genetic consequences of range expansion are not
only important for invasive species. Many, if not most,
species have recently experienced range expansions
(Excoffier et al. 2009); examples include the expansion
of species from refugia following glacial retreat or
advance (Hewitt 2000), recovery of species after perse-
cution or overexploitation (Lubina & Levin 1988), the
current movement of species due to climate change
(Parmesan & Yohe 2003), expansions associated with
geological events (Marshall et al. 1982), the spread of
species with novel adaptations, such as the expansion
of anatomically modern humans out of Africa (Fagun-
des et al. 2007), and the spread of pathogens during
disease epidemics (Biek et al. 2007; Velo-Antón et al.
Correspondence: Thomas A. White,
E-mail: tawhite201@gmail.com
©2013 John Wiley & Sons Ltd
Molecular Ecology (2013) 22, 2971–2985 doi: 10.1111/mec.12343
2012). Despite their frequency, researchers have only
recently begun to appreciate the importance of range
expansions in shaping the current distribution of
genetic diversity at both neutral and functional loci
(Prugnolle et al. 2005; Handley et al. 2007; Besold et al.
2008; Buckley et al. 2012; Velo-Antón et al. 2012; Waters
et al. 2012).
Theoretical studies have shown that range expansions
are fundamentally different from purely demographic
expansions. As a population expands its range, it
undergoes a series of founder events, which can lead to
fluctuations in allele frequency and stochastic loss of
alleles (Slatkin & Excoffier 2012). Range expansions are
generally associated with decreasing allelic richness and
heterozygosity with increasing distance along the axis
of expansion (Estoup et al. 2004; Heckel et al. 2005;
Prugnolle et al. 2005; Handley et al. 2007; Besold et al.
2008; Parisod & Bonvin 2008; Velo-Antón et al. 2012).
This reduced genetic diversity and associated inbreed-
ing may negatively impact the fitness of individuals at
the expanding range margin. Edmonds et al. (2004) and
Klopfstein et al. (2006) demonstrated that neutral muta-
tions arising on the edge of a range expansion can
sometimes ‘surf’ on the wave of the advance and reach
higher frequencies than would be expected in a popula-
tion at equilibrium. Klopfstein et al. (2006) suggested
that this phenomenon could lead to increased rates of
evolution at range margins. However, Travis et al.
(2007) have shown with simulation models that deleteri-
ous mutations can also surf to high frequencies at
expanding range margins, including mutations having a
negative effect on reproductive rate and juvenile com-
petitive ability. Despite these theoretical insights, there
remain very few empirical studies that have tested their
predictions, and the distribution of genetic diversity in
expanding populations, and its significance, remains
poorly understood.
In addition to strong drift and allele surfing, range
expansions may generate very strong selection pres-
sures. Simulation modelling predicts that individuals at
the expanding wave front should experience selection
for increased dispersal and reproduction (Travis &
Dytham 2002). This is due to a combination of spatial
sorting (Shine et al. 2011) and natural selection (acting
over multiple generations; Travis et al. 2009) favouring
individuals at the edge of an expansion. Evolution of
dispersal and reproduction during range expansions
has now been documented in a number of taxa, includ-
ing plants (Cwynar & MacDonald 1987; Monty & Mahy
2010), amphibians (Phillips et al. 2006), humans (Mo-
reau et al. 2011) and insects (Simmons & Thomas 2004;
Hughes et al. 2007). The process of range expansion
can also influence host–parasite interactions. During a
range expansion, parasites and pathogens may lag
behind their hosts, due to both stochastic loss and low
host density at the wave front of the expansion (Phil-
lips et al. 2010). Where trade-offs exist, individuals at
the wave front should therefore invest less in intraspe-
cific competition (Burton et al. 2010) and immune
defence (Phillips et al. 2010). If such a lag does occur,
these traits may also experience relaxed selection at the
genic level, for example if specific antigen receptor
alleles are no longer required for parasite or pathogen
recognition. In longer established populations behind
the wave front of the expansion, host densities and
parasite burdens are expected gradually to return to
baseline levels, so here selection should favour invest-
ment in intraspecific competition and immunity over
dispersal. To the extent that dispersal, reproduction,
competitive ability and immunity are genetically deter-
mined, spatial sorting and natural selection should be
reflected by allele frequency clines along the axis of
expansion at loci influencing these traits (Hancock et al.
2010a).
However, detecting such adaptations at the genetic
level is expected to present a number of challenges.
Many of the traits in which we might expect to see
adaptation are polygenic, and much of the adaptation
is predicted to come from standing genetic variation
rather than new mutations (Barrett & Schluter 2008).
Therefore, adaptation is expected to occur via subtle
shifts in allele frequencies (Hancock et al. 2010a) rather
than hard sweeps (Novembre & Han 2012). Outlier
approaches based on F_ST values are unlikely to be
useful in this case (Hancock et al. 2010a), as selection
is unlikely to create large differences in allele frequencies
between populations. In addition, F_ST-based
approaches are unable to distinguish allele frequency
variation that is related to an underlying environmen-
tal variable or gradient (such as distance along the
axis of a range expansion) vs. variation that follows a
spatially incoherent pattern (Yang et al. 2012). The pre-
vious rationale underlying genome scans for selection
has been that drift and demographic processes affect
the entire genome, and therefore, unusual patterns at
particular loci should reflect the action of selection
(Zayed & Whitfield 2008). It is now known that ‘allele
surfing’ can generate clines in allele frequencies, but
this affects loci at random (Excoffier et al. 2009). Therefore,
a naïve genome scan may reveal many loci that
are putatively under selection but which are actually
false positives (Hofer et al. 2009). It is unlikely that
any method will be able to overcome this problem
completely, but it may be possible to minimize the
problem using replication. An allele frequency cline
at a locus in one region may be due to drift or selec-
tion caused by some underlying environmental
variable. However, the direction of drift or surfing is
independent between different ‘sectors’ of the expan-
sion (Hallatschek et al. 2007; Excoffier & Ray 2008).
Clines in the same direction in multiple regions are
therefore less likely to be due to drift.
Here, we report the results of one of the first popula-
tion genomic studies of an ongoing range expansion.
Our study system is the bank vole, Myodes glareolus, in
Ireland. The bank vole is a small rodent distributed
throughout much of Eurasia from Iberia to central Sibe-
ria and from the Mediterranean to Scandinavia, but not
recorded in Ireland until 1964 (Claassens & O’Gorman
1965). Previous studies of mtDNA variation and para-
site distribution support a single introduction event
involving a small number of founders arriving in the
late 1920s on the southern shore of the Shannon Estuary
(Fairley 1971; Ryan et al. 1996; Stuart et al. 2007). Stuart
et al. (2007) place the arrival in 1926 at the deep-water
port of Foynes, as this coincides with the importation of
heavy earth moving equipment from Germany prior to
the construction of the Shannon hydroelectricity
scheme. Since its introduction, the vole has occupied
approximately one-third of the island of Ireland and is
continuing to expand its range at a constant rate of
c. 2.5 km/year (White et al. 2012).
Using a genotyping-by-sequencing (GBS) approach,
we simultaneously identify and genotype a large panel
of SNPs for the bank vole in Ireland. We report changes
in genic and nongenic diversity over the course of the
range expansion and develop a new approach to iden-
tify loci under selection. Importantly, we identify
genetic signatures of adaptation to the process of range
expansion itself.
Methods
Sampling and DNA extraction
In autumn 2010 and summer 2011, 281 bank voles
were sampled from 14 sample sites in Ireland
(Table 1). These sites were arranged in three transects
running from the site of introduction at Foynes out to
the expansion front, to the north, the northeast and
the east (Fig. 1). Voles were euthanized by isoflurane
overdose followed by cervical dislocation. For each
vole, a piece of liver tissue was placed in an Eppen-
dorf tube with 95% ethanol. Genomic DNA was
extracted using the DNeasy kit from Qiagen.
Genotyping-by-sequencing (GBS)
Extracted DNA was sent to the Cornell Institute for
Genomic Diversity to conduct GBS. GBS (Elshire et al.
2011) is a simple technique for constructing reduced
representation libraries for the Illumina sequencing plat-
form and is conceptually similar to RAD sequencing
(Hohenlohe et al. 2010). Briefly, DNA from each indi-
vidual was separately digested using the restriction
enzyme PstI (CTGCAG). The fragmented DNA was
then ligated to a barcoded adaptor and a common
adaptor with appropriate sticky ends. The digestion
and ligation were carried out in a 96-well plate. The
wells each contained DNA from a different individual
and a barcoded adaptor unique to that well. One con-
trol well did not contain any DNA. After ligation, the
wells were pooled into one Eppendorf tube and cleaned
Table 1 Sampling information, detailing the names of sample sites, their locations, the transects on which they fall, sample sizes (n),
start and end of trapping periods, distance from the introduction site at Foynes and three measures of genetic diversity for all 5979
SNPs: mean expected heterozygosity per locus (H_e), mean alleles per locus (A) and mean allelic richness per locus (A_rich)

| Sample site | Latitude | Longitude | Transect | n | Trapping period | Distance from Foynes (km)* | H_e (all SNPs) | A (all SNPs) | A_rich (all SNPs) |
|---|---|---|---|---|---|---|---|---|---|
| Foynes | 52.574 | 9.140 | All | 20 | 09–12/07/2011 | 0 | 0.357 | 1.959 | 1.759 |
| Tulla | 52.795 | 8.731 | N | 20 | 27/07–02/08/2011 | 45 | 0.288 | 1.845 | 1.622 |
| Gort | 53.138 | 8.770 | N | 20 | 16–17/10/2010 | 84 | 0.265 | 1.782 | 1.571 |
| Tuam | 53.497 | 8.737 | N | 21 | 12–15/10/2010 | 124 | 0.256 | 1.757 | 1.553 |
| Cloonfad | 53.708 | 8.730 | N | 20 | 14–21/08/2011 | 148 | 0.253 | 1.750 | 1.546 |
| Limerick | 52.660 | 8.451 | NE | 20 | 13–14/07/2011 | 48 | 0.307 | 1.893 | 1.661 |
| Nenagh | 52.861 | 8.266 | NE | 20 | 01/11/2010 | 67 | 0.277 | 1.834 | 1.602 |
| Birr | 53.131 | 7.906 | NE | 20 | 28–30/10/2010 | 106 | 0.254 | 1.755 | 1.547 |
| Ballynahown | 53.360 | 7.871 | NE | 20 | 22–23/08/2011 | 129 | 0.237 | 1.715 | 1.513 |
| Adare | 52.471 | 8.765 | E | 20 | 21/11/2010 | 28 | 0.342 | 1.956 | 1.734 |
| Kilteely | 52.495 | 8.408 | E | 20 | 24–26/07/2011 | 50 | 0.312 | 1.906 | 1.674 |
| Cashel | 52.479 | 7.907 | E | 20 | 02/11/2010 | 87 | 0.301 | 1.878 | 1.650 |
| Windgap | 52.438 | 7.404 | E | 20 | 07–08/08/2011 | 119 | 0.309 | 1.896 | 1.665 |
| New Ross | 52.418 | 7.046 | E | 20 | 10–15/11/2010 | 146 | 0.300 | 1.874 | 1.648 |

*Shortest straight-line distance, except that the shortest land route around the Shannon Estuary was incorporated for transects
incorporating this feature.
using a Qiagen QIAquick PCR purification kit to form a
library. The library was then subjected to a PCR, using
long primers that matched the barcoded and common
adaptors. The PCR has two functions. One is to
perform a size-selection step, as the PCR preferentially
amplifies fragments of an ideal length for Illumina
sequencing. The second is that the long primers add a
length of sequence to the fragments in the library.
These sequences bind to the Illumina flow cell and are
also used to prime subsequent DNA sequencing reac-
tions. After PCR, the library was cleaned again using
a Qiagen QIAquick PCR purification kit. Each library
was then diluted and sent for sequencing using
single-end 100-bp reads on the Illumina HiSeq 2000 at
the Cornell Core Laboratories Center. To assess repeat-
ability of our approach, a second library was made
for 24 of the individuals, which was sequenced as
before.
SNP genotyping and annotation pipeline
Raw sequence files from Illumina were converted into
individual genotypes using the UNEAK pipeline, avail-
able as part of the TASSEL 3.0 software (Bradbury et al.
2007). Briefly, the UNEAK pipeline keeps good reads
with a barcode, cut site and no ‘N’s in the first 64 bp of
the sequence after the barcode. Reads are then trimmed
to 64 bp after the barcode. Identical reads are clustered
into tags, and counts of these tags present in each bar-
coded individual are stored. Following this, all unique
tags are merged, and their counts in the whole sample
of individuals are stored. Pairwise alignment of tags is
then performed, and tag pairs with 1-bp mismatch are
considered as candidate SNPs. With a certain error
tolerance rate (here set to 0.03), only reciprocal pairs of
tags are retained as SNPs, according to standard proto-
cols of the Cornell Institute for Genomic Diversity. Fol-
lowing SNP identification, counts of each tag (or allele)
are output for each locus and each individual. After
running UNEAK, individual genotypes were recalled
following the approach of Lynch (2009), using a global
sequencing error rate of 0.03. The likelihood of each
genotype was calculated using a multinomial sampling
distribution, and a genotype was called if it had an AIC
value at least four lower than the next best genotype.
Otherwise, the genotype was coded as ‘missing’. To
filter out potential paralogs, we discarded loci with a
mean observed heterozygosity >0.75. This cut-off is
obviously somewhat arbitrary, but choosing different
cut-off values between 0.5 and 1 made little difference
to our results (data not shown). After filtering, we had
5979 loci that could be confidently called in at least 80%
of individuals. Although allele dropout can affect esti-
mates of genetic variation within and between popula-
tions (Gautier et al. 2012), the number of individuals
with missing data (which could reflect different levels
of allele dropout) had no effect on the patterns of diver-
sity reported here (data not shown).
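
The genotype-calling rule described above can be illustrated with a minimal sketch. It assumes a simple two-allele read model in which a homozygote yields the alternative allele only through sequencing error and a heterozygote yields each allele with probability 0.5; this is an assumed simplification for illustration and not a reproduction of the Lynch (2009) method or of the pipeline used here. The function names are hypothetical. Because the three genotype models have the same number of parameters, the AIC difference reduces to twice the difference in log-likelihood.

```python
import math

ERROR_RATE = 0.03     # global sequencing error rate quoted in the text
MIN_DELTA_AIC = 4.0   # required AIC gap between best and next-best genotype


def log_likelihoods(n_a, n_b, err=ERROR_RATE):
    """Log-likelihood of the read counts under genotypes AA, AB and BB.

    The multinomial coefficient is identical for all three genotypes,
    so it is omitted (it cancels in every comparison).
    """
    return {
        "AA": n_a * math.log(1 - err) + n_b * math.log(err),
        "AB": (n_a + n_b) * math.log(0.5),
        "BB": n_a * math.log(err) + n_b * math.log(1 - err),
    }


def call_genotype(n_a, n_b):
    """Return 'AA', 'AB' or 'BB', or None when the call is coded as missing."""
    ll = log_likelihoods(n_a, n_b)
    # Equal parameter counts: the 2k term of the AIC cancels between models.
    aic = {genotype: -2.0 * value for genotype, value in ll.items()}
    ranked = sorted(aic, key=aic.get)
    best, second = ranked[0], ranked[1]
    if aic[second] - aic[best] >= MIN_DELTA_AIC:
        return best
    return None


if __name__ == "__main__":
    for counts in [(8, 0), (5, 5), (1, 0), (0, 0)]:
        print(counts, call_genotype(*counts))
```

Under these assumptions a single read is never enough to call a genotype, which is one reason low-coverage loci accumulate missing data.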
Twenty-four individuals were analysed in two sepa-
rate GBS runs. Where individuals were assigned geno-
types from both runs, the genotype calls from the two
runs were compared. This analysis showed that repeat-
ability of genotyping was high (mean, 97.2%; SD, 1.4%).
Locus sequences were blasted against the RefSeq mammalian
RNA database using BLASTN (Altschul et al. 1997)
with parameters: word_size = 11; gapopen = 5; gapextend = 2;
penalty = -3; and reward = 2. Sequences were
also blasted against the SwissProt and NR databases using
BLASTX with default parameters. SwissProt was used preferentially,
to facilitate functional annotation using UniProt.
Loci were identified as putatively genic if they had
an expectation value e < 1 × 10^-5 in matches against the
RefSeq database or e < 1 × 10^-3 in matches against
SwissProt/NR databases. BLASTX was used to determine
whether the genic SNPs were synonymous or nonsynonymous
(NS).

Fig. 1 Location of sample sites in Ireland (scale bar: 50 km).
CD = Cloonfad, TM = Tuam, GT = Gort, TA = Tulla,
BN = Ballynahown, BR = Birr, NH = Nenagh, LK = Limerick,
NS = New Ross, WP = Windgap, CL = Cashel, KY = Kilteely,
AE = Adare, FS = Foynes. Sites on the northern transect are
marked with squares, those on the north-eastern transect are
marked with circles, and those on the eastern transect with
triangles. Foynes, which is the introduction site and is on all
three transects, is marked with a cross. The dashed line shows
the approximate range limits of the bank vole in 2011.
Genetic diversity patterns
Mean expected heterozygosity (H_e), mean alleles per
locus (A) and mean allelic richness (A_rich) were calculated
for each population and each locus class [NS
SNPs, genic (not NS) SNPs and nongenic SNPs] using
the software ARLEQUIN 3.5 (Excoffier & Lischer 2010) and
HP-RARE (Kalinowski 2005). Measures of genetic diver-
sity were regressed onto the geographical distance
between the sampling locality and the point of introduction
(Foynes) using the ‘lm’ function in R 2.15 (R Core Team 2012).
As the Shannon Estuary represents
a significant barrier to dispersal, we calculated distances
as the shortest path by land. To test for differences in
slopes and intercepts between the different SNP locus
classes, an ANCOVA was performed taking mean diversity
(H_e, A or A_rich) as the response variable and locus
class, distance and their interaction as the independent
variables.
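
As an illustration of the regression of diversity on distance, the sketch below fits an ordinary least-squares line to the per-site values of H_e for all SNPs given in Table 1. It is written in plain Python rather than with the R function used in the study, and the helper name `ols` is hypothetical. Because the tabulated values are rounded, the fit need not reproduce the published statistics exactly.

```python
# Table 1 values (all SNPs): land distance from Foynes (km) and mean H_e per site,
# in the order Foynes, N transect, NE transect, E transect.
distance_km = [0, 45, 84, 124, 148, 48, 67, 106, 129, 28, 50, 87, 119, 146]
h_e = [0.357, 0.288, 0.265, 0.256, 0.253, 0.307, 0.277,
       0.254, 0.237, 0.342, 0.312, 0.301, 0.309, 0.300]


def ols(x, y):
    """Return slope, intercept and R^2 of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


slope, intercept, r_squared = ols(distance_km, h_e)
print(f"slope per km = {slope:.6f}, intercept = {intercept:.3f}, R^2 = {r_squared:.3f}")
```

The negative slope corresponds to the loss of heterozygosity with distance from the introduction site; the intercept is the fitted diversity at Foynes.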
Identifying SNP outliers
Two general approaches were used to identify loci
potentially under selection relating to range expansion.
The first was to calculate the Spearman rank correlation
between allele frequency and the geographical distance
between the sampling locality and Foynes as the point
of introduction. This was done for the three transects
separately. We then took the absolute value of the mean
correlation coefficients across the three transects. Loci
were ranked by mean correlation coefficient, and an
empirical P-value was calculated as the rank divided by
the number of loci. We then identified potential outlier
loci as those with empirical P-values <0.05 and <0.01.
Using this approach, the Foynes population appeared
as the starting site in all three transects, so correlation
coefficients may have been disproportionately influ-
enced by the allele frequency at Foynes. Therefore, we
repeated the correlations, excluding Foynes from all
three transects. Mean correlation coefficients and P-values
were calculated as before. As the Foynes sample
does still contain relevant information, we considered
outliers to be those loci that appeared in the tails of the
distributions of the correlations both with and without
Foynes.
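
The correlation-based ranking can be sketched as follows. This is a minimal illustration with toy allele frequencies, not the code used in the study; the function names and the two example loci are hypothetical, and a locus with no variation along a transect is assumed to contribute a correlation of zero. The key design point is that the mean is taken before the absolute value, so clines running in opposite directions in different transects cancel out.

```python
from statistics import mean


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation (Pearson correlation of the ranks)."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0  # assumption: no variation along the transect, no signal
    return cov / (vx * vy) ** 0.5


def empirical_p_values(freqs_by_locus, distances_by_transect):
    """Rank loci by |mean correlation across transects|; P = rank / number of loci."""
    scores = {}
    for locus, per_transect in freqs_by_locus.items():
        rhos = [spearman(distances_by_transect[t], f) for t, f in per_transect.items()]
        scores[locus] = abs(mean(rhos))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return {locus: (rank + 1) / len(ranked) for rank, locus in enumerate(ranked)}


# Distances from Table 1 (Foynes included in every transect); toy frequencies.
distances = {"N": [0, 45, 84, 124, 148],
             "NE": [0, 48, 67, 106, 129],
             "E": [0, 28, 50, 87, 119, 146]}
toy_freqs = {
    "consistent": {"N": [0.1, 0.2, 0.3, 0.4, 0.5],
                   "NE": [0.1, 0.3, 0.4, 0.6, 0.7],
                   "E": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]},
    "opposed": {"N": [0.1, 0.3, 0.5, 0.7, 0.9],
                "NE": [0.9, 0.7, 0.5, 0.3, 0.1],
                "E": [0.5, 0.5, 0.5, 0.5, 0.5, 0.5]},
}
p_values = empirical_p_values(toy_freqs, distances)
print(p_values)
```

In this toy case the locus whose cline runs the same way in all three transects receives the smaller empirical P-value, whereas the locus with opposing clines scores zero despite perfect correlations within two transects.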
The second approach to identify outliers was to use
the method of Coop et al. (2010), implemented in the
software Bayenv. This approach estimates the covari-
ance in allele frequencies between populations from a
set of control loci. In our case, this was the set of 5713
nongenic SNPs. For each of the 5979 SNPs, a Bayes
factor was then calculated for a model where an envi-
ronmental variable has a linear effect on allele frequen-
cies compared with a model given by the covariance
matrix alone. The environmental variable of interest
was the geographical distance from the point of intro-
duction at Foynes. Each locus was binned according to
the frequency of allele 1 (arbitrarily defined) over all
populations into one of 10 bins with a frequency interval
of 0.1. Within each frequency bin, loci were ranked by Bayes
factors, and an empirical P-value was calculated as the
rank divided by the number of loci in that bin. We then
identified potential outlier loci as those with empirical
P-values <0.05 and <0.01. Variance–covariance matrices
were compared within and between independent runs of
the programme to ensure convergence.
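
The frequency-binned ranking of Bayes factors can be sketched as below. The Bayes factors themselves would come from Bayenv; here they are invented input values, and the function names are hypothetical. A frequency of exactly 1.0 is assumed to fall in the top bin.

```python
from collections import defaultdict

N_BINS = 10  # ten bins with a frequency interval of 0.1, as in the text


def frequency_bin(freq):
    """Index of the 0.1-wide bin; a frequency of 1.0 goes in the top bin."""
    return min(int(freq * N_BINS), N_BINS - 1)


def binned_empirical_p(loci):
    """loci: iterable of (name, overall frequency of allele 1, Bayes factor).

    Within each frequency bin, loci are ranked from the largest Bayes factor
    downwards and the empirical P-value is rank / number of loci in the bin.
    """
    bins = defaultdict(list)
    for name, freq, bayes_factor in loci:
        bins[frequency_bin(freq)].append((bayes_factor, name))
    p_values = {}
    for members in bins.values():
        members.sort(reverse=True)
        for rank, (_, name) in enumerate(members, start=1):
            p_values[name] = rank / len(members)
    return p_values


# Invented example values for illustration only.
example = [("a", 0.05, 10.0), ("b", 0.07, 2.0),
           ("c", 0.55, 1.0), ("d", 0.52, 5.0), ("e", 0.58, 0.5)]
print(binned_empirical_p(example))
```

Ranking within bins means a locus is compared only with loci of similar overall allele frequency, so the outlier set is not dominated by one part of the frequency spectrum.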
Putative functions and Gene Ontology (GO) Biologi-
cal Process terms were assigned to outlier loci using the
UniProt Knowledgebase (The UniProt Consortium 2012)
and PANTHER v7.2 (Thomas
et al. 2008).
Neutral simulations
A modified version of SPLATCHE (Ray et al. 2010) was
used to simulate neutral genetic diversity after a range
expansion in the bank vole. Ireland was represented as
a lattice of 1-km squares. Areas of land were defined as
potential bank vole habitat, whereas areas of sea or
lakes were defined as unsuitable. Simulated sample
sites were arranged according to the same coordinates
as our real sample sites and had the same sample sizes.
The range expansion began at Foynes and progressed
until all sample sites had been colonized. The forward
demographic part of the SPLATCHE simulation records for
each time step the population sizes in each deme and
migration events between demes. Samples of genes
were taken from each sample site, and SNP data were
simulated using a discrete time coalescent model. For
each demographic simulation, 5979 neutral SNP loci
were simulated. Allele frequencies were calculated for
each sample site, and these were correlated with dis-
tance from Foynes using Spearman rank correlation. As
with the real genetic data, we calculated these correla-
tions separately for each transect and took the absolute
mean correlation across all three transects. We also cal-
culated the correlations with and without the Foynes
sample site. For both these approaches, we recorded the
number of loci that had higher correlation coefficients
than our observed outliers at the 5% and 1% thresholds.
The strength of allele frequency correlations at neutral
loci will depend on the amount of genetic drift
experienced by the population as it expands, which in
turn will depend on the demographic model used. As
this is unknown, we performed 1000 different demo-
graphic simulations, whose parameters (founding num-
ber of individuals, carrying capacity of each deme,
growth rate per generation, migration rate, Allee effect
severity and Allee effect scale (see Stephens & Suther-
land 1999)) were drawn from uniform distributions that
we had previously found to generate close matches to
the observed SNP data (T. A. White, unpublished data).
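
To give an intuition for why neutral loci can show clines and declining diversity, the sketch below simulates a much simpler process than the spatially explicit SPLATCHE model used here: a one-dimensional chain of demes, each founded by a small binomial sample from the previous one. This is a toy serial-founder model, named as such, and all parameter values (number of loci, demes and founders, and the range of starting frequencies) are hypothetical.

```python
import random


def serial_founder_frequencies(p0, n_demes, founders, rng):
    """Allele frequency in each deme along a chain of founder events.

    Each new deme is founded by `founders` diploid individuals whose
    2 * founders gene copies are drawn at random from the previous deme.
    """
    freqs = [p0]
    p = p0
    for _ in range(n_demes - 1):
        copies = sum(1 for _ in range(2 * founders) if rng.random() < p)
        p = copies / (2 * founders)
        freqs.append(p)
    return freqs


def mean_heterozygosity_by_deme(n_loci, n_demes, founders, seed=1):
    """Mean expected heterozygosity (2pq) per deme over independent neutral loci."""
    rng = random.Random(seed)
    totals = [0.0] * n_demes
    for _ in range(n_loci):
        p0 = rng.uniform(0.1, 0.9)  # hypothetical starting frequency at the source
        chain = serial_founder_frequencies(p0, n_demes, founders, rng)
        for i, p in enumerate(chain):
            totals[i] += 2 * p * (1 - p)
    return [t / n_loci for t in totals]


het = mean_heterozygosity_by_deme(n_loci=500, n_demes=10, founders=10)
print([round(h, 3) for h in het])
```

In this toy model drift alone erodes heterozygosity with each founder event, and individual loci can drift to high or low frequency along the chain, which is the source of the false positives that the replicated-transect design is meant to reduce.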
Deleterious SNPs
The program PolyPhen-2 (Ramensky & Sunyaev 2009)
was used to predict the functional impact of each NS
SNP on the translated protein. This approach is based
on multiple alignments and biochemical and physical
characteristics of the amino acid replacements. In cases
where one of the alleles at a locus matched to a human
or rodent reference genome, this allele was used as the
reference, and the effect of changing this to the other
allele was assessed using PolyPhen-2. The functional
impact of each NS SNP was designated as ‘Benign’,
‘Possibly damaging’ or ‘Probably damaging’; we refer to the
latter two classes as potentially deleterious SNPs. If
neither allele matched to a reference sequence, the func-
tional impact of the substitution was left unclassified.
For SNPs classified as ‘Possibly damaging’ or ‘Probably
damaging’, the frequency of the potentially deleterious
allele was calculated for each population, and the rela-
tionship between these frequencies and geographical
distance from Foynes was determined using Spearman
rank correlation.
Results
Data quality and coverage
Illumina sequencing of 281 individuals on three lanes
resulted in 786 834 622 reads. Of these, 676 763 709
reads contained a unique barcode and cut site remnant
and contained no ‘N’s. These data were used in the
UNEAK pipeline. UNEAK identified 60 417 biallelic
SNP loci. However, many of these had low coverage or
were only present in a small number of individuals.
Over all of these loci, the mean coverage per locus per
individual was 3.3× (max coverage per individual
252.7×, min 0.004×). When loci with more than 20%
missing data were excluded, 6398 loci were retained,
with a mean coverage of 16.9× (max coverage per individual
252.7×, min 7.2×). Discarding loci with observed
heterozygosity >0.75 left 5979 loci, with a mean coverage
of 16.56× (max coverage per individual 168.0×,
min 7.2×). This last difference in coverage is consistent
with the idea that paralogs should have high coverage
and also high heterozygosity, as filtering them out
reduces the mean coverage but especially the maximum
coverage at a locus.
Locus classification
Our BLAST approach identified 266 (4.4%) loci as ‘genic’,
245 of which had a match in the RefSeq database and
124 in the SwissProt/NR databases. Results from BLASTX
also determined that 30 of the genic SNPs were NS.
Genetic diversity patterns
Genetic diversity declined significantly with distance
from Foynes (Table S1 and Fig. 2). This was regardless
of whether we used all SNPs, NS SNPs, genic (not NS)
SNPs or nongenic SNPs and whether measured as H_e,
A or A_rich. The slope of the regressions of H_e and A_rich
on distance was steeper for NS SNPs than for the other
SNP locus classes, and the intercept of the regression
was higher (i.e. there was greater diversity for NS SNPs
at Foynes). However, ANCOVA revealed no significant
effect of SNP locus class on the relationship between
distance and diversity or on the levels of diversity at
Foynes, regardless of which measure of diversity was
used (results not shown). When the three transects are
compared, it can be seen that the loss of diversity in the
eastern transect appears to be less severe than in the
northern and northeastern transects (Fig. 2).
The mean number of alleles is 1.959 at Foynes and in
the wave front populations is 1.874 at New Ross, 1.750
at Cloonfad and 1.715 at Ballynahown. However, when
the wave front populations are pooled, the mean num-
ber of alleles is 1.951. So, it appears that the loss of
diversity has been somewhat independent in the three
transects, as different subsets of alleles have been lost
in each.
Outlier loci
Using the Spearman rank correlation approach (includ-
ing Foynes in each transect), 21 of the 266 genic SNPs,
and 278 of the 5713 nongenic SNPs, had an empirical
P-value <0.05. This represented a 1.6-fold enrichment of
genic SNPs in the outliers (Fisher’s exact test, one-tailed
P=0.0245). Nine of the 266 genic SNPs, and 51 of the
5713 nongenic SNPs, had an empirical P-value <0.01.
This represented a 3.8-fold enrichment of genic SNPs
(Fisher’s exact test, one-tailed P=0.0012).
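
The enrichment calculation can be illustrated with a short sketch of a one-tailed Fisher's exact test, computed directly from the hypergeometric distribution. The function names are hypothetical, and the example uses the counts reported above for the 1% threshold (9 of 266 genic and 51 of 5713 nongenic SNPs), for which the text reports a 3.8-fold enrichment and P = 0.0012.

```python
from math import comb


def fold_enrichment(genic_out, genic_total, nongenic_out, nongenic_total):
    """Ratio of the outlier proportion in genic SNPs to that in nongenic SNPs."""
    return (genic_out / genic_total) / (nongenic_out / nongenic_total)


def fisher_one_tailed(genic_out, genic_total, nongenic_out, nongenic_total):
    """P(at least `genic_out` genic outliers) under the hypergeometric null.

    The null hypothesis is that the outliers are a random draw, without
    replacement, from all SNPs regardless of locus class.
    """
    n_total = genic_total + nongenic_total
    n_out = genic_out + nongenic_out
    upper = min(genic_total, n_out)
    tail = sum(
        comb(genic_total, k) * comb(nongenic_total, n_out - k)
        for k in range(genic_out, upper + 1)
    )
    return tail / comb(n_total, n_out)


fold_1pct = fold_enrichment(9, 266, 51, 5713)
p_1pct = fisher_one_tailed(9, 266, 51, 5713)
p_5pct = fisher_one_tailed(21, 266, 278, 5713)
print(f"1% threshold: {fold_1pct:.1f}-fold, one-tailed P = {p_1pct:.4f}")
print(f"5% threshold: one-tailed P = {p_5pct:.4f}")
```

Exact integer arithmetic is used for the binomial coefficients, so no approximation is involved beyond the final division.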
When Foynes was excluded from this analysis, 21 of
the 266 genic SNPs, and 278 of the 5713 nongenic SNPs,
had an empirical P-value <0.05. This represented a
1.6-fold enrichment of genic SNPs in the outliers
(Fisher’s exact test, one-tailed P=0.0245). Seven of the
266 genic SNPs, and 53 of the 5713 nongenic SNPs, had
an empirical P-value <0.01, a 2.8-fold enrichment of
genic SNPs (Fisher’s exact test, one-tailed P=0.0164).
One hundred and sixty-two SNPs lay in the top 5%
of the distribution of correlation coefficients in both
correlations with and without Foynes. Of these, 12 were
genic SNPs, representing a not quite significant 1.7-fold
enrichment of genic SNPs (Fisher’s exact test, one-tailed
P=0.0564). Thirty-five SNPs appeared in the top 1% in
both analyses. Of these, seven were genic, representing
a highly significant 6-fold enrichment of genic SNPs
(Fisher’s exact test, one-tailed P=0.0004).
Thus, using our Spearman rank correlation approach,
we find a significant enrichment of genic SNPs amongst
those SNPs with the strongest correlations between
allele frequency and distance from Foynes, the point of
introduction. This is consistent with adaptation during
the range expansion, as we expect the targets of selec-
tion to be either genes or regulatory regions in close
linkage with genes. The 12 genic loci that are common
to both Spearman rank correlation approaches are listed
in Table 2.
Using Bayenv, 293 SNPs were identified as outliers
with P<0.05. Of these, 13 were genic SNPs. Fifty-four
SNPs were outliers with P<0.01, only one of which
was a genic SNP. There was no enrichment of genic
SNPs in either outlier set identified by Bayenv.
Forty-two SNPs were identified as outliers using both
our correlation approach and Bayenv, of which five
were genic SNPs. This represented a 2.9-fold
enrichment of genic SNPs in the outliers (Fisher’s exact
test, one-tailed P=0.0372).
A total of 20 genic outliers were identified using
either our correlation-based method or Bayenv. These
are listed in Table 2. Of these, 16 genes were assigned
GO terms under ‘biological processes’, of which four
genes had the GO term ‘immune system process’. In
the mouse genome, there are 24 935 genes that are
assigned biological process GO terms, of which 1421
have the GO term ‘immune system process’ (Eppig
et al. 2012). Therefore, assuming that a similar propor-
tion holds true for the bank vole, in our outliers there is
significant enrichment for genes involved in immunity
(Fisher’s exact test, one-tailed P=0.0205).
Neutral simulations
For a range of reasonable demographic models for the
bank vole expansion, we found that, on average, the pro-
portion of simulated loci with more extreme correlation
coefficients than our observed 0.05 and 0.01 thresholds
was 0.041 and 0.008 for correlations including Foynes,
and 0.04 and 0.009 for correlations excluding Foynes. In
our real data, the proportion of loci falling in the 5% tail
of both distributions was 0.027, whilst the proportion fall-
ing in the 1% tail of both distributions was 0.006. In the
simulated neutral data, these proportions were 0.021 and
0.004, respectively. These results suggest that our
Fig. 2 Decline of genetic diversity with distance from the
introduction site of the bank vole in Ireland. Measured as
(a) mean expected heterozygosity (H_e), R^2 = 0.5345; P = 0.003,
(b) mean alleles per locus (A), R^2 = 0.5005; P = 0.005, and
(c) mean allelic richness (A_rich), R^2 = 0.4911; P = 0.003.
In each panel, the x-axis is distance from the introduction
site (km). Sites on the northern transect are marked with
squares, those on the north-eastern transect are marked with
circles, and those on the eastern transect with triangles.
Foynes, which is the introduction site and is on all three
transects, is marked with a cross.
©2013 John Wiley & Sons Ltd
ADAPTIVE EVOLUTION DURING A RANGE EXPANSION 2977
Table 2 Outlier SNPs identified using our Spearman rank correlation approach and Bayenv. The column ‘outlier’ gives the method used to identify that SNP as an outlier. The
next four columns give the accession nos of the best matches and associated gene descriptions, in the mammalian RNA RefSeq database and the SwissProt/NR databases. ‘Type’
shows whether the SNP is synonymous (S) or nonsynonymous (NS) or whether it is located in a noncoding region of the gene (–). The final two columns give the functions or
processes in which the genes are involved, according to the UniProt Knowledgebase, and Panther GO (Gene Ontology) classifications, respectively
SNP Outlier*
Match in
mammalian
RNA RefSeq
database Description
Significant match
in SwissProt/NR
protein database Description Type
UniProt knowledgebase
function/process
PANTHER GO biological
process
mg8017 1,3 XM_002753731.1 Hypothetical protein
LOC100361418
P06323.1|
TVA3_MOUSE
T-cell receptor
alpha chain V
NS Receptor activity Immune system process
mg10984 1,3,5 NM_133638.3 ADAM metallopeptidase
with thrombospondin
type 1 motif, 19
(ADAMTS19)
NA NA —Proteolysis Signal transduction;
Cell–cell adhesion;
Proteolysis
mg39858 1,4 NM_006946.2 Spectrin, beta,
nonerythrocytic 2
(SPTBN2)
NA NA —Actin cytoskeleton
organization; axon
guidance
Cellular component
morphogenesis
mg68377 1,3 NM_181652.2 Peroxiredoxin 5 (PRDX5) NA NA —Intracellular redox
signalling
Immune system process;
Oxygen and reactive
oxygen species metabolic
process
mg71009 1,3,5 NM_014899.3 Rho-related BTB domain
containing 3 (RHOBTB3)
NA NA —Retrograde transport,
endosome to Golgi;
Small GTPase-mediated
signal transduction
G-protein coupled
receptor protein
signalling pathway
mg72604 1,3,5 NM_001118890.1 Glutaredoxin
(thioltransferase) (GLRX)
NA NA —Cell redox homoeostasis Sulphur metabolic process
mg81865 1,4,5 NM_017415.2 Kelch-like 3 (KLHL3) Q5REP9.1|
KLHL3_PONAB
Kelch-like protein 3 S Protein ubiquitination Neurological system
process; Cellular
component morphogenesis
mg96770 1,3 NM_001160392.1 tRNA phosphotransferase
1 (TRPT1)
NA NA —tRNA processing Nucleobase, nucleoside,
nucleotide and nucleic
acid metabolic process
mg123985 1,3 NM_005956.3 Methylenetetrahydrofolate
dehydrogenase (NADP+
dependent) 1 (MTHFD1)
P11586.3|
C1TC_HUMAN
C-1-tetrahydrofolate
synthase
NS Folic acid metabolism;
neural tube development
Purine base metabolic
process; Cellular amino
acid biosynthetic process
mg8197 2,4 NM_025258.2 Von Willebrand factor A
domain containing 7
(VWA7)
NA NA —Glycoprotein —
mg17560 2,4 NM_001004736.2 Olfactory receptor,
family 5, subfamily K,
member 1 (OR5K1)
Q8NHB7.2|
OR5K1_HUMAN
Olfactory
receptor 5K1
S Olfaction; sensory
transduction
—
©2013 John Wiley & Sons Ltd
2978 T. A. WHITE ET AL.
Table 2 Continued
SNP Outlier*
Match in
mammalian
RNA RefSeq
database Description
Significant match
in SwissProt/NR
protein database Description Type
UniProt knowledgebase
function/process
PANTHER GO biological
process
mg83555 2,4,5 NM_015125.3 Capicua homolog
(Drosophila) (CIC)
NA NA —Central nervous
system development
Regulation of transcription
from RNA polymerase II
promoter
mg13786 5 NM_001031749.2 LY6/PLAUR domain
containing 5 (LYPD5)
NA NA —— —
mg24029 5 XM_002752883.1 Leukotriene A4 hydrolase,
transcript variant 2
(LTA4H)
Q6S9C8.3|
LKHA4_CHILA
Leukotriene
A-4 hydrolase
S Leukotriene biosynthesis;
inflammatory response
Immune system process;
Fatty acid biosynthetic
process; Proteolysis
mg26799 5 NM_001005217.1 FSHD region gene 2
(FRG2)
ABB88900.1 Oocyte-specific
eukaryotic
translation
initiation
factor 4E-like
(Eif4e1b)
S Protein biosynthesis Translation
mg49438 5 XM_001101962.2 WNT5A wingless-type
MMTV integration site
family, member 5A
P22726.2|
WNT5B_MOUSE
Protein Wnt-5b S Wnt signalling pathway G-protein coupled receptor
protein signalling pathway;
Cell–cell signalling
mg57185 5 NM_021226.2 Rho GTPase activating
protein 22 (ARHGAP22)
NA NA —Positive regulation of
GTPase activity;
signal transduction
—
mg59899 5 NM_001012426.1 Forkhead box P4 (FOXP4) NA NA —Embryonic foregut
morphogenesis; heart
development;
transcription, DNA
dependent
Visual perception; Sensory
perception; Cell cycle;
Cell surface receptor linked
signal transduction;
Carbohydrate metabolic
process; Regulation of
transcription from RNA
polymerase II promoter;
Cellular component
morphogenesis; Segment
specification; Anterior/
posterior axis specification;
Ectoderm development;
Mesoderm development;
Embryonic development;
Nervous system
development
©2013 John Wiley & Sons Ltd
ADAPTIVE EVOLUTION DURING A RANGE EXPANSION 2979
observed data may contain more loci with extreme allele
frequency clines than expected under neutrality.
Deleterious mutations
Ten NS SNPs were classed by PolyPhen-2 as ‘Possibly
damaging’ or ‘Probably damaging’ (Table S2, Support-
ing information). Of these loci, two showed significant
negative correlations between the frequency of the dele-
terious allele and distance from the introduction site
(Fig. 3). These SNPs, mg8017 and mg134581, were
located in the T-cell receptor alpha V (TVA3) gene,
involved in antigen recognition, and the laminin
subunit alpha 2 (LAMA2) gene, respectively. Defects in
Lama2 are a cause of murine muscular dystrophy (Xu
et al. 1994). One SNP, mg123985, located in the C-1-
tetrahydrofolate synthase (C1TC) gene showed a signifi-
cant positive correlation (Fig. 3). Mutations in this gene
may impair foetal growth in mice (Beaudin et al. 2012).
Discussion
Using the bank vole invasion of Ireland as our study
system, we found evidence for adaptation during the
range expansion, despite an overall loss of genetic
diversity due to strong genetic drift at the wave front.
This suggests that selection pressures during range
expansions may be very strong. This is one of the first
studies to provide empirical genomic evidence for the
adaptation to the process of range expansion in a wild
population.
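The link between repeated founder events at the wave front and the loss of diversity can be illustrated with the standard drift expectation, in which one generation of drift in a deme of N diploid individuals multiplies expected heterozygosity by 1 - 1/(2N). The sketch below is a deliberately simplified serial-founder model and not the spatially explicit simulation used in this study; the function name, the starting heterozygosity, the number of founders and the number of steps are all hypothetical values chosen for illustration.

```python
def expected_heterozygosity(h0, founders, steps):
    """Expected heterozygosity along a chain of founder events.

    Each new deme is assumed to be founded by `founders` diploid
    individuals drawn from the previous deme, so every founding step
    multiplies expected heterozygosity by 1 - 1/(2 * founders).
    Migration, mutation and population growth are ignored.
    """
    factor = 1.0 - 1.0 / (2.0 * founders)
    return [h0 * factor ** step for step in range(steps + 1)]


# Hypothetical parameters, not estimates from the bank vole data.
trajectory = expected_heterozygosity(h0=0.36, founders=20, steps=10)
for step, h in enumerate(trajectory):
    print(f"founding step {step:2d}: expected He = {h:.4f}")
```

Under these assumptions the decline is monotonic and geometric, which is the qualitative pattern expected from a continuously acting process rather than a single bottleneck.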
Table 2 Continued

| SNP | Outlier* | Match in mammalian RNA RefSeq database | Description | Significant match in SwissProt/NR protein database | Description | Type | UniProt knowledgebase function/process | PANTHER GO biological process |
|---|---|---|---|---|---|---|---|---|
| mg87917 | 5 | XM_002817845.1 | CAP-GLY domain-containing linker protein 2 (CLIP2) | O55156.1\|CLIP2_RAT | CAP-Gly domain-containing linker protein 2 | S | Control of brain-specific organelle translocations | Intracellular protein transport; Vesicle-mediated transport; Mitosis; Cellular component morphogenesis |
| mg122511 | 5 | NM_006162.3 | Nuclear factor of activated T cells (NFATC1) | NA | NA | – | Transcription regulation | Immune system process; Regulation of transcription from RNA polymerase II promoter; Mesoderm development; Cellular defence response |

*1 = significant correlation of allele frequency with distance with P < 0.01 (Foynes included); 2 = significant correlation of allele frequency with distance with P < 0.05 (Foynes included); 3 = significant correlation of allele frequency with distance with P < 0.01 (Foynes excluded); 4 = significant correlation of allele frequency with distance with P < 0.05 (Foynes excluded); 5 = outlier with P < 0.05 in Bayenv analysis.

[Figure 3: scatter plot of frequency of deleterious allele (0.0–1.0) against distance from introduction site (0–150 km).]
Fig. 3 Change in frequencies of deleterious alleles (identified using PolyPhen-2) with distance from the introduction site. Only loci with significant correlations are shown. The SNP mg8017 is shown with open circles, mg123985 with open squares, and mg134851 with crosses.

We found that the eastern transect shows the least
reduction in genetic diversity, whilst the northern and
northeastern transects show similar patterns of greater
loss (Fig. 2). In the east of the country, there are few
barriers to dispersal, whilst in the north and northeast,
the expanding population would have encountered
substantial barriers to dispersal, including the
River Shannon to the north and unsuitable bog habi-
tat in the northeast. However, diversity in the north-
ern and northeastern transects shows a monotonic
decline, suggesting that the difference between
transects is due to some continuously acting process
and not a one-off founder event, such as might be
caused by crossing a semi-permeable barrier to dis-
persal. These two transects may additionally experi-
ence reduced lateral dispersal along most of their
length, due to the close proximity to the River Shan-
non (Fig. 1). We might expect lateral dispersal to
influence the amount of genetic diversity lost or
retained in a particular transect, if different alleles are
found in different transects. This appears to be the
case, as when we pooled the three populations at the
wave front of the expansion, the mean number of
alleles was almost as high as at the point of introduc-
tion (Foynes, Fig. 1), showing that different alleles
had been lost (or preserved) in different transects.
We can consider at least three different types of selec-
tion acting on individuals in a range expansion that
could produce consistent allele frequency clines
between transects. One is spatial sorting. Individuals
that are more likely to disperse, or that disperse long
distances, are more likely to be found towards the wave
front of the range expansion. If dispersal strategy has a
genetic component, breeding between highly dispersive
individuals at the wave front may lead to an increase in
dispersal over time in wave front populations, the so-
called “Olympic Village Effect” (Shine et al. 2011). If this
is the case, one would expect alleles contributing to a
highly dispersive phenotype to show a frequency cline,
increasing from the core to the wave front. Traditional
natural selection could generate such a pattern in two
ways. With positive selection at the expansion front,
individuals would disperse to a new habitat without
respect to genotype. In this new habitat, differential sur-
vival and/or fecundity would lead to changes in allele
frequency in the next generation. An alternative to this
model is relaxed selection at the expansion front, which
may be due to reduced density of conspecifics and
reduced parasite burdens in a deme. As time pro-
gresses, conspecific density and parasites in the deme
will increase, potentially leading to purifying selection
behind the expansion front. Both traditional selection
models will generate differences in allele frequency
between demes along the expansion axis. These types
of selection may be difficult to separate empirically.
Indeed, they are not mutually exclusive and may act to
reinforce or oppose one another. Whilst we also expect
expanding populations to be under selection due to
external environmental variables, such as climate
(Hancock et al. 2011), we predict that selection due to
range expansion processes will be much stronger, par-
ticularly over such a small scale in Ireland where envi-
ronmental variation is limited.
The major challenge to date in identifying genes
involved in adaptation during range expansion has
been in separating the signals of selection from drift
and allele surfing (Hofer et al. 2009). Here, we make use
of replicated transects to identify loci showing signifi-
cant allele frequency clines in the same direction in sev-
eral transects. Of course, this approach may fail to
detect some loci that are under selection, and some out-
liers may continue to reflect drift or allele surfing rather
than selection. However, our simulation modelling
showed that our data contained more extreme allele fre-
quency clines than expected under a neutral model,
suggesting the action of selection. In addition, the fact
that we observe an enrichment of genic vs. nongenic
SNPs in the outliers, and that this enrichment is stron-
ger when we consider the tail of the distribution with
P<0.01 vs. P<0.05, suggests that a majority of these
loci are good candidates for being under selection (Han-
cock et al. 2011).
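The idea of using replicated transects to find clines that run in the same direction can be sketched as follows. This is an illustrative simplification and not the authors' implementation: it filters on the size of the Spearman rank correlation coefficient within each transect rather than on P values, and the function names, the threshold `min_abs_rho` and the allele frequencies in `toy` are hypothetical.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman rank correlation, computed as the Pearson correlation of ranks.

    Raises ZeroDivisionError if either variable is constant.
    """
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def consistent_cline(transects, min_abs_rho):
    """True if all transects show a cline in the same direction.

    `transects` maps a transect name to (distances_km, allele_frequencies).
    Every transect must reach |rho| >= min_abs_rho with the same sign.
    """
    rhos = [spearman_rho(dist, freq) for dist, freq in transects.values()]
    same_sign = all(r > 0 for r in rhos) or all(r < 0 for r in rhos)
    return same_sign and all(abs(r) >= min_abs_rho for r in rhos)


# Hypothetical allele frequencies for one SNP; distance 0 is the shared
# introduction site.
toy = {
    "northern": ([0, 30, 60, 90, 120], [0.10, 0.20, 0.25, 0.40, 0.55]),
    "north-eastern": ([0, 40, 80, 110, 150], [0.10, 0.15, 0.30, 0.35, 0.60]),
    "eastern": ([0, 35, 70, 100, 140], [0.10, 0.30, 0.20, 0.45, 0.50]),
}
is_candidate = consistent_cline(toy, min_abs_rho=0.8)
print("candidate outlier:", is_candidate)
```

Requiring the same sign in every transect is what distinguishes this filter from a single pooled correlation: a cline produced by allele surfing in one transect is unlikely to be mirrored in the others.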
Our simple outlier detection approach may work
better than Bayenv in this case. This is partly because
Bayenv computes only a Bayes factor for each SNP (com-
paring an environmental selection model to a null
model), meaning that only overall relationships can be
assessed, and information on the direction of relation-
ships in different transects is lost. A second reason is that
in a range expansion much of the population genetic var-
iation lies in the direction of the expansion. By removing
the average effect of the variance–covariance of allele fre-
quencies, Bayenv may also be removing signals of adap-
tation to expansion. Other studies that have used Bayenv
successfully have not considered environmental vari-
ables in parallel with the direction of range expansion
(Eckert et al. 2010; Hancock et al. 2010b, 2011; Chen et al.
2012), and none have considered such a recent range
expansion as the one studied here.
It is predicted that during a range expansion, individ-
uals should experience selection for increased dispersal
(most likely due to spatial sorting; Burton et al. 2010;
Shine et al. 2011) and positive selection for reproduction
early and often at the expansion front (including rapid
growth and maturation; Moreau et al. 2011). They
should also experience relaxed selection on intraspecific
competition. It is likely that very many genes influence
these traits (but see Haag et al. (2005) and Matthews &
Butler (2011)), and it is difficult to make predictions
about the classes of genes that should appear as
outliers. For mammals, we might predict that changes
in dispersal, reproduction and competition might be
mediated via behavioural changes, particularly with
regard to how individuals interact with conspecifics. As
the bank vole in Ireland has experienced strong bottle-
necking and genetic drift during its introduction and
subsequent expansion, we expect strong linkage dis-
equilibrium between different regions of the genome.
Therefore, the outliers we identify may not be the tar-
gets of selection themselves, but merely linked to
regions under selection. Nevertheless, it is interesting
that many of the outliers we identified may be involved
in sensory perception and neural development (mg39858,
mg81865, mg123985, mg17560, mg83555, mg59899,
mg87917; see Table 2), although no GO terms
relating to these functions had more than one outlier
assigned to them. Other interesting outliers include
mg10984 [encoding ADAMTS19, a gene involved in sex-
ual differentiation and expressed predominantly in the
foetal ovary (Menke & Page 2002)] and mg26799
(encoding EIF4E1B, an oocyte-specific translation initia-
tion factor). These genes are potential candidates for
adaptation relating to differential investment in repro-
duction at the expansion front.
It should be easier to assign a mechanistic basis to
outlier genes involved in the immune response. At the
expansion front, we might expect both an increased
need to invest in other traits and a reduced need
to invest in immunity if parasites lag behind their hosts.
Here, hosts should divert fewer resources to
maintaining their immune systems (White & Perkins
2012). Reduced investments may be targeted at particu-
lar aspects of the immune response, due to different
development and use costs (Lee 2006). For example, if
selection at the expansion front favours rapid growth,
trade-offs may lead to reduced investment in immune
components with particularly high development costs
(van der Most et al. 2011), such as induced cell-medi-
ated and antibody responses (Tschirren et al. 2003). The
distribution of parasites (helminths and ectoparasites)
changes markedly along the axis of the bank vole range
expansion (S. E. Perkins, unpublished data), and so it is
interesting to see differential selection pressure on
immune response reflected at the genetic level. As a
class, we found that immune system genes were signifi-
cantly enriched amongst the outliers. Genic outliers
involved in immunity are mg8017 (a T-cell receptor),
mg68377 (PRDX5, involved in peroxisome signalling),
mg24029 (LKHA4, involved in the inflammatory
response) and mg122511 (NFATC1, which plays a role
in the inducible expression of cytokine genes in T cells,
especially in the induction of IL-2 and IL-4 gene tran-
scription; Table 2). Of course, there is a need to validate
outliers and candidate loci through functional assays,
association studies and quantitative genetic dissection.
This will not only filter out potential false positive out-
liers, but will also lead to a greater understanding of
the mechanistic underpinnings of adaptation to range
expansion.
In this study, ten NS SNPs were identified where one
of the alleles was predicted to have a damaging effect
on the final protein product. Two of these damaging
alleles had a significant decline in frequency with dis-
tance from the point of introduction, whilst one had a
significant increase in frequency. There is therefore no
evidence for extensive fixation or frequency increase in
deleterious alleles during the range expansion, although
this may be due to the small sample of NS SNPs with
predicted functional effects. Lohmueller et al. (2008)
found that non-African human populations had signifi-
cantly more deleterious mutations than Africans, a pat-
tern they interpreted as being due to founder events,
genetic drift and allele surfing as humans moved out of
Africa. In a simulation study, Travis et al. (2007) found
that deleterious alleles arising at the edge of an expan-
sion were much more likely to persist than if they had
arisen in a stationary population. However, this result
depended strongly on the growth (r), carrying capacity
and dispersal (m) parameters used in the simulations. A
high r value increases the chances that a deleterious
mutation will surf, whilst at higher m values, deleteri-
ous alleles were less likely, and beneficial mutations
more likely, to surf. For small mammals, the costs of
dispersal are such that they should only disperse as far
as the nearest suitable unoccupied space. Emigration
may largely be driven by positive density-dependent
dispersal and agonistic behaviour from conspecifics
(Matthysen 2005; Hahne et al. 2011; Le Galliard et al.
2012), which is supported by our previous analysis of
the bank vole range expansion in Ireland (White et al.
2012). Positive density-dependent dispersal should tend
to reduce the rate of range expansion and minimize the
effect of genetic drift in demes at the expansion front.
Moreover, the simulation results of Travis et al. (2007)
were based on novel mutations arising near the expan-
sion front and did not consider standing variation. In
an introduced population such as the one considered
here, standing deleterious alleles may have increased in
frequency at the introduction site due to drift during
the initial bottleneck. Thereafter, selection by spatial
sorting may result in a kind of ‘spatial purging’. One
might expect that mutations having negative effects on
reproduction or dispersal might tend to be left behind
during a range expansion. Indeed, Travis et al. (2007)
found that mutations with a negative impact on fertility
were much less likely to surf than those with negative
effects on survival. The difference in deleterious allele
frequency between the expansion front and older estab-
lished populations may in general be less pronounced
than suggested by the findings of Lohmueller et al.
(2008).
This study used a genome-wide approach to track
changes in genetic diversity across a well-characterized
range expansion. Using both functional and neutral loci,
we found that the introduced bank vole population in
Ireland has lost a substantial proportion of its diversity
during the expansion. Due to changes in diversity along
the axis of expansion and the potential for allele surf-
ing, traditional outlier approaches to detect loci under
selection are likely to return many false positives. Here,
we introduced a new test to detect loci under direc-
tional selection during the expansion. Using a correla-
tion-based approach, we identified a number of genes
under selection during the range expansion. It appears
that the bank vole has been able to respond adaptively
to the range expansion in spite of the general loss of
genetic diversity. However, there is no evidence that
populations at the expansion front carry more deleteri-
ous mutations than those at the range core, and this
may be because spatial purging is also important in
removing deleterious alleles as the population expands
its range. This is of relevance to many other species
expanding their ranges, for example due to climate
change, as it suggests that fitness does not necessarily
decline towards the wave front of the expansion.
The bank vole in Ireland represents an excellent sys-
tem with which to test hypotheses associated with
range expansions. The range is continuing to expand
without any human interference, and the history of the
expansion has been well characterized demographically
(White et al. 2012). In Ireland, the bank vole is expand-
ing into a landscape with relatively minor environmen-
tal perturbations, as shown by the consistent and
similar declines in genetic diversity across all three
transects. As the bank vole is amenable to laboratory
breeding and manipulation, the system also offers the
possibility to study the mechanics of an invasion/range
expansion of a small mammal experimentally.
To date, much work in population genetics and
genomics has used analytical models developed for
populations at approximate equilibrium. As many, if
not most, species have undergone recent range expan-
sions, we believe that it is of general relevance to con-
sider whether range expansions could have influenced
the genetic variation seen in any particular study sys-
tem, and use statistical models and simulations appro-
priate to such cases.
Acknowledgements
This research was supported by a Marie Curie FP7-PEOPLE-
2009-IOF and a Marie Curie FP7-PEOPLE-2009-IEF within the
7th European Community Framework Programme. TW was
also supported by a Heredity Fieldwork Grant from The Genet-
ics Society and a Percy Sladen Memorial Fund Grant from the
Linnean Society. GH acknowledges support from Swiss
National Science Foundation grant 31003A_127377/1. Colin
Lawton, Michael Field-May, Sam Grathoff, Libby Nixon, Nia
Thomas and Sophie Watson assisted in the collection of speci-
mens. The authors would like to thank Rob Elshire, Sharon
Mitchell and Charlotte Acharya in the Buckler lab at Cornell for
help with genotyping-by-sequencing, Rodrigo Vega for help
with laboratory work, Robert Bukowski at the Cornell Compu-
tational Biology Service Unit for bioinformatics advice and
Laurent Excoffier for access to computing facilities. The editor
and reviewers provided helpful comments and suggestions.
References
Altschul SF, Madden TL, Schäffer AA et al. (1997) Gapped
BLAST and PSI-BLAST: a new generation of protein data-
base search programs. Nucleic Acids Research,25, 3389–3402.
Barret RD, Schluter D (2008) Adaptation from standing varia-
tion. Trends in Ecology and Evolution,23,38–44.
Beaudin AE, Perry CA, Stabler SP, Allen RH, Stover PJ (2012)
Maternal Mthfd1 disruption impairs fetal growth but does
not cause neural tube defects in mice. American Journal of
Clinical Nutrition,95, 882–891.
Besold J, Schmitt T, Tammaru T, Cassel-Lundhagen A (2008)
Strong genetic impoverishment from the centre of distribu-
tion in southern Europe to peripheral Baltic and isolated
Scandinavian populations of the pearly heath butterfly. Jour-
nal of Biogeography,35, 2090–2101.
Biek R, Henderson JC, Waller LA, Rupprecht CE, Real LA
(2007) A high-resolution genetic signature of demo-
graphic and spatial expansion in epizootic rabies virus.
Proceedings of the National Academy of Sciences USA,104,
7993–7998.
Bossdorf O, Auge H, Lafuma L et al. (2005) Phenotypic and
genetic differentiation between native and introduced plant
populations. Oecologia,144,1–11.
Bradbury PJ, Zhang Z, Kroon DE et al. (2007) TASSEL: soft-
ware for association mapping of complex traits in diverse
samples. Bioinformatics,23, 2633–2635.
Buckley J, Butlin RK, Bridle JR (2012) Evidence for evolution-
ary change associated with the recent range expansion of the
British butterfly, Aricia agestis, in response to climate change.
Molecular Ecology,21, 267–280.
Burton OJ, Phillips BL, Travis JMJ (2010) Trade-offs and the
evolution of life-histories during range expansion. Ecology
Letters,13, 1210–1220.
Chen J, Källman T, Ma X et al. (2012) Disentangling the roles
of history and local selection in shaping clinal variation of
allele frequencies and gene expression in Norway Spruce
(Picea abies). Genetics,191, 865–881.
Claassens AJM, O’Gorman F (1965) The bank vole Clethrionomys glare-
olus Schreber –a mammal new to Ireland. Nature,205,923–924.
Coop G, Witonsky D, Di Rienzo A, Pritchard JK (2010) Using
environmental correlations to identify loci underlying local
adaptation. Genetics,185, 1411–1423.
Cwynar LC, MacDonald GM (1987) Geographical variation of
lodgepole pine in relation to population history. American
Naturalist,129, 463–469.
Eckert AJ, Bower AD, González-Martínez SC et al. (2010) Back
to nature: ecological genomics of loblolly pine (Pinus taeda,
Pinaceae). Molecular Ecology,19, 3789–3805.
Edmonds CA, Lillie AS, Cavalli-Sforza LL (2004) Mutations
arising in the wave front of an expanding population. Pro-
ceedings of the National Academy of Sciences USA,101, 975–979.
Elshire RJ, Glaubitz JC, Sun Q et al. (2011) A robust, simple
genotyping-by-sequencing (GBS) approach for high diversity
species. PLoS ONE,6, e19379.
Eppig JT, Blake JA, Bult CJ, Kadin JA, Richardson JE; the
Mouse Genome Database Group (2012) The Mouse Genome
Database (MGD): comprehensive resource for genetics and
genomics of the laboratory mouse. Nucleic Acids Research,40,
D881–D886.
Estoup A, Beaumont M, Sennedot F, Moritz C, Cornuet JM
(2004) Genetic analysis of complex demographic scenarios:
spatially expanding populations of the cane toad, Bufo mari-
nus. Evolution,58, 2021–2036.
Excoffier L, Lischer HEL (2010) Arlequin suite ver. 3.5: a new ser-
ies of programs to perform population genetics analyses under
Linux and Windows. Molecular Ecology Resources,10, 564–567.
Excoffier L, Ray N (2008) Surfing during population expan-
sions promotes genetic revolutions and structuration. Trends
in Ecology and Evolution,23, 347–351.
Excoffier L, Foll M, Petit RJ (2009) Genetic consequences of
range expansions. Annual Review of Ecology, Evolution, and
Systematics,40, 481–501.
Fagundes NJR, Ray N, Beaumont M et al. (2007) Statistical eval-
uation of alternative models of human evolution. Proceedings
of the National Academy of Sciences USA,104, 17614–17619.
Fairley JS (1971) Malareus penicilliger mustelae: a flea new to
Ireland. Entomologist’s Monthly Magazine,107, 44.
Gautier M, Gharbi K, Cezard T et al. (2013) The effect of
RAD allele dropout on the estimation of genetic variation
within and between populations. Molecular Ecology,22,
3165–3178.
Haag CR, Saastamoinen M, Marden JH, Hanski I (2005) A can-
didate locus for variation in dispersal rate in a butterfly
metapopulation. Proceedings of the Royal Society of London.
Series B, Biological Sciences,272, 2449–2456.
Hahne J, Jenkins T, Halle S, Heckel G (2011) Establishment suc-
cess and resulting fitness consequences for vole dispersers.
Oikos,120,95–105.
Hallatschek O, Hersen P, Ramanathan S, Nelson DR (2007)
Genetic drift at expanding frontiers promotes gene segrega-
tion. Proceedings of the National Academy of Sciences USA,104,
19926–19930.
Hancock AM, Alkorta-Aranburu G, Witonsky DB, Di Rienzo A
(2010a) Adaptations to new environments in humans: the
role of subtle allele frequency shifts. Philosophical Transactions
of the Royal Society of London. Series B, Biological Sciences,365,
2459–2468.
Hancock AM, Witonsky DB, Ehler E et al. (2010b) Human
adaptations to diet, subsistence, and ecoregion are due to
subtle shifts in allele frequency. Proceedings of the National
Academy of Sciences USA,107, 8924–8930.
Hancock AM, Witonsky DB, Alkorta-Aranburu G et al. (2011)
Adaptations to climate-mediated selective pressures in
humans. PLoS Genetics,7, e1001375.
Handley LJL, Manica A, Goudet J, Balloux F (2007) Going the
distance: human population genetics in a clinal world. Trends
in Genetics,23, 432–439.
Heckel G, Burri R, Fink S, Desmet J-F, Excoffier L (2005)
Genetic structure and colonization processes in European
populations of the common vole Microtus arvalis. Evolution,
59, 2231–2242.
Hewitt G (2000) The genetic legacy of the Quaternary ice ages.
Nature,405, 907–913.
Hofer T, Ray N, Wegmann D, Excoffier L (2009) Large allele
frequency differences between human continental groups
are more likely to have occurred by drift during range
expansions than by selection. Annals of Human Genetics,73,
95–108.
Hohenlohe PA, Bassham S, Etter PD et al. (2010) Population
genomics of parallel adaptation in threespine stickleback
using sequenced RAD tags. PLoS Genetics,6, e1000862.
Hughes CL, Dytham C, Hill JK (2007) Modelling and analysing
evolution of dispersal in populations at expanding range
boundaries. Ecological Entomology,32, 437–445.
Kalinowski ST (2005) HP-RARE 1.0: a computer program for
performing rarefaction on measures of allelic richness. Molec-
ular Ecology Notes,5, 187–189.
Klopfstein S, Currat M, Excoffier L (2006) The fate of mutations
surfing on the wave of a range expansion. Molecular Biology
and Evolution,23, 482–490.
Kolbe JJ, Glor RE, Rodríguez Schettino L et al. (2004) Genetic
variation increases during biological invasion by a Cuban
lizard. Nature,431, 177–181.
Le Galliard J-F, Rémy A, Ims RA, Lambin X (2012) Patterns
and processes of dispersal behaviour in arvicoline rodents.
Molecular Ecology,21, 505–523.
Lee KA (2006) Linking immune defenses and life history at the
levels of the individual and the species. Integrative and Com-
parative Biology,46, 1000–1015.
Lohmueller KE, Indap AR, Schmidt S et al. (2008) Proportion-
ally more deleterious genetic variation in European than in
African populations. Nature,451, 994–997.
Lubina JA, Levin SA (1988) The spread of a reinvading species:
range expansion in the California sea otter. American Natural-
ist,131, 526–543.
Lynch M (2009) Estimation of allele frequencies from high-
coverage genome-sequencing projects. Genetics,182, 295–301.
Marshall LG, Webb SD, Sepkoski JJ, Raup DM (1982) Mamma-
lian evolution and the great American interchange. Science,
215, 1351–1357.
Matthews LJ, Butler PM (2011) Novelty-seeking DRD4 poly-
morphisms are associated with human migration distance
out-of-Africa after controlling for neutral population gene
structure. American Journal of Physical Anthropology,145, 382–
389.
Matthysen E (2005) Density-dependent dispersal in birds and
mammals. Ecography,28, 403–416.
Menke DB, Page DC (2002) Sexually dimorphic gene expres-
sion in the developing mouse gonad. Gene Expression Pat-
terns,2, 359–367.
Monty A, Mahy G (2010) Evolution of dispersal traits along an
invasion route in the wind-dispersed Senecio inaequidens
(Asteraceae). Oikos,119, 1563–1570.
Moreau C, Bherer C, Vézina H et al. (2011) Deep human genealogies
reveal a selective advantage to be on an expanding
wave front. Science,334, 1148–1150.
van der Most PJ, de Jong B, Parmentier HK, Verhulst S (2011)
Trade-off between growth and immune function: a meta-
analysis of selection experiments. Functional Ecology,25,
74–80.
Novembre J, Han E (2012) Human population structure and
the adaptive response to pathogen-induced selection pres-
sures. Philosophical Transactions of the Royal Society of London.
Series B, Biological Sciences,367, 878–886.
Parisod C, Bonvin G (2008) Fine-scale genetic structure and
marginal processes in an expanding population of Biscutella
laevigata L. (Brassicaceae). Heredity,101, 536–542.
Parmesan C, Yohe G (2003) A globally coherent fingerprint of
climate change impacts across natural systems. Nature,421,
37–42.
Phillips BL, Brown GP, Webb JK, Shine R (2006) Invasion and
the evolution of speed in toads. Nature,439, 803.
Phillips BL, Kelehear C, Pizzatto L et al. (2010) Parasites and
pathogens lag behind their host during periods of host range
advance. Ecology,91, 872–881.
Prugnolle F, Manica A, Charpentier M et al. (2005) Pathogen-
driven selection and worldwide HLA class I diversity. Cur-
rent Biology,15, 1022–1027.
R Core Team (2012) R: A language and environment for statistical
computing. R Foundation for Statistical Computing, Vienna,
Austria, ISBN 3-900051-07-0, http://www.R-project.org/.
Ramensky VE, Sunyaev SR (2009) Computational analysis of
human genome polymorphism. Molecular Biology,43, 260–268.
Ray N, Currat M, Foll M, Excoffier L (2010) SPLATCHE2: a
spatially-explicit simulation framework for complex demog-
raphy, genetic admixture and recombination. Bioinformatics,
26, 2993–2994.
Ryan A, Duke E, Fairley JS (1996) Mitochondrial DNA in bank
voles Clethrionomys glareolus in Ireland: evidence for a small
founder population and localized founder effects. Acta Theriologica,41, 45–50.
Shine R, Brown GP, Phillips BL (2011) An evolutionary process
that assembles phenotypes through space rather than
through time. Proceedings of the National Academy of Sciences
USA,108, 5708–5711.
Simmons AD, Thomas CD (2004) Changes in dispersal during
species’ range expansions. American Naturalist,164, 378–395.
Slatkin M, Excoffier L (2012) Serial founder effects during
range expansion: a spatial analog of genetic drift. Genetics,
191, 171–181.
Stephens PA, Sutherland WJ (1999) Consequences of the Allee
effect for behaviour, ecology and conservation. Trends in
Ecology and Evolution,14, 401–405.
Stuart P, Mirmin L, Cross TF et al. (2007) The origin of Irish
bank voles Clethrionomys glareolus assessed by mitochondrial
DNA analysis. Irish Naturalists’ Journal,28, 440–446.
Thomas PD, Campbell MJ, Kejariwal A et al. (2008) PANTHER:
a library of protein families and subfamilies indexed by
function. Genome Research,13, 2129–2141.
Travis JMJ, Dytham C (2002) Dispersal evolution during inva-
sions. Evolutionary Ecology Research,4, 1119–1129.
Travis JMJ, Münkemüller T, Burton OJ et al. (2007) Deleterious
mutations can surf to high densities on the wave front of an
expanding population. Molecular Biology and Evolution,24,
2334–2343.
Travis JMJ, Mustin K, Benton TG, Dytham C (2009) Accelerating
invasion rates result from the evolution of density-dependent
dispersal. Journal of Theoretical Biology,259, 151–158.
Tschirren B, Fitze PS, Richner H (2003) Sexual dimorphism in
susceptibility to parasites and cell-mediated immunity in
great tit nestlings. Journal of Animal Ecology,72, 839–845.
Tsutsui ND, Suarez AV, Holway DA, Case TJ (2000) Reduced
genetic variation and the success of an invasive species.
Proceedings of the National Academy of Sciences USA,97, 5948–
5953.
UniProt Consortium (2012) Reorganizing the protein space at
the Universal Protein Resource (UniProt). Nucleic Acids
Research,40, D71–D75.
Velo-Antón G, Rodríguez D, Savage AE et al. (2012) Amphibian-killing
fungus loses genetic diversity as it spreads across
the New World. Biological Conservation,146, 213–218.
Waters JM, Fraser CI, Hewitt GM (2012) Founder takes all:
density-dependent processes structure biodiversity. Trends in
Ecology and Evolution,28, 78–85.
White TA, Perkins SE (2012) The ecoimmunology of invasive
species. Functional Ecology,26, 1313–1323.
White TA, Lundy MG, Montgomery WI et al. (2012) Range
expansion in an invasive small mammal: influence of life-
history and habitat quality. Biological Invasions,14, 2203–
2215.
White TA, Perkins SE, Heckel G, Searle JB (2013) Data from:
adaptive evolution during an ongoing range expansion: the
invasive bank vole (Myodes glareolus) in Ireland. Dryad Digital
Repository. doi:10.5061/dryad.fb782.
Xu H, Wu XR, Wewer UM, Engvall E (1994) Murine muscular
dystrophy caused by a mutation in the laminin alpha 2
(Lama2) gene. Nature Genetics,8, 297–302.
Yang W-Y, Novembre J, Eskin E, Halperin E (2012) A model-
based approach for analysis of spatial structure in genetic
data. Nature Genetics,44, 725–731.
Zayed A, Whitfield CW (2008) A genome-wide signature of
positive selection in ancient and recent invasive expansions
of the honey bee Apis mellifera. Proceedings of the National
Academy of Sciences USA,105, 3421–3426.
T.A.W., G.H. and J.B.S. designed and planned the
study. T.A.W. and S.E.P. carried out the fieldwork in
Ireland. T.A.W. carried out the analyses. T.A.W., S.E.P.,
G.H. and J.B.S. wrote the manuscript.
Data accessibility
Genotype data are available via Dryad, doi:10.5061/dryad.fb782
(White et al. 2013). Illumina reads are available from the
Sequence Read Archive under accession SRP020629.
Supporting information
Additional supporting information may be found in the online ver-
sion of this article.
Table S1 Results of linear regression of genetic diversity on
distance from the introduction site at Foynes.
Table S2 Potentially deleterious alleles identified by PolyPhen-
2 and correlations with distance from the introduction site.