
Using spatial Bayesian methods to determine the genetic structure of a continuously distributed population: Clusters or isolation by distance?

Wiley
Journal of Applied Ecology

Journal of Applied Ecology 2009, 46, 493–505. doi: 10.1111/j.1365-2664.2008.01606.x
© 2009 The Authors. Journal compilation © 2009 British Ecological Society
Blackwell Publishing Ltd
A. C. Frantz¹*, S. Cellina², A. Krier², L. Schley² and T. Burke¹
¹Department of Animal and Plant Sciences, University of Sheffield, Sheffield S10 2TN, UK; ²Direction des Eaux et Forêts, 16 rue Eugène Ruppert, L-2453 Luxembourg
Summary
1. Spatially explicit Bayesian clustering techniques offer a powerful tool for ecology and wildlife management, as genetic divisions can be correlated with landscape features. We used these methods to analyse the genetic structure of a population of European wild boar Sus scrofa with the aim of identifying effective barriers for disease management units. However, it has been suggested that the methods could produce biased results when faced with deviations from random mating not caused by genetic discontinuities, such as isolation by distance (IBD).
2. We analysed a data set consisting of 697 wild boar multilocus genotypes using spatially explicit (baps, geneland) and non-explicit (structure) Bayesian methods. We also simulated and analysed data sets characterized by different degrees of IBD, with and without genetic discontinuities.
3. When analysing the empirical data set, different programs did not converge on the same clustering solution and some clusters were difficult to explain biologically. Results from the simulated data showed that IBD, also present in the empirical data set, could cause the Bayesian methods to overestimate genetic structure. Simulated barriers were identified correctly, but the programs superimposed further clusters at higher IBD levels.
4. It was not possible to ascertain with confidence whether the clustering solutions offered by the various programs were an accurate reflection of population genetic structure in our empirical data set or were artefacts created by the underlying IBD pattern.
5. Synthesis and applications: We show that Bayesian clustering methods can overestimate genetic structure when analysing an individual-based data set characterized by isolation by distance. This bias could lead to the erroneous delimitation of management or conservation units. Investigators should be critical and suspicious of clusters that cannot be explained biologically. Data sets should be tested for isolation by distance and conclusions should not be based on the output from just one method.
Key-words: classical swine fever, landscape genetics, spatial genetic structure, Sus scrofa, translocation, wildlife diseases, wildlife forensics
Introduction
Bayesian clustering algorithms are prominent computational tools for inferring genetic structure in molecular ecology. Mostly, these methods probabilistically assign individuals to groups based on their multi-locus genotypes by minimizing Hardy–Weinberg and linkage disequilibria, without presuming pre-defined populations (Pearse & Crandall 2004). Recent advances explicitly address the spatial nature of the problem of locating genetic discontinuities by including the geographical coordinates of individuals in their prior distributions (e.g., Guillot et al. 2005; Corander, Sirén & Arjas 2008). These models offer a powerful tool to answer questions in ecology, conservation and wildlife management, as genetic discontinuities within populations can be correlated with landscape features.
There have been few studies to evaluate the performance of the various spatial Bayesian methods in biologically realistic scenarios and there is uncertainty about possible methodological biases. Of particular concern in this context is isolation by distance (IBD) – the regular increase in genetic differentiation among individuals with geographical distance due to limited dispersal – which has been shown to potentially confound the non-spatial Bayesian methods. For example, investigations of global human genetic diversity have highlighted that the structure algorithm (Pritchard, Stephens & Donnelly 2000) may detect non-existent clusters when geographical sampling is clumped along an IBD cline (Serre & Pääbo 2004; Rosenberg et al. 2005). Similarly, Schwartz & McKelvey (in press) have recently shown that structure may superimpose artificial clusters on data sets solely characterized by an IBD cline, even when geographical sampling is conducted evenly over the more local scale typical of many ecological studies.
*Correspondence author. E-mail: alainfrantz@yahoo.co.uk
The issue as to whether the spatially explicit Bayesian algorithms are similarly biased, despite imposing additional spatial constraints on a clustering solution, has yet to be tested. Some authors have suggested that, when a data set is characterized by isolation by distance, taking the spatial context of individuals into consideration might improve the efficiency of the analysis (Frantz et al. 2006; Fontaine et al. 2007). However, Guillot et al. (2005) hypothesized that IBD (as well as other deviations from random mating not caused by genetic discontinuities) could negatively affect the performance of their geneland application. In particular, the program might overestimate genetic clustering of the data and not be capable of correctly detecting and locating a genuine genetic discontinuity. Consequently, while it has become possible to test hypotheses concerning barriers to dispersal and gene flow, the reliability of spatial Bayesian programs when analysing data sets characterized by an IBD pattern needs further clarification. This is important, as inappropriate clustering can lead to the erroneous delimitation of management or conservation units.
Here, our initial objective was to analyse the genetic structure of wild boar Sus scrofa L. in Wallonia, Luxembourg, and the Rhineland-Palatinate where there have been recent outbreaks of classical swine fever, a highly contagious viral disease of pigs (Schoos 2002). Prevention and control of infection in wild boar is of great importance, as diseased individuals represent a permanent threat to domestic pig populations and the large-scale culling necessary to control the disease on pig farms can cause major economic losses (Artois et al. 2002). We therefore aimed to use spatially explicit Bayesian clustering methods to correlate genetic discontinuities with landscape features and thereby identify geographical barriers to gene flow that could be used to effectively delimit management units (see Anonymous 1999). However, given the continuous distribution of the species in the study area, and considering that individual European wild boar rarely disperse beyond 20 km (e.g., Truvé & Lemel 2003), it was likely that the data set would exhibit an IBD pattern. Given the unknown reliability of the spatial methodologies when confronted with such data, the issue of artificial clustering needed to be taken seriously. We therefore simulated data sets characterized by different levels of IBD, with and without genetic discontinuities. The main objective of these simulations was to test whether the spatial Bayesian methods could overestimate genetic structure when faced with IBD and, consequently, fail to detect genuine genetic discontinuities.
Methods
STUDY AREA
We aimed to investigate the genetic structure of wild boar from Luxembourg, Wallonia, and the north-west of Rhineland-Palatinate (Fig. 1), an area covering approximately 19 500 km² with extensive forest cover accounting for 32·3% of the area of Wallonia (Perrin, Temmerman & Laitat 2000), 34·5% of Luxembourg (Rondeux 2006) and 42·0% of the Rhineland-Palatinate (von Rüden 2006). There are a number of potential barriers to wild boar dispersal in the study area (Fig. 1). The Moselle river valley has previously been identified as a dispersal barrier for red deer Cervus elaphus L. (Frantz et al. 2006). A number of fenced four-lane motorways, all < 30 years old, dissected the study area (Fig. 1). Finally, about 10 years ago, wild boar with an abnormal behaviour suddenly appeared in two hunting areas in southern Luxembourg, from which the species had previously been absent (Fig. 1). Clandestine translocation was suspected, but we did not have a priori suspects among our samples.
LABORATORY PROCEDURES
Tissue samples (spleen, ear or muscle) were either frozen or stored in 70% ethanol. DNA extractions were performed following Whitlock et al. (2008). Genotyping was performed using 14 unlinked microsatellite loci (Hampton et al. 2004): S0002, S0005, S0026, S0090, S0097, S0155, S0226, Sw122, Sw240, Sw632, Sw857, SW911, Sw936, Sw951. The loci were amplified in three multiplexed Polymerase Chain Reactions (PCR). Information on which loci were amplified together and the detailed conditions for these reactions can be found in Supporting Information, Table S1 and Appendix S1. Reactions were performed using a DNA Engine Tetrad thermocycler (Bio-Rad, Hercules, California, USA). Fragments were separated using an ABI 3730 automated DNA sequencer (Applied Biosystems, Carlsbad, California, USA) and the data were analysed using genemapper version 3·5 (Applied Biosystems).
If one locus in a multiplex failed to amplify, the whole multiplex
was re-genotyped. In the majority of cases, it was possible to amplify
all the loci in this second round of amplification and these results
were retained for the final data set. If a locus in a multiplex failed
again, only genotypes with an identical score in the first and second
amplification were retained. In order to assess genotyping errors, 40
samples (out of a total of 697) were chosen randomly from the data
base, re-extracted and re-genotyped, while 10 of the initial extracts
were genotyped in duplicate. Allelic mismatches were identified by
comparing these 50 duplicate genotypes to the initial ones.
POPULATION GENETIC ANALYSES
We calculated observed (HO) and expected (HE) heterozygosities (Nei 1978) for each locus, as well as the average expected heterozygosity, using genetix 4·05·2 (Belkhir 2004). The data were tested for linkage disequilibrium using an exact test based on a Markov chain method as implemented in genepop 3·4 (Raymond & Rousset 1995). The same program was used to perform the exact tests of Guo & Thompson (1992) for deviations from Hardy–Weinberg (HW) genotypic proportions at each locus. The sequential Bonferroni technique was used to eliminate significance by chance (Rice 1989).
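As an illustration of the sequential Bonferroni technique, the following minimal sketch ranks the P-values and compares the smallest with α/k, the next with α/(k − 1), and so on, stopping at the first non-significant test. The function name and the example P-values are hypothetical and are not taken from the study.

```python
def sequential_bonferroni(p_values, alpha=0.05):
    """Return one boolean per test: True if it stays significant
    after the sequential (step-down) Bonferroni correction."""
    k = len(p_values)
    order = sorted(range(k), key=lambda i: p_values[i])
    significant = [False] * k
    for rank, idx in enumerate(order):
        # Smallest P is tested at alpha/k, the next at alpha/(k-1), ...
        if p_values[idx] <= alpha / (k - rank):
            significant[idx] = True
        else:
            break  # all remaining (larger) P-values are non-significant
    return significant


# Hypothetical P-values for four tests
example_p = [0.001, 0.04, 0.03, 0.012]
flags = sequential_bonferroni(example_p)
print(flags)
```

With these made-up values, the two smallest P-values pass their thresholds (0·0125 and 0·0167) and the procedure stops at the third.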
POPULATION GENETIC STRUCTURE
We used different Bayesian clustering methods to investigate the spatial genetic structure of the wild boar in the sampled region. Firstly, we used the spatial Bayesian clustering method implemented in program baps 4·14 (Corander, Sirén & Arjas 2008). The program was run ten times for each of K = 2–10. Given the potential presence of relatives of wild boar suspected of clandestine translocation (see above), we looked for unusually strongly differentiated clusters by using phylip 3·66 (Felsenstein 2005) to construct a neighbour-joining tree with the Kullback–Leibler divergence matrix provided as output with baps. This matrix can be used as a measure of the relative genetic distance between the baps-inferred clusters. We also tried to identify suspects by visualizing the genetic relationship between the individuals in our data set using a factorial correspondence analysis (FCA) in the program genetix version 4·05·2.
Secondly, we analysed population genetic structure using a Bayesian model executed in a Markov chain Monte Carlo (MCMC) scheme and implemented in the geneland version 2·0·0 extension (Guillot, Mortier & Estoup 2005) of program R 2·4·1 (Ihaka & Gentleman 1996). The number of clusters was determined by running the MCMC iterations five times, allowing K to vary, with the following parameters: 500 000 MCMC iterations, maximum rate of the Poisson process fixed to 100, uncertainty attached to the spatial coordinates fixed at 5 km, minimum K = 1, maximum K = 10, maximum number of nuclei in the Poisson–Voronoi tessellation fixed to 300. The Dirichlet model was used as a prior for all allele frequencies. After inferring the number of populations in the data set from these five runs, the MCMC was run 30 times with K fixed to the inferred number of clusters, with the other parameters the same as above. The mean logarithm of the posterior probability was calculated for each of the 30 runs and the posterior probability of population membership for each pixel of the spatial domain was then computed for the three runs with the highest values.
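The run-selection step can be sketched as follows, assuming the log posterior trace of each run has already been exported as a list of numbers. The run labels, values and function name are hypothetical; this is not geneland code.

```python
from statistics import mean


def best_runs(log_posterior_by_run, n_best=3):
    """Rank runs by the mean of their log posterior trace and
    return the labels of the n_best highest-ranking runs."""
    means = {run: mean(trace) for run, trace in log_posterior_by_run.items()}
    return sorted(means, key=means.get, reverse=True)[:n_best]


# Hypothetical traces for four runs (the study used 30)
toy_runs = {
    "run01": [-1520.0, -1518.5, -1519.0],
    "run02": [-1490.0, -1489.0, -1491.0],
    "run03": [-1505.0, -1504.0, -1506.0],
    "run04": [-1499.0, -1500.0, -1501.0],
}
top = best_runs(toy_runs)
print(top)
```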
Finally, program structure version 2·2 (Pritchard, Stephens & Donnelly 2000) was also used to investigate the genetic structure of the wild boar. The first step of the analysis consisted of estimating K, the number of subpopulations or clusters. Ten independent runs of K = 1–10 with 200 000 MCMC iterations and a burn-in period of 100 000 were performed, using the model with correlated allele frequencies and assuming admixture. ALPHA, the Dirichlet parameter for the degree of admixture, was allowed to vary between runs. For each value of K, the log-likelihood values were averaged and standard deviation calculated. We tried to infer the appropriate number of clusters by calculating the ΔK statistic (Evanno, Regnaut & Goudet 2005). After placing samples into the cluster for which they showed the highest percentage of membership (q), averaging q over the 10 runs, we plotted the structure-assigned individuals on a map of the study region to assess geographical congruence of the clusters.
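The modal assignment described above can be sketched as below. The sketch assumes that cluster labels have already been aligned across runs (label switching between runs would otherwise make the average meaningless); the data layout and the toy values are hypothetical.

```python
def modal_assignment(q_runs):
    """q_runs[r][i][k] is the membership coefficient q of individual i
    in cluster k for run r. Returns, for each individual, the index of
    the cluster with the highest q averaged over runs."""
    n_runs = len(q_runs)
    n_ind = len(q_runs[0])
    n_clusters = len(q_runs[0][0])
    assignments = []
    for i in range(n_ind):
        mean_q = [
            sum(run[i][k] for run in q_runs) / n_runs
            for k in range(n_clusters)
        ]
        assignments.append(max(range(n_clusters), key=lambda k: mean_q[k]))
    return assignments


# Two hypothetical runs, two individuals, K = 2
toy_q = [
    [[0.7, 0.3], [0.4, 0.6]],
    [[0.9, 0.1], [0.2, 0.8]],
]
clusters = modal_assignment(toy_q)
print(clusters)
```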
SIMULATIONS OF ISOLATION-BY-DISTANCE DATA SETS
Data sets characterized by IBD were simulated using the program mutant tracker 0·211α (Wilkins 2004). This coalescent program simulates genealogies of samples drawn from a continuous habitat with limited gene flow, allowing the user to specify the geographical location from which each sample is drawn, as well as the effective population density and the dispersal rate of the simulated population. The location of each lineage is tracked explicitly backwards in time and the lineages move by a Gaussian random walk. Two lineages coalesce if at a particular time they are within a certain distance of each other – the inverse of the population density specified by the user. Mutations were simulated using a stepwise model.
Fig. 1. Outline of the study area and harvest location of tissue samples. B, Belgium; F, France; L, Luxembourg; D, Germany. Meandering
black line, river Moselle; double black lines, motorways; grey circle in the south of Luxembourg, location of suspected illegal introductions.
We performed simulations based on the geographical coordinates of the 678 autochthonous individuals (see Results) in the empirical data set. However, we additionally performed simulations based on 678 artificial sampling locations generated so that they were spread evenly over our study area. This was done to exclude the possibility that heterogeneous spatial sampling in the empirical data set caused the algorithms to overestimate genetic clustering (Serre & Pääbo 2004; Schwartz & McKelvey in press). The homogeneous set of coordinates was generated using geneland version 2·0·0, by setting the limits of the geographical domain to correspond to the boundaries of our study area. We performed simulations based on 14 loci using two different densities: 0·5 and 3 individuals per unit area (equivalent to 1 km²). While the latter choice might appear excessively high, population densities of wild boar in Western Europe have increased dramatically over the last 30 years. For example, over 4000 wild boar were hunted in Luxembourg (area 2586 km²) in 2004 (Schley et al. 2008). For both densities, we simulated three independent data sets for dispersal distances ranging from 0·1 to 7, increasing distances incrementally by 0·1 units. For eight dispersal distances, chosen to cover a range of IBD levels, the data sets were analysed using the Bayesian clustering methods.
We assessed the level of IBD in the simulated and empirical data sets by performing individual-based statistical correlation analyses between a measure of genetic kinship and the (log-transformed) pairwise spatial distances using SPAGeDi 1·2 (Hardy & Vekemans 2002). The linear regression slope, b, of this relationship offers a convenient measure of the degree of spatial genetic structuring (Hardy & Vekemans 2002). As suggested by Vekemans & Hardy (2004), Loiselle’s kinship coefficient (Fij) was chosen as a pairwise estimator of genetic relatedness, as it is a relatively unbiased estimator with low sampling variance. The standard error and significance of the linear regression slope were calculated by jackknifing (over loci) and by 10 000 permutations of locations, respectively.
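The regression of pairwise kinship on log distance, and a permutation test on locations, can be sketched as follows. The sketch assumes the pairwise kinship values have already been estimated and that no two individuals share a location (a zero distance has no logarithm); the one-sided test direction, the function names and the toy data are assumptions, not the SPAGeDi implementation.

```python
import math
import random


def pairwise_log_distances(coords):
    """Natural log of the Euclidean distance for every pair i < j."""
    n = len(coords)
    return {(i, j): math.log(math.dist(coords[i], coords[j]))
            for i in range(n) for j in range(i + 1, n)}


def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def ibd_slope(kinship, coords):
    """Slope b of pairwise kinship (keyed by (i, j), i < j) on log distance."""
    logd = pairwise_log_distances(coords)
    pairs = sorted(logd)
    return ols_slope([logd[p] for p in pairs], [kinship[p] for p in pairs])


def permutation_p_value(kinship, coords, n_perm=1000, seed=0):
    """One-sided P-value: share of location permutations whose slope is
    at least as negative as the observed one."""
    rng = random.Random(seed)
    observed = ibd_slope(kinship, coords)
    shuffled = list(coords)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if ibd_slope(kinship, shuffled) <= observed:
            count += 1
    return observed, (count + 1) / (n_perm + 1)


# Toy example: kinship declines linearly with log distance (slope -0.05)
toy_coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (4.0, 0.0)]
toy_kinship = {pair: 0.1 - 0.05 * ld
               for pair, ld in pairwise_log_distances(toy_coords).items()}
b, p = permutation_p_value(toy_kinship, toy_coords, n_perm=200)
print(b, p)
```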
We also simulated data sets that, in addition to an IBD pattern, contained a barrier to gene flow. This barrier was designed to run vertically through the middle of the study area, at a longitude of approximately 6°06′. We set the likelihood that a lineage encountering the barrier passed through it (i.e. its permeability) to 0·01 and assumed the barrier to have been present for 100 generations in the past. Again, we assumed population densities of 0·5 and 3, but limited the dispersal distance to the eight values chosen previously, generating three data sets for each combination. When calculating IBD slopes in SPAGeDi, we limited the analysis to individuals located on the same side of the simulated barrier.
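Restricting the IBD analysis to pairs on the same side of the barrier can be sketched as below. The sketch assumes coordinates in decimal degrees and reads the barrier longitude as 6°06′ (6 + 6/60 = 6·1); the function name and the sample longitudes are hypothetical.

```python
def same_side_pairs(longitudes, barrier_longitude):
    """Return index pairs (i, j), i < j, whose members lie on the same
    side of a north-south barrier at `barrier_longitude`."""
    west_of_barrier = [lon < barrier_longitude for lon in longitudes]
    n = len(longitudes)
    return [(i, j)
            for i in range(n) for j in range(i + 1, n)
            if west_of_barrier[i] == west_of_barrier[j]]


BARRIER = 6 + 6 / 60  # assumption: 6 degrees 06 minutes in decimal degrees
pairs = same_side_pairs([5.8, 5.9, 6.3, 6.4], BARRIER)
print(pairs)
```

Only the retained pairs would then enter the kinship-distance regression, so that differentiation across the barrier does not inflate the IBD slope.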
In order to analyse the simulated data sets within a reasonable time frame, geneland runs were limited to 250 000 MCMC iterations. The MCMC was run only 10 times with K fixed to the inferred number of clusters and further analyses were performed only on the best-supported run. This should be sufficient to indicate whether the inferred clusters were geographically coherent and whether the simulated barrier was correctly identified and located by the programs.
Partial Mantel tests (Smouse, Long & Sokal 1986) have been used previously to validate clusters inferred using the Bayesian algorithms (e.g. Frantz et al. 2006). This was done by comparing the genetic differentiation between pairs of individuals assigned to different clusters to that between pairs from the same cluster, while controlling for the underlying effect of isolation by distance. However, the validity of this approach has been debated: assessing the significance of partial Mantel tests using randomization methods may be problematic when, in addition to the underlying IBD pattern, the explanatory variable (in our case the assignment of pairs of individuals to the same or different clusters) is also spatially non-random (Raufaste & Rousset 2001; Castellano & Balletto 2002; Rousset 2002). Preliminary tests on some of the simulated data (density 0·5, dispersal 4·4) did indeed show that partial Mantel tests always indicated the presence of a barrier between the inferred clusters, even in the data sets characterized by isolation by distance alone. We therefore did not perform systematic analyses using the partial Mantel method. However, we attempted to assess the validity of the empirical clusters by comparing their overall level of genetic differentiation to the corresponding values for clusters inferred from a subset of the simulated data in which the simulated IBD pattern was similar to the one seen in the empirical data. The level of genetic differentiation between clusters was quantified with FST (Weir & Cockerham 1984) in SPAGeDi. The standard error and significance of the estimates were calculated by jackknifing (over loci) and by 10 000 permutations of genotypes between populations, respectively.
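A delete-one jackknife over loci, as used for the standard errors above, can be sketched in a generic form. The estimator is passed in as a function because a multilocus statistic such as FST is not simply the mean of per-locus values; the plain mean used in the example, the function names and the numbers are illustrative assumptions.

```python
import math


def jackknife_se(per_locus_values, estimator):
    """Delete-one jackknife standard error: recompute the estimator with
    each locus left out in turn and scale the spread of those estimates."""
    n = len(per_locus_values)
    leave_one_out = [
        estimator(per_locus_values[:i] + per_locus_values[i + 1:])
        for i in range(n)
    ]
    mean_loo = sum(leave_one_out) / n
    variance = (n - 1) / n * sum((v - mean_loo) ** 2 for v in leave_one_out)
    return math.sqrt(variance)


def simple_mean(values):
    return sum(values) / len(values)


# Hypothetical per-locus estimates for four loci
toy_values = [0.02, 0.03, 0.01, 0.04]
se = jackknife_se(toy_values, simple_mean)
print(se)
```

For the plain mean, this reduces to the familiar standard error of the mean, which gives a quick check of the implementation.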
Results
In total, 697 DNA samples were obtained from Wallonia (227 samples), Luxembourg (289 samples) and the German federal state of Rhineland-Palatinate (181 samples; Fig. 1). It was possible to generate a full 14-locus profile for 692 of the 697 samples (see Supporting Information, Table S1). In order to assess the reliability of the profiles, 700 genotypes (and 1400 alleles) were compared, i.e. 50 samples (7·2%) typed in duplicate at 14 loci each. Duplicate genotypes always corresponded to the initial genotype and no allelic dropout was observed. We therefore expect an error rate of less than 1/1400 = 7·14 × 10⁻⁴ per allele, too small to affect our results.
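The arithmetic behind this bound can be checked in a few lines; the variable names are illustrative only.

```python
n_samples = 50                      # samples typed in duplicate
n_loci = 14                         # loci per sample
genotypes = n_samples * n_loci      # single-locus genotypes compared
alleles = 2 * genotypes             # two alleles per diploid genotype
upper_bound = 1 / alleles           # no mismatch seen among all alleles

print(f"{genotypes} genotypes, {alleles} alleles, "
      f"error rate < {upper_bound:.2e} per allele")
```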
The mean expected heterozygosity in the complete data set was HE (± SD) = 0·615 ± 0·222. We expected the collective data set to exhibit signs of a Wahlund effect, either through the presence of distinct genetic clusters, or due to IBD in the study area. The global sample of microsatellite loci did indeed show a highly significant deficit of heterozygotes as compared to Hardy–Weinberg expectations (P < 0·001; see Supporting Information, Table S1). Similarly, when analysing the whole data set, 29 pairs of the unlinked loci showed significant linkage disequilibrium at P < 0·05 before Bonferroni correction, and nine pairs after.
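For a single locus, observed heterozygosity and the unbiased expected heterozygosity of Nei (1978), HE = 2n/(2n − 1) × (1 − Σp²), can be computed as in the sketch below. The genotypes and allele sizes are hypothetical; a heterozygote deficit appears as HO below HE.

```python
from collections import Counter


def heterozygosities(genotypes):
    """genotypes: list of (allele_a, allele_b) tuples for one locus.
    Returns (observed, unbiased expected) heterozygosity."""
    n = len(genotypes)
    h_obs = sum(1 for a, b in genotypes if a != b) / n
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    sum_p2 = sum((c / (2 * n)) ** 2 for c in counts.values())
    h_exp = (2 * n / (2 * n - 1)) * (1 - sum_p2)
    return h_obs, h_exp


# Hypothetical genotypes at one microsatellite locus (allele sizes in bp)
toy_locus = [(150, 150), (150, 154), (154, 154), (150, 154)]
h_o, h_e = heterozygosities(toy_locus)
print(h_o, h_e)
```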
POPULATION GENETIC STRUCTURE
Taking spatial information into account, baps 4·14 gave a
probability of > 0·999 of there being four genetic clusters in
the study area. The inferred clusters were geographically
coherent (Fig. 2a). The German samples to the southeast
of Luxembourg formed a genetic cluster (cluster ‘East’), as
did the Belgian samples to the west of Luxembourg (cluster
‘West’). The third cluster was distributed over a larger area
from Belgium, across Luxembourg, to the northern part of the
Rhineland-Palatinate (cluster ‘Central’). The fourth cluster,
however, was formed by 19 samples collected in the south of
Luxembourg (cluster ‘Suspect’), roughly in the area where the
translocations were suspected to have occurred.
The population dendrogram based on the Kullback–Leibler divergence matrix showed that the longest branch separated the suspects from the remaining clusters (Fig. 3a). The FCA showed that there were a number of genetic profiles that differed substantially from the majority of the samples in the data set (Fig. 3b). These samples corresponded to the baps-inferred cluster in the south of Luxembourg. Two alleles were found frequently in the 19 genetic profiles that made up this cluster, that were not present in any of the 678 remaining profiles: allele 167 at locus SW911 in 13 individuals and allele 243 at locus S0097 in six individuals. Eighteen of these individuals originated from the region where illegal translocation has been suspected, while one individual was sampled a little farther north (Figs 1, 2a). We therefore concluded that these 19 individuals were probably non-autochthonous and related to individuals illegally translocated 10 years earlier.
When repeating the spatial baps analysis with the suspect individuals omitted (henceforth referred to as the truncated data set), we obtained slightly different results, as the program still gave a probability > 0·999 of there being four genetic clusters in the study area. The same clusters were obtained as in the analysis with the complete data set (Fig. 2b), with the exception that cluster West was split into two: one cluster immediately to the west of Luxembourg (‘West One’), and one west of the motorway (‘West Two’). Further analyses were performed using geneland and structure to test for the robustness of this result.
Fig. 2. Modal assignment of individuals to different clusters. Assignments of the spatial model in baps using (a) the complete data set (K = 4) and (b) the data set with non-autochthonous individuals omitted (K = 4). geneland assignments using (c) the complete data set with K fixed to 3 and (d) the truncated data set with K fixed to 2. Different symbols represent different clusters.
Fig. 3. Genetic differentiation of suspect individuals estimated using (a) an unrooted neighbour-joining tree based on the Kullback–Leibler divergence matrix provided by baps and (b) a factorial correspondence analysis. The per cent of the total variation explained by each of the two axes is given. The various symbols represent the individual assignment to the four baps clusters, with the black squares corresponding to suspect samples.
We firstly used the geneland method on the complete data
set. The five initial geneland runs suggested the presence of
three genetic clusters in the study area (Supporting Information,
Fig. S1a). However, fixing the number of populations to
three, the modal assignments of individuals in the run with
the highest mean logarithm of posterior probability were not
geographically coherent, as the inferred clusters consisted of
geographically distinct groups (Fig. 2c). The second and
third best runs gave rise to similar clusters (results not shown).
When performing the geneland analysis with the truncated
data set, the five initial runs suggested the presence of only two
genetic clusters in the study area (Supporting Information,
Fig. S1a). The modal assignments in the three best-supported
runs roughly corresponded to the baps cluster West and a
combination of Central and East (Fig. 2d).
The spatially explicit Bayesian programs did not converge on the same clustering solution. When using the non-spatial program structure 2·2, the highest value for ΔK, a statistic based on the rate of change in the log probability of the data between successive values of K, was obtained for K = 3 (Supporting Information, Fig. S1b). However, the corresponding clusters were largely overlapping and not geographically coherent (Supporting Information, Fig. S2).
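A minimal sketch of the ΔK calculation is given below, using one common formulation: the absolute second-order difference of the mean log-likelihood across successive K, divided by the standard deviation of the log-likelihood at K. Other implementations average the absolute second difference across runs instead. The log-likelihood values are hypothetical, and the statistic is undefined at the smallest and largest K and when the replicate runs have zero variance.

```python
from statistics import mean, stdev


def delta_k(loglik_by_k):
    """loglik_by_k maps K to a list of ln P(X | K) values from replicate
    runs. Returns {K: delta K} for every K with both neighbours present."""
    means = {k: mean(values) for k, values in loglik_by_k.items()}
    result = {}
    for k in sorted(loglik_by_k):
        if k - 1 not in means or k + 1 not in means:
            continue  # no second-order difference at the ends of the range
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        result[k] = second_diff / stdev(loglik_by_k[k])
    return result


# Hypothetical log-likelihoods from two runs per K
toy_loglik = {
    1: [-1000.0, -1002.0],
    2: [-900.0, -904.0],
    3: [-800.0, -804.0],
    4: [-798.0, -802.0],
}
dk = delta_k(toy_loglik)
best_k = max(dk, key=dk.get)
print(dk, best_k)
```

Because ΔK cannot be evaluated at K = 1, it can never indicate a single panmictic cluster, which is worth bearing in mind when a data set is shaped by IBD alone.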
SIMULATIONS OF ISOLATION-BY-DISTANCE
DATA SETS
We chose the data sets generated for eight dispersal distances
to test the performance of the Bayesian clustering methods
(Table 1). We first simulated IBD data sets that did not
contain any genetic discontinuities (i.e. barriers). All three
Bayesian programs overestimated clustering at the higher
levels of IBD (i.e. b ≤ –0·01), where one or more loci deviated
from Hardy–Weinberg proportions at P < 0·05 after Bonferroni
correction (Table 1a). Overall, no program appeared to
substantially outperform the others at avoiding the inference
of artificial clusters. For each combination of dispersal and
density, the three replicates and all three programs gave a
broadly similar number of clusters. In the case of structure,
however, by applying the correction of Evanno, fewer clusters
were inferred than with the other two methods.
Table 1. Characteristics of simulated data sets and the number of genetic clusters inferred using three Bayesian programs. Simulated data sets were characterized either by (a) isolation by distance alone (one cluster) or (b) isolation by distance together with one barrier to gene flow (two clusters). The permeability of the barrier was set to 0·01 and it was assumed to have been present for 100 generations in the past. The numbers in brackets indicate the number of clusters that had no individuals assigned to them by geneland (so-called ghost populations). ln(X | K), K with the highest log-likelihood value; ΔK, number of clusters according to the correction by Evanno; HW, number of loci deviating from Hardy–Weinberg proportions after correction for multiple tests; FST, level of genetic differentiation of the two simulated clusters
(a)
Dispersal (unit area)
Density: 0·5 individual unit area–1: IBD slope, baps, geneland, structure ln(X | K), structure ΔK, HW
Density: 3 individuals unit area–1: IBD slope, baps, geneland, structure ln(X | K), structure ΔK, HW
1·5 –0·076 10 10 10 2 10 –0·017 5 5 6 2 8
1·5 –0·079 10 10 10 4 10 –0·021 9 8 6 3 9
1·5 –0·077 10 10 10 2 10 –0·021 7 6 9 3 9
2·0 –0·053 10 10 (1) 10 2 10 –0·013 5 5 6 2 5
2·0 –0·068 10 10 10 5 10 –0·014 3 4 6 3 7
2·0 –0·054 10 10 (1) 10 2 10 –0·012 3 4 4 2 4
2·5 –0·044 8 7 10 2 10 –0·008 1 2 3 0
2·5 –0·034 10 8 (1) 10 2 9 –0·008 2 2 2 1
2·5 –0·037 10 6 10 2 8 –0·008 2 2 2 0
3·0 –0·041 9 6 8 2 8 –0·006 1 1 1 0
3·0 –0·034 6 5 5 2 10 –0·006 1 1 1 1
3·0 –0·029 9 5 7 2 9 –0·006 1 2 2 1
3·9 –0·022 4 4 4 2 4 –0·004 1 1 1 0
3·9 –0·018 3 2 2 5 –0·004 1 1 1 0
3·9 –0·018 3 3 3 2 –0·004 1 1 1 0
4·4 –0·014 3 3 3 1 –0·002 1 1 1 0
4·4 –0·011 2 2 1 1 –0·003 1 1 1 0
4·4 –0·014 3 3 3 2 3 –0·003 1 1 1 0
6·0 –0·008 1 2 1 0 –0·002 1 1 1 0
6·0 –0·007 1 2 1 0 –0·001 1 1 1 0
6·0 –0·006 1 1 1 0 –0·002 1 1 1 0
6·8 –0·006 1 1 1 1 –0·001 1 1 1 0
6·8 –0·007 1 1 1 0 –0·001 1 1 1 0
6·8 –0·007 1 1 1 0 –0·001 1 1 1 1
Table 1. Continued
(b)
Dispersal (unit area)
Density: 0·5 individual unit area–1: IBD slope, FST, baps, geneland, structure ln(X | K), structure ΔK
Density: 3 individuals unit area–1: IBD slope, FST, baps, geneland, structure ln(X | K), structure ΔK
1·5 –0·101 0·083 10 10 10 3 –0·022 0·029 9 8 (7) 10 2
1·5 –0·099 0·103 10 10 10 2 –0·022 0·024 8 8 10 2
1·5 –0·080 0·139 10 10 10 2 –0·022 0·021 8 10 10 3
2·0 –0·073 0·078 10 10 10 2 –0·014 0·017 4 4 7 2
2·0 –0·066 0·067 10 10 10 2 –0·011 0·018 3 5 6 2
2·0 –0·056 0·114 10 10 (1) 10 2 –0·013 0·016 3 5 8 2
2·5 –0·049 0·076 10 10 (1) 10 2 –0·008 0·012 2 2 3 2
2·5 –0·045 0·066 10 10 (1) 10 2 –0·008 0·014 2 1 4 2
2·5 –0·049 0·063 10 9 (1) 10 2 –0·009 0·013 2 2 4 2
3·0 –0·035 0·061 10 9 (1) 9 2 –0·006 0·013 2 2 2
3·0 –0·028 0·056 10 6 10 2 –0·006 0·013 2 2 2
3·0 –0·027 0·057 8 5 8 2 –0·005 0·008 1 2 2
3·9 –0·021 0·050 4 4 6 2 –0·003 0·006 1 1 1
3·9 –0·020 0·050 4 4 5 2 –0·004 0·008 1 1 2
3·9 –0·026 0·055 4 4 6 2 –0·004 0·009 1 1 2
4·4 –0·014 0·038 3 3 4 2 –0·003 0·007 1 1 1
4·4 –0·017 0·028 3 3 3 2 –0·002 0·005 1 1 1
4·4 –0·013 0·044 2 2 2 –0·003 0·010 1 1 2
6·0 –0·010 0·022 2 2 2 –0·002 0·007 1 1 1
6·0 –0·008 0·027 2 2 2 –0·002 0·004 1 1 1
6·0 –0·007 0·027 2 2 2 –0·001 0·004 1 1 1
6·8 –0·006 0·018 2 2 2 –0·001 0·006 1 1 1
6·8 –0·007 0·034 2 2 2 –0·001 0·004 1 1 1
6·8 –0·007 0·017 2 2 2 –0·001 0·004 1 1 1
With a few exceptions at the higher levels of IBD (Table 1a),
geneland always assigned individuals to each of the inferred
clusters and did not infer the presence of ‘ghost populations’.
The clusters inferred by the spatial methods were always
geographically coherent, and sometimes, similarly to the
empirical data, could have been explained biologically due
to the presence of roads or rivers at their boundaries (e.g.,
Fig. 4a,b). In the case of structure, however, the coherence
of the inferred clusters decreased with decreasing levels of
IBD (e.g., Fig. 4c,d).
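The notion of a 'ghost population' can be made concrete with a minimal sketch. It assumes that a clustering program reports, for each individual, posterior membership probabilities for each of the K inferred clusters; the array `membership` and the function name are hypothetical and do not reproduce geneland output. Each individual is given to its most probable cluster (modal assignment), and any cluster that receives no individual counts as a ghost.

```python
import numpy as np

def count_ghost_clusters(membership):
    """Return (modal assignments, number of clusters with no individuals).

    membership: array of shape (n_individuals, K) holding posterior
    membership probabilities. The input format is an assumption of
    this sketch, not the output format of any particular program.
    """
    membership = np.asarray(membership, dtype=float)
    modal = membership.argmax(axis=1)      # most probable cluster per individual
    occupied = np.unique(modal)            # clusters that received at least one individual
    n_ghost = membership.shape[1] - occupied.size
    return modal, n_ghost

# Illustrative values only: four individuals, K = 3 inferred clusters.
membership = np.array([
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.5, 0.4],
    [0.2, 0.6, 0.2],
])
modal, n_ghost = count_ghost_clusters(membership)
```

In this toy input the third cluster is never the modal cluster of any individual, so it would be reported in brackets in the manner of Table 1.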
Both spatial Bayesian methods correctly identified and
located the genetic discontinuity that was simulated to bisect
the study area (Table 1b, Fig. 4e,f), but each superimposed
further clusters at the higher levels of IBD (e.g., Table 1b,
Fig. 4g,h). Also, no program identified a barrier in the higher-
density data sets characterized by the longest dispersal distances
(Table 1b). By applying the Evanno correction to the structure
results, the correct number of clusters, K = 2, was inferred
in all but two cases (Table 1b). However, plotting the model
assignments on a map suggested that structure would not
have been very efficient at pinpointing the precise location of
the barrier (e.g., Fig. 4i), especially given the degree of overlap
between the clusters at the lower levels of IBD (e.g., Fig. 4j).
We also simulated IBD data sets using artificial sampling
locations generated so as to be evenly spread over our study
area. We performed simulations at a density of three individuals
per unit area only. These results confirmed the previous
findings that, at higher levels of IBD, all three programs over-
estimated the number of genetic clusters (Table 2), even while
correctly locating a genetic discontinuity (Fig. 5). However,
comparison of Tables 1 and 2 suggests that the programs
inferred fewer artificial clusters when the study area was
sampled homogenously.
We found evidence for significant isolation by distance when
analysing the truncated empirical data set as a whole (b ± SE =
–0·010 ± 0·001; P < 0·001). The overall magnitudes of popu-
lation differentiation for the empirical clusters were fairly
low (baps: FST ± SE = 0·032 ± 0·004; geneland: FST ± SE =
0·014 ± 0·003; structure: FST ± SE = 0·025 ± 0·004) and,
comparing the degrees of differentiation for dispersal distances
4·4 for density 0·5 and 2·0 for density 3 (comparable slopes),
were similar to the corresponding values of the artificial
clusters inferred in the IBD-only data sets (Supporting Infor-
mation, Table S2). Similarly, the pairwise FST values for the
empirical clusters (Supporting Information, Table S3) were
within the range of values obtained for the pairwise comparisons
of those same IBD-only data sets (Supporting Information,
Table S2).
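A slope b of the kind reported above can be illustrated with a short sketch. It assumes a symmetric matrix of pairwise genetic similarity between individuals (for example a kinship coefficient) and their coordinates, regresses similarity on the natural logarithm of distance, and obtains a one-sided P-value by permuting locations among individuals. The function name, the toy data and the permutation scheme are assumptions of this sketch; the software used for the published estimates may differ in detail.

```python
import numpy as np

def ibd_slope(similarity, coords, n_perm=999, seed=0):
    """Slope of pairwise genetic similarity on ln(distance), with a
    one-sided permutation P-value for a negative slope."""
    rng = np.random.default_rng(seed)
    similarity = np.asarray(similarity, dtype=float)
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    iu = np.triu_indices(n, k=1)           # each pair of individuals counted once

    def slope(xy):
        d = np.sqrt(((xy[:, None, :] - xy[None, :, :]) ** 2).sum(axis=-1))
        return np.polyfit(np.log(d[iu]), similarity[iu], 1)[0]

    b_obs = slope(coords)
    # Permuting locations among individuals removes any spatial pattern.
    b_null = np.array([slope(coords[rng.permutation(n)]) for _ in range(n_perm)])
    p_value = (1 + np.sum(b_null <= b_obs)) / (n_perm + 1)
    return b_obs, p_value

# Toy data (hypothetical): similarity declines with ln(distance), plus noise.
rng = np.random.default_rng(1)
coords = rng.uniform(0, 10, size=(30, 2))
d = np.sqrt(((coords[:, None] - coords[None]) ** 2).sum(-1))
np.fill_diagonal(d, 1.0)                   # the diagonal is never used in the regression
similarity = -0.01 * np.log(d) + rng.normal(0, 0.005, size=d.shape)
similarity = (similarity + similarity.T) / 2
b, p = ibd_slope(similarity, coords)
```

The sketch requires all sampling locations to be distinct, because the logarithm of a zero distance is undefined.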
Discussion
POPULATION GENETIC STRUCTURE
We identified a distinct cluster of individuals located in the
same area where an introduction was suspected. Because the
genetic profiles of most of these individuals contained private
Fig. 4. Examples of individual modal assignment to clusters when analysing simulated data sets with three Bayesian methods. Different symbols
represent different clusters. Panels (a) to (d) show results from analyses of IBD-only data sets (i.e. K = 1); the data sets in the other panels
contained a simulated barrier dissecting the study area approximately at 6°06 (i.e. K = 2). (a) Assignments using the spatial model in baps 4·1·4,
analysing simulated data with a density of 3 individuals unit area–1 and a dispersal of 2·0 units, slope of b = 0·014, optimal K = 3; (b) geneland, density
3, dispersal 2·0, b = 0·012, K = 4; (c) structure, density 0·5, dispersal 1·5, b = 0·079, ΔK = 4; (d) structure, density 0·5, dispersal 4·4,
b = 0·014, ln(X | K) = 3; (e) baps and (f) geneland analysis, density 0·5, dispersal 6·0, b = 0·010, K = 2; (g) baps and (h) geneland analysis,
density 0·5, dispersal 3·9, b = 0·021, optimal K = 4; (i) structure, density 0·5, dispersal 2·0, b = 0·073, ΔK = 2; (j) structure, density 0·5,
dispersal 6·0, b = 0·010, ln(X | K) = 2.
alleles and the genetic cluster that they formed was substantially
differentiated from the clusters formed by autochthonous
individuals, we consider it unlikely that it resulted from genetic
drift in an isolated population. We could not assess the
likelihood of population membership here (see for example
Frantz et al. 2006; Frantz & Krier 2007), but consider that
there was, nevertheless, convincing genetic evidence for illegal
introduction. Clandestine translocations by private individuals
will, by their very nature, not follow quarantine guidelines and
need to be prevented.
The Bayesian programs did not converge on the same
clustering solution. When analysing the truncated data set,
spatial baps inferred the presence of four clusters, compared
to three populations inferred by structure and two by
geneland. Such disagreement between programs has been
reported fairly frequently for comparisons of structure
and the non-spatial algorithm in baps, with the latter having
a tendency to overestimate genetic structure (e.g., Latch et al.
2006; Rowe & Beebee 2007). baps is based on identifying
populations with different allele frequencies, rather than
partitioning individuals into clusters in Hardy–Weinberg
equilibrium. The non-spatial baps algorithm takes weak
stochastic fluctuations in the allele frequencies as evidence of
genetic structure, allowing the number of clusters to increase
without firm support (Corander, Sirén & Arjas 2008). Empirical
studies appear to confirm that the incorporation of a spatial
prior in baps reduces this bias and generates estimates of
K that are comparable to those obtained with structure
(Frantz et al. 2006; Robinson, Waits & Martin 2007).
Recently, some empirical studies have analysed data using
both structure and geneland. For example, Fontaine et al.
(2007) and Latch et al. (2008) found good congruence between
the two algorithms. Both Rowe & Beebee (2007) and Coulon
et al. (2008) reported that, overall, geneland inferred credible
clustering solutions comparable, but not identical, to structure.
In the latter study, however, the geneland analysis had a high
occurrence of ‘ghost’ populations. The authors, therefore,
based their choice of K on the number of clusters that had
individuals assigned to them in the second round of analyses.
However, the 20 independent runs performed at this second
stage were often inconsistent. Performing more runs and
analysing the outputs following the protocol outlined in
Coulon et al. (2008) might have helped solve this problem.
Few simulation studies have as yet compared the clusters
identified by spatial and non-spatial algorithms. One excep-
tion is the work by Chen et al. (2007). However, these authors
do not provide information on the frequency with which
the tested programs inferred the correct number of clusters.
Table 2. Characteristics of the data sets simulated to consist of genotypes sampled homogenously across the study area and the number of
genetic clusters inferred using three Bayesian programs. Simulated data sets were characterized either by isolation by distance alone (one cluster)
or contained one barrier to gene flow (two clusters). We simulated a density of three individuals per unit area. The characteristics of the barrier
are explained in Table 1, as are the table headings.
Columns: Dispersal, followed by two blocks. Isolation by distance only: Slope, baps, geneland, structure ln(X | K), structure ΔK. With simulated barrier: Slope, FST, baps, geneland, structure ln(X | K), structure ΔK (some cells are empty).
1·5 –0·033 5 6 10 2 –0·024 0·028 4 5 10 2
1·5 –0·035 4 5 10 2 –0·026 0·028 4 5 10 2
1·5 –0·032 5 5 8 2 –0·026 0·026 4 5 9 2
2·0 –0·017 2 2 4 2 –0·018 0·015 3 4 5 2
2·0 –0·019 2 4 5 2 –0·016 0·021 3 3 4 2
2·0 –0·016 2 4 6 2 –0·015 0·022 2 2 4 2
2·5 –0·015 2 2 2 –0·011 0·019 2 2 2
2·5 –0·011 2 2 2 –0·011 0·010 2 2 2
2·5 –0·012 2 2 2 –0·009 0·012 2 2 2
3·0 –0·009 1 2 2 –0·009 0·009 2 2 2
3·0 –0·008 1 2 1 –0·007 0·012 2 2 2
3·0 –0·010 1 2 2 –0·008 0·013 2 2 2
3·9 –0·006 1 1 1 –0·005 0·008 1 2 1
3·9 –0·005 1 1 1 –0·005 0·010 1 2 2
3·9 –0·004 1 1 1 –0·004 0·005 1 1 1
4·4 –0·004 1 1 1 –0·003 0·007 1 2 1
4·4 –0·004 1 1 1 –0·005 0·006 1 1 1
4·4 –0·004 1 1 1 –0·003 0·007 1 2 1
6·0 –0·002 1 1 1 –0·002 0·004 1 1 1
6·0 –0·002 1 1 1 –0·001 0·005 1 1 1
6·0 –0·003 1 1 1 –0·002 0·006 1 1 1
6·8 –0·002 1 1 1 –0·001 0·006 1 1 1
6·8 –0·002 1 1 1 –0·002 0·006 1 1 1
6·8 –0·002 1 1 1 –0·001 0·005 1 1 1
Moreover, Chen et al. simulated genetic clusters without
admixture, used extremely short MCMC runs and did not
simulate panmictic populations to test whether the programs
enforced substructure where it did not exist. Finally, the
maximum number of clusters in the analysis was limited to
six, while the data set comprised five clusters; setting the
maximum number of clusters to a larger value might in fact
have led to the inference of a larger number of clusters.
A certain amount of non-convergence between different
Bayesian clustering methods thus appears to be relatively
normal. Additionally, however, the same algorithm produced
different solutions when analysing our full and the truncated
data sets. Both types of non-convergence created interpreta-
tion problems: barriers identified by one program were
not supported by another. This was the case with the river
Moselle and the motorway to the west of Luxembourg. While
Frantz et al. (2006) found evidence for the river Moselle
acting as a barrier for red deer, the section of the motorway
in question only opened in 1988, which might be too recent
to cause population genetic structure through the effects of
genetic drift and mutation. Finally, there was no obvious
biological explanation for a putative barrier located between
the Luxembourg samples and the Belgian samples located to
the west of Luxembourg. Indeed, both areas are connected by
a fairly extensive network of wildlife corridors (Baghli, Moes
& Walzberg 2007). The FST values obtained for the actual data
were not very informative in deciding whether the inferred clusters
were genuine, as they fitted into the range of values obtained
for the artificial clusters inferred from the IBD data sets.
It is not entirely clear what caused the non-convergence of
clustering methods in our study. One possible explanation is
that programs produced different solutions because of dif-
ferences in the underlying algorithms, as is the case for
structure and baps. Similarly to structure, geneland
assumes HW and linkage equilibria within genetic clusters.
However, while structure was run assuming correlated allele
frequencies, geneland was run assuming independent allele
frequencies (following Guillot et al. 2005). The former model
often improves clustering when populations are closely related,
but can also increase the risk of overestimating K (Falush,
Stephens & Pritchard 2003), which might explain the differ-
ences observed between these two algorithms. Alternatively, it
is possible that all the clusters identified represented genuine
genetic sub-divisions, but that the level of differentiation
between them was too low for the clustering to be consistent
(e.g., Latch et al. 2006). Finally, given the IBD pattern in our
data set, the clustering solution could be an artefact super-
imposed on the data (e.g., Guillot et al. 2005).
Fig. 5. Example of clustering inferred by the spatial algorithm in baps 4·1·4 when analysing a simulated data set with homogenous spatial
sampling. A barrier was simulated that dissected the study area approximately at 6°06 (i.e. correct number of genetic clusters K = 2). We
simulated data at a density of 3 individuals unit area–1 and a dispersal rate of 1·5 units, giving rise to an IBD slope of b = 0·024. baps inferred
the presence of four clusters. Different symbols represent different clusters.
SIMULATIONS OF ISOLATION-BY-DISTANCE
DATA SETS
The spatial Bayesian clustering methods do not take isolation-
by-distance clines explicitly into account. Rather, the spatial
prior means that not all genetic structures are a priori equally
likely: the joint probability that any two individuals belong
to the same cluster decreases with the geographical distance
between them (Guillot et al. 2005; Corander, Sirén & Arjas
2008). Similar to structure, the spatial algorithms still assume
that the inferred population genetic clusters are panmictic
units. When we simulated a weak IBD pattern, the loci did not
deviate substantially from HW proportions and no clusters
were imposed. These results are in line with Guillot et al. (2005),
who stated that, when analysing one or several panmictic
populations, geneland did not enforce spatial substructure
when it did not exist. However, a stronger IBD pattern led
to deviations from random mating in our simulated data. This
probably caused the overestimation of genetic clustering by
the Bayesian programs.
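The behaviour of a spatial prior of this kind can be illustrated with a toy Monte Carlo sketch. It assumes a simplified tessellation prior: random nuclei are scattered over a unit square, each nucleus is given one of k cluster labels at random, and every location takes the label of its nearest nucleus. This is only loosely modelled on tessellation-based priors and is not the implementation of baps or geneland; the function name and all parameter values are hypothetical.

```python
import numpy as np

def co_membership_probability(distance, n_nuclei=20, k=3, n_draws=4000, seed=0):
    """Monte Carlo estimate of P(two points share a cluster) under a toy
    labelled-tessellation prior on the unit square."""
    rng = np.random.default_rng(seed)
    same = 0
    for _ in range(n_draws):
        nuclei = rng.uniform(0, 1, size=(n_nuclei, 2))   # random tile centres
        labels = rng.integers(0, k, size=n_nuclei)       # random cluster label per tile
        # Place two points `distance` apart, centred on the middle of the square.
        angle = rng.uniform(0, 2 * np.pi)
        offset = 0.5 * distance * np.array([np.cos(angle), np.sin(angle)])
        points = np.array([0.5, 0.5]) + np.array([offset, -offset])
        # Each point takes the label of its nearest nucleus.
        sq_dist = ((points[:, None, :] - nuclei[None, :, :]) ** 2).sum(axis=-1)
        nearest = sq_dist.argmin(axis=1)
        same += int(labels[nearest[0]] == labels[nearest[1]])
    return same / n_draws

p_near = co_membership_probability(0.02)   # two neighbouring individuals
p_far = co_membership_probability(0.80)    # two distant individuals
```

Under these assumptions, nearby points usually fall in the same tile and therefore share a label, whereas distant points share a label only slightly more often than the chance level of 1/k. Nothing in such a prior models a gradual cline in allele frequencies.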
In a recent study, Coulon et al. (2006) used geneland in an
attempt to assess the effect of landscape features on the genetic
structure of roe deer Capreolus capreolus L. While structure
did not find evidence for any sub-structuring, geneland inferred
the presence of two biologically feasible clusters. However, the
degree of genetic differentiation between the two clusters was
very low (FST = 0·008) and the study population was characterized
by an IBD pattern. It is therefore possible that the two clusters
inferred using geneland were artefacts, despite a plausible
biological explanation for their presence.
We confirmed the conclusion of Schwartz & McKelvey (in
press), that the non-spatial structure algorithm can overes-
timate the number of genetic clusters when analysing data sets
characterized by isolation by distance. As in our analyses
presented here, these conclusions were based on a relatively
limited number of simulations. However, the fact that two
independent studies found the software to be similarly biased
reinforces this finding. Schwartz & McKelvey (in press) found
that different sampling schemes (at a local scale) affected the
optimal number of clusters inferred by structure. When
we simulated genetic samples that had a more homogenous
spatial distribution, all three methods superimposed clusters
at higher levels of IBD, showing that overestimation of the
number of genetic clusters was an inherent bias of these
Bayesian programs. However, these biases might be less severe
than when samples are more unevenly distributed. Further
simulations are required to assess the extent of the bias of
spatial Bayesian methods under different sampling schemes
and IBD clines. It would also be desirable to test the per-
formance of the spatial methods when sampling has been
discontinuous along an IBD cline on a larger continental or
global scale (sensu Serre & Pääbo 2004).
A second main conclusion from our simulations was that
the spatial Bayesian programs could identify a simulated
barrier correctly but, at higher levels of IBD, the programs
could erroneously infer the presence of further clusters. In
other words, in some cases sub-structure detected by the spatial
programs could accurately reflect a genetic discontinuity,
while other clusters could be artefacts caused by some other
deviation from random mating. An illustration of this may
have been provided by Zannèse et al. (2006): when analysing
genetic data from roe deer using geneland, the authors
inferred the presence of three genetic clusters in their study
area. Two of these clusters corresponded well to two main popu-
lation units inferred using environmental and morphological
data, while there was no obvious biological explanation for
a smaller third cluster. The authors report a significant global
heterozygosity deficiency in the whole sample, which could be
the result of a Wahlund effect or due to IBD.
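The heterozygosity deficiency produced by a Wahlund effect can be shown with a small worked example. It assumes two equally sized subpopulations, each in Hardy–Weinberg proportions at a biallelic locus, that are pooled and analysed as one sample; the function name and the allele frequencies are illustrative only.

```python
def wahlund_deficit(p1, p2):
    """Heterozygote deficit when two equally sized subpopulations, each in
    Hardy-Weinberg proportions at a biallelic locus, are pooled."""
    h_obs = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2   # mean within-subpopulation heterozygosity
    p_bar = (p1 + p2) / 2                                  # allele frequency in the pooled sample
    h_exp = 2 * p_bar * (1 - p_bar)                        # Hardy-Weinberg expectation for the pool
    return h_exp - h_obs

deficit = wahlund_deficit(0.3, 0.7)
```

For these frequencies the pooled sample is expected to show a heterozygosity of 0·50 but contains only 0·42, a deficit equal to twice the variance in allele frequency between the two subpopulations. The deficit vanishes when the two frequencies are equal.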
We simulated a fairly impermeable barrier that had been
present for 100 generations. Nevertheless, in the data sets
characterized by a density of three individuals per unit
area, the Bayesian programs did not identify a barrier when
simulating large dispersal distances. The question as to how
powerful the various methods are in detecting barriers that
have only been present for shorter periods and/or that are
more permeable requires further investigation. Our limited
simulations suggested that the ΔK-corrected structure results
produced the fewest artificial clusters, and even helped to infer
the correct number of clusters in some data sets simulated to
contain a barrier. However, for both types of simulated data
sets (barrier/no barrier), there was a significant amount of
overlap between the various clusters, which would have made
it difficult to pinpoint the precise location of a barrier and to
distinguish between genuine and artificial clusters. Moreover,
it is unclear whether the ΔK statistic would have inferred the
correct result if more than one barrier had been simulated.
Finally, the ΔK statistic represents the second-order rate of
change of the likelihood function with respect to K, and as
such cannot evaluate K = 1 (Evanno, Regnaut & Goudet 2005).
Further simulations under biologically realistic scenarios are
required to assess the performance of this ad hoc statistic.
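The reason why ΔK cannot evaluate K = 1 becomes apparent when the statistic is written out. The sketch below assumes replicate runs for consecutive values of K and computes the absolute second-order difference of the mean log-likelihood, divided by the standard deviation of the log-likelihood across runs at that K. Working from the run means is one common reading of the statistic and an assumption of this sketch; the function name and the log-likelihood values are hypothetical.

```python
import numpy as np

def delta_k(log_lik):
    """ΔK in the manner of Evanno et al. from replicate runs.

    log_lik: dict mapping K to a list of log-likelihood values, one per
    run, for consecutive values of K. Returns {K: ΔK} for interior K only.
    """
    ks = sorted(log_lik)
    mean = {k: np.mean(log_lik[k]) for k in ks}
    sd = {k: np.std(log_lik[k], ddof=1) for k in ks}
    out = {}
    for k in ks[1:-1]:   # K - 1 and K + 1 are both needed, so the smallest K is never scored
        second_diff = abs(mean[k + 1] - 2 * mean[k] + mean[k - 1])
        out[k] = second_diff / sd[k]
    return out

# Hypothetical log-likelihoods from three runs at each K.
runs = {
    1: [-1000.0, -1002.0, -998.0],
    2: [-900.0, -901.0, -899.0],
    3: [-890.0, -895.0, -885.0],
    4: [-885.0, -880.0, -890.0],
}
result = delta_k(runs)
```

Because the second-order difference at K requires the likelihoods at both K – 1 and K + 1, no value exists for K = 1, and the smallest number of clusters the statistic can favour is two. The sketch does not guard against a zero standard deviation across runs.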
Conclusions
We show that Bayesian clustering programs can overestimate
genetic structure in data sets characterized by isolation by
distance. This bias could lead to the erroneous delimitation
of management or conservation units. Simulations suggested
that the strength of IBD in our empirical data set was just high
enough to cause artificial clustering. Some clusters in the
empirical data set could be explained biologically, but there
were inconsistencies between programs. It was also possible
that some clusters were genuine, while others were artefacts.
It was thus not possible to ascertain with confidence whether
the clustering solutions offered by the various programs were
an accurate reflection of population genetic structure in
our empirical data set or artefacts created by an IBD pattern.
Wild boar certainly are very mobile and a study over a similar
scale in Australia did not identify any population structure
(Cowled et al. 2006).
Our results suggest that it is very important to test a data
set for isolation by distance before applying the Bayesian
methods evaluated here. This should give an indication as to
whether the results might be biased and help to assess the
validity of biologically non-feasible clusters. We also strongly
agree with Pearse & Crandall (2004) that different Bayesian
clustering approaches should be used to investigate the spatial
genetic structure in a data set. On the one hand, we did find
that the spatial methods could produce very similar, but wrong,
clustering solutions (compare, for example, Fig. 4g,h), implying
that convergence of results between different methods is no
guarantee that results are correct. On the other hand, we
found that baps and geneland produced different solutions
when analysing the empirical data, and that our conclusions
might have been very different had we only used one program.
The main aim of our simulations was to test whether the
clusters derived from the empirical data could be artefacts.
We therefore performed relatively limited simulations in a
context that was specific in terms of number of microsatellite
loci, sample size, sample location and simulated population
densities. Nevertheless, we believe that our results confirm the
warnings of previous authors that deviations from random
mating that are not caused by genetic discontinuities could
cause Bayesian clustering programs to overestimate genetic
structure. It is apparent that future studies of spatial cluster-
ing need to control for the effect of isolation by distance on
the analyses.
Acknowledgments
We are indebted to Hubertus Becker, Nico Bonanni, Erhard Günter, Dr Dieter
Hoff, Metzgerei Schmitt in Mandern, Andreas Michel, and the local services of
the Ministry of the Walloon Region, General Directorate of Natural Resources
and Environment, Nature and Forest Division (forest districts of Bièvre,
Bertrix, Bouillon, Florenville, Habay-la-Neuve, Laroche, Neufchâteau,
Saint-Hubert, Saint-Vith and Vielsalm) for providing us with tissue samples.
Roger Butlin and various anonymous referees provided helpful comments on
earlier versions of the manuscript.
References
Anonymous (1999) Classical Swine Fever in Wild Boar. Report of the European
Commission, Brussels, Belgium.
Artois, M., Depner, K.R., Guberti, V., Hars, J., Rossi, S. & Rutili, D. (2002)
Classical swine fever (hog cholera) in wild boar in Europe. Revue Scientifique
et Technique de l’Office International des Epizooties, 21, 287–303.
Baghli, A., Moes, M. & Walzberg, C. (2007) Les corridors faunistiques du cerf
(Cervus elaphus L.) au Luxembourg. Bulletin de la Société des Naturalistes
Luxembourgeois, 108, 63–80.
Belkhir, K. (2004) Genetix 4.05.2. University of Montpellier II, Laboratoire
Génome et Populations, Montpellier, France.
Castellano, S. & Balletto, E. (2002) Is the partial Mantel test inadequate?
Evolution, 56, 1871–1873.
Chen, C., Durand, E., Forbes, F. & François, O. (2007) Bayesian clustering
algorithms ascertain spatial population structure: a new computer program
and a comparison study. Molecular Ecology Notes, 7, 747–756.
Corander, J., Sirén, J. & Arjas, E. (2008) Spatial modelling of genetic population
structure. Computational Statistics, 23, 111–129.
Coulon, A., Fitzpatrick, J.W., Bowman, R., Stith, B.M., Makarewich, C.A.,
Stenzler, L.M. & Lovette, I.J. (2008) Congruent population structure inferred
from dispersal behaviour and intensive genetic surveys of the threatened Florida
scrub-jay (Aphelocoma coerulescens). Molecular Ecology, 17, 1685–1701.
Coulon, A., Guillot, G., Cosson, J-F., Angibault, J.M., Aulagnier, S.,
Cargnelutti, B., Galan, M. & Hewison, A.J.M. (2006) Genetic structure is
influenced by landscape features: empirical evidence from a roe deer
population. Molecular Ecology, 15, 1669–1679.
Cowled, B.D., Lapidge, S.J., Hampton, J.O. & Spencer, P.B.S. (2006) Measuring
the demographic and genetic effects of pest control in a highly persecuted
feral pig population. Journal of Wildlife Management, 70, 1690–1697.
Evanno, G., Regnaut, S. & Goudet, J. (2005) Detecting the number of clusters
of individuals using the software STRUCTURE: a simulation study.
Molecular Ecology, 14, 2611–2620.
Falush, D., Stephens, M. & Pritchard, J.K. (2003) Inference of population
structure using multilocus genotype data: linked loci and correlated allele
frequencies. Genetics, 164, 1567–1587.
Felsenstein, J. (2005) PHYLIP, version 3.66. Department of Genome Sciences,
University of Washington, Seattle, WA, USA.
Fontaine, M.C., Baird, S.J.E., Piry, S., Ray, N., Tolley, K.A., Duke, S., Birkun, A.,
Ferreira, M., Jauniaux, T., Llavona, A., Ozturk, B., Ozturk, A.A., Ridoux, V.,
Rogan, E., Sequeira, M., Siebert, U., Vikingsson, G.A., Bouquegneau, J.M.
& Michaux, J.R. (2007) Rise of oceanographic barriers in continuous
populations of a cetacean: the genetic structure of harbour porpoises in Old
World waters. BMC Biology, 5, Article no. 30.
Frantz, A.C. & Krier, A. (2007) Further evidence for illegal translocation of red
deer (Cervus elaphus) in Luxembourg. Beiträge zur Jagd- und Wildforschung,
32, 339–344.
Frantz, A.C., Tigel Pourtois, J., Heuertz, M., Schley, L., Flamand, M.C., Krier, A.,
Bertouille, S., Chaumont, F. & Burke, T. (2006) Genetic structure and
assignment tests demonstrate illegal translocation of red deer (Cervus
elaphus) into a continuous population. Molecular Ecology, 15, 3191–
3203.
Guillot, G., Estoup, A., Mortier, F. & Cosson, J.F. (2005) A spatial statistical
model for landscape genetics. Genetics, 170, 1261–1280.
Guillot, G., Mortier, F. & Estoup, A. (2005) Geneland: a computer package for
landscape genetics. Molecular Ecology Notes, 5, 712–715.
Guo, S.W. & Thompson, E.A. (1992) Performing the exact tests of Hardy-
Weinberg proportion for multiple alleles. Biometrics, 48, 361–372.
Hampton, J.O., Spencer, P.B.S., Alpers, D.L., Twigg, L.E., Woolnough, A.P.,
Doust, J., Higgs, T. & Pluske, J. (2004) Molecular techniques, wildlife
management and the importance of genetic population structure and
dispersal: a case study with feral pigs. Journal of Applied Ecology, 41, 735–
743.
Hardy, O. & Vekemans, X. (2002) SPAGeDi: a versatile computer program to
analyse spatial genetic structure at the individual or population levels.
Molecular Ecology Notes, 2, 618–620.
Ihaka, R. & Gentleman, R. (1996) R: a language for data analysis and graphics.
Journal of Computational and Graphical Statistics, 5, 299–314.
Latch, E.K., Dharmarajan, G., Glaubitz, J.C. & Rhodes, O.E. (2006) Relative
performance of Bayesian clustering software for inferring population
substructure and individual assignment at low levels of population differentia-
tion. Conservation Genetics, 7, 295–302.
Latch, E.K., Scognamillo, D.G., Fike, J.A., Chamberlain, M.J. & Rhodes, O.E.
(2008) Deciphering ecological barriers to North American river otter
(Lontra canadensis) gene flow in the Louisiana landscape. Journal of Heredity,
99, 265–274.
Nei, M. (1978) Estimation of average heterozygosity and genetic distance from
a small number of individuals. Genetics, 89, 583–590.
Pearse, D.E. & Crandall, K.A. (2004) Beyond FST: analysis of population
genetic data for conservation. Conservation Genetics, 5, 585–602.
Perrin, D., Temmerman, M. & Laitat, E. (2000) Calculation on the impacts of
forestation, afforestation and reforestation on the C-sequestration potential
in Belgian forests ecosystems. Biotechnology, Agronomy, Society and
Environment, 4, 259–262.
Pritchard, J.K., Stephens, M. & Donnelly, P. (2000) Inference of population
structure using multilocus genotype data. Genetics, 155, 945–959.
Raufaste, N. & Rousset, F. (2001) Are partial Mantel tests adequate? Evolution,
55, 1703–1705.
Raymond, M. & Rousset, F. (1995) GENEPOP (version 1.2): population genetics
software for exact tests and ecumenicism. Journal of Heredity, 86, 248–249.
Rice, W.R. (1989) Analysing tables of statistical tests. Evolution, 43, 223–225.
Robinson, S.J., Waits, L.P. & Martin, I.D. (2007) Evaluating population struc-
ture of black bears on the Kenai peninsula using mitochondrial and nuclear
DNA analyses. Journal of Mammalogy, 88, 1288–1299.
Rondeux, J. (2006) Der Luxemburger Wald in Zahlen. Ergebnisse der Luxemburger
Landeswaldinventur 1998–2000. Forstverwaltung des Großherzogtums
Luxemburg, Luxemburg.
Rosenberg, N.A., Mahajan, S., Ramachandran, S., Zhao, C., Pritchard, J.K. &
Feldman, M.W. (2005) Clines, clusters, and the effect of study design on the
inference of human population structure. Public Library of Science Genetics,
1, 660–671.
Rousset, F. (2002) Partial Mantel tests: reply to Castellano and Balletto.
Evolution, 56, 1874–1875.
Rowe, G. & Beebee, T.J.C. (2007) Defining population boundaries: use of three
Bayesian approaches with microsatellite data from British natterjack toads
(Bufo calamita). Molecular Ecology, 16, 785–796.
Schley, L., Dufrêne, M., Krier, A. & Frantz, A.C. (2008) Patterns of crop
damage by wild boar (Sus scrofa) over a 10-year period. European Journal of
Wildlife Research, 54, 589–599.
Schoos, J. (2002) Bekämpfungsplan der europäischen Schweinepest beim
Wildschwein. Konzept für Luxemburg. Bulletin de la Société des sciences
médicales du Grand-Duché de Luxembourg, N°2/2002, 171–191.
Schwartz, M.K. & McKelvey, K. (2008) Why sampling scheme matters: the
effect of sampling scheme on landscape genetic results. Conservation Genetics,
in press. DOI: 10.1007/s10592-008-9622-1.
Serre, D. & Pääbo, S. (2004) Evidence for gradients of human genetic diversity
within and among continents. Genome Research, 14, 1679–1685.
Smouse, P.E., Long, J. & Sokal, R.R. (1986) Multiple regression and correla-
tion extensions of the Mantel test of matrix correspondence. Systematic
Zoology, 35, 627–632.
Truvé, J. & Lemel, J. (2003) Timing and distance of natal dispersal for wild boar
Sus scrofa in Sweden. Wildlife Biology, 9 (Supplement 1), 51–57.
Vekemans, X. & Hardy, O.J. (2004) New insights from fine-scale spatial
genetic structure analyses in plant populations. Molecular Ecology, 13, 921–
935.
von Rüden, S.M. (2006) Zur Bekämpfung der Klassischen Schweinepest bei
Schwarzwild – Retrospektive Analyse eines Seuchengeschehens in Rheinland-Pfalz.
DVM Thesis, Tierärztliche Hochschule Hannover.
Weir, B.S. & Cockerham, C.C. (1984) Estimating F-statistics for the analysis of
population structure. Evolution, 38, 1358–1370.
Whitlock, R., Hipperson, H., Mannarelli, M. & Burke, T. (2008) A high-
throughput protocol for extracting high-purity genomic DNA from plants
and animals. Molecular Ecology Resources, 8, 736–741.
Wilkins, J.F. (2004) A separation-of-timescales approach to the coalescent in a
continuous population. Genetics, 168, 2227–2244.
Zannèse, A., Morellet, N., Targhetta, C., Coulon, A., Fuser, S., Hewison, A.J.M.
& Ramanzin, M. (2006) Spatial structure of roe deer populations: towards
defining management units at a landscape scale. Journal of Applied Ecology,
43, 1087–1097.
Received 16 August 2008; accepted 15 December 2008
Handling Editor: E J Milner-Gulland
Supporting Information
Additional Supporting Information may be found in the online version of this article:
Fig. S1. Inference of the number of genetic clusters in the study area: posterior distribution of the number of populations estimated using geneland and mean of log-likelihood values obtained using the program structure.
Fig. S2. Map of individual structure assignments at K = 3.
Appendix S1. Details of the polymerase chain reaction conditions.
Table S1. Properties of the microsatellite loci used in this study.
Table S2. Genetic differentiation between clusters inferred from IBD-only simulated data sets with IBD levels comparable to the empirical data set.
Table S3. Pairwise FST values for clusters obtained when analysing the empirical data set.
Please note: Wiley-Blackwell are not responsible for the content or functionality of any supporting materials supplied by the authors. Any queries (other than missing material) should be directed to the corresponding author for the article.
... A potential methodological limit is that the low estimated genetic structure and IBD could be associated with high effective population size. Indeed, very high effective population size may hide the signal of clustering, IBD, or landscape effects on gene flow (Frantz et al., 2009;Gauffre et al., 2008). Although simulations could help us to better understand whether we could detect the actual signals of structure or isolation (Frantz et al., 2009;Gauffre et al., 2008;Landguth et al., 2010), information about population dynamics may not be easily extracted from microsatellite data for nonmodel insect species with potentially very large population sizes such as some hoverfly species. ...
... In other words, while the landscape may affect the stratification and connectivity of hoverfly populations, their effective population size could be too large for genetic drift to have a detectable effect using current methods. ...
Article
Hoverflies (Syrphidae) are essential pollinators, and their severe decline jeopardizes their invaluable contribution to plant diversity and agricultural production. However, we know little about the dispersal abilities of hoverflies in urbanized landscapes, limiting our understanding of the spatiotemporal dynamics of plant-pollinator systems and reducing our ability to preserve biodiversity in the context of global change. Previous work has not addressed how urbanization affects the functional connectivity of hoverflies and whether dispersal is a limiting factor in their population dynamics. In this study, we investigated the spatial genetic structure of two nonmigratory species of hoverflies in two urban areas. We collected more than a thousand specimens each of Syritta pipiens and Myathropa florea by hand netting in two western European urbanized study areas of 490 and 460 km² in 2021, and we genotyped them at 14 and 24 microsatellite loci, respectively. Based on spatial and nonspatial Bayesian clustering methods, we failed to reject the null hypothesis of panmixia, suggesting that both species exhibited high genetic connectivity despite urbanization. The distribution of allele frequencies was not correlated with geographic distance, implying that isolation-by-distance was negligible at the investigated spatial scale in both species. Although anthropogenic land cover changes generally have dramatic consequences on biodiversity, these hoverfly species retain high connectivity, which suggests that dispersal is not a strong limiting factor in their metapopulation dynamics. However, high effective population size and its confounding effect on signals of genetic drift may limit our ability to draw confident conclusions about landscape effects on gene flow in those two species. Provided we maintain or restore habitat, recolonization may be prompt even in urban areas.
... Clustering solutions can be sensitive to model parameters and length of the Markov chain Monte Carlo runs (Wang 2017). For instance, Bayesian clustering programs can overestimate genetic structure in data sets characterized by isolation by distance, such that artificial clustering is generated (Frantz et al. 2009). Therefore, we also ran spatial analyses of principal components (sPCA), which make no assumptions about an underlying population model and, unlike clustering analyses, incorporate information regarding the spatial location of the samples. ...
... Bayesian clustering methods can in some cases overestimate genetic structure when individual-based data are characterized by isolation by distance (Frantz et al. 2009), and our study presented a sampling gap in southern Sinaloa that may have caused the cluster analysis to overestimate the degree of clustering within our sample set. However, our sPCA analysis, which did not make any assumptions about an underlying population model, provided support for the existence of genetic structure in the form of isolation by distance, whereby samples assigned to a northern genetic grouping were clearly separated from samples assigned to a southern genetic grouping. ...
Technical Report
A report submitted by AZGFD to the USFWS when the cactus ferruginous pygmy-owl (Glaucidium brasilianum cactorum) was being reconsidered for protection under the Endangered Species Act. We collected blood samples in the southern portion of the owl's range in Mexico, and analyzed them in combination with samples previously collected in Sonora and Sinaloa. We used 11 microsatellite loci to investigate fine-scale population structure and genetic diversity of cactus ferruginous pygmy-owls in western Mexico. We found a pattern of isolation by distance, whereby gradual genetic differentiation occurs from the northern end of our study area (Sonora) to the southern end (Jalisco-Colima).
... Dry shrub forests separate these three protected areas and hence show minimal admixture in individuals. Sometimes STRUCTURE analysis may cluster individuals in unpredictable ways (Kalinowski 2010) that have no biological reality, and it tends to overestimate genetic structure (Frantz et al. 2009; Benestan et al. 2016). The NJ tree and PCA analyses supported Bayesian genetic groups of P. emblica populations. ...
Article
Phyllanthus emblica is a well-known medicinal and non-timber forest product species, widely distributed in the Indian subcontinent. Multiple disturbances like intensive fruit harvest, the spread of invasive species such as Lantana camara and Taxillus tomentosus, and other anthropogenic disturbances threaten population viability by altering ecological and genetic processes. Studying the genetic diversity and population structure of species harvested intensively and subjected to anthropogenic disturbances is crucial for evaluating their ability to survive under future environmental changes and for establishing conservation strategies. The genetic diversity and population structure of twelve harvested populations of P. emblica across three protected areas of the Western Ghats, the world's most densely populated biodiversity hotspot, were evaluated. Three hundred sixty samples were genotyped with nine simple sequence repeat markers. The changes in genetic diversity and genetic structure were assessed between generations by analyzing adult, seedling, and juvenile samples. Despite intensive harvesting, the results showed high genetic diversity in all the populations (mean/pop: Ho = 0.626; Hs = 0.722). However, genetic differentiation was significant between the study regions as well as between adult and seedling populations. The study also indicated a clear clustering of the twelve populations into three distinct genetic clusters. Neighbor-joining tree and hierarchical clustering analysis also showed the same pattern. The genetic data from the study provide information on how local disturbances, including harvesting, affect the population's genetic diversity and structure, which can provide a basis for implementing programs for the conservation and sustainable utilization of P. emblica genetic resources in the future.
... Previous research in capirona [48], carrot [49,50] and maize [20] found a false highest peak at K = 2 in population structure analysis as the null hypothesis of no structure (K = 1) was strongly rejected. In addition, Waples and Gaggiotti [51], Frantz et al. [52] and Janes et al. [53] indicated that the Evanno method tends to underestimate the number of genetic clusters. Hence, it is very likely the second highest peak obtained with our dataset of 14,235 SNPs (K = 4) is caused by a strong rejection of the hypothesis of three clusters only. ...
Preprint
Peruvian maize exhibits abundant morphological diversity, with landraces cultivated from sea level (sl) up to 3,500 m above sl. Previous research based on morphological descriptors defined at least 52 Peruvian maize races, but their genetic diversity and population structure remain largely unknown. Here we used genotyping-by-sequencing (GBS) to obtain single nucleotide polymorphisms (SNPs) that allow inferring the genetic structure and diversity of 423 maize accessions from the genebank of Universidad Nacional Agraria la Molina (UNALM) and Universidad Nacional Autónoma de Tayacaja (UNAT). These accessions represent nine races and one sub-race, along with 15 open-pollinated lines (purple corn) and two yellow maize hybrids. It was possible to obtain 14,235 high-quality SNPs distributed along the 10 maize chromosomes. Gene diversity ranged from 0.33 (sub-race Pachia) to 0.362 (race Ancashino), with race Cusco showing the lowest inbreeding coefficient (0.205) and Ancashino the highest (0.274) for the landraces. Population divergence (FST) was very low (mean = 0.017), thus depicting extensive interbreeding among Peruvian maize. Population structure analysis indicated that these 423 distinct genotypes can be included in 10 groups, with some maize races clustering together. Peruvian maize races failed to be recovered as monophyletic; instead, our phylogenetic tree identified two clades corresponding to the groups of the classification of the races of Peruvian maize based on their chronological origin, i.e., anciently derived or primary races and lately derived or secondary races. Additionally, these two clades are also congruent with the geographic origin of these maize races, reflecting their mixed evolutionary backgrounds and constant evolution. Peruvian maize germplasm needs further investigation with modern technologies to make better and wider use of it in breeding programs that favor agriculture, mainly in the South American highlands. We also expect this work will pave a path for establishing more accurate conservation strategies for this precious crop genetic resource.
... as those implemented in Structure (Meirmans, 2012;Schwartz & McKelvey, 2009). When underlying patterns of isolation by distance exist, some have argued that uneven sampling can lead to spurious genetic clusters (Turbek et al., 2023), whereas others have suggested that even sampling may introduce more bias in the estimation of genetic clusters (Frantz et al., 2009). We lacked fine-scale locational data for harvest samples and were therefore unable to fully characterize the evenness of our sampling or explicitly investigate patterns of isolation by distance. ...
Article
Delineating wildlife population boundaries is important for effective population monitoring and management. The bobcat (Lynx rufus) is a highly mobile generalist carnivore that is ecologically and economically important. We sampled 1225 bobcats harvested in South Dakota, USA (2014–2019), of which 878 were retained to assess genetic diversity and infer population genetic structure using 17 microsatellite loci. We assigned individuals to genetic clusters (K) using spatial and nonspatial Bayesian clustering algorithms and quantified differentiation (FST and G″ST) among clusters. We found support for population genetic structure at K = 2 and K = 4, with pairwise FST and G″ST values indicating weak to moderate differentiation, respectively, among clusters. For K = 2, eastern and western clusters aligned closely with historical bobcat management units and were consistent with a longitudinal suture zone for bobcats previously identified in the Great Plains. We did not observe patterns of population genetic structure aligning with major rivers or highways. Genetic divergence observed at K = 4 aligned roughly with ecoregion breaks and may be associated with environmental gradients, but additional sampling with more precise locational data may be necessary to validate these patterns. Our findings reveal that cryptic population structure may occur in highly mobile and broadly distributed generalist carnivores, highlighting the importance of considering population structure when establishing population monitoring programs or harvest regulations. Our study further demonstrates that for elusive furbearers, harvest can provide an efficient, broad‐scale sampling approach for genetic population assessments.
... Given the large area covered in our study, we were concerned about possible isolation by distance (IBD) patterns in genetic variability creating spurious clusters (Frantz et al., 2009;Schwartz & McKelvey, 2009). To evaluate this possibility, we reran STRUCTURE with our subsample of two deer per sex per county but excluding samples from extreme west and east locations to retain about 60% of the 750 individuals in the subsample. ...
Article
Chronic wasting disease (CWD) can spread among cervids by direct and indirect transmission, the former being more likely in emerging areas. Identifying subpopulations allows the delineation of focal areas to target for intervention. We aimed to assess the population structure of white‐tailed deer (Odocoileus virginianus) in the northeastern United States at a regional scale to inform managers regarding gene flow throughout the region. We genotyped 10 microsatellites in 5701 wild deer samples from Maryland, New York, Ohio, Pennsylvania, and Virginia. We evaluated the distribution of genetic variability through spatial principal component analysis and inferred genetic structure using non‐spatial and spatial Bayesian clustering algorithms (BCAs). We simulated populations representing each inferred wild cluster, wild deer in each state and each physiographic province, total wild population, and a captive population. We conducted genetic assignment tests using these potential sources, calculating the probability of samples being correctly assigned to their origin. Non‐spatial BCA identified two clusters across the region, while spatial BCA suggested a maximum of nine clusters. Assignment tests correctly placed deer into captive or wild origin in most cases (94%), as previously reported, but performance varied when assigning wild deer to more specific origins. Assignments to clusters inferred via non‐spatial BCA performed well, but efficiency was greatly reduced when assigning samples to clusters inferred via spatial BCA. Differences between spatial BCA clusters are not strong enough to make assignment tests a reliable method for inferring the geographic origin of deer using 10 microsatellites. However, the genetic distinction between clusters may indicate natural and anthropogenic barriers of interest for management.
... However, the sub-structuring of wild boar populations in eastern Germany may not only be the result of real barriers, but also of gradients in landscape resistance, and most importantly, isolation by distance (Cushman et al. 2006). Bayesian clustering methods generally tend to overestimate the number of clusters in the presence of isolation by distance (Frantz et al. 2009;Safner et al. 2011). Structuring of genetic diversity due to isolation by distance at the largest scale can be detected by the Mantel test. ...
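The excerpt above notes that isolation by distance can be detected with a Mantel test. As a minimal sketch of that idea, and not the procedure used in any of the cited studies, the following plain-Python code correlates a pairwise genetic distance matrix with a pairwise geographic distance matrix and assesses significance by permuting individuals. The function names, the one-sided test direction, and the contrived toy matrices are all assumptions made for illustration.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def upper_triangle(matrix, order):
    """Pairwise values above the diagonal, with rows/columns relabelled by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel_test(genetic, geographic, permutations=999, seed=0):
    """One-sided Mantel test: is genetic distance positively correlated with
    geographic distance? Returns (observed r, permutation p-value)."""
    rng = random.Random(seed)
    identity = list(range(len(genetic)))
    geo = upper_triangle(geographic, identity)
    r_obs = pearson(upper_triangle(genetic, identity), geo)
    hits = 0
    for _ in range(permutations):
        perm = identity[:]
        rng.shuffle(perm)  # relabel individuals in the genetic matrix only
        # small tolerance guards against floating-point ties
        if pearson(upper_triangle(genetic, perm), geo) >= r_obs - 1e-12:
            hits += 1
    # the observed arrangement counts as one of the permutations
    return r_obs, (hits + 1) / (permutations + 1)


# Hypothetical toy data: six individuals along a transect, with genetic
# distance exactly proportional to geographic distance (pure IBD).
positions = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
geo_dist = [[abs(a - b) for b in positions] for a in positions]
gen_dist = [[0.02 * d for d in row] for row in geo_dist]

r, p = mantel_test(gen_dist, geo_dist)
print(f"Mantel r = {r:.3f}, p = {p:.3f}")
```

Permuting whole individuals, rather than individual matrix cells, preserves the dependence among pairwise distances that share an individual, which is why the test is built this way.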
Article
In the European Union, African swine fever (ASF) affects wild boar (Sus scrofa) populations in several Member States. Knowledge of population connectivity is important for the implementation of control measures, in particular the establishment of effective barriers. Population genetic comparisons of neighbouring populations can be very helpful in this respect. The present study investigated the genetic differentiation of wild boar in eastern Germany. This region has been affected by ASF since September 2020. A total of 1,262 wild boars from 31 hunting grounds (populations) in ASF-affected and ASF-free districts were sampled over a total area of almost 100,000 km². The study area encompassed a network of geographical factors that promote (roads, rivers, cities) or inhibit (natural areas, habitat corridors) genetic differentiation between wild boar populations. The genetic differentiation of the areas was based on 12 microsatellite markers. Three different Bayesian algorithms were used to analyse the data. The results were combined into a common approach with 9 clusters. Based on the cluster distribution in each population, the connectivity between the areas was quantified. The strongest differentiation was found along an imaginary line along the lower Elbe valley through Berlin and the A11 freeway to the Szczecin Lagoon. In contrast, the Mecklenburg Lake District and the south-east of the study area showed strong connectivity between areas. The special features of the landscapes along the lower Elbe valley, which was assessed as highly connective, and the high barrier effect of the A11 freeway in contrast to the other freeways in the study area show that barrier effects cannot be generalised in principle, but are actually determined by the circumstances of individual structures. The results of the connectivity analysis were compared with the distribution of viral lineages and variants. 
Together, the genotypes of the wild boar populations and the ASFV lineages and variants offered a good explanation for the observed disease dynamics in the study area. The newly gained knowledge on barriers and regionally different connectivity between wild boar populations can support considerations and measures for the containment of ASF in the affected areas by improving the understanding of wild boar dispersal dynamics.
Article
Characterizing spatial patterns in allele frequencies is fundamental to evolutionary biology because these patterns contain evidence of underlying processes. However, the spatial scales at which gene flow, changing selection, and drift act are often unknown. Many of these processes can operate inconsistently across space, causing nonstationary patterns. We present a wavelet approach to characterize spatial pattern in allele frequency that helps solve these problems. We show how our approach can characterize spatial patterns in relatedness at multiple spatial scales, i.e. a multilocus wavelet genetic dissimilarity. We also develop wavelet tests of spatial differentiation in allele frequency and quantitative trait loci (QTL). With simulation, we illustrate these methods under different scenarios. We also apply our approach to natural populations of Arabidopsis thaliana to characterize population structure and identify locally adapted loci across scales. We find, for example, that Arabidopsis flowering time QTL show significantly elevated genetic differentiation at 300–1,300 km scales. Wavelet transforms of allele frequencies offer a flexible way to reveal geographic patterns and underlying evolutionary processes.
Preprint
Although GWAS has been a key technology to identify causal genes, the current standard GWAS model still has problems that need to be solved. Among them, the population structure is one of the most severe problems when detecting QTLs in GWAS since the GWAS model is statistically confounded by effects derived from the population structure. Further, the existence of QTLs, whose effects depend on the genetic background, also affects the conventional GWAS results by causing many false positives. Although the model to detect these population-specific QTLs has already been developed, this model requires prior information on the population structure, which may only sometimes be available. Also, the previous model only assumed the situation where QTLs interact with the discrete population structure. However, target populations of GWAS often consist of genetic resources with a more continuous population structure, and there has been no model that can consider such QTLs interacting with the continuous structure. In this study, by explicitly including an interaction term between a SNP/haplotype block and the genetic background in the conventional SNP-based/haplotype block-based GWAS model, we developed two models, named SNPxGB and HBxGB, that can detect QTLs interacting with the discrete and continuous structure. Our developed models were compared to the previous models by a simulation study assuming some types of QTLs, i.e., QTLs with effects common to all the backgrounds, specific to one genetic background, and interacting with polygenes. The simulation study showed that the models assuming the same situation as the simulation settings for each QTL type were suitable for detecting the corresponding QTLs. Primarily, our second HBxGB model could detect QTLs interacting with polygenes, i.e., continuous population structure, better than the previous model utilizing the prior population structure information. 
Our developed models are expected to help unravel the unknown genetic architecture of many complex traits.
Author summary
GWAS aims at detecting candidate genes associated with a target trait via statistical testing. Since a classical GWAS starts with the constitution of a panel of individuals, usually gathered from different populations, many methods have been proposed to control the false positives in large datasets with a strong population structure. However, most methods assume the same QTL effect across populations, which is not always true in the natural biological process. One study has proposed a method to consider population-specific QTL effects by assuming marker effects depend on each subpopulation with prior information on population membership for each individual. This information on the population structure, however, may only sometimes be available, and sometimes the population structure is more continuous rather than discrete, where their methodology cannot be applied. We successfully developed two novel models that do not require prior knowledge of the population structure by explicitly including an interaction term between a SNP/haplotype block of interest and the genetic background in the conventional SNP-based/haplotype block-based GWAS model. The developed models, named SNPxGB and HBxGB, were suitable for capturing gene effects interacting with the discrete and continuous population structure, leading to the clarification of the genetic architecture of complex traits.
Article
Molecular forensic methods are being increasingly used to help enforce wildlife conservation laws. Using multilocus genotyping, illegal translocation of an animal can be demonstrated by excluding all potential source populations as an individual's population of origin. Here, we illustrate how this approach can be applied to a large continuous population by defining the population genetic structure and excluding suspect animals from each identified cluster. We aimed to test the hypothesis that recreational hunters had illegally introduced a group of red deer into a hunting area in Luxembourg. Reference samples were collected over a large area in order to test the possibility that the suspect individuals might be recent immigrants. Due to isolation-by-distance relationships in the data set, inferring the number of genetic clusters using Bayesian methods was not straightforward. Biologically meaningful clusters were only obtained by simultaneously analysing spatial and genetic information using the program baps 4.1. We inferred the presence of three genetic clusters in the study region. Using partial Mantel tests, we detected barriers to gene flow other than distance, probably created by a combination of urban areas, motorways and a river valley used for viticulture. The four focal animals could be excluded with a high certainty from the three genetic subpopulations and it was therefore likely that they had been released illegally.
Article
We describe extensions to the method of Pritchard et al. for inferring population structure from multilocus genotype data. Most importantly, we develop methods that allow for linkage between loci. The new model accounts for the correlations between linked loci that arise in admixed populations (“admixture linkage disequilibrium”). This modification has several advantages, allowing (1) detection of admixture events farther back into the past, (2) inference of the population of origin of chromosomal regions, and (3) more accurate estimates of statistical uncertainty when linked loci are used. It is also of potential use for admixture mapping. In addition, we describe a new prior model for the allele frequencies within each population, which allows identification of subtle population subdivisions that were not detectable using the existing method. We present results applying the new methods to study admixture in African-Americans, recombination in Helicobacter pylori, and drift in populations of Drosophila melanogaster. The methods are implemented in a program, structure, version 2.0, which is available at http://pritch.bsd.uchicago.edu.
Article
We describe a model-based clustering method for using multilocus genotype data to infer population structure and assign individuals to populations. We assume a model in which there are K populations (where K may be unknown), each of which is characterized by a set of allele frequencies at each locus. Individuals in the sample are assigned (probabilistically) to populations, or jointly to two or more populations if their genotypes indicate that they are admixed. Our model does not assume a particular mutation process, and it can be applied to most of the commonly used genetic markers, provided that they are not closely linked. Applications of our method include demonstrating the presence of population structure, assigning individuals to populations, studying hybrid zones, and identifying migrants and admixed individuals. We show that the method can produce highly accurate assignments using modest numbers of loci—e.g., seven microsatellite loci in an example using genotype data from an endangered bird species. The software used for this article is available from http://www.stats.ox.ac.uk/~pritch/home.html.
Article
Many empirical studies have assessed fine-scale spatial genetic structure (SGS), i.e. the nonrandom spatial distribution of genotypes, within plant populations using genetic markers and spatial autocorrelation techniques. These studies mostly provided qualitative descriptions of SGS, rendering quantitative comparisons among studies difficult. The theory of isolation by distance can predict the pattern of SGS under limited gene dispersal, suggesting new approaches, based on the relationship between pairwise relatedness coefficients and the spatial distance between individuals, to quantify SGS and infer gene dispersal parameters. Here we review the theory underlying such methods and discuss issues about their application to plant populations, such as the choice of the relatedness statistics, the sampling scheme to adopt, the procedure to test SGS, and the interpretation of spatial autocorrelograms. We propose to quantify SGS by an 'Sp' statistic primarily dependent upon the rate of decrease of pairwise kinship coefficients between individuals with the logarithm of the distance in two dimensions. Under certain conditions, this statistic estimates the reciprocal of the neighbourhood size. Reanalysing data from, mostly, published studies, the Sp statistic was assessed for 47 plant species. It was found to be significantly related to the mating system (higher in selfing species) and to the life form (higher in herbs than trees), as well as to the population density (higher under low density). We discuss the necessity for comparing SGS with direct estimates of gene dispersal distances, and show how the approach presented can be extended to assess (i) the level of biparental inbreeding, and (ii) the kurtosis of the gene dispersal distribution.
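The Sp statistic described in the abstract above can be illustrated with a short sketch. The abstract states only that Sp depends on the rate of decrease of pairwise kinship with the logarithm of distance; the explicit form used here, Sp = -b / (1 - F1), with b the regression slope of kinship on ln(distance) and F1 the mean kinship in the first distance class, is the commonly quoted definition and should be treated as an assumption in this context. The function names, the distance-class threshold and the toy values are hypothetical.

```python
from math import log


def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sum((a - mx) ** 2 for a in x)
    return num / den


def sp_statistic(distances, kinships, first_class_max):
    """Sp = -b / (1 - F1), where b is the slope of pairwise kinship on
    ln(distance) and F1 is the mean kinship of pairs no farther apart
    than `first_class_max` (the first distance class)."""
    b = ols_slope([log(d) for d in distances], kinships)
    first = [f for d, f in zip(distances, kinships) if d <= first_class_max]
    f1 = sum(first) / len(first)
    return -b / (1.0 - f1)


# Contrived pairwise data: kinship declines linearly with ln(distance).
pair_distances = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
pair_kinships = [0.10 - 0.02 * log(d) for d in pair_distances]

sp = sp_statistic(pair_distances, pair_kinships, first_class_max=1.0)
print(f"Sp = {sp:.4f}")  # slope -0.02 and F1 = 0.10 give 0.02 / 0.9
```

Under the conditions mentioned in the abstract, the reciprocal of Sp estimates neighbourhood size, which for this toy input is 1 / 0.0222, or about 45.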