Computer note. BOTTLENECK: a computer program for detecting recent reductions in the effective size using allele frequency data

Computer Notes
Simulation of Effects of
Dominance on Estimates of
Population Genetic Diversity
and Differentiation
K. V. Krutovskii, S. Y. Erofeeva,
J. E. Aagaard, and S. H. Strauss
The advent of PCR-based molecular mark-
ers has led to a rapid expansion in studies
describing the levels and distribution of
genetic variation among populations at
the DNA level. Randomly amplified poly-
morphic DNA (RAPD; Williams et al. 1990)
and amplified fragment length polymor-
phism (AFLP; Vos et al. 1995) markers are
now commonly used in population genetic
studies (e.g., Aagaard et al. 1998; Isabel et
al. 1995; Liu and Furnier 1993; Mosseler et
al. 1992; Peakall et al. 1995; Szmidt et al.
1996; Travis et al. 1996; Wu et al., in press).
However, these PCR-based markers have
limitations compared to allozymes, which
had been the prevalent means for popu-
lation studies prior to the use of PCR. At
the majority of RAPD and AFLP loci the
dominant allele masks the presence of the
null allele in heterozygotes when assaying
diploid tissues (e.g., about 97%–98%; Kru-
tovskii et al. 1998), thus sampling variance
for dominant allele frequencies is typically
greater than that for codominant alleles
(Lynch and Milligan 1994). The frequen-
cies of null and dominant alleles are
inferred from the frequency of null allele
homozygotes; the precision of their esti-
mation thus depends on mating system as-
sumptions and is strongly affected by the
sample size. Empirical studies have also
suggested that dominant markers can bias
estimates of genetic diversity and differ-
entiation among populations (e.g., Isabel
et al. 1995; Szmidt et al. 1996).
Although RAPD markers have proved to
be useful for population studies, and their
gross patterns of diversity usually agree
with that of allozymes, the levels of genet-
ic variation, differentiation, and fine-scale
genetic structures often differ (e.g., Baruffi
et al. 1995; Dawson et al. 1996; Heun et al.
1994; Lannér-Herrera et al. 1996; Latta and
Mitton 1997; le Corre et al. 1997; Liu and
Furnier 1993; Peakall et al. 1995; Puterka
et al. 1993). To help assess whether these
differences are biological or a simple con-
sequence of the dominance and biallelism
of RAPD and AFLP markers, we developed
a dominance simulation program, DOM-
SIM, that transforms codominant popula-
tion data into a biallelic dominant dataset.
The program then estimates population
genetic statistics with which dominant
and codominant markers can be directly
compared. We use data from a widespread
North American conifer, Douglas-fir [Pseu-
dotsuga menziesii (Mirb.) Franco], and
three California closed-cone pine species
to illustrate the program’s function. The
test simulation suggests that dominant
biallelic markers, such as RAPDs, can
strongly underestimate population diversity but can still reasonably
estimate population differentiation ($G_{ST}$), if sample
sizes are larger than about 30 individuals.
Program Functions
The program DOMSIM uses multiallelic da-
tasets with a maximum number of six al-
leles per locus for which population allele
frequencies are defined. Assuming Hardy–
Weinberg equilibrium and no linkage
among loci, the program generates $N$ basic populations ($N_{max} = 20$)
of up to 1,000 individuals each with multilocus genotypes that maintain
the specified allele frequencies within populations. A total of $S$
subpopulations ($S_{max} = 400$) of $n$ individuals ($n = 10$–200) are
then drawn with replacement for each of the $N$ populations. The
sampling is done in two different ways: by sampling subpopulations of
size $n$ with replacement directly from the initially generated basic
population, and by resampling subpopulations of size $n$ with
replacement within the first sampled subpopulation of $n$ individuals
(bootstrap resampling). Population genetic parameters ($H_S$, $H_T$,
and $G_{ST}$) are calculated for each
cycle of resampling in three ways. First,
for a codominant dataset, calculations are
made considering all alleles and geno-
types present in the subpopulations. Sec-
ond, the same subpopulations and data
are used to simulate a dominant biallelic
dataset by randomly selecting one allele
as dominant, with the rest treated as re-
cessive to it. The synthetic null allele fre-
quency is then calculated from the null
homozygote frequencies assuming Hardy–
Weinberg equilibrium. Average parame-
ters and their variance are calculated for
each set of $S$ subpopulations. Gene diversity is evaluated using $H_S$
and $H_T$, either unmodified (Nei 1973) or modified (Nei and Chesser
1983) for the sample size. Genetic differentiation is evaluated via
$\theta_w$ (Weir and Cockerham 1984) and $G_{ST}$ parameters that are
either unmodified (Nei 1973), modified for the sample size (Nei and
Chesser 1983), or modified for both the sample size and population
number (Nei 1986). Finally, null allele frequencies are corrected for
dominance using Lynch and Milligan’s (1994) equation 2a, and their
asymptotically unbiased estimate of $F_{ST}$ recommended for dominant
markers is also calculated following equation 14a.
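The codominant-to-dominant transformation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the DOMSIM FORTRAN code: the function names are hypothetical, genotypes are represented as pairs of allele labels, only the unmodified Nei (1973) statistics are computed, and the Lynch and Milligan (1994) corrections are not applied.

```python
import math

def gene_diversity(freqs):
    """Unmodified gene diversity, 1 - sum(p_i^2), with no sample-size correction."""
    return 1.0 - sum(p * p for p in freqs)

def codominant_freqs(genotypes, alleles):
    """Allele frequencies counted directly from diploid genotypes, in the order of `alleles`."""
    total = 2 * len(genotypes)
    return [sum(g.count(a) for g in genotypes) / total for a in alleles]

def dominant_freqs(genotypes, dominant_allele):
    """Treat one allele as dominant and pool all others into a single null allele.

    The null frequency is the square root of the null-homozygote frequency,
    which assumes Hardy-Weinberg equilibrium. Returns [dominant, null].
    """
    null_homozygotes = sum(1 for g in genotypes if dominant_allele not in g)
    q = math.sqrt(null_homozygotes / len(genotypes))
    return [1.0 - q, q]

def g_st(pop_freqs):
    """Unmodified G_ST = (H_T - H_S) / H_T from one frequency vector per population."""
    h_s = sum(gene_diversity(f) for f in pop_freqs) / len(pop_freqs)
    mean_freqs = [sum(col) / len(pop_freqs) for col in zip(*pop_freqs)]
    h_t = gene_diversity(mean_freqs)
    return (h_t - h_s) / h_t if h_t > 0 else 0.0

# Toy population of four diploid individuals (hypothetical data).
pop = [("A", "A"), ("A", "B"), ("B", "B"), ("B", "C")]
h_codominant = gene_diversity(codominant_freqs(pop, ["A", "B", "C"]))  # 0.59375
h_dominant = gene_diversity(dominant_freqs(pop, "A"))                  # about 0.414
```

In this toy case the three-allele codominant locus gives a higher gene diversity than its dominant biallelic version, which is the direction of the effect the simulations are designed to quantify.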
Installing and Running the Program
The program DOMSIM is written in FORTRAN-77 (simulation routines) and in
LabWindows CVI (interface routines). The source code file, domsimd.f,
was compiled using Microsoft FORTRAN Power Station Compiler version 1.0.
DOMSIM runs on IBM PCs and compatibles under MS Windows 95 and NT for
32-bit operating environments. To install the program, run the
compressed self-extracting file domsimpr.exe, which can be downloaded
from the web site http://www.fsl.orst.edu/tgerc/protocol.htm. It will
automatically decompress five files: domsimd.f, domsim.001, domsim.002,
read.me, and setup.exe. Next, run the setup file and follow the
instructions on your screen during installation. Run the program by
either clicking the icon or executing the program file domsim.exe. A
read.me file contains additional instructions for installing and running
the program.
The Journal of Heredity 1999:90(4)
Figure 1. Levels of diversity and differentiation for codominant, multiallelic allozymes versus biallelic, dominant
markers, as simulated from an allozyme dataset from Douglas-fir studied with varying sample sizes. Standard
deviations (error bars) were calculated from the variance among 400 bootstrap subsamples and represent the
variance due to resampling of individuals at each level of sampling from the master population of 1,000 individuals.
The arrow shows the population sample size between 30 and 40 needed to eliminate the tendency for
overestimation of population differentiation caused by dominance and biallelism.
Input and Output Files
The input format is an ASCII file similar to
GeneStat input files (Lewis 1994), but
does not require population, locus, and al-
lele names, and there should be no empty
lines. An example (sample.dat) and brief
help, which explicitly explains an input file
structure, are provided with the program.
The output file has all the parameters cal-
culated for each resampled and bootstrap
set, their average values, and standard de-
viations.
Examples of Simulation Based on
Allozyme Data in Douglas-fir and
California Closed-Cone Pines
In order to facilitate comparisons between
dominant and codominant markers, and
to help understand the effects of RAPD
dominance and biallelism on our studies
of genetic diversity and differentiation in
Douglas-fir (Aagaard et al. 1998) and the
California closed-cone pines (Wu et al., in
press), we simulated dominance and bial-
lelism in these two allozyme datasets (Li
and Adams 1989; Wu et al., in press). The
first allozyme dataset included six popu-
lations of three races of Douglas-fir—
coastal, north interior, and south interi-
or—with two populations per race. The
second one included four, five, and three
populations of Pinus attenuata, P. muricata,
and P. radiata, respectively. These popu-
lations are described in detail elsewhere
(Aagaard et al. 1998; Wu et al., in press).
From allozyme allele frequencies within
populations we generated simulated pop-
ulations of 1,000 individuals each, and a
total of 400 subpopulations of $n$ individuals were drawn with
replacement from each of the populations. The program also performed
400 bootstrap resamplings using a subpopulation of size $n$. Population
genetic parameters ($H_S$, $H_T$, $G_{ST}$, $\theta_w$, and $F_{ST}$)
were then calculated for each set of
400 subpopulations in the three ways de-
scribed above. We varied the number of
individuals (n) within the subsamples
from 10 to 200 to bracket the range of sam-
ple sizes that might reasonably be em-
ployed in population studies, and the sam-
ple size of 30–50 trees per population that
was used in our RAPD studies (Aagaard et
al. 1998; Wu et al., in press). The results of
the simulations are summarized in Figures
1 and 2. The simulations showed that diversity measurements ($H_S$ and
$H_T$) were
likely to be underestimated by dominant
biallelic markers approximately twofold
regardless of sample size.
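The two resampling schemes used in these simulations (direct subsampling from the generated population, and bootstrap resampling within one subsample) can be sketched as follows. This is a minimal Python illustration, not the DOMSIM implementation: the allele frequencies, the function names, and the choice of uncorrected gene diversity as the statistic are all assumptions made for the example.

```python
import random
import statistics

def make_base_population(allele_freqs, size, rng):
    """Diploid genotypes under Hardy-Weinberg equilibrium (two independent allele draws)."""
    alleles = list(allele_freqs)
    weights = [allele_freqs[a] for a in alleles]
    return [tuple(rng.choices(alleles, weights=weights, k=2)) for _ in range(size)]

def gene_diversity(genotypes):
    """Uncorrected gene diversity, 1 - sum(p_i^2), from a list of diploid genotypes."""
    counts = {}
    for genotype in genotypes:
        for allele in genotype:
            counts[allele] = counts.get(allele, 0) + 1
    total = 2 * len(genotypes)
    return 1.0 - sum((c / total) ** 2 for c in counts.values())

def resample(base, n, s, stat, rng):
    """Scheme 1: s subsamples of size n, each drawn with replacement from the base population."""
    values = [stat(rng.choices(base, k=n)) for _ in range(s)]
    return statistics.mean(values), statistics.stdev(values)

def bootstrap(base, n, s, stat, rng):
    """Scheme 2: draw one subsample of size n, then resample it s times with replacement."""
    first = rng.choices(base, k=n)
    values = [stat(rng.choices(first, k=n)) for _ in range(s)]
    return statistics.mean(values), statistics.stdev(values)

rng = random.Random(1)  # fixed seed so the sketch is repeatable
base = make_base_population({"A": 0.6, "B": 0.3, "C": 0.1}, 1000, rng)
mean_direct, sd_direct = resample(base, 30, 400, gene_diversity, rng)
mean_boot, sd_boot = bootstrap(base, 30, 400, gene_diversity, rng)
```

Repeating the two calls for several values of `n` between 10 and 200 would give the kind of mean and standard deviation curves that are plotted against sample size in the figures.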
Figure 2. Genetic diversity ($H_S$) and differentiation ($G_{ST}$; Nei 1986) values averaged over populations of each
California closed-cone pine species for codominant multiallelic allozyme and dominant biallelic markers simulated
in the samples of different sizes. Standard deviations (error bars) were calculated from the variance among 400
bootstrap subsamples simulated for each population of each species. Observed RAPD values are also shown as a
star.
When 30 or more diploid individuals per population were sampled, there
was little effect on differentiation estimates ($G_{ST}$, $\theta_w$,
and $F_{ST}$) in Douglas-fir. However, though
still very similar to the estimates for co-
dominant markers, the estimates for the
simulated dominant markers began to di-
verge slightly but significantly downward
at large population sizes in Douglas-fir. In
the California closed-cone pines, the esti-
mates for the simulated dominant markers
converge toward the estimates for codom-
inant multiallelic markers at large popula-
tion sizes, but were always significantly
higher (Wu et al., in press). Our simula-
tions were in close agreement with our
empirical studies of Douglas-fir where, de-
spite dominance and biallelism of RAPD
markers, we have found that RAPDs and
allozymes exhibit similar levels of differ-
entiation at the population and race levels
with adequate sample sizes (Aagaard et al.
1998). However, the California closed-cone pine allozyme data showed
that larger sample sizes than those we employed in our RAPD study (Wu
et al., in press) are desirable for a fair comparison of RAPD and
allozyme data. Finally, despite the expectation of much reduced
diversity for dominant biallelic markers predicted by the
simulations, our RAPD data gave higher
estimates of diversity than did allozymes
in both Douglas-fir (Aagaard et al. 1998)
and the California closed-cone pines (Wu
et al., in press). This suggests that RAPD
markers may have much higher intrinsic
genetic diversity than do allozyme mark-
ers. Our results demonstrate the impor-
tance of simulations to help compare and
interpret the results of population studies
with dominant markers.
From the Departments of Forest Science (Aagaard, Kru-
tovskii, and Strauss) and the College of Oceanic and
Atmospheric Sciences (Erofeeva), Oregon State University,
Corvallis, OR 97331-7501. We thank Tom Adams for
providing allozyme data and Vladislav Erofeev for help
with computer software. This work was supported in
part by NSF grants DEB 9300083 and BSR 895702 to
S.H.S. The dominance simulation program (DOMSIM)
is available for public use and can be downloaded as
a self-extracting file domsimpr.exe from the TGERC web
site: http://www.fsl.orst.edu/tgerc/protocol.htm. Ad-
dress correspondence to Dr. K. V. Krutovskii at the ad-
dress above or e-mail: krutovskiik@fsl.orst.edu.
© 1999 The American Genetic Association
References
Aagaard JE, Krutovskii KV, and Strauss SH, 1998. RAPDs
and allozymes exhibit similar levels of diversity and
differentiation among populations and races of Doug-
las-fir. Heredity 81:69–78.
Baruffi L, Damiani G, Guglielmino CR, Bandi C, Malacri-
da AR, and Gasperi G, 1995. Polymorphism within and
between populations of Ceratitis capitata: comparison
between RAPD and multilocus enzyme electrophoresis
data. Heredity 74:425–437.
Dawson IK, Simons AJ, Waugh R, and Powell W, 1996.
Diversity and genetic differentiation among subpopulations
of Gliricidia sepium revealed by PCR-based assays.
Heredity 74:10–18.
Heun M, Murphy JP, and Phillips TD, 1994. A comparison
of RAPD and isozyme analyses for determining the
genetic relationships among Avena sterilis L. accessions.
Theor Appl Genet 87:689–696.
Isabel N, Beaulieu J, and Bousquet J, 1995. Complete
congruence between gene diversity estimates derived
from genotypic data at enzyme and random amplified
polymorphic DNA loci in black spruce. Proc Natl Acad
Sci USA 92:6369–6373.
Krutovskii KV, Vollmer SS, Sorensen FC, Adams WT,
Knapp SJ, and Strauss SH, 1998. RAPD genome maps of
Douglas-fir. J Hered 89:197–205.
Lannér-Herrera C, Gustafsson M, Fält AS, and Bryngelsson
T, 1996. Diversity of wild Brassica oleraceae as estimated
by isozyme and RAPD analysis. Genet Resources
Crop Evol 43:13–23.
Latta RG and Mitton JB, 1997. A comparison of popu-
lation differentiation across four classes of gene mark-
er in limber pine (Pinus flexilis James). Genetics 146:
1153–1163.
le Corre V, Dumolin-Lapègue S, and Kremer A, 1997.
Genetic variation at allozyme and RAPD loci in sessile
oak Quercus petraea (Matt.) Liebl.: the role of history
and geography. Mol Ecol 6:519–529.
Lewis PO, 1994. GeneStat-PC 3.3. Raleigh, North Carolina:
Department of Statistics, North Carolina State University.
Li P and Adams WT, 1989. Range-wide patterns of allo-
zyme variation in Douglas-fir (Pseudotsuga menziesii).
Can J For Res 19:149–161.
Liu Z and Furnier GR, 1993. Comparison of allozyme,
RFLP, and RAPD markers for revealing genetic variation
within and between trembling aspen and bigtooth as-
pen. Theor Appl Genet 87:97–105.
Lynch M and Milligan BG, 1994. Analysis of population
genetic structure with RAPD markers. Mol Ecol 3:91–
99.
Mosseler A, Egger KN, and Hughes GA, 1992. Low levels
of genetic diversity in red pine confirmed by random
amplified polymorphic DNA markers. Can J For Res 22:
1332–1337.
Nei M, 1973. Analysis of gene diversity in subdivided
populations. Proc Natl Acad Sci USA 70:3321–3323.
Nei M, 1986. Definition and estimation of fixation indi-
ces. Evolution 40:643–645.
Nei M and Chesser RK, 1983. Estimation of fixation in-
dices and gene diversities. Ann Hum Genet 47:253–259.
Peakall R, Smouse PE, and Huff DR, 1995. Evolutionary
implications of allozyme and RAPD variation in diploid
populations of dioecious buffalograss Buchloë dactyloides.
Mol Ecol 4:135–147.
Puterka GJ, Black IV WC, Steiner WM, and Burton RL,
1993. Genetic variation and phylogenetic relationships
among worldwide collections of the Russian wheat
aphid, Diuraphis noxia (Mordvilko), inferred from allo-
zyme and RAPD-PCR markers. Heredity 70:604–618.
Szmidt AE, Wang X, and Lu M, 1996. Empirical assess-
ment of allozyme and RAPD variation in Pinus sylvestris
(L.) using haploid tissue analysis. Heredity 76:412–420.
Travis SE, Maschinski J, and Keim P, 1996. An analysis
of genetic variation in Astragalus cremnophylax var.
cremnophylax, a critically endangered plant, using
AFLP markers. Mol Ecol 5:735–745.
Vos P, Hogers R, Bleeker M, Reijans M, van de Lee T,
Hornes M, Frijters A, Pot J, Peleman J, Kuiper M, and
Zabeau M, 1995. AFLP: a new technique for DNA fin-
gerprinting. Nucleic Acids Res 23:4407–4414.
Weir BS and Cockerham CC, 1984. Estimating F-statis-
tics for the analysis of population structure. Evolution
38:1358–1370.
Williams JG, Kubelik AR, Livak KJ, Rafalski JA, and Tin-
gey SV, 1990. DNA polymorphisms amplified by arbi-
trary primers are useful as genetic markers. Nucleic
Acids Res 18:6531–6535.
Wu J, Krutovskii KV, and Strauss SH, in press. Nuclear
DNA diversity, population differentiation and phyloge-
netic relationships in the California closed-cone pines
based on RAPD and allozyme markers. Genome.
Received April 12, 1998
Accepted February 18, 1999
Corresponding Editor: Robert Angus
BOTTLENECK: A Computer
Program for Detecting
Recent Reductions in the
Effective Population Size
Using Allele Frequency Data
S. Piry, G. Luikart, and J-M.
Cornuet
BOTTLENECK (current version 1.2) is a population genetics computer
program that conducts four tests for identifying populations that have
recently experienced a severe reduction in effective population size
($N_e$). “Recently” is defined as within approximately the past
$2N_e$–$4N_e$ generations, depending on several factors such as the
severity of the bottleneck and the mutation rate of the loci being
studied (Cornuet and Luikart 1996). The program runs on Windows 95™. It
requires allele frequency data obtained from one sample of individuals
(e.g., 20–30 diploid individuals) and at least four polymorphic loci.
Significant deviations from population
mutation-drift equilibrium (e.g., bottle-
necks) are important to detect because
equilibrium is an assumption required for
numerous analyses of population genetics
data (e.g., see Nei 1987, p. 251). Bottle-
necks are important to detect in conser-
vation biology because they can increase
the risk of population extinction. Founder-
flush events (i.e., short but severe bottle-
necks) are important to detect because
they may play a role in some modes of
speciation [for reviews see Harrison
(1991) and Howard (1993)].
Principle
Populations that have experienced a recent reduction of their effective
population size exhibit a correlative reduction of the allele number
and heterozygosity at polymorphic loci. But the allele number is
reduced faster than the heterozygosity ($H_e$). Thus the $H_e$ becomes
larger than the heterozygosity ($H_{eq}$) expected at mutation-drift
equilibrium because $H_{eq}$ is calculated from the allele number (and
the sample size; see Description below and Cornuet and Luikart 1996).
Note that $H_e$ is calculated from allele frequencies (e.g.,
$1 - \sum p_i^2$, where $p_i$ is the frequency of the $i$th allele).
Here both the measured heterozygosity ($H_e$) and the expected
equilibrium heterozygosity ($H_{eq}$) refer to heterozygosity in the
sense of Nei’s (1987) gene diversity. Heterozygosity never refers to
the proportion of heterozygotes observed ($H_o$). Thus we are not
testing for an excess of heterozygotes ($H_o > H_e$), but rather an
excess of heterozygosity ($H_e > H_{eq}$).
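As a small worked illustration of why the allele number falls faster than $H_e$, the Python sketch below computes $1 - \sum p_i^2$ before and after a rare allele is lost. The allele counts are hypothetical toy numbers and the "after" sample is not a simulated bottleneck; the function name is an assumption made for the example.

```python
def gene_diversity_from_counts(counts):
    """H_e = 1 - sum(p_i^2), with p_i taken from allele counts at one locus."""
    total = sum(counts.values())
    return 1.0 - sum((c / total) ** 2 for c in counts.values())

# Hypothetical allele counts at one locus (40 gene copies).
before = {"A": 30, "B": 8, "C": 2}
# The same locus after the rare allele C has been lost.
after = {"A": 30, "B": 8}

h_before = gene_diversity_from_counts(before)  # about 0.395
h_after = gene_diversity_from_counts(after)    # about 0.332

allele_loss = 1 - len(after) / len(before)     # one third of the alleles lost
diversity_loss = 1 - h_after / h_before        # about 16% of H_e lost
```

Because a rare allele contributes little to $\sum p_i^2$, its loss removes a third of the alleles here but a much smaller fraction of $H_e$.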
Strictly speaking, heterozygosity excess
has been demonstrated only for loci evolving
under the infinite allele model (IAM;
Kimura and Crow 1964) by Maruyama and
Fuerst (1985). If the locus evolves under
the strict one-step stepwise mutation
model (SMM; Ohta and Kimura 1973),
there can be situations where this hetero-
zygosity excess is not observed (Cornuet
and Luikart 1996). However, few loci fol-
low the strict SMM, and as soon as loci
depart slightly from the SMM toward the
IAM they will exhibit a heterozygosity ex-
cess as a consequence of a genetic bottle-
neck. When testing for bottlenecks, the
BOTTLENECK program uses both the
SMM and IAM independently, because
they represent two extreme models of mutation along a continuum of
possible models (Chakraborty and Jin 1992). All loci will follow a
mutation model somewhere in between the two extreme models.
For selectively neutral loci in a population near mutation-drift
equilibrium (i.e., a population in which $N_e$ has remained fairly
constant in the past), there is approximately
an equal probability that a locus
will show a slight heterozygosity excess or
a heterozygosity deficit. However, in re-
cently bottlenecked populations, a major-
ity of loci will exhibit an excess of hetero-
zygosity (Luikart and Cornuet 1998). To
determine if a population exhibits a signif-
icant number of loci with heterozygosity
excess, we proposed three statistical
tests: a sign test, a standardized differences
test (Cornuet and Luikart 1996; Luikart
and Cornuet 1998), and a Wilcoxon’s
signed rank test (Luikart et al., submitted;
Luikart 1997). We also proposed a graph-
ical descriptor of the shape of the allele
frequency distribution (‘‘mode-shift’’ indi-
cator) which can differentiate between
bottlenecked and stable populations (Lui-
kart et al. 1998).
Interpretation of output from the sign
and standardized differences tests is thor-
oughly discussed in Cornuet and Luikart
(1996) and Luikart and Cornuet (1998). In-
terpretation of output from the graphical
descriptor is discussed in Luikart et al.
(1998). Guidelines for interpreting the output from the Wilcoxon’s test
are less easy to find (Luikart 1997: chapter 4; Luikart et al.,
submitted), although this test is analogous to the sign test. The
Wilcoxon’s test is generally the most useful of all the tests because
it is the most powerful (along with the standardized differences test),
and robust (like the sign test) when used with few (<20) polymorphic
loci. When testing for bottlenecks, the null hypothesis of the
Wilcoxon’s test is no significant heterozygosity excess (on average
across loci). Thus the alternate hypothesis is significant
heterozygosity excess (and thus evidence of a recent bottleneck). This
is a one-tailed test that requires at least four polymorphic loci to
have any possibility of obtaining a significant ($P < .05$) test
result.
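To make the one-tailed logic concrete, the following Python sketch computes an exact one-tailed Wilcoxon signed rank P-value by enumerating all sign assignments. It is a generic textbook version under stated assumptions, not BOTTLENECK's own code: the input is taken to be one difference $H_e - H_{eq}$ per locus, the function name and example values are hypothetical, and full enumeration is only practical for small numbers of loci.

```python
from itertools import product

def wilcoxon_one_tailed(differences):
    """Exact one-tailed P-value for the alternative that differences tend to be positive.

    Zero differences are dropped; tied absolute values receive average ranks.
    """
    d = [x for x in differences if x != 0]
    n = len(d)
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = average_rank
        i = j + 1
    w_plus = sum(r for r, x in zip(ranks, d) if x > 0)
    # Under the null hypothesis every rank is positive or negative with equal
    # probability, so count the sign assignments with a rank sum >= observed.
    hits = sum(
        1
        for signs in product((0, 1), repeat=n)
        if sum(r for r, s in zip(ranks, signs) if s) >= w_plus
    )
    return hits / 2 ** n

# Hypothetical H_e - H_eq differences for six loci, five of them showing excess.
p_value = wilcoxon_one_tailed([0.05, -0.01, 0.03, 0.02, 0.04, 0.06])  # 0.03125
```

A small P-value from this function would be read as evidence for heterozygosity excess on average across loci, in line with the alternate hypothesis stated above.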
Description
The BOTTLENECK program computes for each population sample and for each
locus the distribution of the heterozygosity ($H_{eq}$) expected from
the observed number of alleles ($k$), given the sample size ($n$) under
the assumption of mutation-drift equilibrium. This distribution is
obtained through simulating the coalescent process of $n$ genes under
each of two possible mutation models, the IAM and the SMM. This
distribution enables the computation of the average expected
equilibrium heterozygosity ($H_{eq}$) for each locus which is compared
to the Hardy–Weinberg heterozygosity ($H_e$, i.e., gene diversity) in
order to establish whether there is a heterozygosity excess or deficit
at each locus. In addition, the standard deviation (SD) of the
mutation-drift equilibrium distribution of the heterozygosity is used
to compute the standardized difference for each locus
[$(H_e - H_{eq})/\mathrm{SD}$]. The distribution obtained through
simulation also enables the computation of a P-value for the measured
heterozygosity ($H_e$). The P-value is the probability of obtaining the
measured $H_e$ in a sample ($n$) from an equilibrium population that
has the observed number of alleles ($k$).
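The per-locus summary described in this paragraph can be sketched as below, given a list of heterozygosities already simulated under equilibrium. This is an illustrative Python sketch, not the program's code: the function name and the toy values are hypothetical, and the P-value is taken here, as an assumption, to be the fraction of simulated values at least as large as the measured $H_e$.

```python
import statistics

def locus_summary(h_e, simulated_h):
    """Summarize one locus against its simulated equilibrium distribution.

    Returns (H_eq, SD, standardized difference, upper-tail P-value).
    """
    h_eq = statistics.mean(simulated_h)   # average expected equilibrium heterozygosity
    sd = statistics.stdev(simulated_h)    # spread of the equilibrium distribution
    standardized = (h_e - h_eq) / sd      # (H_e - H_eq) / SD
    # Assumption: P is the share of simulated values at least as large as H_e.
    p_excess = sum(1 for h in simulated_h if h >= h_e) / len(simulated_h)
    return h_eq, sd, standardized, p_excess

# Toy simulated distribution (hypothetical values; real runs use 1000 or more).
simulated = [0.2, 0.3, 0.4, 0.5, 0.6]
h_eq, sd, standardized, p_excess = locus_summary(0.5, simulated)
```

A positive standardized difference indicates heterozygosity excess at the locus and a negative one indicates a deficit.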
The way in which the coalescent process is simulated is unconventional
due to conditioning by the observed number of alleles. The phylogeny of
the $n$ genes is simulated as usual (Hudson 1990). Under the IAM, a
single mutation is allocated at a time and the resulting number of
alleles is computed. The process is repeated until the simulation
reaches the number of alleles ($k$) observed in the population sample.
Under the SMM, a Bayesian approach is used as explained in Cornuet and
Luikart (1996). Briefly, the likelihood distribution of the parameter
$\theta$ ($= 4N_e\mu$) given the number of alleles ($k$) and the sample
size ($n$) is evaluated as the proportion of iterations (in the
simulation process) producing exactly $k$ alleles for a varying set of
$\theta$’s. As a second step, drawing random values of $\theta$
according to the likelihood distribution, the coalescent process is
simulated as usual. Only heterozygosities found in iterations producing
exactly $k$ alleles are considered. Once all loci in a population
sample have been processed the three statistical tests are performed
for each mutation model, as explained in Cornuet and Luikart (1996),
and the allele frequency distribution is graphed to determine whether a
bottleneck-induced mode shift has recently occurred. Note that a mode
shift is a transient distortion in the distribution of allele
frequencies such that the frequency of alleles at low frequency
(frequency < 0.10) becomes lower than the frequency of alleles in an
intermediate allele frequency class (see Luikart et al. 1998).
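The IAM step, in which mutations are added one at a time until $k$ alleles are reached, can be sketched in Python as follows. This is one possible reading of the procedure under stated assumptions, not the program's Delphi code: the function names are hypothetical, mutations are placed on branches with probability proportional to branch length, and heterozygosity is computed as $1 - \sum p_i^2$ without a sample-size correction.

```python
import random

def coalescent_branches(n, rng):
    """Simulate a coalescent genealogy of n genes.

    Returns branches as (leaf_set, time_at_lower_end, time_at_upper_end).
    """
    lineages = [(frozenset([i]), 0.0) for i in range(n)]
    branches = []
    t = 0.0
    while len(lineages) > 1:
        j = len(lineages)
        t += rng.expovariate(j * (j - 1) / 2)  # waiting time to the next coalescence
        a, b = sorted(rng.sample(range(j), 2))
        leaves_b, start_b = lineages.pop(b)    # pop the larger index first
        leaves_a, start_a = lineages.pop(a)
        branches.append((leaves_a, start_a, t))
        branches.append((leaves_b, start_b, t))
        lineages.append((leaves_a | leaves_b, t))
    return branches

def alleles_from_mutations(n, mutations):
    """IAM: each gene carries the most recent mutation on its ancestral line."""
    latest = [(float("inf"), -1)] * n  # (time, mutation id); -1 is the ancestral allele
    for m_id, (leaves, time) in enumerate(mutations):
        for leaf in leaves:
            if time < latest[leaf][0]:
                latest[leaf] = (time, m_id)
    return [m_id for _, m_id in latest]

def heterozygosity_given_k(n, k, rng):
    """One simulated heterozygosity for n genes conditioned on k alleles (1 <= k <= n)."""
    branches = coalescent_branches(n, rng)
    lengths = [end - start for _, start, end in branches]
    mutations = []
    alleles = [-1] * n
    while len(set(alleles)) < k:
        # Allocate a single mutation, then recount the alleles.
        leaves, start, end = rng.choices(branches, weights=lengths)[0]
        mutations.append((leaves, rng.uniform(start, end)))
        alleles = alleles_from_mutations(n, mutations)
    counts = {}
    for allele in alleles:
        counts[allele] = counts.get(allele, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

rng = random.Random(42)  # fixed seed so the sketch is repeatable
values = [heterozygosity_given_k(20, 4, rng) for _ in range(200)]
h_eq_estimate = sum(values) / len(values)
```

Each added mutation changes the allele count by at most one, so the loop stops at exactly $k$ alleles; averaging many such iterations gives an estimate of $H_{eq}$ for that locus under the IAM.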
Input File Format
Five input data file formats are accepted
and automatically recognized by BOTTLE-
NECK. All are text files. One is the GENE-
POP computer program format (Raymond
and Rousset 1995). The second is the GENETIX computer program format
(Belkhir et al. 1996). The other three formats concern single
population data and are described in the help file of the program.
General Comments
BOTTLENECK is written in the Delphi 4™ (Inprise Co.) computer language.
The performance of BOTTLENECK has been thoroughly evaluated using
simulated datasets (Cornuet and Luikart 1996; Luikart et al. 1998) and
allozyme and microsatellite datasets (Luikart and Cornuet 1998). To
achieve reasonably high statistical power (>0.80), we recommend typing
at least 10 polymorphic loci (microsatellites or allozymes) and
sampling at least 30 individuals. The standardized differences test is
recommended when using approximately
20 or more polymorphic loci (Cornuet and
Luikart 1996). For fewer than 20 loci, the
Wilcoxon’s test is the most appropriate
and powerful. The IAM is recommended
for allozyme data and the SMM is gener-
ally more appropriate when testing micro-
satellite loci (i.e., dinucleotide repeat loci)
(Luikart and Cornuet 1998). For most mi-
crosatellites, the TPM (two-phase model)
is apparently even more appropriate than
the SMM (Di Rienzo et al. 1994; Luikart G,
unpublished data). The TPM was recently
added as an option in BOTTLENECK.
When using microsatellites we recom-
mend the TPM with 95% single-step mu-
tations and 5% multiple-step mutations
(and a variance among multiple steps of
approximately 12). When using the quali-
tative test for mode-shift distortion, we
recommend using at least 30 individuals
and 10–20 polymorphic loci to avoid un-
reasonably high type 1 error rates (i.e., to
avoid concluding that a stable population
has been recently bottlenecked).
BOTTLENECK runs on any computer with Windows 95™. However, we recommend
a computer at least as fast as a Pentium PC. A fast Pentium is
especially recommended for analyzing datasets containing many
individuals (≫30) and loci with many alleles (e.g., >3). Analyzing data
under the SMM is far slower than analyses assuming only the IAM. On a
Pentium 166 it takes about 15 minutes to analyze a dataset of 44
individuals and 7 loci (with 2–8 alleles) when using both mutation
models and 1000 simulation iterations. The number of iterations
influences the precision of the $H_{eq}$ estimates. A minimum of 1000
iterations is recommended. The program and example input and help files
can be obtained from the World Wide Web at
http://www.ensam.inra.fr/URLB.
From the Laboratoire de Modélisation et de Biologie Evolutive,
INRA-URLB, 488 rue de la Croix-Lavit, F-34090 Montpellier, France (Piry
and Cornuet), and the Division of Biological Sciences, University of
Montana, Missoula, Montana (Luikart). G. Luikart is now at the
Laboratoire de Biologie des Populations d’Altitude, Université Joseph
Fourier, Grenoble, France. This work was funded by the Institut
National de la Recherche Agronomique, the Fulbright Foundation (to
G.L.), and the Graduate School of the University of Montana (to G.L.).
I. Till-Bottraud provided helpful comments. Address correspondence to
J-M. Cornuet at the address above or e-mail: Cornuet@ensam.inra.fr.
© 1999 The American Genetic Association
References
Belkhir K, Borsa P, Goudet J, Chikhi L, and Bonhomme
F, 1996. GENETIX, logiciel sous Windows™ pour la génétique
des populations. Version 3.0. Montpellier,
France: Université Montpellier II.
Chakraborty R and Jin L, 1992. Heterozygote deficiency,
population substructure and their implications in DNA
fingerprinting. Hum Genet 88:267–272.
Cornuet J-M and Luikart G, 1996. Description and pow-
er analysis of two tests for detecting recent population
bottlenecks from allele frequency data. Genetics 144:
2001–2014.
Di Rienzo A, Peterson AC, Garza JC, Valdes AM, Slatkin
M, and Freimer NB, 1994. Mutational processes of sim-
ple sequence repeat loci in human populations. Proc
Natl Acad Sci USA 91:3166–3170.
Harrison RG, 1991. Molecular changes at speciation.
Annu Rev Ecol Syst 22:281–308.
Howard DJ, 1993. Small populations, inbreeding, and
speciation. In: The natural history of inbreeding and
outbreeding (Thornhill NW, ed). Chicago: University of
Chicago Press; 118–142.
Hudson RR, 1990. Gene genealogies and the coalescent
process. In: Oxford survey in evolutionary biology, vol.
7 (Futuyma D and Antonovics J, eds). Oxford: Oxford
University Press; 1–42.
Kimura M and Crow JF, 1964. The number of alleles that
can be maintained in a finite population. Genetics 49:
725–738.
Luikart G, 1997. Usefulness of molecular markers for
detecting population bottlenecks and monitoring ge-
netic change (PhD dissertation). Missoula, Montana:
University of Montana.
Luikart G and Cornuet J-M, 1998. Empirical evaluation
of a test for identifying recently bottlenecked popula-
tions from allele frequency data. Conserv Biol 12:228–
237.
Luikart G, Allendorf FW, Sherwin B, and Cornuet J-M,
1998. Distortion of allele frequency distributions provides
a test for recent population bottlenecks. J Hered
89:238–247.
Maruyama T and Fuerst PA, 1985. Population bottle-
necks and non-equilibrium models in population ge-
netics. II. Number of alleles in a small population that
was formed by a recent bottleneck. Genetics 111:675–
689.
Nei M, 1987. Molecular evolutionary genetics. New
York: Columbia University Press.
Ohta T and Kimura M, 1973. A model of mutation ap-
propriate to estimate the number of electrophoretically
detectable alleles in a finite population. Genet Res
Cambr 22:201–204.
Raymond M and Rousset F, 1995. GENEPOP (version
1.2): population genetics software for exact tests and
ecumenicism. J Hered 86:248–249.
Received December 15, 1997
Accepted February 26, 1999
Corresponding Editor: Robert Angus
... We computed neutrality tests for the overall dataset, for each sampled geographic population 650 and for CFR removing the three individuals harbouring the most divergent haplotype (Hap_6) 651 accounting for the possibility of a recent introduction of the haplotype in the population by migrant 652 individuals.653We used BOTTLENECK v.1.2.02(Piry et al., 1999) to test for population size reduction654 using microsatellite data. The stepwise (SMM) and two-phase mutation models (TPM) (Luikart & 655 Cornuet, 1998) were used for datasets of samples grouped by geographic populations and for the 656 overall dataset. ...
... Variance for mutation size was set to 12(Piry et al., 1999). We run one hundred thousand 658 simulations. ...
Preprint
Full-text available
Investigating primates’ behavioral variation at the inter-population level is important for the understanding of the evolutionary processes leading to species-specific patterns. The study of behavioral diversity among populations also contributes to improving’ primate conservation efforts. Dispersal patterns tend to be similar among close phylogenetic lineages but may vary in response to individual-based responses. Here, we investigate dispersal patterns of chacma baboons (Papio ursinus griseipes) living in Gorongosa National Park (GNP) and the Catapu Forest Reserve (CFR) in central Mozambique. The park consists of a mosaic landscape, located in a seasonally variable area. GNP was the epicenter of a major war, which severely reduced most apex predators resulting in limited mammalian predation on baboons and a steep increase in number of groups and/or group’s fission. We used a genetic dataset of 121 non-invasive DNA samples analyzed for uni- and bi-parentally inherited markers aiming to characterize the spatial distribution of genetic variation and investigate the extent and direction of sex-mediated gene flow at different time scales. We found high levels of genetic diversity as estimated using autosomal microsatellite loci data and no evidence for a significant contraction of the population size in the last generations. A very distinct mitochondrial DNA haplotype was sampled in CFR. We found evidence for historical and instantaneous male-biased dispersal and female philopatry, estimated among localities and at short distances in GNP, respectively. Our study highlights the strong conservation of sex-biased dispersal patterns and philopatry in chacma baboons and suggests that dispersal behaviors in chacma baboons are resilient to environmental changes and seasonality.
... BOTTLENECK v1.2.02 was used to validate whether the BCLT population had undergone a bottleneck effect. Under a two-phase model (TPM), we constrained the model by defining 90% of mutations conforming to a stepwise mutation model and 10% as multistep (Piry et al., 1999). AMOVA analyses were conducted by GenAlEx with 999 simulations. ...
Article
To evaluate the genetic quality and provide available management strategies for Blue‐crowned laughingthrush (BCLT), fifteen polymorphic microsatellite loci were developed and applied. The genetic diversity of wild individuals was indicated to be higher than the two captive populations. The average number of alleles (5.50 ± 0.317), the number of effective alleles (3.417 ± 0.222), observed heterozygosity (0.828 ± 0.04), and genetic differentiation index (0.028 ± 0.007) of 64 wild individuals showed high genetic diversity despite drastic bottleneck and low genetic differentiation. The number of effective migrants (22.737 ± 8.318) indicated the intriguing wintering grounds may be surrounded by the breeding sites where the syncheimadia occurred in Wuyuan. Efficient conservation, winter flocking, and cooperative breeding may facilitate gene exchange and inclusive fitness. We recommend that monitoring concentrated distribution areas for BCLT should be strengthened, and geographical barriers, interference types, and the inner mechanism of distribution patterns should be further explored.
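The two-phase model settings quoted in the snippet for this study (90% single-step mutations, 10% multistep) can be illustrated with a short sketch. This is not BOTTLENECK's implementation: the function names, the geometric distribution for multistep sizes, and the `geom_p` parameter are assumptions made here for illustration, and the mapping from the program's "variance" setting to `geom_p` is deliberately left out.

```python
import math
import random


def tpm_step(rng, p_single=0.9, geom_p=0.5):
    """Draw one mutation step (in repeat units) under a two-phase model sketch.

    With probability p_single the step is +/-1 (stepwise mutation);
    otherwise its size is drawn from a geometric distribution on 1, 2, ...
    geom_p is a hypothetical parameter, not a BOTTLENECK setting.
    """
    if not 0.0 < geom_p < 1.0:
        raise ValueError("geom_p must lie strictly between 0 and 1")
    sign = rng.choice((-1, 1))
    if rng.random() < p_single:
        return sign
    # Inverse-transform sampling of a geometric variate starting at 1.
    u = rng.random()
    size = 1 + int(math.log(1.0 - u) / math.log(1.0 - geom_p))
    return sign * size


def mutate(allele_repeats, rng, p_single=0.9, geom_p=0.5):
    """Apply one mutation to an allele, keeping at least one repeat unit."""
    return max(1, allele_repeats + tpm_step(rng, p_single, geom_p))


rng = random.Random(1)
steps = [tpm_step(rng) for _ in range(1000)]
share_single = sum(abs(s) == 1 for s in steps) / len(steps)
print(f"share of single-unit steps: {share_single:.2f}")
```

Because the geometric draw can itself return a size of 1, the share of single-unit steps in this sketch is somewhat above `p_single`.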
... Two methods were used to estimate bottlenecks. The first method involved the BOTTLENECK software v. 1.2.02 [50], a program for estimating bottlenecks through heterozygous excess testing, and the infinite allele model (IAM) [51]. A two-phase model (TPM) and stepwise mutation model (SMM) [52] were used to estimate, and TPM was performed with 10% variance and 90% SMM. ...
Article
This study is the first report to characterize the Rhodeus uyekii genome, develop microsatellite markers, and apply those markers to the genetic structure of wild populations. Genome assembly was based on PacBio HiFi and Illumina HiSeq paired-end sequencing, resulting in a draft genome assembly of R. uyekii. The draft genome was assembled into 2652 contigs. The integrity assessment of the assemblies indicates that the quality of the draft assemblies is high, with 3259 complete BUSCOs (97.2%) in the database of Vertebrata. A total of 31,166 predicted protein-coding genes were annotated in the protein database. The phylogenetic tree showed that R. uyekii is a close but distinct relative of Onychostoma macrolepis. Among the 10 fish genomes, there were significant gene family expansions (8–2387) and contractions (16–2886). The number of alleles amplified by the 21 polymorphic markers ranged from 6 to 23, and the average PIC value was 0.753, which will be useful for evolutionary and genetic analysis. Using population genetic analysis, we analyzed genetic diversity and the genetic structures of 120 individuals from 6 populations. The average number of alleles per population ranged from 7.6 to 9.9, observed heterozygosity ranged from 0.496 to 0.642, and expected heterozygosity ranged from 0.587 to 0.783. According to discriminant analysis of principal components, the populations were divided into three groups (BS vs. DC vs. GG, GC, MS, DC). In conclusion, our study provides a useful resource for comparative genomics, phylogeny, and future population studies of R. uyekii.
... BayesAss v.3.0.4 (Wilson and Rannala, 2003) was used to compute recent migration rate between populations. In addition, the presence of population bottlenecks was computed using BOTTLENECK v.1.2.02 (Piry et al., 1999). A Two-Phase Mutation Model (TPM) was applied for the Wilcoxon (Cornuet and Luikart, 1996) and standardized differences tests. ...
... The probability of recent genetic bottlenecks was examined using BOTTLENECK ver. 1.2.02 (Piry et al., 1999) based on assumptions for both the infinite allele model (IAM) and the two-phase model (TPM; 30% IAM and 70% stepwise mutation model). For this analysis, we adopted Wilcoxon's signed-rank test with 1000 replications. ...
Article
For endemic benthos inhabiting hydrothermal vent fields, larval recruitment is critical for population maintenance and colonization via migration among separated sites. The vent‐endemic limpet, Lepetodrilus nux, is abundant at deep‐sea hydrothermal vents in the Okinawa Trough, a back‐arc basin in the northwestern Pacific; nonetheless, it is endangered due to deep‐sea mining. This species is associated with many other vent species and is an important successor in these vent ecosystems. However, limpet genetic diversity and connectivity among local populations have not yet been examined. We conducted a population genetics study of L. nux at five hydrothermal vent fields (maximum geographic distance, ~545 km; depths ~700 m to ~1650 m) using 14 polymorphic microsatellite loci previously developed. Genetic diversity has been maintained among these populations. Meanwhile, fine population genetic structure was detected between distant populations, even within this back‐arc basin, reflecting geographic distances between vent fields. There was a significant, positive correlation between genetic differentiation and geographic distance, but no correlation with depth. Contrary to dispersal patterns predicted by an ocean circulation model, genetic migration is not necessarily unidirectional, based on relative migration rates. While ocean circulation contributes to dispersal of L. nux among vent fields in the Okinawa Trough, genetic connectivity may be maintained by complex, bidirectional dispersal processes over multiple generations.
... We tested for the signature of demographic bottlenecks in WCA (excluding hybrid and admixed individuals) using both the Stepwise Mutation Model (SMM) and the Two Phase Model (TPM) in BOTTLENECK 1.2.02 [66]. We applied the Wilcoxon signed-rank test to estimate heterozygote excess/deficit using 10,000 replications. ...
Article
The white-bellied pangolin is subject to intense trafficking, feeding both local and international trade networks. In order to assess its population genetics and trace its domestic trade, we genotyped 562 pangolins from local to large bushmeat markets in western central Africa. We show that the two lineages described from the study region (WCA and Gab) were overlapping in ranges, with limited introgression in southern Cameroon. There was a lack of genetic differentiation across WCA and a significant signature of isolation-by-distance possibly due to unsuspected dispersal capacities involving a Wahlund effect. We detected a c. 74.1–82.5% decline in the effective population size of WCA during the Middle Holocene. Private allele frequency tracing approach indicated up to 600 km sourcing distance by large urban markets from Cameroon, including Equatorial Guinea. The 20 species-specific microsatellite loci provided individual-level genotyping resolution and should be considered as valuable resources for future forensic applications. Because admixture was detected between lineages, we recommend a multi-locus approach for tracing the pangolin trade. The Yaoundé market was the main hub of the trade in the region, and thus should receive specific monitoring to mitigate pangolins’ domestic trafficking. Our study also highlighted the weak implementation of CITES regulations at European borders.
... Recent bottleneck detection was performed through two approaches: Bottleneck v.1.2.02 [101] for calculating excess heterozygosity (HE > HO; mutational equilibrium assumption) under the infinite alleles (IAM), stepwise mutational (SMM), and two-phase (TPM; parameter settings: IAM: 10%, SMM: 90%, Variance: 10.00, Probability: 90%) models, through Wilcoxon signed-rank test with 1000 iterations [102], and ARLEQUIN v3.5.2.2 [72] for calculating the M ratio [103]. Moreover, the population effective size was estimated with the linkage disequilibrium method [104] implemented in NeEstimator v2.1 [105] considering the allelic frequency of 0.05. ...
Article
The adaptive responses and divergent evolution shown in the environments inhabited by the Cichlidae family make it possible to understand different biological properties, including through studies of fish genetic diversity and structure. In a zone that has been historically subjected to different anthropogenic pressures, this study assessed the genetic diversity and population structure of the cichlid Caquetaia kraussii, a sedentary species with parental care that has a significant ecological role for its contribution to redistribution and maintenance of sedimentologic processes in its distribution area. This study developed de novo 16 highly polymorphic species-specific microsatellite loci that allowed the estimation of the genetic diversity and differentiation in 319 individuals from natural populations in the area influenced by the Ituango hydroelectric project in the Colombian Cauca River. Caquetaia kraussii exhibits high genetic diversity levels (Ho: 0.562–0.885; He: 0.583–0.884) in relation to the average neotropical cichlids and a three-group spatial structure: two natural groups upstream and downstream of the Nechí River mouth, and one group of individuals with a high degree of relatedness, possibly independently formed by founder effect in the dam zone. The three genetic groups show recent bottlenecks, but only the two natural groups have effective population sizes that suggest their long-term permanence. The information generated is relevant not only for management programs and species conservation purposes, but also for broadening the available knowledge on the factors influencing neotropical cichlids population genetics.
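The snippet for this study mentions the M ratio as a second bottleneck indicator. As a hedged illustration, the sketch below uses the commonly cited definition for a microsatellite locus: the number of observed alleles k divided by the allele size range expressed in repeat units plus one. The function name and the example allele sizes are hypothetical, and how ARLEQUIN computes its own version of the statistic is not reproduced here.

```python
def m_ratio(allele_sizes_bp, repeat_unit_bp):
    """M = k / r for one locus.

    k: number of distinct alleles observed.
    r: (largest size - smallest size) / repeat unit + 1, i.e. the number of
       allelic states the observed size range could hold.
    """
    sizes = sorted(set(allele_sizes_bp))
    if not sizes:
        raise ValueError("at least one allele is required")
    if repeat_unit_bp <= 0:
        raise ValueError("repeat_unit_bp must be positive")
    k = len(sizes)
    r = (sizes[-1] - sizes[0]) / repeat_unit_bp + 1
    return k / r


# Hypothetical dinucleotide locus with a gap in the allele size series.
example = m_ratio([100, 102, 104, 110], repeat_unit_bp=2)
print(round(example, 3))
```

Gaps in the allele size series lower M, which is why a low mean M across loci is read as a sign of past allele loss.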
... Uttarakhand underwent a recent bottleneck using the Bottleneck v1.2.02 software (Piry et al. 1999). ...
Article
This study traced the maternal lineage of the domestic swine populations using mitochondrial DNA control region markers and genetic diversity using microsatellite markers in Uttarakhand, an Indian state situated at the foothills of the world’s youngest (geo-dynamically sensitive) mountain system, “the Himalayas”. Analysis of 68 maternally unrelated individuals revealed 20 haplotypes. The maternal signature of the Pacific, Southeast Asian, European, and ubiquitously distributed Chinese haplotypes was present in Uttarakhand’s domestic pig population. The D3 haplotype reported in wild pigs from North India was also identified in 47 domestic samples. A unique gene pool, UKD (Uttarakhand Domestic), as another lineage specific to this region has been proposed. Genotypes were analyzed, using 13 sets of microsatellite markers. The observed (Ho) and expected (He) heterozygosities were 0.83 ± 0.02 and 0.84 ± 0.01, respectively. The average polymorphic information content value of 0.83 ± 0.01 indicated the high informativeness of the marker. The overall mean FIS value for all the microsatellite markers was low (F = 0.04, P < 0.01). Seven loci deviated from Hardy-Weinberg equilibrium (HWE) at a significant level (p < 0.05). Two clusters were identified, indicating overlapping populations. These results suggested that though belonging to different maternal lineages, the traditional management practices in Uttarakhand have allowed for genetic mixing and the sharing of genetic material among pig populations. It could contribute to increased genetic diversity but might also result in the loss of distinct genetic characteristics or breed purity of the local breeds if not carefully managed.
Article
In order to describe large-scale spatial structure of sockeye salmon on the Asian part of the range, the variability of 45 SNP loci was analyzed in 22 samples from the Northwest coast of the Pacific Ocean. Three large regional population complexes were identified: Southwest Kamchatka, Kamchatka River basin, and the Northeast (comprising stocks from Koryak Highlands). Populations within the identified complexes are connected by gene migration and have a common origin, close geographic proximity, comparable climatic, landscape, and environmental conditions in the freshwater and early marine periods of sockeye salmon life. Populations confined to watersheds of the North coast of the Sea of Okhotsk (Palana and Okhota rivers), along with island populations, displayed distinctions from the isolated population complexes. It is hypothesized that the marked divergence observed in island populations is primarily caused by genetic drift occurring during long periods of isolation. The pronounced divergence of Palana River population may be the result of both genetic drift and natural selection, driven by the challenging smoltification and specific conditions of freshwater period in this watershed. At the same time in the Okhota River population, demographic factors such as genetic drift and bottlenecks played a key role.
Article
Cinnamomum parthenoxylon is an endemic and endangered species with significant economic and ecological value in Vietnam. A better understanding of the genetic architecture of the species will be useful when planning management and conservation. We aimed to characterize the transcriptome of C. parthenoxylon, develop novel molecular markers, and assess the genetic variability of the species. First, transcriptome sequencing of five trees (C. parthenoxylon) based on root, leaf, and stem tissues was performed for functional annotation analysis and development of novel molecular markers. The transcriptomes of C. parthenoxylon were analyzed via an Illumina HiSeqTM 4000 sequencing system. A total of 27,363,199 bases were generated for C. parthenoxylon. De novo assembly indicated that a total of 160,435 unigenes were generated (average length = 548.954 bp). The 51,691 unigenes were compared against different databases, i.e. COG, GO, KEGG, KOG, Pfam, Swiss-Prot, and NR for functional annotation. Furthermore, a total of 12,849 EST-SSRs were identified. Of the 134 primer pairs, 54 were randomly selected for testing, with 15 successfully amplified across nine populations of C. parthenoxylon. We uncovered medium levels of genetic diversity (PIC = 0.52, Na = 3.29, Ne = 2.18, P = 94.07%, Ho = 0.56 and He = 0.47) within the studied populations. The molecular variance was 10% among populations and low genetic differentiation (Fst = 0.06) indicated low gene flow (Nm = 2.16). A reduction in the population size of C. parthenoxylon was detected using BOTTLENECK (VP population). The structure analysis suggested two optimal genetic clusters related to gene flow among the populations. Analysis of molecular variance (AMOVA) revealed higher genetic variation within populations (90%) than among populations (10%). The UPGMA approach and DAPC divided the nine populations into three main clusters. 
Our findings revealed a significant fraction of the transcriptome sequences, and these newly developed EST-SSR markers are a very efficient tool for germplasm evaluation, genetic diversity and molecular marker-assisted selection in C. parthenoxylon. This study provides comprehensive genetic resources for the breeding and conservation of different varieties of C. parthenoxylon.
Article
We use population genetics theory and computer simulations to demonstrate that population bottlenecks cause a characteristic mode-shift distortion in the distribution of allele frequencies at selectively neutral loci. Bottlenecks cause alleles at low frequency (< 0.1) to become less abundant than alleles in one or more intermediate allele frequency classes (e.g., 0.1–0.2). This distortion is transient and likely to be detectable for only a few dozen generations. Consequently, only recent bottlenecks are likely to be detected by tests for distortions in distributions of allele frequencies. We illustrate and evaluate a qualitative graphical method for detecting a bottleneck-induced distortion of allele frequency distributions. The simple novel method requires no information on historical population sizes or levels of genetic variation; it requires only samples of 5 to 20 polymorphic loci and approximately 30 individuals. The graphical method often differentiates between empirical datasets from bottlenecked and nonbottlenecked natural populations. Computer simulations show that the graphical method is likely (P > 0.80) to detect an allele frequency distortion after a bottleneck of ≤ 20 breeding individuals when 8 to 10 polymorphic microsatellite loci are analyzed.
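The mode-shift idea in this abstract can be sketched in a few lines: pool allele frequencies across loci, count alleles in ten frequency classes, and check whether the lowest class (< 0.1) holds fewer alleles than some higher class. The function names and the example frequencies below are hypothetical, and the sketch only reproduces the qualitative comparison, not the authors' graphical procedure or its power analysis.

```python
def allele_frequency_classes(freqs, n_classes=10):
    """Count alleles in equal-width frequency classes over (0, 1]."""
    counts = [0] * n_classes
    for f in freqs:
        if not 0.0 < f <= 1.0:
            raise ValueError("allele frequencies must lie in (0, 1]")
        # A frequency of exactly 1.0 is placed in the last class.
        idx = min(int(f * n_classes), n_classes - 1)
        counts[idx] += 1
    return counts


def is_mode_shifted(freqs):
    """True if the lowest class is less abundant than some higher class."""
    counts = allele_frequency_classes(freqs)
    return counts[0] < max(counts[1:])


# Hypothetical pooled allele frequencies from several loci.
l_shaped = [0.02, 0.03, 0.04, 0.05, 0.15, 0.25, 0.46]
shifted = [0.05, 0.12, 0.15, 0.18, 0.50]
print(is_mode_shifted(l_shaped), is_mode_shifted(shifted))
```

An L-shaped distribution, with rare alleles most numerous, is the pattern expected without a recent bottleneck; the second example has lost most of its rare alleles.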
Article
Note that an updated reference for Genepop is Rousset (2008) genepop’007: a complete re-implementation of the genepop software for Windows and Linux (DOI: 10.1111/j.1471-8286.2007.01931.x)
Article
When a population experiences a reduction of its effective size, it generally develops a heterozygosity excess at selectively neutral loci, i.e., the heterozygosity computed from a sample of genes is larger than the heterozygosity expected from the number of alleles found in the sample if the population were at mutation drift equilibrium. The heterozygosity excess persists only a certain number of generations until a new equilibrium is established. Two statistical tests for detecting a heterozygosity excess are described. They require measurements of the number of alleles and heterozygosity at each of several loci from a population sample. The first test determines if the proportion of loci with heterozygosity excess is significantly larger than expected at equilibrium. The second test establishes if the average of standardized differences between observed and expected heterozygosities is significantly different from zero. Type I and II errors have been evaluated by computer simulations, varying sample size, number of loci, bottleneck size, time elapsed since the beginning of the bottleneck and level of variability of loci. These analyses show that the most useful markers for bottleneck detection are those evolving under the infinite allele model (IAM) and they provide guidelines for selecting sample sizes of individuals and loci. The usefulness of these tests for conservation biology is discussed.
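The two tests described in this abstract can be sketched as follows, assuming the per-locus equilibrium expectation and its standard deviation are already available (in practice they come from simulations conditioned on the observed number of alleles, which is not implemented here). The function names, the default equilibrium probability of excess of 0.5, and the example values are assumptions for illustration only.

```python
import math


def unbiased_heterozygosity(allele_counts):
    """Sample heterozygosity with small-sample correction: n/(n-1) * (1 - sum p^2)."""
    n = sum(allele_counts)
    if n < 2:
        raise ValueError("need at least two sampled gene copies")
    sum_p2 = sum((c / n) ** 2 for c in allele_counts)
    return n / (n - 1) * (1.0 - sum_p2)


def sign_test_p(n_excess, n_loci, p_excess=0.5):
    """One-sided binomial probability of at least n_excess loci showing excess.

    p_excess is the per-locus probability of excess at equilibrium
    (0.5 is an assumed placeholder).
    """
    return sum(
        math.comb(n_loci, k) * p_excess**k * (1 - p_excess) ** (n_loci - k)
        for k in range(n_excess, n_loci + 1)
    )


def standardized_differences(observed, expected, sd):
    """Sum of per-locus standardized differences divided by sqrt(number of loci)."""
    z = [(o - e) / s for o, e, s in zip(observed, expected, sd)]
    return sum(z) / math.sqrt(len(z))


# Hypothetical values for three loci.
obs = [0.60, 0.70, 0.55]
exp_eq = [0.50, 0.60, 0.50]
sd_eq = [0.10, 0.10, 0.05]
n_excess = sum(o > e for o, e in zip(obs, exp_eq))
print(sign_test_p(n_excess, len(obs)), standardized_differences(obs, exp_eq, sd_eq))
```

The statistic from `standardized_differences` is compared with a standard normal distribution, which is only a reasonable approximation when many loci are available.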
Article
Book
Spectacular progress has been made recently in the study of evolution at the molecular level, primarily due to new biochemical techniques such as gene cloning and DNA sequencing. In this book, the author summarizes new developments and seeks to unify studies of evolutionary histories of organisms and the mechanisms of evolution into a single science - molecular evolutionary genetics.