
GenAlEx V5: Genetic Analysis in Excel. Population Genetic Software for Teaching and Research

Vol. 28 no. 19 2012, pages 2537–2539
BIOINFORMATICS APPLICATIONS NOTE doi:10.1093/bioinformatics/bts460
Genetics and population analysis Advance Access publication July 20, 2012
GenAlEx 6.5: genetic analysis in Excel. Population genetic
software for teaching and research—an update
Rod Peakall (1,*) and Peter E. Smouse (2)
(1) Evolution, Ecology and Genetics, Research School of Biology, The Australian National University, Canberra ACT 0200, Australia and
(2) Department of Ecology, Evolution and Natural Resources, School of Environmental and Biological Sciences, Rutgers University, New Brunswick, NJ 08901-8551, USA
Associate Editor: Jonathan Wren
ABSTRACT
Summary: GenAlEx: Genetic Analysis in Excel is a cross-platform package for population genetic analyses that runs within Microsoft Excel. GenAlEx offers analysis of diploid codominant, haploid and binary genetic loci and DNA sequences. Both frequency-based (F-statistics, heterozygosity, HWE, population assignment, relatedness) and distance-based (AMOVA, PCoA, Mantel tests, multivariate spatial autocorrelation) analyses are provided. New features include calculation of new estimators of population structure: $G'_{ST}$, $G''_{ST}$, Jost’s $D_{est}$ and $F'_{ST}$ through AMOVA, Shannon Information analysis, linkage disequilibrium analysis for biallelic data and novel heterogeneity tests for spatial autocorrelation analysis. Export to more than 30 other data formats is provided. Teaching tutorials and expanded step-by-step output options are included. The comprehensive guide has been fully revised.
Availability and implementation: GenAlEx is written in VBA and
provided as a Microsoft Excel Add-in (compatible with Excel 2003,
2007, 2010 on PC; Excel 2004, 2011 on Macintosh). GenAlEx, and
supporting documentation and tutorials are freely available at:
http://biology.anu.edu.au/GenAlEx.
Contact: rod.peakall@anu.edu.au
Received on June 1, 2012; revised on July 12, 2012; accepted on July 13, 2012
1 INTRODUCTION
GenAlEx 6 was originally developed as a tool to support the teaching of population genetic analysis at the graduate level (Peakall and Smouse, 2006). GenAlEx operates within Microsoft Excel, the widely used spreadsheet software that forms part of the cross-platform Microsoft Office suite. Packaging genetic analysis within a familiar and flexible environment helped users quickly understand and effectively perform population genetic analyses. Taking advantage of the rich graphical options available within Excel, GenAlEx offers a wide range of graphical outputs that aid genetic data analysis and interpretation. GenAlEx is now widely used by university teachers at both undergraduate and graduate levels around the world. The software has also attracted a large number of researchers who use its unique features. Here we provide an update on the new features offered in GenAlEx 6.5 that we believe will be welcomed by students, teachers and researchers.
GenAlEx offers population genetic analysis of diploid codominant, haploid, haplotypic and binary genetic data from animals, plants and microorganisms. It accommodates a wide range of genetic markers, including microsatellites (SSRs), single-nucleotide polymorphisms (SNPs), amplified fragment length polymorphisms (AFLPs) and DNA sequences. Both allele frequency-based and distance-based analysis options are provided. The former include estimates of heterozygosity and genetic diversity, F-statistics, Nei’s genetic distance, population assignment and relatedness. The latter include Analysis of Molecular Variance (AMOVA), Principal Coordinates Analysis (PCoA), Mantel tests, TWOGENER, and multivariate and 2D spatial autocorrelation. Readers are referred to Peakall and Smouse (2006) for a more comprehensive outline of these standard procedures, data formats and data import options.
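For readers who want to see what the frequency-based diversity summaries amount to, the Python sketch below computes the usual single-locus quantities from diploid codominant genotypes: the number of alleles (Na), the effective number of alleles Ne = 1/Σp², observed heterozygosity (Ho), expected heterozygosity He = 1 − Σp², and the small-sample corrected uHe = 2N/(2N − 1) × He. It uses textbook definitions and is not GenAlEx’s VBA code; the function name `locus_diversity` and the tuple-based input layout are assumptions made for illustration.

```python
from collections import Counter


def locus_diversity(genotypes):
    """Standard single-locus diversity summaries for diploid codominant data.

    genotypes: one (allele1, allele2) tuple per individual; None marks a
    missing genotype and is skipped (hypothetical input layout).
    """
    scored = [g for g in genotypes if g is not None]
    n = len(scored)                                # individuals scored
    if n == 0:
        raise ValueError("no scored genotypes at this locus")
    counts = Counter(a for g in scored for a in g)
    two_n = 2 * n                                  # allele copies sampled
    sum_p2 = sum((c / two_n) ** 2 for c in counts.values())
    he = 1.0 - sum_p2                              # expected heterozygosity
    return {
        "Na": len(counts),                         # number of different alleles
        "Ne": 1.0 / sum_p2,                        # effective number of alleles
        "Ho": sum(a != b for a, b in scored) / n,  # observed heterozygosity
        "He": he,
        "uHe": he * two_n / (two_n - 1),           # He with small-sample correction
    }


# Usage: four individuals scored at one microsatellite locus.
print(locus_diversity([(152, 152), (152, 156), (156, 156), (152, 156)]))
```

Multilocus means, as reported in most studies, would simply average these per-locus values.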
GenAlEx 6.5 maintains backward compatibility while providing access to the expanded worksheet size of Excel 2007 onward. Thus, the maximum numbers of loci and samples are vastly expanded and are constrained only by available memory. More than 30 different Excel graphs summarize the outcomes of genetic analyses. Graphics can be further manipulated with Excel options and easily converted to PDF or other publication-quality formats.
2 NEW FEATURES
2.1 New estimators of population structure
There has been much recent debate about the utility of $F_{ST}$ as a measure of population genetic structure (Jost, 2008; Ryman and Leimar, 2009; Whitlock, 2011). GenAlEx 6.5 offers the calculation of $G'_{ST}$, $G''_{ST}$ and Jost’s $D_{est}$, which provide [0,1]-standardized, allele frequency-based estimators of population genetic structure, following Meirmans and Hedrick (2011). The null hypothesis of no structure is tested by random permutation, and variances are estimated by jackknifing and bootstrapping over loci. New AMOVA routines now enable the estimation of standardized $F'_{ST}$, following Meirmans (2006). The calculation of these statistics was validated by comparison with the software GenoDive v2.0b22 (Meirmans and Van Tienderen, 2004).
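The standardized estimators can be written in terms of the mean within-population heterozygosity $H_S$ and the total heterozygosity $H_T$ for $k$ populations. The sketch below uses the plug-in formulas as we understand them from Meirmans and Hedrick (2011): $G_{ST} = (H_T - H_S)/H_T$, Hedrick’s $G'_{ST} = G_{ST}(k - 1 + H_S)/[(k - 1)(1 - H_S)]$, $G''_{ST} = k(H_T - H_S)/[(kH_T - H_S)(1 - H_S)]$ and Jost’s $D = [(H_T - H_S)/(1 - H_S)]\,[k/(k - 1)]$. It is a single-locus illustration only: it omits the small-sample corrections behind estimator versions such as $D_{est}$, as well as multilocus averaging, permutation tests and jackknife/bootstrap variances, so its values should not be expected to match GenAlEx output exactly. The function name `differentiation` is hypothetical.

```python
def differentiation(pops):
    """Plug-in G_ST-family estimators for one locus (no sample-size corrections).

    pops: k lists of allele frequencies, one per population, all over the
    same ordered set of alleles (hypothetical input layout).
    """
    k = len(pops)
    if k < 2:
        raise ValueError("need at least two populations")
    # H_S: mean within-population expected heterozygosity
    hs = sum(1.0 - sum(p * p for p in freqs) for freqs in pops) / k
    # H_T: expected heterozygosity of the pooled (mean) allele frequencies
    pbar = [sum(col) / k for col in zip(*pops)]
    ht = 1.0 - sum(p * p for p in pbar)
    if ht == 0:
        raise ValueError("monomorphic locus: estimators undefined")
    gst = (ht - hs) / ht                                  # Nei's G_ST
    g_prime = gst * (k - 1 + hs) / ((k - 1) * (1 - hs))   # Hedrick's G'_ST
    g_dprime = k * (ht - hs) / ((k * ht - hs) * (1 - hs)) # G''_ST
    jost_d = (ht - hs) / (1 - hs) * k / (k - 1)           # Jost's D
    return {"GST": gst, "G'ST": g_prime, "G''ST": g_dprime, "D": jost_d}


# Usage: two populations with mirrored allele frequencies at a biallelic locus.
print(differentiation([[0.8, 0.2], [0.2, 0.8]]))
```

In this usage example $H_S = 0.32$ and $H_T = 0.5$, giving $G_{ST} = 0.36$, $D \approx 0.53$, $G'_{ST} \approx 0.70$ and $G''_{ST} \approx 0.78$.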
*To whom correspondence should be addressed.
© The Author(s) 2012. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
at Australian National University on October 29, 2012http://bioinformatics.oxfordjournals.org/Downloaded from
2.2 Shannon’s information statistics
Shannon information indices have been widely used in ecology but largely overlooked in genetics, despite offering a framework for quantifying biological diversity across multiple scales (genes to landscapes). GenAlEx offers the calculation of a series of Shannon indices, including the mutual information index $^{S}H_{UA}$, an alternative estimator of population structure. The methods follow Sherwin et al. (2006), who assessed the performance of Shannon indices for estimating genetic diversity. Following Smouse and Ward (1978), the analysis extends to multiple hierarchical levels; GenAlEx 6.5 offers a unique three-level partition option, with statistical testing by random permutation.
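The Shannon index for a single population is $I = -\sum_i p_i \ln p_i$, and mutual information can be read as the drop in Shannon information from the pooled sample to the average population. The sketch below assumes equal population weights and a single locus; Sherwin et al. (2006) should be consulted for the sample-size weighting and the exact definition of $^{S}H_{UA}$ used in GenAlEx, so this is an illustration rather than a reimplementation. The function names are hypothetical.

```python
import math


def shannon(freqs):
    """Shannon information index I = -sum(p * ln p); zero frequencies contribute 0."""
    return -sum(p * math.log(p) for p in freqs if p > 0)


def shannon_differentiation(pops):
    """Mutual information between allele identity and population at one locus.

    Computed as I(pooled) - mean I(within), giving every population equal
    weight (an assumption; sample-size weighting would change the result).
    pops: k lists of allele frequencies over a shared allele order.
    """
    k = len(pops)
    pooled = [sum(col) / k for col in zip(*pops)]
    mean_within = sum(shannon(freqs) for freqs in pops) / k
    return shannon(pooled) - mean_within


# Usage: two populations fixed for different alleles.
print(shannon_differentiation([[1.0, 0.0], [0.0, 1.0]]))
```

Two populations fixed for different alleles give the maximum possible value for two equally weighted populations, ln 2 ≈ 0.693, while identical populations give 0.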
2.3 Tools for comparing pairwise population statistics
The Mantel test capability of GenAlEx has been extended to allow multiple comparisons among pairwise population statistics such as $F_{ST}$, $F'_{ST}$, $G'_{ST}$, $G''_{ST}$, $D_{est}$ and $^{S}H_{UA}$. This allows an informed comparison of the new estimators of population structure.
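A Mantel test compares two pairwise matrices (for example, population-by-population $F_{ST}$ and $G''_{ST}$) by correlating their matching off-diagonal entries, then asking how often a random relabelling of the populations in one matrix yields an equal or larger correlation. The minimal sketch below uses Pearson’s correlation on the upper triangles and a one-tailed test for positive association; the function names, the permutation count and the seed are assumptions for illustration, and GenAlEx’s own implementation may differ in detail.

```python
import random


def upper(m):
    """Off-diagonal upper-triangle entries of a square matrix, row by row."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation; assumes neither vector is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mantel(a, b, permutations=999, seed=1):
    """Return (r, one-tailed p) for association between pairwise matrices a and b."""
    n = len(a)
    x = upper(a)
    r_obs = pearson(x, upper(b))
    rng = random.Random(seed)
    order = list(range(n))
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        rng.shuffle(order)
        # Permute rows and columns of b together so each pair stays a pair.
        b_perm = [[b[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if pearson(x, upper(b_perm)) >= r_obs:
            hits += 1
    return r_obs, hits / (permutations + 1)
```

Permuting rows and columns jointly, rather than shuffling the matrix entries independently, preserves the dependence among pairs that share a population, which is the reason the Mantel approach is used for pairwise statistics.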
2.4 Heterogeneity testing for spatial autocorrelation
GenAlEx 6.5 introduces novel heterogeneity tests (Smouse et al., 2008), extending the multiallelic, multilocus spatial autocorrelation analysis methods of Smouse and Peakall (1999), Peakall et al. (2003) and Double et al. (2005). These new methods can provide valuable insights into fine-scale genetic processes across a wide range of animals and plants. Banks and Peakall (2012) have confirmed the statistical power and performance of this heterogeneity test using spatially explicit computer simulations.
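The heterogeneity tests build on the autocorrelation coefficient $r$ of Smouse and Peakall (1999). As we understand that method, a squared genetic distance matrix is first double-centred into a covariance-like matrix $C$, and for a distance class $h$, $r = \sum_{i \ne j} x_{ij} c_{ij} / \sum_i x_{ii} c_{ii}$, where $x_{ij} = 1$ if individuals $i$ and $j$ fall in that class and $x_{ii} = \sum_{j \ne i} x_{ij}$. The sketch below computes $r$ for one class only; the permutation and bootstrap tests and the heterogeneity tests of Smouse et al. (2008) are not shown, and the function names and the half-open class definition (lower, upper] are assumptions.

```python
def double_centre(d):
    """C = -1/2 (d_ij - row mean_i - column mean_j + grand mean)."""
    n = len(d)
    row = [sum(r) / n for r in d]
    col = [sum(d[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row) / n
    return [[-0.5 * (d[i][j] - row[i] - col[j] + grand) for j in range(n)]
            for i in range(n)]


def autocorrelation_r(gen_dist, geo_dist, lower, upper):
    """r for all pairs whose geographic distance lies in (lower, upper].

    gen_dist is assumed to be a squared genetic distance matrix.
    """
    c = double_centre(gen_dist)
    n = len(c)
    num = den = 0.0
    for i in range(n):
        for j in range(n):
            if i != j and lower < geo_dist[i][j] <= upper:
                num += c[i][j]   # covariance term for the pair (i, j)
                den += c[i][i]   # summed over j, this equals x_ii * c_ii
    if den == 0:
        raise ValueError("no pairs in this class, or no genetic variation")
    return num / den
```

When every pair falls into a single class, the row sums of $C$ are zero and $r = -1/(n - 1)$, which makes a useful sanity check on any implementation.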
2.5 Linkage disequilibrium tests (LD) for biallelic data
Despite its importance, there is no universal test for disequilibrium (Slatkin, 2008). GenAlEx 6.5 offers pairwise tests for disequilibrium between biallelic markers such as SNPs. When phase is known, this includes the calculation of $D$, $D'$, $r$ and $r^2$, following Hedrick (2005). Maximum likelihood estimation is used to calculate $D$ and $r$ when phase is unknown (Weir, 1990, p. 310). The results were validated against GDA (Lewis and Zaykin, 2001). Inclusion of LD fills an important technical gap, particularly for teachers. For large SNP sets, or multiallelic data, GenAlEx users are encouraged to take advantage of the options to export their data to other packages such as Arlequin 3.5 (Excoffier and Lischer, 2010).
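For phase-known data, the pairwise statistics follow directly from the four haplotype frequencies of two biallelic loci: $D = p_{AB} - p_A p_B$, $D' = D/D_{max}$ (with $D_{max}$ chosen according to the sign of $D$, following Lewontin’s normalization), $r = D/\sqrt{p_A p_a p_B p_b}$ and $r^2$. The sketch below assumes haplotype counts or frequencies as input; the maximum likelihood estimation used for phase-unknown data (Weir, 1990) and significance testing are not shown, and the function name `ld_stats` is hypothetical.

```python
def ld_stats(f_AB, f_Ab, f_aB, f_ab):
    """D, D', r and r^2 from phase-known haplotype counts or frequencies.

    Both loci must be polymorphic, otherwise r is undefined.
    """
    total = f_AB + f_Ab + f_aB + f_ab
    f_AB, f_Ab, f_aB, f_ab = (x / total for x in (f_AB, f_Ab, f_aB, f_ab))
    p_A = f_AB + f_Ab                 # frequency of allele A at locus 1
    p_B = f_AB + f_aB                 # frequency of allele B at locus 2
    p_a, p_b = 1 - p_A, 1 - p_B
    d = f_AB - p_A * p_B
    # Lewontin's D_max depends on the sign of D.
    if d >= 0:
        d_max = min(p_A * p_b, p_a * p_B)
    else:
        d_max = min(p_A * p_B, p_a * p_b)
    d_prime = d / d_max if d_max > 0 else 0.0
    r = d / (p_A * p_a * p_B * p_b) ** 0.5
    return {"D": d, "D'": d_prime, "r": r, "r2": r * r}


# Usage: 100 haplotypes in which only AB and ab occur (complete LD).
print(ld_stats(50, 0, 0, 50))
```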
2.6 New allele frequency format
Retrospective calculation of the new estimators of population structure, such as $G'_{ST}$, $D_{est}$ and Shannon indices, is now possible from published allele frequency data. Teachers will also find this a helpful option for the re-analysis of textbook examples.
2.7 Import and export options
GenAlEx offers data import from several popular formats and tools for importing and manipulating raw data from DNA sequencers. Export to more than 30 other data formats is provided, enabling access to myriad other software packages. For example, direct export is offered to programs such as GENEPOP (Rousset, 2008) and STRUCTURE (Pritchard et al., 2000), and, via these same formats, to many other programs, including genetic packages in R such as adegenet (Jombart, 2008) and pegas (Paradis, 2010). The full list of export options, along with notes on the export process, can be found at the website.
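To illustrate what an export involves, the sketch below writes diploid genotypes in a layout commonly used for GENEPOP input: a title line, one locus name per line, a Pop line before each population, and one line per individual with an identifier, a comma, and a four-digit code per locus (two digits per allele, 0000 for missing). It is not GenAlEx’s exporter; the function name and input structure are hypothetical, and the GENEPOP documentation (Rousset, 2008) should be checked before relying on the output, particularly for allele sizes that need three-digit codes.

```python
def to_genepop(title, loci, populations):
    """Write diploid genotypes as GENEPOP-style text with 2-digit allele codes.

    populations: list of populations, each a list of
    (individual_id, [genotype per locus]) where a genotype is an
    (allele1, allele2) tuple of integers 1-99, or None if missing.
    """
    lines = [title]
    lines.extend(loci)                          # one locus name per line
    for individuals in populations:
        lines.append("Pop")                     # separator before each population
        for ind_id, genotypes in individuals:
            if len(genotypes) != len(loci):
                raise ValueError(f"{ind_id}: expected {len(loci)} genotypes")
            codes = []
            for g in genotypes:
                if g is None:
                    codes.append("0000")        # missing data
                    continue
                if not all(1 <= a <= 99 for a in g):
                    raise ValueError(f"{ind_id}: allele outside 1-99; recode first")
                codes.append(f"{g[0]:02d}{g[1]:02d}")
            lines.append(f"{ind_id} , " + " ".join(codes))
    return "\n".join(lines) + "\n"


# Usage: two populations, two loci, one missing genotype.
print(to_genepop("demo", ["L1", "L2"],
                 [[("a1", [(1, 2), None])], [("b1", [(3, 3), (10, 1)])]]))
```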
3 SPECIAL FEATURES FOR TEACHING
Offering a user-friendly software package for university students and teachers remains an ongoing goal of GenAlEx. We continue to expand the popular step-by-step output options that allow students to follow each step in the analytical pathway. Teaching-specific menu options are also provided. For example, the Rand menu allows students to permute and bootstrap hypothetical datasets with color tracking, to aid understanding of how these statistical tests work. Finally, we have made freely available a set of tutorial notes and supporting datasets drawn from the graduate workshops that we have offered (both jointly and independently) around the world.
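The distinction that the Rand menu is designed to teach can be shown in a few lines. A random permutation reshuffles the observed values without replacement, so every value appears exactly once; a bootstrap resample draws the same number of values with replacement, so some appear repeatedly and others not at all. The sketch below, which is not the Rand menu itself and uses hypothetical function names, also shows how repeated permutation yields a one-tailed p-value for a difference in group means.

```python
import random


def permute(values, rng):
    """Random permutation: the same values in a new order (no replacement)."""
    shuffled = list(values)
    rng.shuffle(shuffled)
    return shuffled


def bootstrap(values, rng):
    """Bootstrap resample: same size, drawn with replacement (repeats allowed)."""
    return [rng.choice(values) for _ in values]


def permutation_p(group1, group2, permutations=999, seed=1):
    """One-tailed p-value for mean(group1) - mean(group2) by random relabelling."""
    rng = random.Random(seed)
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    obs = sum(group1) / n1 - sum(group2) / len(group2)
    hits = 1  # the observed labelling counts as one permutation
    for _ in range(permutations):
        s = permute(pooled, rng)
        diff = sum(s[:n1]) / n1 - sum(s[n1:]) / (len(s) - n1)
        if diff >= obs:
            hits += 1
    return hits / (permutations + 1)


rng = random.Random(0)
print(permute([1, 2, 3, 4, 5], rng), bootstrap([1, 2, 3, 4, 5], rng))
print(permutation_p([10, 11, 12, 13], [1, 2, 3, 4]))
```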
4 DOCUMENTATION
More than 150 pages of documentation are provided, including Appendix 1, which outlines the statistical analyses used and their supporting references. The revised guide to GenAlEx 6.5 fully cross-links with the GenAlEx tutorials and Appendix 1.
5 CONCLUSION
GenAlEx 6.5 offers a wide range of population genetic analysis options for the full spectrum of genetic markers within the Microsoft Excel environment on both PC and Macintosh computers. Combining a user-friendly interface, rich graphical outputs for data exploration and publication, tools for data manipulation and options to export to many other software packages, GenAlEx offers, we believe, an ideal launching pad for population genetic analysis by students, teachers and researchers alike.
ACKNOWLEDGEMENTS
We thank the many students, teachers and researchers who have
enthusiastically adopted GenAlEx as one of their tools, especially
those who have offered suggestions for improvement. Michaela
Blyton revised the guide, performed extensive beta-testing and
offered crucial advice on improving the user interface. Sasha
Peakall re-designed the GenAlEx logo.
Conflict of Interest: none declared.
REFERENCES
Banks,S.C. and Peakall,R. (2012) Genetic spatial autocorrelation can readily detect sex-biased dispersal. Mol. Ecol., 21, 2092–2105.
Double,M.C. et al. (2005) Dispersal, philopatry and infidelity: dissecting local genetic structure in superb fairy-wrens (Malurus cyaneus). Evolution, 59, 625–635.
Excoffier,L. and Lischer,H.E.L. (2010) Arlequin suite ver 3.5: a new series of programs to perform population genetics analyses under Linux and Windows. Mol. Ecol. Res., 10, 564–567.
Hedrick,P.W. (2005) Genetics of Populations. 3rd edn. Sudbury, MA: Jones and Bartlett Publishers.
Jost,L. (2008) $G_{ST}$ and its relatives do not measure differentiation. Mol. Ecol., 17, 4015–4026.
Jombart,T. (2008) adegenet: a R package for the multivariate analysis of genetic markers. Bioinformatics, 24, 1403–1405.
Lewis,P.O. and Zaykin,D. (2001) Genetic Data Analysis V1.1. Available at http://www.eeb.uconn.edu/people/plewis/software.php (30 May 2012, date last accessed).
Meirmans,P.G. (2006) Using the AMOVA framework to estimate a standardized genetic differentiation measure. Evolution, 60, 2399–2402.
Meirmans,P.G. and Hedrick,P.W. (2011) Assessing population structure: $F_{ST}$ and related measures. Mol. Ecol. Res., 11, 5–18.
Meirmans,P.G. and Van Tienderen,P.H. (2004) GENOTYPE and GENODIVE: two programs for the analysis of genetic diversity of asexual organisms. Mol. Ecol. Notes, 4, 792–794.
Paradis,E. (2010) pegas: an R package for population genetics with an integrated-modular approach. Bioinformatics, 26, 419–420.
Peakall,R. et al. (2003) Spatial autocorrelation analysis offers new insights into gene flow in the Australian bush rat, Rattus fuscipes. Evolution, 57, 1182–1195.
Peakall,R. and Smouse,P.E. (2006) GenAlEx 6: genetic analysis in Excel. Population genetic software for teaching and research. Mol. Ecol. Notes, 6, 288–295.
Pritchard,J.K. et al. (2000) Inference of population structure using multilocus genotype data. Genetics, 155, 945–959.
Rousset,F. (2008) GENEPOP’007: a complete re-implementation of the genepop software for Windows and Linux. Mol. Ecol. Res., 8, 103–106.
Ryman,N. and Leimar,O. (2009) $G_{ST}$ is still a useful measure of genetic differentiation – a comment on Jost’s D. Mol. Ecol., 18, 2084–2087.
Sherwin,W. et al. (2006) Measurement of biological information with applications from genes to landscapes. Mol. Ecol., 15, 2857–2869.
Slatkin,M. (2008) Linkage disequilibrium – understanding the evolutionary past and mapping the medical future. Nat. Rev. Genet., 9, 477–485.
Smouse,P.E. and Peakall,R. (1999) Spatial autocorrelation analysis of individual multiallele and multilocus genetic structure. Heredity, 82, 561–573.
Smouse,P.E. and Ward,R.H. (1978) A comparison of the genetic infrastructure of the Ye’cuana and Yanomama: a likelihood analysis of genotypic variation among populations. Genetics, 88, 611–631.
Smouse,P.E. et al. (2008) A heterogeneity test for fine-scale genetic structure. Mol. Ecol., 17, 3389–3400.
Weir,B.S. (1990) Genetic Data Analysis. Sunderland, MA: Sinauer Associates, Inc.
Whitlock,M.C. (2011) $G_{ST}$ and $D$ do not replace $F_{ST}$. Mol. Ecol., 20, 1083–1091.
... The individual amplified DNA fragment sizes in each SSR marker were recorded using UVI-TEC software. The genotypic data were used to analyze locus-based diversity indices such as the number of alleles (Na), effective number of alleles (Ne), percentage of polymorphic loci, observed heterozygosity (Ho), expected heterozygosity (He), Shannon information index (I) and gene flow (Nm) using GenAlEx software v 6.5 [47]. To check the informativeness of the SSR markers used in this study, polymorphic information content (PIC) was analyzed using Power-Marker 3.25 [48]. ...
... To partition total genetic variation within and among populations, estimates of genetic differentiation were computed by analysis of molecular variance (AMOVA) using GenAlEx 6.5 [47]. Principal coordinate analysis (PCoA) was conducted from the distance matrix of each accession using GenAlEx software. ...
Article
Understanding the genetic diversity of existing genetic resources at the DNA level is an effective approach for germplasm conservation and utilization in breeding programs. However, the patterns of genetic diversity and population structure remain poorly characterized, which hampers germplasm conservation and breeding efforts. Thus, this study aimed to evaluate the genetic diversity and population structure of 49 barley accessions collected from different geographic origins in Ethiopia. Twelve SSR markers were used to analyze all accessions and a total of 61 alleles were found, with a mean of 5.08 alleles per locus. The analysis pointed out the existence of moderate to high values of polymorphic information content, ranging from 0.39 to 0.91, and the mean Shannon diversity index (I) was 1.25, indicating that they were highly informative markers. The highest Euclidean distance (1.32) was computed between accession 9950 and two accessions (247011 and 9949), while the lowest Euclidean distance (0.00) was estimated between accessions 243191 and 243192. The result of molecular variance analysis revealed that the highest variation was found among accessions (47) relative to within accessions (44) and among geographic origins (9). Cluster analysis grouped the 49 barley accessions into three major clusters regardless of their geographic origin, which could be due to the presence of considerable gene flow (2.72). The result of the STRUCTURE analysis was consistent with neighbor-joining clustering and principal coordinate analysis. Generally, this study concluded that the variation among accessions was more important than the difference in geographical regions to develop an appropriate conservation strategy and for parental selection to use in breeding programs. This information will be helpful for barley conservation and breeding, and it may speed up the development of new competitive barley varieties.
... The primers used were vm04084, vm26877, vm28527, vm31701, vm38401, vm51985, vm52682, and vm78806 (Table S1). Genotype diversity was examined using GenAlEx 6.5 (Peakall & Smouse, 2012). Unique clones were initially detected by calculating a simple genetic distance matrix between pairs of individuals based on the microsatellite loci, as estimated in GenAlEx 6.5. ...
... When necessary, missing data points were imputed based on the clonal information using the data matrix consisting of the samples. The final pairwise genetic distance between plants was estimated and used to construct a principal coordinate analysis (PCoA) using a standardized covariance matrix of the genotypes identified plus a set of control genotypes (GenAlEx 6.5; Peakall & Smouse, 2012). For fingerprint identification, the genotypes identified in this study were compared to a database of cranberry cultivar variants present at the National Clonal Germplasm Repository (NCGR; Matusinec et al., 2022; Schlautman et al., 2018). ...
Article
Compared to conventional crops, less is known about how genetic and environmental variability affect the yield and quality of specialty crops like cranberry (Vaccinium macrocarpon Ait.). Herein, we performed a multifaceted analysis of six commercial cranberry beds planted to the Stevens cultivar. The six beds included three with above‐average multiyear yields and three that were lower than average. We considered genotype, edaphic factors, and plant nutrient content as driving variables of yield and fruit quality. We found that genetic purity within beds raised the odds of obtaining above‐average yields over an 8‐year period. The highest levels of genetic contamination (38%–75%) were found at the low‐yield beds, where significant differences in yield and fruit quality were observed between genotypes, within beds. Across all beds, focusing only on plots genetically confirmed to be Stevens cultivar, we also found that plot‐scale yield in 2020 was significantly higher for two of three high‐yield beds, suggesting other factors besides genetic contamination influenced differences in bed‐scale yield. A factor analysis of mixed data that jointly included genotype, edaphic variables, and plant tissue nutrient content revealed complex relations among these variables that were tied to grouping plots based on long‐term yield. Findings highlight the need for further research into the complex genetic and environmental factors that control cranberry yield and fruit quality.
... (Crow and Kimura 1970) for each pollen cloud were calculated based on the paternal (haploid) contribution from pollinator trees to all seedlings from each of the four seed parents. The genetic differentiation between the effective pollen contributions to each mother tree was calculated as $G''_{ST}$, Hedrick's (2005) standardized $G_{ST}$ further corrected for bias when the number of populations is small (Peakall and Smouse 2012), and as delta, the estimate of differentiation among effective pollen clouds based on the differentiation of each pollen cloud from its complement ($D_j$) (Gregorius and Roberds 1986). ...
... The analyses were performed using the programs GSED (delta; $D_j$) (Gillet 2010) v3.0 and GenAlEx v6.503 ($G''_{ST}$) (Peakall and Smouse 2012). The program input file requires knowledge about alleles coming from the seed and from the pollen parent. ...
Article
Gene flow affects the genetic diversity and structure of tree species and can be influenced by stress related to changing climatic conditions. The study of tree species planted in locations outside their natural range, such as arboreta or botanical gardens, allows us to analyse the effect of severe fragmentation on patterns and distances of gene flow. Paternity analysis based on microsatellite marker genotyping was used to analyse how fragmentation affects gene flow among individuals of Quercus rubra L. distributed in a small isolated group of trees (15 trees) planted in the arboretum on the North Campus of the University of Göttingen. For paternity analysis, 365 seedlings from four seed parents were selected and genotyped using 16 microsatellites. The analysis revealed the majority of pollen (84.89%) originated from trees within the site and identified three large full-sib families consisting of 145, 63 and 51 full-sibs. The average pollen dispersal distance for the four seed parents ranged from 17.3 to 103.6 meters. We observed substantial genetic differentiation among effective pollen clouds of the four seed parents (G’’ST = 0.407) as a result of cross pollination between neighboring trees. No self-fertilization was observed. Gene dispersal via pollen followed the expected distance-dependent pattern, and we observed a significant influx of external pollen (15.11%, ranging from 8.64 to 26.26% for individual seed parents) from a diverse set of donors (30). Long-distance pollen dispersal could explain the presence of significant genetic variation even in isolated natural Q. rubra populations.
... Analysis of molecular variance (AMOVA) was carried out using the GenAlEx 6.5 program (Peakall & Smouse, 2012) to describe the distribution of genetic variation among (AP) and within (WP) investigated populations. ...
Article
The Start Codon Targeted Polymorphism (SCoT) and CAAT box-derived polymorphism (CBDP) techniques were used to analyze the genetic diversity and variation of two bigfin reef squid populations in waters surrounding the Con Dao and Phu Quoc islands of Vietnam for technical comparison. The two techniques revealed different levels of pairwise genetic similarity among individuals, depending on the investigated population. Gene differentiation (GST) between the two investigated populations was 0.0767 and 0.0373, corresponding to genetic distances of 0.0381 and 0.0228 and gene flow of Nm = 6.0195 and 12.9061 migrants per generation between the populations, based on SCoT and CBDP techniques, respectively. Genetic variation among individuals within populations (WP) accounted for most of the total genetic variation of the species in the surveyed geographic regions (91.44% based on SCoT data and 93.76% based on CBDP data), whereas the variation among populations (AP) was small. For the species as a whole in the surveyed region, the CBDP markers showed higher genetic diversity, while the SCoT markers better reflected the differentiation and genetic distance between the two investigated populations. Overall, SCoT markers were better at detecting polymorphisms and revealed more loci, while CBDP markers were better at distinguishing samples, and their primer combinations were better at detecting differences among the investigated samples; the overall utility of the two marker systems was comparable. The results from this study show that the CBDP technique can also be used in studies of animal population genetics.
... To guarantee genotyping quality, the data were tested for null alleles and allelic dropout using MICRO-CHECKER (Van Oosterhout et al., 2004). Departures from Hardy-Weinberg equilibrium (HWE) were determined per locus using GenAlEx v6.502 (Peakall & Smouse, 2012) and per sampling point using GENEPOP v1.2 (Raymond & Rousset, 1995), assuming the alternative hypothesis of a deficit of heterozygotes (MICRO-CHECKER identified an excess of homozygotes within the dataset) with the Markov Chain parameters set to the program's default settings (dememorisation: 10,000; batches: 20; and iterations per batch: 5000). ...
Article
Biodiversity patterns are shaped by the interplay between geodiversity and organismal characteristics. Superimposing genetic structure onto landscape heterogeneity (i.e., landscape genetics) can help to disentangle their interactions and better understand population dynamics. Previous studies on the sub‐Antarctic Prince Edward Islands (located midway between Antarctica and Africa) have highlighted the importance of landscape and climatic barriers in shaping spatial genetic patterns and have drawn attention to the value of these islands as natural laboratories for studying fundamental concepts in biology. Here, we assessed the fine‐scale spatial genetic structure of the springtail, Cryptopygus antarcticus travei, which is endemic to Marion Island, in tandem with high‐resolution geological data. Using a species‐specific suite of microsatellite markers, a fine‐scale sampling design incorporating landscape complexity and generalised linear models (GLMs), we examined genetic patterns overlaid onto high‐resolution digital surface models and surface geology data across two 1‐km sampling transects. The GLMs revealed that genetic patterns across the landscape closely track landscape resistance data in concert with landscape discontinuities and barriers to gene flow identified at a scale of a few metres. These results show that the island's geodiversity plays an important role in shaping biodiversity patterns and intraspecific genetic diversity. This study illustrates that fine‐scale genetic patterns in soil arthropods are markedly more structured than anticipated, given that previous studies have reported high levels of genetic diversity and evidence of genetic structuring linked to landscape changes for springtail species, and considering the homogeneity of the vegetation complexes characteristic of the island at the scale of tens to hundreds of metres.
By incorporating fine‐scale and high‐resolution landscape features into our study, we were able to explain much of the observed spatial genetic patterns. Our study highlights geodiversity as a driver of spatial complexity. More widely, it holds important implications for the conservation and management of the sub‐Antarctic islands.
Article
The geographical variation and domestication of tree species are an important part of the theory of forest introduction, and the tracing of germplasm is the theoretical basis for the establishment of high‐quality plantations. Chinese pine (Pinus tabuliformis Carr.) is an important native timber tree species widely distributed in northern China, but it is unclear exactly where the germplasm of the main Chinese pine plantation populations originated. Here, using two mtDNA markers, we analyzed 796 individuals representing 35 populations (matR marker) and 873 individuals representing 38 populations (nad5‐1 marker), respectively, of the major natural and artificial populations in northern China (Shanxi, Hebei and Liaoning provinces). The results confirmed the core position of natural SX* populations (“*” denotes a natural population) among the Chinese pine populations of northern China; the genetic diversity of HB and LN plantations was higher than that of natural SX* populations, and there was a large difference in genetic background within the SX* and LN groups, whereas HB showed the opposite. More importantly, we completed the “point by point” tracing of the HB and LN plantings. The results indicated that almost all HB populations originated from SX* (GDS*, ZTS*, GCS*, and THS*), which resulted in homogeneity of the genetic background of HB populations. Most of the germplasm of the LN plantations originated from LN* (ZJS* and WF*), and the rest originated from GDS* (SX*), resulting in large differences in the genetic background within the LN group. Our results provide a reliable theoretical basis for the scientific allocation, management, and utilization of Chinese pine populations in northern China, and for promoting the high‐quality establishment of Chinese pine plantations.
Preprint
Leptospirosis (caused by pathogenic bacteria in the genus Leptospira) is prevalent worldwide but more common in tropical and subtropical regions. Transmission can occur following direct exposure to infected urine from reservoir hosts, such as rats, or a urine-contaminated environment, which then can serve as an infection source for additional rats and other mammals, including humans. The brown rat, Rattus norvegicus, is an important reservoir of leptospirosis in urban settings. We investigated leptospirosis among brown rats in Boston, Massachusetts and hypothesized that rat dispersal in this urban setting influences the movement, persistence, and diversity of Leptospira. We analyzed DNA from 328 rat kidney samples collected from 17 sites in Boston over a seven-year period (2016-2022); 59 rats representing 12 of 17 sites were positive for Leptospira. We used 21 neutral microsatellite loci to genotype 311 rats and utilized the resulting data to investigate genetic connectivity among sampling sites. We generated whole genome sequences for 28 Leptospira isolates obtained from frozen and fresh tissue from some of the 59 Leptospira-positive rat kidneys. When isolates were not obtained, we attempted Leptospira genomic DNA capture and enrichment, which yielded 14 additional Leptospira genomes from rats. We also generated an enriched Leptospira genome from a 2018 human case in Boston. We found evidence of high genetic structure and limited dispersal among rat populations that is likely influenced by major roads and/or other unknown dispersal barriers, resulting in distinct rat population groups within the city; at certain sites these groups persisted for multiple years. We identified multiple distinct phylogenetic clades of L. interrogans among rats, with specific clades tightly linked to distinct rat populations. This pattern suggests L. interrogans persists in local rat populations and movement of leptospirosis in this urban rat community is driven by rat dispersal. 
Finally, our genomic analyses of the 2018 human leptospirosis case in Boston suggest a link to rats as the source. These findings will be useful for guiding rat control and human leptospirosis mitigation efforts in this and other urban settings.
Article
Knowledge of the genetic diversity of cultivated plants is of great importance in the face of climate change and the presence of pests and diseases. Mexico is considered one of the countries with the greatest species diversity and has high agricultural production; crops with low genetic diversity are therefore in a vulnerable position. Genetic diversity in plants can be studied with two methodologies: morphological and molecular characterization. The first depends on interactions with the environment and can yield ambiguous data, whereas the second detects specific DNA sequences and is therefore considered stable and more reliable. The objective of this research was to evaluate, for the first time, the ability of SRAP (Sequence-Related Amplified Polymorphism) molecular markers to determine the level of genetic diversity in 88 soursop (Annona muricata L.) genotypes from five commercial orchards in the municipality of Compostela, Nayarit, Mexico (the world's leading producer). Three polymorphic and reproducible SRAP marker combinations were selected; they revealed 126 alleles and 76.67% polymorphism, and the average number of bands observed with these combinations was 42. Expected heterozygosity (He) values among populations ranged from 0.144 to 0.176. Cluster analyses showed four main groups. The analysis of molecular variance indicated that most of the variation lies within populations (88%). The information generated by this research may be useful for developing conservation and crop management strategies.
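The abstract reports expected heterozygosity (He) and percent polymorphism for dominant SRAP bands but does not say how He was estimated. As a minimal sketch, assuming diploid individuals in Hardy-Weinberg equilibrium, one common approach estimates the null-allele frequency as q = sqrt(fraction of individuals lacking a band) and takes He = 2pq with p = 1 - q. The function names and the polymorphism criterion (band present in some but not all individuals) are illustrative assumptions, not the study's method.

```python
import math


def dominant_locus_he(band_presence):
    """Expected heterozygosity at one dominant (band presence/absence) locus.

    Hypothetical helper. Assumes Hardy-Weinberg equilibrium, so the null
    (band-absent) allele frequency is q = sqrt(fraction lacking the band).
    """
    n = len(band_presence)
    absent = sum(1 for b in band_presence if b == 0)
    q = math.sqrt(absent / n)
    p = 1.0 - q
    return 2.0 * p * q


def population_summary(matrix):
    """Mean He across loci and percent polymorphic loci for one population.

    matrix[i][l] is 1 if individual i shows band l, else 0. A locus counts as
    polymorphic here if the band is present in some but not all individuals
    (an assumed criterion; studies often use other cut-offs).
    """
    n_loci = len(matrix[0])
    he_values = []
    polymorphic = 0
    for l in range(n_loci):
        column = [row[l] for row in matrix]
        he_values.append(dominant_locus_he(column))
        if 0 < sum(column) < len(column):
            polymorphic += 1
    return sum(he_values) / n_loci, 100.0 * polymorphic / n_loci
```

For example, a band absent in 16 of 25 individuals gives q = 0.8, p = 0.2 and He = 0.32.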
Article
Pinus cembroides is the pinyon pine with the widest distribution and the greatest economic and social importance in Mexico. The objective of this study was to evaluate the levels and patterns of genetic variation in eight populations of P. cembroides from central Mexico using ISSR (Inter-Simple Sequence Repeat) markers. The populations are distributed across two physiographic provinces, the Sierra Madre Oriental and the Mesa del Centro. The primers used generated a total of 154 bands, of which 88.3% were polymorphic at the species level. Average genetic diversity within populations was high (He = 0.22), and the average percentage of polymorphism was 59.2%. Genetic diversity parameters were higher in the Sierra Madre Oriental populations. Most of the genetic diversity was found within populations (74%), with only 26% among them. The value of GST = 0.28 indicates high differentiation among populations. Gene flow, expressed as the number of migrant individuals, was low (Nm = 1.27) among populations in the two regions. A low but significant correlation was found between the geographic and genetic distances of populations in the two regions.
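The reported Nm is an indirect estimate derived from differentiation, and the abstract does not state which formula was used. The sketch below assumes the widely used form Nm = 0.5(1 - GST)/GST, which rests on an idealized island model; the function name is hypothetical.

```python
def nm_from_gst(gst):
    """Indirect gene-flow estimate from differentiation (hypothetical helper).

    Assumes Nm = 0.5 * (1 - GST) / GST. Under island-model assumptions this
    is a rough index of migration, not a literal count of migrants.
    """
    if not 0 < gst <= 1:
        raise ValueError("GST must lie in (0, 1]")
    return 0.5 * (1.0 - gst) / gst


print(round(nm_from_gst(0.28), 2))  # 1.29
```

With GST = 0.28 this gives about 1.29, close to the reported 1.27; an unrounded GST near 0.282 would give 1.27, though that is an inference rather than something the abstract states. Wright's alternative, Nm = (1/FST - 1)/4, would give about 0.64 for the same value, so the choice of formula matters when comparing studies.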
Article
The package adegenet for the R software is dedicated to the multivariate analysis of genetic markers. It extends the ade4 package of multivariate methods by implementing formal classes and functions to manipulate and analyse genetic markers. Data can be imported from common population genetics software and exported to other software and R packages. adegenet also implements standard population genetics tools along with more original approaches for spatial genetics and hybridization. Availability: Stable version is available from CRAN: http://cran.r-project.org/mirrors.html. Development version is available from adegenet website: http://adegenet.r-forge.r-project.org/. Both versions can be installed directly from R. adegenet is distributed under the GNU General Public Licence (v.2). Contact: jombart@biomserv.univ-lyon1.fr Supplementary information: Supplementary data are available at Bioinformatics online.
Article
We describe a model-based clustering method for using multilocus genotype data to infer population structure and assign individuals to populations. We assume a model in which there are K populations (where K may be unknown), each of which is characterized by a set of allele frequencies at each locus. Individuals in the sample are assigned (probabilistically) to populations, or jointly to two or more populations if their genotypes indicate that they are admixed. Our model does not assume a particular mutation process, and it can be applied to most of the commonly used genetic markers, provided that they are not closely linked. Applications of our method include demonstrating the presence of population structure, assigning individuals to populations, studying hybrid zones, and identifying migrants and admixed individuals. We show that the method can produce highly accurate assignments using modest numbers of loci—e.g., seven microsatellite loci in an example using genotype data from an endangered bird species. The software used for this article is available from http://www.stats.ox.ac.uk/~pritch/home.html.
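The probabilistic assignment described here can be illustrated with the likelihood of an individual's multilocus genotype under each population's allele frequencies. This is a simplified sketch, not the published program: it assumes the allele frequencies are already known (the method itself estimates them jointly with memberships), no admixture, Hardy-Weinberg and linkage equilibrium within populations, and an equal prior for each of the K populations. The frequency floor for unseen alleles is an ad hoc assumption, and the function name is hypothetical.

```python
import math


def assignment_probabilities(genotype, pop_freqs):
    """Posterior probability that a diploid individual belongs to each population.

    genotype: list of (allele1, allele2) tuples, one per locus.
    pop_freqs: dict mapping population name -> list (one per locus) of dicts
        mapping allele -> frequency.
    """
    log_lik = {}
    for pop, loci in pop_freqs.items():
        total = 0.0
        for (a1, a2), freqs in zip(genotype, loci):
            # Allele copies are multiplied individually; the extra factor 2 for
            # heterozygotes is shared by every population and cancels below.
            # The 1e-6 floor avoids log(0) for unseen alleles (ad hoc choice).
            total += math.log(max(freqs.get(a1, 0.0), 1e-6))
            total += math.log(max(freqs.get(a2, 0.0), 1e-6))
        log_lik[pop] = total
    # Normalize in log space for numerical stability (equal priors).
    best = max(log_lik.values())
    weights = {pop: math.exp(ll - best) for pop, ll in log_lik.items()}
    z = sum(weights.values())
    return {pop: w / z for pop, w in weights.items()}
```

Working with log-likelihoods matters in practice: with many loci the raw genotype probabilities become too small to represent as floating-point numbers.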
Article
Comparison of population structure between studies can be difficult, because the value of the often-used FST-statistic depends on the amount of genetic variation within populations. Recently, a standardized measure of genetic differentiation was developed based on GST, which addressed this problem, though no method was provided to estimate this standardized measure without bias. Here I present a method to estimate a standardized measure of population differentiation based on the analysis of molecular variance framework. One advantage of the method is that it can be readily expanded to include different hierarchical levels in the tested population structure.
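The standardization idea can be sketched as dividing the observed differentiation by the maximum value attainable given the observed within-population diversity. That maximum is obtained by recoding the data so that every population carries only its own private alleles. The method described works within the AMOVA framework; for brevity this sketch swaps in Nei's frequency-based GST at a single locus with populations weighted equally, so it illustrates the recoding logic rather than reproducing the AMOVA-based estimator. Function names are hypothetical.

```python
def nei_gst(pop_freqs):
    """Nei's GST at one locus from a list of allele-frequency dicts (equal weights)."""
    k = len(pop_freqs)
    # Mean within-population gene diversity, HS.
    hs = sum(1.0 - sum(p * p for p in f.values()) for f in pop_freqs) / k
    # Pooled allele frequencies, then total gene diversity, HT.
    pooled = {}
    for f in pop_freqs:
        for allele, p in f.items():
            pooled[allele] = pooled.get(allele, 0.0) + p / k
    ht = 1.0 - sum(p * p for p in pooled.values())
    if ht == 0.0:
        return 0.0  # all populations fixed for the same allele
    return (ht - hs) / ht


def standardized_gst(pop_freqs):
    """Observed GST divided by its maximum under private-allele recoding.

    Tagging each allele with its population index makes every allele private
    while leaving within-population diversity (HS) unchanged.
    """
    if len(pop_freqs) < 2:
        raise ValueError("need at least two populations")
    recoded = [{(i, allele): p for allele, p in f.items()}
               for i, f in enumerate(pop_freqs)]
    return nei_gst(pop_freqs) / nei_gst(recoded)
```

For two populations {a: 0.5, b: 0.5} and {a: 0.5, c: 0.5}, GST = 0.2 while the recoded maximum is 1/3, giving a standardized value of 0.6.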
Article
Dispersal is a fundamental process that influences the response of species to landscape change and habitat fragmentation. In an attempt to better understand dispersal in the Australian bush rat, Rattus fuscipes, we have combined a new multilocus autocorrelation method with hypervariable microsatellite genetic markers to investigate fine-scale (≤1 km) patterns of spatial distribution and spatial genetic structure. The study was conducted across eight trapping transects at four sites, with a total of 270 animals sampled. Spatial autocorrelation analysis of bush rat distribution revealed that, in general, animals occurred in groups or clusters of higher density (≤200 m across), with intervening gaps or lower-density areas. Spatial genetic autocorrelation analysis, based on seven hypervariable microsatellite loci (He = 0.8) with a total of 80 alleles, revealed a consistent pattern of significant positive local genetic structure. This genetic pattern was consistent for all transects, and for adults and sub-adults, males and females. By testing for autocorrelation at multiple scales from 10 to 800 m we found that the extent of detectable positive spatial genetic structure exceeded 500 m. Further analyses detected significantly weaker spatial genetic structure in males compared with females, but no significant differences were detected between adults and sub-adults. Results from Mantel tests and hierarchical AMOVA further support the conclusion that the distribution of bush rat genotypes is not random at the scale of our study. Instead, proximate bush rats are more genetically alike than more distant animals. We conclude that in bush rats, gene flow per generation is sufficiently restricted to generate the strong positive signal of local spatial genetic structure. 
Although our results are consistent with field data on animal movement, including the reported tendency for males to move further than females, we provide the first evidence for restricted gene flow in bush rats. Our study appears to be the first microsatellite-based study of fine-scale genetic variation in small mammals and the first to report consistent positive local genetic structure across sites, age-classes, and sexes. The combination of new forms of autocorrelation analyses, hypervariable genetic markers and fine-scale analysis (<1 km) may thus offer new evolutionary insights that are overlooked by more traditional larger scaled (>10 km) population genetic studies.
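A multilocus autocorrelation coefficient of the kind used here summarizes, for one distance class, how genetically similar pairs of individuals are relative to the sample as a whole. The sketch below assumes a covariance-based formulation: a matrix of squared pairwise genetic distances is double-centred into a covariance-like matrix C, and r for a distance class is the sum of C over pairs in that class divided by the corresponding sum of each pair's mean self-covariance. Computing the genetic distances themselves, permutation tests and bootstrap confidence intervals are omitted, and the function name is hypothetical.

```python
def autocorrelation_r(gen_dist, geo_dist, lower, upper):
    """Multilocus spatial autocorrelation coefficient r for one distance class.

    gen_dist: symmetric (n x n) list of squared genetic distances.
    geo_dist: symmetric (n x n) list of geographic distances.
    Pairs with lower < geo_dist <= upper fall in the distance class.
    """
    n = len(gen_dist)
    row_mean = [sum(row) / n for row in gen_dist]
    grand = sum(row_mean) / n
    # Double-centre the distance matrix (symmetric, so column means = row means).
    c = [[-0.5 * (gen_dist[i][j] - row_mean[i] - row_mean[j] + grand)
          for j in range(n)] for i in range(n)]
    num = den = 0.0
    for i in range(n):
        for j in range(n):
            if i != j and lower < geo_dist[i][j] <= upper:
                num += c[i][j]
                # Normalize by the pair's mean self-covariance.
                den += 0.5 * (c[i][i] + c[j][j])
    if den == 0.0:
        raise ValueError("no pairs in this distance class")
    return num / den
```

Two genetically identical neighbours separated from another identical pair yield r = 1 in the short-distance class and r = -1 in the long-distance class, the pattern expected under restricted gene flow.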
Article
Dispersal influences evolution, demography, and social characteristics but is generally difficult to study. Here we combine long-term demographic data from an intensively studied population of superb fairy-wrens (Malurus cyaneus) and multivariate spatial autocorrelation analyses of microsatellite genotypes to describe dispersal behavior in this species. The demographic data revealed: (1) sex-biased dispersal: almost all individuals that dispersed into the study area over an eight-year period were female (93%; n = 153); (2) high rates of extragroup infidelity (66% of offspring), which also facilitated local gene dispersal; and (3) skewed lifetime reproductive success in both males and females. These data led to three expectations concerning the patterns of fine-scale genetic structure: (1) little or no spatial genetic autocorrelation among females, (2) positive spatial genetic autocorrelation among males, and (3) a heterogeneous genetic landscape. Global autocorrelation analysis of the genotypes present in the study population confirmed the first two expectations. A novel two-dimensional local autocorrelation analysis confirmed the third and provided new insight into the patterns of genetic structure across the two-dimensional landscape. We highlight the potential of autocorrelation analysis to infer evolutionary processes but also emphasize that genetic patterns in space cannot be fully understood without an appropriate and intensive sampling regime and detailed knowledge of the individuals genotyped.
Article
Investigating diversity in asexual organisms using molecular markers involves the assignment of individuals to clonal lineages and the subsequent analysis of clonal diversity. Assignment is possible using a distance matrix in combination with a user-specified threshold, defined as the maximum distance between two individuals that are considered to belong to the same clonal lineage. Analysis of clonal diversity requires tests for differences in diversity and clonal composition between populations. We developed two programs, genotype and genodive, for such analyses of clonal diversity in asexually reproducing organisms. Additionally, genotype can be used for detecting genotyping errors in studies of sexual organisms.
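Threshold-based assignment of this kind can be sketched by linking any two individuals whose distance is at most the threshold and treating each connected group as one clonal lineage. Whether linkage should be transitive in this way (so A and C share a lineage when both are within the threshold of B but not of each other) is an assumption of this sketch, not a detail given in the abstract; the function name is hypothetical.

```python
def assign_clones(dist, threshold):
    """Label each individual with a clonal lineage index (hypothetical helper).

    dist: symmetric (n x n) list of pairwise genetic distances.
    Individuals are linked when dist <= threshold; lineages are the connected
    groups of linked individuals (transitive, single-linkage grouping).
    """
    n = len(dist)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= threshold:
                parent[find(i)] = find(j)
    # Number lineages in order of first appearance.
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]
```

The threshold is where genotyping error enters: setting it above zero lets individuals that differ only by a scoring error fall into the same lineage.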
Article
GENALEX is a user-friendly cross-platform package that runs within Microsoft Excel, enabling population genetic analyses of codominant, haploid and binary data. Allele frequency-based analyses include heterozygosity, F statistics, Nei's genetic distance, population assignment, probabilities of identity and pairwise relatedness. Distance-based calculations include AMOVA, principal coordinates analysis (PCA), Mantel tests, multivariate and 2D spatial autocorrelation and TWOGENER. More than 20 different graphs summarize data and aid exploration. Sequence and genotype data can be imported from automated sequencers, and exported to other software. Initially designed as a tool for teaching, GENALEX 6 now offers features for researchers as well. Documentation and the program are available at http://www.anu.edu.au/BoZo/GenAlEx/
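Nei's genetic distance, one of the frequency-based analyses listed, can be sketched from per-locus allele frequencies. This sketch assumes Nei's standard (1972) distance without a small-sample correction; the abstract does not say which variant is implemented, and the function name is hypothetical.

```python
import math


def nei_distance(freqs_x, freqs_y):
    """Nei's standard genetic distance D = -ln(I) between two populations.

    freqs_x, freqs_y: lists (one per locus) of dicts mapping allele -> frequency.
    I = Jxy / sqrt(Jx * Jy), where Jx, Jy and Jxy are the means over loci of
    sum(x_a**2), sum(y_a**2) and sum(x_a * y_a).
    """
    n = len(freqs_x)
    jx = sum(sum(p * p for p in fx.values()) for fx in freqs_x) / n
    jy = sum(sum(p * p for p in fy.values()) for fy in freqs_y) / n
    jxy = sum(sum(p * fy.get(a, 0.0) for a, p in fx.items())
              for fx, fy in zip(freqs_x, freqs_y)) / n
    if jxy == 0.0:
        return math.inf  # no shared alleles: identity is 0
    identity = jxy / math.sqrt(jx * jy)
    return -math.log(identity)
```

For one locus where population X is fixed for allele a and population Y has a and b at 0.5 each, I = 0.5 / sqrt(0.5) and D = 0.5 ln 2.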