BIOINFORMATICS APPLICATIONS NOTE
Vol. 20 no. 2 2004, pages 289–290
DOI: 10.1093/bioinformatics/btg412
APE: Analyses of Phylogenetics and Evolution in R language
Emmanuel Paradis (1,*), Julien Claude (1) and Korbinian Strimmer (2)
(1) Laboratoire de Paléontologie, Paléobiologie and Phylogénie, Institut des Sciences
de l’Évolution, Université Montpellier II, F-34095 Montpellier cédex 05, France and
(2) Department of Statistics, University of Munich, Ludwigstrasse 33, D-80539 Munich,
Germany
Received on April 11, 2003; revised on July 11, 2003; accepted on July 29, 2003
ABSTRACT
Summary: Analysis of Phylogenetics and Evolution (APE) is
a package written in the R language for use in molecular
evolution and phylogenetics. APE provides utility functions for
reading and writing data and manipulating phylogenetic trees,
as well as several advanced methods for phylogenetic and
evolutionary analysis (e.g. comparative and population genetic
methods). APE takes advantage of the many R functions for
statistics and graphics, and also provides a flexible framework
for developing and implementing further statistical methods for
the analysis of evolutionary processes.
Availability: The program is free and available from the
official R package archive at
http://cran.r-project.org/src/contrib/PACKAGES.html#ape.
APE is licensed under the GNU General Public License.
Contact: paradis@isem.univ-montp2.fr
Phylogenetic analysis, in its broad sense, covers a very wide
range of methods from computing evolutionary distances,
reconstructing gene trees, estimating divergence dates, to
the analysis of comparative data, estimation of evolutionary
rates and analysis of diversification. All these diverse tasks
have one particular aspect in common: they rely heavily on
computational statistics.
The R system, a free platform-independent open-source
analysis environment, has recently emerged as the de facto
standard for statistical computing and graphics (Ihaka and
Gentleman, 1996). One advantage of R is that it can be easily
tailored to a particular application area by writing specialized
packages. In particular, the usefulness of R in bioinformatics
has already been impressively demonstrated in the analysis of
gene expression data (http://www.bioconductor.org).
Analysis of Phylogenetics and Evolution (APE) is the first
joint effort to utilize the power of R also in the analysis of
phylogenetic and evolutionary data. APE focuses on statistical
analyses using phylogenetic and genealogical trees as input.
To whom correspondence should be addressed.
In Version 1.1, APE provides functions for reading, writing,
plotting and manipulating phylogenetic trees, analyses
of comparative data in a phylogenetic framework, analysis
of diversification, computing distances from allelic and
nucleotide data, reading nucleotide sequences and several other
tools, such as Mantel’s test, computation of minimum spanning
trees or estimation of population genetics parameters.
Table 1 gives an overview of the functions currently available
in APE. Note that some of the methods (e.g. the comparative
method and the skyline plot) have previously been available
only in specialized software. External tree reconstruction
programs (such as PHYLIP) can be called from R through
standard shell commands.
One strength of R is that it is straightforward to obtain
publication-quality graphical output, particularly with its
PostScript device. For instance, the plotting function for
phylogenies in APE handles colors, line thickness, font and
spacing of labels, which can be defined separately for each
branch, so that three different variables can be represented on
a single phylogeny plot. APE also produces complex population
genetics plots, such as the generalized skyline plot (Strimmer
and Pybus, 2001), with a single command.
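As an illustration of per-branch graphical settings, the following is a minimal sketch rather than the authors' own example. It assumes a current release of ape in which plot.phylo accepts the edge.color, edge.width, font, cex and label.offset arguments (the 1.1 interface may have differed); the tree, colors and output file are invented for the purpose.

```r
library(ape)

# Hypothetical four-taxon tree in Newick format (not taken from the paper).
tr <- read.tree(text = "((A:1,B:2):1,(C:1.5,D:0.5):2);")

# tr$edge has one row per branch, so build one color and one width per row.
n_edges <- nrow(tr$edge)
edge_cols <- rep(c("black", "red"), length.out = n_edges)
edge_lwd <- seq(1, 3, length.out = n_edges)

# Send the figure to a PostScript file instead of the screen.
out_file <- tempfile(fileext = ".eps")
postscript(out_file, width = 5, height = 4, horizontal = FALSE,
           onefile = FALSE, paper = "special")
plot(tr, edge.color = edge_cols, edge.width = edge_lwd,
     font = 3, cex = 1.2, label.offset = 0.1)
add.scale.bar()  # scale bar for branch lengths
dev.off()
```

Branch color, branch width and label style can each carry a separate variable, which is one way to read the "three different variables" mentioned above.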
APE, like any R package, is command-line driven. The
functions are called by the user, possibly with arguments and
options. Any session using APE in R starts with the command
library(ape)
which makes the functions of APE available in the R
environment. The list of these functions can be displayed with the
command
library(help = ape)
which displays their names with a brief description. An
evolutionary tree saved on the disk in the text file tree1.txt in
the standard Newick parenthetic format can then be read by
tree1 <- read.tree("tree1.txt")
This stores the phylogenetic tree in an object named tree1
of class ‘phylo’. The information stored in this object
Table 1. Special functions available in APE 1.1

Application          Available commands
Input/output         read.dna, write.dna, read.nexus, write.nexus, read.tree,
                     write.tree, read.GenBank
Graphics             add.scale.bar, plot.mst, plot.phylo, plot.skyline,
                     lines.skyline, ltt.plot
Tree manipulation    bind.tree, drop.tip, is.binary.tree, is.ultrametric
Comparative method   compar.gee, compar.lynch, pic, vcv.phylo
Diversification      birthdeath, cherry, diversi.gof, diversi.time, gamma.stat
Population genetics  branching.times, coalescent.intervals,
                     collapsed.intervals, find.skyline.epsilon,
                     heterozygosity, skylineplot, skyline, theta.h, theta.k,
                     theta.s
Molecular dating     chronogram, ratogram, NPRS.criterion
Miscellaneous        all.equal.phylo, balance, base.freq, dist.dna, dist.gene,
                     dist.phylo, GC.content, klastorin, mantel.test, mst,
                     summary.phylo
Data sets            bird.families, bird.orders, hivtree, landplants, opsin,
                     woodmouse, xenarthra

Detailed information about each function can be accessed with the online help
[e.g. help(mantel.test)].
(e.g. branch lengths) can be inspected by typing tree1 and
graphical output in the form of a cladogram can be obtained by
executing
plot(tree1)
which actually calls the function plot.phylo of APE to
draw the phylogenetic tree tree1 [due to the object-oriented
nature of R the command plot(x) may give a completely
different result depending on the class of x]. The tree is
plotted, by default, in a graphical window, but can be exported
in various file formats depending on the operating system.
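The session described above can be sketched end to end as follows. This is an illustrative example under stated assumptions: the Newick string is invented, the file is written to a temporary directory to stand in for tree1.txt, and the element names tip.label and edge.length are those used by current versions of ape.

```r
library(ape)

# Create a small Newick file standing in for tree1.txt (hypothetical tree).
tree_file <- file.path(tempdir(), "tree1.txt")
writeLines("((human:6,chimp:6):2,(mouse:5,rat:5):3);", tree_file)

tree1 <- read.tree(tree_file)

class(tree1)        # S3 class used for method dispatch
tree1$tip.label     # taxon names
tree1$edge.length   # branch lengths, one per row of tree1$edge
print(tree1)        # the summary shown when typing tree1 at the prompt

# plot() dispatches on the class of its argument, so this calls plot.phylo().
# The pdf device is used here so that the example also runs without a screen.
pdf_file <- tempfile(fileext = ".pdf")
pdf(pdf_file)
plot(tree1)
dev.off()
```

Replacing pdf() with another device such as postscript() exports the same plot to a different file format.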
In addition to this trivial example, the representation of
a phylogenetic tree in an object-oriented structure results
in straightforward manipulation of the phylogenetic data for
various computations used in evolutionary analyses. Currently
implemented in APE are approaches such as phylogenetically
independent contrasts (Felsenstein, 1985; Harvey and Pagel,
1991), fitting birth–death models (Nee et al., 1994; Pybus
and Harvey, 2000), population-genetic analysis (Nee et al.,
1995; Strimmer and Pybus, 2001), non-parametric smoothing
of evolutionary rates (Sanderson, 1997) and estimation
of groups of genes in phylogenetic trees using Klastorin’s
method (Misawa and Tajima, 2000). Furthermore,
distance-based clustering methods as implemented in the R function
hclust can be used by APE using functions converting to
and from objects of class ‘phylo’ and ‘hclust’.
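Two of these approaches can be sketched briefly. The example below is a hedged illustration, not the authors' analysis: the tree and trait values are invented, and the conversion functions as.phylo and as.hclust are those provided by current versions of ape (they are not listed in Table 1, so their availability in version 1.1 is an assumption).

```r
library(ape)

# Hypothetical ultrametric tree and trait values (not from the paper).
tr <- read.tree(text = "((A:1,B:1):1,(C:1,D:1):1);")
body_size <- c(A = 1.0, B = 2.0, C = 4.0, D = 7.0)
brain_size <- c(A = 0.5, B = 0.9, C = 2.1, D = 3.2)

# Phylogenetically independent contrasts: one contrast per internal node.
pic_body <- pic(body_size, tr)
pic_brain <- pic(brain_size, tr)

# Contrasts have expectation zero, so the regression is forced through the origin.
fit <- lm(pic_brain ~ pic_body - 1)
coef(fit)

# Conversion between 'hclust' and 'phylo' objects.
m <- matrix(c(0, 1, 5, 6), ncol = 1,
            dimnames = list(c("w", "x", "y", "z"), NULL))
hc <- hclust(dist(m), method = "average")
hc_tree <- as.phylo(hc)    # hclust -> phylo
hc_back <- as.hclust(tr)   # phylo -> hclust (needs a rooted, binary, ultrametric tree)
```

Working with contrasts rather than raw species values is what removes the non-independence due to shared ancestry (Felsenstein, 1985).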
All R functions available in APE (Table 1) are documented
in the R hypertext format and information regarding their use
can be retrieved by applying the help command, e.g.
help(read.tree)
The classes and methods in APE (like phylo) can also
easily be further extended to include other functionalities, for
instance to annotate phylogenetic trees. Thus, APE is not only
a data analysis package, it is also an environment for
developing and implementing new methods. Furthermore, programs
written in C, C++ or Fortran77 can be linked and called
from R. This is particularly useful for computer-intensive
calculations.
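One way such an extension might look is sketched below using R's S3 class system. The function annotate_tips, the class name annotated_phylo and the element tip.notes are hypothetical names introduced only for this illustration; they are not part of APE.

```r
library(ape)

# Hypothetical helper: attach a free-text note to every tip and
# prepend a new class so that specialised methods can be defined.
annotate_tips <- function(phy, notes) {
  stopifnot(inherits(phy, "phylo"),
            setequal(names(notes), phy$tip.label))
  phy$tip.notes <- notes[phy$tip.label]  # reorder notes to match the tips
  class(phy) <- c("annotated_phylo", class(phy))
  phy
}

# New print method; NextMethod() falls back to the print method for 'phylo'.
print.annotated_phylo <- function(x, ...) {
  NextMethod()
  cat("Tip annotations:\n")
  for (tip in x$tip.label) {
    cat("  ", tip, ": ", x$tip.notes[[tip]], "\n", sep = "")
  }
  invisible(x)
}

tr <- read.tree(text = "((A:1,B:1):1,C:2);")
tr2 <- annotate_tips(tr, c(A = "island", B = "mainland", C = "outgroup"))
print(tr2)
```

Because the object keeps 'phylo' in its class vector, existing functions such as plot(tr2) continue to dispatch to the methods written for 'phylo'.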
ACKNOWLEDGEMENTS
We thank two anonymous referees for their constructive
comments on a previous version of this paper. This research was
financially supported by the Programme inter-EPST
‘Bioinformatique’ (E.P. and J.C.) and by an Emmy-Noether research
grant from the DFG (K.S.). This is publication 2003–053
of the Institut des Sciences de l’Évolution (Unité Mixte de
Recherche 5554 du Centre National de la Recherche Scientifique).
REFERENCES
Felsenstein,J. (1985) Phylogenies and the comparative methods. Am. Nat., 125, 1–15.
Harvey,P.H. and Pagel,M.D. (1991) The Comparative Method in Evolutionary Biology. Oxford University Press, Oxford.
Ihaka,R. and Gentleman,R. (1996) R: a language for data analysis and graphics. J. Comput. Graph. Statist., 5, 299–314.
Misawa,K. and Tajima,F. (2000) A simple method for classifying genes and a bootstrap test for classifications. Mol. Biol. Evol., 17, 1879–1884.
Nee,S., Holmes,E.C., Rambaut,A. and Harvey,P.H. (1995) Inferring population history from molecular phylogenies. Phil. Trans. R. Soc. Lond. B, 349, 25–31.
Nee,S., May,R.M. and Harvey,P.H. (1994) The reconstructed evolutionary process. Phil. Trans. R. Soc. Lond. B, 344, 305–311.
Pybus,O.G. and Harvey,P.H. (2000) Testing macro-evolutionary models using incomplete molecular phylogenies. Proc. R. Soc. Lond. B, 267, 2267–2272.
Sanderson,M.J. (1997) A nonparametric approach to estimating divergence times in the absence of rate constancy. Mol. Biol. Evol., 14, 1218–1231.
Strimmer,K. and Pybus,O.G. (2001) Exploring the demographic history of a sample of DNA sequences using the generalized skyline plot. Mol. Biol. Evol., 18, 2298–2305.