Article

On incomplete sampling under birth-death models and connections to the sampling-based coalescent

Abstract

The constant rate birth-death process is used as a stochastic model for many biological systems, for example phylogenies or disease transmission. As the biological data are usually not fully available, it is crucial to understand the effect of incomplete sampling. In this paper, we analyze the constant rate birth-death process with incomplete sampling. We derive the density of the bifurcation events for trees on n leaves which evolved under this birth-death-sampling process. This density is used for calculating prior distributions in Bayesian inference programs and for efficiently simulating trees. We show that the birth-death-sampling process can be interpreted as a birth-death process with reduced rates and complete sampling. This shows that joint inference of birth rate, death rate and sampling probability is not possible. The birth-death-sampling process is compared to the sampling-based population genetics model, the coalescent. It is shown that despite many similarities between these two models, the distribution of bifurcation times remains different even in the case of very large population sizes. We illustrate these findings on a Hepatitis C virus dataset from Egypt. We show that the transmission time estimates are significantly different; the widely used Gamma statistic even changes its sign from negative to positive when switching from the coalescent to the birth-death process.
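The equivalence claimed in the abstract can be checked numerically. The sketch below is an illustration under stated assumptions, not the paper's own code: the abstract does not spell out the reduced rates, so the mapping used here (birth rate ρλ, death rate μ − λ(1 − ρ), sampling probability 1) and the expression for p1(t), the probability that a lineage alive at time t before the present leaves exactly one sampled descendant, are the forms commonly quoted for the constant rate birth-death-sampling process and should be checked against the full text. The function names and parameter values are hypothetical.

```python
import math


def lambda_p1(t, lam, mu, rho):
    """Birth rate times p1(t) for a constant rate birth-death-sampling process.

    p1(t) is the probability that a lineage alive at time t before the
    present has exactly one sampled descendant at the present.
    Assumes lam != mu (the critical case needs a separate limit).
    """
    r = lam - mu                      # net diversification rate
    e = math.exp(-r * t)
    denom = rho * lam + (lam * (1.0 - rho) - mu) * e
    return lam * rho * r * r * e / (denom * denom)


def reduced_rates(lam, mu, rho):
    """Assumed mapping to an equivalent process with complete sampling."""
    return rho * lam, mu - lam * (1.0 - rho), 1.0


# Illustrative values only; mu >= lam * (1 - rho) keeps the reduced death rate non-negative.
lam, mu, rho = 2.0, 1.5, 0.4
lam2, mu2, rho2 = reduced_rates(lam, mu, rho)

for t in (0.1, 1.0, 5.0):
    original = lambda_p1(t, lam, mu, rho)
    reduced = lambda_p1(t, lam2, mu2, rho2)
    print(f"t={t}: original={original:.12g}, reduced={reduced:.12g}")
```

Both parameter sets give the same value of λ·p1(t) at every t, which is why the bifurcation times alone cannot separate the birth rate, death rate and sampling probability.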

... Phylodynamic models can be classified into two main families: coalescent (Volz et al., 2009; Drummond et al., 2005; Pybus et al., 2000) and birth-death (BD) (Kendall, 1948; Maddison et al., 2007; Stadler, 2009, 2010). Coalescent models are often preferred for estimating deterministic population dynamics; however, BD models are better adapted for highly stochastic processes, such as the dynamics of emerging pathogens (Macpherson et al., 2021). ...
... Models of the BD family are phylodynamic analogies of compartmental models in classical epidemiology (e.g., SIR, Susceptible-Infectious-Recovered (Hethcote, 2000)). Many extensions of the classical BD model with incomplete sampling (BDS (Stadler, 2009)) were developed over time, including multi-type birth-death (MTBD) models. They add a population structure to the classical birth-death process by allowing for different types of individuals. ...
... The model parameters can be estimated with maximum-likelihood or Bayesian methods (Bouckaert et al., 2019) by exploring the likelihood (or posterior probability) landscape of trees. However, the closed-form solution of the master equations exists only for the initial BDS model (Stadler, 2009), while for its extensions (like the BDEI model and MTBD models in general) the master equations for likelihood calculation need to be resolved with numerical methods. The complexity of the master equations and their boundary conditions (which recursively depend on the tree evolution later in time) makes their numerical resolution challenging and time-consuming (Scire et al., 2022; Voznica et al., 2022). ...
Article
Multi-type birth-death (MTBD) models are phylodynamic analogies of compartmental models in classical epidemiology. They serve to infer such epidemiological parameters as the average number of secondary infections Re and the infectious time from a phylogenetic tree (a genealogy of pathogen sequences). The representatives of this model family focus on various aspects of pathogen epidemics. For instance, the birth-death exposed-infectious (BDEI) model describes the transmission of pathogens featuring an incubation period (when there is a delay between the moment of infection and becoming infectious, as for Ebola and SARS-CoV-2), and permits its estimation along with other parameters. With constantly growing sequencing data, MTBD models should be extremely useful for unravelling information on pathogen epidemics. However, existing implementations of these models in a phylodynamic framework have not yet caught up with the sequencing speed. Computing time and numerical instability issues limit their applicability to medium data sets (≤ 500 samples), while the accuracy of estimations should increase with more data. We propose a new highly parallelizable formulation of ordinary differential equations for MTBD models. We also extend them to forests to represent situations when a (sub-)epidemic started from several cases (e.g., multiple introductions to a country). We implemented it for the BDEI model in a maximum likelihood framework using a combination of numerical analysis methods for efficient equation resolution. Our implementation estimates epidemiological parameter values and their confidence intervals in two minutes on a phylogenetic tree of 10 000 samples. Comparison to the existing implementations on simulated data shows that it is not only much faster, but also more accurate. An application of our tool to the 2014 Ebola epidemic in Sierra Leone is also convincing, with very fast calculation and precise estimates.
As MTBD models are closely related to Cladogenetic State Speciation and Extinction (ClaSSE)-like models, our findings could also be easily transferred to the macroevolution domain.
... The coalescent, with the assumption of an underlying deterministic population growth model such as logistic growth, provides the basis for clone growth rate estimation using the phylodyn R package (Fabre et al. 2022; Karcher et al. 2017; Van Egeren et al. 2021), as well as a method called Phylofit (Mitchell et al. 2022; Williams et al. 2022). Alternatively, the package BEAST 2 (Bouckaert et al. 2019) enables phylodynamic inference either by using the coalescent method or by modeling the population as a birth-death process, which allows the population size to vary stochastically without relying on coalescent approximations (Boskova et al. 2014; Stadler 2009). Due to the lack of an analytical solution for confidence intervals, these previous approaches estimate the growth rate using Markov chain Monte Carlo (MCMC), integrated nested Laplace approximations (INLA), or approximate Bayesian computation (ABC). ...
... In this section, we provide our estimates for clonal growth parameters under a wide range of applicable modeling assumptions, then apply them to simulated and real data. We also compare our results to those produced using Phylofit, a recent coalescent-based MCMC approach (Williams et al. 2022), and a birth-death MCMC approach introduced by Stadler (Stadler 2009). ...
... We did not compare to the performance of their Approximate Bayesian computation (ABC)-based estimates, but note that the authors show a strong correlation between estimates from Phylofit and the ABC-based method (correlation coefficient r = 0.96) (Williams et al. 2022). We also compared our methods to another MCMC approach based on the birth-death model using the likelihood given in Equation (5) by Stadler (Stadler 2009). Whereas Phylofit is based on Kingman's coalescent assuming logistic population growth, the method based on Stadler's work models the population as a birth-death process, and assumes each individual is sampled with some fixed probability ρ. ...
Article
Motivation: While evolutionary approaches to medicine show promise, measuring evolution itself is difficult due to experimental constraints and the dynamic nature of body systems. In cancer evolution, continuous observation of clonal architecture is impossible, and longitudinal samples from multiple timepoints are rare. Increasingly available DNA sequencing datasets at single-cell resolution enable the reconstruction of past evolution using mutational history, allowing for a better understanding of dynamics prior to detectable disease. There is an unmet need for an accurate, fast, and easy-to-use method to quantify clone growth dynamics from these datasets. Results: We derived methods based on coalescent theory for estimating the net growth rate of clones using either reconstructed phylogenies or the number of shared mutations. We applied and validated our analytical methods for estimating the net growth rate of clones, eliminating the need for complex simulations used in previous methods. When applied to hematopoietic data, we show that our estimates may have broad applications to improve mechanistic understanding and prognostic ability. Compared to clones with a single or unknown driver mutation, clones with multiple drivers have significantly increased growth rates (median 0.94 vs. 0.25 per year; p = 1.6 × 10^-6). Further, stratifying patients with a myeloproliferative neoplasm (MPN) by the growth rate of their fittest clone shows that higher growth rates are associated with shorter time to MPN diagnosis (median 13.9 vs. 26.4 months; p = 0.0026). Availability and Implementation: We developed a publicly available R package, cloneRate, to implement our methods (Package website: https://bdj34.github.io/cloneRate/). Source code: https://github.com/bdj34/cloneRate/. Supplementary information: Supplementary material is available at Bioinformatics online.
... Chang et al. 2020) a stochastic polytomy resolver. In brief, this program uses taxonomic information to add unsampled taxa as polytomies and resolves them using birth and death rates of the birth-death estimator (Stadler 2009). The program requires a taxonomic list designating all species (sampled and unsampled) to ranks (e.g., family, genus), and a time-calibrated phylogeny to serve as a backbone. ...
... It uses the taxonomic list to determine speciation events of all the ranks in the taxonomic list on the time-calibrated tree (i.e., wait times). These wait times are used to calculate maximum likelihood estimates of birth and death rates from the birth-death sampling equation (Stadler 2009). It then adds unsampled taxa to the appropriate clade and re-estimates waiting times from the birth-death distribution (Stadler 2009) using the birth and death rates that were initially estimated. ...
... TACT and methods like it are naïve to trait evolution. ...
Article
Embioptera display variability in egg-handling as part of their defense against natural enemies. Because species living in tropical regions experience potentially higher risks of predation than those in temperate climes, we hypothesized that variable risk might explain this variability. We used actual evapotranspiration (AET) rates as a stand-in for climate, region, and potential interactions with natural enemies. We predicted that more complex investments, such as coating individual eggs, organizing them, and topping the cluster with thick silk would co-occur with greater predation threats in tropical regions, scored as higher AET. We predicted that simpler organization of eggs would occur where predator risk would be lower, as in temperate regions (lower AET). We used phylogenetic comparative methods to assess whether more complex egg handling behavior correlated with high AET scores. We quantified five traits of egg handling from field and laboratory evidence for 29 species from habitats ranging from low to high AET. Initial pGLS and pGLM analyses showed a weak effect of AET on parental care index. Upon exclusion of three exotic species spread artificially by trade and collected outside their native ranges, we found strong effects of predation threat in both pGLS and pGLM analyses. These analyses revealed that species that experience potentially greater predation threats exhibited behaviors that corresponded to more complex handling and organization of eggs by the mother. These results align nicely with analyses that also detected that additional lines of defense of eggs typify the behavior of tropical species of other primitively social arthropods.
... BD models have been utilized in thousands of published studies (11–13), despite possessing known and somewhat troubling limitations. Stadler (14) showed there exist different birth-death models that have the same likelihood in terms of observable data. In statistical terms, this implies that the BD model is unidentifiable without further assumptions. ...
... Extant timetrees are assumed to be stochastically generated by a BD process (4,14). This process has three parameters: two positive rate functions λ : ℝ_{≥0} → ℝ_{>0} and μ : ℝ_{≥0} → ℝ_{>0} and an initial sampling fraction ρ ∈ (0, 1]. ...
... In the definition and in what follows, we assume that the sampling fraction ρ ∈ (0, 1] is a fixed, known parameter. This is necessary because if ρ is allowed to vary, then as noted in the Introduction, Stadler (14) has shown that even the constant-rates BD model is unidentifiable. ...
Article
In a striking result, Louca and Pennell [S. Louca, M. W. Pennell, Nature 580, 502–505 (2020)] recently proved that a large class of phylogenetic birth–death models is statistically unidentifiable from lineage-through-time (LTT) data: Any pair of sufficiently smooth birth and death rate functions is “congruent” to an infinite collection of other rate functions, all of which have the same likelihood for any LTT vector of any dimension. As Louca and Pennell argue, this fact has distressing implications for the thousands of studies that have utilized birth–death models to study evolution. In this paper, we qualify their finding by proving that an alternative and widely used class of birth–death models is indeed identifiable. Specifically, we show that piecewise constant birth–death models can, in principle, be consistently estimated and distinguished from one another, given a sufficiently large extant timetree and some knowledge of the present-day population. Subject to mild regularity conditions, we further show that any unidentifiable birth–death model class can be arbitrarily closely approximated by a class of identifiable models. The sampling requirements needed for our results to hold are explicit and are expected to be satisfied in many contexts such as the phylodynamic analysis of a global pandemic.
... Thus, this system has been mapped onto a Hidden Markov Model (HMM) where the processes of genetic drift and measurement noise determine the transition and emission probabilities, respectively [25,26]. Methods often assume uniform sampling of infected individuals from the population [22,23,27], but this assumption does not usually hold outside of surveillance studies. A recent study accounted for overdispersed sampling of sequences in the inference of fitness coefficients of SARS-CoV-2 variants, but assumes constant overdispersion over time [28]; in reality, the observation process may change over time due to changes in testing intensity between locations and subpopulations. ...
... We observe that measurement noise of SARS-CoV-2 is mostly indistinguishable from uniform sampling, but data from some variants at some times do exhibit more elevated measurement noise than uniform sampling. Thus, we expect that assuming uniform sampling, as many methods do, or constant overdispersion will lead to accurate estimates for this dataset [22,23,27,28]. The number of SARS-CoV-2 sequences from England is extremely high and sampling biases are expected to be low, because of efforts to reduce sampling biases by sampling somewhat uniformly from the population through the COVID-19 Infection Survey [36] (from which a subset of positives are sequenced and included in the COG-UK surveillance sequencing data that we use). ...
Article
Genetic drift in infectious disease transmission results from randomness of transmission and host recovery or death. The strength of genetic drift for SARS-CoV-2 transmission is expected to be high due to high levels of superspreading, and this is expected to substantially impact disease epidemiology and evolution. However, we don’t yet have an understanding of how genetic drift changes over time or across locations. Furthermore, noise that results from data collection can potentially confound estimates of genetic drift. To address this challenge, we develop and validate a method to jointly infer genetic drift and measurement noise from time-series lineage frequency data. Our method is highly scalable to increasingly large genomic datasets, which overcomes a limitation in commonly used phylogenetic methods. We apply this method to over 490,000 SARS-CoV-2 genomic sequences from England collected between March 2020 and December 2021 by the COVID-19 Genomics UK (COG-UK) consortium and separately infer the strength of genetic drift for pre-B.1.177, B.1.177, Alpha, and Delta. We find that even after correcting for measurement noise, the strength of genetic drift is consistently, throughout time, higher than that expected from the observed number of COVID-19 positive individuals in England by 1 to 3 orders of magnitude, which cannot be explained by literature values of superspreading. Our estimates of genetic drift suggest low and time-varying establishment probabilities for new mutations, inform the parametrization of SARS-CoV-2 evolutionary models, and motivate future studies of the potential mechanisms for increased stochasticity in this system.
... Modern methods of reconstructing phylogenetic trees depend rather critically on an understanding of simple, neutral stochastic models of the interaction between mutation and speciation which produce 'prior' probability distributions for use in Bayesian-type analyses (Felsenstein 2004;Drummond and Rambaut 2007;Mulder and Crawford 2015). The most widely used of these models for speciation and extinction is the birth-death process (Kendall 1948;Nee et al. 1994;Rannala and Yang 1996;Nee 2006;Gernhard 2008;Gernhard et al. 2008;Stadler 2009), which simplifies to the birth-only or Yule process in the absence of extinction events (Yule 1924). According to this latter model, extant species have a constant chance of diverging into two branches in unit time. ...
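The birth-only model described in this excerpt can be illustrated with a short simulation. This is a minimal sketch, not code from any of the cited works: it assumes each lineage splits independently at a constant rate, so with n lineages alive the waiting time to the next split is exponential with rate n times the birth rate. The function name and parameter values are hypothetical.

```python
import random


def simulate_yule(birth_rate, t_max, rng):
    """Return the sorted branching times of a Yule process on [0, t_max].

    The process starts from a single lineage at time 0. With n lineages
    alive, the next split occurs after an Exponential(n * birth_rate) wait.
    """
    t = 0.0
    n = 1
    times = []
    while True:
        t += rng.expovariate(n * birth_rate)  # waiting time to the next split
        if t > t_max:
            return times
        times.append(t)
        n += 1  # one lineage is replaced by two


rng = random.Random(1)  # fixed seed so the run is repeatable
times = simulate_yule(birth_rate=1.0, t_max=3.0, rng=rng)
print(len(times) + 1, "lineages alive at time 3.0")
```

Adding a per-lineage death rate to the same loop would turn this into the full birth-death process that the excerpt contrasts with the Yule case.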
... A similar problem was discussed by Stadler (2009). However, in that study an improper uniform prior for the time of origin of a tree was assumed, leading to a posterior distribution for the age of a clade of known size (Gernhard 2008) which differs from that used in the present analysis, which is based on what may be termed a 'Kendall prior' for this distribution, to be derived in the next section. ...
Article
In this contribution, a general expression is derived for the probability density of the time to the most recent common ancestor (TMRCA) of a simple birth-death tree, a widely used stochastic null-model of biological speciation and extinction, conditioned on the constant birth and death rates and number of extant lineages. This density is contrasted with a previous result which was obtained using a uniform prior for the time of origin. The new distribution is applied to two problems of phylogenetic interest. First, that of the probability of the number of taxa existing at any time in the past in a tree of a known number of extant species, and given birth and death rates, and second, that of determining the TMRCA of two randomly selected taxa in an unobserved tree that is produced by a simple birth-only, or Yule, process. In the latter case, it is assumed that only the rate of bifurcation (speciation) and the size, or number of tips, are known. This is shown to lead to a closed-form analytical expression for the probability distribution of this parameter, which is arrived at based on the known mathematical form of the age distribution of Yule trees of a given size and branching rate, which is derived here de novo, and a similar distribution which additionally is conditioned on tree age. The new distribution is the exact Yule prior for divergence times of pairs of taxa under the stated conditions and is potentially useful in statistical (Bayesian) inference studies of phylogenies.
... Typically, phylogenetic networks do not include all extant taxa or they lack fossil specimens that can provide information about extinct lineages. We can model incomplete lineage sampling by pruning away unsampled lineages (Figure 4), leaving what is often referred to as the reconstructed or sampled phylogenetic network (Gernhard, 2008;Stadler, 2009). Indeed, it is necessary to account for incomplete sampling as it affects expected branch-length distributions (Nee et al., 1994;Stadler, 2008). ...
... Since each type of hybridization necessitates a different number of speciation events to explain the same number of lineages, over-attributing a specific type of hybridization likely would lead to bias in diversification-rate estimates. Additionally, both sampling only extant taxa (Nee et al., 1994) and incomplete sampling (Stadler, 2008, 2009) are known to change our expectations about the birth-death process and resulting distributions of bifurcating trees. However, it is not well characterized how these processes change our expectations of the birth-death-hybridization process. ...
Article
Gene flow is increasingly recognized as an important macroevolutionary process. The many mechanisms that contribute to gene flow (e.g. introgression, hybridization, lateral gene transfer) uniquely affect the diversification of dynamics of species, making it important to be able to account for these idiosyncrasies when constructing phylogenetic models. Existing phylogenetic‐network simulators for macroevolution are limited in the ways they model gene flow. We present SiPhyNetwork, an R package for simulating phylogenetic networks under a birth–death‐hybridization process. Our package unifies the existing birth–death‐hybridization models while also extending the toolkit for modelling gene flow. This tool can create patterns of reticulation such as hybridization, lateral gene transfer, and introgression. Specifically, we model different reticulate events by allowing events to either add, remove or keep constant the number of lineages. Additionally, we allow reticulation events to be trait dependent, creating the ability to model the expanse of isolating mechanisms that prevent gene flow. This tool makes it possible for researchers to model many of the complex biological factors associated with gene flow in a phylogenetic context.
... Since the phylogenetic sample size is given, n, the Yule process should be conditioned on having n tips: such conditioned branching processes have received significant attention in recent years, due to e.g. Aldous and Popovic [2005], Gernhard [2008], Mooers et al. [2012], Stadler [2009, 2011]. This "tree-free" approach for comparative phylogenetics was previously addressed by Sagitov and Bartoszek [2012] and Crawford and Suchard [2013], [much earlier Edwards, 1970, used a related branching Brownian process as a population genetics model]. ...
... see Fig 1, left panel (all simulations are produced using the TreeSim [Stadler, 2009, 2011] and mvSLOUCH [Bartoszek et al., 2012] R packages). It follows that the normalized sample variance ...
Preprint
We consider a branching particle system where particles reproduce according to the pure birth Yule process with the birth rate L, conditioned on the observed number of particles being equal to n. Particles are assumed to move independently on the real line according to Brownian motion with local variance s^2. In this paper we treat the n particles as a sample of related species. The spatial Brownian motion of a particle describes the development of a trait value of interest (e.g. log-body-size). We propose an unbiased estimator R_n^2 of the evolutionary rate r^2 = s^2/L. The estimator R_n^2 is proportional to the sample variance S_n^2 computed from n trait values. We find an approximate formula for the standard error of R_n^2 based on a neat asymptotic relation for the variance of S_n^2.
... Typically, phylogenetic networks do not include all extant taxa or they lack fossil specimens that can provide information about extinct lineages. We can model incomplete lineage sampling by pruning away unsampled lineages (Figure 4), leaving what is often referred to as the reconstructed or sampled phylogenetic network (Gernhard, 2008; Stadler, 2009). Indeed, it is necessary to account for incomplete sampling as it affects expected branch-length distributions (Nee et al., 1994; Stadler, 2008). ...
... Since each type of hybridization necessitates a different number of speciation events to explain the same number of lineages, over-attributing a specific type of hybridization likely would lead to bias in diversification estimates. Additionally, both sampling only extant taxa (Nee et al., 1994) and incomplete sampling (Stadler, 2008, 2009) change our expectations about the birth-death process. However, it is not well characterized how these processes change our expectations of the birth-death-hybridization process. ...
Preprint
Gene flow is increasingly recognized as an important macroevolutionary process. The many mechanisms that contribute to gene flow (e.g., introgression, hybridization, lateral gene transfer) uniquely affect the diversification of dynamics of species, making it important to be able to account for these idiosyncrasies when constructing phylogenetic models. Existing phylogenetic-network simulators for macroevolution are limited in the ways they model gene flow. We present SiPhyNetwork, an R package for simulating phylogenetic networks under a birth-death-hybridization process. Our package unifies the existing birth-death-hybridization models while also extending the toolkit for modeling gene flow. This tool can create patterns of reticulation such as hybridization, lateral gene transfer, and introgression. Specifically, we model different reticulate events by allowing events to either add, remove, or keep constant the number of lineages. Additionally, we allow reticulation events to be trait-dependent, creating the ability to model the expanse of isolating mechanisms that prevent gene flow. This tool makes it possible for researchers to model many of the complex biological factors associated with gene flow in a phylogenetic context.
... . [8] Becoming infectious changes the corresponding individual's state but does not affect the total number of infected individuals, transmissions increase the total number, and removal decreases it. Note that only individuals in state I can transmit or be removed: ...
... Combining [8] and [9] we rewrite [7] as a system of multivariate algebraic equations: ...
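The state changes described in the first of these two excerpts (becoming infectious keeps the total number of infected individuals fixed, transmission raises it, removal lowers it, and only infectious individuals transmit or are removed) can be written as a small bookkeeping sketch. This is not the cited implementation; the `State` type, the event names, and the assumption that a newly infected individual starts in the exposed state are all introduced here for illustration.

```python
from typing import NamedTuple


class State(NamedTuple):
    exposed: int      # infected but not yet infectious (E)
    infectious: int   # infectious (I)

    @property
    def total(self) -> int:
        return self.exposed + self.infectious


def apply_event(state: State, event: str) -> State:
    """Apply one event (hypothetical event names) to the (E, I) counts."""
    if event == "become_infectious":
        # E -> I: the individual changes state, the total is unchanged.
        if state.exposed < 1:
            raise ValueError("no exposed individual available")
        return State(state.exposed - 1, state.infectious + 1)
    if event in ("transmission", "removal"):
        # Only individuals in state I can transmit or be removed.
        if state.infectious < 1:
            raise ValueError("no infectious individual available")
        if event == "transmission":
            # Assumption: the newly infected individual starts in state E.
            return State(state.exposed + 1, state.infectious)
        return State(state.exposed, state.infectious - 1)
    raise ValueError(f"unknown event: {event}")


start = State(exposed=2, infectious=1)
for name in ("become_infectious", "transmission", "removal"):
    after = apply_event(start, name)
    print(name, after, "total change:", after.total - start.total)
```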
Preprint
Multi-type birth-death (MTBD) models are phylodynamic analogies of compartmental models in classical epidemiology. They serve to infer such epidemiological parameters as the average number of secondary infections Re and the infectious time from a phylogenetic tree (a genealogy of pathogen sequences). The representatives of this model family focus on various aspects of pathogen epidemics. For instance, the birth-death exposed-infectious (BDEI) model describes the transmission of pathogens featuring an incubation period (when there is a delay between the moment of infection and becoming infectious, as for Ebola and SARS-CoV-2), and permits its estimation along with other parameters. With constantly growing sequencing data, MTBD models should be extremely useful for unravelling information on pathogen epidemics. However, existing implementations of these models in a phylodynamic framework have not yet caught up with the sequencing speed. Computing time and numerical instability issues limit their applicability to medium data sets (≤ 500 samples), while the accuracy of estimations should increase with more data. We propose a new highly parallelizable formulation of ordinary differential equations for MTBD models. We also extend them to forests to represent situations when a (sub-)epidemic started from several cases (e.g. multiple introductions to a country). We implemented it for the BDEI model in a maximum likelihood framework using a combination of numerical analysis methods for efficient equation resolution. Our implementation estimates epidemiological parameter values and their confidence intervals in four minutes on a phylogenetic tree of 10 000 samples. Comparison to the existing implementations on simulated data shows that it is not only 30 000 times faster, but also more accurate. An application of our tool to the 2014 Ebola epidemic in Sierra Leone is also convincing, with very fast calculation and precise estimates.
Phylodynamics, Epidemiology, Mathematical modelling, Ordinary Differential Equations, Birth-Death models, Ebola
... In line with this need for a more appropriate model, Nee et al. (1994) derived the probability distribution of a tree from which the extinct lineages have been removed (called a reconstructed tree) under the birth-death model of Kendall (1948). Later this work was extended to account for incomplete taxon sampling (Stadler, 2009). ...
... Stadler (2009) derived the probability distribution of a reconstructed tree with incomplete (and random) taxon sampling under the constant BD model. Her model assumes that each extant species is ...
Article
The field of phylogenetics has burgeoned into a great diversity of statistical models, providing researchers with a vast number of analytical tools for investigating evolutionary theory. This abundance of theoretical work has the merit that many different aspects of evolution can be investigated using various types of data. However, empiricists may sometimes struggle to find the right model for their needs amid such variety. In particular, some computer programs gather the theory of many different models, published in hundreds of different papers, within the same operational framework. This makes it particularly difficult for users to obtain comprehensive information about the assumptions and structure of various models. Yet, a large part of phylogenetic models are structured in individual modules that can be linked together in the same conceptual framework, akin to some sort of phylogenetic supermodel. In this paper, we propose to browse through the network of phylogenetic models, emphasizing their modular structure, with the purpose of outlining the commonalities and differences of individual models. Focusing on probabilistic models, we describe how to go from the model assumptions to the corresponding probability distributions as pedagogically as possible. To achieve this task, we rely heavily on graph theory to represent the probabilistic relationships among parameters and data, and present the models in their most elementary form (i.e. including parameters that are generally marginalized out), which simplifies the mathematics considerably. We concentrate on models designed for species trees, but evoke the link with other types of trees (e.g. gene trees).
... The BI analyses were compiled through BEAST 1.8.2 (Bayesian Evolutionary Analysis Sampling Tree; Drummond et al. 2012). A birth-death incomplete sampling speciation model (Stadler 2009) was selected. Four independent Markov chain Monte Carlo (MCMC) chains of 20 million generations each were run. ...
Article
Agaricus is a genus with more than 500 species. Most of the new species reported since 2000 are tropical or subtropical. The study area, the Malakand region, located in the north of Pakistan, has a subtropical climate. In this study, nine species, including three new species, of Agaricus subgenus Pseudochitonia, are reported from this region. Descriptions of the new species are based on morphological characteristics and phylogenetic analyses using three DNA regions: nuc ribosomal DNA internal transcribed spacers (ITS), fragments of the large subunit of nuc ribosomal DNA (28S), and the translation elongation factor 1-alpha gene (TEF1). One new species, Agaricus lanosus, with wooly squamules on its cap, forms a lineage within Agaricus sect. Bivelares and cannot be classified with certainty in one of the two subsections (Cupressorum and Hortenses) of this section. Agaricus rhizoideus, with a rhizoid-like structure at the base of the stipe, forms a basal clade in Agaricus sect. Hondenses. Specimens of the third new species, Agaricus malakandensis, form a species-level clade within Agaricus sect. Catenulati and exhibit the morphological characteristics of this section. Due to their similar ITS sequences, two previously unnamed specimens from Thailand (A. sp. LD2012162 and CA799) are considered conspecific with A. malakandensis.
... Instead, we take advantage of the branching structure of a phylogeny T generated under a birth-death-sampling process [71,72]. The likelihood that the phylogeny evolved as observed under this model is determined by the rates at which lineages are born via transmission, die or are removed from the infectious population, and are sampled. ...
Preprint
Full-text available
Antimicrobial resistant pathogens such as Escherichia coli sequence type 131 (ST131) pose a serious threat to public health globally. In the United States, ST131 acquired multiple antimicrobial resistance (AMR) genes and rapidly grew to its current high prevalence in healthcare settings. Notably, this coincided with the introduction and widespread use of antibiotics such as fluoroquinolones, suggesting AMR as the major driver of ST131’s expansion. Yet, within ST131, there remains considerable diversity between strains in resistance profiles and their repertoires of virulence factors, stress factors, plasmids, and other accessory elements. Understanding which genomic features contribute to ST131’s competitive advantage and their relative effects on population-level fitness therefore poses a considerable challenge. Here we use phylodynamic birth-death models to estimate the relative fitness of different ST131 lineages from bacterial phylogenies. By extending these phylodynamic methods to allow multiple genomic features to shape bacterial fitness, we further quantify the relative contribution of individual AMR genes to ST131’s fitness. Our analysis indicates that while many genomic elements, including various AMR genes, virulence factors, and plasmids, have all contributed substantially to ST131’s rapid growth, major increases in ST131’s fitness are largely attributable to mutations in gyrase A that confer resistance to fluoroquinolones. Author summary ST131 is a pandemic lineage of E. coli that has spread globally and is now responsible for a large percentage of blood and urinary tract infections that cannot be treated with many common antibiotics. While antibiotic resistance has undoubtedly given ST131 a competitive edge, the relative importance of resistance compared with other factors shaping a pathogen’s growth or transmission potential (i.e. fitness) is often difficult to measure in natural settings. 
Here, we present a method that allows us to look at the entire spectrum of factors determining a pathogen’s fitness and estimate the individual contribution of each component to a pathogen’s overall fitness. Our results suggest that resistance to fluoroquinolones, a widely used class of antibiotics, provides ST131 with a disproportionately large fitness advantage relative to many other factors with more moderate fitness effects. Understanding what determines the fitness of ST131 therefore provides insights that can be used to curb the spread of resistance and monitor for emerging lineages with high pandemic potential due to shared fitness-enhancing attributes.
... Maximum Likelihood (ML) phylogeny was performed with RAxML-HPC Black-Box, implemented on CIPRES Science Gateway (Miller et al. 2010; Stamatakis ... Table 1. Taxa included in the molecular phylogenetic analyses. ... Death Incomplete Sampling speciation model (Stadler 2009) was selected. Four independent runs were performed with the BEAST on XSEDE tool on the CIPRES Science Gateway (Miller et al. 2010). ...
Article
Full-text available
Hymenagaricus comprises small to medium-sized mushrooms with a cap surface bearing squamulose pellicles, which consist of hymeniform or pseudoparenchymatous cells, and with yellowish-brown basidiospores. The species of Hymenagaricus are very similar to those of Xanthagaricus, and it is extremely difficult to differentiate the species of the two genera in the field. Phylogenetically, however, the two genera are clearly distinct. In this study, we describe two new species of Hymenagaricus, i.e. H. wadijarzeezicus and H. parvulus, from the southern part of Oman. Species descriptions are based on a combination of morphological characteristics of basidiomata and phylogenetic analyses of three gene regions: internal transcribed spacer (ITS1-5.8S-ITS2 = ITS), the large subunit of nuclear ribosomal DNA (28S) and translation elongation factor one alpha (EF-1α). Full descriptions, micrographs and illustrations of anatomical features, basidiomata photos and phylogenetic analysis results of the new taxa are provided. Morphological comparisons of the new taxa with similar species and a key to the species included in the phylogenetic analyses are also provided.
... Because clade membership of unsequenced taxa is not certain (see Results), we assigned different sampling probabilities for Neo-Astragalus (0.4472) and the Eurasian clades (0.2041), as this was the lowest phylogenetic level at which applying taxonomic species count data for previously unsequenced species was feasible. Because outgroup sampling practices violate sampling assumptions of many birth-death models (Stadler, 2009; Spasojevic et al., 2021) and models that can incorporate imbalanced sampling still have poor performance under sparse sampling of higher-level taxa (Sun et al., 2020a), diversification estimation was limited to the ingroup, the Astragalus s.s. Markov chain Monte Carlo (MCMC) was run in four chains for 64.4 million generations (stopped based on ESS > 400 and strong convergence), δT = 0.01, swap period = 1000 generations, and sampling every 100,000th generation; further parameters followed BAMM documentation (http://bamm-project.org/). ...
Article
Full-text available
Premise: Astragalus (Fabaceae), with more than 3000 species, represents a globally successful radiation of morphologically highly similar species predominant across the northern hemisphere. It has attracted attention from systematists and biogeographers, who have asked what factors might be behind the extraordinary diversity of this important arid‐adapted clade and what sets it apart from close relatives with far less species richness. Methods: Here, for the first time using extensive phylogenetic sampling, we asked whether (1) Astragalus is uniquely characterized by bursts of radiation or whether diversification instead is uniform and no different from closely related taxa. Then we tested whether the species diversity of Astragalus is attributable specifically to its predilection for (2) cold and arid habitats, (3) particular soils, or to (4) chromosome evolution. Finally, we tested (5) whether Astragalus originated in central Asia as proposed and (6) whether niche evolutionary shifts were subsequently associated with the colonization of other continents. Results: Our results point to the importance of heterogeneity in the diversification of Astragalus, with upshifts associated with the earliest divergences but not strongly tied to any abiotic factor or biogeographic regionalization tested here. The only potential correlate with diversification we identified was chromosome number. Biogeographic shifts have a strong association with the abiotic environment and highlight the importance of central Asia as a biogeographic gateway. Conclusions: Our investigation shows the importance of phylogenetic and evolutionary studies of logistically challenging “mega‐radiations.” Our findings reject any simple key innovation behind high diversity and underline the often nuanced, multifactorial processes leading to species‐rich clades.
... This leads to a statistically coherent prior on divergence times, where the variance associated with node ages reflects the incompleteness of the fossil record, as well as uncertainty associated with the placement of fossil taxa in the phylogeny. Bayesian inference using the FBD process as a tree prior also allows a reliable estimation of the diversification and sampling parameters, although at least one of the parameters is required to be fixed by the user for the model to be identifiable (Stadler 2009; Gavryushkina et al. 2014). In most empirical applications, the extant sampling proportion is the best known and is thus chosen to be fixed. ...
Article
Full-text available
The fossilized birth-death (FBD) process provides an ideal model for inferring phylogenies from both extant and fossil taxa. Using this approach, fossils are directly integrated into the tree, leading to a statistically coherent prior on divergence times. Since fossils are typically not associated with molecular sequences, additional information is required to place fossils in the tree. We use simulations to evaluate two different approaches to handling fossil placement in FBD analyses: using topological constraints, where the user specifies monophyletic clades based on established taxonomy, or using total-evidence analyses, which use a morphological data matrix in addition to the molecular alignment. We also explore how rate variation in fossil recovery or diversification rates impacts these approaches. We find that the extant topology is well recovered under all methods of fossil placement. Divergence times are similarly well recovered across all methods, with the exception of constraints which contain errors. We see similar patterns in datasets which include rate variation; however, relative errors in extant divergence times increase when more variation is included in the dataset, for all approaches using topological constraints, and particularly for constraints with errors. Finally, we show that trees recovered under the FBD model are more accurate than those estimated using non-time calibrated inference. Overall, we show that both fossil placement approaches are reliable even when including uncertainty. Our results underscore the importance of core taxonomic research, including morphological data collection and species descriptions, irrespective of the approach to handling phylogenetic uncertainty using the FBD process.
... To evaluate the sensitivity of the results to the choice of tree prior, we conducted analyses using both a Yule speciation model and a birth-death model for the tree prior (Stadler, 2009). These two models were compared using marginal likelihoods, calculated with the stepping-stone estimator (Xie et al., 2011), and the Bayes factor was interpreted according to the guidelines of Kass and Raftery (1995). ...
Preprint
Full-text available
The crustacean order Stomatopoda comprises approximately 500 species of mantis shrimps. These marine predators, common in tropical and subtropical waters, possess sophisticated visual systems and specialized hunting appendages. In this study, we infer the evolutionary relationships within Stomatopoda using a combined data set of 77 morphological characters, complete mitochondrial genomes, and three nuclear markers. Our data set includes representatives from all seven stomatopod superfamilies, including the first sequence data from Erythrosquilloidea. Using a Bayesian relaxed molecular clock with fossil-based calibration priors, we estimate that crown-group unipeltatan stomatopods appeared ~140 (95% credible interval 201-102) million years ago in the late Mesozoic. Additionally, our results support the hypothesis that specialized smashing and spearing appendages appeared early in the evolutionary history of Unipeltata. We found no evidence of a correlation between rates of morphological and molecular evolution across the phylogeny, but identified very high levels of among-lineage rate variation in the morphological characters. Our total-evidence analysis recovered evolutionary signals from both molecular and morphological data sets, demonstrating the merit in combining these sources of information for phylogenetic inference and evolutionary analysis.
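The model comparison mentioned in the excerpt above (two tree priors compared through their marginal likelihoods, with the Bayes factor read against the Kass and Raftery scale) can be sketched in a few lines. This is only an illustration: the function names and the log marginal likelihood values are hypothetical, and the thresholds are the commonly quoted 2 ln(BF) cut-offs from Kass and Raftery (1995), not values taken from the cited study.

```python
def two_ln_bayes_factor(log_ml_model1, log_ml_model0):
    """Return 2 * ln(BF_10) from two log marginal likelihoods.

    The inputs are assumed to be natural-log marginal likelihoods,
    e.g. as reported by a stepping-stone estimator.
    """
    return 2.0 * (log_ml_model1 - log_ml_model0)


def kass_raftery_category(two_ln_bf):
    """Map 2 * ln(BF_10) to the verbal scale of Kass and Raftery (1995)."""
    if two_ln_bf < 0:
        return "favours model 0 (swap the models to grade the evidence)"
    if two_ln_bf < 2:
        return "not worth more than a bare mention"
    if two_ln_bf < 6:
        return "positive"
    if two_ln_bf <= 10:
        return "strong"
    return "very strong"


if __name__ == "__main__":
    # Hypothetical log marginal likelihoods for a birth-death and a Yule prior.
    log_ml_birth_death = -1000.0
    log_ml_yule = -1004.0
    stat = two_ln_bayes_factor(log_ml_birth_death, log_ml_yule)
    print(stat, kass_raftery_category(stat))
```

Working on the log scale avoids underflow, since marginal likelihoods of phylogenetic models are typically far too small to exponentiate directly.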
... In order to provide an approximate time frame for the diversification of the tribe, divergence times were estimated within a Bayesian framework using the software BEAST v.2.6.7 (Bouckaert et al. 2019), employing an uncorrelated lognormal relaxed clock model (Drummond et al. 2006), and a birth-death speciation tree prior (Stadler 2009). Two distinct clock models were implemented, one for nuclear loci and one for mitochondrial loci, whereas the best-fitting partitioning scheme and substitution model were estimated with the package bModelTest (Bouckaert and Drummond 2017) implemented in BEAST. ...
Article
Full-text available
Although the complex evolutionary history of lichen-forming fungi has gained considerable attention, particularly regarding the long-debated role of these organisms in shaping early terrestrial ecosystems, the evolution of lichenivory and its potential impact on the diversification of lichenophages have been largely neglected. With > 800 described species worldwide and a broad geographical distribution, the tribe Helopini (Coleoptera: Tenebrionidae) represents a diverse, yet poorly studied, group of predominantly lichenophagous beetles. Using a dataset of 52 ingroup taxa and five gene fragments, a first phylogenetic hypothesis of the tribe was generated, which was subsequently used for reconstructing the ancestral state of the trophic and habitat associations of the beetles and for estimating a time frame of diversification. Our phylogenetic reconstruction sheds light on the higher-level systematics of the tribe, supporting the current subtribal division of the group while also providing a framework for understanding the intergeneric relationships within subtribes. The results also indicate an Early Cretaceous origin of the tribe, highlighting the close association between Helopini and lichen-forming fungi since the emergence of the group. Nevertheless, at least seven independent switches from lichenophagy to alternative feeding habits have occurred since the middle Eocene, which can be linked temporally to transitions from forests to open habitats.
... Any existing estimates of microbial diversification are derived from phylogenetic data (Louca et al., 2018;Scholl & Wiens, 2016). Due to the nearly nonexistent microbial fossil record, these phylogenies are constructed solely from molecular data, which may lead to incorrect rate estimation when diversification rates vary among lineages (Rabosky, 2010;Stadler, 2009). These phylogenies can also be generated by highly dissimilar birth-death processes that have divergent speciation and extinction dynamics (Louca & Pennell, 2020). ...
Article
Full-text available
Biologists have long sought to quantify the number of species on Earth. Often missing from these efforts is the contribution of microorganisms, the smallest but most abundant form of life on the planet. Despite recent large‐scale sampling efforts, estimates of global microbial diversity span many orders of magnitude. It is important to consider how speciation and extinction over the last 4 billion years constrain inventories of biodiversity. We parameterized macroevolutionary models based on birth–death processes that assume constant and universal speciation and extinction rates. The models reveal that richness beyond 10¹² species is feasible and in agreement with empirical predictions. Additional simulations suggest that mass extinction events do not place hard limits on modern‐day microbial diversity. Together, our study provides independent support for a massive global‐scale microbiome while shedding light on the upper limits of life on Earth.
... Finally, our ClaDS package contains a sampling parameter ρ, which represents the probability for each extant species to be sampled at present, taken to be identical across species. All these parameters can be estimated by the inference; however, at least one of ρ, λ₀ and µ₀ or ϵ has to be fixed for the model to be identifiable (Stadler, 2009). In most macroevolution studies, we expect that ρ will be the most well-known parameter and thus the easiest to fix to the correct value. ...
Article
Full-text available
Bayesian phylogenetic inference requires a tree prior, which models the underlying diversification process which gives rise to the phylogeny. Existing birth-death diversification models include a wide range of features, for instance lineage-specific variations in speciation and extinction rates. While across-lineage variation in speciation and extinction rates is widespread in empirical datasets, few heterogeneous rate models have been implemented as tree priors for Bayesian phylogenetic inference. As a consequence, rate heterogeneity is typically ignored when reconstructing phylogenies, and rate heterogeneity is usually investigated on fixed trees. In this paper, we present a new BEAST2 package implementing the cladogenetic diversification rate shift (ClaDS) model as a tree prior. ClaDS is a birth-death diversification model designed to capture small progressive variations in birth and death rates along a phylogeny. Unlike previous implementations of ClaDS, which were designed to be used with fixed, user-chosen phylogenies, our package is implemented in the BEAST2 framework and thus allows full phylogenetic inference, where the phylogeny and model parameters are co-estimated from a molecular alignment. Our package provides all necessary components of the inference, including a new tree object and operators to propose moves to the MCMC. It also includes a graphical interface through BEAUti. We validate our implementation of the package by comparing the produced distributions to simulated data, and show an empirical example of the full inference, using a dataset of cetaceans.
... Two ways of sampling have been studied in depth: sampling each individual independently with probability p (Bernoulli sampling) or taking a sample of size n, uniformly from the population (uniform sampling). Stadler [40] considered a Bernoulli sample from a birth and death process, conditional on sampling n individuals. She gave an explicit formula for the joint density of the coalescent times when a uniform prior is assumed for the time from the origin. ...
Preprint
Consider a birth and death process started from one individual in which each individual gives birth at rate $\lambda$ and dies at rate $\mu$, so that the population size grows at rate $r = \lambda - \mu$. Lambert and Harris, Johnston, and Roberts came up with methods for constructing the exact genealogy of a sample of size $n$ taken from this population at time $T$. We use the construction of Lambert, which is based on the coalescent point process, to obtain asymptotic results for the site frequency spectrum associated with this sample. In the supercritical case $r > 0$, our results extend results of Durrett for exponentially growing populations. In the critical case $r = 0$, our results parallel those that Dahmer and Kersting obtained for Kingman's coalescent.
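The two sampling schemes contrasted in the excerpt above (Bernoulli sampling with probability p versus a uniform sample of fixed size n) can be illustrated on a simulated population. The following is a minimal sketch under stated assumptions, not the construction used in the cited works: it tracks only the population size of a linear birth and death process started from one individual (no genealogy), and all function names are hypothetical.

```python
import random


def simulate_population_size(lam, mu, t_end, rng):
    """Gillespie simulation of a linear birth and death process.

    Starts from one individual; each individual gives birth at rate lam
    and dies at rate mu. Returns the population size at time t_end.
    Assumes lam + mu > 0.
    """
    n, t = 1, 0.0
    while n > 0:
        # Waiting time to the next event among n individuals.
        t += rng.expovariate(n * (lam + mu))
        if t > t_end:
            break
        # The event is a birth with probability lam / (lam + mu).
        if rng.random() < lam / (lam + mu):
            n += 1
        else:
            n -= 1
    return n


def bernoulli_sample(population, p, rng):
    """Keep each individual independently with probability p."""
    return [ind for ind in population if rng.random() < p]


def uniform_sample(population, n, rng):
    """Draw n distinct individuals uniformly at random (n <= population size)."""
    return rng.sample(population, n)


if __name__ == "__main__":
    rng = random.Random(2024)
    size = simulate_population_size(lam=1.0, mu=0.5, t_end=5.0, rng=rng)
    population = list(range(size))
    print("population size:", size)
    print("Bernoulli sample size:", len(bernoulli_sample(population, 0.3, rng)))
    if size >= 3:
        print("uniform sample:", uniform_sample(population, 3, rng))
```

The design difference is visible in the return values: the Bernoulli sample has a random size (binomial given the population), whereas the uniform sample has a fixed size chosen in advance.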
... (Darriba et al., 2012) for the partitions: HKY+G (Hasegawa et al., 1985) for the plastid partitions (trnL-F, rps15-ycf1), GTR+I+G for ITS and GTR+G for ETS (Tavaré, 1986). The birth-death model with incomplete sampling (Stadler, 2009) was chosen because of incomplete species sampling. The uncorrelated relaxed clock model was used and unlinked between nucleotide and plastid sequences; the clock models were checked and affirmed in Tracer 1.7.1 based on the coefficients of variation (>0.1) (Drummond & Bouckaert, 2015). ...
Article
Full-text available
Aim: Recent investigations on the floristic exchange between Southeast Asia and Australia have shown a clear dispersal directionality bias (West to East) of wet‐adapted plant taxa. However, dispersal routes and directions of wet forest taxa into the South Pacific remain insufficiently known. We here aimed to establish the most likely routes and directions of plant dispersal into the Southwest Pacific islands. Location: Southeast Asia, East Asia, Australia, Southwest Pacific. Taxon: Dysoxylum s.l. (Meliaceae). This includes Dysoxylum s.s., Didymocheton, Epicharis, Goniocheton, Pseudocarapa, and Prasoxylon. Method: We sampled 75% of the species diversity in Dysoxylum s.l., covering the entire distribution range, all genera and major lineages. Phylogenetic relationships of 149 accessions were reconstructed using Bayesian Evolutionary Analysis and two internal constraints. The dispersal–extinction–cladogenesis variant, founder‐event speciation (DEC+J), was used for reconstructing the biogeographic history, and 100 Biogeographical Stochastic Mappings were simulated. Results: Dysoxylum s.l. originated and first diversified in the western part of its current distribution range (including Indochina) during the Miocene to Pliocene, followed by an overall eastern range expansion towards Malesia, Australia and the Southwest Pacific in the Pliocene. Main Conclusions: The south‐eastward expansion of lineages into Wallacea and Australia is in temporal agreement with the convergence of the Asian and Australian tectonic plates since the Miocene. Long‐distance dispersal is the main mechanism that led to the current distribution. Two dispersal pathways into the Southwest Pacific are identified, (1) through New Guinea and the Solomon Islands to Fiji, and (2) from New Zealand to Fiji. For both routes, Fiji was an important secondary source area for dispersal into the Southwest Pacific.
... After this procedure, 36 loci were selected for 76 tips (Table S6). These were concatenated into a single matrix (63 780 bp) and analyzed in BEAST v.1.10.4 using GTR+G substitution, the uncorrelated relaxed lognormal clock model (Drummond et al., 2006), and the birth-death model with incomplete sampling (Stadler, 2009). As the initial starting tree to inform the process, we used the IQ-TREE topology dated in TREEPL (Smith & O'Meara, 2012) with a uniform distribution of 109.821-120.5255 million yr ago (Ma) for the crown of Scrophulariaceae obtained from Ramírez-Barahona et al. (2020). ...
Article
Full-text available
The figwort family, Scrophulariaceae, comprises c. 2000 species whose evolutionary relationships at the tribal level have proven difficult to resolve, hindering our ability to understand their origin and diversification. We designed a specific probe kit for Scrophulariaceae, targeting 849 nuclear loci and obtaining plastid regions as by‐products. We sampled c. 87% of the genera described in the family and use the nuclear dataset to estimate evolutionary relationships, timing of diversification, and biogeographic patterns. Ten tribes, including two new tribes, Androyeae and Camptolomeae, are supported, and the phylogenetic positions of Androya, Camptoloma, and Phygelius are unveiled. Our study reveals a major diversification at c. 60 million yr ago in some Gondwanan landmasses, where two different lineages diversified, one of which gave rise to nearly 81% of extant species. A Southern African origin is estimated for most modern‐day tribes, with two exceptions, the American Leucophylleae, and the mainly Australian Myoporeae. The rapid mid‐Eocene diversification is aligned with geographic expansion within southern Africa in most tribes, followed by range expansion to tropical Africa and multiple dispersals out of Africa. Our robust phylogeny provides a framework for future studies aimed at understanding the role of macroevolutionary patterns and processes that generated Scrophulariaceae diversity.
... facchinii 2', S. facchinii sequences were labeled with the suffix '2'. Time-calibrated phylogenetic trees were calculated with STARBEAST2 v0.15.5 (Ogilvie et al., 2017) as implemented in BEAST v2.6.3 (Bouckaert et al., 2019), using an uncorrelated lognormal relaxed molecular clock (Drummond et al., 2006) with a birth-death speciation process (Gernhard, 2008; Stadler, 2009), and run for a total of 450 million generations. Every 5000th generation was sampled. ...
Article
Full-text available
Saxifraga section Saxifraga subsection Arachnoideae is a lineage of 12 species distributed mainly in the European Alps. It is unusual in terms of ecological diversification by containing both high elevation species from exposed alpine habitats and low elevation species from shady habitats such as overhanging rocks and cave entrances. Our aims are to explore which of these habitat types is ancestral, and to identify the possible drivers of this remarkable ecological diversification. Using a Hybseq DNA‐sequencing approach and a complete species sample we reconstructed and dated the phylogeny of subsection Arachnoideae. Using Landolt indicator values, this phylogenetic tree was used for the reconstruction of the evolution of temperature, light and soil pH requirements in this lineage. Diversification of subsection Arachnoideae started in the late Pliocene and continued through the Pleistocene. Both diversification among and within clades was largely allopatric, and species from shady habitats with low light requirements are distributed in well‐known refugia. We hypothesize that low light requirements evolved when species persisting in cold‐stage refugia were forced into marginal habitats by more competitive warm‐stage vegetation. While we do not claim that such competition resulted in speciation, it very likely resulted in adaptive evolution. Saxifraga sect. Saxifraga subsect. Arachnoideae is unusual in terms of ecological diversification by containing both high elevation species from exposed alpine habitats and low elevation species from shady habitats. Based on phylogenetic relationships, the geographical distribution of species in relation to refugial areas in the Alps and a reconstruction of ecological preferences we hypothesize that low light requirements evolved when species persisting in cold‐stage refugia were forced into marginal habitats by more competitive warm‐stage vegetation.
... Here we perform an extensive simulation study to investigate how well the new mvSLOUCH package is able to perform model selection. The tree is simulated as a pure birth tree using TreeSim::sim.bd.taxa() (Stadler, 2009, 2011) for a number of tips, n = 32, 64, 128, 256, 512, 1024, 2048. For comparison between all the simulations, all the phylogenies are rescaled to height 1. ...
Article
Full-text available
The advent of fast computational algorithms for phylogenetic comparative methods allows for considering multiple hypotheses concerning the co-adaptation of traits and also for studying if it is possible to distinguish between such models based on contemporary species measurements. Here we demonstrate how one can perform a study with multiple competing hypotheses using mvSLOUCH by analysing two data sets, one concerning feeding styles and oral morphology in ungulates, and the other concerning fruit evolution in Ferula (Apiaceae). We also perform simulations to determine if it is possible to distinguish between various adaptive hypotheses. We find that Akaike's information criterion corrected for small sample size has the ability to distinguish between most pairs of considered models. However, in some cases there seems to be bias towards Brownian motion or simpler Ornstein-Uhlenbeck models. We also find that measurement error and forcing the sign of the diagonal of the drift matrix for an Ornstein-Uhlenbeck process influences identifiability capabilities. It is a cliché that some models, despite being imperfect, are more useful than others. Nonetheless, having a much larger repertoire of models will surely lead to a better understanding of the natural world, as it will allow for dissecting in what ways they are wrong.
... The ITS was not further divided into three partitions as overpartitioning negatively affects MCMC convergence (Rannala 2002, Fenn et al. 2008). The following priors were entered into BEAUti to generate an XML file: the selected substitution models for ITS and LSU, strict molecular clock, a birth-death tree prior accommodated for incomplete sampling (Stadler 2009) to model the speciation of nodes in the topology, with a randomly generated starting tree, 40 million generations, and 4,000 as sampling frequency. Four independent runs were performed using BEAST on XSEDE (Drummond et al. 2012) in CIPRES (Miller et al. 2010). ...
Article
Full-text available
Fungi in the order Laboulbeniales (Ascomycota, Laboulbeniomycetes) are obligate, microscopic ectoparasites of arthropods. These fungi, unlike their close relatives, never form hyphae. Instead, they produce a three-dimensional thallus that consists of several hundred to a thousand vegetative cells derived from a two-celled ascospore by determinate mitotic divisions. Of 2,325 described species, 80% are known from beetles (Coleoptera). Hesperomyces is a genus of 11 species associated with ladybirds (Coleoptera, Coccinellidae) and false skin beetles (Biphyllidae). One species, Hesperomyces virescens, is known from all continents except Australia and Antarctica, and has been reported on 30 ladybird hosts in 20 genera. Previous work, based on geometric morphometrics, molecular phylogeny, sequence-based species delimitation methods, and host information, pointed out that He. virescens is a complex of multiple species segregated by host. Here, we formally describe the most recorded species in the complex, Hesperomyces harmoniae, a parasite of the harlequin ladybird Harmonia axyridis, a globally invasive species. Using DNA isolates of Hesperomyces from multiple host species, including the host on which He. virescens was originally described (Chilocorus stigma), we found that He. harmoniae forms a single clade in our phylogenetic reconstruction of a two-locus ribosomal dataset. Hesperomyces harmoniae is currently known from five continents and 31 countries: Canada, El Salvador, Mexico, the USA (North America); Argentina, Colombia, Ecuador (South America); Austria, Belgium, Bulgaria, Croatia, Czech Republic, France, Germany, Greece, Hungary, Italy, Luxembourg, Montenegro, The Netherlands, Poland, Romania, Russia, Serbia, Slovakia, Switzerland, the UK (Europe); South Africa (Africa); China, Japan, and Turkey (Asia).
... Birth-death models describe the proliferation of infections forward in time, where a "birth" event represents the transmission of an infection to an uninfected susceptible host, and a "death" event can represent either the diagnosis and treatment of an infection, or its spontaneous clearance by the host [59]. This class of models was originally formulated to describe the proliferation of species through speciation and extinction [60]. ...
... These are primarily generative models for phylogenetic trees and models of molecular sequence evolution. Generative models of phylogenetic trees, such as Kingman's coalescent [37] and Birth-Death-Sampling speciation processes [50], can be used to infer parameters related to population dynamics from genetic sequence data. TreeFlow implements models of nucleotide sequence evolution such as the Jukes-Cantor [34], HKY85 [27], and General Time Reversible (GTR) [54] models. ...
Preprint
Full-text available
Probabilistic programming frameworks are powerful tools for statistical modelling and inference. They are not immediately generalisable to phylogenetic problems due to the particular computational properties of the phylogenetic tree object. TreeFlow is a software library for probabilistic programming and automatic differentiation with phylogenetic trees. It implements inference algorithms for phylogenetic tree times and model parameters given a tree topology. We demonstrate how TreeFlow can be used to quickly implement and assess new models. We also show that it provides reasonable performance for gradient-based inference algorithms compared to specialized computational libraries for phylogenetics.
... Namely, we investigated how many population dynamic parameters we can concurrently estimate. [26] showed that under birth-death sampling models as used here, at most two out of the three parameters (cell division rate, apoptosis rate and sampling proportion) can be obtained from a reconstructed tree. We first evaluated the information content of the data based on our simulations where editing occurred for half of the experimental time span. ...
Article
The development of organisms and tissues is dictated by an elaborate balance between cell division, apoptosis and differentiation: the cell population dynamics. To quantify these dynamics, we propose a phylodynamic inference approach based on single-cell lineage recorder data. We developed a Bayesian phylogenetic framework—time-scaled developmental trees (TiDeTree)—that uses lineage recorder data to estimate time-scaled single-cell trees. By implementing TiDeTree within BEAST 2, we enable joint inference of the time-scaled trees and the cell population dynamics. We validated TiDeTree using simulations and showed that performance further improves when including multiple independent sources of information into the inference, such as frequencies of editing outcomes or experimental replicates. We benchmarked TiDeTree against state-of-the-art methods and show comparable performance in terms of tree topology, plus direct assessment of uncertainty and co-estimation of additional parameters. To demonstrate TiDeTree’s use in practice, we analysed a public dataset containing lineage data from approximately 100 stem cell colonies. We estimated a time-scaled phylogeny for each colony; as well as the cell division and apoptosis rates underlying the growth dynamics of all colonies. We envision that TiDeTree will find broad application in the analysis of single-cell lineage tracing data, which will improve our understanding of cellular processes during development.
... The ITS and combined ITS-28S-TEF1α datasets were converted to XML files using BEAUti 1.8.4 (Bayesian Evolutionary Analysis Utility; Drummond et al. 2012). A Birth-Death Incomplete Sampling speciation model (Stadler 2009) was selected. Four independent MCMC chains of 20 million generations each were run. ...
Article
Fuscoporia is a large genus with approximately 80 known species, distributed in various climates from subtropical to temperate, across all continents except Antarctica. Divergence times of Fuscoporia were estimated for the first time, using BEAST v. 1.8.4, with three internal calibration points. Using three DNA regions: internal transcribed spacers (ITS1-5.8S-ITS2 = ITS), D1/D2 domain of large subunit of nuclear ribosomal DNA (28S) and translation elongation factor 1 alpha gene (TEF1α), the ancestral age of the genus was estimated around 77 Myr (million years). Molecular clock analyses also indicate the presence of six major clades, and the stem ages of each were estimated below 50 Myr. These six clades could be used for infrageneric classification of Fuscoporia. Further, we also discussed the distribution of Fuscoporia species in various climates. We hypothesized that the ancestral species of the genus evolved during the late Cretaceous period with resupinate fruiting body in subtropics of Southern Asia. Furthermore, we also described a new species in the genus, Fuscoporia dhofarensis from Dhofar region, located in the southern part of Oman. Species description is based on morphological characteristics of fruiting body and phylogenetic analyses of ITS, 28S and TEF1α regions. The new species is characterized by a pileate fruiting body, with dimitic hyphal system, broadly ellipsoid basidiospores.
... A useful tool for characterising the properties of coalescent trees generated by a homogeneous birth-death (BD) is the reversed-reconstructed process (RRP) (Aldous and Popovic, 2005;Gernhard, 2008), which generally assumes an improper prior on the time since initiation of the process, starting with a single ancestor. By exploiting known results for expected waiting times between coalescent events for a sampled RRP (Stadler, 2009;Wiuf, 2018;Ignatieva et al., 2020) we characterise the properties of the coalescent for a super-critical Feller diffusion and calculate sampling distributions for a multi-type branching diffusion with neutral mutations to first order in small mutation rates. ...
Preprint
Consider the diffusion process defined by the forward equation $u_t(t, x) = \tfrac{1}{2}\{x u(t, x)\}_{xx} - \alpha \{x u(t, x)\}_{x}$ for $t, x \ge 0$ and $-\infty < \alpha < \infty$, with an initial condition $u(0, x) = \delta(x - x_0)$. This equation was introduced and solved by Feller to model the growth of a population of independently reproducing individuals. We explore important coalescent processes related to Feller's solution. For any $\alpha$ and $x_0 > 0$ we calculate the distribution of the random variable $A_n(s, t)$, defined as the finite number of ancestors at a time $s$ in the past of a sample of size $n$ taken from the infinite population of a Feller diffusion at a time $t$ since its initiation. We illustrate two applications: The first is an efficient derivation of the stationary sampling distribution of a subcritical diffusion of a population of multiple types undergoing neutral mutations between types. The second is the construction of a coalescent tree for a supercritical diffusion assuming a uniform prior on the time since initiation of the tree, from which sampling distributions for a multi-type population undergoing slow, neutral mutations are derived.
... Branch support is based on an ultrafast bootstrap analysis [50], run with 1000 bootstrap replicates with a minimum correlation coefficient of 0.99. Bayesian analysis was performed using BEAST v1.10.4 [51], with a dataset including no outgroups, a relaxed lognormal molecular clock and a birth-death incomplete sampling speciation tree prior [52]. Given the absence of an adequate fossil record to calibrate a molecular clock for our data, we used an estimated substitution rate for COI of 0.0169 ± 0.0019 [53], which has been the most commonly used in Collembola studies [54][55][56][57]. ...
Article
Collembola, commonly known as springtails, are important detritivores, abundant in leaf litter and soil globally. Springtails are wingless hexapods with many North American species having wide distributions ranging from as far as Alaska to Mexico. Here, we analyze the occurrence and intraspecific diversity of springtails with a globular body shape (Symphypleona and Neelipleona), in southern high Appalachia, a significant biodiversity hotspot. The peaks of high Appalachia represent ‘sky islands’ due to their physical isolation, and they host numerous endemic species in other taxa. We surveyed globular Collembola through COI metabarcoding, assessing geographic and genetic diversity across localities and species. Intraspecific diversity in globular Collembola was extremely high, suggesting that considerable cryptic speciation has occurred. While we were able to associate morphospecies with described species in most of the major families in the region (Dicyrtomidae, Katiannidae, Sminthuridae, and Sminthurididae), other families (Neelidae, and Arrhopalitidae) are in more pressing need of taxonomic revision before species identities can be confirmed. Due to poor representation in databases, and high intraspecific variability, no identifications were accomplished through comparison with available DNA barcodes.
... If we consider the pandemic on a longer time period, basic birth-death models (e.g. [29]) are not an appropriate choice, since the reproductive rate usually decreases with time as collective immunity builds up or as the susceptible population is exhausted. These limitations are often addressed in epidemiology using compartmental models, such as SI, SIS and SIR [30], or their stochastic realisations, which are also birth-death processes. ...
Article
Accurate simulation of complex biological processes is an essential component of developing and validating new technologies and inference approaches. As an effort to help contain the COVID-19 pandemic, large numbers of SARS-CoV-2 genomes have been sequenced from most regions in the world. More than 5.5 million viral sequences are publicly available as of November 2021. Many studies estimate viral genealogies from these sequences, as these can provide valuable information about the spread of the pandemic across time and space. Additionally such data are a rich source of information about molecular evolutionary processes including natural selection, for example allowing the identification of new variants with transmissibility and immunity evasion advantages. To our knowledge, there is no framework that is both efficient and flexible enough to simulate the pandemic to approximate world-scale scenarios and generate viral genealogies of millions of samples. Here, we introduce a new fast simulator VGsim which addresses the problem of simulating genealogies under epidemiological models. The simulation process is split into two phases. During the forward run the algorithm generates a chain of population-level events reflecting the dynamics of the pandemic using a hierarchical version of the Gillespie algorithm. During the backward run a coalescent-like approach generates a tree genealogy of samples conditioning on the population-level events chain generated during the forward run. Our software can model complex population structure, epistasis and immunity escape.
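The forward phase described in this abstract builds on Gillespie-style event simulation. As a rough illustration only, the sketch below runs the plain (non-hierarchical) Gillespie algorithm for a stochastic SIR model and records the chain of population-level events. It is not VGsim's implementation: the function name `gillespie_sir`, its parameters, and the frequency-dependent transmission term are assumptions made for this example.

```python
import random


def gillespie_sir(beta, gamma, s0, i0, t_max, rng):
    """Plain Gillespie simulation of a stochastic SIR model (illustrative sketch).

    beta  : transmission rate (hypothetical parameter name)
    gamma : recovery rate, assumed > 0
    Returns a list of (time, S, I, R) tuples, one per event, starting at time 0.
    """
    n = s0 + i0
    t, s, i, r = 0.0, s0, i0, 0
    events = [(t, s, i, r)]
    while i > 0:
        rate_inf = beta * s * i / n   # "birth": a new infection
        rate_rec = gamma * i          # "death": a recovery or removal
        total = rate_inf + rate_rec
        t += rng.expovariate(total)   # exponential waiting time to the next event
        if t > t_max:
            break
        # Choose the event type with probability proportional to its rate.
        if rng.random() * total < rate_inf:
            s, i = s - 1, i + 1
        else:
            i, r = i - 1, r + 1
        events.append((t, s, i, r))
    return events


if __name__ == "__main__":
    chain = gillespie_sir(beta=2.0, gamma=1.0, s0=990, i0=10, t_max=20.0,
                          rng=random.Random(0))
    print(len(chain), "events; final state:", chain[-1])
```

A backward, coalescent-like pass such as the one the abstract mentions would then walk this event chain in reverse; that step is not sketched here.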
... Each gene region was assigned the substitution model of best fit as selected by Smart Model Selection (SMS, Lefort et al. 2017) using the Akaike Information Criterion. A birth-death process with incomplete sampling was used as the speciation prior (Stadler 2009) along with an uncorrelated lognormal relaxed molecular clock model (Drummond and Rambaut 2007). Time calibration was performed using a combination of fossil-based primary calibration points (Manchester 1994;Kovar-Eder et al. 2001), as well as secondary calibration points taken from other recently published timetrees (Strijk et al. 2014;Magallón et al. 2015). ...
Article
Diversification of woody plant lineages in New Zealand has unfolded in complex physiographic, climatic, and environmental contexts. Many tree and shrub lineages have existed in New Zealand since the late Cenozoic when Forest was the dominant biome, subsequently diversifying (or continuing to diversify) during the Pliocene/Pleistocene as Open (below treeline) and Alpine biomes emerged. We examine the links between biomes occupied, traits, and diversification. In particular, whether traits are phylogenetically conserved or ecologically constrained and their relationship to biomes occupied. We focus on Melicytus, Myrsine and Pseudopanax which occur across Forest, Open, and Alpine biomes. Our approach combines measured traits and modelled niche traits of extant species to examine the importance of biome occupancy and biome shifts on trait evolution in these lineages. Our results demonstrate trait values are filtered by biomes in these lineages and can predict biomes occupied. However, few biome shifts were associated with trait evolution, typically only biome shifts into extreme environments (Alpine) involved trait innovations. In addition to biomes, trait evolution can also be influenced by species age, trait lability and broad climatic change. Integrating functional traits in a phylogenetic framework can identify how evolutionary and ecological features create modern biogeographic patterns in New Zealand.
... A secondary calibration of divergence times was used to obtain an ultrametric Synalpheus phylogeny tree. A birth-death process (Stadler 2009) with an initial random tree was conducted. The TMP2 substitution model of selection was implemented with AC = AT, AG = CT, and CG = GT with empirical base frequencies and four gamma categories. ...
Article
Genome size (GS) or DNA nuclear content is considered a useful index for making inferences about evolutionary models and life history in animals, including taxonomic, biogeographical, and ecological scenarios. However, patterns of GS variation and their causes in crustaceans are still poorly understood. This study aimed to describe the GS of five Neotropical Synalpheus non-gambarelloides shrimps (S. apioceros, S. minus, S. brevicarpus, S. fritzmueller, and S. scaphoceris) and compare the C-values of all Caridea infraorder in terms of geography and phylogenetics. All animals were sampled on the coast of São Paulo State, Brazil, and GS was assessed by flow cytometry analysis (FCA). The C-values ranged from 7.89 pg in S. apioceros to 12.24 pg in S. scaphoceris. Caridean shrimps had higher GS than other Decapoda crustaceans. The results reveal a tendency towards larger genomes in Synalpheus species with direct development. In addition, a tendency towards a positive biogeographical (latitudinal) correlation within the Caridea infraorder was also observed. This study provides a remarkable new protocol for FCA (using a gating strategy for the analysis), which led to the discovery of new information regarding GS of caridean shrimps, especially for Neotropical Synalpheus, which represents the second-largest group in the Caridea infraorder.
... In BEAUti version 1.8.0, the sequence alignment was partitioned by genes with gene-specific nucleotide substitution models (see Table 3). An uncorrelated relaxed lognormal clock along with Birth-Death incomplete sampling tree prior (Stadler, 2009) was used. Since no fossil calibration was available to define the split between the genus Cremnoconchus and the rest of the members of Littorinidae, we used nine calibration points, based on available fossil records and geological events as given in Reid et al., (2012) to date the tree. ...
Article
Snails of the genus Cremnoconchus, the only freshwater members of the gastropod family Littorinidae, are endemic to the spray zones of numerous waterfalls in the Western Ghats of India. Cremnoconchus consists of nine described and possibly numerous undescribed species as many of these appear to be restricted to specific waterfalls. This is the first attempt at resolving the relationships between the various species in this genus and establishing its monophyly in the family. Further, we also undertake species delimitation analysis to characterize cryptic diversity in this group. Phylogenetic analyses based on nuclear and mitochondrial genes support the monophyly of Cremnoconchus within the family. A fossil-calibrated Bayesian time tree suggests that this freshwater lineage diverged from its marine counterparts around 90.40 million years ago. The separation of Cremnoconchus from its marine ancestors might have been facilitated by the break-up of Gondwana or fluctuating sea levels during this period. Species delimitation analysis retrieved 12 potentially undescribed species in this group. These species formed two distinct clades in the phylogeny, one largely confined to the northern Western Ghats and the other to the central Western Ghats. Species belonging to the northern and central Western Ghats seem to have separated around 56.11 mya, i.e. after the northern Western Ghats were formed. Additionally, spatial isolation due to the patchiness of suitable habitats (waterfalls) and low mobility might have facilitated their diversification.
... As one unit of time corresponds to one year, the estimated value for δ is given by 1/10 × 365 = 36.5. Using this, we fixed the uninfectious rate to be 36.5 to avoid nonidentifiability issues since we cannot estimate R, δ, and s simultaneously (Stadler, 2009;Louca and Pennell, 2020). For the GMRF smoothing prior, we chose a relatively uninformative hyperprior distribution with large variance for the parameters of the smoothing prior. ...
Thesis
Genetic sequences carry a wealth of information. Scientists and statisticians have utilized genetic variation data to answer a wide range of questions in evolutionary biology and epidemiology. With the advent of high throughput sequencing, the availability of genetic sequence data has exploded this century. While the unprecedented amount of genetic data available presents an opportunity to garner a deeper understanding about viruses and humans, making use of large volumes of genetic data is still a challenging problem. In what is to follow, we present three methods that tackle various problems analyzing genetic variation data. First, we introduce the framework known as the sequentially Markov coalescent (SMC), which enables likelihood based inference using hidden Markov models (HMMs) where the latent variables represent genealogies. While genealogies are continuous, HMMs are discrete, requiring SMC based methods to discretize genealogies. This discretization often leads to biased and noisy estimates of the population size history. We introduce a method that avoids the need for discretization leading to Bayesian and frequentist inference procedures that are faster and less biased than its predecessors. Additionally, while coalescent HMMs based on SMC can be decoded in linear time, there does not yet exist a linear time EM algorithm for coalescent HMMs based on SMC', the more accurate approximation. We present a linear time EM algorithm based on SMC'. Advantages of this method include increased accuracy, computation time, uncertainty quantification, and ability to incorporate regularization. Lastly, we present a new approach for estimating transmission and recovery rates of viruses using genetic sequence data. With the outbreak of the SARS-CoV-2, there are millions of genomic sequences available to analyze, but few methods to exploit the information contained in these sequences. 
By integrating recent advances in Bayesian inference and differentiable programming with phylodynamics, we provide a method capable of estimating transmission, recovery, and sampling of pathogens using thousands of sequences. We apply our method to SARS-CoV-2 data and find that our estimates of the effective reproductive number closely match other estimates from methods based on public health data.
... Phylogenetic trees and divergence times were estimated in beast 1.8.4 (Drummond & Rambaut, 2007) run on the CIPRES portal cluster (Miller et al., 2010). The Ensete fossil (Manchester & Kress, 1993) was used as a minimum age constraint of 43 beast was run with three independent Markov chains of 10 E8 generations each, sampling every 30,000th generation, with a lognormal relaxed clock as clock prior and a Birth-Death Incomplete Sampling (Stadler, 2009) as tree prior using their default settings. ...
Article
Aim The relationships between biome shifts and global environmental changes in temperate zone habitats have been extensively explored; yet, the historical dynamics of taxa found in the tropical rain forest (TRF) remain poorly known. This study aims to reconstruct the relationships between tropical rain forest shifts and global environmental changes through the patterns of historical biogeography of a pantropical family of monocots, the Zingiberaceae. Location Global. Taxon Zingiberaceae. Methods We sampled DNA sequences (nrITS, trnK, trnL‐trnF and psbA‐trnH) from GenBank for 77% of the genera, including 30% of species, in the Zingiberaceae. Global fossil records of the Zingiberaceae were collected from the literature. Rates of speciation, extinction and diversification were estimated based on phylogenetic data and fossil records through methods implemented in BAMM. Ancestral ranges were estimated using single‐tree BioGeoBEARS and multiple‐trees BioGeoBEARS in RASP. Dispersal rate through time and dispersal rate among regions were calculated in R based on the result of ancestral estimation. Results The common ancestor of the Zingiberaceae likely originated in northern Africa during the mid‐Cretaceous, with later dispersal to the Asian tropics. Indo‐Burma, rather than Malesia, was likely a provenance of the common ancestor of Alpinioideae–Zingiberoideae. Several abrupt shifts of evolutionary rates from the Palaeocene were synchronized with sudden global environmental changes. Main conclusions Integrating phylogenetic patterns with fossil records suggests that the Zingiberaceae dispersed to Asia through drift of the Indian Plate from Africa in the late Palaeocene. Formation of island chains, land corridors and warming temperatures facilitated the emigration of the Zingiberaceae to a broad distribution across the tropics. Moreover, dramatic fluctuations of the speciation rate of Zingiberoideae appear to have been synchronized with global climate fluctuations. 
In general, the evolutionary history of the Zingiberaceae broadens our understanding of the association between TRF shifts in distribution and past global environmental changes, especially the origin of TRF in Southeast Asia.
... Nee, May and Harvey (1994) and Gernhard (2008) showed that the same result can be obtained when conditioning on the number of tips. The fact that Equation (2) can be obtained as a completely observed Markovian process is the result of Stadler (2009, 2010), who showed that the BDSP can be interpreted as a birth-death process with reduced rates and complete sampling. Finally, extended the result under piece-wise constant birth, death and sampling rates. ...
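The "reduced rates" interpretation mentioned in this snippet can be checked numerically. The sketch below assumes the commonly stated form of the result, which is not spelled out in the excerpt: a process with birth rate λ, death rate μ and sampling probability ρ is mapped to λ' = ρλ, μ' = μ − λ(1 − ρ), ρ' = 1. It also assumes the standard expression for p1(t), the probability that a lineage has exactly one sampled descendant after time t. The function names are hypothetical.

```python
import math


def reduced_rates(lam, mu, rho):
    """Map (lam, mu, rho) to the assumed complete-sampling equivalent.

    Note: the reduced death rate can come out negative when mu < lam * (1 - rho);
    in that case the identity below is algebraic only.
    """
    return rho * lam, mu - lam * (1.0 - rho)


def p1(t, lam, mu, rho):
    """Probability of exactly one sampled descendant after time t (assumes lam != mu)."""
    r = lam - mu
    e = math.exp(-r * t)
    return rho * r * r * e / (rho * lam + (lam * (1.0 - rho) - mu) * e) ** 2


if __name__ == "__main__":
    lam, mu, rho = 2.0, 1.5, 0.5
    lam_r, mu_r = reduced_rates(lam, mu, rho)
    for t in (0.1, 1.0, 5.0):
        # Each bifurcation contributes a factor lam * p1(t) to the tree density,
        # so equality of these two columns means the two settings are indistinguishable.
        print(t, lam * p1(t, lam, mu, rho), lam_r * p1(t, lam_r, mu_r, 1.0))
```

Because the per-bifurcation factor λ·p1(t) is the same under both parameter sets, the bifurcation times carry no information that separates (λ, μ, ρ) from (λ', μ', 1), which is the non-identifiability the citing papers refer to.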
Article
Genomic surveillance of SARS-CoV-2 has been instrumental in tracking the spread and evolution of the virus during the pandemic. The availability of SARS-CoV-2 molecular sequences isolated from infected individuals, coupled with phylodynamic methods, have provided insights into the origin of the virus, its evolutionary rate, the timing of introductions, the patterns of transmission, and the rise of novel variants that have spread through populations. Despite enormous global efforts of governments, laboratories, and researchers to collect and sequence molecular data, many challenges remain in analyzing and interpreting the data collected. Here, we describe the models and methods currently used to monitor the spread of SARS-CoV-2, discuss long-standing and new statistical challenges, and propose a method for tracking the rise of novel variants during the epidemic.
Article
Estimating how traits evolved and impacted diversification across the tree of life represents a critical topic in ecology and evolution. Although there has been considerable research in comparative biology, large parts of the tree of life remain underexplored. Sharks are an iconic clade of marine vertebrates, and key components of marine ecosystems since the early Mesozoic. However, few studies have addressed how traits evolved or whether they impacted their extant diversity patterns. Our study aimed to fill this gap by reconstructing the largest time-calibrated species-level phylogeny of sharks and compiling an exhaustive database for ecological (diet, habitat) and biological (reproduction, maximum body length) traits. Using state-of-the-art models of evolution and diversification, we outlined the major character shifts and modes of trait evolution across shark species. We found support for sequential models of trait evolution and estimated a small to medium-sized lecithotrophic and coastal-dwelling most recent common ancestor for extant sharks. However, our exhaustive hidden traits analyses do not support trait-dependent diversification for any examined traits, challenging previous works. This suggests that the role of traits in shaping sharks' diversification dynamics might have been previously overestimated and should motivate future macroevolutionary studies to investigate other drivers of diversification in this clade.
Preprint
Interpretations of biodiversity patterns across timescales necessarily assume that fundamental processes of evolution do not change over time. A recurring pattern across a variety of biological datasets, from genomes to the fossil record, has shown evolutionary rates increasing toward the present or over shorter time scales, however, indicating potentially fundamental but unknown processes unifying evolutionary timescales. We demonstrate that these patterns are statistical artifacts of time-independent errors present across ecological and evolutionary datasets, which produce hyperbolic patterns of rates through time. Specifically, no matter how time-independent error and measurements of evolutionary change are distributed, estimated rates exhibit a hyperbolic pattern when plotted against time. Our findings validate assumptions of uniformitarianism but pose new challenges for the rate estimates foundational to other macroevolutionary models. One-Sentence Summary Errors in ecological and evolutionary data lead to observed patterns of increasing macroevolutionary rates toward the present.
Article
Background and Aims Biogeographic relationships between the Canary Islands and northwest Africa are often explained by oceanic dispersal and geographic proximity. Sister-group relationships between Canarian and eastern African/Arabian taxa, the “Rand Flora” pattern, are rare among plants, and have been attributed to the extinction of northwestern African populations. Euphorbia balsamifera is the only representative species of this pattern that is distributed in the Canary Islands and northwest Africa; it is also one of few species present in all seven islands. Previous studies placed E. balsamifera African populations as sister to the Canarian populations, but they were based on herbarium samples with highly degraded DNA. Here, we test the extinction hypothesis by sampling new continental populations; we also expand the Canarian sampling to examine the dynamics of island colonization and diversification. Methods Using target enrichment with genome skimming, we reconstructed phylogenetic relationships within E. balsamifera, and between this species and its disjunct relatives. A SNP dataset obtained from the target sequences was used to infer population-genetic diversity patterns. We employed convolutional neural networks (CNNs) to discriminate among alternative Canary Islands colonization scenarios. Key results Results confirm the Rand Flora sister-group relationship between western E. balsamifera and E. adenensis in the Eritreo-Arabian region, and recover an eastern-western geographic structure among E. balsamifera Canarian populations. CNNs supported a scenario of east-to-west island colonization, followed by population extinctions in Lanzarote and Fuerteventura and recolonization from Tenerife and Gran Canaria; a signal of admixture between the eastern island and northwest African populations was recovered. Conclusions Populations of E. 
balsamifera from northwest Africa are not the remnants of an ancestral stock, but originated from migration events from Lanzarote and Fuerteventura. These results support the Surfing Syngameon Hypothesis for the colonization of the Canary Islands by E. balsamifera, but also a recent back-colonization to the continent.
Article
Using phylogenies of present‐day species to estimate diversification rate trajectories—speciation and extinction rates over time—is a challenging task due to non‐identifiability issues. Given a phylogeny, there exists an infinite set of trajectories that result in the same likelihood; this set has been coined a congruence class. Previous work has developed approaches for sampling trajectories within a given congruence class, with the aim to assess the extent to which congruent scenarios can vary from one another. Based on this sampling approach, it has been suggested that rapid changes in speciation or extinction rates are conserved across the class. Reaching such conclusions requires to sample the broadest possible set of distinct trajectories. We introduce a new method for exploring congruence classes that we implement in the R package CRABS. Whereas existing methods constrain either the speciation rate or the extinction rate trajectory, ours provides more flexibility by sampling congruent speciation and extinction rate trajectories simultaneously. This allows covering a more representative set of distinct diversification rate trajectories. We also implement a filtering step that allows selecting the most parsimonious trajectories within a class. We demonstrate the utility of our new sampling strategy using a simulated scenario. Next, we apply our approach to the study of mammalian diversification history. We show that rapid changes in speciation and extinction rates need not be conserved across a congruence class, but that selecting the most parsimonious trajectories shrinks the class to concordant scenarios. Our approach opens new avenues both to truly explore the myriad of potential diversification histories consistent with a given phylogeny, embracing the uncertainty inherent to phylogenetic diversification models, and to select among these different histories. This should help refining our inference of diversification trajectories from extant data.
Article
Birth-death models are widely used in combination with species phylogenies to study past diversification dynamics. Current inference approaches typically rely on likelihood-based methods. These methods are not generalizable, as a new likelihood formula must be established each time a new model is proposed; for some models such formula is not even tractable. Deep learning can bring solutions in such situations, as deep neural networks can be trained to learn the relation between simulations and parameter values as a regression problem. In this paper, we adapt a recently developed deep learning method from pathogen phylodynamics to the case of diversification inference, and we extend its applicability to the case of the inference of state-dependent diversification models from phylogenies associated with trait data. We demonstrate the accuracy and time efficiency of the approach for the time constant homogeneous birth-death model and the Binary-State Speciation and Extinction model. Finally, we illustrate the use of the proposed inference machinery by reanalyzing a phylogeny of primates and their associated ecological role as seed dispersers. Deep learning inference provides at least the same accuracy as likelihood-based inference while being faster by several orders of magnitude, offering a promising new inference approach for deployment of future models in the field.
Article
Recent theoretical work on phylogenetic birth-death models offers differing viewpoints on whether they can be estimated using lineage-through-time data. Louca and Pennell (2020) showed that the class of models with continuously differentiable rate functions is nonidentifiable: any such model is consistent with an infinite collection of alternative models, which are statistically indistinguishable regardless of how much data are collected. Legried and Terhorst (2022) qualified this grave result by showing that identifiability is restored if only piecewise constant rate functions are considered. Here, we contribute new theoretical results to this discussion, in both the positive and negative directions. Our main result is to prove that models based on piecewise polynomial rate functions of any order and with any (finite) number of pieces are statistically identifiable. In particular, this implies that spline-based models with an arbitrary number of knots are identifiable. The proof is simple and self-contained, relying mainly on basic algebra. We complement this positive result with a negative one, which shows that even when identifiability holds, rate function estimation is still a difficult problem. To illustrate this, we prove some rates-of-convergence results for hypothesis testing using birth-death models. These results are information-theoretic lower bounds which apply to all potential estimators.
Article
In the simplest phylogenetic diversification model (the pure-birth Yule process), lineages split independently at a constant rate λ for time t. The length of a randomly chosen edge (either interior or pendant) in the resulting tree has an expected value that rapidly converges to 1/(2λ) as t grows, and thus is essentially independent of t. However, the length L of the longest pendant edge behaves remarkably differently: L converges to t/2 as the expected number of leaves grows. Extending this model to allow an extinction rate μ (where μ < λ), we also establish a similar result for birth–death trees, except that t/2 is replaced by t/2 ⋅ (1 - μ/λ). This ‘complete’ tree may contain subtrees that have died out before time t; for the ‘reduced tree’ that just involves the leaves present at time t and their direct ancestors, the longest pendant edge length L again converges to t/2. Thus, there is likely to be at least one extant species whose associated pendant branch attaches to the tree approximately half-way back in time to the origin of the entire clade. We also briefly consider the length of the shortest edges. Our results are relevant to phylogenetic diversity indices in biodiversity conservation, and to quantifying the length of aligned sequences required to correctly infer a tree. We compare our theoretical results with simulations, and with the branch lengths from a recent phylogenetic tree of all mammals.
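The 1/(2λ) limit for the mean edge length can be checked numerically. The following is a minimal sketch, not the authors' code: the function names, the choice to start from a single lineage at time 0 (so that the stem edge is counted as an edge), and the pooling over several trees are all assumptions made for illustration.

```python
import random

def yule_edges(lam, t, rng):
    """Simulate a Yule tree from one lineage at time 0 up to time t.

    Returns (interior, pendant): two lists of edge lengths.
    Assumption: the stem edge from time 0 to the first split counts as an edge.
    """
    interior, pendant = [], []
    stack = [0.0]  # start times of lineages not yet processed
    while stack:
        start = stack.pop()
        wait = rng.expovariate(lam)  # time until this lineage splits
        if start + wait >= t:
            pendant.append(t - start)  # lineage survives to the present
        else:
            interior.append(wait)
            # the split creates two daughter lineages
            stack.extend([start + wait, start + wait])
    return interior, pendant

def pooled_mean_edge_length(lam, t, n_trees, seed=0):
    """Mean edge length over all edges of n_trees independent Yule trees."""
    rng = random.Random(seed)
    total, count = 0.0, 0
    for _ in range(n_trees):
        interior, pendant = yule_edges(lam, t, rng)
        total += sum(interior) + sum(pendant)
        count += len(interior) + len(pendant)
    return total / count

if __name__ == "__main__":
    lam, t = 1.0, 6.0
    print("mean edge length:", pooled_mean_edge_length(lam, t, 30))
    print("theoretical limit 1/(2*lam):", 1.0 / (2.0 * lam))
    _, pendant = yule_edges(lam, t, random.Random(1))
    print("longest pendant edge:", max(pendant), "compare t/2 =", t / 2)
```

The pooled mean should sit close to 1/(2λ) already for moderate t, whereas the longest pendant edge of a single tree is only expected to approach t/2 as the number of leaves becomes large.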
Article
As shown during the SARS-CoV-2 pandemic, phylogenetic and phylodynamic methods are essential tools to study the spread and evolution of pathogens. One of the central assumptions of these methods is that the shared history of pathogens isolated from different hosts can be described by a branching phylogenetic tree. Recombination breaks this assumption. This makes it problematic to apply phylogenetic methods to study recombining pathogens, including, for example, coronaviruses. Here, we introduce a Markov chain Monte Carlo approach that allows inference of recombination networks from genetic sequence data under a template switching model of recombination. Using this method, we first show that recombination is extremely common in the evolutionary history of SARS-like coronaviruses. We then show how recombination rates across the genome of the human seasonal coronaviruses 229E, OC43 and NL63 vary with rates of adaptation. This suggests that recombination could be beneficial to fitness of human seasonal coronaviruses. Additionally, this work sets the stage for Bayesian phylogenetic tracking of the spread and evolution of SARS-CoV-2 in the future, even as recombinant viruses become prevalent. Genetic recombination can confound standard phylogenetic approaches. Here, the authors present a method to reconstruct virus recombination networks, and show the importance of recombination in shaping the ongoing evolution of SARS-like, MERS and 3 human seasonal coronaviruses.
Article
An improved Bayesian method is presented for estimating phylogenetic trees using DNA sequence data. The birth-death process with species sampling is used to specify the prior distribution of phylogenies and ancestral speciation times, and the posterior probabilities of phylogenies are used to estimate the maximum posterior probability (MAP) tree. Monte Carlo integration is used to integrate over the ancestral speciation times for particular trees. A Markov Chain Monte Carlo method is used to generate the set of trees with the highest posterior probabilities. Methods are described for an empirical Bayesian analysis, in which estimates of the speciation and extinction rates are used in calculating the posterior probabilities, and a hierarchical Bayesian analysis, in which these parameters are removed from the model by an additional integration. The Markov Chain Monte Carlo method avoids the requirement of our earlier method for calculating MAP trees to sum over all possible topologies (which limited the number of taxa in an analysis to about five). The methods are applied to analyze DNA sequences for nine species of primates, and the MAP tree, which is identical to a maximum-likelihood estimate of topology, has a probability of approximately 95%.
Article
Molecular sequences obtained at different sampling times from populations of rapidly evolving pathogens and from ancient subfossil and fossil sources are increasingly available with modern sequencing technology. Here, we present a Bayesian statistical inference approach to the joint estimation of mutation rate and population size that incorporates the uncertainty in the genealogy of such temporally spaced sequences by using Markov chain Monte Carlo (MCMC) integration. The Kingman coalescent model is used to describe the time structure of the ancestral tree. We recover information about the unknown true ancestral coalescent tree, population size, and the overall mutation rate from temporally spaced data, that is, from nucleotide sequences gathered at different times, from different individuals, in an evolving haploid population. We briefly discuss the methodological implications and show what can be inferred, in various practically relevant states of prior knowledge. We develop extensions for exponentially growing population size and joint estimation of substitution model parameters. We illustrate some of the important features of this approach on a genealogy of HIV-1 envelope (env) partial sequences.
Article
The evolutionary analysis of molecular sequence variation is a statistical enterprise. This is reflected in the increased use of probabilistic models for phylogenetic inference, multiple sequence alignment, and molecular population genetics. Here we present BEAST: a fast, flexible software architecture for Bayesian analysis of molecular sequences related by an evolutionary tree. A large number of popular stochastic models of sequence evolution are provided and tree-based models suitable for both within- and between-species sequence data are implemented. BEAST version 1.4.6 consists of 81000 lines of Java source code, 779 classes and 81 packages. It provides models for DNA and protein sequence evolution, highly parametric coalescent analysis, relaxed clock phylogenetics, non-contemporaneous sequence data, statistical alignment and a wide range of options for prior distributions. BEAST source code is object-oriented, modular in design and freely available at http://beast-mcmc.googlecode.com/ under the GNU LGPL license. BEAST is a powerful and flexible evolutionary analysis package for molecular sequence variation. It also provides a resource for the further development of new models and statistical methods of evolutionary analysis.
Article
Phylogenies that are reconstructed without fossil material often contain approximate dates for lineage splitting. For example, particular nodes on molecular phylogenies may be dated by known geographic events that caused lineages to split, thereby calibrating a molecular clock that is used to date other nodes. On the one hand, such phylogenies contain no information about lineages that have become extinct. On the other hand, they do provide a potentially useful testing ground for ideas about evolutionary processes. Here we first ask what such reconstructed phylogenies should be expected to look like under a birth-death process in which the birth and death parameters of lineages remain constant through time. We show that it is possible to estimate both the birth and death rates of lineages from the reconstructed phylogenies, even though they contain no explicit information about extinct lineages. We also show how such phylogenies can reveal mass extinctions and how their characteristic footprint can be distinguished from similar ones produced by density-dependent cladogenesis.
Article
We study the following model for a phylogenetic tree on n extant species: the origin of the clade is a random time in the past whose (improper) distribution is uniform on (0,∞); thereafter, the process of extinctions and speciations is a continuous-time critical branching process of constant rate, conditioned on there being the prescribed number n of species at the present time. We study various mathematical properties of this model as n →∞: namely the time of origin and of the most recent common ancestor, the pattern of divergence times within lineage trees, the time series of the number of species, the total number of extinct species, the total number of species ancestral to the extant ones, and the ‘local’ structure of the tree itself. We emphasize several mathematical techniques: the association of walks with trees; a point process representation of lineage trees; and Brownian limits.
Article
We investigate a neutral model for speciation and extinction, the constant rate birth-death process. The process is conditioned to have n extant species today, and we look at the distribution of the reconstructed trees, i.e. the trees without the extinct species. Whereas the tree shape distribution is well-known and actually the same as under the pure birth process, no analytic results for the speciation times were known. We provide the distribution for the speciation times and calculate the expectations analytically. This characterizes the reconstructed trees completely. We will show how the results can be used to date phylogenies.
Article
Phylogenies reconstructed from gene sequences can be used to investigate the tempo and mode of species diversification. Here we develop and use new statistical methods to infer past patterns of speciation and extinction from molecular phylogenies. Specifically, we test the null hypothesis that per-lineage speciation and extinction rates have remained constant through time. Rejection of this hypothesis may provide evidence for evolutionary events such as adaptive radiations or key adaptations. In contrast to previous approaches, our methods are robust to incomplete taxon sampling and are conservative with respect to extinction. Using simulation we investigate, first, the adverse effects of failing to take incomplete sampling into account and, second, the power and reliability of our tests. When applied to published phylogenies our tests suggest that, in some cases, speciation rates have decreased through time.
Article
Contents: Preface; 1. Inference and the evolutionary tree problem; 2. The model; 3. The likelihood approach; 4. A likelihood solution; 5. Further aspects of the problem and its likelihood solution; 6. The Icelandic admixture problem; Summary; References; References index; Subject index.
Article
A solution to the problem of estimating the positions and times of the branch points of a Brownian‐motion/Yule process, given the positions of all the particles at a particular time, is outlined. A likelihood approach is used, and it is shown that the solution involves maintaining a clear distinction between likelihood and conditional probability if difficulties over mathematical singularities are to be avoided. Several unsolved mathematical problems are encountered, and it is concluded that some simulation studies may be required for a complete solution. Questions of scientific inference which the problem raises are discussed briefly.
Article
A new Markov chain is introduced which can be used to describe the family relationships among n individuals drawn from a particular generation of a large haploid population. The properties of this process can be studied, simultaneously for all n, by coupling techniques. Recent results in neutral mutation theory are seen as consequences of the genealogy described by the chain.
Thesis
A phylogeny represents the evolutionary relationships between species. In this thesis we answer questions that arise in the reconstruction and analysis of phylogenies. We develop a general class of neutral models for speciation and extinction. The results are used for dating supertrees, for producing so-called lineages-through-time plots, and for computing p-values for our runs statistic. For models that cannot be analyzed, we provide simulation algorithms. We show that widely used simulation programs work incorrectly even under common models. At the end of the thesis we prove a complexity result for dating phylogenies with reticulation events. The mathematical methods used come mainly from stochastics, statistics, combinatorics and complexity theory.
Article
The n-coalescent is a continuous-time Markov chain on a finite set of states, which describes the family relationships among a sample of n members drawn from a large haploid population. Its transition probabilities can be calculated from a factorization of the chain into two independent components, a pure death process and a discrete-time jump chain. For a deeper study, it is useful to construct a more complicated Markov process in which n-coalescents for all values of n are embedded in a natural way.
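The factorization into a pure death process and a jump chain translates directly into a two-part simulation. The sketch below is illustrative only: the function names are hypothetical, and it assumes the standard time scaling in which k lineages coalesce at total rate k(k-1)/2.

```python
import random

def coalescent_waiting_times(n, rng):
    """Pure death process: waiting times while k = n, n-1, ..., 2 lineages remain.

    Assumes standard coalescent time units, i.e. total rate k*(k-1)/2.
    """
    times = []
    for k in range(n, 1, -1):
        rate = k * (k - 1) / 2.0  # number of pairs that could merge
        times.append(rng.expovariate(rate))
    return times

def jump_chain(n, rng):
    """Discrete-time jump chain: merge a uniformly chosen pair of blocks each step."""
    blocks = [frozenset([i]) for i in range(n)]
    history = []
    while len(blocks) > 1:
        i, j = rng.sample(range(len(blocks)), 2)  # two distinct block indices
        merged = blocks[i] | blocks[j]
        blocks = [b for idx, b in enumerate(blocks) if idx not in (i, j)]
        blocks.append(merged)
        history.append(merged)
    return history

if __name__ == "__main__":
    rng = random.Random(0)
    n = 10
    reps = 4000
    mean_tmrca = sum(sum(coalescent_waiting_times(n, rng)) for _ in range(reps)) / reps
    print("simulated mean time to the most recent common ancestor:", mean_tmrca)
    print("theoretical value 2 * (1 - 1/n):", 2 * (1 - 1 / n))
    print("merge order for n = 5:", jump_chain(5, rng))
```

Because the two components are independent, the waiting times and the merge order can be drawn separately and combined afterwards into a single genealogy.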
Article
In this paper, we present a new way to describe the timing of branching events in phylogenetic trees. Our description is in terms of the relative timing of diversification events between sister clades; as such it is complementary to existing methods using lineages-through-time plots which consider diversification in aggregate. The method can be applied to look for evidence of diversification happening in lineage-specific “bursts”, or the opposite, where diversification between 2 clades happens in an unusually regular fashion. In order to be able to distinguish interesting events from stochasticity, we discuss 2 classes of neutral models on trees with relative timing information and develop a statistical framework for testing these models. These model classes include both the coalescent with ancestral population size variation and global rate speciation–extinction models. We end the paper with 2 example applications: first, we show that the evolution of the hepatitis C virus deviates from the coalescent with arbitrary population size. Second, we analyze a large tree of ants, demonstrating that a period of elevated diversification rates does not appear to have occurred in a bursting manner.
Article
The importance of stochastic processes in relation to problems of population growth was pointed out by W. Feller [1] in 1939. He considered among other examples the "birth-and-death" process in which the expected birth and death rates (per head of population per unit of time) were constants, $\lambda_0$ and $\mu_0$, say. In this paper, I shall give the complete solution of the equations governing the generalised birth-and-death process in which the birth and death rates $\lambda(t)$ and $\mu(t)$ may be any specified functions of the time. The mathematical method employed starts from M. S. Bartlett's idea of replacing the differential-difference equations for the distribution of the population size by a partial differential equation for its generating function. For an account of this technique, reference may be made to Bartlett's North Carolina lectures [2]. The formulae obtained lead to an expression for the probability of the ultimate extinction of the population, and to the necessary and sufficient condition for a birth-and-death process to be of "transient" type. For transient processes the distribution of the cumulative population is also considered, but here in general it is not found possible to do more than evaluate its mean and variance as functions of $t$, although a complete solution (including the determination of the asymptotic form of the distribution as $t$ tends to infinity) is obtained for the simple process in which the birth and death rates are independent of the time. It is shown that a birth-and-death process can be constructed to give an expected population size $\bar n_t$ which is any desired function of the time $t$, and among the many possible solutions the unique one is determined which makes the fluctuation, Var$(n_t)$, a minimum for all $t$. The general theory is illustrated with reference to two examples. The first of these is the $(\lambda_0, \mu_1 t)$ process introduced by N. Arley [3] in his study of the cascade showers associated with cosmic radiation; here the birth rate is constant and the death rate is a constant multiple of the "age", $t$, of the process. The $\bar n_t$-curve is then Gaussian in form, and the process is always of transient type. The second example is provided by the family of "periodic" processes, in which the birth and death rates are periodic functions of the time $t$. These appear well adapted to describe the response of population growth (or epidemic spread) to the influence of the seasons.
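The Gaussian form of the $\bar n_t$-curve in the first example can be seen in a short worked calculation. This is a sketch under an assumption the abstract does not state, namely that the process starts from a single individual at $t = 0$. The expected population size of a generalised birth-and-death process satisfies $d\bar n_t/dt = (\lambda(t) - \mu(t))\,\bar n_t$, so that

$$
\bar n_t = \exp\left(\int_0^t \bigl(\lambda(s) - \mu(s)\bigr)\,ds\right).
$$

Setting $\lambda(s) = \lambda_0$ and $\mu(s) = \mu_1 s$ and completing the square gives

$$
\bar n_t = \exp\left(\lambda_0 t - \tfrac{1}{2}\mu_1 t^2\right)
= \exp\left(\frac{\lambda_0^2}{2\mu_1}\right)\exp\left(-\frac{\mu_1}{2}\left(t - \frac{\lambda_0}{\mu_1}\right)^2\right),
$$

which is a Gaussian curve in $t$ centred at $t = \lambda_0/\mu_1$.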
Article
Drawing inferences about macroevolutionary processes from phylogenetic trees is a fundamental challenge in evolutionary biology. Understanding stochastic models for speciation is an essential step in solving this challenge. We consider a neutral class of stochastic models for speciation, the constant rate birth-death process. For trees with n extant species - which might be derived from bigger trees via random taxon sampling - we calculate the expected time of the kth speciation event (k=1,...,n-1). Further, for a tree with n extant species, we calculate the density and expectation for the number of lineages at any time between the origin of the process and the present. With the developed methods, expected lineages-through-time (LTT) plots can be drawn analytically. The effect of random taxon sampling on LTT plots is discussed.
Article
Hepatitis C virus (HCV) infection is a major health problem in Egypt, where the seroprevalence is 10–20-fold higher than that in the United States. To characterize the HCV genotype distribution and concordance of genotype assessments on the basis of multiple genomic regions, specimens were obtained from blood donors in 15 geographically diverse governorates throughout Egypt. The 5′ noncoding, core/E1, and NS5B regions were amplified by reverse transcription—polymerase chain reaction and analyzed by both restriction fragment length polymorphism (RFLP) and phylogenetic tree construction. For the 5′ noncoding region, 122 (64%) of 190 specimens were amplified and analyzed by RFLP: 111 (91%) were genotype 4, 1 (1%) was genotype 1a, 1 (1%) was genotype 1b, and 9 (7%) could not be typed. Phylogenetic analyses of the core/E1 and NS5B regions confirmed the genotype 4 preponderance and revealed evidence of 3 new subtypes. Analysis of genetic distance between isolates was consistent with the introduction of multiple virus strains 75–140 years ago, and no clustering was detected within geographic regions, suggesting widespread dispersion at some time since then.
Article
The theory of island biogeography asserts that an island or a local community approaches an equilibrium species richness as a result of the interplay between the immigration of species from the much larger metacommunity source area and local extinction of species on the island (local community). Hubbell generalized this neutral theory to explore the expected steady-state distribution of relative species abundance (RSA) in the local community under restricted immigration. Here we present a theoretical framework for the unified neutral theory of biodiversity and an analytical solution for the distribution of the RSA both in the metacommunity (Fisher's log series) and in the local community, where there are fewer rare species. Rare species are more extinction-prone, and once they go locally extinct, they take longer to re-immigrate than do common species. Contrary to recent assertions, we show that the analytical solution provides a better fit, with fewer free parameters, to the RSA distribution of tree species on Barro Colorado Island, Panama, than the lognormal distribution.
Article
Tuberculosis can be studied at the population level by genotyping strains of Mycobacterium tuberculosis isolated from patients. We use an approximate Bayesian computational method in combination with a stochastic model of tuberculosis transmission and mutation of a molecular marker to estimate the net transmission rate, the doubling time, and the reproductive value of the pathogen. This method is applied to a published data set from San Francisco of tuberculosis genotypes based on the marker IS6110. The mutation rate of this marker has previously been studied, and we use those estimates to form a prior distribution of mutation rates in the inference procedure. The posterior point estimates of the key parameters of interest for these data are as follows: net transmission rate, 0.69/year [95% credibility interval (C.I.) 0.38, 1.08]; doubling time, 1.08 years (95% C.I. 0.64, 1.82); and reproductive value 3.4 (95% C.I. 1.4, 79.7). These figures suggest a rapidly spreading epidemic, consistent with observations of the resurgence of tuberculosis in the United States in the 1980s and 1990s.
Article
The frequency of a given gene in a population may be modified by a number of conditions including recurrent mutation to and from it, migration, selection of various sorts and, far from least in importance, mere chance variation.
Article
In this paper, we investigate the standard Yule model, and a recently studied model of speciation and extinction, the “critical branching process.” We develop an analytic way—as opposed to the common simulation approach—for calculating the speciation times in a reconstructed phylogenetic tree. Simple expressions for the density and the moments of the speciation times are obtained. Methods for dating a speciation event become valuable, if for the reconstructed phylogenetic trees, no time scale is available. A missing time scale could be due to supertree methods, morphological data, or molecular data which violates the molecular clock. Our analytic approach is, in particular, useful for the model with extinction, since simulations of birth-death processes which are conditioned on obtaining n extant species today are quite delicate. Further, simulations are very time consuming for big n under both models.
Article
Let $T_{t,n}$ be a continuous-time critical branching process conditioned to have population $n$ at time $t$. Consider $T_{t,n}$ as a random rooted tree with edge-lengths. We define the genealogy $G(T_{t,n})$ of the population at time $t$ to be the smallest subtree of $T_{t,n}$ containing all the edges at a distance $t$ from the root. We also consider a Bernoulli($p$) sampling process on the leaves of $T_{t,n}$, and define the $p$-sampled history $H_p(T_{t,n})$ to be the smallest subtree of $T_{t,n}$ containing all the sampled leaves at a distance less than $t$ from the root. We first give a representation of $G(T_{t,n})$ and $H_p(T_{t,n})$ in terms of point-processes, and then provide their asymptotic behavior as $n \to \infty$, $t/n \to t_0$, and $np \to p_0$. The resulting asymptotic processes are related to a Brownian excursion conditioned to have local time at $t_0$ equal to 1, sampled at times of a Poisson($p_0/2$) process.
Article
Yule (1924) observed that distributions of number of species per genus were typically long-tailed, and proposed a stochastic model to fit this data. Modern taxonomists often prefer to represent relationships between species via phylogenetic trees; the counterpart to Yule's observation is that actual reconstructed trees look surprisingly unbalanced. The imbalance can readily be seen via a scatter diagram of the sizes of clades involved in the splits of published large phylogenetic trees. Attempting stochastic modeling leads to two puzzles. First, two somewhat opposite possible biological descriptions of what dominates the macroevolutionary process (adaptive radiation; "neutral" evolution) lead to exactly the same mathematical model (Markov or Yule or coalescent). Second, neither this nor any other simple stochastic model predicts the observed pattern of imbalance. This essay represents a probabilist's musings on these puzzles, complementing the more detailed survey of biol...
Thompson, E.A., 1975. Human Evolutionary Trees. Cambridge University Press, Cambridge.
Volkov, I., Banavar, J., Hubbell, S., Maritan, A., 2003. Neutral theory and relative species abundance in ecology. Nature 424 (6952), 1035–1037.
Stadler, T., 2006–2008. Cass http://www.tb.ethz.ch/people/tstadlerS.
Stadler, T., 2008a. Evolving trees: models for speciation and extinction in phylogenetics. Ph.D. Thesis, Technical University of Munich.
Stadler, T., 2008b. Lineages-through-time plots of neutral models for speciation. Math. Biosci. 216, 163–171.
Ford, D., Gernhard, T., Matsen, E., 2009. A method for investigating relative timing information on phylogenetic trees. Syst. Biol., in press.
Kingman, J.F.C., 1982. The coalescent. Stochastic Processes and Their Applications 13, 235–248.
Phylogenetics: the theory and practice of phylogenetic systematics
  • Colless