Article

On the Generalized "Birth-and-Death" Process

Abstract

The importance of stochastic processes in relation to problems of population growth was pointed out by W. Feller [1] in 1939. He considered among other examples the "birth-and-death" process in which the expected birth and death rates (per head of population per unit of time) were constants, $\lambda_0$ and $\mu_0$, say. In this paper, I shall give the complete solution of the equations governing the generalised birth-and-death process in which the birth and death rates $\lambda(t)$ and $\mu(t)$ may be any specified functions of the time. The mathematical method employed starts from M. S. Bartlett's idea of replacing the differential-difference equations for the distribution of the population size by a partial differential equation for its generating function. For an account of this technique,$^1$ reference may be made to Bartlett's North Carolina lectures [2]. The formulae obtained lead to an expression for the probability of the ultimate extinction of the population, and to the necessary and sufficient condition for a birth-and-death process to be of "transient" type. For transient processes the distribution of the cumulative population is also considered, but here in general it is not found possible to do more than evaluate its mean and variance as functions of $t$, although a complete solution (including the determination of the asymptotic form of the distribution as $t$ tends to infinity) is obtained for the simple process in which the birth and death rates are independent of the time. It is shown that a birth-and-death process can be constructed to give an expected population size $\bar n_t$ which is any desired function of the time $t$, and among the many possible solutions the unique one is determined which makes the fluctuation, Var$(n_t)$, a minimum for all $t$. The general theory is illustrated with reference to two examples. The first of these is the $(\lambda_0, \mu_1t)$ process introduced by N. Arley [3] in his study of the cascade showers associated with cosmic radiation; here the birth rate is constant and the death rate is a constant multiple of the "age", $t$, of the process. The $\bar n_t$-curve is then Gaussian in form, and the process is always of transient type. The second example is provided by the family of "periodic" processes, in which the birth and death rates are periodic functions of the time $t$. These appear well adapted to describe the response of population growth (or epidemic spread) to the influence of the seasons.
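As a numerical illustration of the extinction result described in the abstract, the following minimal sketch evaluates the probability that a population started from a single individual is extinct by time $t$. It assumes the commonly quoted closed form $P_0(t) = J(t)/(1+J(t))$, with $J(t)=\int_0^t \mu(\tau)e^{\rho(\tau)}\,d\tau$ and $\rho(t)=\int_0^t[\mu(\tau)-\lambda(\tau)]\,d\tau$. This form, the function name, the trapezoidal integration, and all parameter values are assumptions made for illustration; none of them is given in the abstract itself.

```python
import math

def extinction_probability(lam, mu, t, steps=20000):
    """Probability of extinction by time t, starting from one individual.

    lam, mu: callables giving the birth and death rates at time s.
    Assumes P0(t) = J/(1+J), J = integral of mu(s)*exp(rho(s)),
    rho(s) = integral of (mu - lam); both integrals use the trapezoid rule.
    """
    h = t / steps
    rho = 0.0                       # running integral of (mu - lam)
    big_j = 0.0                     # running integral of mu * exp(rho)
    prev_rate = mu(0.0) - lam(0.0)
    prev_f = mu(0.0) * math.exp(rho)
    for k in range(1, steps + 1):
        s = k * h
        rate = mu(s) - lam(s)
        rho += 0.5 * h * (prev_rate + rate)
        f = mu(s) * math.exp(rho)
        big_j += 0.5 * h * (prev_f + f)
        prev_rate, prev_f = rate, f
    return big_j / (1.0 + big_j)

# Constant rates (illustrative values lambda = 2, mu = 1).
p_const = extinction_probability(lambda s: 2.0, lambda s: 1.0, 1.0)

# Arley-type process: constant birth rate, death rate proportional to age
# (illustrative values lambda_0 = 1, mu_1 = 0.5).
p_arley = extinction_probability(lambda s: 1.0, lambda s: 0.5 * s, 20.0)
```

Under these assumptions the Arley-type example approaches 1 as $t$ grows, in line with the abstract's statement that this process is always of transient type.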

... For a brief illustration, we invoke the Crump-Mode-Jagers continuous-time branching process, corresponding to the SI model of disease invasion [68][69][70]. This model has a Markov chain embedding, the so-called Kendall process, whose extinction probability is $1/R_0$ [19]. However, the model also has an embedding consisting of a Bienaymé-Galton-Watson discrete-time branching process, where the distribution of secondary cases is Poisson. ...
... In this paper, we generalize the work of Kendall [19] and Bartlett [20] about disease invasion, to include the mathematical principles of pathogen emergence. Our primary focus is the analytic derivation of the probability that a patient minus one fails to initiate an epidemic of a directly transmitted or vector-borne disease. ...
... The moment expansion closes exactly at the expectation because all the rates of the Markov chain are linear in the stochastic variables. Furthermore, we note that if $i_0$ vanishes, then the Markov chain reduces to the Kendall process [19]. Markov chains can be naturally used to model population extinction. ...
Article
Full-text available
Epidemic or pathogen emergence is the phenomenon by which a poorly transmissible pathogen finds its evolutionary pathway to become a mutant that can cause an epidemic. Many mathematical models of pathogen emergence rely on branching processes. Here, we discuss pathogen emergence using Markov chains, for a more tractable analysis, generalizing previous work by Kendall and Bartlett about disease invasion. We discuss the probability of emergence failure for early epidemics, when the number of infected individuals is small and the number of the susceptible individuals is virtually unlimited. Our formalism addresses both directly transmitted and vector-borne diseases, in the cases where the original pathogen is 1) one step-mutation away from the epidemic strain, and 2) undergoing a long chain of neutral mutations that do not change the epidemiology. We obtain analytic results for the probabilities of emergence failure and two features transcending the transmission mechanism. First, the reproduction number of the original pathogen is determinant for the probability of pathogen emergence, more important than the mutation rate or the transmissibility of the emerged pathogen. Second, the probability of mutation within infected individuals must be sufficiently high for the pathogen undergoing neutral mutations to start an epidemic, the mutation threshold depending again on the basic reproduction number of the original pathogen. Finally, we discuss the parameterization of models of pathogen emergence, using SARS-CoV1 as an example of zoonotic emergence and HIV as an example for the emergence of drug resistance. We also discuss assumptions of our models and implications for epidemiology.
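The excerpts above quote an extinction probability of $1/R_0$ for the Kendall process. A small Monte Carlo sketch can make that figure concrete. It assumes a linear birth-death chain started from one infected individual, with $R_0 = \lambda/\mu > 1$, and follows only the embedded jump chain (each event is a birth with probability $\lambda/(\lambda+\mu)$). The function name, the population cap used to declare a major outbreak, and the trial count are illustrative assumptions, not part of the cited work.

```python
import random

def estimate_extinction(r0, trials=20000, cap=50, seed=1):
    """Monte Carlo estimate of the extinction probability from one case.

    r0: assumed ratio lambda/mu of the linear birth-death chain.
    cap: population size at which a run is counted as a major outbreak
         (extinction from this size is treated as negligible).
    """
    rng = random.Random(seed)
    p_birth = r0 / (1.0 + r0)       # next event is a birth with this probability
    extinct = 0
    for _ in range(trials):
        n = 1
        while 0 < n < cap:
            n += 1 if rng.random() < p_birth else -1
        if n == 0:
            extinct += 1
    return extinct / trials

estimate = estimate_extinction(2.0)  # should lie close to 1/R0 = 0.5
```

Tracking only the jump chain suffices here because whether the chain ever reaches zero does not depend on the timing of events.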
... This is more pronounced when the probability of minor outbreaks is large; i.e. when we have a small number of initial infections or when $R_0$ is close to 1. In Section 3 we consider a birth-death process conditioned on major outbreaks which we approximate by conditioning on non-extinction (Kot, 2001; Kendall, 1948a, 1948b). This better describes a typical major outbreak and we show that it performs well in removing the bias from the estimation of $R_0$. ...
... (1) From this, the rate of change of the expected infectious population is (Feller, 1939; Kendall, 1948a): ...
... We resolved the issue of overestimation by developing a simple birth-death model with conditioning on major outbreaks. To make analytic progress we approximated conditioning on major outbreaks by conditioning against extinction (Equation (5)) (Kot, 2001; Kendall, 1948a, 1948b). We made use of the analytic solution of the simple birth-death master equation and showed that fitting this model to the early stages of epidemic outbreaks resolves the issue of overestimation of $R_0$ (Figs. ...
Article
Full-text available
The basic reproduction number, R0, is a well-known quantifier of epidemic spread. However, a class of existing methods for estimating R0 from incidence data early in the epidemic can lead to an over-estimation of this quantity. In particular, when fitting deterministic models to estimate the rate of spread, we do not account for the stochastic nature of epidemics and that, given the same system, some outbreaks may lead to epidemics and some may not. Typically, an observed epidemic that we wish to control is a major outbreak. This amounts to implicit selection for major outbreaks which leads to the over-estimation problem. We formally characterised the split between major and minor outbreaks by using Otsu's method which provides us with a working definition. We show that by conditioning a ‘deterministic’ model on major outbreaks, we can more reliably estimate the basic reproduction number from an observed epidemic trajectory.
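To illustrate what conditioning against extinction does to the expected trajectory, the sketch below compares the unconditioned mean of a simple birth-death process with its mean given survival to time $t$. It assumes constant rates, one initial case, and the standard closed form for the extinction probability. It is not claimed to reproduce Equation (5) of the cited paper, and the function names and parameter values are hypothetical.

```python
import math

def p_extinct(lam, mu, t):
    """Extinction probability by time t, one initial individual, constant rates."""
    if lam == mu:
        return lam * t / (1.0 + lam * t)
    e = math.exp((lam - mu) * t)
    return mu * (e - 1.0) / (lam * e - mu)

def mean_unconditioned(lam, mu, t):
    """Expected population size, averaging over extinct and surviving runs."""
    return math.exp((lam - mu) * t)

def mean_given_survival(lam, mu, t):
    """Expected size given non-extinction: E[n] = E[n | n > 0] * P(n > 0)."""
    return mean_unconditioned(lam, mu, t) / (1.0 - p_extinct(lam, mu, t))

# Illustrative values: lambda = 2, mu = 1, t = 1.
plain = mean_unconditioned(2.0, 1.0, 1.0)
conditioned = mean_given_survival(2.0, 1.0, 1.0)
```

The conditioned mean is always at least as large as the unconditioned one, which is the direction of the selection effect the abstract describes.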
... We call $\kappa(t; \pi)$ and $\Lambda(t; \pi) = \int_0^t \kappa(x; \pi)\,dx$ the baseline intensity function and the cumulative baseline intensity function, respectively. The above linear NHMPs were discussed by Kendall [11] and Konno [12]. In particular, the NHMP-based SRMs characterized by the transition rates in Eqs. ...
... In particular, the NHMP-based SRMs characterized by the transition rates in Eqs. (11) and (12) are known as the generalized binomial processes (GBPs) and the generalized Polya processes (GPPs), respectively. ...
... The GBP was first considered as an inverse death-process type of NHMP by Kendall [11]. Shanthikumar [25,26] applied the GBP to represent the software fault count processes, whose Kolmogorov forward equations are given by ...
Preprint
Full-text available
This paper explores yet another software reliability modeling framework based on non-homogeneous Markov processes (NHMPs). For two subclasses of NHMPs, generalized binomial processes (GBPs) and generalized Polya processes (GPPs), we formulate 22 novel NHMP-based software reliability models (SRMs) with 11 kinds of baseline intensity functions, which are different from the existing NHMP-based SRMs by Li et al. (2023). Our evaluation of NHMP-based SRMs focuses on assessing their goodness-of-fit and predictive performances using 8 data sets of software fault detection time-domain data and 8 data sets of time-interval data (group data). The results are compared with the well-known NHPP-based SRMs. Through comprehensive numerical experiments, we show that our new modeling framework offers better goodness-of-fit performance in many cases but better predictive performance in only a limited number of cases.
... Note that the probability of extinction is obtained from the linear birth-death process (Kendall, 1948), assuming density independence at the beginning of the growth when the population size is very small compared to the carrying capacity. For nonlinear models, these early extinction events are not taken into account in deterministic approaches. ...
... Parameter values: carrying capacity $K = 100$, birth rate $b = 1$, death rate $d = 0.1$ (in (a)) and $0.5$ (in (b)), and initial population size $N_0 = 1$ (in (a) and (c)). Furthermore, the linear birth-death process, whose analytical solution is known (Kendall, 1948), has a deterministic limit giving the population size averaged over the survival and early extinction trajectories (see Appendix D for details). ...
... Note that the analytical solution of Equation (11c) is known (Kendall, 1948). One can derive the mean population size from the master equation using $\langle N \rangle = \sum_{N=0}^{K} N \, P_{N_0,N}(t)$, which yields ...
Article
Full-text available
Population growth is a fundamental process in ecology and evolution. The population size dynamics during growth are often described by deterministic equations derived from kinetic models. Here, we simulate several population growth models and compare the size averaged over many stochastic realizations with the deterministic predictions. We show that these deterministic equations are generically bad predictors of the average stochastic population dynamics. Specifically, deterministic predictions overestimate the simulated population sizes, especially those of populations starting with a small number of individuals. Describing population growth as a stochastic birth process, we prove that the discrepancy between deterministic predictions and simulated data is due to unclosed-moment dynamics. In other words, the deterministic approach does not consider the variability of birth times, which is particularly important with small population sizes. We show that some moment-closure approximations describe the growth dynamics better than the deterministic prediction. However, they do not reduce the error satisfactorily and only apply to some population growth models. We explicitly solve the stochastic growth dynamics, and our solution applies to any population growth model. We show that our solution exactly quantifies the dynamics of a community composed of different strains and correctly predicts the fixation probability of a strain in a serial dilution experiment. Our work sets the foundations for a more faithful modeling of community and population dynamics. It will allow the development of new tools for a more accurate analysis of experimental and empirical results, including the inference of important growth parameters.
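For the linear birth-death case mentioned in the excerpts above, the average over stochastic realizations (including runs that go extinct early) can be checked against the exponential mean $N_0 e^{(b-d)t}$. The following is a minimal Gillespie-style sketch under that assumption, using $b = 1$, $d = 0.1$ and $N_0 = 1$ from the quoted parameter values. The end time, number of runs, seed, and function names are illustrative choices, and the sketch does not cover the nonlinear models studied in the article.

```python
import math
import random

def simulate_linear_bd(b, d, n0, t_end, rng):
    """One realization of a linear birth-death process; returns N(t_end)."""
    t, n = 0.0, n0
    while n > 0:
        total_rate = (b + d) * n
        t += rng.expovariate(total_rate)   # waiting time to the next event
        if t > t_end:
            break
        # The event is a birth with probability b/(b+d), otherwise a death.
        n += 1 if rng.random() < b / (b + d) else -1
    return n

def mean_population(b, d, n0, t_end, runs=5000, seed=0):
    """Average N(t_end) over independent runs, extinct runs included."""
    rng = random.Random(seed)
    total = sum(simulate_linear_bd(b, d, n0, t_end, rng) for _ in range(runs))
    return total / runs

simulated = mean_population(1.0, 0.1, 1, 2.0)
expected = math.exp((1.0 - 0.1) * 2.0)     # mean of the linear process
```

In the linear case the two values should agree up to sampling noise, because the moment equations close at first order when all rates are linear in the population size.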
... These epidemiological setups are known under the name of disease invasion. The mathematical foundations to conceptualize disease invasion were laid by Kendall [19] and Bartlett [20] more than 50 years ago. More recently, Allen et al. [21,22] proposed generalizations of the mathematical formalism describing disease invasion. ...
... The choice of mathematical/numerical framework is very important to solve emergence problems. Kendall [19] and Bartlett [20] used Markov chains to solve analytically for the probability that an imported case is a patient zero, who initiates an epidemic, or else the transmission process s/he initiates goes extinct. Their results would have been very different should they have used branching processes instead of Markov chains. ...
... For a brief illustration, we invoke the Crump-Mode-Jagers continuous-time branching process, corresponding to the SI model of disease invasion [68][69][70]. This model has a Markov chain embedding, the so-called Kendall process, whose extinction probability is $1/R_0$ [19]. However, the model also has an embedding consisting of a Bienaymé-Galton-Watson discrete-time branching process, where the distribution of secondary cases is Poisson. ...
Preprint
Full-text available
Epidemic or pathogen emergence is the phenomenon by which a poorly transmissible pathogen finds its evolutionary pathway to become a mutant which can cause an epidemic. Many mathematical models of pathogen emergence rely on branching processes. Here, we discuss pathogen emergence using Markov chains, for a more tractable analysis, generalizing previous work by Kendall and Bartlett about disease invasion. We discuss the probability of emergence failure for early epidemics, when the number of infected individuals is small and the number of the susceptible individuals is virtually unlimited. Our formalism addresses both directly transmitted and vector-borne diseases, in the cases where the original pathogen is 1) one step-mutation away from the epidemic strain, and 2) undergoing a long chain of neutral mutations which does not change the epidemiology. We obtained analytic results for the probabilities of emergence failure, and two features transcending the transmission mechanism. First, the reproduction number of the original pathogen is determinant for the probability of pathogen emergence, more important than the mutation rate or the transmissibility of the emerged pathogen. Second, the probability of mutation within infected individuals must be sufficiently high for the pathogen undergoing neutral mutations to start an epidemic, the mutation threshold depending again on the basic reproduction number of the original pathogen. Finally, we discuss the parameterization of models of pathogen emergence, using SARS-CoV1 as an example of zoonotic emergence, and HIV as an example for the emergence of drug resistance. We also discuss assumptions of our models and implications for epidemiology. Highlights • The foundations of modeling disease invasion were laid more than 50 years ago. • Markov chains provide a unifying framework for both disease invasion and emergence. • We obtain analytic results for the probability of pathogen emergence.
... The classical birth-death process has constant birth and death rates (λ and µ, respectively) and has a long history [1,6]. More general models were subsequently investigated (e.g., [7]), which allow these rates to depend on time or on the number of species present at a given time. In all such models, the tree either becomes extinct or the number of leaves tends to infinity with probability 1 as t → ∞ (e.g. by a more general result due to Jagers [8]). ...
... In particular, R s t (λ, θ) converges to 0 if λ ≤ θ and converges to a strictly positive value, namely 1 − θ/λ, if λ > θ [7,43]. ...
Preprint
Full-text available
A wide variety of stochastic models of cladogenesis (based on speciation and extinction) lead to an identical distribution on phylogenetic tree shapes once the edge lengths are ignored. By contrast, the distribution of the tree's edge lengths is generally quite sensitive to the underlying model. In this paper, we review the impact of different model choices on tree shape and edge length distribution, and its impact for studying the properties of phylogenetic diversity (PD) as a measure of biodiversity, and the loss of PD as species become extinct at the present. We also compare PD with a stochastic model of feature diversity, and investigate some mathematical links and inequalities between these two measures plus their predictions concerning the loss of biodiversity under extinction at the present.
... We further identify four situations that we argue need to be explicitly handled by methods that infer reassortment. For this, we use a simple linear birth-death Markov chain simulation (Kendall 1948;Stadler 2010) to show that these situations are common even in small samples with perfect data for very simple epidemiological models. By simulating sequence data from these models, we also assess the error properties of , and e). ...
... We implemented the conceptual model as a linear birth-death Markov process (Feller 1939; Kendall 1948) using the suggested formalism for simulating genealogical processes in King et al. (2022). The model contains two distinct levels of biological processes: epidemiology of the host population and ancestry within the viral population (supplementary table S1, Supplementary Material online). ...
Article
Full-text available
Reassortment is an evolutionary process common in viruses with segmented genomes. These viruses can swap whole genomic segments during cellular co-infection, giving rise to novel progeny formed from the mixture of parental segments. Because large-scale genome rearrangements have the potential to generate new phenotypes, reassortment is important to both evolutionary biology and public health research. However, statistical inference of the pattern of reassortment events from phylogenetic data is exceptionally difficult, potentially involving inference of general graphs in which individual segment trees are embedded. In this paper, we argue that, in general, the number and pattern of reassortment events are not identifiable from segment trees alone, even with theoretically ideal data. We call this fact the fundamental problem of reassortment, which we illustrate using the concept of the `first-infection tree', a typically but not always counterfactual genealogy that would have been observed in the segment trees had no reassortment occurred. Further, we illustrate four additional problems that can arise logically in the inference of reassortment events and show, using simulated data, that these problems are not rare and can potentially distort our perception of reassortment even in small data sets. Finally, we discuss how existing methods can be augmented or adapted to account for not only the fundamental problem of reassortment but also the four additional situations that can complicate the inference of reassortment.
... On taking u = 0, we get ...
... In (3.4), we use the following expanded form of the cgf (see Kendall (1948), Eq. (35)): ...
Preprint
In this paper, we consider a generalized birth-death process (GBDP) and examine its linear versions. Using its transition probabilities, we obtain the system of differential equations that governs its state probabilities. The distribution function of its waiting time in state $s$ given that it starts in state $s$ is obtained. For a linear version of it, namely, the generalized linear birth-death process (GLBDP), we obtain the probability generating function, mean, variance and the probability of ultimate extinction of population. Also, we obtain the maximum likelihood estimate of one of its parameters. The differential equations that govern the joint cumulant generating functions of the population size with cumulative births and cumulative deaths are derived. In the case of constant birth and death rates in GBDP, the explicit forms of the state probabilities, joint probability mass functions of population size with cumulative births and cumulative deaths, and their marginal probability mass functions are obtained. It is shown that the Laplace transform of a stochastic integral of GBDP satisfies its Kolmogorov backward equation with certain scaled parameters. Also, the first two moments of the stochastic path integral of GLBDP are obtained. Later, we consider the immigration effect in GLBDP for two different cases. An application of a linear version of GBDP and its stochastic path integral to a vehicles parking management system is discussed.
... Modern methods of reconstructing phylogenetic trees depend rather critically on an understanding of simple, neutral stochastic models of the interaction between mutation and speciation which produce 'prior' probability distributions for use in Bayesian-type analyses (Felsenstein 2004; Drummond and Rambaut 2007; Mulder and Crawford 2015). The most widely used of these models for speciation and extinction is the birth-death process (Kendall 1948; Nee et al. 1994; Rannala and Yang 1996; Nee 2006; Gernhard 2008; Gernhard et al. 2008; Stadler 2009), which simplifies to the birth-only or Yule process in the absence of extinction events (Yule 1924). According to this latter model, extant species have a constant chance of diverging into two branches in unit time. ...
... Under our assumption of a simple BD process, it is well known that the a priori probability that after some time t the tree will have produced exactly k ≥ 1 leaves, or tips, follows the distribution (Kendall 1948), ...
Article
In this contribution, a general expression is derived for the probability density of the time to the most recent common ancestor (TMRCA) of a simple birth-death tree, a widely used stochastic null-model of biological speciation and extinction, conditioned on the constant birth and death rates and number of extant lineages. This density is contrasted with a previous result which was obtained using a uniform prior for the time of origin. The new distribution is applied to two problems of phylogenetic interest. First, that of the probability of the number of taxa existing at any time in the past in a tree of a known number of extant species, and given birth and death rates, and second, that of determining the TMRCA of two randomly selected taxa in an unobserved tree that is produced by a simple birth-only, or Yule, process. In the latter case, it is assumed that only the rate of bifurcation (speciation) and the size, or number of tips, are known. This is shown to lead to a closed-form analytical expression for the probability distribution of this parameter, which is arrived at based on the known mathematical form of the age distribution of Yule trees of a given size and branching rate, which is derived here de novo, and a similar distribution which additionally is conditioned on tree age. The new distribution is the exact Yule prior for divergence times of pairs of taxa under the stated conditions and is potentially useful in statistical (Bayesian) inference studies of phylogenies.
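One of the excerpts above refers to the distribution of the number of tips after time $t$ under a simple birth-death process, but the formula itself is elided there. As a hedged illustration, the sketch below codes the modified-geometric form that is commonly attributed to Kendall for a process started from a single lineage with constant rates $\lambda \neq \mu$. It may differ from the elided expression (for instance in its conditioning), and the function name and parameter values are assumptions.

```python
import math

def bd_pmf(n, lam, mu, t):
    """P(N(t) = n) for a simple birth-death process from one lineage.

    Assumes constant rates with lam != mu and the modified-geometric form
    P_0 = xi, P_n = (1 - xi) * (1 - eta) * eta**(n - 1) for n >= 1.
    """
    e = math.exp((lam - mu) * t)
    xi = mu * (e - 1.0) / (lam * e - mu)    # probability of extinction by t
    eta = lam * (e - 1.0) / (lam * e - mu)  # geometric ratio for n >= 1
    if n == 0:
        return xi
    return (1.0 - xi) * (1.0 - eta) * eta ** (n - 1)

# Illustrative check with lambda = 1, mu = 0.5, t = 2: the probabilities
# should sum to 1 and the mean should equal exp((lambda - mu) * t).
probs = [bd_pmf(n, 1.0, 0.5, 2.0) for n in range(2000)]
total = sum(probs)
mean = sum(n * p for n, p in enumerate(probs))
```

Conditioning on at least one surviving lineage, as in the excerpt's case $k \geq 1$, would amount to dividing the $n \geq 1$ terms by $1 - \xi$.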
... The probability $P(Z(t_{i+1}) = z_{i+1} \mid Z(t_i) = z_i, \lambda, \mu)$ that there are $z_{i+1}$ individuals at time $t_{i+1}$ given that there are $z_i$ individuals at time $t_i$ follows from Keiding [1975] as where $\alpha(t)$ and $\beta(t)$ are defined by [Kendall, 1948]. The Markov property implies that the likelihood for $\lambda$ and $\mu$, given an observation $y_{\text{obs}}$ where $Z(t_1) = z_1, \ldots$ ...
Preprint
Full-text available
Approximate Bayesian Computation (ABC) is a popular inference method when likelihoods are hard to come by. Practical bottlenecks of ABC applications include selecting statistics that summarize the data without losing too much information or introducing uncertainty, and choosing distance functions and tolerance thresholds that balance accuracy and computational efficiency. Recent studies have shown that ABC methods using random forest (RF) methodology perform well while circumventing many of ABC's drawbacks. However, RF construction is computationally expensive for large numbers of trees and model simulations, and there can be high uncertainty in the posterior if the prior distribution is uninformative. Here we adapt distributional random forests to the ABC setting, and introduce Approximate Bayesian Computation sequential Monte Carlo with random forests (ABC-SMC-(D)RF). This updates the prior distribution iteratively to focus on the most likely regions in the parameter space. We show that ABC-SMC-(D)RF can accurately infer posterior distributions for a wide range of deterministic and stochastic models in different scientific areas.
... In (23), we use the following expanded form of the cgf (see [7], Eq. (35)): ...
Article
Full-text available
In this paper, we consider a generalized birth–death process (GBDP) and examine its linear versions. Using its transition probabilities, we obtain the system of differential equations that governs its state probabilities. The distribution function of its waiting time in state s given that it starts in state s is obtained. For a linear version of it, namely the generalized linear birth–death process (GLBDP), we obtain the probability generating function, mean, variance and the probability of ultimate extinction of population. Also, we obtain the maximum likelihood estimate of its parameters. The differential equations that govern the joint cumulant generating functions of the population size with cumulative births and cumulative deaths are derived. In the case of constant birth and death rates in GBDP, the explicit forms of the state probabilities, joint probability mass functions of population size with cumulative births and cumulative deaths, and their marginal probability mass functions are obtained. It is shown that the Laplace transform of an integral of GBDP satisfies its Kolmogorov backward equation with certain scaled parameters. The first two moments of the path integral of GLBDP are obtained. Also, we consider the immigration effect in GLBDP for two different cases. An application of a linear version of GBDP and its path integral to a vehicles parking management system is discussed. Later, we introduce a time-changed version of the GBDP where time is changed via an inverse stable subordinator. We show that its state probabilities are governed by a system of fractional differential equations.
... Species origination and extinction times are generated using a stochastic birth-death process [94,95]. We use time-forward simulations in which the time of origin ($t_O$) is randomly drawn from a uniform distribution, here set to $t_O \sim U[30, 100]$. ...
Article
Full-text available
Understanding how biodiversity has changed through time is a central goal of evolutionary biology. However, estimates of past biodiversity are challenged by the inherent incompleteness of the fossil record, even when state-of-the-art statistical methods are applied to adjust estimates while correcting for sampling biases. Here we develop an approach based on stochastic simulations of biodiversity and a deep learning model to infer richness at global or regional scales through time while incorporating spatial, temporal and taxonomic sampling variation. Our method outperforms alternative approaches across simulated datasets, especially at large spatial scales, providing robust palaeodiversity estimates under a wide range of preservation scenarios. We apply our method on two empirical datasets of different taxonomic and temporal scope: the Permian-Triassic record of marine animals and the Cenozoic evolution of proboscideans. Our estimates provide a revised quantitative assessment of two mass extinctions in the marine record and reveal rapid diversification of proboscideans following their expansion out of Africa and a >70% diversity drop in the Pleistocene.
... Deaths within the cohort are shown as age-specific mortality rates. 18,40 By studying mortality life tables for similar populations, it is possible to determine elevated death rates, which can be avoided by applying standard medical care across a select population. ...
Thesis
Full-text available
In this work, a novel approach to measuring amenable deaths is introduced. The lowest age-specific mortality rates in the USA have been isolated to create normative life tables. The concept of normative life tables was first described in the context of the Global Burden of Disease Study at the University of Washington in Seattle, for measuring the general burden of disease in specific populations. Normative life tables provide an ideal life table for the USA, and shed light on shortcomings in states with comparatively high mortality rates. The normative life table approach is applied for a chronic and frequent health condition in the USA, namely COPD (Chronic Obstructive Lung Disease). The lowest COPD mortality rates in the USA for 2016 have been isolated to create normative COPD life tables. These normative life tables show the best practice for COPD in the USA. Excess deaths in COPD across the states are regarded as amenable deaths, i.e., deaths that with timely and effective medical interventions and public health efforts could have been prevented. California has the lowest proportion of amenable deaths due to COPD. Texas has moderate mortality rates for COPD, while Kentucky has the highest COPD mortalities in the USA, and therefore the highest proportion of amenable deaths in COPD. These changes are also reflected in the life expectancy of individuals with COPD. California has the highest life expectancy for individuals with COPD. In 2016, 50-year-olds with COPD in California were expected to live for an additional 21.01 years, while in Texas they had an additional 17.82 years to live and in Kentucky, only 11.94 years. The normative life table approach adds to current efforts by providing a fair way of measuring health care performance. It acts as an indicator of health care quality by measuring the share of amenable deaths that, with timely and effective medical interventions and public health efforts, could have been avoided.
... We used an uncorrelated relaxed molecular clock model (Drummond et al. 2006) and an exponential prior, with time tree and clock model linked across partitions. We applied a node-dating approach with a birth-death tree prior (Kendall 1948 (Lu et al. 2019). Sole et al. (2013) calibrated this node with an age around the Jurassic (144-200 Mya), which, however, is less likely and not used here because both fossil and phylogenomic evidence from previous studies suggest a mid-Cretaceous divergence between Crocinae and Nemopterinae (Winterton et al. 2018, Lu et al. 2019, Vasilikopoulos et al. 2020). ...
Article
The spoon-winged lacewings (Neuroptera: Nemopteridae: Nemopterinae) are a group of charismatic insects with morphological and biological specializations. Among the known 105 species of Nemopterinae worldwide, only one species, namely Nemopistha sinica Yang, 1986, is recorded from East Asia. However, the morphology, taxonomic status, and evolutionary history of this rare species are poorly known. Here, we present a systematic revision of the Chinese Nemopterinae and establish a new genus, Sinonemoptera, that comprises Sinonemoptera sinica (Yang, 1986) comb. nov. from western Yunnan and a new species, Sinonemoptera tibetana sp. nov., from southeastern Tibet. Based on the phylogeny of Nemopterinae combining morphological and molecular evidence, Nemopterinae are divided into two major clades by the length of the adult abdomen, and Sinonemoptera gen. nov. together with some Afrotropical genera constitute a monophyletic lineage characterized by a long abdomen. Our results suggest a Late Cretaceous African origin and three Tertiary transcontinental dispersals in shaping the global distribution of Nemopterinae. Our ecological niche modelling demonstrates the specific requirement for warm and dry habitats in nemopterines and highlights the urgent need for protection of the savannah-like habitat along the Nujiang valley for the Chinese Nemopterinae.
... Birth-death processes are used to model the phylogenetic branching process on a macroevolutionary scale, thus revealing expected patterns of speciation and extinction (Kendall, 1948; Nee, 2006; Raup, 1985; "The Reconstructed Evolutionary Process," 1994). A rich theoretical framework for this process allows us to describe the distribution of branch lengths (Mooers et al., 2011; Steel & Mooers, 2010), growth dynamics ("Extinction Rates Can Be Estimated from Molecular Phylogenies," 1994; Stadler, 2008), and topological properties (Lambert & Stadler, 2013; Mooers & Heard, 1997) of phylogenetic trees. ...
Article
Gene-flow processes such as hybridization and introgression play important roles in shaping diversity across the tree of life. Recent studies extending birth-death models have made it possible to investigate patterns of reticulation in a macroevolutionary context. These models allow for different macroevolutionary patterns of gene flow events that can either add, maintain, or remove lineages—with the gene flow itself possibly being dependent on the relatedness between species—thus creating complex diversification scenarios. Further, many reticulate phylogenetic inference methods assume specific reticulation structures or phylogenies belonging to certain network classes. However, the distributions of phylogenetic networks under reticulate birth-death processes are poorly characterized, and it is unknown whether they violate common methodological assumptions. We use simulation techniques to explore phylogenetic network space under a birth-death-hybridization process where the hybridization rate can have a linear dependence on genetic distance. Specifically, we measured the number of lineages through time and role of hybridization in diversification along with the proportion of phylogenetic networks that belong to commonly used network classes (e.g., tree-child, tree-based, or level-1 networks). We find that the growth of phylogenetic networks and class membership are largely affected by assumptions about macroevolutionary patterns of gene flow. In accordance with previous studies, a lower proportion of networks belonged to these classes based on type and density of reticulate events. However, under a birth-death-hybridization process, these factors form an antagonistic relationship; the type of reticulation events that cause high membership proportions also lead to the highest reticulation density, consequently lowering the overall proportion of phylogenies in some classes. 
Further, we observed that genetic distance–dependent gene flow and incomplete sampling increase the proportion of class membership, primarily due to having fewer reticulate events. Our results can inform studies if their biological expectations of gene flow are associated with evolutionary histories that satisfy the assumptions of current methodology and aid in finding phylogenetic classes that are relevant for methods development.
... The distribution of modern diversity predicted by BDPs (homogeneous or epochally time-varying across all species) is geometric [8] [14], and this remains the case even when non-selective mass extinctions are considered [15]. However, a certain amount of evidence suggests that extant sizes are in fact over-dispersed relative to this expectation. ...
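For reference, the geometric form can be written out in the simplest setting. The following is a sketch for a constant-rate process started from a single lineage at time 0, with speciation rate $\lambda$ and extinction rate $\mu$ ($\lambda \neq \mu$); the symbols $\alpha_t$ and $\beta_t$ are labels chosen here for illustration, not notation taken from the citing text.

$$
\alpha_t = \frac{\mu\,\bigl(e^{(\lambda-\mu)t}-1\bigr)}{\lambda e^{(\lambda-\mu)t}-\mu},
\qquad
\beta_t = \frac{\lambda\,\bigl(e^{(\lambda-\mu)t}-1\bigr)}{\lambda e^{(\lambda-\mu)t}-\mu},
$$

$$
P(n_t = 0) = \alpha_t,
\qquad
P(n_t = n) = (1-\alpha_t)(1-\beta_t)\,\beta_t^{\,n-1}, \quad n \ge 1 .
$$

Conditional on survival, the clade size is therefore geometric with mean $1/(1-\beta_t)$, and the unconditional mean is $(1-\alpha_t)/(1-\beta_t) = e^{(\lambda-\mu)t}$.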
Preprint
Rate shifts in speciation and extinction have been recognised as important contributors to the creation of evolutionary patterns. In particular, the distribution of modern clade sizes is difficult to reconcile with models that do not include them. Although recent advances have allowed rate shifts to be integrated into evolutionary models, these have largely been for the purpose of inferring historical rate shifts across phylogenetic trees. In addition, these models have typically assumed an independence between patterns of diversification and rates of molecular and morphological evolution, despite there being mounting evidence of a connection between them. Here, we develop a new model with two principal goals: first, to explore the general patterns of diversification implied by constantly changing rates, and secondly to integrate diversification, molecular and morphological evolution into a single coherent framework. We thus develop and analyse a covariant birth-death process in which rates of all evolutionary processes (i.e. speciation, extinction and molecular and morphological change) covary continuously, both for each species and through time. We use this model to show that modern diversity is likely to be dominated by a small number of extremely large clades at any historical epoch; that these large clades are expected to be characterised by explosive early radiations accompanied by elevated rates of molecular evolution; and that extant organisms are likely to have evolved from species with unusually fast evolutionary rates. In addition, we show that under such a model, the amount of molecular change along a particular lineage is essentially independent of its height, which further weakens the molecular clock hypothesis. Finally, our model predicts the existence of "living fossil" sister groups to large clades that are both species poor and have exhibited slow rates of morphological and molecular change. 
Although our model is highly stochastic, it includes no special evolutionary moments or epochs. Our results thus demonstrate that the observed historical patterns of evolution can be modelled without invoking special evolutionary mechanisms or innovations that are unique to specific times or taxa, even when they are highly non-uniform: instead they could emerge from a process that is fundamentally homogeneous throughout time.
... This extinction rate µ is a per-capita rate, expressed in E/MSY. The birth-death process was initially developed to model population demography and has been intensively analyzed mathematically, for example by Kendall (1948). It was introduced in the field by Yule (1925), and recently used to develop process-based Bayesian approaches for estimating origination and extinction rates from occurrence or stratigraphic range data (Silvestro et al., 2014;Stadler et al., 2018). ...
... Both λ and μ are non-negative real numbers, and ρ is a real number between zero and one. A standard birth-death model is the constant-rate birth-death (crBD) process (Kendall 1948; Thompson 1975; Nee 2006). It is age-independent such that rates are constant over time, and the time until speciation or extinction is exponentially distributed. ...
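The exponential waiting times described in this excerpt can be illustrated with a small event-driven simulation. This is a minimal sketch, assuming a constant per-lineage speciation rate and extinction rate and complete sampling (the parameter ρ from the excerpt is not modelled); the function name `simulate_crbd` and its arguments are hypothetical and do not come from the cited work.

```python
import random


def simulate_crbd(birth, death, t_max, n0=1, rng=None):
    """Simulate lineage counts under a constant-rate birth-death process.

    Returns a list of (time, number_of_lineages) pairs, one per event,
    starting with (0.0, n0). Stops at extinction or when the next event
    would fall after t_max.
    """
    rng = rng or random.Random()
    t, n = 0.0, n0
    history = [(t, n)]
    while n > 0:
        total_rate = n * (birth + death)
        if total_rate <= 0.0:
            break  # no events can occur
        # Waiting time to the next event is exponential with the total rate.
        t += rng.expovariate(total_rate)
        if t > t_max:
            break
        # The event is a speciation with probability birth / (birth + death).
        if rng.random() < birth / (birth + death):
            n += 1
        else:
            n -= 1
        history.append((t, n))
    return history


if __name__ == "__main__":
    for time, lineages in simulate_crbd(1.0, 0.5, 3.0, rng=random.Random(7)):
        print(f"{time:.3f}\t{lineages}")
```

Because each lineage behaves independently and identically, only the lineage count needs to be tracked here; recording the tree itself would require storing parent-child relationships at each speciation.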
Article
Birth-death models are stochastic processes describing speciation and extinction through time and across taxa, and are widely used in biology for inference of evolutionary timescales. Previous research has highlighted how the expected trees under the constant-rate birth-death (crBD) model tend to differ from empirical trees, for example with respect to the amount of phylogenetic imbalance. However, our understanding of how trees differ between the crBD model and the signal in empirical data remains incomplete. In this Point of View, we aim to expose the degree to which the crBD model differs from empirically inferred phylogenies and test the limits of the model in practice. Using a wide range of topology indices to compare crBD expectations against a comprehensive dataset of 1189 empirically estimated trees, we confirm that crBD model trees frequently differ topologically compared with empirical trees. To place this in the context of standard practice in the field, we conducted a meta-analysis for a subset of the empirical studies. When comparing studies that used Bayesian methods and crBD priors with those that used other non-crBD priors and non-Bayesian methods (i.e., maximum likelihood methods), we do not find any significant differences in tree topology inferences. To scrutinize this finding for the case of highly imbalanced trees, we selected the 100 trees with the greatest imbalance from our dataset, simulated sequence data for these tree topologies under various evolutionary rates, and re-inferred the trees under maximum likelihood and using the crBD model in a Bayesian setting. We find that when the substitution rate is low, the crBD prior results in overly balanced trees, but the tendency is negligible when substitution rates are sufficiently high. 
Overall, our findings demonstrate the general robustness of crBD priors across a broad range of phylogenetic inference scenarios, but also highlight that empirically observed phylogenetic imbalance is highly improbable under the crBD model, leading to systematic bias in data sets with limited information content.
... A promising but underexplored approach to extinction is the neutral theory of biodiversity (NT) (7), a general model of the abundance and diversity dynamics of species in a community of fixed size (Fig. 1). This model is neutral because individuals are competitively identical regardless of species identity; the success or failure of species is therefore stochastic, determined solely by zero-sum ecological drift [the zero-sum constraint being perhaps its most important innovation over earlier birth-death models (8,9)]. NT thus dispenses with some features of biological interest like ecologically determined variation in fitness (10), but the resulting simplicity allows the theory to generate testable and often successful predictions across a broad array of ecological and evolutionary variables, including species abundances, species-area relationships, and phylogenetic tree shape (11). ...
Article
Red Queen (RQ) theory states that adaptation does not protect species from extinction because their competitors are continually adapting alongside them. RQ was founded on the apparent independence of extinction risk and fossil taxon age, but analytical developments have since demonstrated that age-dependent extinction is widespread, usually most intense among young species. Here, we develop ecological neutral theory as a general framework for modeling fossil species survivorship under incomplete sampling. We show that it provides an excellent fit to a high-resolution dataset of species durations for Paleozoic zooplankton and more broadly can account for age-dependent extinction seen throughout the fossil record. Unlike widely used alternative models, the neutral model has parameters with biological meaning, thereby generating testable hypotheses on changes in ancient ecosystems. The success of this approach suggests reinterpretations of mass extinctions and of scaling in eco-evolutionary systems. Intense extinction among young species does not necessarily refute RQ or require a special explanation but can instead be parsimoniously explained by neutral dynamics operating across species regardless of age.
... Amidst this variation, we are unaware of any methods that, within a single cohesive codebase, can simultaneously (i) simulate under arbitrarily complex SSE scenarios (but see [78]), (ii) support an intuitive model specification grammar (e.g., [14,30]), (iii) be easily extended by others to include new models, and (iv) showcase a built-in graphical user interface for automatic visualization and summarization of synthetic data, streamlining user interaction with the software (but see [14]). In the hope of filling this gap in the computational biology toolbox, we introduce a new, open-source computational framework for evolutionary modeling: PhyloJunction. PhyloJunction ships with a very general SSE model simulator and with additional functionalities for model validation and Bayesian analysis. ...
Preprint
We introduce PhyloJunction, a computational framework designed to facilitate the prototyping, testing, and characterization of evolutionary models. PhyloJunction is distributed as an open-source Python library that can be used to implement a variety of models, through its flexible graphical modeling architecture and dedicated model specification language. Model design and use are exposed to users via command-line and graphical interfaces, which integrate the steps of simulating, summarizing, and visualizing data. This paper describes the features of PhyloJunction - which include, but are not limited to, a general implementation of a popular family of phylogenetic diversification models - and, moving forward, how it may be expanded to not only include new models, but to also become a platform for conducting and teaching statistical learning.
... The partitioned analysis was run using the best substitution model selected by bmodeltest (Bouckaert and Drummond 2017) under all the reversible model search implemented in BEAST. The run was also performed under the Birth-Death process as tree prior (Kendall 1948; Rannala and Yang 1996; Gernhard 2008), and the Fast Relaxed Clock Log-Normal (Zhang and Drummond 2020). We used two fossil calibrations, the first for the stem lineage of Doryteuthis Naef 1912 to 48 million years ago (Mya), the minimum age of a statolith fossil from the middle Eocene (Neige et al. 2016). ...
Article
Cephalopod fisheries are increasing, but little is known about the cryptic diversity of some key commercial species. Recent studies have shown that cryptic speciation is common in cephalopods, including several oceanic squids formerly considered ‘cosmopolitan species.’ Further efforts are needed to investigate the cryptic diversity of commercial species, to inform management and support sustainable fisheries practices. Thysanoteuthis rhombus is an oceanic squid, currently recognized as the single species of the family Thysanoteuthidae. Thysanoteuthis rhombus has a global distribution in tropical and subtropical waters and is an economically important species, with the highest catches occurring off Okinawa in Japan, and is a potential fishery resource for other countries due to its high abundance and large size. Here, we used sequences from 12S rRNA, 16S rRNA, and cytochrome c oxidase I to characterize its cryptic diversity using samples collected throughout most of its known geographic range. We identified three different putative species whose distributions are concordant with main ocean basins: Thysanoteuthis major, the most abundant species, is widely distributed in the North Pacific Ocean, North Indian Ocean, and limits of the South Atlantic Ocean; Thysanoteuthis rhombus is distributed in the North and South Atlantic Ocean and Mediterranean Sea; and Thysanoteuthis cf. filiferum, likely the least sampled to date, is found in the southwestern Pacific Ocean. A sister relationship was observed between T. rhombus and T. major, and T. cf. filiferum was found to be the most divergent species. Based on our divergence estimation, we hypothesize that the closure of the Isthmus of Panama during the early Pliocene played a significant role in the split of T. rhombus and T. major, while the split of their ancestor from T. cf. filiferum coincided with an increase in the Pacific Walker Circulation and the longitudinal gradient of surface temperatures in the Pacific Ocean during the Late Oligocene and Early Miocene. Our work identifies three different putative species within Thysanoteuthis and has potential use for improving fishery management and conserving the diversity in these species.
... Starting with the seminal works of the paleontologists of the "Woods Hole group" (Raup et al., 1973) and drawing on parallel mathematical progress (Kendall, 1948;Nee et al., 1994;Aldous, 2001;Aldous and Popovic, 2005), a powerful quantitative method has been developed in macroevolution, using birth-death processes as models for the diversification of species. In these so-called lineage-based models of diversification, species are particles that can undergo two kinds of events: speciation, modeled by instantaneous division; and extinction, modeled by instantaneous death. ...
Preprint
In the last two decades, lineage-based models of diversification, where species are viewed as particles that can divide (speciate) or die (become extinct) at rates depending on some evolving trait, have been very popular tools to study macroevolutionary processes. Here, we argue that this approach cannot be used to break down the inner workings of species diversification and that "opening the species box" is necessary to understand the causes of macroevolution. We set up a general framework for individual-based models of neutral speciation (i.e. no selection forces other than those acting against hybrids) that rely on a minimal number of mechanistic principles: (i) reproductive isolation is caused by excessive dissimilarity between pheno/genotypes; (ii) dissimilarity results from a balance between differentiation processes and homogenization processes; and (iii) dissimilarity can feed back on these processes by decelerating homogenization. We classify such models according to the main process responsible for homogenization: (1) clonal evolution models (ecological drift), (2) models of genetic isolation (gene flow) and (3) models of isolation by distance (spatial drift). We review these models and their specific predictions on macroscopic variables such as species abundances, speciation rates, interfertility relationships, phylogenetic tree structure... We propose new avenues of research by displaying conceptual questions remaining to be solved and new models to address them: the failure of speciation at secondary contact, the feedback of dissimilarity on homogenization, the emergence in space of reproductive barriers.
... Phylodynamic models can be classified into two main families: coalescent (Volz et al., 2009; Drummond et al., 2005; Pybus et al., 2000) and birth-death (BD) (Kendall, 1948; Maddison et al., 2007; Stadler, 2009, 2010). Coalescent models are often preferred for estimating deterministic population dynamics; however, BD models are better adapted for highly stochastic processes, such as the dynamics of emerging pathogens (Macpherson et al., 2021). ...
Article
Multi-type birth-death (MTBD) models are phylodynamic analogies of compartmental models in classical epidemiology. They serve to infer such epidemiological parameters as the average number of secondary infections Re and the infectious time from a phylogenetic tree (a genealogy of pathogen sequences). The representatives of this model family focus on various aspects of pathogen epidemics. For instance, the birth-death exposed-infectious (BDEI) model describes the transmission of pathogens featuring an incubation period (when there is a delay between the moment of infection and becoming infectious, as for Ebola and SARS-CoV-2), and permits its estimation along with other parameters. With constantly growing sequencing data, MTBD models should be extremely useful for unravelling information on pathogen epidemics. However, existing implementations of these models in a phylodynamic framework have not yet caught up with the sequencing speed. Computing time and numerical instability issues limit their applicability to medium data sets ( 500 samples), while the accuracy of estimations should increase with more data. We propose a new highly parallelizable formulation of ordinary differential equations for MTBD models. We also extend them to forests to represent situations when a (sub-)epidemic started from several cases (e.g., multiple introductions to a country). We implemented it for the BDEI model in a maximum likelihood framework using a combination of numerical analysis methods for efficient equation resolution. Our implementation estimates epidemiological parameter values and their confidence intervals in two minutes on a phylogenetic tree of 10 000 samples. Comparison to the existing implementations on simulated data shows that it is not only much faster, but also more accurate. An application of our tool to the 2014 Ebola epidemic in Sierra Leone is also convincing, with very fast calculation and precise estimates. 
As MTBD models are closely related to Cladogenetic State Speciation and Extinction (ClaSSE)-like models, our findings could also be easily transferred to the macroevolution domain.
... In particular, R s t (λ, θ) converges to 0 if λ ≤ θ and converges to a strictly positive value 1 − θ/λ if λ > θ (Kendall 1948; Yang and Rannala 1997). ...
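The limit quoted here matches the classical long-run survival probability of a constant-rate birth-death process started from a single lineage. As a hedged illustration, assuming that is the quantity meant, the sketch below evaluates the finite-time survival probability with birth rate λ and death rate θ and shows it approaching 1 − θ/λ or 0. The function name `survival_probability` is hypothetical, and the excerpt's own notation is not reconstructed.

```python
import math


def survival_probability(birth, death, t):
    """Probability that a single lineage has living descendants at time t
    under a constant-rate birth-death process."""
    if t == 0:
        return 1.0
    if math.isclose(birth, death):
        # Critical case: survival decays like 1 / (1 + birth * t).
        return 1.0 / (1.0 + birth * t)
    r = birth - death
    if r > 0:
        # Supercritical: tends to 1 - death / birth as t grows.
        return r / (birth - death * math.exp(-r * t))
    # Subcritical: same expression rearranged to avoid overflow in exp.
    e = math.exp(r * t)
    return r * e / (birth * e - death)


if __name__ == "__main__":
    for t in (0.0, 1.0, 10.0, 100.0):
        print(t, survival_probability(2.0, 1.0, t), survival_probability(1.0, 2.0, t))
```

With a birth rate of 2 and a death rate of 1 the values approach 0.5, which equals 1 − θ/λ; with the rates swapped they decay to 0.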
Article
The current rapid extinction of species leads not only to their loss but also the disappearance of the unique features they harbour, which have evolved along the branches of the underlying evolutionary tree. One proxy for estimating the feature diversity (FD) of a set S of species at the tips of a tree is 'phylogenetic diversity' (PD): the sum of the branch lengths of the subtree connecting the species in S. For a phylogenetic tree that evolves under a standard birth-death process, and which is then subject to a sudden extinction event at the present (the simple 'field of bullets' model with a survival probability of s per species) the proportion of the original PD that is retained after extinction at the present is known to converge quickly to a particular concave function [Formula: see text] as t grows. To investigate how the loss of FD mirrors the loss of PD for a birth-death tree, we model FD by assuming that distinct discrete features arise randomly and independently along the branches of the tree at rate r and are lost at a constant rate [Formula: see text]. We derive an exact mathematical expression for the ratio [Formula: see text] of the two expected feature diversities (prior to and following an extinction event at the present) as t becomes large. We find that although [Formula: see text] has a similar behaviour to [Formula: see text] (and coincides with it for [Formula: see text]), when [Formula: see text], [Formula: see text] is described by a function that is different from [Formula: see text]. We also derive an exact expression for the expected number of features that are present in precisely one extant species. Our paper begins by establishing some generic properties of FD in a more general (non-phylogenetic) setting and applies this to fixed trees, before considering the setting of random (birth-death) trees.
... For small numbers of resistant cells, a deterministic description through an ordinary differential equation is not appropriate because stochastic fluctuations cannot be ignored and extinction events cannot be observed. We use a branching process in a time-heterogeneous environment [46][47][48] to approximate the probability of survival until the end of treatment. This is the probability of having at least one resistant cell in the bacterial population at the end of treatment, which we refer to as emergence of resistance. ...
Article
The use of an antibiotic may lead to the emergence and spread of bacterial strains resistant to this antibiotic. Experimental and theoretical studies have investigated the drug dose that minimizes the risk of resistance evolution over the course of treatment of an individual, showing that the optimal dose will either be the highest or the lowest drug concentration possible to administer; however, no analytical results exist that help decide between these two extremes. To address this gap, we develop a stochastic mathematical model of bacterial dynamics under antibiotic treatment. We explore various scenarios of density regulation (bacterial density affects cell birth or death rates), and antibiotic modes of action (biostatic or biocidal). We derive analytical results for the survival probability of the resistant subpopulation until the end of treatment, the size of the resistant subpopulation at the end of treatment, the carriage time of the resistant subpopulation until it is replaced by a sensitive one after treatment, and we verify these results with stochastic simulations. We find that the scenario of density regulation and the drug mode of action are important determinants of the survival of a resistant subpopulation. Resistant cells survive best when bacterial competition reduces cell birth and under biocidal antibiotics. Compared to an analogous deterministic model, the population size reached by the resistant type is larger and carriage time is slightly reduced by stochastic loss of resistant cells. Moreover, we obtain an analytical prediction of the antibiotic concentration that maximizes the survival of resistant cells, which may help to decide which drug dosage (not) to administer. Our results are amenable to experimental tests and help link the within and between host scales in epidemiological models.
... In Bayesian phylogenetics, the birth-death process enters the analysis as a prior model for the reconstructed phylogeny (the so-called tree prior). Kendall (1948) demonstrated how to use generating functions to describe birth-death processes when modelling infectious disease. Later, Nee et al. (1994) connected the process to the number of observed species in a phylogeny, and Stadler (2010) and Stadler et al. (2012) demonstrated how this can be applied when analysing pathogen genomes. ...
Preprint
Accurately estimating the prevalence and transmissibility of an infectious disease is a critical part of genetic infectious disease epidemiology. However, generating accurate estimates of these quantities, informed by both time series and sequencing data, is challenging. Birth-death processes and coalescent-based models are popular methods for modelling the transmission of infectious diseases, but they struggle with estimating the prevalence of infection. We extended our approximation of the likelihood for a point process of viral genomes and time series of case counts so it can estimate historical prevalence, and we implemented this in a BEAST2 package called Timtam. In a simulation study the approximation recovered the parameters from simulated data, even when we aggregated the point process data into a time series of daily case counts. To demonstrate how Timtam can be applied to real datasets, we estimated the reproduction number and the prevalence of infection through time during the SARS-CoV-2 outbreak onboard the Diamond Princess cruise ship using a time series of confirmed cases and sequence data. We found a greater prevalence than previously estimated and comment on how differences in the algorithms used could explain this.
... Phylogenetic approaches for studying species origination and extinction dynamics over deep time rely on the statistical adjustment of stochastic birth-death models (Kendall 1948) to dated phylogenetic trees representing the evolutionary relatedness of species and the dating of their divergence times (Stadler 2013; Morlon 2014; Harmon 2019). An increasing number of such phylogenetic trees have become available, and this has been accompanied by the complexification of diversification models. ...
Article
Birth-death models are widely used in combination with species phylogenies to study past diversification dynamics. Current inference approaches typically rely on likelihood-based methods. These methods are not generalizable, as a new likelihood formula must be established each time a new model is proposed; for some models such a formula is not even tractable. Deep learning can bring solutions in such situations, as deep neural networks can be trained to learn the relation between simulations and parameter values as a regression problem. In this paper, we adapt a recently developed deep learning method from pathogen phylodynamics to the case of diversification inference, and we extend its applicability to the case of the inference of state-dependent diversification models from phylogenies associated with trait data. We demonstrate the accuracy and time efficiency of the approach for the time constant homogeneous birth-death model and the Binary-State Speciation and Extinction model. Finally, we illustrate the use of the proposed inference machinery by reanalyzing a phylogeny of primates and their associated ecological role as seed dispersers. Deep learning inference provides at least the same accuracy as likelihood-based inference while being faster by several orders of magnitude, offering a promising new inference approach for deployment of future models in the field.
... The multi-type birth-death process (MTBDP) is a continuous-time Markov chain generalizing the classical birth-death process (Feller, 1968;Kendall, 1948) to a finite number of types. The state of the MTBDP counts the number of individuals (or particles) of each type while they undergo birth, death, and type transition events according to specified rates, which may be arbitrary functions of the current state and of time. ...
Preprint
Multi-type birth-death processes underlie approaches for inferring evolutionary dynamics from phylogenetic trees across biological scales, ranging from deep-time species macroevolution to rapid viral evolution and somatic cellular proliferation. A limitation of current phylogenetic birth-death models is that they require restrictive linearity assumptions that yield tractable likelihoods, but that also preclude interactions between individuals. Many fundamental evolutionary processes -- such as environmental carrying capacity or frequency-dependent selection -- entail interactions, and may strongly influence the dynamics in some systems. Here, we introduce a multi-type birth-death process in mean-field interaction with an ensemble of replicas of the focal process. We prove that, under quite general conditions, the ensemble's stochastically evolving interaction field converges to a deterministic trajectory in the limit of an infinite ensemble. In this limit, the replicas effectively decouple, and self-consistent interactions appear as nonlinearities in the infinitesimal generator of the focal process. We investigate a special case that is amenable to calculations in the context of a phylogenetic birth-death model, and is rich enough to model both carrying capacity and frequency-dependent selection.
... Divergence time estimation was conducted in the software MCMCtree 4.9j, part of package PAML (Yang 1997, 2007), using the approximate likelihood method (dos Reis and Yang 2011), under a birth-death diversification model (Kendall 1948, Nee et al. 1994, Yang and Rannala 2006). The substitution model used was HKY, with five gamma categories. ...
Article
Long-horned bees (Apidae, Eucerini) are found in different biomes worldwide and include some important crop pollinators. In the Western Hemisphere, Eucerini received extensive taxonomic study during the twentieth century, resulting in several revisions of its genera. In contrast, progress on eucerine phylogenetic research and the genus-level classification has been slow, primarily due to the relatively homogeneous external morphology within the tribe and the rarity of many of its species in collections. Here, we present a comprehensive phylogenetic study of Eucerini based on ultraconserved elements, including 153 species from nearly all genera and subgenera and from all biogeographic regions where they occur. Many of these specimens are from museums and were collected as far back as 1909. We discuss the challenges of working with specimens with highly degraded DNA, present insights into improving phylogenetic results for both species-tree and concatenation approaches, and present a new pipeline for UCE curation (Curation of UltraconseRved Elements—CURE). Our results show the existence of seven main lineages in Eucerini and most of the genera and subgenera to be reciprocally monophyletic. Using a comprehensive and up-to-date phylogenetic framework, we: (1) propose taxonomic changes, including a new subtribal classification and reorganized generic and subgeneric limits; (2) estimate divergence times; and (3) conduct a detailed exploration of historical biogeography of long-horned bees. We find that eucerine lineages expanded their range onto most continents only after their initial diversification in southern South America during the Eocene.
... Then M(t) is the cumulated number of cases reported up to time t. It is known that I(t) and M(t) satisfy the differential equations [21]; ...
... Birth-death processes are often used to describe macroevolutionary patterns (Kendall, 1948; Nee, 2006) and, consequently, are also commonly used in phylogenetic simulators (e.g. Hagen & Stadler, 2018; Höhna, 2013; Höhna et al., 2015; Stadler, 2011). ...
Article
Gene flow is increasingly recognized as an important macroevolutionary process. The many mechanisms that contribute to gene flow (e.g. introgression, hybridization, lateral gene transfer) uniquely affect the diversification dynamics of species, making it important to be able to account for these idiosyncrasies when constructing phylogenetic models. Existing phylogenetic‐network simulators for macroevolution are limited in the ways they model gene flow. We present SiPhyNetwork, an R package for simulating phylogenetic networks under a birth–death‐hybridization process. Our package unifies the existing birth–death‐hybridization models while also extending the toolkit for modelling gene flow. This tool can create patterns of reticulation such as hybridization, lateral gene transfer, and introgression. Specifically, we model different reticulate events by allowing events to either add, remove or keep constant the number of lineages. Additionally, we allow reticulation events to be trait dependent, creating the ability to model the expanse of isolating mechanisms that prevent gene flow. This tool makes it possible for researchers to model many of the complex biological factors associated with gene flow in a phylogenetic context.
Article
Divergence-time estimation is one of the most important endeavors in historical linguistics. Its importance is matched only by its difficulty. As Bayesian methods of divergence-time estimation have become more common over the past two decades, a number of critical issues have come to the fore, including model sensitivity, the dependence of root-age estimates on uncertain interior-node ages, and the relationship between ancient languages and their modern counterparts. This study addresses these issues in an investigation of a particularly fraught case within Indo-European: the diversification of Latin into the Romance languages. The results of this study support a gradualist account of their formation that most likely began after 300 CE. They also bolster the view that Classical Latin is a sampled ancestor of the Romance languages (i.e., it lies along the branch leading to the Romance languages).
Article
The processes that generate biodiversity start on a microevolutionary scale, where each individual’s history can impact the species’ history. This manuscript presents a theoretical study that examines the macroevolutionary patterns that emerge from the microevolutionary dynamics of populations inhabiting two patches. The model is neutral, meaning that neither survival nor reproduction depends on a fixed genotype, yet individuals must have minimal genetic similarity to reproduce. We used historical sea level oscillation over the past 800 thousand years to hypothesize periods when individuals could migrate from one patch to another. In our study, we keep track of each speciation and extinction event, build the complete and extant phylogenies, and characterize the macroevolutionary patterns regarding phylogeny balance, acceleration of speciation, and crown age. We also evaluate ecological patterns: richness, beta diversity, and species distribution symmetry. The balance of the complete phylogeny can be a sign of the speciation mode, contrasting speciation induced by migration and isolation (vicariance). The acceleration of the speciation process is also affected by the geographical barriers and the duration of the isolation period, with high isolation times leading to accelerated speciation. We report the correlation between ecological and macroevolutionary patterns and show it decreases with the time spent in isolation. We discuss, in light of our results, the challenge of integrating present-time community ecology with macroevolutionary patterns.
Article
We present an approach to computing the probability of epidemic “burnout,” i.e., the probability that a newly emergent pathogen will go extinct after a major epidemic. Our analysis is based on the standard stochastic formulation of the Susceptible-Infectious-Removed (SIR) epidemic model including host demography (births and deaths) and corresponds to the standard SIR ordinary differential equations (ODEs) in the infinite population limit. Exploiting a boundary layer approximation to the ODEs and a birth-death process approximation to the stochastic dynamics within the boundary layer, we derive convenient, fully analytical approximations for the burnout probability. We demonstrate, by comparing with computationally demanding individual-based stochastic simulations and with semi-analytical approximations derived previously, that our fully analytical approximations are highly accurate for biologically plausible parameters. We show that the probability of burnout always decreases with increased mean infectious period. However, for typical biological parameters, there is a relevant local minimum in the probability of persistence as a function of the basic reproduction number $R_0$. For the shortest infectious periods, persistence is least likely if $R_0 \approx 2.57$; for longer infectious periods, the minimum point decreases to $R_0 \approx 2$. For typical acute immunizing infections in human populations of realistic size, our analysis of the SIR model shows that burnout is almost certain in a well-mixed population, implying that susceptible recruitment through births is insufficient on its own to explain disease persistence.
Article
Birth and death Markov processes can model stochastic physical systems from percolation to disease spread and, in particular, wildfires. We introduce and analyze a birth-death-suppression Markov process as a model of controlled culling of an abstract, dynamic population. Using analytic techniques, we characterize the probabilities and timescales of outcomes like absorption at zero (extinguishment) and the probability of the cumulative population (burned area) reaching a given size. The latter requires control over the embedded Markov chain: this discrete process is solved using the Pollaczek orthogonal polynomials, a deformation of the Gegenbauer/ultraspherical polynomials. This allows analysis of processes with bounded cumulative population, corresponding to finite burnable substrate in the wildfire interpretation, with probabilities represented as spectral integrals. This technology is developed to lay the foundations for a dynamic decision support framework. We devise real-time risk metrics and suggest future directions for determining optimal suppression strategies, including multievent resource allocation problems and potential applications for reinforcement learning.
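As a rough illustration of absorption at zero and of the embedded Markov chain mentioned in this abstract, the sketch below simulates the plain linear birth-death process with constant per-capita rates. It is not the birth-death-suppression model of the cited article. The function names, the `cap` cutoff used to declare survival, and the parameter values are all assumptions made for the example. For constant rates the embedded jump chain is a simple random walk, and the classical extinction probability starting from one individual is min(1, μ/λ).

```python
import random


def goes_extinct(birth_rate, death_rate, rng, start=1, cap=200):
    """Follow the embedded jump chain of a linear birth-death process.

    With per-capita rates, each jump is a birth with probability
    birth_rate / (birth_rate + death_rate), whatever the population size.
    Returns True if the population reaches zero before reaching `cap`
    (hypothetical cutoff: reaching `cap` is treated as survival).
    """
    p_birth = birth_rate / (birth_rate + death_rate)
    n = start
    while 0 < n < cap:
        n += 1 if rng.random() < p_birth else -1
    return n == 0


def extinction_estimate(birth_rate, death_rate, runs=5000, seed=0):
    """Monte Carlo estimate of the extinction probability from one individual."""
    rng = random.Random(seed)
    hits = sum(goes_extinct(birth_rate, death_rate, rng) for _ in range(runs))
    return hits / runs


if __name__ == "__main__":
    lam, mu = 2.0, 1.0  # illustrative rates, not taken from any cited paper
    estimate = extinction_estimate(lam, mu)
    classical = min(1.0, mu / lam)
    print(f"simulated {estimate:.3f} vs. classical {classical:.3f}")
```

Only the jump chain is simulated because absorption at zero does not depend on the waiting times between events; a timescale analysis would also need the exponential holding times.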
Article
In this article, we focus on nonhomogeneous Markov processes (NHMPs), which are generalizations of the well-known homogeneous Markov processes (HMPs) and nonhomogeneous Poisson processes, and compare two software reliability models (SRMs) which can be classified into a generalized binomial process (GBP) and a generalized Polya process (GPP). GBP and GPP are also characterized, respectively, as a Markov inverse death process and a Markov birth process, with state- and time-dependent transition rates. We develop a unified software reliability modeling framework based on the NHMPs and apply it to the software reliability prediction. Through numerical examples with the fault count data observed in actual closed-source software (CSS) and open-source software (OSS) development projects, we compare two SRMs (GBP and GPP) in terms of the goodness-of-fit and predictive performances, in addition to the quantitative software reliability assessment. We also consider software release problems with these generalized SRMs, and investigate the impact on the software release decision.
Article
Ultimately, the eventual extinction of any biological population is an inevitable outcome. While extensive research has focused on the average time it takes for a population to go extinct under various circumstances, there has been limited exploration of the distributions of extinction times and the likelihood of significant fluctuations. Recently, Hathcock and Strogatz [D. Hathcock and S. H. Strogatz, Phys. Rev. Lett. 128, 218301 (2022)] identified Gumbel statistics as a universal asymptotic distribution for extinction-prone dynamics in a stable environment. In this study we aim to provide a comprehensive survey of this problem by examining a range of plausible scenarios, including extinction-prone, marginal (neutral), and stable dynamics. We consider the influence of demographic stochasticity, which arises from the inherent randomness of the birth-death process, as well as cases where stochasticity originates from the more pronounced effect of random environmental variations. Our work proposes several generic criteria that can be used for the classification of experimental and empirical systems, thereby enhancing our ability to discern the mechanisms governing extinction dynamics. Employing these criteria can help clarify the underlying mechanisms driving extinction processes.
Article
During an epidemic outbreak, typically only partial information about the outbreak is known. A common scenario is that the infection times of individuals are unknown, but individuals, on displaying symptoms, are identified as infectious and removed from the population. We study the distribution of the number of infectives given only the times of removals in a Markovian susceptible–infectious–removed (SIR) epidemic. Primary interest is in the initial stages of the epidemic process, where a branching (birth–death) process approximation is applicable. We show that the number of individuals alive in a time-inhomogeneous birth–death process at time $t \geq 0$, given only death times up to and including time $t$, is a mixture of negative binomial distributions, with the number of mixing components depending on the total number of deaths, and the mixing weights depending upon the inter-arrival times of the deaths. We further consider the extension to the case where some deaths are unobserved. We also discuss the application of the results to control measures and statistical inference.
Article
Phylogenetic models have become increasingly complex, and phylogenetic data sets have expanded in both size and richness. However, current inference tools lack a model specification language that can concisely describe a complete phylogenetic analysis while remaining independent of implementation details. We introduce a new lightweight and concise model specification language, 'LPhy', which is designed to be both human and machine-readable. A graphical user interface accompanies 'LPhy', allowing users to build models, simulate data, and create natural language narratives describing the models. These narratives can serve as the foundation for manuscript method sections. Additionally, we present a command-line interface for converting LPhy-specified models into analysis specification files (in XML format) compatible with the BEAST2 software platform. Collectively, these tools aim to enhance the clarity of descriptions and reporting of probabilistic models in phylogenetic studies, ultimately promoting reproducibility of results.
Article
A population experiencing habitat loss can avoid extinction by undergoing genetic adaptation, a process known as evolutionary rescue. Here we analytically approximate the probability of evolutionary rescue via a niche-constructing mutation that allows carriers to convert a novel, unfavorable reproductive habitat to a favorable state at a cost to their fecundity. We analyze competition between mutants and non-niche-constructing wild types, who ultimately require the constructed habitats to reproduce. We find that over-exploitation of the constructed habitats by wild types can generate damped oscillations in population size shortly after mutant invasion, thereby decreasing the probability of rescue. Such post-invasion extinction is less probable when construction is infrequent, habitat loss is common, the reproductive environment is large, or the population's carrying capacity is small. Under these conditions, wild types are less likely to encounter the constructed habitats and, consequently, mutants are more likely to fix. These results suggest that, without a mechanism that deters wild type inheritance of the constructed habitats, a population undergoing rescue via niche construction may remain prone to short-timescale extinction despite successful mutant invasion.
Article
Recent theoretical work on phylogenetic birth-death models offers differing viewpoints on whether they can be estimated using lineage-through-time data. Louca and Pennell (2020) showed that the class of models with continuously differentiable rate functions is nonidentifiable: any such model is consistent with an infinite collection of alternative models, which are statistically indistinguishable regardless of how much data are collected. Legried and Terhorst (2022) qualified this grave result by showing that identifiability is restored if only piecewise constant rate functions are considered. Here, we contribute new theoretical results to this discussion, in both the positive and negative directions. Our main result is to prove that models based on piecewise polynomial rate functions of any order and with any (finite) number of pieces are statistically identifiable. In particular, this implies that spline-based models with an arbitrary number of knots are identifiable. The proof is simple and self-contained, relying mainly on basic algebra. We complement this positive result with a negative one, which shows that even when identifiability holds, rate function estimation is still a difficult problem. To illustrate this, we prove some rates-of-convergence results for hypothesis testing using birth-death models. These results are information-theoretic lower bounds which apply to all potential estimators.