
Extant timetrees are consistent with a myriad of diversification histories


Abstract and Figures

Time-calibrated phylogenies of extant species (referred to here as ‘extant timetrees’) are widely used for estimating diversification dynamics¹. However, there has been considerable debate surrounding the reliability of these inferences2–5 and, to date, this critical question remains unresolved. Here we clarify the precise information that can be extracted from extant timetrees under the generalized birth–death model, which underlies most existing methods of estimation. We prove that, for any diversification scenario, there exists an infinite number of alternative diversification scenarios that are equally likely to have generated any given extant timetree. These ‘congruent’ scenarios cannot possibly be distinguished using extant timetrees alone, even in the presence of infinite data. Importantly, congruent diversification scenarios can exhibit markedly different and yet similarly plausible dynamics, which suggests that many previous studies may have over-interpreted phylogenetic evidence. We introduce identifiable and easily interpretable variables that contain all available information about past diversification dynamics, and demonstrate that these can be estimated from extant timetrees. We suggest that measuring and modelling these identifiable variables offers a more robust way to study historical diversification dynamics. Our findings also make it clear that palaeontological data will continue to be crucial for answering some macroevolutionary questions.
Identifiability issues persist in large trees. a–c, Diversification analysis of a timetree (about 114,000 tips) simulated from a birth–death process that exhibits a mass extinction event at around 5 Myr before present. a, LTT of the generated tree (long-dashed curve), dLTT of the true model that generated the tree (continuous curve) and dLTT of a maximum-likelihood fitted model (short-dashed curve) are shown. The fitted dLTT is practically identical to the true dLTT and thus is covered by the latter. b, True speciation and extinction rates (continuous curves), compared to fitted speciation and extinction rates (dashed curves). There is considerable disagreement between the fitted and true λ and μ, despite the fact that the allowed model set could, in principle, approximate the true rates reasonably well. c, Pulled diversification rate (PDR) of the true model (continuous curve), compared to the pulled diversification rate of the fitted model (dashed curve). d–f, Diversification analysis of a timetree (about 785,000 tips) simulated from a birth–death process that exhibits a rapid radiation event at around 5 Myr before present and a mass extinction event at around 2 Myr before present. d–f are analogous to a–c. There is considerable disagreement between the fitted and true λ and μ, despite the fact that the allowed model set could, in principle, approximate the true rates reasonably well. Extended Data Figure 7 provides the fitting results when μ is fixed to its true value. g–i, Diversification analyses of an extant timetree of 79,874 seed plant species, performed either by fitting λ and μ on a grid of discrete time points or by fitting the parameters of generic polynomial or exponential functions for λ and μ. g, LTT of the tree, dLTT of the grid-fitted model and dLTT of the fitted parametric model. h, Speciation and extinction rates predicted by the grid-fitted model or the fitted parametric model. i, Pulled diversification rate predicted by the grid-fitted model and the fitted parametric model. Further details are provided in Supplementary Information sections S.10 and S.11.
502 | Nature | Vol 580 | 23 April 2020
Article
Extant timetrees are consistent with a
myriad of diversification histories
Stilianos Louca1,2 ✉ & Matthew W. Pennell3,4 ✉
Time-calibrated phylogenies of extant species (referred to here as ‘extant timetrees’)
are widely used for estimating diversification dynamics1. However, there has been
considerable debate surrounding the reliability of these inferences2–5 and, to date, this
critical question remains unresolved. Here we clarify the precise information that can
be extracted from extant timetrees under the generalized birth–death model, which
underlies most existing methods of estimation. We prove that, for any diversification
scenario, there exists an infinite number of alternative diversification scenarios that
are equally likely to have generated any given extant timetree. These ‘congruent’
scenarios cannot possibly be distinguished using extant timetrees alone, even in the
presence of infinite data. Importantly, congruent diversification scenarios can exhibit
markedly different and yet similarly plausible dynamics, which suggests that many
previous studies may have over-interpreted phylogenetic evidence. We introduce
identifiable and easily interpretable variables that contain all available information
about past diversification dynamics, and demonstrate that these can be estimated
from extant timetrees. We suggest that measuring and modelling these identifiable
variables offers a more robust way to study historical diversification dynamics. Our
findings also make it clear that palaeontological data will continue to be crucial for
answering some macroevolutionary questions.
A central challenge in evolutionary biology is to reconstruct rates of
speciation and extinction over time5. Unfortunately, the majority of
taxa that have ever lived have not left much trace in the fossil record,
and the primary source of information on their past diversification
dynamics therefore comes from extant timetrees. Many methods
have been developed for extracting this information; most methods
fit variants of a birth–death process1,6. Despite the popularity of these
methods, which collectively have been used in thousands of studies7–9,
their reliability has been called into question by comparisons with
fossil-based estimates1,3,5,6,10. The reasoning behind these critiques
is that there may be insufficient information in extant timetrees to
fully reconstruct historical diversification dynamics. However, this
critical issue has remained unresolved; it is unknown precisely what
information on speciation and extinction rates is contained in extant
timetrees.
Here we present a definite answer to this question for the general
stochastic birth–death process with homogeneous (that is, lineage-independent)
rates, in which speciation (‘birth’) rates (λ) and extinction
(‘death’) rates (μ) can vary over time, that underlies the majority
of existing methods for reconstructing diversification dynamics from
phylogenies1. We mathematically show that, for any given candidate
birth–death model, there exists an infinite number of alternative birth–
death models that can explain any extant timetree equally as well as
can the candidate model. These alternative models may appear to be
similarly plausible and yet exhibit markedly different features, such as
different trends through time in both λ and μ. This severe ambiguity
persists for arbitrarily large trees and cannot be resolved even with an
infinite amount of data; it is thus impossible to design asymptotically
consistent estimators for λ and μ. Using simulated and real timetrees
as examples, we demonstrate how failing to recognize this issue can
seriously mislead our inferences about past diversification dynamics.
We present appropriately modified variables that are asymptotically
identifiable and that contain all available information on historical
diversification dynamics.
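
The ambiguity described above can be checked numerically. The sketch below is illustrative only and rests on assumptions not spelled out in this excerpt: that the pulled diversification rate is r_p = λ − μ + (1/λ) dλ/dτ (τ being time before present), that the probability E of a lineage leaving no sampled extant descendant obeys dE/dτ = μ − (λ + μ)E + λE² with E(0) = 1 − ρ, and that the deterministic LTT M satisfies d log M/dτ = −λ(1 − E). The function name `integrate_model` and the chosen rates are hypothetical. Two scenarios share the same r_p, present-day λ and ρ but differ in μ, and their predicted LTTs are compared.

```python
import math

def integrate_model(r_p, mu, lam0, rho, max_age, n_steps=2000):
    """Integrate (lambda, E, log(M/M_o)) from age 0 to max_age with RK4.

    r_p, mu : functions of age tau (time before present); r_p is the assumed
              pulled diversification rate, mu the extinction rate
    lam0    : present-day speciation rate
    rho     : sampling fraction
    """
    def deriv(tau, y):
        lam, E, _ = y
        m = mu(tau)
        return (
            lam * (r_p(tau) - lam + m),       # lambda implied by r_p and mu
            m - (lam + m) * E + lam * E * E,  # prob. of no sampled descendant
            -lam * (1.0 - E),                 # d log M / d tau
        )

    def step(y, k, h):
        return tuple(a + h * b for a, b in zip(y, k))

    h = max_age / n_steps
    y = (lam0, 1.0 - rho, 0.0)
    for i in range(n_steps):
        tau = i * h
        k1 = deriv(tau, y)
        k2 = deriv(tau + h / 2, step(y, k1, h / 2))
        k3 = deriv(tau + h / 2, step(y, k2, h / 2))
        k4 = deriv(tau + h, step(y, k3, h))
        y = tuple(a + h / 6 * (p + 2 * q + 2 * r + s)
                  for a, p, q, r, s in zip(y, k1, k2, k3, k4))
    return y

# Scenario A: constant lambda = 1, mu = 0.5 (so r_p = 0.5), complete sampling.
# Scenario B: same r_p, same present-day lambda and rho, but no extinction.
pdr = lambda tau: 0.5
lam_a, _, log_m_a = integrate_model(pdr, lambda tau: 0.5, 1.0, 1.0, 5.0)
lam_b, _, log_m_b = integrate_model(pdr, lambda tau: 0.0, 1.0, 1.0, 5.0)
print("lambda at age 5:", lam_a, lam_b)
print("log(M/M_o) at age 5:", log_m_a, log_m_b)
```

Under these assumptions the two scenarios give the same log(M/M_o) up to integration error, while their speciation rates at age 5 differ (1 in scenario A versus about 0.52 in scenario B), and their extinction rates differ throughout.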
Lineages through time
An important feature of extant timetrees is the lineages-through-time
curve (LTT), which counts the number of lineages at each time in the
past that are represented by at least one sampled extant descending
species in the tree. The likelihood of a tree under a given birth–death
model, the LTT of the tree and the LTT that would be expected under
the model are linked as follows. Any given combination of (potentially
time-dependent) speciation and extinction rates (λ and μ, respectively)
and the probability that an extant species will be included in the tree
(‘sampling fraction’) (ρ) can be used to define a deterministic diversification
process, in which the number of lineages through time no
longer varies stochastically but instead according to a set of differential
equations6,11 (Supplementary Information section S.1). Given
a number of extant sampled species (M_o), the LTT predicted by these
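
As a concrete illustration of the LTT defined at the start of this section, here is a minimal sketch that counts lineages through time from the internal node ages of an ultrametric, strictly bifurcating timetree. The function name `ltt` and the example node ages are hypothetical, and the tree is assumed to have a single root with ages measured as time before present.

```python
from bisect import bisect_right

def ltt(node_ages, query_ages):
    """Number of lineages at each query age.

    node_ages  : ages (time before present) of all internal nodes, root included
    query_ages : ages at which to evaluate the curve
    """
    ages = sorted(node_ages)
    n = len(ages)
    # Each branching event strictly older than tau has already added one
    # lineage by age tau; the stem contributes the initial lineage.
    return [1 + (n - bisect_right(ages, tau)) for tau in query_ages]

# Hypothetical 5-tip tree with branching events at ages 10, 6, 3 and 1.
print(ltt([10, 6, 3, 1], [0, 2, 5, 8, 12]))  # -> [5, 4, 3, 2, 1]
```

At age 0 the count equals the number of sampled tips, and it drops by one at every branching event as age increases.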
https://doi.org/10.1038/s41586-020-2176-1
Received: 14 September 2019
Accepted: 10 March 2020
Published online: 15 April 2020
1Department of Biology, University of Oregon, Eugene, OR, USA. 2Institute of Ecology and Evolution, University of Oregon, Eugene, OR, USA. 3Biodiversity Research Centre, University of British
Columbia, Vancouver, British Columbia, Canada. 4Department of Zoology, University of British Columbia, Vancouver, British Columbia, Canada. e-mail: louca.research@gmail.com; pennell@zoology.ubc.ca
... The potential nonidentifiability of evolutionary rates based on extant phylogenies has recently become a topic of fierce debate (Louca and Pennell 2020; Legried and Terhorst 2022; Morlon et al. 2022; Kopperud et al. 2023). Louca and Pennell (2020) describe how speciation and extinction functions are drawn from congruence classes, within which each pair of functions is equally likely to fit the evolutionary history of a given phylogeny. As a result, it may not be possible to discern which ...
Article
The current status of the Sino-Himalayan region as a biodiversity hotspot, particularly for flora, has often been linked to the uplift of the Sino-Tibetan Plateau and Himalayan and Hengduan Mountains. However, the relationship between the topological development of the region and the onset of diversification is yet to be confirmed. Here, we apply Bayesian phylodynamic methods to a large phylogeny of angiosperm species from the Sino-Himalayas, to infer changes in their evolutionary rates through time. We find strong evidence for high diversification rates in the Paleocene, late Miocene and Pliocene, and for negative diversification rates in the Quaternary, driven by an increase in extinction rates. Our analyses suggest that changes in global palaeotemperatures are unlikely to be a driving force for these rate shifts. Instead, the collision of the Indian continent with Eurasia and coeval topographic change in the Sino-Himalayas, the Miocene Grassland Expansion, and the impact of Pleistocene glaciations on this altitudinally-variable region may have driven these rates. We also demonstrate the strong influence of change time choice on the shape of inferred piecewise-constant trajectories in Bayesian phylodynamics, and advocate for the use of prior information when making this decision. Supplementary material at https://doi.org/10.6084/m9.figshare.c.7179254
... It is also widely recognised that diversity patterns in the fossil record are skewed by geological and anthropogenic biases 1,6,[30][31][32][33] , fuelling development of increasingly sophisticated methods for quantifying diversification dynamics from incomplete, biased fossil occurrence data. In the last decade, Bayesian approaches, which couple birth-death and preservation processes have enabled estimation of sampling-corrected origination and extinction rates from fossil occurrences [34][35][36][37] , avoiding the problems of inferring these fundamental rates from extant phylogenies [38][39][40][41][42] . In turn, lineage birth and death rates can be modelled as functions of their potential drivers [43][44][45] , permitting separate consideration of the factors that promoted origination or drove extinction. ...
Article
Palaeontologists have long sought to explain the diversification of individual clades to whole biotas at global scales. Advances in our understanding of the spatial distribution of the fossil record through geological time, however, have demonstrated that global trends in biodiversity were a mosaic of regionally heterogeneous diversification processes. Drivers of diversification must presumably have also displayed regional variation to produce the spatial disparities observed in past taxonomic richness. Here, we analyse the fossil record of ammonoids, pelagic shelled cephalopods, through the Late Cretaceous, characterised by some palaeontologists as an interval of biotic decline prior to their total extinction at the Cretaceous-Paleogene boundary. We regionally subdivide this record to eliminate the impacts of spatial sampling biases and infer regional origination and extinction rates corrected for temporal sampling biases using Bayesian methods. We then model these rates using biotic and abiotic drivers commonly inferred to influence diversification. Ammonoid diversification dynamics and responses to this common set of diversity drivers were regionally heterogeneous, do not support ecological decline, and demonstrate that their global diversification signal is influenced by spatial disparities in sampling effort. These results call into question the feasibility of seeking drivers of diversity at global scales in the fossil record.
... In addition to fitness, pathogen phylogenies can be strongly shaped by sampling biases [64][65][66]. The major risk here is that oversampling of certain genotypes or lineages could produce a higher branching rate that would be difficult to distinguish from higher pathogen fitness [67][68][69]. This is a particular concern as our ST131 samples largely came from studies (here grouped by NCBI Bioprojects) that were specifically looking for either fluoroquinolone or beta-lactam resistance. ...
Preprint
Antimicrobial resistant pathogens such as Escherichia coli sequence type 131 (ST131) pose a serious threat to public health globally. In the United States, ST131 acquired multiple antimicrobial resistance (AMR) genes and rapidly grew to its current high prevalence in healthcare settings. Notably, this coincided with the introduction and widespread use of antibiotics such as fluoroquinolones, suggesting AMR as the major driver of ST131’s expansion. Yet, within ST131, there remains considerable diversity between strains in resistance profiles and their repertoires of virulence factors, stress factors, plasmids, and other accessory elements. Understanding which genomic features contribute to ST131’s competitive advantage and their relative effects on population-level fitness therefore poses a considerable challenge. Here we use phylodynamic birth-death models to estimate the relative fitness of different ST131 lineages from bacterial phylogenies. By extending these phylodynamic methods to allow multiple genomic features to shape bacterial fitness, we further quantify the relative contribution of individual AMR genes to ST131’s fitness. Our analysis indicates that while many genomic elements, including various AMR genes, virulence factors, and plasmids, have all contributed substantially to ST131’s rapid growth, major increases in ST131’s fitness are largely attributable to mutations in gyrase A that confer resistance to fluoroquinolones. Author summary ST131 is a pandemic lineage of E. coli that has spread globally and is now responsible for a large percentage of blood and urinary tract infections that cannot be treated with many common antibiotics. While antibiotic resistance has undoubtedly given ST131 a competitive edge, the relative importance of resistance compared with other factors shaping a pathogen’s growth or transmission potential (i.e. fitness) is often difficult to measure in natural settings. 
Here, we present a method that allows us to look at the entire spectrum of factors determining a pathogen’s fitness and estimate the individual contribution of each component to pathogen’s overall fitness. Our results suggest that resistance to fluoroquinolones, a widely used class of antibiotics, provides ST131 with a disproportionately large fitness advantage relative to many other factors with more moderate fitness effects. Understanding what determines the fitness of ST131 therefore provides insights that can be used to curb the spread of resistance and monitor for emerging lineages with high pandemic potential due to shared fitness enhancing attributes.
Preprint
A phylogenetic tree has three types of attributes: size, shape (topology), and branch lengths. Phylodynamic studies are often motivated by questions regarding the size of clades, nevertheless, nearly all of the inference methods only make use of the other two attributes. In this paper, we ask whether there is additional information if we consider tree size more explicitly in phylodynamic inference methods. To address this question, we first needed to be able to compute the expected tree size distribution under a specified phylodynamic model; perhaps surprisingly, there is not a general method for doing so --- it is known what this is under a Yule or constant rate birth-death model but not for the more complicated scenarios researchers are often interested in. We present three different solutions to this problem: using i) the deterministic limit; ii) master equations; and iii) an ensemble moment approximation. Using simulations, we evaluate the accuracy of these three approaches under a variety of scenarios and alternative measures of tree size (i.e., sampling through time or only at the present; sampling ancestors or not). We then use the most accurate measures for the situation, to investigate the added informational content of tree size. We find that for two critical phylodynamic questions --- i) is diversification diversity dependent? and, ii) can we distinguish between alternative diversification scenarios? --- knowing the expected tree size distribution under the specified scenario provides insights that could not be gleaned from considering the expected shape and branch lengths alone. The contribution of this paper is both a novel set of methods for computing tree size distributions and a path forward for richer phylodynamic inference into the evolutionary and epidemiological processes that shape lineage trees.
Article
The evolutionary histories of major clades, including mammals, often comprise changes in their diversification dynamics, but how these changes occur remains debated. We combined comprehensive phylogenetic and fossil information in a new “birth-death diffusion” model that provides a detailed characterization of variation in diversification rates in mammals. We found an early rising and sustained diversification scenario, wherein speciation rates increased before and during the Cretaceous-Paleogene (K-Pg) boundary. The K-Pg mass extinction event filtered out more slowly speciating lineages and was followed by a subsequent slowing in speciation rates rather than rebounds. These dynamics arose from an imbalanced speciation process, with separate lineages giving rise to many, less speciation-prone descendants. Diversity seems to have been brought about by these isolated, fast-speciating lineages, rather than by a few punctuated innovations.
Article
Estimating how traits evolved and impacted diversification across the tree of life represents a critical topic in ecology and evolution. Although there has been considerable research in comparative biology, large parts of the tree of life remain underexplored. Sharks are an iconic clade of marine vertebrates, and key components of marine ecosystems since the early Mesozoic. However, few studies have addressed how traits evolved or whether they impacted their extant diversity patterns. Our study aimed to fill this gap by reconstructing the largest time-calibrated species-level phylogeny of sharks and compiling an exhaustive database for ecological (diet, habitat) and biological (reproduction, maximum body length) traits. Using state-of-the-art models of evolution and diversification, we outlined the major character shifts and modes of trait evolution across shark species. We found support for sequential models of trait evolution and estimated a small to medium-sized lecithotrophic and coastal-dwelling most recent common ancestor for extant sharks. However, our exhaustive hidden traits analyses do not support trait-dependent diversification for any examined traits, challenging previous works. This suggests that the role of traits in shaping sharks' diversification dynamics might have been previously overestimated and should motivate future macroevolutionary studies to investigate other drivers of diversification in this clade.
Article
Background Squamata (lizards, snakes, and amphisbaenians) is a Triassic lineage with an extensive and complex biogeographic history, yet no large-scale study has reconstructed the ancestral range of early squamate lineages. The fossil record indicates a broadly Pangaean distribution by the end- Cretaceous, though many lineages (e.g., Paramacellodidae, Mosasauria, Polyglyphanodontia) subsequently went extinct. Thus, the origin and occupancy of extant radiations is unclear and may have been localized within Pangaea to specific plates, with potential regionalization to distinct Laurasian and Gondwanan landmasses during the Mesozoic in some groups. Methods We used recent tectonic models to code extant and fossil squamate distributions occurring on nine discrete plates for 9,755 species, with Jurassic and Cretaceous fossil constraints from three extinct lineages. We modeled ancestral ranges for crown Squamata from an extant-only molecular phylogeny using a suite of biogeographic models accommodating different evolutionary processes and fossil-based node constraints from known Jurassic and Cretaceous localities. We hypothesized that the best-fit models would not support a full Pangaean distribution (i.e., including all areas) for the origin of crown Squamata, but would instead show regionalization to specific areas within the fragmenting supercontinent, likely in the Northern Hemisphere where most early squamate fossils have been found. Results Incorporating fossil data reconstructs a localized origin within Pangaea, with early regionalization of extant lineages to Eurasia and Laurasia, while Gondwanan regionalization did not occur until the middle Cretaceous for Alethinophidia, Scolecophidia, and some crown Gekkotan lineages. 
While the Mesozoic history of extant squamate biogeography can be summarized as a Eurasian origin with dispersal out of Laurasia into Gondwana, their Cenozoic history is complex with multiple events (including secondary and tertiary recolonizations) in several directions. As noted by previous authors, squamates have likely utilized over-land range expansion, land-bridge colonization, and trans-oceanic dispersal. Tropical Gondwana and Eurasia hold more ancient lineages than the Holarctic (Rhineuridae being a major exception), and some asymmetries in colonization (e.g., to North America from Eurasia during the Cenozoic through Beringia) deserve additional study. Future studies that incorporate fossil branches, rather than as node constraints, into the reconstruction can be used to explore this history further.
Article
Angiosperms are the cornerstone of most terrestrial ecosystems and human livelihoods1,2. A robust understanding of angiosperm evolution is required to explain their rise to ecological dominance. So far, the angiosperm tree of life has been determined primarily by means of analyses of the plastid genome3,4. Many studies have drawn on this foundational work, such as classification and first insights into angiosperm diversification since their Mesozoic origins5–7. However, the limited and biased sampling of both taxa and genomes undermines confidence in the tree and its implications. Here, we build the tree of life for almost 8,000 (about 60%) angiosperm genera using a standardized set of 353 nuclear genes⁸. This 15-fold increase in genus-level sampling relative to comparable nuclear studies⁹ provides a critical test of earlier results and brings notable change to key groups, especially in rosids, while substantiating many previously predicted relationships. Scaling this tree to time using 200 fossils, we discovered that early angiosperm evolution was characterized by high gene tree conflict and explosive diversification, giving rise to more than 80% of extant angiosperm orders. Steady diversification ensued through the remaining Mesozoic Era until rates resurged in the Cenozoic Era, concurrent with decreasing global temperatures and tightly linked with gene tree conflict. Taken together, our extensive sampling combined with advanced phylogenomic methods shows the deep history and full complexity in the evolution of a megadiverse clade.
Article
In the last decade, the Fossilized Birth–Death (FBD) process has yielded interesting clues about the evolution of biodiversity through time. To facilitate such studies, we extend our method to compute the probability density of phylogenetic trees of extant and extinct taxa in which the only temporal information is provided by the fossil ages (i.e. without the divergence times) in order to deal with the piecewise constant FBD process, known as the “skyline FBD”, which allows rates to change between pre‐defined time intervals, as well as modelling extinction events at the bounds of these intervals. We develop approaches based on this method to assess hypotheses about the diversification process and to answer questions such as “Does a mass extinction occur at this time?” or “Is there a change in the fossilization rate between two given periods?”. Our software can also yield Bayesian and maximum‐likelihood estimates of the parameters of the skyline FBD model under various constraints. These approaches are applied to a simulated dataset in order to test their ability to answer the questions above. Finally, we study an updated dataset of Permo‐Carboniferous synapsids to get additional insights into the dynamics of biodiversity change in three clades (Ophiacodontidae, Edaphosauridae and Sphenacodontidae) in the Pennsylvanian (Late Carboniferous) and Cisuralian (Early Permian), and to assess support for end‐Sakmarian (or Artinskian) and end‐Cisuralian mass extinction events discussed in previous studies.
Article
Colonization of a novel geographic area is a classic source of ecological opportunity. Likewise, complex microhabitats are thought to promote biodiversity. We sought to reconcile these two predictions when they are naturally opposing outcomes. We assess the macroevolutionary consequences of an ancestral shift from benthic to pelagic microhabitat zones on rates of speciation and phenotypic evolution in North American minnows. Pelagic species have more similar phenotypes and slower rates of phenotypic evolution, but faster speciation rates, than benthic species. These are likely two independent, opposing responses to specialization along the benthic-pelagic axis, as rates of phenotypic evolution and speciation are not directly correlated. The pelagic zone is more structurally homogenous and offers less ecological opportunity, acting as an ecological dead end for minnows. In contrast, pelagic species may be more mobile and prone to dispersal and subsequent geographic isolation and, consequently, experience elevated instances of allopatric speciation. Microhabitat shifts can have decoupled effects on different dimensions of biodiversity, highlighting the need for nuance when interpreting the macroevolutionary consequences of ecological opportunity.
Article
Diversification rates vary over time, yet the factors driving these variations remain unclear. Temporal declines in speciation rates have often been interpreted as the effect of ecological limits, competition, and diversity dependence, emphasising the role of biotic factors. Abiotic factors, such as climate change, are also supposed to have affected diversification rates over geological time scales, yet direct tests of these presumed effects have mainly been limited to few clades well represented in the fossil record. If warmer climatic periods have sustained faster speciation, this could explain slowdowns in speciation during the Cenozoic climate cooling. Here, we apply state‐of‐the‐art diversity‐dependent and temperature‐dependent phylogenetic models of diversification to 218 tetrapod families, along with constant rate and time‐dependent models. We confirm the prevalence of diversification slowdowns, and find as much support for temperature‐dependent as for diversity‐dependent models. These results call for a better integration of these two processes in studies of diversification dynamics.
Article
Stochastic birth-death models provide the foundation for studying and simulating evolutionary trees in phylodynamics. A curious feature of such models is that they exhibit fundamental symmetries when the birth and death rates are interchanged. In this paper, we first provide intuitive reasons for these known transformational symmetries. We then show that these transformational symmetries (encoded in algebraic identities) are preserved even when individuals at the present are sampled with some probability. However, these extended symmetries require the death rate parameter to sometimes take a negative value. In the last part of this paper, we describe the relevance of these transformations and their application to computational phylodynamics, particularly to maximum likelihood and Bayesian inference methods, as well as to model selection.
Article
Significance Some branches of the tree of life are incredibly diverse, while others are represented by only a few living species. Ultimately, this difference reflects the balance of the formation and the extinction of species. Countless explanations have been proposed for why the rates of these two processes vary between lineages, including aspects of the organisms themselves and the environments they live in. Here we reveal that a substantial amount of variation in these rates is associated with a simple factor: time. Younger groups appear to accumulate diversity at much faster rates than older groups. This time scaling of macroevolutionary rates suggests that there may be hidden generalities governing the diversification of life on Earth.
Article
Measuring the pace at which speciation and extinction occur is fundamental to understanding the origin and evolution of biodiversity. Both the fossil record and molecular phylogenies of living species can provide independent estimates of speciation and extinction rates, but often produce strikingly divergent results. Despite its implications, the theoretical reasons for this discrepancy remain unknown. Here, we reveal a conceptual and methodological basis able to reconcile palaeontological and molecular evidence: discrepancies are driven by different implicit assumptions about the processes of speciation and species evolution in palaeontological and neontological analyses. We present the “birth-death chronospecies” model that clarifies the definition of speciation and extinction processes allowing for a coherent joint analysis of fossil and phylogenetic data. Using simulations and empirical analyses we demonstrate not only that this model explains much of the apparent incongruence between fossils and phylogenies, but that differences in rate estimates are actually informative about the prevalence of different speciation modes.
Article
Numerous studies have estimated plant and animal diversification dynamics; however, no comparable rigorous estimates exist for bacteria, the most ancient and widespread form of life on Earth. Here, we analyse phylogenies comprising up to 448,112 bacterial lineages to reconstruct global bacterial diversification dynamics. To handle such large phylogenies, we developed methods based on the statistical properties of infinitely large trees. We further analysed sequencing data from 60 environmental studies to determine the fraction of extant bacterial diversity missing from the phylogenies, a crucial parameter for estimating speciation and extinction rates. We estimate that there are about 1.4–1.9 million extant bacterial lineages when lineages are defined by 99% similarity in the 16S ribosomal RNA gene, and that bacterial diversity has been continuously increasing over the past 1 billion years (Gyr). Recent bacterial extinction rates are estimated at 0.03–0.05 per lineage per million years (lineage⁻¹ Myr⁻¹), and are only slightly below estimated recent bacterial speciation rates. Most bacterial lineages ever to have inhabited this planet are estimated to be extinct. Our findings disprove the notion that bacteria are unlikely to go extinct, and provide a valuable perspective on the evolutionary history of a domain of life with a sparse and cryptic fossil record.
Article
Far more species of organisms are found in the tropics than in temperate and polar regions, but the evolutionary and ecological causes of this pattern remain controversial¹,². Tropical marine fish communities are much more diverse than cold-water fish communities found at higher latitudes³,⁴, and several explanations for this latitudinal diversity gradient propose that warm reef environments serve as evolutionary ‘hotspots’ for species formation⁵⁻⁸. Here we test the relationship between latitude, species richness and speciation rate across marine fishes. We assembled a time-calibrated phylogeny of all ray-finned fishes (31,526 tips, of which 11,638 had genetic data) and used this framework to describe the spatial dynamics of speciation in the marine realm. We show that the fastest rates of speciation occur in species-poor regions outside the tropics, and that high-latitude fish lineages form new species at much faster rates than their tropical counterparts. High rates of speciation occur in geographical regions that are characterized by low surface temperatures and high endemism. Our results reject a broad class of mechanisms under which the tropics serve as an evolutionary cradle for marine fish diversity and raise new questions about why the coldest oceans on Earth are present-day hotspots of species formation.
Article
Premise of the Study: Large phylogenies can help shed light on macroevolutionary patterns that inform our understanding of fundamental processes that shape the tree of life. These phylogenies also serve as tools that facilitate other systematic, evolutionary, and ecological analyses. Here we combine genetic data from public repositories (GenBank) with phylogenetic data (Open Tree of Life project) to construct a dated phylogeny for seed plants. Methods: We conducted a hierarchical clustering analysis of publicly available molecular data for major clades within the Spermatophyta. We constructed phylogenies of major clades, estimated divergence times, and incorporated data from the Open Tree of Life project, resulting in a seed plant phylogeny. We estimated diversification rates, excluding those taxa without molecular data. We also summarized topological uncertainty and data overlap for each major clade. Key Results: The trees constructed for Spermatophyta consisted of 79,881 and 353,185 terminal taxa; the latter included the Open Tree of Life taxa for which we could not include molecular data from GenBank. The diversification analyses demonstrated nested patterns of rate shifts throughout the phylogeny. Data overlap and inference uncertainty show significant variation throughout and demonstrate the continued need for data collection across seed plants. Conclusions: This study demonstrates a means for combining available resources to construct a dated phylogeny for plants. However, this approach is an early step, and more developments are needed to add data, better incorporate underlying uncertainty, and improve resolution. The methods discussed here can also be applied to other major clades in the tree of life.
Article
A birth-death-sampling model gives rise to phylogenetic trees with samples from the past and the present. Interpreting “birth” as branching speciation, “death” as extinction, and “sampling” as fossil preservation and recovery, this model – also referred to as the fossilized birth-death (FBD) model – gives rise to phylogenetic trees on extant and fossil samples. The model has been mathematically analyzed and successfully applied to a range of datasets on different taxonomic levels, such as penguins, plants, and insects. However, the current mathematical treatment of this model does not allow for a group of temporally distinct fossil specimens to be assigned to the same species. In this paper, we provide a general mathematical FBD modeling framework that explicitly takes “stratigraphic ranges” into account, with a stratigraphic range being defined as the lineage interval associated with a single species, ranging through time from the first to the last fossil appearance of the species. To assign a sequence of fossil samples in the phylogenetic tree to the same species, i.e., to specify a stratigraphic range, we need to define the mode of speciation. We provide expressions to account for three common speciation modes: budding (or asymmetric) speciation, bifurcating (or symmetric) speciation, and anagenetic speciation. Our equations allow for flexible joint Bayesian analysis of paleontological and neontological data. Furthermore, our framework is directly applicable to epidemiology, where a stratigraphic range is the observed duration of infection of a single patient, “birth” via budding is transmission, “death” is recovery, and “sampling” is sequencing the pathogen of a patient. Thus, we present a model that allows for incorporation of multiple observations through time from a single patient.
Article
Motivation: The birth-death model constitutes the theoretical backbone of most phylogenetic tools for reconstructing speciation/extinction dynamics over time. Performing simulations of reconstructed trees (linking extant taxa) under the birth-death model in backward time, conditioned on the number of species sampled at present day and, in some cases, a specific time interval since the most recent common ancestor (MRCA), is needed for assessing the performance of reconstruction tools, for parametric bootstrapping and for detecting data outliers. The few simulation tools that exist scale poorly to large modern phylogenies, which can comprise thousands or even millions of tips (and rising). Results: Here I present efficient software for simulating reconstructed phylogenies under time-dependent birth-death models in backward time, conditioned on the number of sampled species and (optionally) on the time since the MRCA. On large trees, my software is 1,000-10,000 times faster than existing tools. Availability: The presented software is incorporated into the R package "castor", which is available on The Comprehensive R Archive Network (CRAN).
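As a rough illustration of what simulating a reconstructed tree "in backward time, conditioned on the number of sampled species" can look like, the sketch below draws branching times by inverse-transform sampling. It is not the castor implementation, which handles time-dependent rates. It assumes constant rates, complete sampling, and the standard point-process representation in which the n − 1 speciation times of a tree with n tips and age t are independent draws from a known closed-form distribution. The function name `sample_speciation_times` is hypothetical.

```python
import math
import random


def sample_speciation_times(n_tips, t, lam, mu, rng=None):
    """Draw the n_tips - 1 speciation times (before present) of a
    reconstructed tree of age t, by inverting the speciation-time CDF.

    Assumptions (not from the source): constant speciation rate lam,
    constant extinction rate mu, lam != mu, complete extant sampling.
    """
    if lam == mu:
        raise ValueError("this sketch requires lam != mu")
    rng = rng or random.Random()
    r = lam - mu  # net diversification rate
    # Normalising factor of the CDF, so that F(t | t) = 1.
    norm = (lam - mu * math.exp(-r * t)) / (1.0 - math.exp(-r * t))
    times = []
    for _ in range(n_tips - 1):
        u = rng.random() / norm  # uniform draw, rescaled by the normaliser
        # Closed-form inverse of F(s | t) = norm * (1 - e^{-rs}) / (lam - mu e^{-rs}).
        s = math.log((1.0 - u * mu) / (1.0 - u * lam)) / r
        times.append(s)
    # Oldest branching event first.
    return sorted(times, reverse=True)
```

For example, `sample_speciation_times(100, 10.0, 1.0, 0.4, random.Random(0))` returns 99 branching times between 0 and 10. Each draw costs constant time, which is one reason backward-time approaches of this kind can scale to very large trees. Turning the times into a tree topology would need an additional step that is not shown here.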
Article
Motivation: Biodiversity databases now comprise hundreds of thousands of sequences and trait records. For example, the Open Tree of Life includes over 1,491,000 metazoan and over 300,000 bacterial taxa. These data provide unique opportunities for analysis of phylogenetic trait distribution and reconstruction of ancestral biodiversity. However, existing tools for comparative phylogenetics scale poorly to such large trees, to the point of being almost unusable. Results: Here we present a new R package, named "castor", for comparative phylogenetics on large trees spanning millions of tips. On large trees castor is often 100-1000 times faster than existing tools. Availability: The castor source code, compiled binaries, documentation and usage examples are freely available at the Comprehensive R Archive Network (CRAN). Contact: louca.research@gmail.com.