Figure - available via license: Creative Commons Attribution 2.0 Generic
Coloring and taxonomic assignments. A taxonomically labeled phylogenetic tree that is concordant with the genus-level taxonomic assignments $g_i$ but not the species taxonomic assignments $s_i$.

Source publication
Article
Although taxonomy is often used informally to evaluate the results of phylogenetic inference and the root of phylogenetic trees, algorithmic methods to do so are lacking. In this paper we formalize these procedures and develop algorithms to solve the relevant problems. In particular, we introduce a new algorithm that solves a "subcoloring" problem...

Similar publications

Article
The coalescent methods for species tree reconstruction are increasingly popular because they can accommodate coalescence and multilocus data sets. Herein, we present STRAW, a web server that offers workflows for reconstruction of phylogenies of species using three species tree methods—MP-EST, STAR and NJst. The input data are a collection of rooted...
Preprint
The minimal number of rooted subtree prune and regraft (rSPR) operations needed to transform one phylogenetic tree into another one induces a metric on phylogenetic trees - the rSPR-distance. The rSPR-distance between two phylogenetic trees $T$ and $T'$ can be characterised by a maximum agreement forest; a forest with a minimal number of components...
Article
Phylogenetic trees are a central tool in understanding evolution. They are typically inferred from sequence data, and capture evolutionary relationships through time. It is essential to be able to compare trees from different data sources (e.g. several genes from the same organisms) and different inference methods. We propose a new metric for robus...
Article
Multi-labeled trees are a generalization of phylogenetic trees that are used, for example, in the study of gene versus species evolution and as the basis for phylogenetic network construction. Unlike phylogenetic trees, in a leaf-multi-labeled tree it is possible to label more than one leaf by the same element of the underlying label set. In this p...
Article
Licensed of histone exist: H1/H5, H2A, H2B, H3, a H2A, H2B, H3 and H4 are known as the cor histones H1/H5 are known as the linker histo Our aim to study the similarity between dif based on the marker (Histone family) usi studies. Phylogenetic inference is the proce hypothesis about the evolutionary relatedne based on their observable characteristic...

Citations

... A further advantage over the above pipelines is the ability to use custom reference trees, thus providing a better context for interpreting the data under study. Note however that incongruencies between the taxonomy and the phylogeny can hinder the assignment, if they are not resolved (204). ...
Preprint
Phylogenetic placement refers to a family of tools and methods to analyze, visualize, and interpret the tsunami of metagenomic sequencing data generated by high-throughput sequencing. Compared to alternative (e.g., similarity-based) methods, it puts the metagenomic sequences into a phylogenetic context using a set of known reference sequences and taking evolutionary history into account. Thereby, one can increase the accuracy of metagenomic surveys and eliminate the requirement for having exact or close matches with existing sequence databases. Phylogenetic placement constitutes a valuable analysis tool per se, but also entails a plethora of downstream tools to interpret its results. A common use case is to analyze microbial communities obtained from metagenomic sequencing, for example via taxonomic assignment, diversity quantification, sample comparison, and identification of correlations with environmental variables. In this review, we provide an overview of the methods developed during the first ten years. In particular, the goals of this review are (i) to motivate the usage of phylogenetic placement and illustrate some of its use cases, (ii) to outline the full workflow, from raw sequences to publishable figures, including best practices, (iii) to introduce the most common tools and methods and their capabilities, (iv) to point out common placement pitfalls and misconceptions, (v) to showcase typical placement-based analyses, and how they can help to analyze, visualize, and interpret phylogenetic placement data.
... Equivalently, the convex recoloring problem is to maximize the number of nodes where the initial colors stay without change. For applications of the convex recoloring problem in bioinformatics, the reader may refer to [12][13][14]. Figures 1 and 2 illustrate an optimal solution to the convex recoloring problem on a phylogenetic tree and the columns in an optimal basis of the master problem. In Figure 1, given the coloring at the leaf nodes on the left, each color makes a connected subgraph in the optimal solution on the right which is obtained by changing only one color at node c or keeping the maximum number of initial colors at the other six leaf nodes. ...
... The convex recoloring problem can measure the gap between phylogeny and taxonomy [6,7]. Figure 1 illustrates three species by three colors on the phylogenetic tree where each leaf node represents a homologous protein sequence. ...
Conference Paper
The convex recoloring (CR) problem is to recolor the nodes of a colored graph with the minimum number of color changes such that each color induces a connected subgraph. We adapt the column generation framework developed by Johnson et al. (Math Program 62:133–151, 1993) to the convex recoloring problem. For the convex recoloring problem on a tree, the subproblem to generate columns can be solved in polynomial time by a dynamic programming algorithm. The column generation framework solves the convex recoloring problem on a tree with a large number of colors extremely fast.
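The convexity condition in this abstract (each color induces a connected subgraph) can be illustrated with a small check. The following is only a sketch under stated assumptions: it tests whether a given coloring is convex and does not solve the recoloring problem or implement the column generation method. The function name `is_convex_coloring` and the adjacency-dictionary input are hypothetical choices made for illustration.

```python
from collections import defaultdict, deque


def is_convex_coloring(adjacency, color):
    """Return True if every color class induces a connected subgraph.

    adjacency: dict mapping each node to an iterable of its neighbors
               (hypothetical input format chosen for this sketch).
    color:     dict mapping each node to its color.
    """
    # Group the nodes by color.
    classes = defaultdict(set)
    for node, c in color.items():
        classes[c].add(node)

    for members in classes.values():
        # Breadth-first search restricted to nodes of this color.
        start = next(iter(members))
        seen = {start}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v in members and v not in seen:
                    seen.add(v)
                    queue.append(v)
        # The class is connected only if the search reached all of it.
        if seen != members:
            return False
    return True


# A path a - b - c: coloring the two ends red and the middle blue is not
# convex, because the red nodes are separated by the blue node.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
not_convex = {"a": "red", "b": "blue", "c": "red"}
# Changing the color of one node (b) yields a convex coloring.
convex = {"a": "red", "b": "red", "c": "red"}
```

In this toy case a single color change is enough to reach a convex coloring; the recoloring problem asks for the minimum number of such changes in general.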
... The convex recoloring problem can measure the gap between phylogeny and taxonomy [4]. Figure 1 illustrates three species by three colors on the phylogenetic tree where each leaf node represents a homologous protein sequence. ...
... The subsequently developed 'taxonomy to tree' approach (McDonald et al., 2012) matches existing taxonomic levels onto newly generated trees, allowing classification of unidentified sequences and proposal of changes to the taxonomic nomenclature based on tree topology. Finally, Matsen & Gallagher (2012) have developed algorithms that find mismatches between taxonomy and phylogeny using a convex subcoloring approach. The new tool presented here, the R package MonoPhy, is a quick and user-friendly method for assessing monophyly of taxa in a given phylogeny. ...
Article
The monophyly of taxa is an important attribute of a phylogenetic tree. A lack of it may hint at shortcomings of either the tree or the current taxonomy, or can indicate cases of incomplete lineage sorting or horizontal gene transfer. Whichever is the reason, a lack of monophyly can misguide subsequent analyses. While monophyly is conceptually simple, it is manually tedious and time consuming to assess on modern phylogenies of hundreds to thousands of species. Results. The R package MonoPhy allows assessment and exploration of monophyly of taxa in a phylogeny. It can assess the monophyly of genera using the phylogeny only, and with an additional input file any other desired higher order taxa or unranked groups can be checked as well. Conclusion. Summary tables, easily subsettable results and several visualization options allow quick and convenient exploration of monophyly issues, thus making MonoPhy a valuable tool for any researcher working with phylogenies.
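The notion of monophyly described in this abstract can be made concrete with a short sketch. This is not MonoPhy's implementation or API (MonoPhy is an R package); it is a minimal Python illustration that assumes a rooted tree written as nested tuples with tip names as strings, and the function names `collect_clades` and `is_monophyletic` are hypothetical. A taxon is treated as monophyletic when its tips are exactly the tips below some node of the tree.

```python
def collect_clades(tree, out):
    """Add the tip set below every node of a nested-tuple tree to `out`.

    tree: a tip name (str) or a tuple of subtrees (assumed input format).
    Returns the frozenset of tip names below `tree`.
    """
    if isinstance(tree, str):
        tips = frozenset([tree])
    else:
        # The tips below an internal node are the union over its children.
        tips = frozenset().union(*(collect_clades(child, out) for child in tree))
    out.add(tips)
    return tips


def is_monophyletic(tree, taxon_tips):
    """Return True if `taxon_tips` is exactly the tip set of some clade."""
    clades = set()
    collect_clades(tree, clades)
    return frozenset(taxon_tips) in clades


# Toy rooted tree: (((A1, A2), B1), (B2, C1))
toy_tree = ((("A1", "A2"), "B1"), ("B2", "C1"))
genus_a = {"A1", "A2"}  # forms a clade, so it is monophyletic
genus_b = {"B1", "B2"}  # split across the root, so it is not
```

On a tree with many tips one would compute the clade sets once and reuse them for every taxon rather than rebuilding them per query.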
... Tax2tree uses a heuristic algorithm to reassign sequence taxonomic labels so that they are concordant with a given rooted phylogenetic tree in a way that allows polyphyletic taxonomic groups. Matsen and Gallagher (2012) developed algorithms to quantify discordance between phylogeny and taxonomy based on a coloring problem previously described in the computer science literature (Moran and Snir 2008). Although it is wonderful that several groups are actively working on taxonomic revision, it can be frustrating to have multiple different taxonomies with no easy way to translate between them or to the taxonomic names provided in the NCBI or EMBL sequence databases. ...
Article
The human microbiome is the ensemble of genes in the microbes that live inside and on the surface of humans. Because microbial sequencing information is now much easier to come by than phenotypic information, there has been an explosion of sequencing and genetic analysis of microbiome samples. Much of the analytical work for these sequences involves phylogenetics, at least indirectly, but methodology has developed in a somewhat different direction than for other applications of phylogenetics. In this paper I review the field and its methods from the perspective of a phylogeneticist, as well as describing current challenges for phylogenetics coming from this type of work.
... This comparison was undertaken using two approaches. First, to determine whether taxonomic classification was consistent with cbhI phylogenetic placement, we used the algorithm described by Matsen & Gallagher (2012). This approach builds upon the work of Moran & Snir (2007) and identifies the minimal number of taxa that must be pruned to obtain congruence within a phylogeny at a given level of taxonomic classification. ...
... In general, the Basidiomycota and Ascomycota clustered independently in the phylogeny constructed using cbhI genes from identified fungal isolates (Fig. 4). The first analysis (Matsen and Gallagher 2012) found concordance between the taxonomic classification of the isolates and cbhI phylogeny following the removal of 26 taxa. Of these, 25 were Ascomycota from multiple classes, including Dothideomycetes, Eurotiomycetes, Pezizomycetes, Sordariomycetes and the single sequence from mitosporic Ascomycota (AEO09145). ...
Article
Human activities have resulted in increased nitrogen inputs into terrestrial ecosystems, but the impact of nitrogen on ecosystem function, such as nutrient cycling, will depend at least in part on the response of soil fungal communities. We examined the response of soil fungi to experimental nitrogen addition in a loblolly pine forest (North Carolina, USA) using a taxonomic marker (large subunit rDNA, LSU) and a functional marker involved in a critical step of cellulose degradation (cellobiohydrolase, cbhI) at five time points that spanned fourteen months. Sampling date had no impact on fungal community richness or composition for either gene. Based on the LSU, nitrogen addition led to increased fungal community richness, reduced relative abundance of fungi in the phylum Basidiomycota, and altered community composition; however, similar shifts were not observed with cbhI. Fungal community dissimilarity of the LSU and cbhI genes was significantly correlated in the ambient plots, but not in nitrogen-amended plots, suggesting either functional redundancy of fungi with the cbhI gene, or shifts in other functional groups in response to nitrogen addition. To determine if sequence similarity of cbhI could be predicted based on taxonomic relatedness of fungi, we conducted a phylogenetic analysis of publicly available cbhI sequences from known isolates, and found that for a subset of isolates, similar cbhI genes were found within distantly related fungal taxa. Together, these findings suggest that taxonomic shifts in the total fungal community do not necessarily result in changes in the functional diversity of fungi.
... Frequently used phylogenetic placement programs within such frameworks are pplacer (Matsen et al., 2010) or EPA/RAxML (Berger et al., 2011), which both operate in a probabilistic framework to place a query gene sequence in a pre-computed reference phylogeny of a particular gene family. If this gene tree is an approximate representation of the respective species tree (or reference taxonomy), this can be used to assign a taxon identifier (ID) to the query sequence (Stark et al., 2010; Matsen and Gallagher, 2012). Taxon abundances are then derived from the individual read counts or gene frequencies within each taxonomic group. ...
Article
Motivation: Metagenomics characterizes microbial communities by random shotgun sequencing of DNA isolated directly from an environment of interest. An essential step in computational metagenome analysis is taxonomic sequence assignment, which allows identifying the sequenced community members and reconstructing taxonomic bins with sequence data for the individual taxa. For the massive datasets generated by next-generation sequencing technologies, this cannot be performed with de-novo phylogenetic inference methods. We describe an algorithm and the accompanying software, taxator-tk, which performs taxonomic sequence assignment by fast approximate determination of evolutionary neighbors from sequence similarities. Results: Taxator-tk was precise in its taxonomic assignment across all ranks and taxa for a range of evolutionary distances and for short as well as for long sequences. In addition to the taxonomic binning of metagenomes, it is well suited for profiling microbial communities from metagenome samples because it identifies bacterial, archaeal and eukaryotic community members without being affected by varying primer binding strengths, as in marker gene amplification, or copy number variations of marker genes across different taxa. Taxator-tk has an efficient, parallelized implementation that allows the assignment of 6 Gb of sequence data per day on a standard multiprocessor system with 10 CPU cores and microbial RefSeq as the genomic reference data.
... Reads from the vaginal and oral studies were placed on a tree created from a curated set of taxonomically annotated reference sequences. As phylogenetic entropy and $^qD(T)$ operate on a rooted phylogeny, reference trees were assigned a root taxonomically (Matsen & Gallagher, 2012), meaning that a root was found that best separated high-level taxonomic groupings. pplacer was run in posterior probability mode (using the -p and --informative-prior flags), which defines an informative prior for pendant branch lengths with a mean derived from the average distances from the edge in question to the leaves of the tree. ...
Article
In microbial ecology studies, the most commonly used ways of investigating alpha (within-sample) diversity are either to apply non-phylogenetic measures such as Simpson's index to Operational Taxonomic Unit (OTU) groupings, or to use classical phylogenetic diversity (PD), which is not abundance-weighted. Although alpha diversity measures that use abundance information in a phylogenetic framework do exist, they are not widely used within the microbial ecology community. The performance of abundance-weighted phylogenetic diversity measures compared to classical discrete measures has not been explored, and the behavior of these measures under rarefaction (sub-sampling) is not yet clear. In this paper we compare the ability of various alpha diversity measures to distinguish between different community states in the human microbiome for three different datasets. We also present and compare a novel one-parameter family of alpha diversity measures, BWPDθ, that interpolates between classical phylogenetic diversity (PD) and an abundance-weighted extension of PD. Additionally, we examine the sensitivity of these phylogenetic diversity measures to sampling, via computational experiments and by deriving a closed form solution for the expectation of phylogenetic quadratic entropy under re-sampling. On the three datasets, a phylogenetic measure always performed best, and two abundance-weighted phylogenetic diversity measures were the only measures ranking in the top four across all datasets. OTU-based measures, on the other hand, are less effective in distinguishing community types. In addition, abundance-weighted phylogenetic diversity measures are less sensitive to differing sampling intensity than their unweighted counterparts. Based on these results we encourage the use of abundance-weighted phylogenetic diversity measures, especially for cases such as microbial ecology where species delimitation is difficult.