Fig 6 - uploaded by Vincent Berry
Bipartitions in an unrooted level-1 network whose leaves are in the order σ = abcdefg around the outer face: {b, c}|{a, d, e, f, g} is represented by edge x and {f, g}|{a, b, c, d, e} is represented by the cut {x_1, x_2}.

Source publication
Article
Phylogenetic networks were introduced to describe evolution in the presence of exchanges of genetic material between coexisting species or individuals. Split networks in particular were introduced as a special kind of abstract network to visualize conflicts between phylogenetic trees which may correspond to such exchanges. More recently, methods we...

Contexts in source publication

Context 1
... A|Ā is represented by a single cut-edge e in N as in Fig. 6(i), then we can draw a closed curve intersecting only edge e of N. Otherwise, we draw a closed curve intersecting only edges e_1 and e_2 of N, as in Fig. 6(ii). In both cases, this curve splits the outer face of N in two parts, one containing A and the other Ā, and the set of leaves contained in one of these parts appears as an interval of σ. Thus, these leaves also appear consecutively around the cycle of N′, which implies that A|Ā also belongs to S(N′ ...
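The interval property used in this excerpt can be checked mechanically. The following is a minimal sketch, assuming the circular order σ is given as a string or list of leaf labels and that one side of the bipartition is given as a set of those labels; the function name `is_circular_interval` is hypothetical and does not come from the source publication.

```python
def is_circular_interval(order, part):
    """Return True if the leaves in `part` are consecutive in the circular order.

    Assumes every element of `part` appears exactly once in `order`.
    """
    n = len(order)
    part = set(part)
    if not part or len(part) >= n:
        return False  # an empty or full side does not define a bipartition
    # Walk once around the cycle and count where membership in `part` changes.
    # A circular interval has exactly two such boundaries.
    flips = sum(
        (order[i] in part) != (order[(i + 1) % n] in part)
        for i in range(n)
    )
    return flips == 2


sigma = "abcdefg"
print(is_circular_interval(sigma, {"b", "c"}))  # side of the split shown on edge x
print(is_circular_interval(sigma, {"f", "g"}))  # side of the split shown on the cut
print(is_circular_interval(sigma, {"a", "c"}))  # not consecutive around the cycle
```

Counting boundaries rather than searching for a contiguous slice handles sides that wrap around the end of σ, such as {a, g}, without special cases.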

Similar publications

Conference Paper
Phylogenetic supertrees synthesize a set of phylogenetic trees carrying overlapping taxa set, preferably with the consensus topologies of individual taxa subsets. Supertree construction is an NP-hard problem, and the methods based on decomposition and synthesis of fixed size subtree topologies (such as triplets or quartets) are the most popular. Ti...

Citations

... This type of graphical model has been studied in the computational phylogenetics literature, where it is referred to as a "tree-based phylogenetic network" [9]. The estimation of phylogenetic networks is very challenging, both for statistical reasons (i.e., potential non-identifiability) and computational reasons (see discussion in [5]); although tree-based phylogenetic networks are a restricted subclass of phylogenetic networks, there are still substantial challenges in estimating these phylogenetic networks, as discussed in [10,18]. ...
... A natural candidate is the algorithm from Section 7.1 of [10], which correctly constructs the unrooted topology of level-1 networks N given Q(N), and does so in O(n^4) time, where n is the number of leaves in the network N. However, any algorithm (e.g., [18]) that correctly computes the unrooted phylogenetic network topology for any level-1 network N given Q(N) can be used. ...
... • Use the algorithm from [10] applied to Q to produce an estimate of the unrooted topology of N. The proof is provided in the appendix. ...
Preprint
In 2006, Warnow, Evans, Ringe, and Nakhleh proposed a stochastic model (hereafter, the WERN 2006 model) of multi-state linguistic character evolution that allowed for homoplasy and borrowing. They proved that if there is no borrowing between languages and homoplastic states are known in advance, then the phylogenetic tree of a set of languages is statistically identifiable under this model, and they presented statistically consistent methods for estimating these phylogenetic trees. However, they left open the question of whether a phylogenetic network -- which would explicitly model borrowing between languages that are in contact -- can be estimated under the model of character evolution. Here, we establish that under some mild additional constraints on the WERN 2006 model, the phylogenetic network topology is statistically identifiable, and we present algorithms to infer the phylogenetic network. We discuss the ramifications for linguistic phylogenetic network estimation in practice, and suggest directions for future research.
... One may also consider rerooting a semidirected network N⁻: either at a node or on an edge (Gambette et al. 2012). Specifically, rerooting at node s refers to designating a node s in N⁻ as root and directing all undirected (tree) edges away from s, if this leads to a valid rooted network. ...
... Our definition of level follows (Gambette et al. 2012) and is nonstandard in using 2-edge-connected components rather than biconnected components. A graph is biconnected if the removal of one vertex does not disconnect the graph. ...
... However, the two definitions agree on binary networks. For binary networks, the level of a blob B is the same as the number of hybrid nodes in B (Gambette et al. 2012). If hybrid nodes may have more than two parents, the level of a blob could be greater than its number of hybrid nodes. ...
Article
Phylogenetic networks extend phylogenetic trees to model non-vertical inheritance, by which a lineage inherits material from multiple parents. The computational complexity of estimating phylogenetic networks from genome-wide data with likelihood-based methods limits the size of networks that can be handled. Methods based on pairwise distances could offer faster alternatives. We study here the information that average pairwise distances contain on the underlying phylogenetic network, by characterizing local and global features that can or cannot be identified. For general networks, we clarify that the root and edge lengths adjacent to reticulations are not identifiable, and then focus on the class of zipped-up semidirected networks. We provide a criterion to swap subgraphs locally, such as 3-cycles, resulting in indistinguishable networks. We propose the “distance split tree”, which can be constructed from pairwise distances, and prove that it is a refinement of the network’s tree of blobs, capturing the tree-like features of the network. For level-1 networks, this distance split tree is equal to the tree of blobs refined to separate polytomies from blobs, and we prove that the mixed representation of the network is identifiable. The information loss is localized around 4-cycles, for which the placement of the reticulation is unidentifiable. The mixed representation combines split edges for 4-cycles, regular tree and hybrid edges from the semidirected network, and edge parameters that encode all information identifiable from average pairwise distances.
... The set L(N) = X is the leaf set of N and V^0 the set of inner vertices. By definition, every non-trivial biconnected component in a galled-tree N forms an "undirected" cycle in N [5,42] and is also known as block [16] or gall [17]. Hence, a cycle C of a galled-tree N is composed of two directed paths P_1 and P_2 in N with the same start-vertex ρ_C and end-vertex η_C, and whose vertices distinct from ρ_C and η_C are pairwise distinct. ...
Preprint
The modular decomposition of a graph $G$ is a natural construction to capture key features of $G$ in terms of a labeled tree $(T,t)$ whose vertices are labeled as "series" ($1$), "parallel" ($0$) or "prime". However, full information of $G$ is provided by its modular decomposition tree $(T,t)$ only if $G$ is a cograph, i.e., $G$ does not contain prime modules. In this case, $(T,t)$ explains $G$, i.e., $\{x,y\}\in E(G)$ if and only if the lowest common ancestor $\mathrm{lca}_T(x,y)$ of $x$ and $y$ has label "$1$". Pseudo-cographs, or, more generally, GaTEx graphs $G$ are graphs that can be explained by labeled galled-trees, i.e., labeled networks $(N,t)$ that are obtained from the modular decomposition tree $(T,t)$ of $G$ by replacing the prime vertices in $T$ by simple labeled cycles. GaTEx graphs can be recognized and labeled galled-trees that explain these graphs can be constructed in linear time. In this contribution, we provide a novel characterization of GaTEx graphs in terms of a set $\mathfrak{F}_{\mathrm{GT}}$ of 25 forbidden induced subgraphs. This characterization, in turn, allows us to show that GaTEx graphs are closely related to many other well-known graph classes such as $P_4$-sparse and $P_4$-reducible graphs, weakly-chordal graphs, perfect graphs with perfect order, comparability and permutation graphs, murky graphs as well as interval graphs, Meyniel graphs or very strongly-perfect and brittle graphs.
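To make the "explains" relation in this abstract concrete for the tree case, here is a minimal sketch that recovers the edge set of a cograph from a labeled tree. The nested-tuple encoding, where an inner vertex is `(label, [children])` and a leaf is a string, and the function names are assumptions made for illustration; they are not taken from the cited work, and the sketch does not cover prime vertices or galled-trees.

```python
from itertools import combinations


def leaves(node):
    """Return the leaf labels below `node` (a leaf is a plain string)."""
    if isinstance(node, str):
        return [node]
    _label, children = node
    return [leaf for child in children for leaf in leaves(child)]


def explained_edges(node):
    """Return the edges {x, y} whose lowest common ancestor has label 1."""
    if isinstance(node, str):
        return set()
    label, children = node
    edges = set()
    for child in children:
        edges |= explained_edges(child)
    if label == 1:
        # Leaves taken from two different children have `node` as their lca.
        for left, right in combinations(children, 2):
            for x in leaves(left):
                for y in leaves(right):
                    edges.add(frozenset((x, y)))
    return edges


# Root labeled 1 ("series") above an inner vertex labeled 0 ("parallel").
tree = (1, [(0, ["a", "b"]), "c"])
print(sorted(sorted(e) for e in explained_edges(tree)))
```

In this example a and b are non-adjacent because their lowest common ancestor carries label 0, while c is adjacent to both because the root carries label 1.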
... The quartet distance (QD) is defined as the number of quartets of tree leaves that induce a subtree topology that occurs in only one of the two compared trees. This distance has been extensively used in computational biology not only for inferring phylogenetic trees and networks (Berry et al. 1999, Gambette et al. 2012), but also for building supertrees (Snir and Rao 2008). ...
Article
Motivation: Each gene has its own evolutionary history which can substantially differ from evolutionary histories of other genes. For example, some individual genes or operons can be affected by specific horizontal gene transfer or recombination events. Thus, the evolutionary history of each gene should be represented by its own phylogenetic tree which may display different evolutionary patterns from the species tree that accounts for the main patterns of vertical descent. However, the output of traditional consensus tree or supertree inference methods is a unique consensus tree or supertree. Results: We present a new efficient method for inferring multiple alternative consensus trees and supertrees to best represent the most important evolutionary patterns of a given set of gene phylogenies. We show how an adapted version of the popular k-means clustering algorithm, based on some remarkable properties of the Robinson and Foulds distance, can be used to partition a given set of trees into one (for homogeneous data) or multiple (for heterogeneous data) cluster(s) of trees. Moreover, we adapt the popular Caliński-Harabasz, Silhouette, Ball and Hall, and Gap cluster validity indices to tree clustering with k-means. Special attention is given to the relevant but very challenging problem of inferring alternative supertrees. The use of the Euclidean property of the objective function of the method makes it faster than the existing tree clustering techniques, and thus better suited for analyzing large evolutionary datasets. Availability and implementation: Our KMeansSuperTreeClustering program along with its C++ source code is available at: https://github.com/TahiriNadia/KMeansSuperTreeClustering.
... In this contribution, we are interested in the characterization of graphs that can be explained by rooted level-1 networks N with binary labeling t. We present the formal definition for this type of network in the next section but, essentially, level-1 networks are directed acyclic graphs with a single root and in which any two "undirected cycles" [52] (also known as blocks [29] or galls [30]) are edge disjoint. We provide two generalizations of cographs: the class of pseudo-cographs and the class prime of prime polar-cats. ...
... If such a biconnected component is not a single vertex or an edge, then it is called non-trivial. Non-trivial biconnected components are also known as cycle [52] (or block [29] or gall [30]). A network N is a level-k network, if each biconnected component C of N contains at most k vertices of indegree 2 in C, i.e., hybrid-vertices of N whose parents belong to C [12]. ...
Preprint
We characterize graphs $G$ that can be explained by rooted labeled level-1 networks $(N,t)$, i.e., $N$ is equipped with a binary vertex-labeling $t$ such that $\{x,y\}\in E(G)$ if and only if the lowest common ancestor $\mathrm{lca}_N(x,y)$ of $x$ and $y$ has label "$1$". This generalizes the concept of graphs that can be explained by labeled trees, that is, cographs. We provide three novel graph classes: polar-cats are a proper subclass of pseudo-cographs which forms a proper subclass of prime-polar-cats. In particular, every cograph is a pseudo-cograph and prime-polar-cats is precisely the class of graphs that can be explained by a labeled level-1 network. The class prime-polar-cats is defined in terms of the modular decomposition of graphs and the property that all prime modules "induce" polar-cats. We provide a plethora of structural results and characterizations for graphs of these new classes and give linear-time algorithms to recognize them and to construct level-1 networks to explain them.
... We provide a plethora of structural results and characterizations for graphs of these new classes and give linear-time algorithms to recognize them and to construct level-1 networks to explain them. the next section but, essentially, level-1 networks are directed acyclic graphs with a single root and in which any two "cycles" [47] (also known as blocks [27] or galls [28]) are edge disjoint. We provide two generalizations of cographs: the class of pseudo-cographs and the class prime . ...
... A biconnected component of a graph is a maximal biconnected subgraph. If such a biconnected component is not a single vertex or an edge, then it is called a cycle [47] (or block [27] or gall [28]). A network N is a level-k network, if each bi-connected component of N contains at most k hybrid-vertices [12]. ...
Article
We characterize graphs G that can be explained by rooted labeled level-1 networks (N,t), i.e., N is equipped with a binary vertex-labeling t such that {x, y} ∈ E(G) if and only if the lowest common ancestor lca_N(x, y) of x and y has label "1". This generalizes the concept of graphs that can be explained by labeled trees, that is, cographs. We provide three novel graph classes: polar-cats are a proper subclass of pseudo-cographs which forms a proper subclass of prime-polar-cats. In particular, every cograph is a pseudo-cograph and prime-polar-cat is precisely the class of graphs that can be explained by a labeled level-1 network. The class prime-polar-cats is defined in terms of the modular decomposition of graphs and the property that all prime modules "induce" polar-cats. We provide a plethora of structural results and characterizations for graphs of these new classes and give linear-time algorithms to recognize them and to construct level-1 networks to explain them.
... The quartet distance (QD) is defined as the number of quartets of tree leaves that induce a subtree topology that occurs in only one of the two compared trees. This distance has been extensively used in computational biology not only for inferring phylogenetic trees and networks (Berry et al. 1999, Gambette et al. 2012), but also for building supertrees (Snir and Rao 2008). According to its definition, the quartet distance is a symmetric difference distance. ...
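The definition quoted in this excerpt can be illustrated with a brute-force sketch. It assumes each tree is encoded as a collection of leaf sets, one side of each internal edge, and it counts the four-leaf subsets whose induced topology differs between the two trees. The encoding and the function names are assumptions made for illustration, and the enumeration over all quartets is far slower than the specialised algorithms used in practice.

```python
from itertools import combinations


def quartet_topology(splits, quad):
    """Return the topology xy|zw that `splits` induces on `quad`, or None.

    `splits` holds one side of each internal edge of a tree as a set of leaves.
    """
    for side in splits:
        inside = [leaf for leaf in quad if leaf in side]
        if len(inside) == 2:
            outside = [leaf for leaf in quad if leaf not in side]
            return frozenset((frozenset(inside), frozenset(outside)))
    return None  # no edge separates the quartet two-against-two (unresolved)


def quartet_distance(leaf_set, splits1, splits2):
    """Count the quartets whose induced topology differs between the two trees."""
    return sum(
        quartet_topology(splits1, quad) != quartet_topology(splits2, quad)
        for quad in combinations(sorted(leaf_set), 4)
    )


leaf_set = {"a", "b", "c", "d", "e"}
tree1 = [{"a", "b"}, {"d", "e"}]  # ((a,b),c,(d,e))
tree2 = [{"a", "c"}, {"d", "e"}]  # ((a,c),b,(d,e))
print(quartet_distance(leaf_set, tree1, tree2))
```

Only the quartets {a, b, c, d} and {a, b, c, e} are resolved differently here, because the two trees disagree solely on how a, b and c are grouped.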
Preprint
Each gene has its own evolutionary history which can substantially differ from the evolutionary histories of other genes. For example, some individual genes or operons can be affected by specific horizontal gene transfer and recombination events. Thus, the evolutionary history of each gene should be represented by its own phylogenetic tree which may display different evolutionary patterns from the species tree that accounts for the main patterns of vertical descent. The output of traditional consensus tree or supertree inference methods is a unique consensus tree or supertree. We describe a new efficient method for inferring multiple alternative consensus trees and supertrees to best represent the most important evolutionary patterns of a given set of gene phylogenies. We show how an adapted version of the popular k-means clustering algorithm, based on some interesting properties of the Robinson and Foulds distance, can be used to partition a given set of trees into one (for homogeneous data) or multiple (for heterogeneous data) cluster(s) of trees. Moreover, we adapt the popular Caliński-Harabasz, Silhouette, Ball and Hall, and Gap cluster validity indices to tree clustering with k-means. Special attention is given to the relevant but very challenging problem of inferring alternative supertrees. The use of the Euclidean property of the objective function of the method makes it faster than the existing tree clustering techniques, and thus perfectly suitable for analyzing large evolutionary datasets. We apply the new method to discover alternative supertrees characterizing the main patterns of evolution of SARS-CoV-2 and genetically related betacoronaviruses.
... We conclude this section by discussing undirected phylogenetic networks. A phylogenetic network N′ = (n′, S, h) is undirected if n′ is undirected and connected, with no parallel edges or self-loops (Gambette et al., 2012). Again, we assume that N′ is binary, so leaves are vertices with degree 1 and all other vertices have degree 3. It is easy to transform a directed phylogenetic network N into an undirected network N′ simply by ignoring edge directions and suppressing the vertex previously designated as the root (Gambette et al., 2012). ...
... The reverse operation of transforming an undirected network N′ into a directed network N, referred to as orienting the network, is not so straightforward, because the location of the root does not uniquely determine the direction of all edges in an unrooted network (Gambette et al., 2012; Huber et al., 2019). Notably, Huber et al. (2019) recently showed that specifying the location of the root (i.e. the edge that the root subdivides) and the admixture nodes in an undirected phylogenetic network N′ results in a unique orientation of N, provided that an orientation exists (Theorem 2 in Huber et al., 2019). ...
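The easy direction described in these excerpts, ignoring edge directions and suppressing the root, can be sketched as follows. This is an illustrative sketch, assuming the directed network is given as a list of `(parent, child)` pairs and that the root has exactly two children, as in a binary network; the function name `unroot` is hypothetical and does not come from the cited papers.

```python
def unroot(directed_edges, root):
    """Drop edge directions and suppress `root`, returning undirected edges."""
    undirected = {frozenset(edge) for edge in directed_edges}
    incident = [edge for edge in undirected if root in edge]
    if len(incident) != 2:
        raise ValueError("this sketch assumes the root has exactly two children")
    (u,) = incident[0] - {root}
    (v,) = incident[1] - {root}
    merged = frozenset((u, v))
    if merged in undirected:
        # The quoted definition forbids parallel edges in an undirected network.
        raise ValueError("suppressing the root would create parallel edges")
    # Replace the two root edges by a single edge joining the root's children.
    return (undirected - set(incident)) | {merged}


# Rooted network with root r and one hybrid (admixture) node h.
rooted = [("r", "u"), ("r", "v"), ("u", "a"), ("u", "h"),
          ("v", "h"), ("v", "b"), ("h", "c")]
for edge in sorted(sorted(e) for e in unroot(rooted, "r")):
    print(edge)
```

In the result, u, v and h form a 3-cycle and every inner vertex has degree 3. Nothing in the output records which edge the root subdivided, which matches the excerpt's point that the reverse operation needs extra information.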
Article
Motivation Admixture, the interbreeding between previously distinct populations, is a pervasive force in evolution. The evolutionary history of populations in the presence of admixture can be modeled by augmenting phylogenetic trees with additional nodes that represent admixture events. While enabling a more faithful representation of evolutionary history, admixture graphs present formidable inferential challenges, and there is an increasing need for methods that are accurate, fully automated and computationally efficient. One key challenge arises from the size of the space of admixture graphs. Given that exhaustively evaluating all admixture graphs can be prohibitively expensive, heuristics have been developed to enable efficient search over this space. One heuristic, implemented in the popular method TreeMix, consists of adding edges to a starting tree while optimizing a suitable objective function. Results Here, we present a demographic model (with one admixed population incident to a leaf) where TreeMix and any other starting-tree-based maximum likelihood heuristic using its likelihood function is guaranteed to get stuck in a local optimum and return an incorrect network topology. To address this issue, we propose a new search strategy that we term maximum likelihood network orientation (MLNO). We augment TreeMix with an exhaustive search for an MLNO, referring to this approach as OrientAGraph. In evaluations including previously published admixture graphs, OrientAGraph outperformed TreeMix on 4/8 models (there are no differences in the other cases). Overall, OrientAGraph found graphs with higher likelihood scores and topological accuracy while remaining computationally efficient. Lastly, our study reveals several directions for improving maximum likelihood admixture graph estimation. Availability and implementation OrientAGraph is available on Github (https://github.com/sriramlab/OrientAGraph) under the GNU General Public License v3.0. 
Supplementary information Supplementary data are available at Bioinformatics online.
... The quartet distance (QD) is defined as the number of quartets of tree leaves that induce a subtree topology that occurs in only one of the two compared trees. This distance has been extensively used in computational biology not only for inferring phylogenetic trees and networks (Berry et al. 1999, Gambette et al. 2012), but also for building supertrees (Snir and Rao 2008). ...
Preprint
Each gene has its own evolutionary history which can substantially differ from the evolutionary histories of other genes. For example, some individual genes or operons can be affected by specific horizontal gene transfer and recombination events. Thus, the evolutionary history of each gene should be represented by its own phylogenetic tree which may display different evolutionary patterns from the species tree that accounts for the main patterns of vertical descent. The output of traditional consensus tree or supertree inference methods is a unique consensus tree or supertree. Here, we describe a new efficient method for inferring multiple alternative consensus trees and supertrees to best represent the most important evolutionary patterns of a given set of phylogenetic trees. We show how an adapted version of the popular k-means clustering algorithm, based on some interesting properties of the Robinson and Foulds topological distance, can be used to partition a given set of trees into one (when the data are homogeneous) or multiple (when the data are heterogeneous) cluster(s) of trees. Moreover, we adapt the popular Caliński-Harabasz, Silhouette, Ball and Hall, and Gap cluster validity indices to tree clustering with k-means. Special attention is given to the relevant but very challenging problem of inferring alternative supertrees. The use of the Euclidean property of the objective function of the method makes it faster than the existing tree clustering techniques, and thus perfectly suitable for the analysis of large evolutionary datasets. We apply the new method to discover alternative supertrees characterizing the main patterns of evolution of SARS-CoV-2 and genetically related betacoronaviruses.
... We can obtain an (n + 1)-leaf unrooted level-k network N′ from an n-leaf rooted level-k network N by adding a vertex adjacent to the root of N and then labeling it with an extra leaf label. [23, Theorem 1] guarantees that N′ is also a level-k network. Conversely, we can obtain a rooted level-k network from an unrooted level-k network [9,32] although the resulting rooted network need not be unique. ...
Preprint
In phylogenetics, it is important for the phylogenetic network model parameters to be identifiable so that the evolutionary histories of a group of species can be consistently inferred. However, as the complexity of the phylogenetic network models grows, the identifiability of network models becomes increasingly difficult to analyze. As an attempt to analyze the identifiability of network models, we check whether two networks are distinguishable. In this paper, we specifically study the distinguishability of phylogenetic network models associated with level-2 networks. Using an algebraic approach, namely using discrete Fourier transformation, we present some results on the distinguishability of some level-2 networks, which generalize earlier work on the distinguishability of level-1 networks. In particular, we study simple and semisimple level-2 networks. Simple and semisimple level-2 networks can be thought of as generalizations of level-1 sunlet and cycle networks, respectively. Moreover, we also compare the varieties associated with semisimple level-2 and cycle networks.