Fig 2 - uploaded by Falk Hüffner
An example of the transformation to Balanced Subgraph. An arbitrary tanglegram of the input trees (upper left) is transformed into a bipartite graph (lower left). Continuous lines denote =-edges, dashed lines ≠-edges. This instance can be solved by breaking one edge, e.g. {u₂, v₂}, which leads to a valid 2-coloring of the vertices (lower right). The vertices of one color, here u₁ and u₂, are switched to obtain an optimal tanglegram (upper right).
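The caption describes the reduction only at the level of the figure. As a hedged illustration of the target problem alone, the Python sketch below checks whether a graph with =-edges and ≠-edges admits a valid 2-coloring (a BFS that propagates color parities) and finds a smallest set of at most k edges to break by brute force. The edge encoding `(u, v, same)` and the function names are hypothetical; the search is exponential in k, it is not the fixed-parameter algorithm of the source publication, and it does not build the graph from a tanglegram.

```python
from collections import deque
from itertools import combinations

def two_coloring(n, edges):
    """Return a 0/1 color per vertex 0..n-1 satisfying every edge, or None.

    Each edge is (u, v, same): same=True is a =-edge (equal colors),
    same=False is a ≠-edge (different colors).
    """
    adj = [[] for _ in range(n)]
    for u, v, same in edges:
        parity = 0 if same else 1
        adj[u].append((v, parity))
        adj[v].append((u, parity))
    color = [None] * n
    for start in range(n):
        if color[start] is not None:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v, parity in adj[u]:
                want = color[u] ^ parity
                if color[v] is None:
                    color[v] = want
                    queue.append(v)
                elif color[v] != want:
                    return None  # a cycle with an odd number of ≠-edges
    return color

def balanced_subgraph(n, edges, k):
    """Brute force: break at most k edges so that a valid 2-coloring exists.

    Returns (broken_edges, coloring) for a smallest such set, or None.
    """
    for size in range(k + 1):
        for removed in combinations(range(len(edges)), size):
            kept = [e for i, e in enumerate(edges) if i not in removed]
            coloring = two_coloring(n, kept)
            if coloring is not None:
                return [edges[i] for i in removed], coloring
    return None
```

For example, a triangle with two =-edges and one ≠-edge has no valid coloring, but breaking any one of its edges yields one, so `balanced_subgraph(3, edges, 1)` succeeds while `k = 0` returns `None`.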

Source publication
Conference Paper
Given two binary phylogenetic trees covering the same n species, it is useful to compare them by drawing them with leaves arranged side-by-side. To facilitate comparison, we would like to arrange the trees to minimize the number of crossings k induced by connecting pairs of identical species. This is the NP-hard Tanglegram Layout problem. By provid...

Citations

... The (tanglegram) crossing number crt(T) of a tanglegram T is defined as the minimum number of crossings among its layouts. The Tanglegram Layout Problem is NP-hard [2,4] but fixed-parameter tractable [2,1]. It does not admit a constant-factor approximation under the Unique Games Conjecture [2]. ...
Article
A tanglegram $\mathcal{T}$ consists of two rooted binary trees with the same number of leaves, and a perfect matching between the two leaf sets. In a layout, the tanglegram is drawn with the leaves on two parallel lines, the trees on either side of the strip created by these lines are drawn as plane trees, and the perfect matching is drawn as straight line segments inside the strip. The tanglegram crossing number ${\rm cr}({\mathcal{T}})$ of $\mathcal{T}$ is the smallest number of crossings of pairs of matching edges, over all possible layouts of $\mathcal{T}$. The size of the tanglegram is the number of matching edges, say $n$. An earlier paper showed that the maximum tanglegram crossing number of size-$n$ tanglegrams is $<\frac{1}{2}\binom{n}{2}$, but is at least $\frac{1}{2}\binom{n}{2}-\frac{n^{3/2}-n}{2}$ for infinitely many $n$. We now prove better bounds: the maximum crossing number of a size-$n$ tanglegram is at most $\frac{1}{2}\binom{n}{2}-\frac{n}{4}$, but for infinitely many $n$, at least $\frac{1}{2}\binom{n}{2}-\frac{n\log_2 n}{4}$. The problem is analogous to the Unbalancing Lights Problem of Gale and Berlekamp.
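To make the definition of the crossing number concrete, here is a brute-force Python sketch, feasible only for tiny n. In a fixed layout, two matching edges cross exactly when their endpoints appear in opposite orders on the two lines, so the crossings of one layout are the inversions of a permutation; the crossing number is the minimum over all child flips of both trees. Trees are encoded as nested pairs of leaf labels; this encoding and the function names are assumptions made for illustration.

```python
from itertools import product

def leaf_orders(tree):
    """All leaf orders of a rooted binary tree reachable by flipping children.

    A tree is either a leaf label or a pair (left, right).
    """
    if not isinstance(tree, tuple):
        return [[tree]]
    left, right = tree
    orders = []
    for a, b in product(leaf_orders(left), leaf_orders(right)):
        orders.append(a + b)  # left child drawn first
        orders.append(b + a)  # children flipped
    return orders

def crossings(left_order, right_order):
    """Crossing pairs of matching edges in one layout (= inversions).

    Quadratic pair count, kept simple for clarity.
    """
    pos = {label: i for i, label in enumerate(right_order)}
    seq = [pos[label] for label in left_order]
    return sum(1 for i in range(len(seq))
               for j in range(i + 1, len(seq)) if seq[i] > seq[j])

def crossing_number(t1, t2):
    """Minimum crossings over all layouts (exponential brute force)."""
    return min(crossings(a, b)
               for a in leaf_orders(t1) for b in leaf_orders(t2))
```

For the trees ((1, 2), (3, 4)) and ((1, 3), (2, 4)), no common leaf order keeps all four cherries contiguous, so at least one crossing is unavoidable and `crossing_number` returns 1.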
... The TLP has been studied mainly for binary trees, as phylogenetic trees are mostly binary. If the leaf orders of both trees may be changed, the problem is known to be NP-complete [15], even if the trees are complete [7]. However, there exist approximation results [7], fixed-parameter algorithms [5,7,15], integer linear programming formulations [4], and heuristics [23]. If the leaf order of one tree is fixed, the problem is solvable in polynomial time [15]; if the trees are not binary, however, even this problem is NP-complete [9]. ...
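For the one-sided case with binary trees, a simple argument suggests why a polynomial algorithm exists: the relative order of two leaves on the free side is decided only by the orientation at their lowest common ancestor, so each internal node can choose its child order independently, taking the cheaper of its two orientations. The Python sketch below implements this greedy rule with O(n²) pair comparisons overall. We do not claim it is the exact algorithm of [15], and the nested-pair tree encoding and function name are assumptions.

```python
def one_sided_layout(tree, fixed_order):
    """Leaf order of `tree` minimizing crossings against a fixed leaf order.

    tree: a leaf label or a pair (left, right); fixed_order: list of labels.
    Returns (leaf_order, crossings).
    """
    pos = {label: i for i, label in enumerate(fixed_order)}

    def solve(node):
        if not isinstance(node, tuple):
            return [node], 0
        a, cost_a = solve(node[0])
        b, cost_b = solve(node[1])
        # Pairs (x from a, y from b) cross if a is drawn above b but x comes
        # after y on the fixed side; flipping crosses exactly the other pairs.
        cross_ab = sum(1 for x in a for y in b if pos[x] > pos[y])
        cross_ba = len(a) * len(b) - cross_ab
        if cross_ab <= cross_ba:
            return a + b, cost_a + cost_b + cross_ab
        return b + a, cost_a + cost_b + cross_ba

    return solve(tree)
```

Flipping a node moves its two child blocks as wholes, so crossings inside each child subtree are unaffected; this independence is what makes the per-node choice optimal.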
... Furthermore, if π starts with 1 then the first block is removed, and if π ends with n then the block at the end is removed (see [10]). For instance, if π = (3,1,2,8,9,4,5,6,7,10), then gl(π) = (2, 1, 4, 3). Two important lemmata that will be used throughout the paper are given below. ...
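The snippet states only the removal rules and one example; the definition of a block is elided. Assuming, consistently with the example, that a block is a maximal run of adjacent entries whose values increase by exactly one, and that the surviving blocks are relabeled by rank, gl can be sketched in Python as follows. The function name and this reading of the definition are assumptions, not taken from [10].

```python
def gl(perm):
    """Glue a permutation of 1..n into blocks and relabel blocks by rank.

    Assumed reading: a block is a maximal run of adjacent entries whose
    values increase by exactly 1 (e.g. 4, 5, 6, 7).
    """
    if not perm:
        return ()
    n = len(perm)
    blocks = [[perm[0]]]
    for value in perm[1:]:
        if value == blocks[-1][-1] + 1:
            blocks[-1].append(value)  # extends the current run
        else:
            blocks.append([value])    # starts a new block
    if perm[0] == 1:
        blocks = blocks[1:]           # drop the leading block starting with 1
    if perm[-1] == n and blocks:
        blocks = blocks[:-1]          # drop the trailing block ending with n
    # Relabel each surviving block by the rank of its smallest value.
    rank = {b[0]: r for r, b in enumerate(sorted(blocks), start=1)}
    return tuple(rank[b[0]] for b in blocks)
```

On the example from the text, the blocks of (3, 1, 2, 8, 9, 4, 5, 6, 7, 10) are (3), (1, 2), (8, 9), (4, 5, 6, 7), (10); the last is dropped because π ends with n = 10, and ranking the rest gives (2, 1, 4, 3).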
... We consider all cases of the left and right side of Equation (5) and show that they result in the same permutation. Let us first consider the left side. ...
Preprint
Tanglegrams are drawings of two rooted binary phylogenetic trees and a matching between their leaf sets. The trees are drawn crossing-free on opposite sides with their leaf sets facing each other on two vertical lines. Instead of minimizing the number of pairwise edge crossings, we consider the problem of minimizing the number of block crossings, that is, two bundles of lines crossing each other locally. With one tree fixed, the leaves of the second tree can be permuted according to its tree structure. We give a complete picture of the algorithmic complexity of minimizing block crossings in one-sided tanglegrams by showing NP-completeness, constant-factor approximations, and a fixed-parameter algorithm. We also state first results for non-binary trees.
... First, the RAxML module of Epos was used with the WAG matrix as substitution model and 100 replicates; for the latter, the MrBayes model was applied with 1,000,000 generations. To investigate the co-evolution of Nit and Fhit, the respective trees were compared using a tanglegram approach [35]. Briefly, since a given topology can be drawn as several different trees, one of the trees is rearranged to maximally resemble the second tree without changing the evolutionary information of the tree. ...
Article
In previous studies, we have identified the tumor suppressor proteins Fhit (fragile histidine triad) and Nit1 (Nitrilase1) as interaction partners of β-catenin, both acting as repressors of the canonical Wnt pathway. Interestingly, in D. melanogaster and C. elegans these proteins are expressed as NitFhit fusion proteins. According to the Rosetta Stone hypothesis, if proteins are expressed as fusion proteins in one organism and as single proteins in others, the latter should interact physically and share a common signaling function. Here, we tested this hypothesis and provide the first biochemical evidence for a direct association between Nit1 and Fhit. In addition, size exclusion chromatography of purified recombinant human Nit1 showed a tetrameric structure, as also previously observed for the NitFhit Rosetta Stone fusion protein Nft-1 in C. elegans. Finally, in line with the Rosetta Stone hypothesis, we identified Hsp60 and Ubc9 as other common interaction partners of Nit1 and Fhit. The interaction of Nit1 and Fhit may affect their enzymatic activities as well as their interaction with other binding partners.
... A decade's worth of literature is devoted to tanglegram drawings (see e.g. [3–9]). Finding a tanglegram drawing that minimizes the number of crossings among the edges in F is known to be NP-complete, even if the trees are binary and the graph G is a matching [7]. ...
... Moreover, we would like to adapt heuristics for the reduction of the crossings of tanglegram drawings, such as those in [4,8,9], to our problem. ...
Article
We introduce a hybrid metaphor for visualizing reconciliations of co-phylogenetic trees, which are mappings between the nodes of two trees with constraints on the leaves. The typical application is the visualization of the co-evolution of hosts and parasites in biology. Our strategy combines a space-filling and a node-link approach. Unlike traditional methods, it guarantees an unambiguous, downward representation whenever the reconciliation is time-consistent (i.e., biologically feasible). We address the problem of minimizing the number of crossings in the representation by characterizing planar instances and establishing the complexity of the problem. Finally, we propose heuristics for computing representations with few crossings.
... (Figure 2). There has been extensive work on the problem of finding the layout of a given tanglegram in the plane that minimizes crossings, with the goal of most clearly visualizing co-evolutionary relationships between species [16–21]. ...
Article
Many discrete mathematics problems in phylogenetics are defined in terms of the relative labeling of pairs of leaf-labeled trees. These relative labelings are naturally formalized as tanglegrams, which have previously been an object of study in coevolutionary analysis. Although there has been considerable work on planar drawings of tanglegrams, they had not been fully explored as combinatorial objects until recently. In this paper, we describe how many discrete mathematical questions on trees "factor" through a problem on tanglegrams, and how understanding that factoring can simplify analysis. Depending on the problem, it may be useful to consider an unordered version of tanglegrams, and/or their unrooted counterparts. For all of these definitions, we show how the isomorphism types of tanglegrams can be understood in terms of double cosets of the symmetric group, and we investigate their automorphisms. Understanding tanglegrams better will isolate the distinct problems on leaf-labeled pairs of trees and reveal natural symmetries of spaces associated with such problems.
... They also presented a fixed-parameter algorithm for binary tanglegrams. Recently, an improved fixed-parameter algorithm was presented by Böcker et al. [16] which can solve large binary instances quickly in practice, provided that the number of crossings is not too large. Finally, while in the recent paper by Venkatachalam et al. [121] the focus is on binary instances, a fixed-parameter algorithm for general tanglegram instances is presented. ...
Article
We consider combinatorial optimization problems with nonlinear objective functions. Solution approaches for this class of problems proposed so far are either highly problem-specific or they apply generic algorithms for constrained nonlinear optimization, which often do not yield satisfactory results in practice. Our aim is to develop, implement and experimentally evaluate exact algorithms that address the nonlinearity of the objective function and at the same time exploit the underlying combinatorial structure of the problem. To this end we follow two approaches. The first combines good polyhedral descriptions of the objective function and the feasible set in a branch-and-cut algorithm. The second approach is based on Lagrangean decomposition. By decomposing the original problem into an unconstrained nonlinear problem and a linear combinatorial problem, we are able to compute strong dual bounds for the optimal value. The computation of lower bounds is then embedded into a branch-and-bound algorithm. For many applications there already exist efficient algorithms for the combinatorial subproblem, so an important aspect of this thesis is the study of the corresponding unconstrained nonlinear subproblems. Both approaches have the advantage that they can easily be adapted to a wide range of nonlinear combinatorial problems. We devise both polyhedral and decomposition-based algorithms for submodular applications from wireless network design and portfolio optimization and evaluate their performance experimentally. Exploiting the equivalence between unconstrained binary quadratic optimization and the maximum cut problem gives rise to a branch-and-cut algorithm for quadratic combinatorial problems, which we use to compute optimal layouts of tanglegrams, an application from computational biology. Additionally, we study the effect of quadratic reformulation of linear constraints, both theoretically and experimentally.
The last class of nonlinear combinatorial problems we consider are two-scenario problems. Here we propose a new technique to compute lower bounds in the unconstrained subproblem of the decomposition. Our computational study of the two-scenario minimum spanning tree problem shows that the new Lagrangean decomposition-based algorithm is able to solve significantly larger instances than the standard linearization approach.
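The equivalence between unconstrained binary quadratic optimization and maximum cut can be illustrated with the standard reduction that adds one reference vertex: for x in {0, 1}ⁿ, x_i·x_j = ([i cut from 0] + [j cut from 0] − [i cut from j]) / 2, so every quadratic objective becomes a cut weight. The Python sketch below assumes Q is an upper-triangular list of lists with linear terms on the diagonal; all function names are hypothetical, and brute force stands in for the branch-and-cut machinery described in the abstract. In the tanglegram application, the binary variables would plausibly encode child swaps at internal nodes, since whether two matching edges cross depends only on the orientations at one lowest common ancestor in each tree.

```python
from itertools import product

def qubo_value(Q, x):
    """f(x) = sum over i <= j of Q[i][j] * x[i] * x[j], for x in {0, 1}^n.

    Q is upper triangular; the diagonal holds the linear terms (x_i^2 = x_i).
    """
    n = len(Q)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(i, n))

def qubo_to_cut_weights(Q):
    """Edge weights on vertices 0..n; vertex 0 is the extra reference vertex.

    With vertex i + 1 on side x[i] and vertex 0 on side 0, f(x) equals the
    total weight of the edges crossing the cut.
    """
    n = len(Q)
    w = {}
    for i in range(n):
        off_diag = sum(Q[min(i, j)][max(i, j)] for j in range(n) if j != i)
        w[(0, i + 1)] = Q[i][i] + off_diag / 2
        for j in range(i + 1, n):
            w[(i + 1, j + 1)] = -Q[i][j] / 2
    return w

def cut_weight(w, side):
    """Total weight of edges (a, b) with side[a] != side[b]."""
    return sum(weight for (a, b), weight in w.items() if side[a] != side[b])

def min_qubo_via_max_cut(Q):
    """min_x f(x) = -(maximum cut under negated weights), by brute force."""
    n = len(Q)
    negated = {e: -v for e, v in qubo_to_cut_weights(Q).items()}
    return -max(cut_weight(negated, side)
                for side in product((0, 1), repeat=n + 1))
```

Complementing a cut leaves its weight unchanged, so it is enough that every cut has a version with vertex 0 on side 0; that version corresponds to exactly one x.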
... We only mention Refs. [20, 41–45] to give the reader some impression. ...
Article
We analyze a common feature of p-Kemeny AGGregation (p-KAGG) and p-One-Sided Crossing Minimization (p-OSCM) to provide new insights and findings of interest to both the graph drawing community and the social choice community. We obtain parameterized subexponential-time algorithms for p-KAGG, a problem in social choice theory, and for p-OSCM, a problem in graph drawing. These algorithms run in time $O^*(2^{O(\sqrt{k}\log k)})$, where $k$ is the parameter, and significantly improve the previous best algorithms with running times $O^*(1.403^k)$ and $O^*(1.4656^k)$, respectively. We also study natural “above-guarantee” versions of these problems and show them to be fixed-parameter tractable. In fact, we show that the above-guarantee versions of these problems are equivalent to a weighted variant of p-directed feedback arc set. Our results for the above-guarantee version of p-KAGG reveal an interesting contrast: when the number of “votes” in the input to p-KAGG is odd, the above-guarantee version can still be solved in time $O^*(2^{O(\sqrt{l}\log k)})$, while if it is even, the problem cannot have a subexponential-time algorithm unless the exponential time hypothesis fails (equivalently, unless FPT = M[1]).
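One way to see what p-KAGG and p-OSCM share: both ask for a linear order of items minimizing a sum of pairwise costs c(u, v), charged whenever u is placed before v. For Kemeny aggregation, c(u, v) counts the votes that rank v above u; for one-sided crossing minimization, it counts crossings between the edges of u and v when u precedes v on the free layer. The Python sketch below makes this concrete by brute force over all n! orders. It is not the subexponential algorithm of the abstract, and the helper names and input encodings are hypothetical.

```python
from itertools import permutations

def best_order(items, cost):
    """Order minimizing the sum of cost(u, v) over pairs with u before v."""
    best = None
    for order in permutations(items):
        total = sum(cost(order[i], order[j])
                    for i in range(len(order))
                    for j in range(i + 1, len(order)))
        if best is None or total < best[1]:
            best = (list(order), total)
    return best

def kemeny_cost(votes):
    """cost(u, v) = number of votes ranking v above u."""
    ranks = [{c: i for i, c in enumerate(vote)} for vote in votes]
    return lambda u, v: sum(1 for r in ranks if r[v] < r[u])

def oscm_cost(neighbors):
    """cost(u, v) = crossings between edges of u and v if u precedes v.

    neighbors[u] lists the positions of u's neighbors on the fixed layer;
    edges u-a and v-b cross exactly when a > b.
    """
    return lambda u, v: sum(1 for a in neighbors[u]
                            for b in neighbors[v] if a > b)
```

With the votes (a, b, c), (a, c, b), (b, a, c), the Kemeny order is a, b, c at total cost 2; a single swap costs more because it contradicts a pairwise majority.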
... Again, the LG matrix was not available, and the WAG matrix was hence used as substitution model. To compare the maximum likelihood and the Bayesian trees, a tanglegram layout was chosen (Böcker et al. 2009). Additionally, pairwise protein distances were calculated with protdist 3.69, which is part of the PHYLIP package (Felsenstein 2005; see Table S1, Sheet 2). ...
Article
The nonproteinogenic amino acid 4-hydroxyphenylglycine (HPG) arises from the diversion of the tyrosine degradation pathway into secondary metabolism, and its biosynthesis requires a set of three enzymes. The gene cassette for HPG biosynthesis is widespread in actinomycete bacteria, which incorporate the amino acid as a building block into various peptide antibiotics, but it has never been reported from another taxonomic group of eubacteria. A genome mining study has now revealed a putative HPG pathway in the predatory bacterium Herpetosiphon aurantiacus, which is phylogenetically distinct from Actinomycetes. Anomalies in the active center of one annotated key enzyme raised questions about the true product of this pathway, prompting an in vitro reconstitution attempt. This study confirmed the capability of H. aurantiacus for HPG production. Sequence analysis of the aberrant 4-hydroxymandelate synthase refines the existing model of the catalytic differentiation of iron(II)-dependent dioxygenases. Furthermore, we report a comprehensive analysis of the phylogeny of these enzymes, which sheds light on the evolution of paralogous gene sets and the ensuing metabolic diversity in a barely studied bacterium.
... This is also one of very few randomized polynomial kernelizations (we are aware only of Harnik and Naor's probabilistic compression for Subset Sum [35]). Our result also implies an Õ(k³) compression for the Tanglegram Layout problem from computational biology [23] via the reduction given in [6]; a polynomial kernel for this problem was left open in [6]. Related work. ...
... As discussed in Section 2.1, using NP-completeness of the problems in question, we get the following polynomial coRP-kernelization results. Corollary 3. Odd Cycle Transversal, Edge Bipartization, Balanced Subgraph, and Tanglegram Layout (see [6]) have polynomial coRP-kernelizations. ...
Article
The Odd Cycle Transversal problem (OCT) asks whether a given graph can be made bipartite by deleting at most $k$ of its vertices. In a breakthrough result, Reed, Smith, and Vetta (Operations Research Letters, 2004) gave an $O(4^k kmn)$-time algorithm for it, the first algorithm with polynomial runtime of uniform degree for every fixed $k$. It is known that this implies a polynomial-time compression algorithm that turns OCT instances into equivalent instances of size at most $O(4^k)$, a so-called kernelization. Since then, the existence of a polynomial kernel for OCT, i.e., a kernelization with size bounded polynomially in $k$, has turned into one of the main open questions in the study of kernelization. This work provides the first (randomized) polynomial kernelization for OCT. We introduce a novel kernelization approach based on matroid theory, where we encode all relevant information about a problem instance into a matroid with a representation of size polynomial in $k$. For OCT, the matroid is built to allow us to simulate the computation of the iterative compression step of the algorithm of Reed, Smith, and Vetta, applied (for only one round) to an approximate odd cycle transversal which it is aiming to shrink to size $k$. The process is randomized with one-sided error exponentially small in $k$, where the result can contain false positives but no false negatives, and the size guarantee is cubic in the size of the approximate solution. Combined with an $O(\sqrt{\log n})$-approximation (Agarwal et al., STOC 2005), we get a reduction of the instance to size $O(k^{4.5})$, implying a randomized polynomial kernelization.
... For example, a number of articles (Bansal et al., 2009; Böcker et al., 2009; Buchin et al., 2009; Fernau et al., 2005; Nöllenburg et al., 2009; Venkatachalam et al., 2010) consider the One-Tree Crossing Minimization (OTCM) and the Two-Tree Crossing Minimization (TTCM) problems, which both aim at minimizing the number of crossings between connectors. In the former problem, the layout of one of the trees is fixed and that of the other is mutable, whereas in the latter formulation the layouts of both trees are allowed to change. ...
Article
In systematic biology, one is often faced with the task of comparing different phylogenetic trees, in particular in multi-gene analysis or cospeciation studies. One approach is to use a tanglegram in which two rooted phylogenetic trees are drawn opposite each other, using auxiliary lines to connect matching taxa. There is an increasing interest in using rooted phylogenetic networks to represent evolutionary history, so as to explicitly represent reticulate events, such as horizontal gene transfer, hybridization or reassortment. Thus, the question arises how to define and compute a tanglegram for such networks. In this article, we present the first formal definition of a tanglegram for rooted phylogenetic networks and present a heuristic approach for computing one, called the NN-tanglegram method. We compare the performance of our method with existing tree tanglegram algorithms and also show a typical application to real biological datasets. For maximum usability, the algorithm does not require that the trees or networks are bifurcating or bicombining, or that they are on identical taxon sets. The algorithm is implemented in our program Dendroscope 3, which is freely available from www.dendroscope.org. scornava@informatik.uni-tuebingen.de; huson@informatik.uni-tuebingen.de.