Three different ways of viewing an RNA sequence. (a) A schematic 2-dimensional description of an RNA folding. (b) A linear representation of the RNA. (c) The RNA as a rooted ordered tree.
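
As an illustration of how view (b) can be turned into view (c), the sketch below parses a structure written in dot-bracket notation into a rooted ordered tree. Several things are assumptions made only for this example: the dot-bracket input, the artificial root labeled "R", and labeling each pair node by its two nucleotides. The figure itself may use a different encoding, and the function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)


def dot_bracket_to_tree(seq: str, structure: str) -> Node:
    """Convert an RNA sequence plus dot-bracket structure into a rooted ordered tree.

    Encoding (an assumption, one of several used in the literature):
    - an artificial root labeled "R" holds the top-level elements in order;
    - each unpaired base '.' becomes a leaf labeled by its nucleotide;
    - each base pair '(' ... ')' becomes an internal node labeled by the
      two paired nucleotides, whose children are the elements it encloses.
    """
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    root = Node("R")
    stack = [(root, -1)]  # (currently open node, index of its '(')
    for i, (base, sym) in enumerate(zip(seq, structure)):
        if sym == ".":
            stack[-1][0].children.append(Node(base))
        elif sym == "(":
            node = Node("")  # label is filled in when the pair closes
            stack[-1][0].children.append(node)
            stack.append((node, i))
        elif sym == ")":
            if len(stack) == 1:
                raise ValueError(f"unmatched ')' at position {i}")
            node, j = stack.pop()
            node.label = seq[j] + seq[i]
        else:
            raise ValueError(f"unexpected symbol {sym!r}")
    if len(stack) != 1:
        raise ValueError("unmatched '('")
    return root


def to_brackets(node: Node) -> str:
    """Serialize the tree as label(child1 child2 ...) for inspection."""
    if not node.children:
        return node.label
    return node.label + "(" + " ".join(to_brackets(c) for c in node.children) + ")"
```

For example, `to_brackets(dot_bracket_to_tree("GGAUCC", "((..))"))` gives `R(GC(GC(A U)))`. Here the two nested pairs form a chain of internal nodes, and the unpaired A and U become leaves of the innermost pair.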

Source publication

Locality and Gaps in RNA Comparison

Article

Oct 2007

Locality is an important and well-studied notion in comparative analysis of biological sequences. Similarly, taking into account affine gap penalties when calculating biological sequence alignments is a well-accepted technique for obtaining better alignments. When dealing with RNA, one has to take into consideration not only sequential features, bu...
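
The affine-gap idea mentioned above can be made concrete with a Gotoh-style dynamic program for global sequence alignment. This is a minimal sketch for plain sequences only, with no RNA structure. The scoring values are assumptions for the example, as is the convention that a gap of length k scores `gap_open + k * gap_extend`.

```python
NEG_INF = float("-inf")


def affine_global_score(a: str, b: str, match: int = 1, mismatch: int = -1,
                        gap_open: int = -3, gap_extend: int = -1) -> float:
    """Optimal global alignment score with affine gaps (Gotoh-style DP).

    A gap of length k scores gap_open + k * gap_extend, so opening a gap
    is penalized more than lengthening an existing one.
    M[i][j]: best score of a[:i] vs b[:j] ending with a[i-1] aligned to b[j-1]
    X[i][j]: best score ending with a[i-1] aligned to a gap
    Y[i][j]: best score ending with b[j-1] aligned to a gap
    """
    n, m = len(a), len(b)
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = gap_open + i * gap_extend   # leading gap in b
    for j in range(1, m + 1):
        Y[0][j] = gap_open + j * gap_extend   # leading gap in a
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # open a new gap (pay gap_open) or extend the current one
            X[i][j] = max(M[i - 1][j] + gap_open + gap_extend,
                          X[i - 1][j] + gap_extend,
                          Y[i - 1][j] + gap_open + gap_extend)
            Y[i][j] = max(M[i][j - 1] + gap_open + gap_extend,
                          Y[i][j - 1] + gap_extend,
                          X[i][j - 1] + gap_open + gap_extend)
    return max(M[n][m], X[n][m], Y[n][m])
```

Keeping three matrices lets the recurrence tell whether a gap is being opened or extended. A single-matrix linear-gap DP cannot make that distinction. Under these parameters, aligning `AAAA` with `AA` prefers one gap of length 2 (score -3) over two gaps of length 1 (score -6).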

Context 1

... have been quite a few approaches for defining alignments of RNAs. The first one is due to the seminal paper of Zhang and Shasha (1989), which represented RNA sequences as rooted ordered trees (Fig. 1) and defined editing operations on trees that correspond to editing operations on RNA sequences. In this way, an alignment of two RNA sequences corresponds to a sequence of editing operations on the two corresponding trees, and any tree editing algorithm can be used to compute the optimal alignment of two RNAs. Furthermore, this approach ...
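
The tree-editing view can be sketched with the classical recursion on the rightmost roots of two forests, memoized over subforests. This is a naive unit-cost formulation, not Zhang and Shasha's keyroot-based algorithm. It may explore far more subforests than their method, so it is only meant for small trees. As an assumption, trees are encoded as `(label, children)` tuples, and the function names are hypothetical.

```python
from functools import lru_cache
from typing import Tuple

Tree = Tuple[str, tuple]      # (label, tuple of child Trees)
Forest = Tuple[Tree, ...]


def size(forest: Forest) -> int:
    """Number of nodes in a forest."""
    return sum(1 + size(children) for _, children in forest)


@lru_cache(maxsize=None)
def forest_dist(f: Forest, g: Forest) -> int:
    """Unit-cost ordered edit distance between two forests.

    Recursion on the rightmost roots v (of f) and w (of g):
    delete v (its children take its place), insert w,
    or map v to w (relabel cost 0 or 1) and solve children and rest separately.
    """
    if not f and not g:
        return 0
    if not f:
        return size(g)           # insert everything in g
    if not g:
        return size(f)           # delete everything in f
    (lv, cv), (lw, cw) = f[-1], g[-1]
    return min(
        forest_dist(f[:-1] + cv, g) + 1,                                   # delete v
        forest_dist(f, g[:-1] + cw) + 1,                                   # insert w
        forest_dist(cv, cw) + forest_dist(f[:-1], g[:-1]) + int(lv != lw),  # map v -> w
    )


def tree_dist(t1: Tree, t2: Tree) -> int:
    """Edit distance between two rooted ordered trees."""
    return forest_dist((t1,), (t2,))
```

For example, `tree_dist(("a", (("b", ()), ("c", ()))), ("a", (("b", ()),)))` is 1, because deleting the leaf c turns the first tree into the second.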

Extended Topological Persistence and Contact Arrangements in Folded Linear Molecules

Article

May 2016

Structure plays a pivotal role in determining the functional properties of self-interacting linear biomolecular chains, for example, proteins and nucleic acids. In this paper, we propose a method for representing each such molecule combinatorially, as a one-dimensional simplicial complex, in a novel way that takes into account intra-chain contacts. The representation allows for efficient quantification of structural similarities and differences between molecules, and for studying molecular topology using extended persistence. This method performs a multi-scale analysis on a filtered simplicial complex as it tracks clusters, holes, and higher-dimensional voids in the filtration. From extended persistence we extract information about the arrangement of intra-chain interactions, a topological property which demonstrably affects the folding and unfolding dynamics of the linear chains.
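
As a rough illustration of a filtered one-dimensional complex, the sketch below treats residues as vertices and backbone links plus contacts as edges. It adds the edges in order of a filtration value and tracks the number of clusters $b_0$ and independent loops $b_1$ with union-find. This is ordinary (not extended) persistence on a naive contact graph, not the representation proposed in the paper, and the filtration values and function names are hypothetical.

```python
from typing import List, Tuple


class UnionFind:
    """Disjoint sets over vertices 0..n-1, used to detect merges vs. loops."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> bool:
        """Merge the sets of a and b; return False if already in one set."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[ra] = rb
        return True


def betti_profile(n: int, edges: List[Tuple[float, int, int]]) -> List[Tuple[float, int, int]]:
    """Track (b0, b1) of a 1-dimensional complex as edges enter a filtration.

    All n vertices are present from the start; edges (value, u, v) are added
    in increasing order of value. For a graph, b1 = E - V + b0 at every step.
    """
    uf = UnionFind(n)
    b0, b1 = n, 0
    profile = []
    for value, u, v in sorted(edges):
        if uf.union(u, v):
            b0 -= 1   # edge joins two clusters
        else:
            b1 += 1   # edge closes a loop
        profile.append((value, b0, b1))
    return profile
```

Consider a five-residue chain 0-1-2-3-4 whose backbone edges enter at value 0.0, followed by contacts (0, 4) at 1.0 and (1, 3) at 2.0. The last entry of the profile is then (2.0, 1, 2): one cluster and two loops. In a graph, a loop class born this way never dies, which is one reason richer constructions such as extended persistence are used.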

An Algorithm for Comparing Similarity Between Two Trees

Article

Aug 2015

Hangjun Xu

An important problem in geometric computing is defining and computing similarity between two geometric shapes, e.g., point sets, curves, and surfaces. Important geometric and topological information of many shapes can be captured by defining a tree structure on them (e.g., medial axes and contour trees). Hence, it is natural to study the problem of comparing similarity between trees. We study the gapped edit distance between two ordered labeled trees, first proposed by Touzet (2003). Given two binary trees $T_1$ and $T_2$ with $m$ and $n$ nodes, respectively, we compute the general gap edit distance in $O(m^3 n^2 + m^2 n^3)$ time. The computation of this distance for arbitrary trees has been shown to be NP-hard (Touzet, 2003). We also give an algorithm for computing the complete subtree gap edit distance, which can be applied to comparing contour trees of terrains in $\mathbb{R}^3$.

Modeling Dynamic Programming Problems over Sequences and Trees with Inverse Coupled Rewrite Systems

Article

Mar 2014

Dynamic programming is a classical algorithmic paradigm, which often allows the evaluation of a search space of exponential size in polynomial time. Recursive problem decomposition, tabulation of intermediate results for re-use, and Bellman's Principle of Optimality are its well-understood ingredients. However, algorithms often lack abstraction and are difficult to implement, tedious to debug, and delicate to modify. The present article proposes a generic framework for specifying dynamic programming problems. This framework can handle all kinds of sequential inputs, as well as tree-structured data. Biosequence analysis, document processing, molecular structure analysis, comparison of objects assembled in a hierarchic fashion, and, generally, all domains where strings and ordered, rooted trees serve as natural data representations come under consideration. The new approach introduces inverse coupled rewrite systems. They describe the solutions of combinatorial optimization problems as the inverse image of a term rewrite relation that reduces problem solutions to problem inputs. This approach yields concise yet translucent specifications of dynamic programming algorithms. Their actual implementation may be challenging, but eventually, as we hope, it can be produced automatically. The present article demonstrates the scope of this new approach by describing a diverse set of dynamic programming problems which arise in the domain of computational biology, with examples in biosequence and molecular structure analysis.
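
The ingredients named above are recursive decomposition, tabulation, and Bellman's principle of optimality. To make them concrete, here is the classical Nussinov-style base-pair maximization for RNA secondary structure, written as a plain memoized recursion. It is an ordinary DP, not an inverse coupled rewrite system. The allowed pairs (Watson-Crick plus G-U) and the minimum hairpin loop of 3 unpaired bases are assumptions for the example.

```python
from functools import lru_cache

# Allowed base pairs: Watson-Crick plus G-U wobble (an assumption for this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq: str, min_loop: int = 3) -> int:
    """Maximum number of nested base pairs (Nussinov-style DP).

    best(i, j) is the optimum for seq[i..j]. By the principle of optimality it
    is assembled from optima of smaller intervals, and lru_cache tabulates the
    O(n^2) subproblems, so the exponentially many structures are evaluated in
    O(n^3) time.
    """
    n = len(seq)

    @lru_cache(maxsize=None)
    def best(i: int, j: int) -> int:
        if j - i <= min_loop:          # interval too short to close a hairpin
            return 0
        result = best(i + 1, j)        # case 1: base i stays unpaired
        for k in range(i + min_loop + 1, j + 1):   # case 2: base i pairs with k
            if (seq[i], seq[k]) in PAIRS:
                result = max(result, 1 + best(i + 1, k - 1) + best(k + 1, j))
        return result

    return best(0, n - 1) if n else 0
```

For `GGGAAAUCC` it returns 3, pairing G-C, G-C and G-U around the AAA loop.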

Accelerating dynamic programming

Article

Mar 2010

Oren Weimann

Dynamic Programming (DP) is a fundamental problem-solving technique that has been widely used for solving a broad range of search and optimization problems. While DP can be invoked when more specialized methods fail, this generality often incurs a cost in efficiency. We explore a unifying toolkit for speeding up DP, and algorithms that use DP as subroutines. Our methods and results can be summarized as follows.

- Acceleration via Compression. Compression is traditionally used to efficiently store data. We use compression in order to identify repeats in the table that imply a redundant computation. Utilizing these repeats requires a new DP, and often different DPs for different compression schemes. We present the first provable speedup of the celebrated Viterbi algorithm (1967) that is used for the decoding and training of Hidden Markov Models (HMMs). Our speedup relies on the compression of the HMM's observable sequence.
- Totally Monotone Matrices. It is well known that a wide variety of DPs can be reduced to the problem of finding row minima in totally monotone matrices. We introduce this scheme in the context of planar graph problems. In particular, we show that planar graph problems such as shortest paths, feasible flow, bipartite perfect matching, and replacement paths can be accelerated by DPs that exploit a total-monotonicity property of the shortest paths.
- Combining Compression and Total Monotonicity. We introduce a method for accelerating string edit distance computation by combining compression and totally monotone matrices.
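
The compression idea for Viterbi rests on viewing each decoding step as a max-plus vector-matrix product. The product for a repeated substring can therefore be computed once and reused. The sketch below shows this with a single hand-picked repeat. It is not the thesis's algorithm, which derives repeats from compression schemes. The function names and the silent start state preceding the first symbol are assumptions.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def maxplus_mul(A: Matrix, B: Matrix) -> Matrix:
    """Max-plus product: (A (x) B)[i][j] = max_k A[i][k] + B[k][j]."""
    return [[max(A[i][k] + B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def step_matrix(log_trans: Matrix, log_emit: Matrix, x: int) -> Matrix:
    """M_x[i][j] = log P(state j | state i) + log P(symbol x | state j)."""
    k = len(log_trans)
    return [[log_trans[i][j] + log_emit[j][x] for j in range(k)] for i in range(k)]


def viterbi_score(log_init: Sequence[float], log_trans: Matrix,
                  log_emit: Matrix, seq: Sequence[int]) -> float:
    """Plain Viterbi: one max-plus vector-matrix step per symbol.

    log_init is a distribution over a silent start state preceding seq[0].
    """
    v = [list(log_init)]  # 1 x k row vector
    for x in seq:
        v = maxplus_mul(v, step_matrix(log_trans, log_emit, x))
    return max(v[0])


def viterbi_score_with_repeat(log_init: Sequence[float], log_trans: Matrix,
                              log_emit: Matrix, seq: Sequence[int],
                              block: Sequence[int]) -> float:
    """Same score, but every occurrence of `block` reuses one precomputed
    matrix product (valid because the max-plus product is associative)."""
    block = list(block)
    block_mat = step_matrix(log_trans, log_emit, block[0])
    for x in block[1:]:
        block_mat = maxplus_mul(block_mat, step_matrix(log_trans, log_emit, x))
    v = [list(log_init)]
    i, b = 0, len(block)
    while i < len(seq):
        if list(seq[i:i + b]) == block:
            v = maxplus_mul(v, block_mat)   # jump over the whole repeat at once
            i += b
        else:
            v = maxplus_mul(v, step_matrix(log_trans, log_emit, seq[i]))
            i += 1
    return max(v[0])


if __name__ == "__main__":
    L = math.log
    init = [L(0.5), L(0.5)]
    trans = [[L(0.9), L(0.1)], [L(0.2), L(0.8)]]
    emit = [[L(0.7), L(0.3)], [L(0.1), L(0.9)]]
    seq = [0, 1, 1, 0, 1, 1, 0, 1, 1, 0]
    print(viterbi_score(init, trans, emit, seq),
          viterbi_score_with_repeat(init, trans, emit, seq, [0, 1, 1]))
```

Multiplying two $k \times k$ matrices costs more than a single vector-matrix step. Reuse only pays off when a block recurs often enough to amortize building its product.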

Tree Edit Distance Problems: Algorithms and Applications to Bioinformatics

Article

Feb 2010
IEICE Transactions on Information and Systems

Tatsuya Akutsu

Tree-structured data often appear in bioinformatics. For example, glycans, RNA secondary structures, and phylogenetic trees usually have tree structures. Comparison of trees is one of the fundamental tasks in analyzing these data. Various distance measures have been proposed and used for comparing trees; among them, the tree edit distance has been studied extensively. In this paper, we review key results and our recent results on the tree edit distance problem and related problems. In particular, we review polynomial-time exact algorithms and more efficient approximation algorithms for the edit distance problem for ordered trees, and approximation algorithms for the largest common subtree problem for unordered trees. We also review applications of tree edit distance and its variants to bioinformatics, with a focus on the comparison of glycan structures.

Fast algorithms for computing tree LCS

Article

Oct 2009
Theoretical Computer Science

The LCS of two rooted, ordered, and labeled trees F and G is the largest forest that can be obtained from both trees by deleting nodes. We present algorithms for computing tree LCS which exploit the sparsity inherent to the tree LCS problem. Assuming G is smaller than F, our first algorithm runs in time $O(r \cdot \mathrm{height}(F) \cdot \mathrm{height}(G) \cdot \lg\lg|G|)$, where r is the number of pairs $(v \in F, w \in G)$ such that v and w have the same label. Our second algorithm runs in time $O(L r \lg r \cdot \lg\lg|G|)$, where L is the size of the LCS of F and G. For this algorithm we present a novel three-dimensional alignment graph. Our third algorithm is intended for the constrained variant of the problem in which only nodes with zero or one child can be deleted. For this case we obtain an $O(r h \lg\lg|G|)$ time algorithm, where $h = \mathrm{height}(F) + \mathrm{height}(G)$.

A Practical Edit-Distance Model for RNA Secondary-Structure Comparison

Conference Paper

Jun 2009

We point out the importance of incorporating affine-gap penalties in RNA secondary-structure comparison. Two notions of affine-gap penalties, one for sequences and the other for structures, are developed. A model from Jiang et al. [J Comput Biol, 2002, 9(2), pp. 371-388] is extended to support these penalties, and a polynomial-time algorithm is provided. Experimental results reveal that our new model generates more accurate and biologically meaningful alignments than several existing algorithms.

Fast Algorithms for Computing Tree LCS

Conference Paper

Jun 2008

The LCS of two rooted, ordered, and labeled trees F and G is the largest forest that can be obtained from both trees by deleting nodes. We present algorithms for computing tree LCS which exploit the sparsity inherent to the tree LCS problem. Assuming G is smaller than F, our first algorithm runs in time $O(r \cdot \mathrm{height}(F) \cdot \mathrm{height}(G) \cdot \lg\lg|G|)$, where r is the number of pairs $(v \in F, w \in G)$ such that v and w have the same label. Our second algorithm runs in time $O(L r \lg r \cdot \lg\lg|G|)$, where L is the size of the LCS of F and G. For this algorithm we present a novel three-dimensional alignment graph. Our third algorithm is intended for the constrained variant of the problem in which only nodes with zero or one child can be deleted. For this case we obtain an $O(r h \lg\lg|G|)$ time algorithm, where $h = \mathrm{height}(F) + \mathrm{height}(G)$.
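
For reference, the quantity these algorithms compute can be written as a direct memoized recursion on rightmost roots. At each step, either the rightmost root of one forest is deleted, or, when the two labels agree, both roots are kept as a matched pair. This sketch ignores the sparsity that the bounds above exploit. The `(label, children)` tuple encoding and the function names are assumptions. The helper `same_label_pairs` counts the parameter r.

```python
from collections import Counter
from functools import lru_cache
from typing import Iterator, Tuple

Tree = Tuple[str, tuple]      # (label, tuple of child Trees)
Forest = Tuple[Tree, ...]


def labels(forest: Forest) -> Iterator[str]:
    """Yield the labels of all nodes in a forest (preorder)."""
    for label, children in forest:
        yield label
        yield from labels(children)


def same_label_pairs(f: Forest, g: Forest) -> int:
    """r: number of node pairs (v in f, w in g) whose labels are equal."""
    counts = Counter(labels(g))
    return sum(counts[lab] for lab in labels(f))


@lru_cache(maxsize=None)
def tree_lcs(f: Forest, g: Forest) -> int:
    """Size of the largest forest obtainable from both f and g by deleting nodes.

    Recursion on the rightmost roots v, w: delete v (its children take its
    place), delete w, or, if the labels agree, keep v and w matched, which
    splits the problem into their children and the remaining forests.
    """
    if not f or not g:
        return 0
    (lv, cv), (lw, cw) = f[-1], g[-1]
    best = max(tree_lcs(f[:-1] + cv, g), tree_lcs(f, g[:-1] + cw))
    if lv == lw:
        best = max(best, 1 + tree_lcs(cv, cw) + tree_lcs(f[:-1], g[:-1]))
    return best
```

For F = a(b c) and G = a(c), `tree_lcs` returns 2 because nodes a and c survive in both, and r is also 2. For F = a(b) and G = b(a), the answer is 1, since deletion cannot reverse the ancestor relation between the two nodes.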

Topics in Computing Similarity and Distance

Thesis

Oct 2008

Shihyen Chen

Local Exact Pattern Matching for Non-fixed RNA Structures

Conference Paper

Jul 2012
IEEE/ACM Transactions on Computational Biology and Bioinformatics

Detecting local common sequence-structure regions of RNAs is a biologically meaningful problem. By detecting such regions, biologists are able to identify functional similarity between the inspected molecules. We developed dynamic programming algorithms for finding common structure-sequence patterns between two RNAs. The RNAs are given by their sequence and a set of potential base pairs with associated probabilities. In contrast to prior work, which matches fixed structures, we support the arc-breaking edit operation; this allows matching only a subset of the given base pairs. We present an $O(n^3)$ algorithm for local exact pattern matching between two nested RNAs, and an $O(n^3 \log n)$ algorithm for one nested RNA and one bounded-unlimited RNA.
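
The abstract distinguishes nested RNAs from more general base-pair sets. Assume that "nested" means no base is paired twice and no two arcs cross, which is the usual meaning of a pseudoknot-free secondary structure. Under that assumption, a stack-based check looks like this. The function name is hypothetical, and the base-pair probabilities from the paper are ignored.

```python
from typing import Dict, Iterable, List, Tuple


def is_nested(pairs: Iterable[Tuple[int, int]]) -> bool:
    """True if no base is paired twice and no two arcs cross.

    Arcs (i, j) and (k, l) with i < k cross when i < k < j < l.
    Positions are scanned left to right with a stack, as in bracket matching.
    """
    partner: Dict[int, int] = {}
    for i, j in pairs:
        i, j = min(i, j), max(i, j)
        if i == j or i in partner or j in partner:
            return False               # self-pair or a base used twice
        partner[i], partner[j] = j, i
    stack: List[int] = []
    for pos in sorted(partner):
        if partner[pos] > pos:
            stack.append(pos)          # opening end of an arc
        elif not stack or stack.pop() != partner[pos]:
            return False               # closes an arc that is not innermost: crossing
    return True
```

`is_nested([(0, 9), (2, 5)])` is true, while `is_nested([(0, 5), (2, 8)])` is false because the two arcs cross.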
