Benny Porat's research while affiliated with Bar Ilan University and other places

What is this page?


This page lists the scientific contributions of an author who either does not have a ResearchGate profile or has not yet added these contributions to their profile.

ResearchGate created this page automatically to provide a record of this author's body of work. We create such pages to advance our goal of building and maintaining the most comprehensive scientific repository possible. In doing so, we process publicly available (personal) data relating to the author as a member of the scientific community.


Publications (16)


Fig. 1 An example of the mask properties described in Lemmas 1-5: (1) The suffixes starting at indices with a 1-bit in the mask are periodic, with periods abababcababab, abababc (trivial) and ab (Lemma 1). (2) The first 1-bit in the mask (other than the starting 1-bit) is at index 14, after half of the mask (Lemma 2). (3) The gaps between 1-bits are either equal to the last gap, as at index 25, or at most half of the last gap, as at indices 21 and 23 (Lemma 3). (4) The substrings between 1-bits with equal gaps, at indices 21, 23, 25, are all equal to ab (Lemma 4).
Fig. 4 An example of the execution of the basic dynamic programming algorithm. All entries are initialized to ∞, and the execution is done from right to left of the string: MIN[14] is set to H(aba, abb) = 1; MIN[13] = min{MIN[15] + 1, MIN[16] + 2} = ∞; MIN[12] = min{MIN[14] + 2, MIN[15] + 2} = 3; MIN[11] = min{MIN[13] + 1, MIN[14] + 1} = 2; MIN[10] = min{MIN[12] + 0, MIN[13] + 1} = 3; MIN[9] = min{MIN[11] + 1, MIN[12] + 2} = 3; MIN[8] = min{MIN[10] + 1, MIN[11] + 1} = 3; MIN[7] = min{MIN[9] + 2, MIN[10] + 2} = 5; MIN[6] = min{MIN[8] + 0, MIN[9] + 0} = 3; MIN[5] = min{MIN[7] + 1, MIN[8] + 2} = 5; MIN[4] = min{MIN[6] + 2, MIN[7] + 2} = 5; MIN[3] = min{MIN[5] + 0, MIN[6] + 0} = 3; MIN[2] = min{MIN[4] + 1, MIN[5] + 2} = 6; MIN[1] = min{MIN[3] + 1, MIN[4] + 1} = 4.
a The neighborhood of abba at position i = 4. b The neighborhood of abaaaba at position i = 5
The dynamic programming algorithm for the candidate relaxation of the ACP
Can We Recover the Cover?
  • Article
  • Full-text available

July 2019 · 207 Reads · 12 Citations

Algorithmica

[...] · Benny Porat

Data analysis typically involves error recovery and detection of regularities as two different key tasks. In this paper we show that there are data types for which these two tasks can be powerfully combined. A common notion of regularity in strings is that of a cover. Data describing measures of a natural coverable phenomenon may be corrupted by errors caused by the measurement process, or by the inexact features of the phenomenon itself. For this reason, different variants of approximate covers have been introduced, some of which are \(\mathcal {NP}\)-hard to compute. In this paper we assume that the Hamming distance metric measures the amount of corruption, and study the problem of recovering the correct cover from data corrupted by mismatch errors, formally defined as the cover recovery problem (CRP). We show that for the Hamming distance metric, coverability is a powerful property that allows detecting the original cover and correcting the data, under suitable conditions. We also study a relaxation of a related problem, the approximate cover problem (ACP). Since the ACP is \(\mathcal {NP}\)-hard (Amir et al. in: Approximate cover of strings. CPM, 2017), we study a relaxation, which we call the candidate relaxation of the ACP, and show that it has polynomial time complexity. As a result, the ACP also has polynomial time complexity in many practical situations. An important application of our study of the ACP relaxation is a polynomial time algorithm for the CRP.
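The candidate relaxation fixes a candidate string and asks for the cheapest way to cover the text with its (approximate) occurrences. The Python sketch below is a hypothetical reconstruction in the spirit of the right-to-left execution traced in Fig. 4; the paper's actual recurrence may restrict the allowed offsets between consecutive occurrences and account for overlaps differently.

```python
def hamming(a, b):
    """Number of mismatches between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def candidate_cover_cost(t, c):
    """Cheapest way (in Hamming mismatches) to cover t with occurrences of the
    candidate c, where consecutive occurrences start 1..len(c) positions apart.
    Hypothetical sketch only -- not the exact recurrence from the paper."""
    n, ell = len(t), len(c)
    INF = float("inf")
    last = n - ell                       # start of the rightmost occurrence
    MIN = [INF] * (n + 1)
    MIN[last] = hamming(c, t[last:])     # base case, cf. MIN[14] in Fig. 4
    for i in range(last - 1, -1, -1):
        for d in range(1, ell + 1):      # next occurrence starts at i + d
            if i + d <= last and MIN[i + d] != INF:
                # charge this occurrence for the d text symbols it "owns"
                MIN[i] = min(MIN[i], MIN[i + d] + hamming(c[:d], t[i:i + d]))
    return MIN[0]                        # a cover must start at position 0
```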


On the Hardness of Optimal Vertex Relabeling and Restricted Vertex Relabeling

June 2015 · 23 Reads · 1 Citation

Vertex Relabeling is a variant of the graph relabeling problem. In this problem, the input is a graph and two vertex labelings, and the question is to determine how close the labelings are. The distance measure is the minimum number of label swaps necessary to transform the graph from one labeling to the other, where a swap is the interchange of the labels of two adjacent nodes. We are interested in the complexity of determining the swap distance. The problem has recently been explored for various restricted classes of graphs, but its complexity on general graphs had not been established. We show that the problem is \(\mathcal {NP}\)-hard. In addition, we consider restricted versions of the problem where a node can only participate in a bounded number of swaps. We show that the problem is \(\mathcal {NP}\)-hard under these restrictions as well.
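Since the problem is shown to be NP-hard, exact computation is only feasible on tiny instances. As a concrete illustration of the distance measure (a brute-force sketch of my own, not an algorithm from the paper), one can run a BFS over labelings, where each move swaps the labels of two adjacent vertices:

```python
from collections import deque

def swap_distance(edges, start, target):
    """Minimum number of adjacent-label swaps transforming labeling `start`
    into `target` (lists indexed by vertex). Brute-force BFS over labelings,
    so exponential in general -- the problem is NP-hard."""
    start, target = tuple(start), tuple(target)
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        labeling, dist = queue.popleft()
        if labeling == target:
            return dist
        for u, v in edges:                      # swap labels of adjacent u, v
            nxt = list(labeling)
            nxt[u], nxt[v] = nxt[v], nxt[u]
            nxt = tuple(nxt)
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None                                 # target labeling unreachable

# Example: on the path 0-1-2, turning labels (a, b, c) into (c, b, a) takes 3 swaps:
# swap_distance([(0, 1), (1, 2)], ["a", "b", "c"], ["c", "b", "a"])  ->  3
```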


Approximate On-line Palindrome Recognition, and Applications

June 2014 · 27 Reads · 5 Citations

Palindrome recognition is a classic problem in computer science. It is an example of a language that cannot be recognized by a deterministic finite automaton, and it is often cited as an example of a problem whose decision by a single-tape Turing machine requires quadratic time. In this paper we revisit the palindrome recognition problem. We define a novel fingerprint that allows recognizing palindromes on-line in linear time with high probability. We then use group testing techniques to show that the fingerprint can be adapted to recognizing approximate palindromes on-line, i.e. it can recognize that a string is a palindrome with no more than k mismatches, where k is given. Finally, we show that this fingerprint can be used as a tool for solving other problems on-line. In particular, we consider approximate pattern matching by non-overlapping reversals. This is the problem where two strings S and T are given, and the question is whether applying a sequence of non-overlapping reversals to S results in the string T.
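The fingerprint defined in the paper is its technical core; as a rough illustration of the general idea only (a standard Karp-Rabin-style construction, not the paper's fingerprint), one can maintain a hash of the stream read forward and a hash of it read backward, and declare the current prefix a palindrome whenever the two agree:

```python
import random

P = (1 << 61) - 1            # large prime modulus
R = random.randrange(2, P)   # random base, fixed once

def online_palindromes(stream):
    """Yield, after each arriving symbol, whether the prefix read so far is a
    palindrome (correct with high probability). Rolling-hash illustration,
    not the fingerprint from the paper."""
    fwd = 0        # hash of s[0..i]:           sum of s[j] * R^j
    bwd = 0        # hash of reversed s[0..i]:  sum of s[j] * R^(i-j)
    r_pow = 1      # R^i
    for i, ch in enumerate(stream):
        c = ord(ch) + 1
        if i > 0:
            r_pow = r_pow * R % P
        fwd = (fwd + c * r_pow) % P   # new symbol gets the highest power
        bwd = (bwd * R + c) % P       # old symbols shift to higher powers
        yield fwd == bwd              # equal iff palindrome, w.h.p.
```

When the prefix is not a palindrome, fwd − bwd is a nonzero polynomial of degree at most i in the random base R, so a false positive occurs with probability at most i/P.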


Pattern Matching with Non Overlapping Reversals - Approximation and On-line Algorithms

December 2013 · 38 Reads · 2 Citations

The Sorting by Reversals problem is known to be NP-hard. A simplification, Sorting by Signed Reversals, is polynomially computable. Motivated by the pattern matching with rearrangements model, we consider Pattern Matching with Reversals. Since this is a generalization of the Sorting by Reversals problem, it is clearly NP-hard. We therefore consider the simplification where reversals cannot overlap. Such a constrained version has been researched in the past for various metrics in the rearrangement model – the swap metric and the interchange metric. We show that the constrained problem can be solved in linear time. We then consider the Approximate Pattern Matching with non-overlapping Reversals problem, i.e. where mismatch errors are introduced. We show that this problem can be solved in quadratic time and space. Finally, we consider the on-line version of the problem. We introduce a novel signature for palindromes and show that it has a pleasing behavior, similar to the Karp-Rabin signature. It allows solving the Pattern Matching with non-overlapping Reversals problem on-line in linear time w.h.p.
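To make the non-overlapping constraint concrete: S can be turned into T exactly when the positions split into untouched positions that already agree with T and disjoint blocks whose reversal agrees with T. A naive cubic dynamic program over that characterization (my own baseline, far slower than the paper's linear-time and on-line algorithms) looks like this:

```python
def reversal_match(S, T):
    """Can S be transformed into T by reversing non-overlapping substrings?
    Naive O(n^3) baseline; the paper solves this in linear time."""
    if len(S) != len(T):
        return False
    n = len(S)
    ok = [False] * (n + 1)               # ok[i]: S[i:] can be matched to T[i:]
    ok[n] = True
    for i in range(n - 1, -1, -1):
        if S[i] == T[i] and ok[i + 1]:   # position i left untouched
            ok[i] = True
            continue
        for j in range(i + 1, n):        # or i starts a reversed block S[i..j]
            if S[i:j + 1][::-1] == T[i:j + 1] and ok[j + 1]:
                ok[i] = True
                break
    return ok[0]
```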


Parameterized Matching in the Streaming Model

September 2011 · 66 Reads · 14 Citations

We study the problem of parameterized matching in a stream where we want to output matches between a pattern of length m and the last m symbols of the stream before the next symbol arrives. Parameterized matching is a natural generalisation of exact matching where an arbitrary one-to-one relabelling of pattern symbols is allowed. We show how this problem can be solved in constant time per arriving stream symbol and sublinear, near optimal space with high probability. Our results are surprising and important: it has been shown that almost no streaming pattern matching problems can be solved (not even randomised) in less than Theta(m) space, with exact matching as the only known problem to have a sublinear, near optimal space solution. Here we demonstrate that a similar sublinear, near optimal space solution is achievable for an even more challenging problem. The proof is considerably more complex than that for exact matching.
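For context, parameterized matching itself is simple to check without any space constraints: two equal-length strings p-match when some one-to-one relabelling of symbols maps one onto the other. A naive offline O(nm) check (nothing like the streaming algorithm that is the paper's contribution) can serve as a reference implementation:

```python
def p_match(a, b):
    """Do equal-length strings a and b match under a one-to-one relabelling?"""
    fwd, bwd = {}, {}
    for x, y in zip(a, b):
        if fwd.setdefault(x, y) != y or bwd.setdefault(y, x) != x:
            return False                 # relabelling would not be one-to-one
    return True

def p_occurrences(pattern, text):
    """All positions where pattern p-matches a window of text (naive O(nm))."""
    m = len(pattern)
    return [i for i in range(len(text) - m + 1)
            if p_match(pattern, text[i:i + m])]

# Example: "aab" p-matches "xxy" and "zzq", but not "xyx".
```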


Pattern Matching under Polynomial Transformation

September 2011 · 235 Reads · 10 Citations

SIAM Journal on Computing

We consider a class of pattern matching problems where a normalising transformation is applied at every alignment. Normalised pattern matching plays a key role in fields as diverse as image processing and musical information processing, where application-specific transformations are often applied to the input. By considering the class of polynomial transformations of the input, we provide fast algorithms and the first lower bounds for both new and old problems. Given a pattern of length m and a longer text of length n, where both are assumed to contain integer values only, we first show O(n log m) time algorithms for pattern matching under linear transformations, even when wildcard symbols can occur in the input. We then show how to extend the technique to polynomial transformations of arbitrary degree. Next we consider the problem of finding the minimum Hamming distance under polynomial transformation. We show that, for any ε > 0, there cannot exist an O(n m^(1−ε)) time algorithm for additive and linear transformations, conditional on the hardness of the classic 3SUM problem. Finally, we consider a version of the Hamming distance problem under additive transformations with a bound k on the maximum distance that need be reported. We give a deterministic O(nk log k) time solution, which we then improve by careful use of randomisation to O(n √(k log k) log n) time for sufficiently small k. Our randomised solution outputs the correct answer at every position with high probability.
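As a point of reference for the linear case, here is a naive O(nm) check of my own (far from the O(n log m) algorithms in the paper, and ignoring wildcards): an alignment matches under a linear transformation when some a, b satisfy a·p[j] + b = t[i+j] for every j, so a and b can be solved from any two distinct pattern values and then verified.

```python
from fractions import Fraction

def linear_matches(p, t):
    """Positions i where some a, b satisfy a*p[j] + b = t[i+j] for all j.
    Naive O(nm) baseline over integer sequences, using exact rationals."""
    m = len(p)
    j1 = 0
    j2 = next((j for j in range(m) if p[j] != p[j1]), None)  # second, distinct value
    out = []
    for i in range(len(t) - m + 1):
        w = t[i:i + m]
        if j2 is None:
            # constant pattern: a constant window matches (e.g. with a = 0)
            if all(x == w[0] for x in w):
                out.append(i)
            continue
        a = Fraction(w[j1] - w[j2], p[j1] - p[j2])
        b = w[j1] - a * p[j1]
        if all(a * p[j] + b == w[j] for j in range(m)):
            out.append(i)
    return out
```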


The approximate swap and mismatch edit distance

October 2010 · 55 Reads · 8 Citations

Theoretical Computer Science

There is no known algorithm that solves the general case of the Approximate Edit Distance problem, where the edit operations are insertion, deletion, mismatch, and swap, in time o(nm), where n is the length of the text and m is the length of the pattern. In the effort to study this problem, the edit operations were analyzed independently. Karloff [Kar93] showed an algorithm that approximates the edit distance problem with only the mismatch operation in time O((1/ε²) n log³ m). Amir et al. [AEP04] showed that if the only edit operations allowed are swap and mismatch, then the exact edit distance problem can be solved in time O(n √(m log m)). In this paper, we discuss the problem of approximate edit distance with swap and mismatch. We show a randomized O((1/ε³) n log n log³ m) time algorithm for the problem. The algorithm guarantees an approximation factor of (1 + ε) with probability at least 1 − 1/n.
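For two already-aligned strings of equal length, the exact swap-and-mismatch distance itself has a simple linear-time scan, assuming (as in this model) that each character takes part in at most one operation. The sketch below is only this per-alignment baseline, not the paper's approximation algorithm:

```python
def swap_mismatch_distance(s, t):
    """Exact swap + mismatch distance between equal-length s and t, assuming
    each character participates in at most one edit operation."""
    n = len(s)
    d = [0] * (n + 1)                 # d[i]: distance between s[:i] and t[:i]
    for i in range(1, n + 1):
        d[i] = d[i - 1] + (s[i - 1] != t[i - 1])        # match or mismatch
        if (i >= 2 and s[i - 1] != t[i - 1]
                and s[i - 1] == t[i - 2] and s[i - 2] == t[i - 1]):
            d[i] = min(d[i], d[i - 2] + 1)              # swap two neighbours
    return d[n]

# swap_mismatch_distance("ab", "ba") == 1   (one swap instead of two mismatches)
```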


String matching with up to k swaps and mismatches

September 2010 · 17 Reads · 8 Citations

Information and Computation

Finding the similarity between two sequences is a major problem in computer science. It is motivated by many issues from computational biology as well as from information retrieval and image processing. These fields take into account possible corruptions of the data caused by genome rearrangements, typing mistakes, and more. Therefore, many applications do not require complete resemblance of the sequences, but rather an approximate matching. We consider mismatches and swaps as natural mistakes, which are allowed only in small numbers. The edit distance problem with swap and mismatch operations was solved in O(n √(m log m)) time. Yet, the problem of string matching with at most k swaps and mismatches was open. In this paper, we present an algorithm that finds all locations where the pattern has at most k mismatch and swap errors, in O(nk log m) time.


Exact and Approximate Pattern Matching in the Streaming Model

October 2009 · 226 Reads · 81 Citations

50th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2009)

We present a fully online randomized algorithm for the classical pattern matching problem that uses merely O(log m) space, breaking the O(m) barrier that held for this problem for a long time. Our method can be used as a tool in many practical applications, including monitoring Internet traffic and firewall applications. In our online model we first receive the pattern P of size m and preprocess it. After the preprocessing phase, the characters of the text T of size n arrive one at a time in an online fashion. For each index of the text input we indicate whether the pattern matches the text at that location or not. Clearly, for index i, an indication can only be given once all characters from index i through index i+m-1 have arrived. Our goal is to provide such answers while using minimal space, and while spending as little time as possible on each character (time and space in O(poly(log n))). We present an algorithm in which both false positive and false negative answers are allowed with probability of at most 1/n³. Thus, overall, the correct answer for all positions is returned with probability at least 1 − 1/n². The time our algorithm spends on each input character is bounded by O(log m), and the space complexity is O(log m) words. We also present a solution in the same model for the pattern matching with k mismatches problem. In this problem, a match means allowing up to k symbol mismatches between the pattern and the subtext beginning at index i. We provide an algorithm in which the time spent on each character is bounded by O(k² poly(log m)), and the space complexity is O(k³ poly(log m)) words.
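To see where the O(m) barrier comes from, here is the textbook Karp-Rabin rolling fingerprint, which must keep the last m text symbols in order to remove the outgoing one and therefore uses Θ(m) space. The whole point of the paper is to avoid exactly this buffer and get down to O(log m) words, so the sketch below is only a baseline for comparison:

```python
import random
from collections import deque

P = (1 << 61) - 1            # large prime modulus
R = random.randrange(2, P)   # random base, fixed once

def stream_match(pattern, stream):
    """Yield (index, match?) for every alignment, using a rolling fingerprint
    over a buffer of the last m symbols -- Theta(m) space, i.e., the baseline
    the paper improves upon."""
    m = len(pattern)
    hp = 0
    for ch in pattern:
        hp = (hp * R + ord(ch)) % P
    r_m = pow(R, m, P)                   # R^m, weight of the outgoing symbol
    window, ht = deque(), 0
    for i, ch in enumerate(stream):
        window.append(ch)
        ht = (ht * R + ord(ch)) % P
        if len(window) > m:
            ht = (ht - ord(window.popleft()) * r_m) % P
        if len(window) == m:
            yield i, ht == hp            # pattern ends at index i, w.h.p.
```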


Mismatch Sampling

November 2008 · 42 Reads · 1 Citation

Information and Computation

We consider the well-known problem of pattern matching under the Hamming distance. Previous approaches have shown how to count the number of mismatches efficiently, especially when a bound is known for the maximum Hamming distance. Our interest is different in that we wish to collect a random sample of mismatches of fixed size at each position in the text. Given a pattern p of length m and a text t of length n, we show how to sample, with high probability, c mismatches where possible from every alignment of p and t in O((c + log n)(n + m log m) log m) time. Further, we guarantee that the mismatches are sampled uniformly and can therefore be seen as representative of the types of mismatches that occur.
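The sampling guarantee is easy to state with a naive baseline: at every alignment, walk over the pattern and reservoir-sample up to c of the mismatch positions, which yields a uniform sample. This O(nm) sketch (my own, not the paper's algorithm, which avoids the quadratic cost) makes the goal concrete:

```python
import random

def sample_mismatches(p, t, c):
    """For every alignment of p in t, return up to c mismatch positions chosen
    uniformly at random via reservoir sampling. Naive O(nm) baseline only."""
    m = len(p)
    samples = []
    for i in range(len(t) - m + 1):
        reservoir, seen = [], 0
        for j in range(m):
            if p[j] != t[i + j]:
                seen += 1
                if len(reservoir) < c:
                    reservoir.append(j)          # fill the reservoir first
                else:
                    r = random.randrange(seen)   # then replace with prob. c/seen
                    if r < c:
                        reservoir[r] = j
        samples.append(reservoir)
    return samples
```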


Citations (13)


... However, the notion of cover may still not be sufficiently general to capture repetitive signal in strings like those coming from biological sequences, which would benefit from considering approximate repeats. In this paper we elaborate on the work of Amir et al. [1][2][3][4][5] who introduce and study approximate string covers, which can be briefly defined as string covers in the presence of errors. Intuitively, given an original string, the idea is to cover a second string at a minimal distance from the original one. ...

Reference:

Approximation and Fixed Parameter Algorithms for the Approximate Cover Problem
Can We Recover the Cover?

Algorithmica

... One of the most fundamental family of problems in string algorithms is to compute the distance between a given pattern P of length m and each location in given larger text T of length n both, over alphabet Σ, under some string distance metric (See [31,22,2,32,8,6,3,7,37,13,35,33,9,12,39,34,20,11,16,19,18,17,5,4,38]). The most important distance metric in this setting is the Hamming Distance of two strings, which is the number of aligned character mismatches between the strings. ...

Note: Pattern matching with pair correlation distance
  • Citing Article
  • November 2008

Theoretical Computer Science

... One of the most fundamental family of problems in string algorithms is to compute the distance between a given pattern P of length m and each location in given larger text T of length n both, over alphabet Σ, under some string distance metric (See [31,22,2,32,8,6,3,7,37,13,35,33,9,12,39,34,20,11,16,19,18,17,5,4,38]). The most important distance metric in this setting is the Hamming Distance of two strings, which is the number of aligned character mismatches between the strings. ...

Jump-Matching with Errors
  • Citing Conference Paper
  • October 2007

... One of the most fundamental family of problems in string algorithms is to compute the distance between a given pattern P of length m and each location in given larger text T of length n both, over alphabet Σ, under some string distance metric (See [31, 22, 2, 32, 8, 6, 3, 7, 37, 13, 35, 33, 9, 12, 39, 34, 20, 11, 16, 19, 18, 17, 5, 4, 38]). The most important distance metric in this setting is the Hamming Distance of two strings, which is the number of aligned character mismatches between the strings. ...

Pattern Matching with Pair Correlation Distance
  • Citing Conference Paper
  • November 2008

Theoretical Computer Science

... The classical algorithms use space that is proportional to the pattern size. In a surprising work [PP09], Porat and Porat were the first to design a pattern matching algorithm that uses less space. They designed an on-line algorithm that pre-processes the pattern P into a small data structure, and then it receives the text symbol by symbol. ...

Exact and Approximate Pattern Matching in the Streaming Model
  • Citing Conference Paper
  • October 2009

50th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2009)

... Here c is a small constant. We extend this algorithm to pattern matching with mismatches and wild cards. Recent work has also addressed the online version of pattern matching, where the text is received in a streaming model, one character at a time, and it cannot be stored in its entirety (see e.g., [16], [70], [71]). Another version of this problem matches the pattern against multiple input streams (see e.g., [15]). ...

A Black Box for Online Approximate Pattern Matching
  • Citing Conference Paper
  • June 2008

Information and Computation