Vaguery + strings   5

[1204.3293] Efficiently decoding strings from their shingles
"Determining whether an unordered collection of overlapping substrings (called shingles) can be uniquely decoded into a consistent string is a problem that lies within the foundation of a broad assortment of disciplines ranging from networking and information theory through cryptography and even genetic engineering and linguistics. We present three perspectives on this problem: a graph theoretic framework due to Pevzner, an automata theoretic approach from our previous work, and a new insight that yields a time-optimal streaming algorithm for determining whether a string of $n$ characters over the alphabet $Sigma$ can be uniquely decoded from its two-character shingles. Our algorithm achieves an overall time complexity $Theta(n)$ and space complexity $O(|Sigma|)$. As an application, we demonstrate how this algorithm can be extended to larger shingles for efficient string reconciliation."
strings  algorithms  computational-complexity  nudge-targets 
5 weeks ago by Vaguery
[1110.5296] Computing a Longest Common Palindromic Subsequence
"The {em longest common subsequence (LCS)} problem is a classic and well-studied problem in computer science. Palindrome is a word which reads the same forward as it does backward. The {em longest common palindromic subsequence (LCPS)} problem is an interesting variant of the classic LCS problem which finds the longest common subsequence between two given strings such that the computed subsequence is also a palindrome. In this paper, we study the LCPS problem and give efficient algorithms to solve this problem. To the best of our knowledge, this is the first attempt to study and solve this interesting problem."
combinatorics  strings  algorithms  nudge-targets 
december 2011 by Vaguery
[1008.1191] Improved Fast Similarity Search in Dictionaries
"We engineer an algorithm to solve the approximate dictionary matching problem. Given a list of words $\mathcal{W}$, maximum distance $d$ fixed at preprocessing time and a query word $q$, we would like to retrieve all words from $\mathcal{W}$ that can be transformed into $q$ with $d$ or less edit operations. We present data structures that support fault tolerant queries by generating an index. On top of that, we present a generalization of the method that eases memory consumption and preprocessing time significantly. At the same time, running times of queries are virtually unaffected. We are able to match in lists of hundreds of thousands of words and beyond within microseconds for reasonable distances."
nudge-targets  strings  search-engines  clustering  algorithms  heuristics 
august 2010 by Vaguery
[1005.4033] Polylogarithmic Approximation for Edit Distance and the Asymmetric Query Complexity
"We present a near-linear time algorithm that approximates the edit distance between two strings within a polylogarithmic factor; specifically, for strings of length n and every fixed epsilon>0, it can compute a (log n)^O(1/epsilon) approximation in n^(1+epsilon) time. This is an exponential improvement over the previously known factor, 2^(O (sqrt(log n))), with a comparable running time (Ostrovsky and Rabani J.ACM 2007; Andoni and Onak STOC 2009). Previously, no efficient polylogarithmic approximation algorithm was known for any computational task involving edit distance (e.g., nearest neighbor search or sketching). "
nudge-targets  algorithms  computational-complexity  strings  bioinformatics  compression 
may 2010 by Vaguery

Copy this bookmark:



description:


tags: