Vaguery + bioinformatics   21

[1109.5635] Approximating Edit Distance in Near-Linear Time
"We show how to compute the edit distance between two strings of length n up to a factor of 2^{~O(sqrt(log n))} in n^(1+o(1)) time. This is the first sub-polynomial approximation algorithm for this problem that runs in near-linear time, improving on the state-of-the-art n^(1/3+o(1)) approximation. Previously, approximation of 2^{~O(sqrt(log n))} was known only for embedding edit distance into l_1, and it is not known if that embedding can be computed in less than quadratic time."
algorithms  string-editing  Levenshtein-distance  rewriting-systems  bioinformatics  nudge-targets 
october 2011 by Vaguery
[1109.1275] A Formal Verification Approach to the Design of Synthetic Gene Networks
"The design of genetic networks with specific functions is one of the major goals of synthetic biology. However, constructing biological devices that work "as required" remains challenging, while the cost of uncovering flawed designs experimentally is large. To address this issue, we propose a fully automated framework that allows the correctness of synthetic gene networks to be formally verified in silico from rich, high level functional specifications.
Given a device, we automatically construct a mathematical model from experimental data characterizing the parts it is composed of. The specific model structure guarantees that all experimental observations are captured and allows us to construct finite abstractions through polyhedral operations. The correctness of the model with respect to temporal logic specifications can then be verified automatically using methods inspired by model checking.
Overall, our procedure is conservative but it can filter through a large number of potential device designs and select few that satisfy the specification to be implemented and tested further experimentally. Illustrative examples of the application of our methods to the design of simple synthetic gene networks are included."
genetic-regulatory-networks  bioinformatics  biological-engineering  design-automation  emergent-design  acceptance-testing  performance-measure  nudge 
october 2011 by Vaguery
[1108.1150] Epistasis can lead to fragmented neutral spaces and contingency in evolution
"Under neutral reciprocal sign epistasis, two genetic changes are jointly neutral, even though their individual effects are deleterious. By using the widely studied mapping from an RNA sequence to secondary structure, we investigate the effect of this kind of epistasis on neutral spaces corresponding to networks of genotypes that fold to the same secondary structure. Neutral networks for RNA structures with n bonds are typically fragmented into at least 2^n different neutral components that cannot be connected by single point mutations. By exhaustive enumeration of all RNA secondary structures of sequences of length 15 we show that most networks are not dominated by one neutral component, but are rather broken up into multiple large components. Although they generate the same phenotype, components of a single neutral network are heterogeneous, showing wide variations in their robustness and their evolvability. Both properties are correlated with component size, rather than with the size of their underlying neutral network. In particular, sets of accessible phenotypes can vary quite strongly between components. Thus, the potential for future innovation is contingent on which neutral component a population occupies. We further argue that neutral reciprocal sign epistasis may have similar consequences for neutral evolution of other biological systems as well."
combinatorics  RNA  neutral-networks  complexology  bioinformatics  polymer-models  mathematical-recreations  nudge-targets 
october 2011 by Vaguery
[1008.2555] Succinct Data Structures for Assembling Large Genomes
"Motivation: Second generation sequencing technology makes it feasible for many researches to obtain enough sequence reads to attempt the de novo assembly of higher eukaryotes (including mammals). De novo assembly not only provides a tool for understanding wide scale biological variation, but within human bio-medicine, it offers a direct way of observing both large scale structural variation and fine scale sequence variation. Unfortunately, improvements in the computational feasibility for de novo assembly have not matched the improvements in the gathering of sequence data. This is for two reasons: the inherent computational complexity of the problem, and the in-practice memory requirements of tools."
genomics  bioinformatics  discrete-mathematics  algorithms  nudge-targets  inference  data-driven 
august 2010 by Vaguery
[1007.5075] Microbiome profiling by Illumina sequencing of combinatorial sequence-tagged PCR products
"e developed a low-cost, high-throughput microbiome profiling method that uses combinatorial sequence tags attached to PCR primers that amplify the rRNA V6 region. Amplified PCR products are sequenced using an Illumina paired-end protocol to generate millions of overlapping reads. Combinatorial sequence tagging can be used to examine hundreds of samples with far fewer primers than is required when sequence tags are incorporated at only a single end.…"
ecoinformatics  bioinformatics  sequencing  microbiology  community-assembly 
august 2010 by Vaguery
[1007.3424] Bacterial Community Reconstruction Using A Single Sequencing Reaction
"Bacteria are the unseen majority on our planet, with millions of species and comprising most of the living protoplasm. While current methods enable in-depth study of a small number of communities, a simple tool for breadth studies of bacterial population composition in a large number of samples is lacking. We propose a novel approach for reconstruction of the composition of an unknown mixture of bacteria using a single Sanger-sequencing reaction of the mixture. This method is based on compressive sensing theory, which deals with reconstruction of a sparse signal using a small number of measurements. Utilizing the fact that in many cases each bacterial community is comprised of a small subset of the known bacterial species, we show the feasibility of this approach for determining the composition of a bacterial mixture.…"
bacteria  community-assembly  microbiology  bioinformatics  sequenomics  ecology  complexology  datasets  it's-more-complicated-than-you-think  stuff-I-wish-we-had-20-years-ago-DAMMIT 
july 2010 by Vaguery
[1007.3964] Non-hereditary maximum parsimony trees
"In this paper, we investigate a conjecture by von Haeseler concerning the Maximum Parsimony method for phylogenetic estimation, which was published by the Newton Institute in Cambridge on a list of open phylogenetic problems in 2007. This conjecture deals with the question whether Maximum Parsimony trees are hereditary. The conjecture suggests that a Maximum Parsimony tree for a particular (DNA) alignment necessarily has subtrees of all possible sizes which are most parsimonious for the corresponding subalignments. We answer the conjecture affirmatively for binary alignments on five taxa but also show how to construct examples for which Maximum Parsimony trees are not hereditary. …we also show that compatible most parsimonious quartets do not have to provide a most parsimonious supertree. Last, we show that our results can be generalized to Maximum Likelihood for certain nucleotide substitution models."
cladistics  sequences  bioinformatics  modeling  algorithms  modeling-is-not-mathematics  it's-more-complicated-than-you-think  nudge-targets 
july 2010 by Vaguery
[1003.3987] RNA-RNA interaction prediction based on multiple sequence alignments
"Many computerized methods for RNA-RNA interaction structure prediction have been developed. Recently, $O(N^6)$ time and $O(N^4)$ space dynamic programming algorithms have become available that compute the partition function of RNA-RNA interaction complexes. However, few of these methods incorporate the knowledge concerning related sequences, thus relevant evolutionary information is often neglected from the structure determination. Therefore, it is of considerable practical interest to introduce a method taking into consideration both thermodynamic stability and sequence covariation. We present the \emph{a priori} folding algorithm \texttt{ripalign}, whose input consists of two (given) multiple sequence alignments (MSA). \texttt{ripalign} outputs (1) the partition function, (2) base-pairing probabilities, (3) hybrid probabilities and (4) a set of Boltzmann-sampled suboptimal structures consisting of canonical joint structures that are compatible to the alignments.…"
RNA-folding  algorithms  numerical-methods  bioinformatics  nudge-targets 
july 2010 by Vaguery
[1006.3246] Sparse approaches for the exact distribution of patterns in long multi-states sequences generated by a Markov source
"We present two novel approaches for the computation of the exact distribution of a pattern in a long sequence. Both approaches take into account the sparse structure of the problem. The first approach relies on a partial recursion computing the largest eigenvalue of the the transition matrix of a Markov chain embedding. The second approach uses fast Taylor expansions of an exact bivariate rational reconstruction of the distribution. We illustrate the interest of both approaches on a simple toy-example and two biological applications: the transcription factors of the Human Chromosome 5 and the PROSITE signatures of functional motifs in proteins. On these examples our methods demonstrate their complementarity and their hability to extend the domain of feasibility for exact computations in pattern problems to a new level."
bioinformatics  nudge-targets  sequences  statistics  models  computational-mechanics  automata 
june 2010 by Vaguery
[1005.4033] Polylogarithmic Approximation for Edit Distance and the Asymmetric Query Complexity
"We present a near-linear time algorithm that approximates the edit distance between two strings within a polylogarithmic factor; specifically, for strings of length n and every fixed epsilon>0, it can compute a (log n)^O(1/epsilon) approximation in n^(1+epsilon) time. This is an exponential improvement over the previously known factor, 2^(O (sqrt(log n))), with a comparable running time (Ostrovsky and Rabani J.ACM 2007; Andoni and Onak STOC 2009). Previously, no efficient polylogarithmic approximation algorithm was known for any computational task involving edit distance (e.g., nearest neighbor search or sketching). "
nudge-targets  algorithms  computational-complexity  strings  bioinformatics  compression 
may 2010 by Vaguery
[1005.2107] Efficient parameter search for qualitative models of regulatory networks using symbolic model checking
"Investigating the relation between the structure and behavior of complex biological networks often involves posing the following two questions: Is a hypothesized structure of a regulatory network consistent with the observed behavior? And can a proposed structure generate a desired behavior? Answering these questions presupposes that we are able to test the compatibility of network structure and behavior. We cast these questions into a parameter search problem for qualitative models of regulatory networks…. We develop a method based on symbolic model checking that avoids enumerating all possible parametrizations, and show that this method performs well on real biological problems, using the IRMA synthetic network and benchmark experimental data sets. We test the consistency between the IRMA network structure and the time-series data, and search for parameter modifications that would improve the robustness of the external control of the system behavior."
complexology  theoretical-biology  bioinformatics  modeling-is-not-mathematics  nudge-targets 
may 2010 by Vaguery
[0904.4458] Learning Character Strings via Mastermind Queries, with a Case Study Involving mtDNA
"We study the degree to which a character string, $Q$, leaks details about itself any time it engages in comparison protocols with a strings provided by a querier, Bob, even if those protocols are cryptographically guaranteed to produce no additional information other than the scores that assess the degree to which $Q$ matches strings offered by Bob. We show that such scenarios allow Bob to play variants of the game of Mastermind with $Q$ so as to learn the complete identity of $Q$."
mathematical-recreations  bioinformatics  algorithms  preprint  learning-by-doing 
april 2010 by Vaguery
I patent your ass. And your leg. And your nostril. – Bad Science
"Then they tested their model against reality: in a giant computing task, they took all the 15-nucleotide sequences from the BRCA1 gene, and searched for them, just on chromosome 1: they found 340,000 matches, roughly the same as their theoretical prediction, and the equivalent of 14 infringing sequences on every human gene. The BRCA1 gene, incidentally, is on chromosome 17.
The claims in this patent therefore extend, if properly enforced, to almost every single gene, in every single person on the planet. There is a moral and practical argument to be had about patenting nature, but the rights conferred in this patent are basically absurd."
intellectual-property  biopatents  patent-abuse  genomics  bioinformatics  lawyers  expertise-as-a-weapon 
april 2010 by Vaguery
Sequenomics
Thom LaBean and Erik Schultes start a science thing the three of us know lots about: directed combinatorial molecular design. Looking forward to seeing how they monetize expertise.
startups  Thom-LaBean  Erik-Schultes  sequenomics  science  molecular-design  combinatorial-libraries  biopolymers  bioinformatics  irrational-design 
july 2007 by Vaguery

related tags

academia  acceptance-testing  algorithms  analytics  automata  bacteria  bibliography  biochemistry  bioinformatics  biological-engineering  biopatents  biopolymers  CFP  citation  cladistics  classification  collaboration  combinatorial-libraries  combinatorics  community-assembly  complexology  compression  computational-complexity  computational-mechanics  crystallography  data-driven  data-mining  datasets  design-automation  discrete-mathematics  dynamics  ecoinformatics  ecology  emergent-design  epistasis  Erik-Schultes  expertise-as-a-weapon  firehose-drinking  genetic-regulatory-networks  genomics  healthcare  impact  inference  intellectual-property  irrational-design  it's-more-complicated-than-you-think  journals  lawyers  learning-by-doing  Levenshtein-distance  machine-learning  magazines  mathematical-recreations  mathematics  metaoptimization  microbiology  modeling  modeling-is-not-mathematics  models  molecular-design  MRSA  neutral-networks  nudge  nudge-targets  numerical-methods  open-access  patent-abuse  pattern-discovery  performance-measure  pharmaceutical  phenotype-genotype-stuff  polymer-models  preprint  protein-folding  publishing  R  research  rewriting-systems  RNA  RNA-folding  science  sequences  sequencing  sequenomics  software  Staph-aureus  startups  statistics  string-editing  strings  structural-biology  stuff-I-wish-we-had-20-years-ago-DAMMIT  systems-biology  target  technical  theoretical-biology  Thom-LaBean 

Copy this bookmark:



description:


tags: