Vaguery + bioinformatics 21
[1109.5635] Approximating Edit Distance in Near-Linear Time
october 2011 by Vaguery
"We show how to compute the edit distance between two strings of length n up to a factor of 2^{~O(sqrt(log n))} in n^(1+o(1)) time. This is the first sub-polynomial approximation algorithm for this problem that runs in near-linear time, improving on the state-of-the-art n^(1/3+o(1)) approximation. Previously, approximation of 2^{~O(sqrt(log n))} was known only for embedding edit distance into l_1, and it is not known if that embedding can be computed in less than quadratic time."
algorithms
string-editing
Levenshtein-distance
rewriting-systems
bioinformatics
nudge-targets
october 2011 by Vaguery
[1109.1275] A Formal Verification Approach to the Design of Synthetic Gene Networks
october 2011 by Vaguery
"The design of genetic networks with specific functions is one of the major goals of synthetic biology. However, constructing biological devices that work "as required" remains challenging, while the cost of uncovering flawed designs experimentally is large. To address this issue, we propose a fully automated framework that allows the correctness of synthetic gene networks to be formally verified in silico from rich, high level functional specifications.
Given a device, we automatically construct a mathematical model from experimental data characterizing the parts it is composed of. The specific model structure guarantees that all experimental observations are captured and allows us to construct finite abstractions through polyhedral operations. The correctness of the model with respect to temporal logic specifications can then be verified automatically using methods inspired by model checking.
Overall, our procedure is conservative but it can filter through a large number of potential device designs and select few that satisfy the specification to be implemented and tested further experimentally. Illustrative examples of the application of our methods to the design of simple synthetic gene networks are included."
genetic-regulatory-networks
bioinformatics
biological-engineering
design-automation
emergent-design
acceptance-testing
performance-measure
nudge
Given a device, we automatically construct a mathematical model from experimental data characterizing the parts it is composed of. The specific model structure guarantees that all experimental observations are captured and allows us to construct finite abstractions through polyhedral operations. The correctness of the model with respect to temporal logic specifications can then be verified automatically using methods inspired by model checking.
Overall, our procedure is conservative but it can filter through a large number of potential device designs and select few that satisfy the specification to be implemented and tested further experimentally. Illustrative examples of the application of our methods to the design of simple synthetic gene networks are included."
october 2011 by Vaguery
[1108.1150] Epistasis can lead to fragmented neutral spaces and contingency in evolution
october 2011 by Vaguery
"Under neutral reciprocal sign epistasis, two genetic changes are jointly neutral, even though their individual effects are deleterious. By using the widely studied mapping from an RNA sequence to secondary structure, we investigate the effect of this kind of epistasis on neutral spaces corresponding to networks of genotypes that fold to the same secondary structure. Neutral networks for RNA structures with n bonds are typically fragmented into at least 2^n different neutral components that cannot be connected by single point mutations. By exhaustive enumeration of all RNA secondary structures of sequences of length 15 we show that most networks are not dominated by one neutral component, but are rather broken up into multiple large components. Although they generate the same phenotype, components of a single neutral network are heterogeneous, showing wide variations in their robustness and their evolvability. Both properties are correlated with component size, rather than with the size of their underlying neutral network. In particular, sets of accessible phenotypes can vary quite strongly between components. Thus, the potential for future innovation is contingent on which neutral component a population occupies. We further argue that neutral reciprocal sign epistasis may have similar consequences for neutral evolution of other biological systems as well."
combinatorics
RNA
neutral-networks
complexology
bioinformatics
polymer-models
mathematical-recreations
nudge-targets
october 2011 by Vaguery
[1008.2555] Succinct Data Structures for Assembling Large Genomes
august 2010 by Vaguery
"Motivation: Second generation sequencing technology makes it feasible for many researches to obtain enough sequence reads to attempt the de novo assembly of higher eukaryotes (including mammals). De novo assembly not only provides a tool for understanding wide scale biological variation, but within human bio-medicine, it offers a direct way of observing both large scale structural variation and fine scale sequence variation. Unfortunately, improvements in the computational feasibility for de novo assembly have not matched the improvements in the gathering of sequence data. This is for two reasons: the inherent computational complexity of the problem, and the in-practice memory requirements of tools."
genomics
bioinformatics
discrete-mathematics
algorithms
nudge-targets
inference
data-driven
august 2010 by Vaguery
[1007.5075] Microbiome profiling by Illumina sequencing of combinatorial sequence-tagged PCR products
august 2010 by Vaguery
"e developed a low-cost, high-throughput microbiome profiling method that uses combinatorial sequence tags attached to PCR primers that amplify the rRNA V6 region. Amplified PCR products are sequenced using an Illumina paired-end protocol to generate millions of overlapping reads. Combinatorial sequence tagging can be used to examine hundreds of samples with far fewer primers than is required when sequence tags are incorporated at only a single end.…"
ecoinformatics
bioinformatics
sequencing
microbiology
community-assembly
august 2010 by Vaguery
[1007.3424] Bacterial Community Reconstruction Using A Single Sequencing Reaction
july 2010 by Vaguery
"Bacteria are the unseen majority on our planet, with millions of species and comprising most of the living protoplasm. While current methods enable in-depth study of a small number of communities, a simple tool for breadth studies of bacterial population composition in a large number of samples is lacking. We propose a novel approach for reconstruction of the composition of an unknown mixture of bacteria using a single Sanger-sequencing reaction of the mixture. This method is based on compressive sensing theory, which deals with reconstruction of a sparse signal using a small number of measurements. Utilizing the fact that in many cases each bacterial community is comprised of a small subset of the known bacterial species, we show the feasibility of this approach for determining the composition of a bacterial mixture.…"
bacteria
community-assembly
microbiology
bioinformatics
sequenomics
ecology
complexology
datasets
it's-more-complicated-than-you-think
stuff-I-wish-we-had-20-years-ago-DAMMIT
july 2010 by Vaguery
[1007.3964] Non-hereditary maximum parsimony trees
july 2010 by Vaguery
"In this paper, we investigate a conjecture by von Haeseler concerning the Maximum Parsimony method for phylogenetic estimation, which was published by the Newton Institute in Cambridge on a list of open phylogenetic problems in 2007. This conjecture deals with the question whether Maximum Parsimony trees are hereditary. The conjecture suggests that a Maximum Parsimony tree for a particular (DNA) alignment necessarily has subtrees of all possible sizes which are most parsimonious for the corresponding subalignments. We answer the conjecture affirmatively for binary alignments on five taxa but also show how to construct examples for which Maximum Parsimony trees are not hereditary. …we also show that compatible most parsimonious quartets do not have to provide a most parsimonious supertree. Last, we show that our results can be generalized to Maximum Likelihood for certain nucleotide substitution models."
cladistics
sequences
bioinformatics
modeling
algorithms
modeling-is-not-mathematics
it's-more-complicated-than-you-think
nudge-targets
july 2010 by Vaguery
[1003.3987] RNA-RNA interaction prediction based on multiple sequence alignments
july 2010 by Vaguery
"Many computerized methods for RNA-RNA interaction structure prediction have been developed. Recently, $O(N^6)$ time and $O(N^4)$ space dynamic programming algorithms have become available that compute the partition function of RNA-RNA interaction complexes. However, few of these methods incorporate the knowledge concerning related sequences, thus relevant evolutionary information is often neglected from the structure determination. Therefore, it is of considerable practical interest to introduce a method taking into consideration both thermodynamic stability and sequence covariation. We present the \emph{a priori} folding algorithm \texttt{ripalign}, whose input consists of two (given) multiple sequence alignments (MSA). \texttt{ripalign} outputs (1) the partition function, (2) base-pairing probabilities, (3) hybrid probabilities and (4) a set of Boltzmann-sampled suboptimal structures consisting of canonical joint structures that are compatible to the alignments.…"
RNA-folding
algorithms
numerical-methods
bioinformatics
nudge-targets
july 2010 by Vaguery
[1006.4929] Detecting epistasis via Markov bases
june 2010 by Vaguery
Specifically: "Genome-wide association study of hair length in dogs"
nudge-targets
epistasis
bioinformatics
genomics
data-mining
firehose-drinking
phenotype-genotype-stuff
june 2010 by Vaguery
[1006.3246] Sparse approaches for the exact distribution of patterns in long multi-states sequences generated by a Markov source
june 2010 by Vaguery
"We present two novel approaches for the computation of the exact distribution of a pattern in a long sequence. Both approaches take into account the sparse structure of the problem. The first approach relies on a partial recursion computing the largest eigenvalue of the the transition matrix of a Markov chain embedding. The second approach uses fast Taylor expansions of an exact bivariate rational reconstruction of the distribution. We illustrate the interest of both approaches on a simple toy-example and two biological applications: the transcription factors of the Human Chromosome 5 and the PROSITE signatures of functional motifs in proteins. On these examples our methods demonstrate their complementarity and their hability to extend the domain of feasibility for exact computations in pattern problems to a new level."
bioinformatics
nudge-targets
sequences
statistics
models
computational-mechanics
automata
june 2010 by Vaguery
[1005.4033] Polylogarithmic Approximation for Edit Distance and the Asymmetric Query Complexity
may 2010 by Vaguery
"We present a near-linear time algorithm that approximates the edit distance between two strings within a polylogarithmic factor; specifically, for strings of length n and every fixed epsilon>0, it can compute a (log n)^O(1/epsilon) approximation in n^(1+epsilon) time. This is an exponential improvement over the previously known factor, 2^(O (sqrt(log n))), with a comparable running time (Ostrovsky and Rabani J.ACM 2007; Andoni and Onak STOC 2009). Previously, no efficient polylogarithmic approximation algorithm was known for any computational task involving edit distance (e.g., nearest neighbor search or sketching). "
nudge-targets
algorithms
computational-complexity
strings
bioinformatics
compression
may 2010 by Vaguery
[1005.2107] Efficient parameter search for qualitative models of regulatory networks using symbolic model checking
may 2010 by Vaguery
"Investigating the relation between the structure and behavior of complex biological networks often involves posing the following two questions: Is a hypothesized structure of a regulatory network consistent with the observed behavior? And can a proposed structure generate a desired behavior? Answering these questions presupposes that we are able to test the compatibility of network structure and behavior. We cast these questions into a parameter search problem for qualitative models of regulatory networks…. We develop a method based on symbolic model checking that avoids enumerating all possible parametrizations, and show that this method performs well on real biological problems, using the IRMA synthetic network and benchmark experimental data sets. We test the consistency between the IRMA network structure and the time-series data, and search for parameter modifications that would improve the robustness of the external control of the system behavior."
complexology
theoretical-biology
bioinformatics
modeling-is-not-mathematics
nudge-targets
may 2010 by Vaguery
[0904.4458] Learning Character Strings via Mastermind Queries, with a Case Study Involving mtDNA
april 2010 by Vaguery
"We study the degree to which a character string, $Q$, leaks details about itself any time it engages in comparison protocols with a strings provided by a querier, Bob, even if those protocols are cryptographically guaranteed to produce no additional information other than the scores that assess the degree to which $Q$ matches strings offered by Bob. We show that such scenarios allow Bob to play variants of the game of Mastermind with $Q$ so as to learn the complete identity of $Q$."
mathematical-recreations
bioinformatics
algorithms
preprint
learning-by-doing
april 2010 by Vaguery
I patent your ass. And your leg. And your nostril. – Bad Science
april 2010 by Vaguery
"Then they tested their model against reality: in a giant computing task, they took all the 15-nucleotide sequences from the BRCA1 gene, and searched for them, just on chromosome 1: they found 340,000 matches, roughly the same as their theoretical prediction, and the equivalent of 14 infringing sequences on every human gene. The BRCA1 gene, incidentally, is on chromosome 17.
The claims in this patent therefore extend, if properly enforced, to almost every single gene, in every single person on the planet. There is a moral and practical argument to be had about patenting nature, but the rights conferred in this patent are basically absurd."
intellectual-property
biopatents
patent-abuse
genomics
bioinformatics
lawyers
expertise-as-a-weapon
The claims in this patent therefore extend, if properly enforced, to almost every single gene, in every single person on the planet. There is a moral and practical argument to be had about patenting nature, but the rights conferred in this patent are basically absurd."
april 2010 by Vaguery
[0712.4248] Computer algebra in systems biology
january 2008 by Vaguery
The reinvention of the Block Diagram?
bioinformatics
systems-biology
dynamics
biochemistry
january 2008 by Vaguery
Sequenomics
july 2007 by Vaguery
Thom LaBean and Erik Schultes start a science thing the three of us know lots about: directed combinatorial molecular design. Looking forward to seeing how they monetize expertise.
startups
Thom-LaBean
Erik-Schultes
sequenomics
science
molecular-design
combinatorial-libraries
biopolymers
bioinformatics
irrational-design
july 2007 by Vaguery
Sharing Detailed Research Data Is Associated with Increased Citation Rate : Nature Precedings
july 2007 by Vaguery
This may or may not be true each discipline; depends on their folkways.
academia
publishing
collaboration
citation
bibliography
research
impact
bioinformatics
july 2007 by Vaguery
related tags
academia ⊕ acceptance-testing ⊕ algorithms ⊕ analytics ⊕ automata ⊕ bacteria ⊕ bibliography ⊕ biochemistry ⊕ bioinformatics ⊖ biological-engineering ⊕ biopatents ⊕ biopolymers ⊕ CFP ⊕ citation ⊕ cladistics ⊕ classification ⊕ collaboration ⊕ combinatorial-libraries ⊕ combinatorics ⊕ community-assembly ⊕ complexology ⊕ compression ⊕ computational-complexity ⊕ computational-mechanics ⊕ crystallography ⊕ data-driven ⊕ data-mining ⊕ datasets ⊕ design-automation ⊕ discrete-mathematics ⊕ dynamics ⊕ ecoinformatics ⊕ ecology ⊕ emergent-design ⊕ epistasis ⊕ Erik-Schultes ⊕ expertise-as-a-weapon ⊕ firehose-drinking ⊕ genetic-regulatory-networks ⊕ genomics ⊕ healthcare ⊕ impact ⊕ inference ⊕ intellectual-property ⊕ irrational-design ⊕ it's-more-complicated-than-you-think ⊕ journals ⊕ lawyers ⊕ learning-by-doing ⊕ Levenshtein-distance ⊕ machine-learning ⊕ magazines ⊕ mathematical-recreations ⊕ mathematics ⊕ metaoptimization ⊕ microbiology ⊕ modeling ⊕ modeling-is-not-mathematics ⊕ models ⊕ molecular-design ⊕ MRSA ⊕ neutral-networks ⊕ nudge ⊕ nudge-targets ⊕ numerical-methods ⊕ open-access ⊕ patent-abuse ⊕ pattern-discovery ⊕ performance-measure ⊕ pharmaceutical ⊕ phenotype-genotype-stuff ⊕ polymer-models ⊕ preprint ⊕ protein-folding ⊕ publishing ⊕ R ⊕ research ⊕ rewriting-systems ⊕ RNA ⊕ RNA-folding ⊕ science ⊕ sequences ⊕ sequencing ⊕ sequenomics ⊕ software ⊕ Staph-aureus ⊕ startups ⊕ statistics ⊕ string-editing ⊕ strings ⊕ structural-biology ⊕ stuff-I-wish-we-had-20-years-ago-DAMMIT ⊕ systems-biology ⊕ target ⊕ technical ⊕ theoretical-biology ⊕ Thom-LaBean ⊕Copy this bookmark: