arthegall + research-article   578

Zhu, et al. "Inferring Taxi Status Using GPS Trajectories"
If I had a million free hours, I'd spend some of them applying this to MBTA buses ("Inferring bus fullness using GPS trajectories"). Maybe someday.
mbta  idea  gps  arxiv  research-article  inference 
3 days ago by arthegall
Van Durme, Lall, "Probabilistic Counting with Randomized Storage" IJCAI, 2009.
I like the "TOMB Counter" name -- this is a reasonably important technique, and this is the first place I've found it referenced in the literature.
probabilistic-methods  morris-counter  bloom-filters  research-article  big-data  to-re-read 
4 days ago by arthegall
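The TOMB counter builds on the classic Morris approximate counter, which stores only a small exponent c instead of the full count n. Below is a minimal sketch of a plain base-2 Morris counter, assuming the textbook version: increment c with probability 2^-c and estimate n as 2^c - 1, which is unbiased. This is not the TOMB counter itself; the Bloom-filter-style randomized storage the paper adds is left out.

```python
import random


class MorrisCounter:
    """Plain base-2 Morris approximate counter (not the TOMB variant).

    Only the exponent c is stored; 2**c - 1 is an unbiased estimate of the
    number of increment() calls, though a single counter has high variance.
    """

    def __init__(self, rng=None):
        self.c = 0
        self.rng = rng if rng is not None else random.Random()

    def increment(self):
        # Bump the exponent with probability 2**-c.
        if self.rng.random() < 2.0 ** -self.c:
            self.c += 1

    def estimate(self):
        return 2 ** self.c - 1


# Usage: average many independent counters to see the estimate centre on n.
rng = random.Random(0)
n, trials = 10_000, 100
estimates = []
for _ in range(trials):
    counter = MorrisCounter(rng)
    for _ in range(n):
        counter.increment()
    estimates.append(counter.estimate())
print(sum(estimates) / trials)  # should land near n; single counters vary a lot
```

The first increment always fires (probability 2^0 = 1), so small counts are exact; the error grows with n, which is the price of storing roughly log log n bits.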
Zhu, Schadt, et al. "Stitching together Multiple Data Dimensions Reveals Interacting Metabolomic and Transcriptomic Networks That Modulate Cell Regulation" (PLoS Biology)
Someone on Twitter posted a link to this as an example of actual "data integration." Mmm... color me skeptical (here "data integration" reads to me as a synonym for "ad hoc systems biology"), but I need to read it more closely.
data-integration  bioinformatics  eric-schadt  plos-biology  research-article  systems-biology 
6 days ago by arthegall
Frasconi et al. "kLog: A Language for Logical and Relational Learning with Kernels" (arXiv)
Also reading this morning -- unforch, the references in their PDF are messed up, and their source is missing an import, so I can't recompile it from the .tex source unaided. GRrrrr.
machinelearning  arxiv  research-article  learning  programminglanguage  to-read 
9 days ago by arthegall
DROPS - Algorithmic Differentiation Through Automatic Graph Elimination Ordering (ADTAGEO)
"Algorithmic Differentiation Through Automatic Graph Elimination Ordering (ADTAGEO) is based on the principle of Instant Elimination: At runtime we dynamically maintain a DAG representing only active variables that are alive at any time. Whenever an active variable is deallocated or its value is overwritten the corresponding vertex in the Live-DAG will be eliminated immediately by the well known vertex elimination rule [1]. Consequently, the total memory requirement is equal to that of the sparse forward mode."
automatic-differentiation  research-article  numerical-methods  algorithms 
9 days ago by arthegall
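The "vertex elimination rule" in the ADTAGEO quote can be illustrated on a small static graph. This is a minimal sketch, assuming the whole linearized computational graph is known up front (ADTAGEO instead eliminates vertices at runtime as variables die): edges carry local partial derivatives, and eliminating an intermediate vertex j adds c[k,j] * c[j,i] to the edge (i, k) for every predecessor i and successor k, then deletes j. Once all intermediates are gone, the remaining input-to-output edges are the Jacobian entries. The example function f(x, y) = sin(x*y) + x is purely illustrative.

```python
import math


def eliminate_vertex(edges, j):
    """Apply the vertex elimination rule to intermediate vertex j in place.

    edges maps (src, dst) -> local partial derivative d(dst)/d(src).
    """
    preds = [(i, c) for (i, d), c in edges.items() if d == j]
    succs = [(k, c) for (s, k), c in edges.items() if s == j]
    for i, c_ji in preds:
        for k, c_kj in succs:
            # Chain rule along the path i -> j -> k, accumulated onto (i, k).
            edges[(i, k)] = edges.get((i, k), 0.0) + c_kj * c_ji
    for i, _ in preds:
        del edges[(i, j)]
    for k, _ in succs:
        del edges[(j, k)]


def jacobian_by_elimination(edges, intermediates):
    """Eliminate intermediates in the given order; what remains is the Jacobian."""
    edges = dict(edges)  # leave the caller's graph untouched
    for j in intermediates:
        eliminate_vertex(edges, j)
    return edges


# f(x, y) = sin(x * y) + x, linearized at x = 2, y = 3.
x, y = 2.0, 3.0
v = x * y
graph = {
    ("x", "v"): y,            # dv/dx for v = x * y
    ("y", "v"): x,            # dv/dy
    ("v", "w"): math.cos(v),  # dw/dv for w = sin(v)
    ("w", "out"): 1.0,        # d(out)/dw for out = w + x
    ("x", "out"): 1.0,        # d(out)/dx from the "+ x" term
}
jac = jacobian_by_elimination(graph, ["v", "w"])
print(jac)  # d/dx = 1 + 3*cos(6), d/dy = 2*cos(6)
```

The elimination order does not change the result, only the number of multiplications, which is why ordering heuristics matter in practice.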
Need, Shashi et al. "Clinical application of exome sequencing in undiagnosed genetic conditions," Journal of Medical Genetics (2012)
"This study provides evidence that next-generation sequencing can have high success rates in a clinical setting, but also highlights key challenges. It further suggests that the presentation of known Mendelian conditions may be considerably broader than currently recognised." -- Looks like the authors are from Duke Medical School...
research-article  sequencing  exomes  clinical-genetics  biology  genetics  genomics  mendelian-disease 
11 days ago by arthegall
Fiskerstrand et al. "Familial Diarrhea Syndrome Caused by an Activating GUCY2C Mutation" NEJM (2012)
"Other causes include inflammatory bowel disease, infections, paraneoplastic hormones, celiac disease, malabsorption syndromes, and bacterial overgrowth in the small intestine. In addition to organic causes, psychological factors have an important effect on bowel function."
celiac-disease  genetics  disease  research-article  medicine  nejm 
13 days ago by arthegall
Stahl et al. "Bayesian inference analyses of the polygenic architecture of rheumatoid arthritis," Nature Genetics (2012)
"Our results are consistent with simulated genetic models in which hundreds of associated loci harbor common causal variants and a smaller number of loci harbor multiple rare causal variants. These analyses suggest that GWAS will continue to be highly productive for the discovery of additional susceptibility loci for common diseases."
gwas  genetic-testing  arthritis  genetics  genomics  research-article  nature-genetics 
13 days ago by arthegall
Jones, Ruzzo, Peng, Katze, "A new approach to bias correction in RNA-Seq" (Bioinformatics, 2012)
"We present a new method to measure and correct for these influences using a simple graphical model. Our model does not rely on existing gene annotations, and model selection is performed automatically making it applicable with few assumptions."
bioinformatics  research-article  rna-seq  bias-correction  graphical-models  normalization 
16 days ago by arthegall
Schadt, Woo, and Hao, "Bayesian method to predict individual SNP genotypes from gene expression data"
"We show that, not only can genotypic barcodes be derived from individual gene expression data sets that can in turn be used to identify the genotype vector corresponding to the given individual, but such barcodes also have the potential to reliably identify first-degree relatives and even reconstruct nuclear pedigrees directly from gene expression data." -- I think that the logical possibility of doing this is pretty obvious to anyone who's thought this data through (and there are a *lot* more examples of similar kinds of re-identification possible out there), but it's still awesome to see it actually carried out and demonstrated on real data.
genomics  eric-schadt  privacy  genotype  research-article  bioinformatics  data 
6 weeks ago by arthegall
[1203.2570] Differential Privacy for Functions and Functional Data
"Previous work has focused mainly on methods for which the output is a finite dimensional vector, or an element of some discrete set. We develop methods for releasing functions while preserving differential privacy. Specifically, we show that adding an appropriate Gaussian process to the function of interest yields differential privacy."
larry-wasserman  differential-privacy  databases  arxiv  research-article  gaussian-processes 
7 weeks ago by arthegall
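The mechanism in the abstract (add a Gaussian process to the function) can be sketched on a finite grid. This is a minimal sketch under my reading of the paper: release f + (Δ c(δ)/ε) G, where G is a zero-mean GP sample path, c(δ) = sqrt(2 log(2/δ)), and Δ bounds the change in f (in the RKHS norm of the kernel) when one record changes. The squared-exponential kernel, bandwidth, grid, jitter, and the sensitivity value in the usage are all assumptions for illustration; computing a valid Δ for a real estimator is the hard part and is not done here.

```python
import numpy as np


def squared_exponential_kernel(ts, bandwidth):
    """Gram matrix K[i, j] = exp(-(t_i - t_j)^2 / (2 * bandwidth^2))."""
    diffs = ts[:, None] - ts[None, :]
    return np.exp(-diffs ** 2 / (2.0 * bandwidth ** 2))


def release_private_function(f_values, ts, sensitivity, epsilon, delta,
                             bandwidth=0.1, rng=None):
    """Return f + (sensitivity * c(delta) / epsilon) * G on the grid ts.

    G is a zero-mean Gaussian-process path with squared-exponential
    covariance and c(delta) = sqrt(2 log(2 / delta)). `sensitivity` is an
    assumed, caller-supplied RKHS-norm sensitivity bound.
    """
    rng = np.random.default_rng() if rng is None else rng
    c_delta = np.sqrt(2.0 * np.log(2.0 / delta))
    scale = sensitivity * c_delta / epsilon
    K = squared_exponential_kernel(ts, bandwidth)
    # Small diagonal jitter keeps the Cholesky factorization stable
    # (the sampled path then has covariance K + 1e-6 * I).
    L = np.linalg.cholesky(K + 1e-6 * np.eye(len(ts)))
    gp_path = L @ rng.standard_normal(len(ts))
    return f_values + scale * gp_path


# Hypothetical usage: a function estimated from data, evaluated on a grid.
ts = np.linspace(0.0, 1.0, 50)
f_hat = np.sin(2 * np.pi * ts)
private_f = release_private_function(f_hat, ts, sensitivity=0.5, epsilon=1.0,
                                     delta=1e-3, rng=np.random.default_rng(0))
```

Because the noise is a smooth random function rather than independent per-point noise, the released curve stays smooth, which is the point of working with functional data.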
Genome Medicine | Full text | Locus Reference Genomic sequences: an improved basis for describing human DNA variants.
One of the original papers on LRG -- check out boxes 1-3, which provide great canned use-cases for variant naming and coordinate/version handling.
genomics  bioinformatics  locus-reference-genomic  research-article  sequence-variants  data-management 
8 weeks ago by arthegall
Wang, Hellerstein, et al. "BayesStore: managing large, uncertain data repositories with probabilistic graphical models."
This makes me think about the old idea of using probabilistic relational models as a way of compressing a database -- shipping the graphical model across the network for distributed (noisy) joins.
database  graphical-models  prms  sql  idea  joe-hellerstein  research-article 
8 weeks ago by arthegall
[1203.0697] Learning High-Dimensional Mixtures of Graphical Models
"We now propose a method for learning the mixture components given n i.i.d. samples $y_n$ drawn from a graphical mixture model $P(y)$. Our method proceeds in two stages. First, we estimate the graph $G_\cup := \bigcup_{h=1}^{r} G_h$, which is the union of the Markov graphs of the mixture. This is accomplished via a series of rank tests. Note that in the special case when $G_h \equiv G_\cup$, this also gives the graph estimates of the component models. We then use the graph estimate $\hat{G}_\cup$ to obtain the pairwise marginals of the respective mixture components via a spectral decomposition method. Finally, we use the Chow-Liu algorithm to obtain tree approximations $\{T_h\}_h$ of the individual mixture components." -- To do: review how this works in the context of gene expression experiments for transcription-factor regulatory relationships, which are (presumably) mixtures of a couple of different underlying models or modes.
gene-expression  bioinformatics  research-article  arxiv  via:cshalizi  graphical-models  mixture-models  machinelearning 
11 weeks ago by arthegall
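The last step in the quote, the Chow-Liu algorithm, is simple enough to sketch: estimate the mutual information between every pair of variables and keep a maximum-weight spanning tree. This minimal version assumes fully observed discrete samples and plug-in MI estimates; in the paper it is applied to the per-component pairwise marginals recovered by the spectral step, which this sketch does not attempt.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(samples, i, j):
    """Plug-in estimate of the mutual information (nats) between columns i and j."""
    n = len(samples)
    joint = Counter((row[i], row[j]) for row in samples)
    pi = Counter(row[i] for row in samples)
    pj = Counter(row[j] for row in samples)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((pi[a] / n) * (pj[b] / n)))
    return mi


def chow_liu_tree(samples):
    """Edges of a maximum-weight spanning tree on pairwise MI (Kruskal)."""
    d = len(samples[0])
    weighted = sorted(
        ((mutual_information(samples, i, j), i, j)
         for i, j in combinations(range(d), 2)),
        reverse=True,  # heaviest edges first
    )
    parent = list(range(d))  # union-find forest over the variables

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    edges = []
    for _, i, j in weighted:
        ri, rj = find(i), find(j)
        if ri != rj:  # skip edges that would close a cycle
            parent[ri] = rj
            edges.append((i, j))
    return edges


# Toy data: columns 0 and 1 are identical, column 2 is independent of both.
toy = [(0, 0, 1), (1, 1, 0), (0, 0, 0), (1, 1, 1)]
print(chow_liu_tree(toy))  # [(0, 1), (1, 2)]
```

Ties among zero-MI edges are broken arbitrarily (here by index), so which edge attaches an independent variable carries no meaning.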
PLoS ONE: Transcriptomic Analysis of Toxoplasma Development Reveals Many Novel Functions and Structures Specific to Sporozoites and Oocysts
"A single felid host is capable of shedding millions of oocysts, which can survive for years in the environment, are resistant to most methods of microbial inactivation during water-treatment and are capable of producing infection in warm-blooded hosts at doses as low as 1–10 ingested oocysts." --- Shudder.
toxoplasma-gondii  feline-behavior  cat-person  plos-one  research-article  biology  genomics  awful 
february 2012 by arthegall
Zhi, Chen "Statistical Guidance for Experimental Design and Data Analysis of Mutation Detection in Rare Monogenic Mendelian Diseases by Exome Sequencing" (PLoS ONE)
"... we present a statistical modeling framework to calculate the power, the probability of identifying truly disease-causing genes, under various inheritance models and experimental conditions, providing guidance for both proper experimental design and data analysis."
sequencing  plos  research-article  genomics  mendelian-disease  statistics  to-read 
february 2012 by arthegall
[1112.6045] Comparing intermittency and network measurements of words and their dependency on authorship
More generic text features that can be used to determine authorship: a 5th-grade science-fair project on steroids.
clustering  machinelearning  writing  authorship  classification  arxiv  research-article  nlp 
january 2012 by arthegall
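A word's "intermittency" is about how bursty its occurrences are through a text. As a minimal sketch, assuming a simple burstiness measure (the coefficient of variation of the gaps between successive occurrences, which may not match the paper's exact definition): evenly spaced words score 0, randomly scattered words score around 1, and clumped words score higher. A per-author profile of such scores over common words is the kind of generic feature the paper feeds into authorship classification.

```python
import statistics


def intermittency(tokens, word):
    """Coefficient of variation (std / mean) of the gaps between successive
    occurrences of `word`. Returns None if there are fewer than two gaps."""
    positions = [i for i, tok in enumerate(tokens) if tok == word]
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    if len(gaps) < 2:
        return None
    return statistics.pstdev(gaps) / statistics.mean(gaps)


# Hypothetical usage on a tokenized text.
text = "the cat sat on the mat and the dog sat on the cat".split()
print(intermittency(text, "the"), intermittency(text, "dog"))
```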
[1104.1605] Efficient Top-K Retrieval in Online Social Tagging Networks
"We first consider a key aspect of the problem, which is accessing the closest or most relevant users for a given seeker. We describe how this can be done on the fly (without any pre-computations) for several possible choices - arguably the most natural ones - of proximity computation in a user network."
social-networks  tagging  folksonomies  research-article  arxiv 
december 2011 by arthegall
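The quote is about visiting the users closest to a seeker on the fly. A minimal sketch, assuming one natural proximity choice (the best product of edge weights in (0, 1] over all paths from the seeker; the paper considers several, and also aggregates tag scores, which is omitted here): because multiplying by a weight ≤ 1 never increases a score, a Dijkstra-style best-first search settles users in non-increasing proximity, so it can stop after k users without any precomputation.

```python
import heapq


def closest_users(graph, seeker, k):
    """Return up to k (user, proximity) pairs in decreasing proximity.

    graph: dict user -> list of (neighbor, weight) with 0 < weight <= 1.
    Proximity = max over paths from `seeker` of the product of edge weights.
    """
    best = {seeker: 1.0}
    heap = [(-1.0, seeker)]  # max-heap via negated scores
    settled = set()
    result = []
    while heap and len(result) < k:
        neg_score, user = heapq.heappop(heap)
        if user in settled:
            continue  # stale heap entry with a worse score
        settled.add(user)
        if user != seeker:
            result.append((user, -neg_score))
        for neighbor, weight in graph.get(user, []):
            score = -neg_score * weight
            if score > best.get(neighbor, 0.0):
                best[neighbor] = score
                heapq.heappush(heap, (-score, neighbor))
    return result


# Hypothetical tagging network with edge weights as similarity scores.
g = {
    "alice": [("bob", 0.9), ("carol", 0.5)],
    "bob": [("carol", 0.8), ("dave", 0.5)],
    "carol": [("dave", 0.9)],
}
print(closest_users(g, "alice", 2))
```

Here carol is reached more strongly through bob (0.9 × 0.8) than directly (0.5), which is exactly what the path-based proximity is meant to capture.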
Staley, "Using Inferential Robustness to Establish the Security of an Evidence Claim"
"I argue that robustness can be understood as a means of establishing the partial security of evidence claims." -- this might be useful in the Watson-style, many-paths-to-the-same-fact inference engines that are often built on top of triple stores.
inference  epistemology  research-article  philosophy  watson  belief  knowledge 
december 2011 by arthegall
[1011.5287] Distributed Storage Allocations
"By using an appropriate code, successful recovery can be achieved whenever the total amount of data accessed is at least the size of the original data object. The goal is to find an optimal storage allocation that maximizes the probability of successful recovery."
coding  arxiv  research-article  genomics  idea  storage  data 
december 2011 by arthegall
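The objective in the quote can be made concrete for tiny cases. A minimal sketch, assuming one simple access model (each node is reached independently with probability p) and measuring storage in units of the original object's size, so that with an appropriate code recovery succeeds exactly when the accessed amounts sum to at least 1. It brute-forces all access patterns, so it is only for small numbers of nodes.

```python
from itertools import product


def recovery_probability(allocation, p):
    """P(total data on accessed nodes >= 1), each node accessed w.p. p.

    allocation[i] is the amount of coded data on node i, in units of the
    original object's size. Brute force over all 2**n access patterns.
    """
    n = len(allocation)
    total = 0.0
    for pattern in product((0, 1), repeat=n):
        stored = sum(x for x, accessed in zip(allocation, pattern) if accessed)
        if stored >= 1 - 1e-12:  # tolerance for float sums like 3 * (1/3)
            k = sum(pattern)
            total += p ** k * (1 - p) ** (n - k)
    return total


# Same total budget (2 object-sizes over 4 nodes), two ways to spend it.
for alloc in ([1.0, 1.0, 0.0, 0.0], [0.5, 0.5, 0.5, 0.5]):
    for p in (0.5, 0.9):
        print(alloc, p, round(recovery_probability(alloc, p), 4))
```

For these two toy allocations the sketch gives 0.75 vs 0.6875 at p = 0.5 and 0.99 vs 0.9963 at p = 0.9: concentrating the budget wins when access is unreliable, spreading it wins when access is reliable, which is why the optimal allocation is not obvious in general.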
Goodrich, Mitzenmacher, "Invertible Bloom Lookup Tables"
"We present a version of the Bloom filter data structure that supports not only the insertion, deletion, and lookup of key-value pairs, but also allows a complete listing of its contents with high probability, as long [as] the number of key-value pairs is below a designed threshold."
arxiv  research-article  computerscience  bloom-filters  data-structures  invertible-bloom-lookup-tables 
december 2011 by arthegall
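The listing trick is what makes IBLTs interesting, so here is a minimal sketch for integer keys and values. Assumptions: this version keeps XOR-sums of keys and values per cell (arithmetic sums also work), splits the cells into one subtable per hash function so a key's cells are distinct, derives cell indices from salted SHA-256, and requires every delete to match an earlier insert. Listing "peels" pure cells (count == 1), deleting each recovered pair, which frees up further pure cells.

```python
import hashlib


class IBLT:
    """Minimal Invertible Bloom Lookup Table for integer keys and values."""

    def __init__(self, cells_per_subtable, num_hashes=3):
        self.m = cells_per_subtable
        self.k = num_hashes
        size = self.m * self.k
        self.count = [0] * size
        self.key_sum = [0] * size    # XOR of keys hashed to each cell
        self.value_sum = [0] * size  # XOR of values hashed to each cell

    def _cells(self, key):
        # One cell per subtable, from a salted SHA-256 of the key.
        for h in range(self.k):
            digest = hashlib.sha256(f"{h}:{key}".encode()).digest()
            yield h * self.m + int.from_bytes(digest[:8], "big") % self.m

    def _update(self, key, value, delta):
        for c in self._cells(key):
            self.count[c] += delta
            self.key_sum[c] ^= key
            self.value_sum[c] ^= value

    def insert(self, key, value):
        self._update(key, value, +1)

    def delete(self, key, value):
        self._update(key, value, -1)

    def get(self, key):
        """Value for key, or None if absent; LookupError if undecidable."""
        for c in self._cells(key):
            if self.count[c] == 0:
                return None
            if self.count[c] == 1:
                return self.value_sum[c] if self.key_sum[c] == key else None
        raise LookupError(f"cannot decide lookup for key {key}")

    def list_entries(self):
        """Peel pure cells on a scratch copy; return (entries, complete)."""
        saved = (self.count[:], self.key_sum[:], self.value_sum[:])
        entries = []
        progress = True
        while progress:
            progress = False
            for c in range(len(self.count)):
                if self.count[c] == 1:
                    key, value = self.key_sum[c], self.value_sum[c]
                    entries.append((key, value))
                    self.delete(key, value)
                    progress = True
        complete = all(n == 0 for n in self.count)
        self.count, self.key_sum, self.value_sum = saved
        return entries, complete


# Usage: insert ten pairs, delete one, list what remains.
table = IBLT(cells_per_subtable=50)
for key in range(1, 11):
    table.insert(key, 100 + key)
table.delete(3, 103)
entries, ok = table.list_entries()
print(ok, sorted(entries))
```

If too many pairs are stored, peeling gets stuck with no pure cell left and `complete` comes back False; that is the "below a designed threshold" condition in the quote.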
GrOWL: A tool for visualization and editing of OWL ontologies
Found this in a compilation of Semantic Web papers -- I feel like they get about 70% of the idea right. Their notation for intersections and unions is wrong, and they don't have any notation for Everything, Nothing, or disjoint sets... but it's definitely a step in the right direction.
owl  ontologies  visualization  semanticweb  research-article 
november 2011 by arthegall
A probability-based approach for the analysis of large-scale RNAi screens - Nature Methods
The "RSA" method for identifying genes from high-throughput siRNA screens. Basically, ranking plus some ad hoc p-value multiplications. <gACK>
bioinformatics  pharma  research-article  hts-screening  nature-publishing  rnai  doing-terrible-things-with-p-values 
september 2011 by arthegall
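To make the ranking-based idea concrete, here is a minimal sketch of a rank-enrichment score of the general kind RSA is built around, assuming this framing (it may differ from the paper's exact procedure, cutoffs, and multiple-testing handling): rank all siRNAs by activity, and for a gene whose i-th best siRNA sits at rank r, compute the hypergeometric probability of seeing at least i of that gene's siRNAs in the top r by chance; the gene's score is the minimum over i.

```python
from math import comb


def hypergeom_sf(k, total, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population=total, successes, draws)."""
    upper = min(successes, draws)
    hits = sum(comb(successes, i) * comb(total - successes, draws - i)
               for i in range(k, upper + 1))
    return hits / comb(total, draws)


def rank_enrichment_pvalue(gene_ranks, total):
    """Minimum hypergeometric tail probability over the gene's siRNA ranks.

    gene_ranks: 1-based ranks (1 = strongest hit) of one gene's siRNAs
    among `total` siRNAs in the screen.
    """
    ranks = sorted(gene_ranks)
    n = len(ranks)
    return min(hypergeom_sf(i, total, n, r) for i, r in enumerate(ranks, start=1))


# Hypothetical gene whose three siRNAs are the top three of 100.
print(rank_enrichment_pvalue([1, 2, 3], total=100))  # equals 1 / comb(100, 3)
```

Taking the minimum over cutoffs is itself a multiple-comparison step, so the resulting number is a score rather than a calibrated p-value.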
Hopkins, Groom, "The druggable genome" : Article : Nature Reviews Drug Discovery
(doi:10.1038/nrd892) People keep talking about this paper at work -- and Hopkins gave one of the plenaries at IDD this year. I think it just maps a lot of the intermediate data that's useful for drug discovery onto the "genomic coordinate system" and makes it available in one place for browsing. Not sure how useful that really is; does it omit something key?
research-article  drug-discovery  genomics  science  annotation  nature  review 
may 2011 by arthegall
Assessing the significance of chromosomal aberrations in cancer: Methodology and application to glioma
The paper that develops the GISTIC scoring method for genomic variation in cancer. This is showing up more and more frequently in some of what I'm dealing with at work, so time to read up on it...
gistic  cancer  genomics  research-article  genetic-variation  genetics 
may 2011 by arthegall
PLoS ONE: Abundant Human DNA Contamination Identified in Non-Primate Genome Databases
If the contamination is from the researchers who collected or manipulated the samples, does that mean that we should be having them sign medical consent forms before publishing their data in public repositories? I'm only half-joking.
consent  bioinformatics  genomics  dna  contamination  plos  research-article 
february 2011 by arthegall
Groth, Gibson, and Velterop, "The Anatomy of a Nanopublication"
This is basically SWAN, five years after the fact -- but it drives home how much of a missed opportunity SWAN really was.
swan  publishing  rdf  semanticweb  science  nanopublications  research-article 
january 2011 by arthegall

related tags

5-htt  accomplishment  acm  active-learning  adaptive-sampling  adolescence  affinity-propagation  age  aggregation  aging  aic  alan-willsky  algebra  algebraic-datatypes  algebraic-geometry  algorithm  algorithms  alignment  alloy  alzheimers  ancestry  andrew-gelman  andrew-ng  animation  annotation  anntotation  anonymity  answer  antibodies  antisense-expression  apolipoprotein  app  approximation  approximation-algorithms  arbitrage  architecture  argument  arithmetic-coding  arthritis  artificial-intelligence  artxiv  arvix  arxiv  asarwate  assembly  attribute-grammars  authorship  autism  automatic-differentiation  aviv-regev  awful  axel-polleres  bacteria  bam-file-format  banking  barcodes  baseball  bayes  bayesian-methods  bayesian-networks  bayesian-probability  baysian-methods  belief  belief-propagation  best-of  beta-process  bezier-curves  bias  bias-correction  big-data  biking  bioinformatics  biological-simulation  biology  biomedical-research  biosecurity  blockmodeling  blog  bloom-filters  blot-logic  book-chapter  bookmarks  boosting  bounds  bowtie  bradley-efron  brain-science  branch-and-bound  brightness  broad  broad-institute  building  burrows-wheeler  business  by:cshalizi  by:judea-pearl  calculus  cancer  cap-and-trade  casuality  cat-person  category-theory  causal-networks  causality  celiac-disease  cell  cell-populations  cell-types  cellular-automata  change-point  change-points  channel-coding  chemistry  chip  chip-chip  chip-seq  chipchip  christina-roemer  chromatin  chromatin-immunoprecipitation  chromosomal-conformation  circle-packing  citation  citations  citeseer  cito  claire-monteleoni  classification  classifiers  cle  climate-change  clinical-discovery  clinical-genetics  closest-pairs  clustering  coding  coding-theory  cognitive_science  coinduction  coins  color  colors  combinatorial-regulation  combinatorics  combinators  communication  communities  compilation  compiler  compilers  complex-analysis  
complexity  compressed-sensing  compression  computability  computation  computational-complexity  computational-geometry  computational-linguistics  computational-music  computational-structure  computer-algebra  computer-vision  computers  computerscience  computervision  computing  concurrency  conditional-probability  congestion  consent  conservation  constraint-satisfaction  contamination  contingency-tables  continuations  control  control-theory  conversation  convex-optimization  convex-programming  copula  coq  coroutines  correlation  cosma-shalizi  cost-benefit-analysis  counterfeit-coin-problem  counterfeiting  counting  credit-default  credit-derivatives  crime  cssr  cultural-ratchet-effect  culture  cycle-basis  cyp450  daniel-jackson  daniel-roy  data  data-analysis  data-cube  data-integration  data-management  data-mining  data-munging  data-processing-language  data-structure  data-structures  data-warehouse  database  databases  datalog  datamining  datastream  david-haussler  david-li  david-mumford  david-pennock  david-shotton  david-wolpert  deborah-mayo  debugging  decision-making  decision-theory  degrees-of-freedom  delay  delicious  democracy  dempster-shafer  density-estimation  dependent-types  depression  derivatives  description-logic  description-logic-programs  design  detection  development  diabetes  diagrams  differential-equations  differential-expression  differential-geometry  differential-privacy  differentiation  diffusion  digit-ratio  direct-effects  directors  dirichlet-process  dirichlet-processes  dirichlet_processes  discovery  discriminative-training  disease  distributed  distributed-computing  distributed-systems  divergence  dna  documents  doing-terrible-things-with-p-values  don-knuth  draft  driving  drosophila  drug-design  drug-discovery  drug-sensitivity  drugs  duncan-watts  dynamics  ecology  econometrics  economics  edit-distance  effects-systems  efficiency  elections  electronic-medical-records  
elicited-models  email  encryption  energy  entrepreneur  enumeration  environment  envy  epidemiology  epistemology  epitope  eric-schadt  eric-xing  estimation  ethnicity  eukaryotes  evaluation  evolution  ewas  exomes  experiment  experimental-design  experimental-science  experimentation  exponential-distributions  exponential-families  expression  expression-analysis  expression-from-sequence  factor-models  family  feature-selection  features  feline-behavior  figures  file-format  film  finance  financial  financial-advice  financial-engineering  flynn-effect  fmri  folksonomies  folksonomy  food  FOPL  foreignpolicy  formal-methods  forth  fountain-codes  frames  free-energy  friends  functional-rna  functionalprogramming  game-theory  games  gary-king  gattaca  gaussian-elimination  gaussian-processes  gaussianprocesses  gender  gene-expression  gene-regulation  generalized-linear-models  genetic-association-studies  genetic-testing  genetic-variation  genetics  genius  genomic-variation  genomics  genotype  geometry  geometry-of-data  geotagging  germany  gestalt-theory  gian-carlo-rota  gibbs-sampling  gillette  gina  gistic  google  government  gps  gradient-ascent  gradient-descent  graduate-school  grammar  grammars  graph  graph-algorithms  graph-layout  graph-theory  graphical-models  graphics  graphs  group-theory  growth  gwas  happiness  hardware  harr-chen  hashing  haskell  health  health-care  health-insurance  health-outcomes  heavy-tailed-distributions  herbrands-theorem  heredity  heritability  hidden-markov-models  histograms  historical-inference  history  hiv  hmms  horn-clauses  hospitals  hox-genes  hts-screening  huffman-codes  human  humor  hybrids  hypothesis-testing  icml  idea  identifiability  ieee  igem  image  image-analysis  images  independence  index  indian-buffet-process  inequality  infection  inference  informatics  information  information-markets  information-theory  infotheory  innovation  inside-joke  integration  
intellectual-property  intelligence  intelligent-design  internet  interpretation  interpreter  interval-algebra  intervals  invasive-species  invertible-bloom-lookup-tables  ips-reprogramming  iq  irods  iterative-methods  jason-rennie  javascript  jbd  jim-gray  jim-watson  jmlr  joe-hellerstein  john-guttag  john-hopcroft  journal  journal-article  joyal  jstor  judea-pearl  jun-liu  just-a-though  kernel-methods  knowledge  knowledge-base  knowledge-transmission  kullback-leibler-distance  lactase  lambda-calculus  language  larry-wasserman  lasso  latent-variables  law  layout  lda  learning  learning-theory  least-squares  legal  leslie-valiant  lhc  lifted-inference  limits  linear-algebra  linear-logic  linear-programming  linear-regression  lingpipe  linguistics  link  list  load-balancing  local-linear-embedding  localization  locus-reference-genomic  logic  logic-programming  logistic-regression  long-tail  loop-calculus  luc-devroye  machine-learning  machinelearning  machinlearning  macsyma  mallows-models  mammals  manimals  mapping  maps  marcel-brun  markets  markov-chain  markov-chains  markov-models  markov-random-fields  marriage  martingales  marvin-minsky  masonry  matching  mathematics  matrices  matrix-representations  mbta  mcmc  mcmc-sampling  measurement  media-lab  medication  medicine  memory  mendelian-disease  message-passing  metadata  metagenomics  methamphetamine  methylation  mice  michael-bernstein  michael-jordan  michael-mitzenmacher  micro-rna  microarray  microarray-analysis  microarrays  milk  mind-reading  minimum-linear-arrangement  minsky-moment  missing-values  mit  mixture-models  model-checking  model-selection  model-theory  modeling  monads  money  morris-counter  mortality  motifs  mouse  mrna  mucin  multilevel-modeling  multiple-testing  music  mutation  naive-bayes  nanopublications  nati-srebro  nature  nature-genetics  nature-publishing  nber  ncbi  ncrna  nejm  netflix  netlog  network  network-models  
networking  networks  neural-networks  neurobiology  neurology  neuronal-networks  neurons  neuroscience  nips  nlp  nmd  nnmf  nocoding-rna  noncoding-rna  nonparametric-methods  nonparametric-statistics  normalization  northern-blot  notation  nothing-new-under-the-sun  novartis  nudged  numbers  numeracy  numerical-methods  numerical-techniques  obscurely-referential  ocd  ode  odr  olap  oncology  online-algorithm  online-algorithms  online-optimization  ontologies  ontology  optimization  option-pricing  organization  oscillation  osdi  owl  p2p  pagerank  pain  paleogenomics  pandemic  paper  papers  paraccel  parallel  parallel-computing  parameters  parasites  parsing  partially-ordered-sets  partitions  partsregistry  patents  path-queries  pathogens  pathways  patients-like-me  patterns  pca  pdf  pebbling-game  peccoud  pegasos  perception  permanent  permutations  persi-diaconis  personal  personal-genomics  personalized-medicine  perturbation  pet-ideas  pets  pharma  pharmaceuticals  phenotype  phil-lord  philosophy  philosophy-of-science  photography  photoshop  phylogenetics  physics  pi-calculus  plant-biology  plos  plos-biology  plos-one  pnas  poisson-process  pol2  policy  political-science  politics  polling  polymorphictypes  polymorphism  pooling  population-effects  population-genetics  population-graphs  portfolio  positive-law  prediction  prediction-markets  prefix-matching  preprint  presentation  pricing  privacy  prms  probabilistic-methods  probabilistic-models  probabilistic-relational-models  probability  programming  programminglanguage  programminglanguages  project  proof-nets  proof-theory  proofs  protein  protein-structure  proteins  proteomics  provenance  psychiatry  psychology  psychometrics  public-health  publishing  pubmed  pun  put-call-parity  pyschiatry  pyschology  query  query-language  query-optimization  question  questions-of-fraud  r-programming  rabies  race  random-projections  random-walks  randomization  
randy-rettberg  rank-tests  ranking  rationality  ray-tracing  razors-and-blades  rdf  reading  reasoning  rediscovering-what-chl-already-knew  regression  regression-trees  regret  regulation  regulatory-networks  relational-algebra  relational-data  rendering  repressilator  reproducible-research  research  research-article  reverse-engineering  review  review-article  rewrite-rules  rewriting-systems  richard-fateman  risk  risk-aversion  rna  rna-seq  rnai  robert-harper  robert-lucas  routing  row-reduction  rudolf-jaenisch  saeed-tavazoie  safety  salil-vadhan  samping  sampling  samtools  sanger-center  sbol  scfgs  schadenfreude  science  search  search-engines  security  segmentation  selection  self-avoiding-walk  semantic-matching  semantics  semanticweb  sensor-networks  sequence-analysis  sequence-assembly  sequence-variants  sequencing  sequent-calculus  serotonin  set-functions  seuss  shakespeare  shaving  short-reads  siam  siggraph  sigmod  signal-processing  signaling  simulation  sketches  slavery  slc6a4  sleep  smoothing  snp  soccer  social  social-capital  social-networks  social-science  sociology  software  solexa  sparse-coding  sparse-regression  sparsity  spatial-data  spatial-reasoning  spectral-graph-theory  spending  sports  spreadsheet  sql  ssrn  staffing  state-space-models  statistical-mechanics  statistics  statnet  steins-method  stem-cells  stemcells  stochastic-gradient-descent  stochastic-processes  storage  strange  streaming  string-algorithms  string-processing  strings  structural-variation  structure  structured-data  stuart-russel  suffix-trees  support-vector-machines  supreme-court  susanne-lohmann  svm  swan  swine-flu  sybil-attacks  symbolic-integration  symbolic-methods  synthetic-biology  systems  systems-biology  tag-transmission  tagging  tags  taxation  taxes  teaching  technical-report  technology  temporal-constraint-networks  temporal-data  temporal-difference-learning  terence-speed  term-rewriting-system 
 terrorism  testing  testosterone  text  theoretical-biology  theory  thesis  tiling  time-series  timing-effects  to-re-read  to-read  to-think-through  to:read  tommi-jaakkola  topic-models  topology  toxins  toxoplasma-gondii  trading  traffic  transcription  transcriptional-regulation  transmission  transportation  transposable-elements  trees  triples  trivia  tropical-disease  tumor  type-systems  typetheory  typography  uai  ultimatum-game  uncertainty  unemployment  united-states  universal-hashing  university  unsupervised-learning  us-code  utility  vaccination  variable-selection  vaults  vector-graphics  via:?  via:alex-tabarrok  via:andrew-samwick  via:arolfe  via:arsyed  via:bryan  via:chl  via:creeder  via:csantos  via:cshalizi  via:cshalizi?  via:ded_maxim  via:domke  via:geopapa  via:in-the-pipeline  via:inkdroid  via:jar  via:johnsnavely  via:kjhealy  via:krugman  via:marcua  via:monkey-cage  via:probably-cshalizi  via:rdowell  via:shivak  via:sjcockell  via:vaguery  via:WanderingAengus  virginia-tech  virtual-machine  vision  visualization  vldb  vocabulary  von-neumann  voting  vultures  war  watson  web  wildlife  word-stress  wordnet  work  workflow  writing  xml  xquery  yahoo  yeast  z  zombies  zoubin-ghahramani 
