cshalizi + re:aos_project   70

[1204.2477] A Simple Explanation of A Spectral Algorithm for Learning Hidden Markov Models
"A simple linear algebraic explanation of the algorithm in "A Spectral Algorithm for Learning Hidden Markov Models" (COLT 2009). Most of the content is in Figure 2; the text just makes everything precise in four nearly-trivial claims."
to:NB  to_read  statistics  markov_models  re:AoS_project  spectral_methods 
6 weeks ago by cshalizi
[1203.6130] Spectral dimensionality reduction for HMMs
"Hidden Markov Models (HMMs) can be accurately approximated using co-occurrence frequencies of pairs and triples of observations by using a fast spectral method in contrast to the usual slow methods like EM or Gibbs sampling. We provide a new spectral method which significantly reduces the number of model parameters that need to be estimated, and generates a sample complexity that does not depend on the size of the observation vocabulary. We present an elementary proof giving bounds on the relative accuracy of probability estimates from our model. (Correlaries show our bounds can be weakened to provide either L1 bounds or KL bounds which provide easier direct comparisons to previous work.) Our theorem uses conditions that are checkable from the data, instead of putting conditions on the unobservable Markov transition matrix."
to:NB  to_read  markov_models  statistics  machine_learning  dimension_reduction  re:AoS_project  spectral_clustering 
8 weeks ago by cshalizi
Henze : A Multivariate Two-Sample Test Based on the Number of Nearest Neighbor Type Coincidences
"For independent $d$-variate random samples $X_1, cdots, X_{n_1}$ i.i.d. $f(x), Y_1, cdots, Y_{n_2}$ i.i.d. $g(x)$, where the densities $f$ and $g$ are assumed to be continuous a.e., consider the number $T$ of all $k$ nearest neighbor comparisons in which observations and their neighbors belong to the same sample. We show that, if $f = g$ a.e., the limiting (normal) distribution of $T$, as $min(n_1, n_2) rightarrow infty, n_1/(n_1 + n_2) rightarrow tau, 0 < tau < 1$, does not depend on $f$. An omnibus procedure for testing the hypothesis $H_0: f = g$ a.e. is obtained by rejecting $H_0$ for large values of $T$. The result applies to a general distance (generated by a norm on $mathbb{R}^d$) for determining nearest neighbors, and it generalizes to the multisample situation."
to:NB  to_read  statistics  hypothesis_testing  two-sample_tests  re:AoS_project 
february 2012 by cshalizi
[1202.1523] Information Forests
"We describe Information Forests, an approach to classification that generalizes Random Forests by replacing the splitting criterion of non-leaf nodes from a discriminative one -- based on the entropy of the label distribution -- to a generative one -- based on maximizing the information divergence between the class-conditional distributions in the resulting partitions. The basic idea consists of deferring classification until a measure of "classification confidence" is sufficiently high, and instead breaking down the data so as to maximize this measure. In an alternative interpretation, Information Forests attempt to partition the data into subsets that are "as informative as possible" for the purpose of the task, which is to classify the data. Classification confidence, or informative content of the subsets, is quantified by the Information Divergence. Our approach relates to active learning, semi-supervised learning, mixed generative/discriminative learning."

After reading: meh.
have_read  decision_trees  information_theory  classifiers  machine_learning  to_teach:data-mining  re:AoS_project 
february 2012 by cshalizi
[1201.2334] Universal Estimation of Directed Information
"We propose four approaches to estimating the directed information rate between a pair of jointly stationary ergodic processes with the help of universal probability assignments. The four approaches yield estimators with different merits such as nonnegativity and boundedness. We establish consistency of these estimators in various senses and derive near-optimal rates of convergence in the minimax sense under mild conditions. The estimators carry over directly to estimating other information measures of stationary ergodic processes, such as entropy rate and mutual information rate, and provide alternatives to classical approaches in the existing literature. Guided by the theoretical results, we use context tree weighting as the vehicle for the implementations of the proposed estimators. Experiments on synthetic and real data are presented, demonstrating the potential of the proposed schemes in practice and the efficacy of directed information estimation as a tool for detecting and measuring causality and delay."
in_NB  to_read  information_theory  entropy_estimation  directed_information  stochastic_processes  nonparametrics  statistics  re:AoS_project 
january 2012 by cshalizi
[1201.2056] Adaptive Context Tree Weighting
"We describe an adaptive context tree weighting (ACTW) algorithm, as an extension to the standard context tree weighting (CTW) algorithm. Unlike the standard CTW algorithm, which weights all observations equally regardless of the depth, ACTW gives increasing weight to more recent observations, aiming to improve performance in cases where the input sequence is from a non-stationary distribution. Data compression results show ACTW variants improving over CTW on merged files from standard compression benchmark tests while never being significantly worse on any individual file."
to:NB  information_theory  non-stationarity  markov_models  stochastic_processes  re:AoS_project 
january 2012 by cshalizi
[1112.3257] Exact Computation of Kullback-Leibler Distance for Hidden Markov Trees and Models
"We suggest new recursive formulas to compute the exact value of the Kullback-Leibler distance (KLD) between two general Hidden Markov Trees (HMTs). For homogeneous HMTs with regular topology, such as homogeneous Hidden Markov Models (HMMs), we obtain a closed-form expression for the KLD when no evidence is given. We generalize our recursive formulas to the case of HMMs conditioned on the observable variables. Our proposed formulas are validated through several numerical examples in which we compare the exact KLD value with Monte Carlo estimations."
to:NB  to_read  re:AoS_project  markov_models  stochastic_processes  information_theory 
december 2011 by cshalizi
[1111.3182] Context Tree Switching
"This paper describes the Context Tree Switching technique, a modification of Context Tree Weighting for the prediction of binary, stationary, n-Markov sources. By modifying Context Tree Weighting's recursive weighting scheme, it is possible to mix over a strictly larger class of models without increasing the asymptotic time or space complexity of the original algorithm. We prove that this generalization preserves the desirable theoretical properties of Context Tree Weighting on stationary n-Markov sources, and show empirically that this new technique leads to consistent improvements over Context Tree Weighting as measured on the Calgary Corpus."
to:NB  re:AoS_project  markov_models 
november 2011 by cshalizi
[1111.1177] Partially observed Markov random fields are variable neighborhood random fields
"The present paper has two goals. First to present a natural example of a new class of random fields which are the variable neighborhood random fields. The example we consider is a partially observed nearest neighbor binary Markov random field. The second goal is to establish sufficient conditions ensuring that the variable neighborhoods are almost surely finite. We discuss the relationship between the almost sure finiteness of the interaction neighborhoods and the presence/absence of phase transition of the underlying Markov random field. In the case where the underlying random field has no phase transition we show that the finiteness of neighborhoods depends on a specific relation between the noise level and the Dobrushin coefficient. The case in which there is phase transition is addressed in the frame of the ferromagnetic Ising model. We prove that the existence of infinite interaction neighborhoods depends on the phase. The first result has a probabilistic proof using a Kalikow type decomposition of a Glauber dynamics associated to the field. The second result is proved using cluster expansion."
to:NB  to_read  markov_models  random_fields  phase_transitions  re:AoS_project  stochastic_processes 
november 2011 by cshalizi
[1110.6530] Attractive regular stochastic chains: perfect simulation and phase transition
"We prove that uniqueness of the stationary chain compatible with an attractive regular probability kernel is equivalent to the following two assertions for this chain: (1) it is a finitary coding of an i.i.d. process with discrete state space, (2) the concentration of measure holds at exponential rate. We show in particular that if a stationary chain is uniquely defined by a kernel which is continuous and attractive, then this chain can be sampled using a coupling-from-the-past algorithm. For the original Bramson-Kalikow model we further prove that there exists a unique compatible chain if and only if the chain is a finitary coding of a finite alphabet i.i.d. process. Finally, we obtain some partial results on conditions for phase transition for general chains of infinite order."
to:NB  to_read  stochastic_processes  re:AoS_project 
november 2011 by cshalizi
[1108.3984] Process Dimension of Classical and Non-Commutative Processes
"We treat observable operator models (OOM) and their non-commutative generalisation, which we call NC-OOMs. A natural characteristic of a stochastic process in the context of classical OOM theory is the process dimension. We investigate its properties within the more general formulation, which allows to consider process dimension as a measure of complexity of non-commutative processes: We prove lower semi-continuity, and derive an ergodic decomposition formula. Further, we obtain results on the close relationship between the canonical OOM and the concept of causal states which underlies the definition of statistical complexity. In particular, the topological statistical complexity, i.e. the logarithm of the number of causal states, turns out to be an upper bound to the logarithm of process dimension."
complexity_measures  predictive_state_representations  observable_operator_models  ay.nihat  lohr.wolfgang  kith_and_kin  to_read  re:AoS_project  to:NB 
august 2011 by cshalizi
[1107.4353] Upper Bounds for Markov Approximations of Ergodic Processes
"Chains of infinite order are generalizations of Markov chains that constitute a flexible class of models in the general theory of stochastic processes. These processes can be naturally studied using approximating Markov chains. Here we derive new upper bounds for the $bar{d}$-distance and the K"ullback-Leibler divergence between chains of infinite order and their canonical $k$-step Markov approximations. In contrast to the bounds available in the literature our results apply to chains of infinite order compatible with general classes of probability kernels. In particular, we allow kernels with discontinuities and null transition probabilities.""  (Pedantry: Pretty sure Kullback did not spell his name with an umlaut!)
markov_models  stochastic_processes  re:AoS_project  to_read  in_NB  approximation  re:your_favorite_dsge_sucks 
july 2011 by cshalizi
[1011.2624] Clustering using Unsupervised Binary Trees: CUBT
"We introduce a new clustering method based on unsupervised binary trees. It is a three stages procedure, which performs on a first stage recursive binary splits reducing the heterogeneity of the data within the new subsamples. On the second stage (pruning) adjacent nodes are considered to be aggregated. Finally, on the third stage (joining) similar clusters are joined even if they do not descend from the same node. Consistency results are obtained and the procedure is tested on simulated and real data sets"
clustering  re:AoS_project 
december 2010 by cshalizi
Reduced-Rank Hidden Markov Models
"Hsu et al.(2009) recently proposed an efficient, accurate spectral learning algorithm for Hidden Markov Models (HMMs). In this paper we relax their assumptions and prove a tighter finite-sample error bound for the case of Reduced-Rank HMMs, i.e., HMMs with low-rank transition matrices. Since rank-k RR-HMMs are a larger class of models than k-state HMMs while being equally efficient to work with, this relaxation greatly increases the learning algorithm's scope. In addition, we generalize the algorithm and bounds to models where multiple observations are needed to disambiguate state, and to models that emit multivariate real-valued observations. Finally we prove consistency for learning Predictive State Representations, an even larger class of models. Experiments on synthetic data and a toy video, as well as on difficult robot vision data, yield accurate models that compare favorably with alternatives in simulation quality and prediction accuracy."
markov_models  statistical_inference_for_stochastic_processes  re:AoS_project  to_read 
august 2010 by cshalizi
IEEE Xplore - On Rate of Convergence of Statistical Estimation of Stationary Ergodic Processes
Growing-order Markov approximations to the non-Markov ergodic process.  Assumes all predictive probabilities are > 0, and a sort of continuity property for the predictive distribution.  This rules out many interesting processes (e.g., even process) --- could proofs be modified?
statistical_inference_for_stochastic_processes  ergodic_theory  re:AoS_project  have_read  csiszar.imre 
july 2010 by cshalizi
[1007.2986] Variable length Markov chains and dynamical sources
???? How is a VLMC not a source by definition? I can't tell what on earth they're doing from their abstract...
markov_models  re:AoS_project  to_read  to_be_shot_after_a_fair_trial  stochastic_processes 
july 2010 by cshalizi
[1005.5459] Perfect simulation for stochastic chains of infinite memory: relaxing the continuity assumption
"This paper is composed of two main results concerning chains of infinite order which are not necessarily continuous. The first one is a decomposition of the transition probability kernel as a countable mixture of unbounded probabilistic context trees. This decomposition is used to design a simulation algorithm which works as a combination of the algorithms given by Comets et al. (2002) and Gallo (2009). The second main result gives sufficient conditions on the kernel for this algorithm to stop after an almost surely finite number of steps. Direct consequences of this last result are existence and uniqueness of the stationary chain compatible with the kernel."
re:AoS_project  synchronizing_words  markov_models  to_read 
june 2010 by cshalizi
Ehm, Kornmeier, Heinrich: Multiple testing along a tree
"Suitable sequentially rejective multiple test procedures allow to “zoom in" on clusters of relevant variables in high-dimensional regression (Meinshausen [7]), or on regions of interest in some search space (Heinrich et al. [3]; Meinshausen et al. [8]). As a common framework for these schemes we propose to consider multiple testing along a tree of hypotheses together with a “keep rejecting until first acceptance" rule. Particular topics addressed in this note are control of the familywise error, and some variants and basic properties of the procedure."
multiple_testing  hypothesis_testing  model_selection  re:AoS_project  to_read 
may 2010 by cshalizi
[0901.4063] A New Approach to Equations with Memory
"In this work, we present a novel approach to the mathematical analysis of equations with memory based on the notion of a state, namely, the initial configuration of the system which can be unambiguously determined by the knowledge of the future dynamics. "
dynamical_systems  state-space_models  to_read  re:AoS_project 
may 2010 by cshalizi
Perfect Simulation of Infinite Range Gibbs Measures and Coupling with Their Finite Range Approximations
"address the questions of perfectly sampling a Gibbs measure with infinite range interactions and of perfectly sampling the measure together with its finite range approximations. We solve these questions by introducing a perfect simulation algorithm for the measure and for the coupled measures. The algorithm works for general Gibbsian interaction under requirements on the tails of the interaction. As a consequence we obtain an upper bound for the error we make when sampling from a finite range approximation instead of the true infinite range measure."
statistical_mechanics  monte_carlo  gibbs_distributions  interacting_particle_systems  markov_models  chains_with_complete_connections  in_NB  re:AoS_project  coupling  convergence_of_stochastic_processes 
march 2010 by cshalizi
[0912.4883] On Finding Predictors for Arbitrary Families of Processes
" A sequence $x_1,...,x_n,...$ of discrete-valued observations is generated according to some unknown [measure] $\mu$. After observing each outcome, ... give the conditional probabilities of the next observation. ... $\mu$ [is in] an arbitrary but known class $C$ of stochastic process measures. We [want] predictors ... whose conditional probabilities converge (in some sense) to the [true] conditional probabilities if any $\mu\in C$ is chosen to generate the sequence. ... [C]haracteriz[e] the families $C$ for which such predictors exist ... a specific and simple form in which to look for a solution. ... if any predictor works, then there exists a Bayesian predictor, whose prior is discrete, and which works too. .... sufficient and necessary conditions for the existence of a predictor, in terms of topological characterizations of the family $C$, as well as in terms of local behaviour of the measures in $C$, which in some cases lead to procedures for constructing such predictors."
prediction  universal_prediction  stochastic_processes  statistical_inference_for_stochastic_processes  statistics  re:AoS_project 
december 2009 by cshalizi
[0912.4480] Consistency of the Maximum Likelihood Estimator for general hidden Markov models
"a parametrized family of general hidden Markov models, where both the observed and unobserved components take values in a complete separable metric space. We prove that the maximum likelihood estimator (MLE) of the parameter is strongly consistent under a rather minimal set of assumptions. As special cases of our main result, we obtain consistency in a large class of nonlinear state space models, as well as general results on linear Gaussian state space models and finite state models. A novel aspect of our approach is an information-theoretic technique for proving identifiability, which does not require an explicit representation for the relative entropy rate. Our method of proof could therefore form a foundation for the investigation of MLE consistency in more general dependent and non-Markovian time series. Also of independent interest is a general concentration inequality for $V$-uniformly ergodic Markov chains."
markov_models  statistics  estimation  concentration_of_measure  information_theory  identifiability  re:AoS_project  re:XV_for_mixing  have_read 
december 2009 by cshalizi
Dependence with Complete Connections and its Applications - Cambridge University Press
"Dependence with complete connections is a more general type of stochastic process than the well-known Markovian dependence, accounting for a complete history of a stochastic evolution. This book is an authoritative survey of knowledge of the subject, dealing with the basic theoretical understanding and also with applications. These arise in a variety of situations as diverse as stochastic models of learning, branching processes in random environments, continued fractions and dynamical systems. Thus the book will appeal to mathematicians working in probability theory, ergodic theory and number theory, as well as applied mathematicians, engineers, biologists and social scientists interested in applications of stochastic methods." The "epsilon-machine" models I work with are an example of chains with complete connections.
stochastic_processes  ergodic_theory  markov_models  books:noted  re:AoS_project 
july 2009 by cshalizi
FSA utilities: A toolbox to manipulate finite-state automata
"This paper describes the FSA Utilities toolbox: a collection of utilities to manipulate finite-state automata and finite-state transducers. Manipulations include determinization (both for finite-state acceptors and finite-state transducers), minimization, composition, complementation, intersection, Kleene closure, etc. Furthermore, various visualization tools are available to browse finite-state automata. The toolbox is implemented in SICStus Prolog." Where's something comparable in a sane language? (N.B., must handle transducers.)
automata_theory  programming  re:AoS_project 
june 2009 by cshalizi
[0905.3369] Learning Nonlinear Dynamic Models
... from a quick scan (the abstract is completely uninformative), this seems to be yet another near-reinvention of Knight's "prediction process". To be read, and to act as a goad for me to finish CSSR II w/ KLK. (I confess I am somewhat boggled at the idea that all HMMs are linear.)

Update: After a careful read, this really is just a rediscovery of predictive representations, with the trick of using regression to learn the state-transition and emission functions. On the one hand, I feel kind of burned by seeing them calling this "entirely new" (never mind me or my teachers, Littman, Sutton & Singh should be very upset; so should Knight if he were still alive). On the other hand, they got it _done_, which is a very real virtue.

Also: You need to put **** error bars on your average performance plots. (Yes, I realize I'm nit-picking because I'm jealous.)
prediction  statistics  machine_learning  time_series  markov_models  state-space_models  via:arthegall  re:AoS_project  langford.john  zhang.tong  salakhutdinov.ruslan  have_read 
june 2009 by cshalizi
On error-free filtering of finite-state singular processes under dependent distortions - Prelov and van der Meulen
When can the state of one process be recovered without error from another? (Use of infinite time limit here is not quite relevant to my immediate needs so must see how to modify proof.)
prelov.v._v.  van_der_meulen.e._c.  filtering  state_estimation  information_theory  re:AoS_project 
february 2008 by cshalizi
[0708.3412] Observability and nonlinear filtering
"This paper develops a connection between the asymptotic stability of nonlinear filters and a notion of observability. We consider a general class of hidden Markov models in continuous time with compact signal state space, and call such a model observable if no two initial measures of the signal process give rise to the same law of the observation process. We demonstrate that observability implies stability of the filter, i.e., the filtered estimates become insensitive to the initial measure at large times. For the special case where the signal is a finite-state Markov process and the observations are of the white noise type, a complete (necessary and sufficient) characterization of filter stability is obtained in terms of a slightly weaker detectability condition. In addition to observability, the role of controllability in filter stability is explored. Finally, the results are partially extended to non-compact signal state spaces."
in_NB  markov_models  filtering  re:AoS_project  van_handel.ramon  heard_the_talk 
november 2007 by cshalizi

related tags

approximation  automata_theory  ay.nihat  biochemical_networks  books:noted  books:recommended  central_limit_theorem  chains_with_complete_connections  classifiers  clustering  complexity_measures  concentration_of_measure  convergence_of_stochastic_processes  coupling  coveted  crespi.valentino  csiszar.imre  cybenko.george  decision_trees  dimension_reduction  directed_information  discretization  dynamical_systems  ensemble_methods  entropy_estimation  ergodic_theory  estimation  exponential_convergence_of_empirical_probabilities  exponential_families  filtering  galves.antonio  gene_expression_data_analysis  gibbs_distributions  graphical_models  graph_theory  have_read  heard_the_talk  heavy_tails  hsu.daniel  hypothesis_testing  identifiability  inference_to_latent_objects  information_criteria  information_theory  interacting_particle_systems  in_NB  jaeger.herbert  kakade.sham  kitchens.bruce  kith_and_kin  langford.john  learning_theory  leonardi.florencia  linguistics  lohr.wolfgang  machine_learning  markov_models  mixing  model_selection  mohri.mehryar  monte_carlo  morvai.gusztav  multiple_testing  non-stationarity  nonparametrics  observable_operator_models  pereira.fernando  phase_transitions  prediction  predictive_state_representations  prelov.v._v.  programming  random_fields  random_walks  re:almost_none  re:AoS_project  re:stacs  re:XV_for_mixing  re:your_favorite_dsge_sucks  recurrence_times  riley.michael  salakhutdinov.ruslan  sofic_processes  song.le  spectral_clustering  spectral_methods  stability_of_learning  state-space_models  state_estimation  statistical_inference_for_stochastic_processes  statistical_mechanics  statistics  stochastic_processes  symbolic_dynamics  synchronizing_words  time_series  to:NB  to_be_shot_after_a_fair_trial  to_read  to_teach:data-mining  transducers  tuncel.selim  two-sample_tests  universal_prediction  van_der_meulen.e._c.  van_handel.ramon  variable-length_markov_models  via:arthegall  via:leo  via:vaguery  waiting_times  weiss.benjamin  zhang.tong 

Copy this bookmark:



description:


tags: