cshalizi + re:xv_for_mixing   32

[1202.4283] Fast rates in learning with dependent observations
"In this paper we tackle the problem of fast rates in time series forecasting from a statistical learning perspective. In a serie of papers (e.g. Meir 2000, Modha and Masry 1998, Alquier and Wintenberger 2012) it is shown that the main tools used in learning theory with iid observations can be extended to the prediction of time series. The main message of these papers is that, given a family of predictors, we are able to build a new predictor that predicts the series as well as the best predictor in the family, up to a remainder of order $1/sqrt{n}$. It is known that this rate cannot be improved in general. In this paper, we show that in the particular case of the least square loss, and under a strong assumption on the time series (phi-mixing) the remainder is actually of order $1/n$. Thus, the optimal rate for iid variables, see e.g. Tsybakov 2003, and individual sequences, see cite{lugosi} is, for the first time, achieved for uniformly mixing processes. We also show that our method is optimal for aggregating sparse linear combinations of predictors."

--- Assumes observations are in the interval [-B,B] and gets a bound which is O(B^3), and so useless for our purposes.
in_NB  learning_theory  mixing  ergodic_theory  re:your_favorite_dsge_sucks  re:XV_for_mixing  have_read 
february 2012 by cshalizi
[1111.1876] On the stability of bootstrap estimators
"It is shown that bootstrap approximations of an estimator which is based on a continuous operator from the set of Borel probability measures defined on a compact metric space into a complete separable metric space is stable in the sense of qualitative robustness. Support vector machines based on shifted loss functions are treated as special cases."
in_NB  statistics  bootstrap  stability_of_learning  re:XV_for_mixing 
november 2011 by cshalizi
A Bernstein type inequality and moderate deviations for weakly dependent sequences
"In this paper we present a Bernstein-type tail inequality for the maximum of partial sums of a weakly dependent sequence of random variables that is not necessarily bounded. The class considered includes geometrically and subgeometrically strongly mixing sequences. The result is then used to derive asymptotic moderate deviation results. Applications are given for classes of Markov chains, iterated Lipschitz models and functions of linear processes with absolutely regular innovations." Also: http://arxiv.org/abs/0902.0582
in_NB  to_read  re:XV_for_mixing  re:your_favorite_dsge_sucks  concentration_of_measure  deviation_bounds  mixing  ergodic_theory  stochastic_processes  moderate_deviations 
november 2011 by cshalizi
[1110.2529] The Generalization Ability of Online Algorithms for Dependent Data
"We study the generalization performance of arbitrary online learning algorithms trained on samples coming from a dependent source of data. We show that the generalization error of any stable online algorithm concentrates around its regret--an easily computable statistic of the online performance of the algorithm--when the underlying ergodic process is $beta$- or $phi$-mixing. We show high probability error bounds assuming the loss function is convex, and we also establish sharp convergence rates and deviation bounds for strongly convex losses and several linear prediction problems such as linear and logistic regression, least-squares SVM, and boosting on dependent data. In addition, our results have straightforward applications to stochastic optimization with dependent data, and our analysis requires only martingale convergence arguments; we need not rely on more powerful statistical tools such as empirical process theory."
in_NB  learning_theory  individual_sequence_prediction  ergodic_theory  mixing  re:growing_ensemble_project  re:XV_for_mixing  stability_of_learning  concentration_of_measure  have_read  re:your_favorite_dsge_sucks 
october 2011 by cshalizi
Cross-Validation and Mean-Square Stability
It's a little boggling that they don't cite any of the modern (2000--) work on theoretical properties of CV, but oh well...
cross-validation  learning_theory  stability_of_learning  statistics  re:your_favorite_dsge_sucks  re:XV_for_mixing  re:XV_for_networks  to_read  via:nikete 
march 2011 by cshalizi
Adams, Nobel: Uniform convergence of Vapnik–Chervonenkis classes under ergodic sampling
Oooh: "We show that if X is a complete separable metric space and C is a countable family of Borel subsets of with finite VC dimension, then, for every stationary ergodic process with values in X, the relative frequencies of sets c \in C converge uniformly to their limiting probabilities. Beyond ergodicity, no assumptions are imposed on the sampling process, and no regularity conditions are imposed on the elements of C. The result extends existing work of Vapnik and Chervonenkis, among others, who have studied uniform convergence for i.i.d. and strongly mixing processes. Our method of proof is new and direct: it does not rely on symmetrization techniques, probability inequalities or mixing conditions. The uniform convergence of relative frequencies for VC-major and VC-graph classes of functions under ergodic sampling is established as a corollary of the basic result for sets."  No rates, but very nice.
ergodic_theory  learning_theory  stochastic_processes  vc-dimension  have_read  re:XV_for_mixing  re:your_favorite_dsge_sucks 
july 2010 by cshalizi
IEEE Xplore - On the generalization ability of on-line learning algorithms
"how to extract a hypothesis with small risk from the ensemble of hypotheses generated by an arbitrary on-line learning algorithm run on [IID data]. ... a simple large deviation argument [proves] tight data-dependent bounds for the risk of this hypothesis in terms of an easily computable statistic Mn associated with the on-line performance of the ensemble. Via sharp pointwise bounds on Mn, we then obtain risk tail bounds for kernel perceptron algorithms in terms of the spectrum of the empirical kernel matrix. ... A distinctive feature of our approach is that the key tools for our analysis come from the model of prediction of individual sequences; i.e., a model making no probabilistic assumptions on the source generating the data. In fact, these tools turn out to be so powerful that we only need very elementary statistical facts to obtain our final risk bounds." Bounced off this 2004; try again.
have_read  learning_theory  large_deviations  online_learning  individual_sequence_prediction  via:djm1107  re:your_favorite_dsge_sucks  re:XV_for_mixing  ensemble_methods  low-regret_learning 
july 2010 by cshalizi
Boente, Fraiman: Robust Nonparametric Regression Estimation for Dependent Observations
"Robust nonparametric estimators for regression and autoregression are proposed for $\varphi$- and $\alpha$-mixing processes. Two families of $M$-type robust equivariant estimators are considered: (i) estimators based on kernel methods and (ii) estimators based on $k$-nearest neighbor kernel methods. Strong consistency of both families is proved under mild conditions. For the first class the result is true under no assumptions whatsoever on the distribution of the observations."
statistical_inference_for_stochastic_processes  regression  estimation  re:XV_for_mixing 
march 2010 by cshalizi
Arlot, Blanchard, Roquain: Some nonasymptotic results on resampling in high dimension, I: Confidence regions
"We study generalized bootstrap confidence regions for the mean of a random vector whose coordinates have an unknown dependency structure. The random vector is supposed to be either Gaussian or to have a symmetric and bounded distribution. The dimensionality of the vector can possibly be much larger than the number of observations and we focus on a nonasymptotic control of the confidence level, following ideas inspired by recent results in learning theory. We consider two approaches, the first based on a concentration principle (valid for a large class of resampling weights) and the second on a resampled quantile, specifically using Rademacher weights. Several intermediate results established in the approach based on concentration principles are of interest in their own right. We also discuss the question of accuracy when using Monte Carlo approximations of the resampled quantities."
statistics  resampling  bootstrap  cross-validation  confidence_sets  to_read  re:XV_for_mixing  concentration_of_measure  learning_theory 
december 2009 by cshalizi
[0912.4480] Consistency of the Maximum Likelihood Estimator for general hidden Markov models
"a parametrized family of general hidden Markov models, where both the observed and unobserved components take values in a complete separable metric space. We prove that the maximum likelihood estimator (MLE) of the parameter is strongly consistent under a rather minimal set of assumptions. As special cases of our main result, we obtain consistency in a large class of nonlinear state space models, as well as general results on linear Gaussian state space models and finite state models. A novel aspect of our approach is an information-theoretic technique for proving identifiability, which does not require an explicit representation for the relative entropy rate. Our method of proof could therefore form a foundation for the investigation of MLE consistency in more general dependent and non-Markovian time series. Also of independent interest is a general concentration inequality for $V$-uniformly ergodic Markov chains."
markov_models  statistics  estimation  concentration_of_measure  information_theory  identifiability  re:AoS_project  re:XV_for_mixing  have_read 
december 2009 by cshalizi
Consistency and Localizability
"We show that all consistent learning methods---that is, that asymptotically achieve the lowest possible expected loss for any distribution on (X,Y)---are necessarily localizable, by which we mean that they do not significantly change their response at a particular point when we show them only the part of the training set that is close to that point. This is true in particular for methods that appear to be defined in a non-local manner, such as support vector machines in classification and least-squares estimators in regression. Aside from showing that consistency implies a specific form of localizability, we also show that consistency is logically equivalent to the combination of two properties: (1) a form of localizability, and (2) that the method's global mean (over the entire X distribution) correctly estimates the true mean. Consistency can therefore be seen as comprised of two aspects, one local and one global." Feels like it should connect to cross-validation.
statistics  machine_learning  learning_theory  consistency  to_read  re:XV_for_networks  re:XV_for_mixing 
december 2009 by cshalizi
Sensitivity Analysis of k-fold Cross-Validation in Prediction Error Estimation
Apparently IEEE makes this available solely to tease me, since, while we have a fully paid-up electronic subscription, I can't get access.
machine_learning  statistics  cross-validation  to_read  re:XV_for_mixing  re:XV_for_networks 
november 2009 by cshalizi
A Hoeffding-Type Inequality for Ergodic Time Series (Tang, 2007)
"In this paper, a Hoeffding-type inequality is presented for a class of ergodic time series. The inequality is then used to construct uniformly exponentially consistent tests, which are useful tools for studying Bayesian consistency." Preprint at http://www4.stat.ncsu.edu/~sghosal/papers/Tang.pdf, haven't compared.
ergodic_theory  deviation_bounds  hoeffdings_inequality  time_series  have_read  re:XV_for_mixing 
september 2009 by cshalizi
[0906.0791] Instability statistics and mixing rates
"We claim that looking at probability distributions of emph{finite time} largest Lyapunov exponents, and more precisely studying their large deviation properties, yields an extremely powerful technique to get quantitative estimates of polynomial decay rates of time correlations and Poincar'e recurrences in the -quite delicate- case of dynamical systems with weak chaotic properties."
dynamical_systems  large_deviations  poincare_recurrence  mixing  ergodic_theory  in_NB  to_read  re:XV_for_mixing 
june 2009 by cshalizi
On Uniform Deviations of General Empirical Risks with Unboundedness, Dependence, and High Dimensionality
"The statistical learning theory of risk minimization depends heavily on probability bounds for uniform deviations of the empirical risks. Classical probability bounds using Hoeffding's inequality cannot accommodate more general situations with unbounded loss and dependent data. The current paper introduces an inequality that extends Hoeffding's inequality to handle these more general situations. We will apply this inequality to provide probability bounds for uniform deviations in a very general framework, which can involve discrete decision rules, unbounded loss, and a dependence structure that can be more general than either martingale or strong mixing". --- This is very sweet. The dependence measure they use is, I think, the same as the "gamma' dependence coefficient of Dedecker et al., from their recent book _Weak Dependence_, though apparently independent here.
statistics  learning_theory  deviation_bounds  weak_dependence  jiang.wenxin  via:shivak  re:your_favorite_dsge_sucks  have_read  re:XV_for_mixing 
may 2009 by cshalizi

Copy this bookmark:



description:


tags: