cshalizi + re:xv_for_mixing 32
[1202.4283] Fast rates in learning with dependent observations
february 2012 by cshalizi
"In this paper we tackle the problem of fast rates in time series forecasting from a statistical learning perspective. In a serie of papers (e.g. Meir 2000, Modha and Masry 1998, Alquier and Wintenberger 2012) it is shown that the main tools used in learning theory with iid observations can be extended to the prediction of time series. The main message of these papers is that, given a family of predictors, we are able to build a new predictor that predicts the series as well as the best predictor in the family, up to a remainder of order $1/sqrt{n}$. It is known that this rate cannot be improved in general. In this paper, we show that in the particular case of the least square loss, and under a strong assumption on the time series (phi-mixing) the remainder is actually of order $1/n$. Thus, the optimal rate for iid variables, see e.g. Tsybakov 2003, and individual sequences, see cite{lugosi} is, for the first time, achieved for uniformly mixing processes. We also show that our method is optimal for aggregating sparse linear combinations of predictors."
--- Assumes observations are in the interval [-B,B] and gets a bound which is O(B^3), and so useless for our purposes.
in_NB
learning_theory
mixing
ergodic_theory
re:your_favorite_dsge_sucks
re:XV_for_mixing
have_read
--- Assumes observations are in the interval [-B,B] and gets a bound which is O(B^3), and so useless for our purposes.
february 2012 by cshalizi
[1111.1876] On the stability of bootstrap estimators
november 2011 by cshalizi
"It is shown that bootstrap approximations of an estimator which is based on a continuous operator from the set of Borel probability measures defined on a compact metric space into a complete separable metric space is stable in the sense of qualitative robustness. Support vector machines based on shifted loss functions are treated as special cases."
in_NB
statistics
bootstrap
stability_of_learning
re:XV_for_mixing
november 2011 by cshalizi
A Bernstein type inequality and moderate deviations for weakly dependent sequences
november 2011 by cshalizi
"In this paper we present a Bernstein-type tail inequality for the maximum of partial sums of a weakly dependent sequence of random variables that is not necessarily bounded. The class considered includes geometrically and subgeometrically strongly mixing sequences. The result is then used to derive asymptotic moderate deviation results. Applications are given for classes of Markov chains, iterated Lipschitz models and functions of linear processes with absolutely regular innovations." Also: http://arxiv.org/abs/0902.0582
in_NB
to_read
re:XV_for_mixing
re:your_favorite_dsge_sucks
concentration_of_measure
deviation_bounds
mixing
ergodic_theory
stochastic_processes
moderate_deviations
november 2011 by cshalizi
[1110.2529] The Generalization Ability of Online Algorithms for Dependent Data
october 2011 by cshalizi
"We study the generalization performance of arbitrary online learning algorithms trained on samples coming from a dependent source of data. We show that the generalization error of any stable online algorithm concentrates around its regret--an easily computable statistic of the online performance of the algorithm--when the underlying ergodic process is $beta$- or $phi$-mixing. We show high probability error bounds assuming the loss function is convex, and we also establish sharp convergence rates and deviation bounds for strongly convex losses and several linear prediction problems such as linear and logistic regression, least-squares SVM, and boosting on dependent data. In addition, our results have straightforward applications to stochastic optimization with dependent data, and our analysis requires only martingale convergence arguments; we need not rely on more powerful statistical tools such as empirical process theory."
in_NB
learning_theory
individual_sequence_prediction
ergodic_theory
mixing
re:growing_ensemble_project
re:XV_for_mixing
stability_of_learning
concentration_of_measure
have_read
re:your_favorite_dsge_sucks
october 2011 by cshalizi
Cross-Validation and Mean-Square Stability
march 2011 by cshalizi
It's a little boggling that they don't cite any of the modern (2000--) work on theoretical properties of CV, but oh well...
cross-validation
learning_theory
stability_of_learning
statistics
re:your_favorite_dsge_sucks
re:XV_for_mixing
re:XV_for_networks
to_read
via:nikete
march 2011 by cshalizi
Adams, Nobel: Uniform convergence of Vapnik–Chervonenkis classes under ergodic sampling
july 2010 by cshalizi
Oooh: "We show that if X is a complete separable metric space and C is a countable family of Borel subsets of with finite VC dimension, then, for every stationary ergodic process with values in X, the relative frequencies of sets c \in C converge uniformly to their limiting probabilities. Beyond ergodicity, no assumptions are imposed on the sampling process, and no regularity conditions are imposed on the elements of C. The result extends existing work of Vapnik and Chervonenkis, among others, who have studied uniform convergence for i.i.d. and strongly mixing processes. Our method of proof is new and direct: it does not rely on symmetrization techniques, probability inequalities or mixing conditions. The uniform convergence of relative frequencies for VC-major and VC-graph classes of functions under ergodic sampling is established as a corollary of the basic result for sets." No rates, but very nice.
ergodic_theory
learning_theory
stochastic_processes
vc-dimension
have_read
re:XV_for_mixing
re:your_favorite_dsge_sucks
july 2010 by cshalizi
IEEE Xplore - On the generalization ability of on-line learning algorithms
july 2010 by cshalizi
"how to extract a hypothesis with small risk from the ensemble of hypotheses generated by an arbitrary on-line learning algorithm run on [IID data]. ... a simple large deviation argument [proves] tight data-dependent bounds for the risk of this hypothesis in terms of an easily computable statistic Mn associated with the on-line performance of the ensemble. Via sharp pointwise bounds on Mn, we then obtain risk tail bounds for kernel perceptron algorithms in terms of the spectrum of the empirical kernel matrix. ... A distinctive feature of our approach is that the key tools for our analysis come from the model of prediction of individual sequences; i.e., a model making no probabilistic assumptions on the source generating the data. In fact, these tools turn out to be so powerful that we only need very elementary statistical facts to obtain our final risk bounds." Bounced off this 2004; try again.
have_read
learning_theory
large_deviations
online_learning
individual_sequence_prediction
via:djm1107
re:your_favorite_dsge_sucks
re:XV_for_mixing
ensemble_methods
low-regret_learning
july 2010 by cshalizi
Boente, Fraiman: Robust Nonparametric Regression Estimation for Dependent Observations
march 2010 by cshalizi
"Robust nonparametric estimators for regression and autoregression are proposed for $\varphi$- and $\alpha$-mixing processes. Two families of $M$-type robust equivariant estimators are considered: (i) estimators based on kernel methods and (ii) estimators based on $k$-nearest neighbor kernel methods. Strong consistency of both families is proved under mild conditions. For the first class the result is true under no assumptions whatsoever on the distribution of the observations."
statistical_inference_for_stochastic_processes
regression
estimation
re:XV_for_mixing
march 2010 by cshalizi
Stability and Generalization (Bousquet and Elisseeff, 2002)
february 2010 by cshalizi
Odd that they don't mention Domingos's "proces-oriented evaluation".
learning_theory
statistics
hilbert_space
re:your_favorite_dsge_sucks
re:XV_for_mixing
re:XV_for_networks
have_read
february 2010 by cshalizi
Arlot, Blanchard, Roquain: Some nonasymptotic results on resampling in high dimension, I: Confidence regions
december 2009 by cshalizi
"We study generalized bootstrap confidence regions for the mean of a random vector whose coordinates have an unknown dependency structure. The random vector is supposed to be either Gaussian or to have a symmetric and bounded distribution. The dimensionality of the vector can possibly be much larger than the number of observations and we focus on a nonasymptotic control of the confidence level, following ideas inspired by recent results in learning theory. We consider two approaches, the first based on a concentration principle (valid for a large class of resampling weights) and the second on a resampled quantile, specifically using Rademacher weights. Several intermediate results established in the approach based on concentration principles are of interest in their own right. We also discuss the question of accuracy when using Monte Carlo approximations of the resampled quantities."
statistics
resampling
bootstrap
cross-validation
confidence_sets
to_read
re:XV_for_mixing
concentration_of_measure
learning_theory
december 2009 by cshalizi
[0912.4480] Consistency of the Maximum Likelihood Estimator for general hidden Markov models
december 2009 by cshalizi
"a parametrized family of general hidden Markov models, where both the observed and unobserved components take values in a complete separable metric space. We prove that the maximum likelihood estimator (MLE) of the parameter is strongly consistent under a rather minimal set of assumptions. As special cases of our main result, we obtain consistency in a large class of nonlinear state space models, as well as general results on linear Gaussian state space models and finite state models. A novel aspect of our approach is an information-theoretic technique for proving identifiability, which does not require an explicit representation for the relative entropy rate. Our method of proof could therefore form a foundation for the investigation of MLE consistency in more general dependent and non-Markovian time series. Also of independent interest is a general concentration inequality for $V$-uniformly ergodic Markov chains."
markov_models
statistics
estimation
concentration_of_measure
information_theory
identifiability
re:AoS_project
re:XV_for_mixing
have_read
december 2009 by cshalizi
Consistency and Localizability
december 2009 by cshalizi
"We show that all consistent learning methods---that is, that asymptotically achieve the lowest possible expected loss for any distribution on (X,Y)---are necessarily localizable, by which we mean that they do not significantly change their response at a particular point when we show them only the part of the training set that is close to that point. This is true in particular for methods that appear to be defined in a non-local manner, such as support vector machines in classification and least-squares estimators in regression. Aside from showing that consistency implies a specific form of localizability, we also show that consistency is logically equivalent to the combination of two properties: (1) a form of localizability, and (2) that the method's global mean (over the entire X distribution) correctly estimates the true mean. Consistency can therefore be seen as comprised of two aspects, one local and one global." Feels like it should connect to cross-validation.
statistics
machine_learning
learning_theory
consistency
to_read
re:XV_for_networks
re:XV_for_mixing
december 2009 by cshalizi
Sensitivity Analysis of k-fold Cross-Validation in Prediction Error Estimation
november 2009 by cshalizi
Apparently IEEE makes this available solely to tease me, since, while we have a fully paid-up electronic subscription, I can't get access.
machine_learning
statistics
cross-validation
to_read
re:XV_for_mixing
re:XV_for_networks
november 2009 by cshalizi
A Hoeffding-Type Inequality for Ergodic Time Series (Tang, 2007)
september 2009 by cshalizi
"In this paper, a Hoeffding-type inequality is presented for a class of ergodic time series. The inequality is then used to construct uniformly exponentially consistent tests, which are useful tools for studying Bayesian consistency." Preprint at http://www4.stat.ncsu.edu/~sghosal/papers/Tang.pdf, haven't compared.
ergodic_theory
deviation_bounds
hoeffdings_inequality
time_series
have_read
re:XV_for_mixing
september 2009 by cshalizi
[0906.0791] Instability statistics and mixing rates
june 2009 by cshalizi
"We claim that looking at probability distributions of emph{finite time} largest Lyapunov exponents, and more precisely studying their large deviation properties, yields an extremely powerful technique to get quantitative estimates of polynomial decay rates of time correlations and Poincar'e recurrences in the -quite delicate- case of dynamical systems with weak chaotic properties."
dynamical_systems
large_deviations
poincare_recurrence
mixing
ergodic_theory
in_NB
to_read
re:XV_for_mixing
june 2009 by cshalizi
On Uniform Deviations of General Empirical Risks with Unboundedness, Dependence, and High Dimensionality
may 2009 by cshalizi
"The statistical learning theory of risk minimization depends heavily on probability bounds for uniform deviations of the empirical risks. Classical probability bounds using Hoeffding's inequality cannot accommodate more general situations with unbounded loss and dependent data. The current paper introduces an inequality that extends Hoeffding's inequality to handle these more general situations. We will apply this inequality to provide probability bounds for uniform deviations in a very general framework, which can involve discrete decision rules, unbounded loss, and a dependence structure that can be more general than either martingale or strong mixing". --- This is very sweet. The dependence measure they use is, I think, the same as the "gamma' dependence coefficient of Dedecker et al., from their recent book _Weak Dependence_, though apparently independent here.
statistics
learning_theory
deviation_bounds
weak_dependence
jiang.wenxin
via:shivak
re:your_favorite_dsge_sucks
have_read
re:XV_for_mixing
may 2009 by cshalizi
Stability is sufficient for generalization and necessary and sufficient for consistency of empirical risk minimization
february 2009 by cshalizi
Why hadn't I seen this before?
learning_theory
machine_learning
cross-validation
re:XV_for_networks
via:shivak
niyogi.partha
re:XV_for_mixing
re:your_favorite_dsge_sucks
february 2009 by cshalizi
related tags
arlot.sylvain ⊕ barvinok.alexander ⊕ bootstrap ⊕ calibration ⊕ concentration_of_measure ⊕ confidence_sets ⊕ consistency ⊕ cross-validation ⊕ deviation_bounds ⊕ dynamical_systems ⊕ empirical_processes ⊕ ensemble_methods ⊕ ergodic_theory ⊕ estimation ⊕ have_read ⊕ hilbert_space ⊕ hoeffdings_inequality ⊕ identifiability ⊕ individual_sequence_prediction ⊕ information_theory ⊕ in_NB ⊕ jiang.wenxin ⊕ large_deviations ⊕ learning_theory ⊕ low-regret_learning ⊕ machine_learning ⊕ markov_models ⊕ mathematics ⊕ mixing ⊕ model_selection ⊕ moderate_deviations ⊕ niyogi.partha ⊕ online_learning ⊕ poincare_recurrence ⊕ prediction ⊕ probability ⊕ re:almost_none ⊕ re:AoS_project ⊕ re:growing_ensemble_project ⊕ re:stacs ⊕ re:XV_for_mixing ⊕ re:XV_for_networks ⊕ re:your_favorite_dsge_sucks ⊕ recurrence_times ⊕ regression ⊕ resampling ⊕ stability_of_learning ⊕ statistical_inference_for_stochastic_processes ⊕ statistics ⊕ stochastic_processes ⊕ time_series ⊕ to:NB ⊕ to_read ⊕ to_teach:data-mining ⊕ universal_prediction ⊕ vc-dimension ⊕ via:djm1107 ⊕ via:nikete ⊕ via:shivak ⊕ waiting_times ⊕ weak_dependence ⊕Copy this bookmark: