cshalizi + re:growing_ensemble_project 21
[1205.3845] Forecasting with Historical Data or Process Knowledge under Misspecification: A Comparison
7 days ago by cshalizi
"When faced with the task of forecasting a dynamic system, practitioners often have available historical data, knowledge of the system, or a combination of both. While intuition dictates that perfect knowledge of the system should in theory yield perfect forecasting, often knowledge of the system is only partially known, known up to parameters, or known incorrectly. In contrast, forecasting using previous data without any process knowledge might result in accurate prediction for simple systems, but will fail for highly nonlinear and chaotic systems. In this paper, the authors demonstrate how even in chaotic systems, forecasting with historical data is preferable to using process knowledge if this knowledge exhibits certain forms of misspecification. Through an extensive simulation study, a range of misspecification and forecasting scenarios are examined with the goal of gaining an improved understanding of the circumstances under which forecasting from historical data is to be preferred over using process knowledge."
to:NB
to_read
prediction
time_series
misspecification
re:growing_ensemble_project
7 days ago by cshalizi
[1202.3323] A new look at shifting regret
12 weeks ago by cshalizi
We investigate extensions of well-known online learning algorithms such as fixed-share of Herbster and Warmuth (1998) or the methods proposed by Bousquet and Warmuth (2002). These algorithms use weight sharing schemes to perform as well as the best sequence of experts with a limited number of changes. Here we show, with a common, general, and simpler analysis, that weight sharing in fact achieves much more than what it was designed for. We use it to simultaneously prove new shifting regret bounds for online convex optimization on the simplex in terms of the total variation distance as well as new bounds for the related setting of adaptive regret. Finally, we exhibit the first logarithmic shifting bounds for exp-concave loss functions on the simplex.
online_learning
to_read
individual_sequence_prediction
non-stationarity
re:growing_ensemble_project
in_NB
low-regret_learning
have_read
12 weeks ago by cshalizi
[0801.0327] Nonparametric sequential prediction of time series
february 2012 by cshalizi
"Time series prediction covers a vast field of every-day statistical applications in medical, environmental and economic domains. In this paper we develop nonparametric prediction strategies based on the combination of a set of 'experts' and show the universal consistency of these strategies under a minimum of conditions. We perform an in-depth analysis of real-world data sets and show that these nonparametric strategies are more flexible, faster and generally outperform ARMA methods in terms of normalized cumulative prediction error."
in_NB
time_series
nonparametrics
prediction
statistics
to_teach:undergrad-ADA
re:growing_ensemble_project
february 2012 by cshalizi
[math/0701419] Strategies for prediction under imperfect monitoring
february 2012 by cshalizi
"We propose simple randomized strategies for sequential prediction under imperfect monitoring, that is, when the forecaster does not have access to the past outcomes but rather to a feedback signal. The proposed strategies are consistent in the sense that they achieve, asymptotically, the best possible average reward. It was Rustichini (1999) who first proved the existence of such consistent predictors. The forecasters presented here offer the first constructive proof of consistency. Moreover, the proposed algorithms are computationally efficient. We also establish upper bounds for the rates of convergence. In the case of deterministic feedback, these rates are optimal up to logarithmic terms."
to:NB
prediction
individual_sequence_prediction
learning_in_games
re:growing_ensemble_project
february 2012 by cshalizi
[1202.4294] Prediction of quantiles by statistical learning and application to GDP forecasting
february 2012 by cshalizi
"In this paper, we tackle the problem of prediction and confidence intervals for time series using a statistical learning approach and quantile loss functions. In a first time, we show that the Gibbs estimator (also known as Exponentially Weighted aggregate) is able to predict as well as the best predictor in a given family for a wide set of loss functions. In particular, using the quantile loss function of Koenker and Bassett (1978), this allows to build confidence intervals. We apply these results to the problem of prediction and confidence regions for the French Gross Domestic Product (GDP) growth, with promising results."
in_NB
to_read
prediction
confidence_sets
learning_theory
re:your_favorite_dsge_sucks
re:growing_ensemble_project
february 2012 by cshalizi
[1202.3079] Towards minimax policies for online linear optimization with bandit feedback
february 2012 by cshalizi
"We address the online linear optimization problem with bandit feedback. Our contribution is twofold. First, we provide an algorithm (based on exponential weights) with a regret of order $sqrt{d n log N}$ for any finite action set with $N$ actions, under the assumption that the instantaneous loss is bounded by 1. This shaves off an extraneous $sqrt{d}$ factor compared to previous works, and gives a regret bound of order $d sqrt{n log n}$ for any compact set of actions. Without further assumptions on the action set, this last bound is minimax optimal up to a logarithmic factor. Interestingly, our result also shows that the minimax regret for bandit linear optimization with expert advice in $d$ dimension is the same as for the basic $d$-armed bandit with expert advice. Our second contribution is to show how to use the Mirror Descent algorithm to obtain computationally efficient strategies with minimax optimal regret bounds in specific examples. More precisely we study two canonical action sets: the hypercube and the Euclidean ball. In the former case, we obtain the first computationally efficient algorithm with a $d sqrt{n}$ regret, thus improving by a factor $sqrt{d log n}$ over the best known result for a computationally efficient algorithm. In the latter case, our approach gives the first algorithm with a $sqrt{d n log n}$ regret, again shaving off an extraneous $sqrt{d}$ compared to previous works."
in_NB
online_learning
decision_theory
optimization
re:growing_ensemble_project
cesa-bianchi.nicolo
kakade.sham
bubeck.sebastien
february 2012 by cshalizi
[1201.5568] Dynamic trees for streaming and massive data contexts
january 2012 by cshalizi
"Data collection at a massive scale is becoming ubiquitous in a wide variety of settings, from vast offline databases to streaming real-time information. Learning algorithms deployed in such contexts must rely on single-pass inference, where the data history is never revisited. In streaming contexts, learning must also be temporally adaptive to remain up-to-date against unforeseen changes in the data generating mechanism. Although rapidly growing, the online Bayesian inference literature remains challenged by massive data and transient, evolving data streams. Non-parametric modelling techniques can prove particularly ill-suited, as the complexity of the model is allowed to increase with the sample size. In this work, we take steps to overcome these challenges by porting standard streaming techniques, like data discarding and downweighting, into a fully Bayesian framework via the use of informative priors and active learning heuristics. We showcase our methods by augmenting a modern non-parametric modelling framework, dynamic trees, and illustrate its performance on a number of practical examples. The end product is a powerful streaming regression and classification tool, whose performance compares favourably to the state-of-the-art."
to:NB
machine_learning
non-stationarity
statistics
data_mining
to_read
re:growing_ensemble_project
january 2012 by cshalizi
[1111.6337] Regret Bound by Variation for Online Convex Optimization
december 2011 by cshalizi
"In citep{Hazan-2008-extract}, the authors showed that the regret of online linear optimization can be bounded by the total variation of the cost vectors. In this paper, we extend this result to general online convex optimization. We first analyze the limitations of the algorithm in citep{Hazan-2008-extract} when applied it to online convex optimization. We then present two algorithms for online convex optimization whose regrets are bounded by the variation of cost functions. We finally consider the bandit setting, and present a randomized algorithm for online bandit convex optimization with a variation-based regret bound. We show that the regret bound for online bandit convex optimization is optimal when the variation of cost functions is independent of the number of trials."
in_NB
to_read
re:growing_ensemble_project
learning_theory
individual_sequence_prediction
december 2011 by cshalizi
[1110.6416] Adaptive Hedge
october 2011 by cshalizi
"Most methods for decision-theoretic online learning are based on the Hedge algorithm, which takes a parameter called the learning rate. In most previous analyses the learning rate was carefully tuned to obtain optimal worst-case performance, leading to suboptimal performance on easy instances, for example when there exists an action that is significantly better than all others. We propose a new way of setting the learning rate, which adapts to the difficulty of the learning problem: in the worst case our procedure still guarantees optimal performance, but on easy instances it achieves much smaller regret. In particular, our adaptive method achieves constant regret in a probabilistic setting, when there exists an action that on average obtains strictly smaller loss than all other actions. We also provide a simulation study comparing our approach to existing methods."
to:NB
to_read
re:growing_ensemble_project
online_learning
prediction
grunwald.peter
low-regret_learning
october 2011 by cshalizi
[1110.2755] Efficient Tracking of Large Classes of Experts
october 2011 by cshalizi
"In the framework for prediction of individual sequences, sequential prediction methods are to be constructed that perform nearly as well as the best expert from a given class. We consider prediction strategies that compete with the class of switching strategies that can segment a given sequence into several blocks, and follow the advice of a different "base" expert in each block. As usual, the performance of the algorithm is measured by the regret defined as the excess loss relative to the best switching strategy %(with an arbitrary number of switches) selected in hindsight for the particular sequence to be predicted. In this paper we construct %strongly sequential (i.e., horizon-independent) prediction strategies of low computational cost for the case where the set of base experts is large. In particular we derive a family of efficient tracking algorithms that, for any prediction algorithm $A$ designed for the base class, can be implemented with time and space complexity $O(n^{gamma} log n)$ times larger than that of $A$, where $n$ is the time horizon and $gamma ge 0$ is a parameter of the algorithm. With $A$ properly chosen, our algorithm achieves a regret bound of optimal order for $gamma>0$, and only $O(log n)$ times larger than the optimal order for $gamma=0$ for all typical regret bound types we examined. For example, for predicting binary sequences with switching parameters, our method achieves the optimal $O(log n)$ regret rate with time complexity $O(n^{1+gamma}log n)$ for any $gammain (0,1)$."
to:NB
to_read
re:growing_ensemble_project
learning_theory
individual_sequence_prediction
lugosi.gabor
october 2011 by cshalizi
[1110.2529] The Generalization Ability of Online Algorithms for Dependent Data
october 2011 by cshalizi
"We study the generalization performance of arbitrary online learning algorithms trained on samples coming from a dependent source of data. We show that the generalization error of any stable online algorithm concentrates around its regret--an easily computable statistic of the online performance of the algorithm--when the underlying ergodic process is $beta$- or $phi$-mixing. We show high probability error bounds assuming the loss function is convex, and we also establish sharp convergence rates and deviation bounds for strongly convex losses and several linear prediction problems such as linear and logistic regression, least-squares SVM, and boosting on dependent data. In addition, our results have straightforward applications to stochastic optimization with dependent data, and our analysis requires only martingale convergence arguments; we need not rely on more powerful statistical tools such as empirical process theory."
in_NB
learning_theory
individual_sequence_prediction
ergodic_theory
mixing
re:growing_ensemble_project
re:XV_for_mixing
stability_of_learning
concentration_of_measure
have_read
re:your_favorite_dsge_sucks
october 2011 by cshalizi
Phys. Rev. E 84, 046702 (2011): Nonparametric segmentation of nonstationary time series
october 2011 by cshalizi
"The nonstationary evolution of observable quantities in complex systems can frequently be described as a juxtaposition of quasistationary spells. Given that standard theoretical and data analysis approaches usually rely on the assumption of stationarity, it is important to detect in real time series intervals holding that property. With that aim, we introduce a segmentation algorithm based on a fully nonparametric approach. We illustrate its applicability through the analysis of real time series presenting diverse degrees of nonstationarity, thus showing that this segmentation procedure generalizes and allows one to uncover features unresolved by previous proposals based on the discrepancy of low order statistical moments only."
in_NB
statistics
change-point_problem
time_series
nonparametrics
re:growing_ensemble_project
october 2011 by cshalizi
Phys. Rev. E 82, 056206 (2010): Forecasting the evolution of nonlinear and nonstationary systems using recurrence-based local Gaussian process models
november 2010 by cshalizi
"...combining nonparametric Gaussian process (GP) modeling with certain local topological considerations ... for prediction (one-step look ahead) of ... nonlinear and nonstationary dynamics. ... partition ... trajectories into multiple near-stationary segments by aligning the boundaries of the partitions with those of the piecewise affine projections of the underlying dynamic system... alignment is achieved through the consideration of recurrence and other local topological properties ... forecasting in Lorenz system under different levels of induced noise and nonstationarity, synthetic heart-rate signals, and a real-world time-series from an industrial operation known to exhibit highly nonlinear and nonstationary dynamics. ... local Gaussian process can significantly outperform not just classical system identification, neural network and nonparametric models, but also the sequential Bayesian Monte Carlo methods in terms of prediction accuracy and computational speed."
time_series
prediction
non-stationarity
gaussian_processes
re:growing_ensemble_project
to_read
november 2010 by cshalizi
Extracting certainty from uncertainty: regret bounded by variation in costs
july 2010 by cshalizi
Cool, but not sure it works really for our setting.
learning_theory
individual_sequence_prediction
ensemble_methods
re:growing_ensemble_project
have_read
july 2010 by cshalizi
[1006.0475] Prediction with Advice of Unknown Number of Experts
june 2010 by cshalizi
"In the framework of prediction with expert advice, we consider a recently introduced kind of regret bounds: the bounds that depend on the effective instead of nominal number of experts. In contrast to the NormalHedge bound, which mainly depends on the effective number of experts and also weakly depends on the nominal one, we obtain a bound that does not contain the nominal number of experts at all. We use the defensive forecasting method and introduce an application of defensive forecasting to multivalued supermartingales."
prediction
learning_theory
re:growing_ensemble_project
june 2010 by cshalizi
related tags
bubeck.sebastien ⊕ cesa-bianchi.nicolo ⊕ change-point_problem ⊕ climate_change ⊕ climatology ⊕ concentration_of_measure ⊕ confidence_sets ⊕ data_mining ⊕ data_sets ⊕ decision_theory ⊕ ensemble_methods ⊕ ergodic_theory ⊕ gaussian_processes ⊕ grunwald.peter ⊕ have_read ⊕ individual_sequence_prediction ⊕ in_NB ⊕ kakade.sham ⊕ learning_in_games ⊕ learning_theory ⊕ low-regret_learning ⊕ lugosi.gabor ⊕ machine_learning ⊕ markov_models ⊕ misspecification ⊕ mixing ⊕ non-stationarity ⊕ nonparametrics ⊕ online_learning ⊕ optimization ⊕ prediction ⊕ re:growing_ensemble_project ⊖ re:XV_for_mixing ⊕ re:your_favorite_dsge_sucks ⊕ self-promotion ⊕ stability_of_learning ⊕ statistics ⊕ time_series ⊕ to:NB ⊕ to_read ⊕ to_teach ⊕ to_teach:undergrad-ADA ⊕Copy this bookmark: