cshalizi + re:growing_ensemble_project   21

[1205.3845] Forecasting with Historical Data or Process Knowledge under Misspecification: A Comparison
"When faced with the task of forecasting a dynamic system, practitioners often have available historical data, knowledge of the system, or a combination of both. While intuition dictates that perfect knowledge of the system should in theory yield perfect forecasting, often knowledge of the system is only partially known, known up to parameters, or known incorrectly. In contrast, forecasting using previous data without any process knowledge might result in accurate prediction for simple systems, but will fail for highly nonlinear and chaotic systems. In this paper, the authors demonstrate how even in chaotic systems, forecasting with historical data is preferable to using process knowledge if this knowledge exhibits certain forms of misspecification. Through an extensive simulation study, a range of misspecification and forecasting scenarios are examined with the goal of gaining an improved understanding of the circumstances under which forecasting from historical data is to be preferred over using process knowledge."
to:NB  to_read  prediction  time_series  misspecification  re:growing_ensemble_project 
7 days ago by cshalizi
[1202.3323] A new look at shifting regret
We investigate extensions of well-known online learning algorithms such as fixed-share of Herbster and Warmuth (1998) or the methods proposed by Bousquet and Warmuth (2002). These algorithms use weight sharing schemes to perform as well as the best sequence of experts with a limited number of changes. Here we show, with a common, general, and simpler analysis, that weight sharing in fact achieves much more than what it was designed for. We use it to simultaneously prove new shifting regret bounds for online convex optimization on the simplex in terms of the total variation distance as well as new bounds for the related setting of adaptive regret. Finally, we exhibit the first logarithmic shifting bounds for exp-concave loss functions on the simplex.
online_learning  to_read  individual_sequence_prediction  non-stationarity  re:growing_ensemble_project  in_NB  low-regret_learning  have_read 
12 weeks ago by cshalizi
[0801.0327] Nonparametric sequential prediction of time series
"Time series prediction covers a vast field of every-day statistical applications in medical, environmental and economic domains. In this paper we develop nonparametric prediction strategies based on the combination of a set of 'experts' and show the universal consistency of these strategies under a minimum of conditions. We perform an in-depth analysis of real-world data sets and show that these nonparametric strategies are more flexible, faster and generally outperform ARMA methods in terms of normalized cumulative prediction error."
in_NB  time_series  nonparametrics  prediction  statistics  to_teach:undergrad-ADA  re:growing_ensemble_project 
february 2012 by cshalizi
[math/0701419] Strategies for prediction under imperfect monitoring
"We propose simple randomized strategies for sequential prediction under imperfect monitoring, that is, when the forecaster does not have access to the past outcomes but rather to a feedback signal. The proposed strategies are consistent in the sense that they achieve, asymptotically, the best possible average reward. It was Rustichini (1999) who first proved the existence of such consistent predictors. The forecasters presented here offer the first constructive proof of consistency. Moreover, the proposed algorithms are computationally efficient. We also establish upper bounds for the rates of convergence. In the case of deterministic feedback, these rates are optimal up to logarithmic terms."
to:NB  prediction  individual_sequence_prediction  learning_in_games  re:growing_ensemble_project 
february 2012 by cshalizi
[1202.4294] Prediction of quantiles by statistical learning and application to GDP forecasting
"In this paper, we tackle the problem of prediction and confidence intervals for time series using a statistical learning approach and quantile loss functions. In a first time, we show that the Gibbs estimator (also known as Exponentially Weighted aggregate) is able to predict as well as the best predictor in a given family for a wide set of loss functions. In particular, using the quantile loss function of Koenker and Bassett (1978), this allows to build confidence intervals. We apply these results to the problem of prediction and confidence regions for the French Gross Domestic Product (GDP) growth, with promising results."
in_NB  to_read  prediction  confidence_sets  learning_theory  re:your_favorite_dsge_sucks  re:growing_ensemble_project 
february 2012 by cshalizi
[1202.3079] Towards minimax policies for online linear optimization with bandit feedback
"We address the online linear optimization problem with bandit feedback. Our contribution is twofold. First, we provide an algorithm (based on exponential weights) with a regret of order $sqrt{d n log N}$ for any finite action set with $N$ actions, under the assumption that the instantaneous loss is bounded by 1. This shaves off an extraneous $sqrt{d}$ factor compared to previous works, and gives a regret bound of order $d sqrt{n log n}$ for any compact set of actions. Without further assumptions on the action set, this last bound is minimax optimal up to a logarithmic factor. Interestingly, our result also shows that the minimax regret for bandit linear optimization with expert advice in $d$ dimension is the same as for the basic $d$-armed bandit with expert advice. Our second contribution is to show how to use the Mirror Descent algorithm to obtain computationally efficient strategies with minimax optimal regret bounds in specific examples. More precisely we study two canonical action sets: the hypercube and the Euclidean ball. In the former case, we obtain the first computationally efficient algorithm with a $d sqrt{n}$ regret, thus improving by a factor $sqrt{d log n}$ over the best known result for a computationally efficient algorithm. In the latter case, our approach gives the first algorithm with a $sqrt{d n log n}$ regret, again shaving off an extraneous $sqrt{d}$ compared to previous works."
in_NB  online_learning  decision_theory  optimization  re:growing_ensemble_project  cesa-bianchi.nicolo  kakade.sham  bubeck.sebastien 
february 2012 by cshalizi
[1201.5568] Dynamic trees for streaming and massive data contexts
"Data collection at a massive scale is becoming ubiquitous in a wide variety of settings, from vast offline databases to streaming real-time information. Learning algorithms deployed in such contexts must rely on single-pass inference, where the data history is never revisited. In streaming contexts, learning must also be temporally adaptive to remain up-to-date against unforeseen changes in the data generating mechanism. Although rapidly growing, the online Bayesian inference literature remains challenged by massive data and transient, evolving data streams. Non-parametric modelling techniques can prove particularly ill-suited, as the complexity of the model is allowed to increase with the sample size. In this work, we take steps to overcome these challenges by porting standard streaming techniques, like data discarding and downweighting, into a fully Bayesian framework via the use of informative priors and active learning heuristics. We showcase our methods by augmenting a modern non-parametric modelling framework, dynamic trees, and illustrate its performance on a number of practical examples. The end product is a powerful streaming regression and classification tool, whose performance compares favourably to the state-of-the-art."
to:NB  machine_learning  non-stationarity  statistics  data_mining  to_read  re:growing_ensemble_project 
january 2012 by cshalizi
[1111.6337] Regret Bound by Variation for Online Convex Optimization
"In citep{Hazan-2008-extract}, the authors showed that the regret of online linear optimization can be bounded by the total variation of the cost vectors. In this paper, we extend this result to general online convex optimization. We first analyze the limitations of the algorithm in citep{Hazan-2008-extract} when applied it to online convex optimization. We then present two algorithms for online convex optimization whose regrets are bounded by the variation of cost functions. We finally consider the bandit setting, and present a randomized algorithm for online bandit convex optimization with a variation-based regret bound. We show that the regret bound for online bandit convex optimization is optimal when the variation of cost functions is independent of the number of trials."
in_NB  to_read  re:growing_ensemble_project  learning_theory  individual_sequence_prediction 
december 2011 by cshalizi
[1110.6416] Adaptive Hedge
"Most methods for decision-theoretic online learning are based on the Hedge algorithm, which takes a parameter called the learning rate. In most previous analyses the learning rate was carefully tuned to obtain optimal worst-case performance, leading to suboptimal performance on easy instances, for example when there exists an action that is significantly better than all others. We propose a new way of setting the learning rate, which adapts to the difficulty of the learning problem: in the worst case our procedure still guarantees optimal performance, but on easy instances it achieves much smaller regret. In particular, our adaptive method achieves constant regret in a probabilistic setting, when there exists an action that on average obtains strictly smaller loss than all other actions. We also provide a simulation study comparing our approach to existing methods."
to:NB  to_read  re:growing_ensemble_project  online_learning  prediction  grunwald.peter  low-regret_learning 
october 2011 by cshalizi
[1110.2755] Efficient Tracking of Large Classes of Experts
"In the framework for prediction of individual sequences, sequential prediction methods are to be constructed that perform nearly as well as the best expert from a given class. We consider prediction strategies that compete with the class of switching strategies that can segment a given sequence into several blocks, and follow the advice of a different "base" expert in each block. As usual, the performance of the algorithm is measured by the regret defined as the excess loss relative to the best switching strategy %(with an arbitrary number of switches) selected in hindsight for the particular sequence to be predicted. In this paper we construct %strongly sequential (i.e., horizon-independent) prediction strategies of low computational cost for the case where the set of base experts is large. In particular we derive a family of efficient tracking algorithms that, for any prediction algorithm $A$ designed for the base class, can be implemented with time and space complexity $O(n^{gamma} log n)$ times larger than that of $A$, where $n$ is the time horizon and $gamma ge 0$ is a parameter of the algorithm. With $A$ properly chosen, our algorithm achieves a regret bound of optimal order for $gamma>0$, and only $O(log n)$ times larger than the optimal order for $gamma=0$ for all typical regret bound types we examined. For example, for predicting binary sequences with switching parameters, our method achieves the optimal $O(log n)$ regret rate with time complexity $O(n^{1+gamma}log n)$ for any $gammain (0,1)$."
to:NB  to_read  re:growing_ensemble_project  learning_theory  individual_sequence_prediction  lugosi.gabor 
october 2011 by cshalizi
[1110.2529] The Generalization Ability of Online Algorithms for Dependent Data
"We study the generalization performance of arbitrary online learning algorithms trained on samples coming from a dependent source of data. We show that the generalization error of any stable online algorithm concentrates around its regret--an easily computable statistic of the online performance of the algorithm--when the underlying ergodic process is $beta$- or $phi$-mixing. We show high probability error bounds assuming the loss function is convex, and we also establish sharp convergence rates and deviation bounds for strongly convex losses and several linear prediction problems such as linear and logistic regression, least-squares SVM, and boosting on dependent data. In addition, our results have straightforward applications to stochastic optimization with dependent data, and our analysis requires only martingale convergence arguments; we need not rely on more powerful statistical tools such as empirical process theory."
in_NB  learning_theory  individual_sequence_prediction  ergodic_theory  mixing  re:growing_ensemble_project  re:XV_for_mixing  stability_of_learning  concentration_of_measure  have_read  re:your_favorite_dsge_sucks 
october 2011 by cshalizi
Phys. Rev. E 84, 046702 (2011): Nonparametric segmentation of nonstationary time series
"The nonstationary evolution of observable quantities in complex systems can frequently be described as a juxtaposition of quasistationary spells. Given that standard theoretical and data analysis approaches usually rely on the assumption of stationarity, it is important to detect in real time series intervals holding that property. With that aim, we introduce a segmentation algorithm based on a fully nonparametric approach. We illustrate its applicability through the analysis of real time series presenting diverse degrees of nonstationarity, thus showing that this segmentation procedure generalizes and allows one to uncover features unresolved by previous proposals based on the discrepancy of low order statistical moments only."
in_NB  statistics  change-point_problem  time_series  nonparametrics  re:growing_ensemble_project 
october 2011 by cshalizi
Phys. Rev. E 82, 056206 (2010): Forecasting the evolution of nonlinear and nonstationary systems using recurrence-based local Gaussian process models
"...combining nonparametric Gaussian process (GP) modeling with certain local topological considerations ... for prediction (one-step look ahead) of ... nonlinear and nonstationary dynamics. ... partition ... trajectories into multiple near-stationary segments by aligning the boundaries of the partitions with those of the piecewise affine projections of the underlying dynamic system... alignment is achieved through the consideration of recurrence and other local topological properties ... forecasting in Lorenz system under different levels of induced noise and nonstationarity, synthetic heart-rate signals, and a real-world time-series from an industrial operation known to exhibit highly nonlinear and nonstationary dynamics. ... local Gaussian process can significantly outperform not just classical system identification, neural network and nonparametric models, but also the sequential Bayesian Monte Carlo methods in terms of prediction accuracy and computational speed."
time_series  prediction  non-stationarity  gaussian_processes  re:growing_ensemble_project  to_read 
november 2010 by cshalizi
[1006.0475] Prediction with Advice of Unknown Number of Experts
"In the framework of prediction with expert advice, we consider a recently introduced kind of regret bounds: the bounds that depend on the effective instead of nominal number of experts. In contrast to the NormalHedge bound, which mainly depends on the effective number of experts and also weakly depends on the nominal one, we obtain a bound that does not contain the nominal number of experts at all. We use the defensive forecasting method and introduce an application of defensive forecasting to multivalued supermartingales."
prediction  learning_theory  re:growing_ensemble_project 
june 2010 by cshalizi

Copy this bookmark:



description:


tags: