cshalizi + prediction   100

[1205.3845] Forecasting with Historical Data or Process Knowledge under Misspecification: A Comparison
"When faced with the task of forecasting a dynamic system, practitioners often have available historical data, knowledge of the system, or a combination of both. While intuition dictates that perfect knowledge of the system should in theory yield perfect forecasting, often knowledge of the system is only partially known, known up to parameters, or known incorrectly. In contrast, forecasting using previous data without any process knowledge might result in accurate prediction for simple systems, but will fail for highly nonlinear and chaotic systems. In this paper, the authors demonstrate how even in chaotic systems, forecasting with historical data is preferable to using process knowledge if this knowledge exhibits certain forms of misspecification. Through an extensive simulation study, a range of misspecification and forecasting scenarios are examined with the goal of gaining an improved understanding of the circumstances under which forecasting from historical data is to be preferred over using process knowledge."
to:NB  to_read  prediction  time_series  misspecification  re:growing_ensemble_project 
7 days ago by cshalizi
[1205.2609] Which Spatial Partition Trees are Adaptive to Intrinsic Dimension?
"Recent theory work has found that a special type of spatial partition tree - called a random projection tree - is adaptive to the intrinsic dimension of the data from which it is built. Here we examine this same question, with a combination of theory and experiments, for a broader class of trees that includes k-d trees, dyadic trees, and PCA trees. Our motivation is to get a feel for (i) the kind of intrinsic low dimensional structure that can be empirically verified, (ii) the extent to which a spatial partition can exploit such structure, and (iii) the implications for standard statistical tasks such as regression, vector quantization, and nearest neighbor search."
to:NB  decision_trees  prediction  regression  statistics  dimension_reduction  machine_learning 
13 days ago by cshalizi
[1205.1406] Graph Prediction in a Low-Rank and Autoregressive Setting
"We study the problem of prediction for evolving graph data. We formulate the problem as the minimization of a convex objective encouraging sparsity and low-rank of the solution, that reflect natural graph properties. The convex formulation allows to obtain oracle inequalities and efficient solvers. We provide empirical results for our algorithm and comparison with competing methods, and point out two open questions related to compressed sensing and algebra of low-rank and sparse matrices."
to:NB  network_data_analysis  prediction  statistics  low-rank_approximation 
19 days ago by cshalizi
Clarke , Clarke : Prediction in several conventional contexts
"We review predictive techniques from several traditional branches of statistics. Starting with prediction based on the normal model and on the empirical distribution function, we proceed to techniques for various forms of regression and classification. Then, we turn to time series, longitudinal data, and survival analysis. Our focus throughout is on the mechanics of prediction more than on the properties of predictors."

(to_teach tags are tentative.)
to:NB  prediction  statistics  classifiers  regression  to_teach:undergrad-ADA  to_teach:data-mining 
20 days ago by cshalizi
Ehm , Gneiting : Local proper scoring rules of order two
"Scoring rules assess the quality of probabilistic forecasts, by assigning a numerical score based on the predictive distribution and on the event or value that materializes. A scoring rule is proper if it encourages truthful reporting. It is local of order k if the score depends on the predictive density only through its value and the values of its derivatives of order up to k at the realizing event. Complementing fundamental recent work by Parry, Dawid and Lauritzen, we characterize the local proper scoring rules of order 2 relative to a broad class of Lebesgue densities on the real line, using a different approach. In a data example, we use local and nonlocal proper scoring rules to assess statistically postprocessed ensemble weather forecasts."
to:NB  prediction  scoring_rules  statistics  gneiting.tilmann 
21 days ago by cshalizi
Dawid , Lauritzen , Parry : Proper local scoring rules on discrete sample spaces
"A scoring rule is a loss function measuring the quality of a quoted probability distribution Q for a random variable X, in the light of the realized outcome x of X; it is proper if the expected score, under any distribution P for X, is minimized by quoting Q = P. Using the fact that any differentiable proper scoring rule on a finite sample space is the gradient of a concave homogeneous function, we consider when such a rule can be local in the sense of depending only on the probabilities quoted for points in a nominated neighborhood of x. Under mild conditions, we characterize such a proper local scoring rule in terms of a collection of homogeneous functions on the cliques of an undirected graph on the space . A useful property of such rules is that the quoted distribution Q need only be known up to a scale factor. Examples of the use of such scoring rules include Besag’s pseudo-likelihood and Hyvärinen’s method of ratio matching."
to:NB  prediction  scoring_rules  statistics  lauritzen.steffen  dawid.philip 
21 days ago by cshalizi
Parry , Dawid , Lauritzen : Proper local scoring rules
"We investigate proper scoring rules for continuous distributions on the real line. It is known that the log score is the only such rule that depends on the quoted density only through its value at the outcome that materializes. Here we allow further dependence on a finite number m of derivatives of the density at the outcome, and describe a large class of such m-local proper scoring rules: these exist for all even m but no odd m. We further show that for m ≥ 2 all such m-local rules can be computed without knowledge of the normalizing constant of the distribution."
to:NB  prediction  scoring_rules  lauritzen.steffen  dawid.philip  statistics 
21 days ago by cshalizi
[1204.6441] "I Wanted to Predict Elections with Twitter and all I got was this Lousy Paper" -- A Balanced Survey on Election Prediction using Twitter Data
"Predicting X from Twitter is a popular fad within the Twitter research subculture. It seems both appealing and relatively easy. Among such kind of studies, electoral prediction is maybe the most attractive, and at this moment there is a growing body of literature on such a topic. This is not only an interesting research problem but, above all, it is extremely difficult. However, most of the authors seem to be more interested in claiming positive results than in providing sound and reproducible methods. It is also especially worrisome that many recent papers seem to only acknowledge those studies supporting the idea of Twitter predicting elections, instead of conducting a balanced literature review showing both sides of the matter. After reading many of such papers I have decided to write such a survey myself. Hence, in this paper, every study relevant to the matter of electoral prediction using social media is commented. From this review it can be concluded that the predictive power of Twitter regarding elections has been greatly exaggerated, and that hard research problems still lie ahead."
to:NB  social_media  data_mining  prediction  have_read 
24 days ago by cshalizi
Assessing gross domestic product and inflation probability forecasts derived from Bank of England fan charts - Galbraith - 2011 - Journal of the Royal Statistical Society: Series A (Statistics in Society) - Wiley Online Library
"Density forecasts, including the pioneering Bank of England ‘fan charts’, are often used to produce forecast probabilities of a particular event. We use the Bank of England's forecast densities to calculate the forecast probability that annual rates of inflation and output growth exceed given thresholds. We subject these implicit probability forecasts to graphical and numerical diagnostic checks. We measure both their calibration and their resolution, providing both statistical and graphical interpretations of the results. The results reinforce earlier evidence on limitations of these forecasts and provide new evidence on their information content and on the relative performance of inflation and gross domestic product growth forecasts. In particular, gross domestic product forecasts show little or no ability to predict periods of low growth beyond the current quarter, in part because of the important role of data revisions."
to:NB  prediction  statistics  calibration  macroeconomics  to_teach:undergrad-ADA 
6 weeks ago by cshalizi
Stock Market Behavior Predicted by Rat Neurons
"We here report for the first time, to the best of our knowledge, rat motor cortex neurons predicting the behavior of the American stock market. We implanted the motor cortex of the brains of rats with silicon electrodes. Using the correlation technique, we monitored the activity of neurons in our rats while simultaneously tracking the activity of stocks in the U.S. stock market."
have_read  to:NB  neuroscience  finance  statistics  prediction  multiple_testing  bad_data_analysis  funny:geeky  funny:malicious  via:mejn  to:blog  to_teach:undergrad-ADA 
8 weeks ago by cshalizi
[1203.5422] Distribution Free Prediction Bands
"We study distribution free, nonparametric prediction bands with a special focus on their finite sample behavior. First we investigate and develop different notions of finite sample coverage guarantees. Then we give a new prediction band estimator by combining the idea of "conformal prediction" (Vovk et al. 2009) with nonparametric conditional density estimation. The proposed estimator, called COPS (Conformal Optimized Prediction Set), always has finite sample guarantee in a stronger sense than the original conformal prediction estimator. Under regularity conditions the estimator converges to an oracle band at a minimax optimal rate. A fast approximation algorithm and a data driven method for selecting the bandwidth are developed. The method is illustrated first in simulated data. Then, an application shows that the proposed method gives desirable prediction intervals in an automatic way, as compared to the classical linear regression modeling."
to:NB  prediction  statistics  nonparametrics  kith_and_kin  wasserman.larry  lei.jing  heard  confidence_sets  density_estimation 
8 weeks ago by cshalizi
Universality of Bayesian Predictions
"This paper studies the theoretical properties of Bayesian predictions and shows that under minimal conditions we can derive finite sample bounds for the loss incurred using Bayesian predictions under the Kullback-Leibler divergence. In particular, the concept of universality of predictions is discussed and universality is established for Bayesian predictions in a variety of settings. These include predictions under almost arbitrary loss functions, model averaging, predictions in a non-stationary environment and under model misspecification."
in_NB  to_read  statistics  bayesian_consistency  prediction  misspecification  universal_prediction 
12 weeks ago by cshalizi
[0805.3032] Testing earthquake predictions
"Statistical tests of earthquake predictions require a null hypothesis to model occasional chance successes. To define and quantify `chance success' is knotty. Some null hypotheses ascribe chance to the Earth: Seismicity is modeled as random. The null distribution of the number of successful predictions -- or any other test statistic -- is taken to be its distribution when the fixed set of predictions is applied to random seismicity. Such tests tacitly assume that the predictions do not depend on the observed seismicity. Conditioning on the predictions in this way sets a low hurdle for statistical significance. Consider this scheme: When an earthquake of magnitude 5.5 or greater occurs anywhere in the world, predict that an earthquake at least as large will occur within 21 days and within an epicentral distance of 50 km. We apply this rule to the Harvard centroid-moment-tensor (CMT) catalog for 2000--2004 to generate a set of predictions. The null hypothesis is that earthquake times are exchangeable conditional on their magnitudes and locations and on the predictions--a common ``nonparametric'' assumption in the literature. We generate random seismicity by permuting the times of events in the CMT catalog. We consider an event successfully predicted only if (i) it is predicted and (ii) there is no larger event within 50 km in the previous 21 days. The $P$-value for the observed success rate is $<0.001$: The method successfully predicts about 5% of earthquakes, far better than `chance,' because the predictor exploits the clustering of earthquakes -- occasional foreshocks -- which the null hypothesis lacks. Rather than condition on the predictions and use a stochastic model for seismicity, it is preferable to treat the observed seismicity as fixed, and to compare the success rate of the predictions to the success rate of simple-minded predictions like those just described. If the proffered predictions do no better than a simple scheme, they have little value."
have_read  to:NB  statistics  geology  prediction  earthquakes  to_teach:undergrad-ADA  to_teach:data-mining 
12 weeks ago by cshalizi
[0801.0327] Nonparametric sequential prediction of time series
"Time series prediction covers a vast field of every-day statistical applications in medical, environmental and economic domains. In this paper we develop nonparametric prediction strategies based on the combination of a set of 'experts' and show the universal consistency of these strategies under a minimum of conditions. We perform an in-depth analysis of real-world data sets and show that these nonparametric strategies are more flexible, faster and generally outperform ARMA methods in terms of normalized cumulative prediction error."
in_NB  time_series  nonparametrics  prediction  statistics  to_teach:undergrad-ADA  re:growing_ensemble_project 
february 2012 by cshalizi
[math/0701419] Strategies for prediction under imperfect monitoring
"We propose simple randomized strategies for sequential prediction under imperfect monitoring, that is, when the forecaster does not have access to the past outcomes but rather to a feedback signal. The proposed strategies are consistent in the sense that they achieve, asymptotically, the best possible average reward. It was Rustichini (1999) who first proved the existence of such consistent predictors. The forecasters presented here offer the first constructive proof of consistency. Moreover, the proposed algorithms are computationally efficient. We also establish upper bounds for the rates of convergence. In the case of deterministic feedback, these rates are optimal up to logarithmic terms."
to:NB  prediction  individual_sequence_prediction  learning_in_games  re:growing_ensemble_project 
february 2012 by cshalizi
[1202.4294] Prediction of quantiles by statistical learning and application to GDP forecasting
"In this paper, we tackle the problem of prediction and confidence intervals for time series using a statistical learning approach and quantile loss functions. In a first time, we show that the Gibbs estimator (also known as Exponentially Weighted aggregate) is able to predict as well as the best predictor in a given family for a wide set of loss functions. In particular, using the quantile loss function of Koenker and Bassett (1978), this allows to build confidence intervals. We apply these results to the problem of prediction and confidence regions for the French Gross Domestic Product (GDP) growth, with promising results."
in_NB  to_read  prediction  confidence_sets  learning_theory  re:your_favorite_dsge_sucks  re:growing_ensemble_project 
february 2012 by cshalizi
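
A minimal sketch of the quantile ("pinball") loss of Koenker and Bassett (1978) named in the entry above, in plain Python. It is an illustration only and not the paper's Gibbs estimator; the function names `pinball_loss` and `best_sample_quantile` are made up here, and the brute-force search over sample points is chosen for clarity, not efficiency.

```python
def pinball_loss(y, q, tau):
    """Quantile (check) loss for outcome y, predicted quantile q, level tau in (0, 1)."""
    diff = y - q
    # Under-prediction (y above q) costs tau per unit; over-prediction costs (1 - tau).
    return tau * diff if diff >= 0 else (tau - 1.0) * diff


def best_sample_quantile(ys, tau):
    """Return the sample point with the smallest total pinball loss at level tau.

    Hypothetical helper for illustration: it tries every observed value as the
    candidate quantile and keeps the one with the lowest summed loss.
    """
    return min(sorted(ys), key=lambda q: sum(pinball_loss(y, q, tau) for y in ys))
```

Minimizing this loss over a sample picks out an empirical quantile at level `tau`, which is why a pair of such quantile predictions (say at 0.05 and 0.95) can serve as the endpoints of a prediction interval.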
Proving Induction
"The hard problem of induction is to argue without begging the question that inductive inference, applied properly in the proper circumstances, is con- ducive to truth. A recent theorem seems to show that the hard problem has a deductive solution. The theorem, provable in , states that a predictive func- tion M exists with the following property: whatever world we live in, M correctly predicts the world’s present state given its previous states at all times apart from a well-ordered subset. On the usual model of time a well-ordered subset is small relative to the set of all times. M’s existence therefore seems to provide a solution to the hard problem.
My paper argues for two conclusions. First, the theorem does not solve the hard problem of induction. More positively though, it solves a version of the problem in which the structure of time is given modulo our choice of set theory."

--- Seems prodigiously strange, on first glance. Ask the people downstairs and up the hall what they think of it?
to:NB  induction  set_theory  philosophy_of_science  prediction 
february 2012 by cshalizi
Stein : When does the screening effect hold?
"When using optimal linear prediction to interpolate point observations of a mean square continuous stationary spatial process, one often finds that the interpolant mostly depends on those observations located nearest to the predictand. This phenomenon is called the screening effect. However, there are situations in which a screening effect does not hold in a reasonable asymptotic sense, and theoretical support for the screening effect is limited to some rather specialized settings for the observation locations. This paper explores conditions on the observation locations and the process model under which an asymptotic screening effect holds. A series of examples shows the difficulty in formulating a general result, especially for processes with different degrees of smoothness in different directions, which can naturally occur for spatial-temporal processes. These examples lead to a general conjecture and two special cases of this conjecture are proven. The key condition on the process is that its spectral density should change slowly at high frequencies. Models not satisfying this condition of slow high-frequency change should be used with caution."
to:NB  spatial_statistics  smoothing  statistics  prediction 
january 2012 by cshalizi
Forecasting Time Series with Complex Seasonal Patterns Using Exponential Smoothing
An innovations state space modeling framework is introduced for forecasting complex seasonal time series such as those with multiple seasonal periods, high-frequency seasonality, non-integer seasonality, and dual-calendar effects. The new framework incorporates Box–Cox transformations, Fourier representations with time varying coefficients, and ARMA error correction. Likelihood evaluation and analytical expressions for point forecasts and interval predictions under the assumption of Gaussian errors are derived, leading to a simple, comprehensive approach to forecasting complex seasonal time series. A key feature of the framework is that it relies on a new method that greatly reduces the computational burden in the maximum likelihood estimation. The modeling framework is useful for a broad range of applications, its versatility being illustrated in three empirical studies. In addition, the proposed trigonometric formulation is presented as a means of decomposing complex seasonal time series, and it is shown that this decomposition leads to the identification and extraction of seasonal components which are otherwise not apparent in the time series plot itself.
to:NB  statistics  time_series  prediction 
january 2012 by cshalizi
[1112.6390] Early Warning with Calibrated and Sharper Probabilistic Forecasts
"Given a nonlinear model, a probabilistic forecast may be obtained by Monte Carlo simulations. At a given forecast horizon, Monte Carlo simulations yield sets of discrete forecasts, which can be converted to density forecasts. The resulting density forecasts will inevitably be downgraded by model mis-specification. In order to enhance the quality of the density forecasts, one can mix them with the unconditional density. This paper examines the value of combining conditional density forecasts with the unconditional density. The findings have positive implications for issuing early warnings in different disciplines including economics and meteorology, but UK inflation forecasts are considered as an example." --- Better than conformal predictors?
to:NB  prediction  statistics  ensemble_methods  density_estimation 
december 2011 by cshalizi
Clements , Schoenberg , Schorlemmer : Residual analysis methods for space–time point processes with applications to earthquake forecast models in California
"Modern, powerful techniques for the residual analysis of spatial-temporal point process models are reviewed and compared. These methods are applied to California earthquake forecast models used in the Collaboratory for the Study of Earthquake Predictability (CSEP). Assessments of these earthquake forecasting models have previously been performed using simple, low-power means such as the L-test and N-test. We instead propose residual methods based on rescaling, thinning, superposition, weighted K-functions and deviance residuals. Rescaled residuals can be useful for assessing the overall fit of a model, but as with thinning and superposition, rescaling is generally impractical when the conditional intensity λ is volatile. While residual thinning and superposition may be useful for identifying spatial locations where a model fits poorly, these methods have limited power when the modeled conditional intensity assumes extremely low or high values somewhere in the observation region, and this is commonly the case for earthquake forecasting models. A recently proposed hybrid method of thinning and superposition, called super-thinning, is a more powerful alternative. The weighted K-function is powerful for evaluating the degree of clustering or inhibition in a model. Competing models are also compared using pixel-based approaches, such as Pearson residuals and deviance residuals. 
The different residual analysis techniques are demonstrated using the CSEP models and are used to highlight certain deficiencies in the models, such as the overprediction of seismicity in inter-fault zones for the model proposed by Helmstetter, Kagan and Jackson [Seismological Research Letters 78 (2007) 78–86], the underprediction of the model proposed by Kagan, Jackson and Rong [Seismological Research Letters 78 (2007) 94–98] in forecasting seismicity around the Imperial, Laguna Salada, and Panamint clusters, and the underprediction of the model proposed by Shen, Jackson and Kagan [Seismological Research Letters 78 (2007) 116–120] in forecasting seismicity around the Laguna Salada, Baja, and Panamint clusters."
to:NB  point_processes  spatial_statistics  time_series  statistics  model_selection  model-checking  prediction  earthquakes  geology 
december 2011 by cshalizi
[1112.1674] Predicting Failures of Point Forecasts
"The predictability of errors in deterministic temperature forecasts is investigated. More precisely, the aim is to issue warnings whenever the differences between forecast and verification exceed a given threshold. The warnings are generated by analyzing the output of an ensemble forecast system in terms of a decision making approach. The quality of the resulting predictions is evaluated by computing receiver operating characteristics, the Brier score, and the Ignorance score. Special emphasis is also given to the question whether rare events are better predictable."
to:NB  prediction  statistics  time_series  dynamical_systems 
december 2011 by cshalizi
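
A minimal sketch of two of the scores named in the entry above, the Brier score and the Ignorance score, for binary events in plain Python. It assumes outcomes coded as 0 or 1 and forecast probabilities strictly between 0 and 1; the function names are illustrative, and nothing here reproduces the paper's ensemble-based warning system.

```python
import math


def brier_score(probs, outcomes):
    """Mean squared difference between forecast probability and the 0/1 outcome."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def ignorance_score(probs, outcomes):
    """Mean of -log2 of the probability the forecast gave to what actually happened."""
    total = 0.0
    for p, o in zip(probs, outcomes):
        # Probability assigned to the realized outcome: p if the event occurred, else 1 - p.
        p_realized = p if o == 1 else 1.0 - p
        total += -math.log2(p_realized)
    return total / len(probs)
```

Both scores are negatively oriented, so lower is better. A forecaster who always says 0.5 gets a Brier score of 0.25 and an Ignorance score of 1 bit, which makes a convenient baseline.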
[1111.6174] Resolving conflicts between statistical methods by probability combination: Application to empirical Bayes analyses of genomic data
"In the typical analysis of a data set, a single method is selected for statistical reporting even when equally applicable methods yield very different results. Examples of equally applicable methods can correspond to those of different ancillary statistics in frequentist inference and of different prior distributions in Bayesian inference. More broadly, choices are made between parametric and nonparametric methods and between frequentist and Bayesian methods.
Rather than choosing a single method, it can be safer, in a game-theoretic sense, to combine those that are equally appropriate in light of the available information. Since methods of combining subjectively assessed probability distributions are not objective enough for that purpose, this paper introduces a method of distribution combination that does not require any assignment of distribution weights. It does so by formalizing a hedging strategy in terms of a game between three players: nature, a statistician combining distributions, and a statistician refusing to combine distributions. The optimal move of the first statistician reduces to the solution of a simpler problem of selecting an estimating distribution that minimizes the Kullback-Leibler loss maximized over the plausible distributions to be combined. The resulting combined distribution is a linear combination of the most extreme of the distributions to be combined that are scientifically plausible. The optimal weights are close enough to each other that no extreme distribution dominates the others.
The new methodology is illustrated by combining conflicting empirical Bayes methodologies in the context of gene expression data analysis."
in_NB  ensemble_methods  statistics  prediction  bickel.david 
december 2011 by cshalizi
Quantile regression for longitudinal data based on latent Markov subject-specific parameters Alessio Farcomeni - Statistics and Computing, Volume 22, Number 1
"We propose a latent Markov quantile regression model for longitudinal data with non-informative drop-out. The observations, conditionally on covariates, are modeled through an asymmetric Laplace distribution. Random effects are assumed to be time-varying and to follow a first order latent Markov chain. This latter assumption is easily interpretable and allows exact inference through an ad hoc EM-type algorithm based on appropriate recursions. Finally, we illustrate the model on a benchmark data set."
to:NB  regression  time_series  prediction  markov_models 
december 2011 by cshalizi
Prediction-based regularization using data augmented regression - Statistics and Computing, Volume 22, Number 1
"The role of regularization is to control fitted model complexity and variance by penalizing (or constraining) models to be in an area of model space that is deemed reasonable, thus facilitating good predictive performance. This is typically achieved by penalizing a parametric or non-parametric representation of the model. In this paper we advocate instead the use of prior knowledge or expectations about the predictions of models for regularization. This has the twofold advantage of allowing a more intuitive interpretation of penalties and priors and explicitly controlling model extrapolation into relevant regions of the feature space. This second point is especially critical in high-dimensional modeling situations, where the curse of dimensionality implies that new prediction points usually require extrapolation. We demonstrate that prediction-based regularization can, in many cases, be stochastically implemented by simply augmenting the dataset with Monte Carlo pseudo-data. We investigate the range of applicability of this implementation. An asymptotic analysis of the performance of Data Augmented Regression (DAR) in parametric and non-parametric linear regression, and in nearest neighbor regression, clarifies the regularizing behavior of DAR. We apply DAR to simulated and real data, and show that it is able to control the variance of extrapolation, while maintaining, and often improving, predictive accuracy."
in_NB  to_read  statistics  prediction  estimation  hooker.giles  regression  to_teach:undergrad-ADA  to_teach:data-mining  curse_of_dimensionality 
december 2011 by cshalizi
Lai , Gross , Shen : Evaluating probability forecasts
"Probability forecasts of events are routinely used in climate predictions, in forecasting default probabilities on bank loans or in estimating the probability of a patient’s positive response to treatment. Scoring rules have long been used to assess the efficacy of the forecast probabilities after observing the occurrence, or nonoccurrence, of the predicted events. We develop herein a statistical theory for scoring rules and propose an alternative approach to the evaluation of probability forecasts. This approach uses loss functions relating the predicted to the actual probabilities of the events and applies martingale theory to exploit the temporal structure between the forecast and the subsequent occurrence or nonoccurrence of the event."
in_NB  statistics  prediction  calibration  to_read  to_teach:undergrad-ADA 
november 2011 by cshalizi
[1111.1386] Confidence Estimation in Structured Prediction
"Structured classification tasks such as sequence labeling and dependency parsing have seen much interest by the Natural Language Processing and the machine learning communities. Several online learning algorithms were adapted for structured tasks such as Perceptron, Passive- Aggressive and the recently introduced Confidence-Weighted learning . These online algorithms are easy to implement, fast to train and yield state-of-the-art performance. However, unlike probabilistic models like Hidden Markov Model and Conditional random fields, these methods generate models that output merely a prediction with no additional information regarding confidence in the correctness of the output. In this work we fill the gap proposing few alternatives to compute the confidence in the output of non-probabilistic algorithms.We show how to compute confidence estimates in the prediction such that the confidence reflects the probability that the word is labeled correctly. We then show how to use our methods to detect mislabeled words, trade recall for precision and active learning. We evaluate our methods on four noun-phrase chunking and named entity recognition sequence labeling tasks, and on dependency parsing for 14 languages."
to:NB  machine_learning  confidence_sets  prediction  natural_language_processing 
november 2011 by cshalizi
[1111.1418] Efficient Nonparametric Conformal Prediction Regions
Yay, it's out! "We investigate and extend the conformal prediction method due to Vovk, Gammerman and Shafer (2005) to construct nonparametric prediction regions. These regions have guaranteed distribution free, finite sample coverage, without any assumptions on the distribution or the bandwidth. Explicit convergence rates of the loss function are established for such regions under standard regularity conditions. Approximations for simplifying implementation and data driven bandwidth selection methods are also discussed. The theoretical properties of our method are demonstrated through simulations."
in_NB  prediction  statistics  confidence_sets  nonparametrics  kith_and_kin  wasserman.larry  robins.james  have_read  density_estimation 
november 2011 by cshalizi
[1110.6416] Adaptive Hedge
"Most methods for decision-theoretic online learning are based on the Hedge algorithm, which takes a parameter called the learning rate. In most previous analyses the learning rate was carefully tuned to obtain optimal worst-case performance, leading to suboptimal performance on easy instances, for example when there exists an action that is significantly better than all others. We propose a new way of setting the learning rate, which adapts to the difficulty of the learning problem: in the worst case our procedure still guarantees optimal performance, but on easy instances it achieves much smaller regret. In particular, our adaptive method achieves constant regret in a probabilistic setting, when there exists an action that on average obtains strictly smaller loss than all other actions. We also provide a simulation study comparing our approach to existing methods."
to:NB  to_read  re:growing_ensemble_project  online_learning  prediction  grunwald.peter  low-regret_learning 
october 2011 by cshalizi
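(For orientation, a minimal sketch of the basic Hedge update the abstract starts from, with a fixed learning rate. This is not the paper's adaptive rule for setting the rate; `eta=0.5`, the function names, and the two-action toy losses are assumptions made for the example.)

```python
import math

def hedge_weights(cum_losses, eta):
    """Weights proportional to exp(-eta * cumulative loss), normalized to sum to 1."""
    m = min(cum_losses)  # subtract the minimum for numerical stability
    raw = [math.exp(-eta * (c - m)) for c in cum_losses]
    z = sum(raw)
    return [r / z for r in raw]

def run_hedge(loss_rounds, eta=0.5):
    """Play fixed-rate Hedge; return (final weights, learner's cumulative expected loss)."""
    loss_rounds = list(loss_rounds)
    cum = [0.0] * len(loss_rounds[0])
    learner_loss = 0.0
    for losses in loss_rounds:
        w = hedge_weights(cum, eta)           # weights chosen before seeing this round
        learner_loss += sum(wi * li for wi, li in zip(w, losses))
        cum = [c + l for c, l in zip(cum, losses)]
    return hedge_weights(cum, eta), learner_loss

# An "easy instance": action 0 always has loss 0, action 1 always has loss 1.
rounds = [[0.0, 1.0]] * 20
final_w, total = run_hedge(rounds)
```

On this easy instance the weight piles up on action 0 and the learner's cumulative loss (its regret, since the best action has loss 0) stays bounded as the number of rounds grows, which is the regime the abstract says a worst-case-tuned rate handles suboptimally.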
Universality of Bayesian Predictions
"This paper studies the theoretical properties of Bayesian predictions and shows that under minimal conditions we can derive finite sample bounds for the loss incurred using Bayesian predictions under the Kullback-Leibler divergence. In particular, the concept of universality of predictions is discussed and universality is established for Bayesian predictions in a variety of settings. These include predictions under almost arbitrary loss functions, model averaging, predictions in a non-stationary environment and under model misspecification."
statistics  prediction  universal_prediction  bayesianism  to:NB  to_read  re:bayes_as_evol 
october 2011 by cshalizi
How Useful are Estimated DSGE Model Forecasts? by Rochelle Edge, Refet Gurkaynak :: SSRN
The methodological ideas here are suspect.  It is true that there is not much to predict about an in-control system, and what is happening is largely random and so unpredictable, so that even the true model would show low forecasting ability.  The question however is why we are supposed to think that the DSGE _does_ give us good information about counterfactuals.  If you could show that it had much better predictive performance than baselines like constants or random walks during _out-of-control_ periods, that would be something; but they don't.
re:your_favorite_dsge_sucks  dsges  prediction  economics  macroeconomics  time_series  statistics  in_NB  have_read  to:blog 
july 2011 by cshalizi
[1107.0013] Likelihood based observability analysis and confidence intervals for predictions of dynamic models
"Mechanistic dynamic models of biochemical networks such as Ordinary Differential Equations (ODEs) contain unknown parameters like the reaction rate constants and the initial concentrations of the compounds. The large number of parameters as well as their nonlinear impact on the model responses hamper the determination of confidence regions for parameter estimates. At the same time, classical approaches translating the uncertainty of the parameters into confidence intervals for model predictions are hardly feasible. In this article it is shown that a so-called prediction profile likelihood yields reliable confidence intervals for model predictions, despite arbitrarily complex and high-dimensional shapes of the confidence regions for the estimated parameters. Prediction confidence intervals of the dynamic states allow a data-based observability analysis. The approach renders the issue of sampling a high-dimensional parameter space into evaluating one-dimensional prediction spaces."
dynamical_systemss  statistics  statistical_inference_for_stochastic_processes  prediction  confidence_sets  to_read 
july 2011 by cshalizi
Making and Evaluating Point Forecasts (Gneiting)
"Typically, point forecasting methods are compared and assessed by means of an error measure or scoring function, with the absolute error and the squared error being key examples. The individual scores are averaged over forecast cases, to result in a summary measure of the predictive performance, such as the mean absolute error or the mean squared error. I demonstrate that this common practice can lead to grossly misguided inferences, unless the scoring function and the forecasting task are carefully matched...."
prediction  statistics  calibration  machine_learning  decision_theory  gneiting.tilmann  have_read 
july 2011 by cshalizi
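(A toy check of the point about matching the scoring function to the forecasting task, using the standard facts that mean absolute error is minimized by the median and mean squared error by the mean. The skewed sample, the candidate grid, and the helper names are invented for the example and do not come from the paper.)

```python
def mean_score(sample, forecast, score):
    """Average score of one fixed point forecast over the observed outcomes."""
    return sum(score(forecast, x) for x in sample) / len(sample)

def best_forecast(sample, candidates, score):
    """Candidate point forecast with the smallest average score."""
    return min(candidates, key=lambda c: mean_score(sample, c, score))

def abs_err(f, x):
    return abs(f - x)

def sq_err(f, x):
    return (f - x) ** 2

sample = [1, 1, 1, 2, 10]           # right-skewed: median 1, mean 3
grid = [i / 2 for i in range(21)]   # candidate forecasts 0.0, 0.5, ..., 10.0

best_abs = best_forecast(sample, grid, abs_err)  # the median
best_sq = best_forecast(sample, grid, sq_err)    # the mean
```

A forecaster asked for "a point forecast" and then scored by absolute error should report 1; scored by squared error, 3. Ranking forecasters by one score when they were aiming at the other functional penalizes them for answering a different question.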
[0711.3856] Forward estimation for ergodic time series
"The forward estimation problem for stationary and ergodic time series $\{X_n\}_{n=0}^{\infty}$ taking values from a finite alphabet ${\cal X}$ is to estimate the probability that $X_{n+1}=x$ based on the observations $X_i$, $0\le i\le n$ without prior knowledge of the distribution of the process $\{X_n\}$. We present a simple procedure $g_n$ which is evaluated on the data segment $(X_0,...,X_n)$ and for which, ${\rm error}(n) = |g_{n}(x)-P(X_{n+1}=x |X_0,...,X_n)|\to 0$ almost surely for a subclass of all stationary and ergodic time series, while for the full class the Cesaro average of the error tends to zero almost surely and moreover, the error tends to zero in probability."
prediction  ergodic_theory  time_series  statistics  morvai.gusztav  weiss.benjamin 
july 2011 by cshalizi
An Uncertain World II: Adapt, by Tim Harford - Whimsley
"This sentence shows another failure of the book: a blurring of the line between experimentation (trial-and-error) and decentralization. Throughout most of the book he uses experimentation as a synonym for decentralization (tacit knowledge and all that) and is in favour of both, but sometimes - as here - he separates the two to make his argument fit."
prediction  adaptive_behavior  book_reviews  slee.tom 
june 2011 by cshalizi
Scoring the pundits — Crooked Timber
"So, although the development of even rudimentary forms of audit is a great boon to the democratic public (and probably a lot more so than yet another inconclusive study of “media bias” one way or the other), I think it needs to be taken with two caveats. The biggest villain is not the guy who gets it wrong. The people who will cost you money and reputation over the long run are first, the guy who says he’s more certain than he really is, and second, the guy who won’t admit he’s wrong when he knows he is. "
prediction  natural_history_of_truthiness  why_oh_why_cant_we_have_a_better_press_corps  dsquared 
may 2011 by cshalizi
Statistical Prediction Analysis (Aitchison and Dunsmore, 1980)
Ancient, but I should see if there are examples or simple tools worth stealing for ADA.
books:noted  statistics  prediction  to:NB  to_teach:undergrad-ADA 
may 2011 by cshalizi
Efficient probabilistic forecasts for counts - McCabe et al., 2011 - JRSS-B
" Efficient probabilistic forecasts of integer-valued random variables are derived. The optimality is achieved by estimating the forecast distribution non-parametrically over a given broad model class and proving asymptotic (non-parametric) efficiency in that setting. The method is developed within the context of the integer auto-regressive class of models, which is a suitable class for any count data that can be interpreted as a queue, stock, birth-and-death process or branching process. The theoretical proofs of asymptotic efficiency are supplemented by simulation results that demonstrate the overall superiority of the non-parametric estimator relative to a misspecified parametric alternative, in large but finite samples. The method is applied to counts of stock market iceberg orders. A subsampling method is used to assess sampling variation in the full estimated forecast distribution and a proof of its validity is given."  (Dunno about the to_teach tags, I haven't read this yet.)
statistics  prediction  density_estimation  time_series  stochastic_processes  branching_processes  to_teach:data-mining  to_teach:undergrad-ADA 
march 2011 by cshalizi
Shmueli : To Explain or to Predict?
"Statistical modeling is a powerful tool for developing and testing theories by way of causal explanation, prediction, and description. In many disciplines there is near-exclusive use of statistical modeling for causal explanation and the assumption that models with high explanatory power are inherently of high predictive power. Conflation between explanation and prediction is common, yet the distinction must be understood for progressing scientific knowledge. While this distinction has been recognized in the philosophy of science, the statistical literature lacks a thorough discussion of the many differences that arise in the process of modeling for an explanatory versus a predictive goal. The purpose of this article is to clarify the distinction between explanatory and predictive modeling, to discuss its sources, and to reveal the practical implications of the distinction to each step in the modeling process."
statistics  prediction  philosophy_of_science 
january 2011 by cshalizi
Combining Nonparametric and Optimal Linear Time Series Predictions
ARMA model forecasting, supplemented somehow with nonparametric smoothing of the residuals.  (I haven't read beyond the abstract.)
time_series  prediction  statistics  nonparametrics  to_teach:undergrad-ADA 
january 2011 by cshalizi
Phys. Rev. E 82, 056206 (2010): Forecasting the evolution of nonlinear and nonstationary systems using recurrence-based local Gaussian process models
"...combining nonparametric Gaussian process (GP) modeling with certain local topological considerations ... for prediction (one-step look ahead) of ... nonlinear and nonstationary dynamics. ... partition ... trajectories into multiple near-stationary segments by aligning the boundaries of the partitions with those of the piecewise affine projections of the underlying dynamic system... alignment is achieved through the consideration of recurrence and other local topological properties ... forecasting in Lorenz system under different levels of induced noise and nonstationarity, synthetic heart-rate signals, and a real-world time-series from an industrial operation known to exhibit highly nonlinear and nonstationary dynamics. ... local Gaussian process can significantly outperform not just classical system identification, neural network and nonparametric models, but also the sequential Bayesian Monte Carlo methods in terms of prediction accuracy and computational speed."
time_series  prediction  non-stationarity  gaussian_processes  re:growing_ensemble_project  to_read 
november 2010 by cshalizi
[1010.6202] Sequential Data-Adaptive Bandwidth Selection by Cross-Validation for Nonparametric Prediction
"We consider the problem of bandwidth selection by cross-validation from a sequential point of view in a nonparametric regression model. Having in mind that in applications one often aims at estimation, prediction and change detection simultaneously, we investigate that approach for sequential kernel smoothers in order to base these tasks on a single statistic. We provide uniform weak laws of large numbers and weak consistency results for the cross-validated bandwidth. Extensions to weakly dependent error terms are discussed as well. The errors may be {\alpha}-mixing or L2-near epoch dependent, which guarantees that the uniform convergence of the cross validation sum and the consistency of the cross-validated bandwidth hold true for a large class of time series. The method is illustrated by analyzing photovoltaic data."
cross-validation  prediction  time_series  model_selection  to_read 
november 2010 by cshalizi
Lauritzen - "Sufficiency, Prediction, and Extreme Models" (JSTOR: Scandinavian Journal of Statistics, Vol. 1, No. 3 (1974), pp. 128-134)
"A modified concept of sufficiency, relevant in connection with statistical analysis of stochastic processes, is defined and its basic properties investigated. A method of prediction that applies when the probability structure is partly unknown is introduced and the method is shown to possess certain important invariance properties. The concept of an extreme model is defined and its probabilistic and statistical properties discussed. Existence of maximum likelihood estimators and predictors is established under weak regularity assumptions. For technical convenience, only discrete-valued stochastic processes are considered throughout the paper."
sufficiency  statistics  prediction  stochastic_processes  statistical_inference_for_stochastic_processes  have_read  lauritzen.steffen 
september 2010 by cshalizi
Wiener: Nonlinear Prediction and Dynamics
"Norbert Wiener really was that smart" dept.: long-term weather forecasting on the basis of deterministic dynamical models impossible because of limited precision observations and self-amplifying processes; but ergodic theory to the rescue for statistical forecasting; reconstruction of dynamical systems from sufficiently long trajectories (up to the ergodic component); linearization of nonlinear problems by projection into a higher-dimensional space; probably more, I'm not done reading it yet.
wiener.norbert  prediction  ergodic_theory  ergodic_decomposition  statistics  time_series  sensitive_dependence_on_initial_conditions  statistical_inference_for_stochastic_processes  series_of_footnotes  to:blog  have_read 
august 2010 by cshalizi
[1006.0475] Prediction with Advice of Unknown Number of Experts
"In the framework of prediction with expert advice, we consider a recently introduced kind of regret bounds: the bounds that depend on the effective instead of nominal number of experts. In contrast to the NormalHedge bound, which mainly depends on the effective number of experts and also weakly depends on the nominal one, we obtain a bound that does not contain the nominal number of experts at all. We use the defensive forecasting method and introduce an application of defensive forecasting to multivalued supermartingales."
prediction  learning_theory  re:growing_ensemble_project 
june 2010 by cshalizi
10-705 Intermediate Statistics, Fall 2009
Larry's version of the typical masters-level course based on Casella and Berger. Note: half of what he covers is not in Casella and Berger. (For example, he starts with VC theory!)
learning_theory  statistics  estimation  hypothesis_testing  prediction  minimax  bootstrap  model_selection  regression  classifiers  confidence_sets  wasserman.larry  kith_and_kin 
april 2010 by cshalizi
Desiderata for a Predictive Theory of Statistics - Clarke, 2010
"In many contexts the predictive validation of models or their associated prediction strategies is of greater importance than model identification which may be practically impossible. This is particularly so in fields involving complex or high dimensional data where model selection, or more generally predictor selection is the main focus of effort. This paper suggests a unified treatment for predictive analyses based on six `desiderata'. These desiderata are an effort to clarify what criteria a good predictive theory of statistics should satisfy." --- I presume (I haven't read the paper yet) that he means a theory of statistical predictions, and not a theory which tries to predict future developments within statistics.
statistics  prediction  methodology  to_read  data_mining 
march 2010 by cshalizi
[1003.1513] On the trasductive arguments in statistics
"The paper argues that a part of the current statistical discussion is not based on the standard firm foundations of the field. Among the examples we consider are prediction into the future, semi-supervised classification, and causality inference based on observational data." --- I have read this paper, but do not pretend to understand it. (For instance, I really don't get what he's saying about time series.)
statistics  prediction  causal_inference  time_series  have_read 
march 2010 by cshalizi
Reintroducing Prediction to Explanation
"Although prediction has been largely absent from discussions of explanation for the past 40 years, theories of explanation can gain much from a reintroduction. I review the history that divorced prediction from explanation, examine the proliferation of models of explanation that followed, and argue that accounts of explanation have been impoverished by the neglect of prediction. Instead of a revival of the symmetry thesis, I suggest that explanation should be understood as a cognitive tool that assists us in generating new predictions. This view of explanation and prediction clarifies what makes an explanation scientific and why inference to the best explanation makes sense in science."
explanation  prediction  philosophy_of_science 
february 2010 by cshalizi
Luen, Stark: Testing earthquake predictions
Back-up for the Hough review. Also: might make a good mini-project for the data-mining class, though I'd have to teach about spatio-temporal methods (which I should anyway [but where would the time come?]).
earthquakes  hypothesis_testing  bad_data_analysis  stark.philip  statistics  prediction  have_read  to_teach:data-mining  to_teach:undergrad-ADA 
december 2009 by cshalizi
[0912.4883] On Finding Predictors for Arbitrary Families of Processes
" A sequence $x_1,...,x_n,...$ of discrete-valued observations is generated according to some unknown [measure] $\mu$. After observing each outcome, ... give the conditional probabilities of the next observation. ... $\mu$ [is in] an arbitrary but known class $C$ of stochastic process measures. We [want] predictors ... whose conditional probabilities converge (in some sense) to the [true] conditional probabilities if any $\mu\in C$ is chosen to generate the sequence. ... [C]haracteriz[e] the families $C$ for which such predictors exist ... a specific and simple form in which to look for a solution. ... if any predictor works, then there exists a Bayesian predictor, whose prior is discrete, and which works too. .... sufficient and necessary conditions for the existence of a predictor, in terms of topological characterizations of the family $C$, as well as in terms of local behaviour of the measures in $C$, which in some cases lead to procedures for constructing such predictors."
prediction  universal_prediction  stochastic_processes  statistical_inference_for_stochastic_processes  statistics  re:AoS_project 
december 2009 by cshalizi
[0903.3620] Reconciling Model Selection and Prediction
"It is known that there is a dichotomy in the performance of model selectors. Those that are consistent (having the "oracle property") do not achieve the asymptotic minimax rate for prediction error. We look at this phenomenon closely, and argue that the set of parameters on which this dichotomy occurs is extreme, even pathological, and should not be considered when evaluating model selectors. We characterize this set, and show that, when such parameters are dismissed from consideration, consistency and asymptotic minimaxity can be attained simultaneously."
model_selection  statistics  minimax  regression  have_read  prediction 
december 2009 by cshalizi
A simple procedure for computing improved prediction intervals for autoregressive models. Paolo Vidoni. 2009; Journal of Time Series Analysis - Wiley InterScience
"construction of prediction intervals for time series models. The estimative or plug-in solution is usually not entirely adequate, since the (conditional) coverage probability may differ substantially from the nominal value. Prediction intervals with improved (conditional) coverage probability can be defined by adjusting the estimative ones, using rather complicated asymptotic procedures or suitable simulation techniques. This article extends to Markov process models a recent result by Vidoni, which defines a relatively simple predictive distribution function, giving improved prediction limits as quantiles"
prediction  time_series  statistics  confidence_sets  to_read 
december 2009 by cshalizi
The Monkey Cage: Forecasting Fallacies?
What we have here, boy, is a failure to calibrate: " 'Around 74% of companies have beat forecasts, versus the long-term average of 61% (emphasis added) and the all-time record of 73%, reached in the first quarter of 2004.' Now I might be missing something here, but if the forecasters were good at their jobs, shouldn’t the long term average of companies beating forecasts be the same as the long term average of companies doing worse than the forecasts?" --- Actually, isn't this compatible with the forecasters minimizing squared error under an asymmetric (but mean zero) noise distribution? (A more plausible explanation, to my mind, has to do with corrupt practices, where the same firms solicit investment-banking business from companies and purport to advise investors on what those companies are worth. But that's my cynicism.)
calibration  prediction  financial_markets  to_teach:data-mining  statistics 
august 2009 by cshalizi
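(The asymmetric-noise remark can be made concrete with a two-point toy distribution: noise of +1 with probability 3/4 and -3 with probability 1/4 has mean zero, so the squared-error-optimal forecast is the mean, yet outcomes beat that forecast 75% of the time. The numbers are invented for illustration and are not fitted to the earnings figures quoted above.)

```python
from fractions import Fraction

# Hypothetical noise distribution: value -> probability (mean zero, skewed left).
noise = {Fraction(1): Fraction(3, 4), Fraction(-3): Fraction(1, 4)}

def mean(dist):
    """Expected value of a finite distribution given as {value: probability}."""
    return sum(x * p for x, p in dist.items())

def prob_beat(dist, forecast):
    """Probability that the outcome strictly exceeds the forecast."""
    return sum(p for x, p in dist.items() if x > forecast)

forecast = mean(noise)                 # minimizes expected squared error
beat_rate = prob_beat(noise, forecast)
```

So a long-run beat rate well above one half is, by itself, consistent with forecasters who minimize squared error; it rules out forecasting the median, not forecasting the mean.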
PhilSci Archive - Deterministic versus indeterministic descriptions: not that different after all?
"The guiding question of this paper is: how similar are deterministic descriptions and indeterministic descriptions from a predictive viewpoint? The deterministic and indeterministic descriptions of concern in this paper are measure-theoretic deterministic systems and stochastic processes, respectively. I will explain intuitively some mathematical results which show that measure-theoretic deterministic systems and stochastic processes give more often the same predictions than one might perhaps have expected, and hence that from a predictive viewpoint these descriptions are quite similar." This needs saying?!?
dynamical_systems  stochastic_processes  prediction  philosophy_of_science  boltzmann_died_for_your_sins 
july 2009 by cshalizi
Limits of declustering methods for disentangling exogenous from endogenous events in time series with foreshocks, main shocks, and aftershocks
"Many time series in natural and social sciences can be seen as resulting from an interplay between exogenous influences and an endogenous organization. We use a simple epidemic-type aftershock model of events occurring sequentially, in which future events are influenced (partially triggered) by past events to ask the question of how well can one disentangle the exogenous events from the endogenous ones. We apply both model-dependent and model-independent stochastic declustering methods to reconstruct the tree of ancestry and estimate key parameters. In contrast with previously reported positive results, we have to conclude that declustered catalogs are rather unreliable for the synthetic catalogs that we have investigated, which contains of the order of thousands of events, typical of realistic applications. The estimated rates of exogenous events suffer from large errors. The branching ratio n, quantifying the fraction of events that have been triggered by previous events, is also badly estimated in general from declustered catalogs. We find, however, that the errors tend to be smaller and perhaps acceptable in some cases for small triggering efficiency and branching ratios. The high level of randomness together with the long memory makes the stochastic reconstruction of trees of ancestry and the estimation of the key parameters perhaps intrinsically unreliable for long-memory processes. For shorter memories (larger “bare” Omori exponent), the results improve significantly."
statistics  time_series  branching_processes  in_NB  earthquakes  prediction  inference_to_latent_objects  sornette.didier  long-range_dependence 
july 2009 by cshalizi
Why we overestimate the costs of climate change legislation | Grist
Conversely, the demand for Pan Am flights to the moon is much smaller than _very reasonable_ people have expected. This suggests an interesting question for retrospective studies of futurology: what's the variance? Quite conceivably, futurology is right _on average_, but with such a huge spread as to be unusable...
prediction  innovation  technological_change  environmental_management  environmental_policy  cost-benefit_analysis  climate_change 
june 2009 by cshalizi