cshalizi + density_estimation   58

[0805.1404] Adaptive estimation of a distribution function and its density in sup-norm loss by wavelet and spline projections
"Given an i.i.d. sample from a distribution $F$ on $\mathbb{R}$ with uniformly continuous density $p_0$, purely data-driven estimators are constructed that efficiently estimate $F$ in sup-norm loss and simultaneously estimate $p_0$ at the best possible rate of convergence over Hölder balls, also in sup-norm loss. The estimators are obtained by applying a model selection procedure close to Lepski's method with random thresholds to projections of the empirical measure onto spaces spanned by wavelets or $B$-splines. The random thresholds are based on suprema of Rademacher processes indexed by wavelet or spline projection kernels. This requires Bernstein-type analogs of the inequalities in Koltchinskii [Ann. Statist. 34 (2006) 2593-2656] for the deviation of suprema of empirical processes from their Rademacher symmetrizations."
to:NB  density_estimation  wavelets  splines  statistics  empirical_processes 
22 days ago by cshalizi
Testing parametric conditional distributions using the nonparametric smoothing method
"This paper proposes a new goodness-of-fit test for parametric conditional probability distributions using the nonparametric smoothing methodology. An asymptotic normal distribution is established for the test statistic under the null hypothesis of correct specification of the parametric distribution. The test is shown to have power against local alternatives converging to the null at certain rates. The test can be applied to testing for possible misspecifications in a wide variety of parametric models. A bootstrap procedure is provided for obtaining more accurate critical values for the test. Monte Carlo simulations show that the test has good power against some common alternatives."
to:NB  misspecification  density_estimation  smoothing  statistics  to_teach:undergrad-ADA 
22 days ago by cshalizi
The benchden Package: Benchmark Densities for Nonparametric Density Estimation
"This article describes the benchden package which implements a set of 28 example densities for nonparametric density estimation in R. In addition to the usual functions that evaluate the density, distribution and quantile functions or generate random variates, a function designed to be specifically useful for larger simulation studies has been added. After describing the set of densities and the usage of the package, a small toy example of a simulation study conducted using the benchden package is given."
to:NB  computational_statistics  R  density_estimation  nonparametrics  to_teach:undergrad-ADA 
7 weeks ago by cshalizi
[1203.5422] Distribution Free Prediction Bands
"We study distribution free, nonparametric prediction bands with a special focus on their finite sample behavior. First we investigate and develop different notions of finite sample coverage guarantees. Then we give a new prediction band estimator by combining the idea of "conformal prediction" (Vovk et al. 2009) with nonparametric conditional density estimation. The proposed estimator, called COPS (Conformal Optimized Prediction Set), always has finite sample guarantee in a stronger sense than the original conformal prediction estimator. Under regularity conditions the estimator converges to an oracle band at a minimax optimal rate. A fast approximation algorithm and a data driven method for selecting the bandwidth are developed. The method is illustrated first in simulated data. Then, an application shows that the proposed method gives desirable prediction intervals in an automatic way, as compared to the classical linear regression modeling."
to:NB  prediction  statistics  nonparametrics  kith_and_kin  wasserman.larry  lei.jing  heard  confidence_sets  density_estimation 
8 weeks ago by cshalizi
[0803.2984] Conditional density estimation in a regression setting
"Regression problems are traditionally analyzed via univariate characteristics like the regression function, scale function and marginal density of regression errors. These characteristics are useful and informative whenever the association between the predictor and the response is relatively simple. More detailed information about the association can be provided by the conditional density of the response given the predictor. For the first time in the literature, this article develops the theory of minimax estimation of the conditional density for regression settings with fixed and random designs of predictors, bounded and unbounded responses and a vast set of anisotropic classes of conditional densities. The study of fixed design regression is of special interest and novelty because the known literature is devoted to the case of random predictors. For the aforementioned models, the paper suggests a universal adaptive estimator which (i) matches performance of an oracle that knows both an underlying model and an estimated conditional density; (ii) is sharp minimax over a vast class of anisotropic conditional densities; (iii) is at least rate minimax when the response is independent of the predictor and thus a bivariate conditional density becomes a univariate density; (iv) is adaptive to an underlying design (fixed or random) of predictors."
in_NB  statistics  nonparametrics  regression  density_estimation  minimax  to_read  to_teach:undergrad-ADA 
11 weeks ago by cshalizi
[math/0510311] Adaptive density estimation under dependence
"Assume that $(X_t)_{t\in\mathbb{Z}}$ is a real valued time series admitting a common marginal density $f$ with respect to Lebesgue's measure. Donoho et al. (1996) propose a near-minimax method based on thresholding wavelets to estimate $f$ on a compact set in an independent and identically distributed setting. The aim of the present work is to extend these results to general weak dependent contexts. Weak dependence assumptions are expressed as decreasing bounds of covariance terms and are detailed for different examples. The threshold levels in estimators $\widehat{f}_n$ depend on weak dependence properties of the sequence $(X_t)_{t\in\mathbb{Z}}$ through the constant. If these properties are unknown, we propose cross-validation procedures to get new estimators. These procedures are illustrated via simulations of dynamical systems and non causal infinite moving averages. We also discuss the efficiency of our estimators with respect to the decrease of covariances bounds."
to:NB  statistics  density_estimation  wavelets  time_series  statistical_inference_for_stochastic_processes 
12 weeks ago by cshalizi
[1202.1561] Tree Models for Difference and Change Detection in a Complex Environment
"A new family of tree models is proposed, which we call "differential trees." A differential tree model is constructed from multiple data sets and aims to detect distributional differences between them. The new methodology differs from the existing difference and change detection techniques in its nonparametric nature, model construction from multiple data sets, and applicability to high-dimensional data. Through a detailed study of an arson case in New Zealand, where an individual is known to have been laying vegetation fires within a certain time period, we illustrate how these models can help detect changes in the frequencies of event occurrences and uncover unusual clusters of events in a complex environment."

--- After reading, I think their exposition is needlessly hard to follow, but let me take a stab at it. In an ordinary classification tree, we are interested in the distribution of the class labels Y given the predictors X, i.e., Pr(Y|X), and make splits on X so that (in essence) the conditional entropy H[Y|X] becomes small. This is of course equivalent to making splits so that the average divergence of Pr(Y|X) from Pr(Y) is maximized, since that average is the mutual information I[X;Y] = H[Y] - H[Y|X]. What they are interested in is not classification but _describing_ how the different classes are distinct, so the relevant distribution is Pr(X|Y), and they want a big (average) divergence between Pr(X) and Pr(X|Y), which is, by the symmetry of mutual information, the same quantity.
to:NB  re:network_differences  statistics  hypothesis_testing  density_estimation  decision_trees  have_read  data_mining  two-sample_tests 
february 2012 by cshalizi
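The differential-trees note above rests on a symmetry: averaged over X, the divergence of Pr(Y|X) from Pr(Y) equals, averaged over Y, the divergence of Pr(X|Y) from Pr(X), and both are the mutual information I[X;Y]. The sketch below checks this numerically for a discrete joint table. It assumes X has already been discretized (say, into the cells of a candidate split) and that Y labels the data set; it illustrates the identity only and is not the authors' differential-tree algorithm.

```python
import math


def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) in nats for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def classification_view(joint):
    """Average over X of KL(Pr(Y|X) || Pr(Y)); joint[x][y] = Pr(X=x, Y=y)."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return sum(px[x] * kl([pxy / px[x] for pxy in joint[x]], py)
               for x in range(len(joint)) if px[x] > 0)


def description_view(joint):
    """Average over Y of KL(Pr(X|Y) || Pr(X)): the 'describe the differences' view."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    cols = list(zip(*joint))
    return sum(py[y] * kl([pxy / py[y] for pxy in cols[y]], px)
               for y in range(len(cols)) if py[y] > 0)


# Rows: two cells of a candidate split on X; columns: two data sets Y.
split = [[0.3, 0.1],
         [0.2, 0.4]]
print(classification_view(split), description_view(split))  # equal: both are I(X;Y)
```

A split that makes the two data sets look alike inside every cell (an independent table) scores zero under both views.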
f-Divergence Estimation and Two-Sample Homogeneity Test Under Semiparametric Density-Ratio Models
"A density ratio is defined by the ratio of two probability densities. We study the inference problem of density ratios and apply a semiparametric density-ratio estimator to the two-sample homogeneity test. In the proposed test procedure, the $f$-divergence between two probability densities is estimated using a density-ratio estimator. The $f$ -divergence estimator is then exploited for the two-sample homogeneity test. We derive an optimal estimator of $f$-divergence in the sense of the asymptotic variance in a semiparametric setting, and provide a statistic for two-sample homogeneity test based on the optimal estimator. We prove that the proposed test dominates the existing empirical likelihood score test. Through numerical studies, we illustrate the adequacy of the asymptotic theory for finite-sample inference."
to:NB  statistics  density_estimation  information_theory  hypothesis_testing  two-sample_tests 
february 2012 by cshalizi
Giné, Nickl: Rates of contraction for posterior distributions in L^r-metrics, 1 ≤ r ≤ ∞
"The frequentist behavior of nonparametric Bayes estimates, more specifically, rates of contraction of the posterior distributions to shrinking L^r-norm neighborhoods, 1 ≤ r ≤ ∞, of the unknown parameter, are studied. A theorem for nonparametric density estimation is proved under general approximation-theoretic assumptions on the prior. The result is applied to a variety of common examples, including Gaussian process, wavelet series, normal mixture and histogram priors. The rates of contraction are minimax-optimal for 1 ≤ r ≤ 2, but deteriorate as r increases beyond 2. In the case of Gaussian nonparametric regression a Gaussian prior is devised for which the posterior contracts at the optimal rate in all L^r-norms, 1 ≤ r ≤ ∞."
in_NB  bayesian_consistency  statistics  nonparametrics  learning_theory  re:bayes_as_evol  density_estimation  regression 
january 2012 by cshalizi
Nonparametric Tests for Homogeneity Based on Non-Bipartite Matching
"Given a sequence of observations, has a change occurred in the underlying probability distribution with respect to observation order? This problem of detecting change points arises in a variety of applications including health prognostics for mechanical systems, syndromic disease surveillance in geographically dispersed populations, anomaly detection in information networks, and multivariate process control in general. Detecting change points in high-dimensional settings is challenging, and most change-point methods for multidimensional problems rely upon distributional assumptions or the use of observation history to model probability distributions. We present three new nonparametric statistical tests for heterogeneity based on the combinatorial properties of minimum non-bipartite matching (MNBM). The key idea underlying each of these tests is that if a sequence of independent random observations undergoes a change in distribution—either an abrupt “shift” or a gradual “drift”—a MNBM based on inter-point distances tends to produce pairings that are closer in the sequence labeling than would be the case if the observations were drawn from the same distribution. Our tests follow on the work of Rosenbaum (2005) who used MNBM to derive a simple cross-match test statistic for the two-sample problem based on this idea. Similar ideas are present in the minimum spanning tree (MST) test derived by Friedman and Rafsky (1979, 1981). We extend these approaches by utilizing ensembles of orthogonal MNBMs which greatly increase information extraction from the data, leading to tests that compare favorably to parametric procedures while maintaining level and good power properties across distributions."
to:NB  statistics  hypothesis_testing  density_estimation  change-point_problem  two-sample_tests 
january 2012 by cshalizi
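Rosenbaum's cross-match statistic, which the abstract above builds on, pools the two samples, finds a minimum-total-distance non-bipartite perfect matching, and counts the pairs that join one point from each sample; unusually few cross-matches suggest the distributions differ. The sketch below assumes Euclidean distance and uses an exact bitmask dynamic program that is only practical for small pooled samples (about 16 points or fewer). A real test would need a polynomial-time (blossom) matcher plus the null distribution of the count, and the paper's ensembles of orthogonal matchings are not sketched.

```python
import math
from functools import lru_cache


def min_weight_perfect_matching(points):
    """Exact minimum-total-distance non-bipartite perfect matching.

    Bitmask dynamic programming over subsets: exponential in len(points),
    so only usable for small pooled samples.
    """
    n = len(points)
    if n % 2:
        raise ValueError("need an even number of points")
    dist = [[math.dist(p, q) for q in points] for p in points]
    full = (1 << n) - 1

    @lru_cache(maxsize=None)
    def solve(matched):
        # Returns (cost, pairs) for optimally matching the points not in `matched`.
        if matched == full:
            return 0.0, ()
        i = next(k for k in range(n) if not (matched >> k) & 1)  # first unmatched point
        best = (math.inf, ())
        for j in range(i + 1, n):
            if not (matched >> j) & 1:
                cost, pairs = solve(matched | (1 << i) | (1 << j))
                cost += dist[i][j]
                if cost < best[0]:
                    best = (cost, ((i, j),) + pairs)
        return best

    return solve(0)[1]


def cross_match_count(sample_a, sample_b):
    """Number of matched pairs joining one point from each sample."""
    points = [tuple(p) for p in sample_a] + [tuple(p) for p in sample_b]
    labels = [0] * len(sample_a) + [1] * len(sample_b)
    return sum(labels[i] != labels[j] for i, j in min_weight_perfect_matching(points))


# Two well-separated clouds: every matched pair stays within one cloud.
a = [(0, 0), (0, 1), (1, 0), (1, 1)]
b = [(10, 10), (10, 11), (11, 10), (11, 11)]
print(cross_match_count(a, b))  # 0
```

Interleaving the samples instead (a on even integers, b on odd integers of a line) makes every matched pair a cross-match.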
[1201.0794] Sparse Nonparametric Graphical Models
"We present some nonparametric methods for graphical modeling. In the discrete case, where the data are binary or drawn from a finite alphabet, Markov random fields are already essentially nonparametric, since the cliques can take only a finite number of values. Continuous data are different. The Gaussian graphical model is the standard parametric model for continuous data, but it makes distributional assumptions that are often unrealistic. We discuss two approaches to building more flexible graphical models. One allows arbitrary graphs and a nonparametric extension of the Gaussian; the other uses kernel density estimation and restricts the graphs to trees and forests. Examples of both methods are presented. We also discuss possible future research directions for nonparametric graphical modeling."

(Review/good parts version of previous papers.)
in_NB  kith_and_kin  statistics  machine_learning  graphical_models  nonparametrics  density_estimation  wasserman.larry  liu.han  lafferty.john 
january 2012 by cshalizi
[1112.6390] Early Warning with Calibrated and Sharper Probabilistic Forecasts
"Given a nonlinear model, a probabilistic forecast may be obtained by Monte Carlo simulations. At a given forecast horizon, Monte Carlo simulations yield sets of discrete forecasts, which can be converted to density forecasts. The resulting density forecasts will inevitably be downgraded by model mis-specification. In order to enhance the quality of the density forecasts, one can mix them with the unconditional density. This paper examines the value of combining conditional density forecasts with the unconditional density. The findings have positive implications for issuing early warnings in different disciplines including economics and meteorology, but UK inflation forecasts are considered as an example." --- Better than conformal predictors?
to:NB  prediction  statistics  ensemble_methods  density_estimation 
december 2011 by cshalizi
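Mixing a conditional density forecast with the unconditional density can be sketched as choosing the weight w in w * conditional + (1 - w) * unconditional. The sketch assumes w is picked by maximizing the average logarithmic score over past forecast/outcome pairs on a grid; the paper's own weighting criterion may differ, and both densities are assumed to have been evaluated at the realized outcomes already.

```python
import math


def best_mixture_weight(cond_dens, uncond_dens, grid_size=101):
    """Grid-search the weight w in [0, 1] of w * conditional + (1 - w) * unconditional.

    cond_dens[t] and uncond_dens[t] are the two forecast densities evaluated at the
    realized outcome y_t; the score is the average log density (log score).
    """
    best_w, best_score = 0.0, -math.inf
    for k in range(grid_size):
        w = k / (grid_size - 1)
        mixed = [w * c + (1 - w) * u for c, u in zip(cond_dens, uncond_dens)]
        if min(mixed) <= 0:
            continue  # log score is -infinity at this weight
        score = sum(math.log(m) for m in mixed) / len(mixed)
        if score > best_score:
            best_w, best_score = w, score
    return best_w, best_score


# An overconfident conditional forecast: sharp on the first outcome, badly wrong on the second.
w, score = best_mixture_weight(cond_dens=[2.0, 0.001], uncond_dens=[0.3, 0.3])
print(w)  # 0.41
```

When the conditional forecast is better at every outcome the search returns w = 1; the unconditional density only earns weight by insuring against the conditional model's misses.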
[1111.1418] Efficient Nonparametric Conformal Prediction Regions
Yay, it's out! "We investigate and extend the conformal prediction method due to Vovk, Gammerman and Shafer (2005) to construct nonparametric prediction regions. These regions have guaranteed distribution free, finite sample coverage, without any assumptions on the distribution or the bandwidth. Explicit convergence rates of the loss function are established for such regions under standard regularity conditions. Approximations for simplifying implementation and data driven bandwidth selection methods are also discussed. The theoretical properties of our method are demonstrated through simulations."
in_NB  prediction  statistics  confidence_sets  nonparametrics  kith_and_kin  wasserman.larry  robins.james  have_read  density_estimation 
november 2011 by cshalizi
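A conformal prediction region built from a density estimate can be sketched with a split (train/calibration) variant, which is simpler than the full conformal construction the paper extends. Assumptions: a one-dimensional response, a Gaussian kernel with fixed bandwidth h, and evaluation of the region on a grid. Under exchangeability, including every y whose estimated density is at least the k-th smallest calibration density, with k = floor(alpha * (m + 1)), gives coverage of at least 1 - alpha whatever the bandwidth.

```python
import math
import random


def gaussian_kde(train, h):
    """Return a 1-D Gaussian kernel density estimate with fixed bandwidth h."""
    norm = len(train) * h * math.sqrt(2 * math.pi)

    def p(y):
        return sum(math.exp(-0.5 * ((y - t) / h) ** 2) for t in train) / norm

    return p


def split_conformal_region(data, alpha, h, grid):
    """Grid points in a split-conformal prediction set with coverage >= 1 - alpha.

    The first half of `data` fits the KDE; the second half calibrates the threshold.
    The set is a density level set {y : p(y) >= threshold}.
    """
    half = len(data) // 2
    train, calib = data[:half], data[half:]
    p = gaussian_kde(train, h)
    scores = sorted(p(y) for y in calib)
    k = math.floor(alpha * (len(calib) + 1))
    if k < 1:
        return list(grid)  # too few calibration points: the set must be everything
    threshold = scores[k - 1]  # k-th smallest calibration density
    return [y for y in grid if p(y) >= threshold]


rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(200)]
grid = [i / 10 for i in range(-100, 101)]
region = split_conformal_region(data, alpha=0.1, h=0.5, grid=grid)
print(min(region), max(region))  # endpoints of a set around 0
```

Splitting wastes half the data on calibration; the full conformal version refits with each candidate y and avoids that, at much higher computational cost.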
CAKE: Convex Adaptive Kernel Density Estimation
"In this paper we present a generalization of kernel density estimation called Convex Adaptive Kernel Density Estimation (CAKE) that replaces single bandwidth selection by a convex aggregation of kernels at all scales, where the convex aggregation is allowed to vary from one training point to another, treating the fundamental problem of heterogeneous smoothness in a novel way. Learning the CAKE estimator given a training set reduces to solving a single convex quadratic programming problem. We derive rates of convergence of CAKE-like estimator to the true underlying density under smoothness assumptions on the class and show that given a sufficiently large sample the mean squared error of such estimators is optimal in a minimax sense. We also give a risk bound of the CAKE estimator in terms of its empirical risk. We empirically compare CAKE to other density estimators proposed in the statistics literature for handling heterogeneous smoothness on different synthetic and natural distributions."
to:NB  have_read  density_estimation  ensemble_methods  kernel_estimators  statistics 
november 2011 by cshalizi
Density Estimation in Several Populations With Uncertain Population Membership
"We devise methods to estimate probability density functions of several populations using observations with uncertain population membership, meaning from which population an observation comes is unknown. The probability of an observation being sampled from any given population can be calculated. We develop general estimation procedures and bandwidth selection methods for our setting. We establish large-sample properties and study finite-sample performance using simulation studies. We illustrate our methods with data from a nutrition study."
in_NB  density_estimation  mixture_models  to_teach:undergrad-ADA  to_teach:data-mining 
october 2011 by cshalizi
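A natural baseline for the uncertain-membership setting is a weighted kernel density estimate per population, using each observation's membership probability as its weight. The sketch assumes the membership probabilities are known, a Gaussian kernel, and one shared bandwidth h; it is not necessarily the estimator or bandwidth selector developed in the paper.

```python
import math


def weighted_kde(xs, weights, h):
    """Gaussian KDE in which observation i contributes with weight weights[i]."""
    norm = sum(weights) * h * math.sqrt(2 * math.pi)

    def f(x):
        return sum(w * math.exp(-0.5 * ((x - xi) / h) ** 2)
                   for xi, w in zip(xs, weights)) / norm

    return f


def population_densities(xs, membership, h):
    """membership[i][k] = probability that observation i came from population k.

    Returns one weighted KDE per population, using column k of `membership`
    as the observation weights.
    """
    n_pop = len(membership[0])
    return [weighted_kde(xs, [row[k] for row in membership], h) for k in range(n_pop)]


xs = [-1.0, 0.0, 1.0, 5.0]
membership = [[1.0, 0.0], [0.8, 0.2], [0.5, 0.5], [0.0, 1.0]]
f0, f1 = population_densities(xs, membership, h=0.5)
print(f0(0.0) > f1(0.0), f1(5.0) > f0(5.0))  # True True
```

Dividing by the summed weights keeps each estimate a proper density; with 0/1 memberships it reduces to an ordinary KDE of each population's own observations.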
[1107.3133] Robust Kernel Density Estimation
"We propose a method for nonparametric density estimation that exhibits robustness to contamination of the training sample. This method achieves robustness by combining a traditional kernel density estimator (KDE) with ideas from classical M-estimation. We interpret the KDE based on a radial, positive semi-definite kernel as a sample mean in the associated reproducing kernel Hilbert space. Since the sample mean is sensitive to outliers, we estimate it robustly via M-estimation, yielding a robust kernel density estimator (RKDE). An RKDE can be computed efficiently via a kernelized iteratively re-weighted least squares (IRWLS) algorithm. Necessary and sufficient conditions are given for kernelized IRWLS to converge to the global minimizer of the M-estimator objective function. The robustness of the RKDE is demonstrated with a representer theorem, the influence function, and experimental results for density estimation and anomaly detection."
density_estimation  statistics  robust_statistics  to:NB  to_read 
july 2011 by cshalizi
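The kernelized IRWLS idea in the abstract can be sketched as follows: the KDE is a weighted mean of feature maps Phi(X_i), each point's residual is its RKHS distance ||Phi(X_i) - f||, computable from kernel evaluations alone, and Huber-type reweighting shrinks the weight of points far from the current estimate. Assumptions: a 1-D normalized Gaussian kernel, a Huber loss, and the Huber threshold set to the median residual of the ordinary KDE (one heuristic among several).

```python
import math


def gauss_kernel(x, y, h):
    """Normalized 1-D Gaussian kernel (a density in x with bandwidth h)."""
    return math.exp(-0.5 * ((x - y) / h) ** 2) / (h * math.sqrt(2 * math.pi))


def rkhs_residuals(K, w):
    """RKHS distances ||Phi(x_i) - f|| for f = sum_j w_j Phi(x_j), via kernel values only."""
    n = len(w)
    quad = sum(w[j] * w[l] * K[j][l] for j in range(n) for l in range(n))
    out = []
    for i in range(n):
        cross = sum(w[j] * K[i][j] for j in range(n))
        out.append(math.sqrt(max(K[i][i] - 2 * cross + quad, 0.0)))
    return out


def rkde_weights(xs, h, a=None, iters=200, tol=1e-10):
    """Weights of a robust KDE from kernelized IRWLS with a Huber loss.

    Assumption: if `a` (the Huber threshold) is None, it is set to the median
    residual of the ordinary KDE.
    """
    n = len(xs)
    K = [[gauss_kernel(xi, xj, h) for xj in xs] for xi in xs]
    w = [1.0 / n] * n  # start from the ordinary KDE (equal weights)
    if a is None:
        a = sorted(rkhs_residuals(K, w))[n // 2]
    for _ in range(iters):
        r = rkhs_residuals(K, w)
        # Huber: psi(r)/r is 1 inside the threshold and a/r outside it.
        phi = [1.0 if ri <= a else a / ri for ri in r]
        total = sum(phi)
        new_w = [p / total for p in phi]
        converged = max(abs(u - v) for u, v in zip(new_w, w)) < tol
        w = new_w
        if converged:
            break
    return w


xs = [-0.5, -0.2, 0.0, 0.1, 0.3, 0.4, 8.0]  # the last point is an outlier
w = rkde_weights(xs, h=0.5)
print([round(v, 3) for v in w])  # the outlier gets the smallest weight
```

The robust density estimate is then sum_i w[i] * gauss_kernel(x, xs[i], h), i.e. an ordinary KDE with unequal weights.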
Efficient probabilistic forecasts for counts - McCabe et al., 2011 - JRSS-B
"Efficient probabilistic forecasts of integer-valued random variables are derived. The optimality is achieved by estimating the forecast distribution non-parametrically over a given broad model class and proving asymptotic (non-parametric) efficiency in that setting. The method is developed within the context of the integer auto-regressive class of models, which is a suitable class for any count data that can be interpreted as a queue, stock, birth-and-death process or branching process. The theoretical proofs of asymptotic efficiency are supplemented by simulation results that demonstrate the overall superiority of the non-parametric estimator relative to a misspecified parametric alternative, in large but finite samples. The method is applied to counts of stock market iceberg orders. A subsampling method is used to assess sampling variation in the full estimated forecast distribution and a proof of its validity is given." (Dunno about the to_teach tags, I haven't read this yet.)
statistics  prediction  density_estimation  time_series  stochastic_processes  branching_processes  to_teach:data-mining  to_teach:undergrad-ADA 
march 2011 by cshalizi
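The integer auto-regressive model of order 1, INAR(1), uses binomial thinning: X_{t+1} = alpha o X_t + eps_{t+1}, where alpha o X_t is Binomial(X_t, alpha), so each of the X_t units independently "survives" with probability alpha. Given alpha and an innovation distribution, the one-step forecast distribution is the convolution of the two. The sketch assumes both are already known; the innovation pmf below is hypothetical, and the paper's efficient nonparametric estimator of the forecast distribution is not reproduced.

```python
from math import comb


def inar1_forecast_pmf(x_t, alpha, innov_pmf, max_count):
    """One-step forecast pmf P(X_{t+1} = k | X_t = x_t) for k = 0..max_count.

    INAR(1): X_{t+1} = alpha o X_t + eps_{t+1}, where the thinning alpha o X_t is
    Binomial(X_t, alpha) and eps has pmf innov_pmf (a dict count -> probability).
    """
    pmf = []
    for k in range(max_count + 1):
        prob = 0.0
        for j in range(min(k, x_t) + 1):  # j = number of survivors from X_t
            survive = comb(x_t, j) * alpha ** j * (1 - alpha) ** (x_t - j)
            prob += survive * innov_pmf.get(k - j, 0.0)
        pmf.append(prob)
    return pmf


# Hypothetical innovation pmf; in practice it would be estimated from the data.
innov = {0: 0.5, 1: 0.3, 2: 0.2}
forecast = inar1_forecast_pmf(x_t=3, alpha=0.5, innov_pmf=innov, max_count=5)
print(round(sum(forecast), 12), forecast[0])  # 1.0 0.0625
```

Because the support here is at most x_t + 2 = 5, the pmf sums to one; with unbounded innovations, max_count has to be large enough to capture nearly all the mass.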
Giné, Nickl: Adaptive estimation of a distribution function and its density in sup-norm loss by wavelet and spline projections
"Given an i.i.d. sample from a distribution F on ℝ with uniformly continuous density p0, purely data-driven estimators are constructed that efficiently estimate F in sup-norm loss and simultaneously estimate p0 at the best possible rate of convergence over Hölder balls, also in sup-norm loss. The estimators are obtained by applying a model selection procedure close to Lepski’s method with random thresholds to projections of the empirical measure onto spaces spanned by wavelets or B-splines. The random thresholds are based on suprema of Rademacher processes indexed by wavelet or spline projection kernels. This requires Bernstein-type analogs of the inequalities in Koltchinskii [Ann. Statist. 34 (2006) 2593–2656] for the deviation of suprema of empirical processes from their Rademacher symmetrizations."
learning_theory  density_estimation  empirical_processes 
november 2010 by cshalizi
Botev, Grotowski, Kroese: Kernel density estimation via diffusion
"We present a new adaptive kernel density estimator based on linear diffusion processes. The proposed estimator builds on existing ideas for adaptive smoothing by incorporating information from a pilot density estimate. In addition, we propose a new plug-in bandwidth selection method that is free from the arbitrary normal reference rules used by existing methods. We present simulation examples in which the proposed approach outperforms existing methods in terms of accuracy and reliability."
density_estimation  kernel_estimators  stochastic_processes  statistics 
august 2010 by cshalizi
"Dimension Reduction and Adaptation in Conditional Density Estimation" - Journal of the American Statistical Association - 105(490):761
Sounds neat: "An orthogonal series estimator of the conditional density of a response given a vector of continuous and ordinal/nominal categorical predictors ... The estimator is based on writing a conditional density as a sum of orthogonal projections on all possible subspaces of reduced dimensionality and then estimating each projection via a shrinkage procedure [which] uses a universal thresholding and a dyadic-blockwise shrinkage for low and high frequencies, respectively. The estimator is data-driven, is adaptive to underlying smoothness of a conditional density, and attains a minimax rate of the mean integrated squared error convergence. ... if a conditional density depends only on a subgroup of predictors, then the estimator seizes the opportunity and attains a corresponding minimax rate of convergence. The latter property relaxes the notorious “curse of dimensionality.” ... the estimator is fast, because neither projections nor shrinkages are computation-intensive."
density_estimation  dimension_reduction  statistics  estimation  shrinkage 
july 2010 by cshalizi
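The orthogonal-series idea behind this estimator can be sketched in its simplest form: an unconditional, one-dimensional density on [0, 1] expanded in the cosine basis, with coefficients estimated by sample means and small ones zeroed by hard thresholding. The paper's estimator is for conditional densities, works over subspaces of predictors, and uses universal plus dyadic-blockwise shrinkage, none of which is reproduced here; the threshold value is left to the caller.

```python
import math


def cosine_series_density(xs, n_terms, threshold=None):
    """Orthogonal-series density estimate on [0, 1] with the cosine basis.

    phi_0 = 1 and phi_j(x) = sqrt(2) cos(pi j x); theta_j is the sample mean of
    phi_j(X_i). Coefficients with |theta_j| < threshold are set to zero (hard
    thresholding).
    """
    n = len(xs)

    def phi(j, x):
        return math.sqrt(2) * math.cos(math.pi * j * x)

    coefs = []
    for j in range(1, n_terms + 1):
        theta = sum(phi(j, x) for x in xs) / n
        if threshold is not None and abs(theta) < threshold:
            theta = 0.0
        coefs.append(theta)

    def f(x):
        # theta_0 = 1 because phi_0 integrates to 1 against any density on [0, 1]
        return 1.0 + sum(t * phi(j, x) for j, t in enumerate(coefs, start=1))

    return f, coefs


xs = [0.05, 0.1, 0.1, 0.15, 0.2, 0.3]
f, coefs = cosine_series_density(xs, n_terms=5)
print(f(0.05) > f(0.95))  # True: the sample is concentrated near 0
```

Every basis function beyond phi_0 integrates to zero on [0, 1], so the estimate always integrates to one, although a truncated series can dip below zero.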
Steif: Consistent estimation of joint distributions for sufficiently mixing random fields
"The joint distribution of a d-dimensional random field restricted to a box of size k can be estimated by looking at a realization in a box of size $n \gg k$ and computing the empirical distribution. This is done by sliding a box of size k around in the box of size n and computing frequencies. We show that when $k = k(n)$ grows as a function of n, then the total variation distance between this empirical distribution and the true distribution goes to 0 a.s. as $n \to \infty$ provided $k(n)^d \leq (\log n^d)/(H + \varepsilon)$ (where H is the entropy of the random field) and providing the random field satisfies a condition called quite weak Bernoulli with exponential rate. ... Marton and Shields have proved such results in one dimension and this paper is an attempt to extend their results to some extent to higher dimensions."
statistics  information_theory  random_fields  estimation  density_estimation  entropy  mixing  to_read  statistical_inference_for_stochastic_processes 
may 2010 by cshalizi
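In one dimension (d = 1, the Marton and Shields setting), the sliding-box construction in the abstract above reduces to counting length-k blocks in a window slid along the sequence, and the growth condition becomes k(n) <= log n / (H + epsilon). The sketch assumes the entropy is measured in nats, so the natural logarithm is used; in d dimensions the same counting would run over k x ... x k boxes.

```python
import math
from collections import Counter


def k_block_empirical(seq, k):
    """Empirical distribution of length-k blocks, from a window slid along seq (d = 1)."""
    n_windows = len(seq) - k + 1
    counts = Counter(tuple(seq[i:i + k]) for i in range(n_windows))
    return {block: c / n_windows for block, c in counts.items()}


def max_block_length(n, entropy_rate, eps):
    """Largest k with k <= log(n) / (H + eps), the d = 1 form of the quoted condition.

    Assumes entropy_rate is in nats, so the natural logarithm is used.
    """
    return math.floor(math.log(n) / (entropy_rate + eps))


seq = [0, 1] * 50
print(k_block_empirical(seq, 2))  # (0, 1) -> 50/99, (1, 0) -> 49/99
print(max_block_length(1000, math.log(2), 0.1))  # 8
```

For a fair-coin process (H = log 2 nats) only about 8-symbol blocks can be estimated reliably from 1000 observations, which is why the block length must grow only logarithmically.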
Giné, Nickl: Confidence bands in density estimation
"Given a sample from some unknown continuous density f : ℝ→ℝ, we construct adaptive confidence bands that are honest for all densities in a “generic” subset of the union of t-Hölder balls, 0
statistics  estimation  confidence_sets  density_estimation 
february 2010 by cshalizi
[0909.0999] Adaptive density estimation for stationary processes
"We propose an algorithm to estimate the common density $s$ of a stationary process $X_1,...,X_n$. We suppose that the process is either $\beta$ or $\tau$-mixing. We provide a model selection procedure based on a generalization of Mallows' $C_p$ and we prove oracle inequalities for the selected estimator under a few prior assumptions on the collection of models and on the mixing coefficients. We prove that our estimator is adaptive over a class of Besov spaces, namely, we prove that it achieves the same rates of convergence as in the i.i.d. framework."
statistical_inference_for_stochastic_processes  density_estimation  statistics 
september 2009 by cshalizi
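Penalized model selection of the C_p kind can be sketched for regular histograms on [0, 1]: for D bins with counts N_j, the least-squares contrast of the histogram estimate is -D * sum(N_j^2) / n^2, and a penalty proportional to D / n is added. The constant c below is a hypothetical default; the paper's penalty depends on the mixing coefficients and is not reproduced here.

```python
def select_histogram(xs, max_bins, c=2.0):
    """Choose the number of bins D of a regular histogram on [0, 1] by penalized least squares.

    Criterion: -||s_D||^2 + c * D / n, where ||s_D||^2 = D * sum(N_j^2) / n^2 is the
    squared L2 norm of the histogram estimate with bin counts N_j.
    The constant c is a hypothetical default, not a mixing-dependent penalty.
    """
    n = len(xs)
    best = None
    for D in range(1, max_bins + 1):
        counts = [0] * D
        for x in xs:
            counts[min(int(x * D), D - 1)] += 1  # bin index; x = 1 goes in the last bin
        crit = -D * sum(N * N for N in counts) / n ** 2 + c * D / n
        if best is None or crit < best[0]:
            best = (crit, D, counts)
    _, D, counts = best
    heights = [N * D / n for N in counts]  # histogram density: N_j / (n * bin width)
    return D, heights


xs = [i / 200 for i in range(100)]  # evenly spread over [0, 0.5)
D, heights = select_histogram(xs, max_bins=10)
print(D, heights)  # 2 [2.0, 0.0]
```

Finer partitions such as D = 4 fit these data equally well but pay a larger penalty, so the criterion settles on the coarsest histogram that captures the support.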
[0908.3856] Self-consistent method for density estimation
Physicists discovering non-parametric density estimation. It's a cute idea, but I am not comfortable with anything which can give me a negative density estimate.
density_estimation  statistics  nonparametrics  have_read 
august 2009 by cshalizi
Uniform Convergence Rates for Kernel Estimation with Dependent Data
"This paper presents a set of rate of uniform consistency results for kernel estimators of density functions and regression functions. We generalize the existing literature by allowing for stationary strong mixing multivariate data with infinite support, and kernels with unbounded support, and general bandwidth sequences. These results are useful for semiparametric estimation based on a first stage nonparametric estimator."
kernel_estimators  mixing  statistical_inference_for_stochastic_processes  statistics  density_estimation  regression  hansen.bruce 
june 2009 by cshalizi
[0906.2885] Noisy Independent Factor Analysis Model for Density Estimation and Classification
"We consider the problem of multivariate density estimation when the unknown density is assumed to follow a particular form of dimensionality reduction, a noisy independent factor analysis (IFA) model. In this model the data are generated by a number of latent independent components having unknown distributions and are observed in Gaussian noise. We do not assume that either the number of components or the matrix mixing the components are known. We show that the densities of this form can be estimated with a fast rate"
factor_analysis  density_estimation  statistics  to_read  re:g_paper 
june 2009 by cshalizi
A Note on the Richness of the Convex Hulls of VC Classes
"We prove the existence of a class A of subsets of ℝ^d of VC dimension 1 such that the symmetric convex hull F of the class of characteristic functions of sets in A is rich in the following sense. For any absolutely continuous probability measure μ on ℝ^d, measurable set B and ε > 0, there exists a function f in F such that the measure of the symmetric difference of B and the set where f is positive is less than ε. The question was motivated by the investigation of the theoretical properties of certain algorithms in machine learning." --- I see it, but I don't believe it! (The proof would seem to extend to arbitrary complete separable metric spaces, not just ℝ^d.)
learning_theory  density_estimation  statistics  have_read  vc-dimension  analysis  measure_theory 
march 2009 by cshalizi
[0901.2044] Spades and Mixture Models
"This paper studies sparse density estimation via l1 penalization (SPADES). We focus on estimation in high-dimensional mixture models and nonparametric adaptive density estimation. We show, respectively, that SPADES can recover, with high probability, the unknown components of a mixture of probability densities and that it yields minimax adaptive density estimates."
density_estimation  sparsity  mixture_models  to_read  nonparametrics  statistics  in_NB 
january 2009 by cshalizi
Cross-Validation and the Estimation of Conditional Probability Densities
Nice. Definitely needs to be included next time I teach data-mining. (The method is implemented in the "np" package on CRAN.) In particular worth comparing to logistic regression and logistic GAMs for binary conditional probability estimation/classification.
statistics  density_estimation  kernel_methods  cross-validation  to_teach:data-mining  have_read  to_teach:undergrad-ADA 
october 2008 by cshalizi
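Cross-validation for conditional densities can be sketched with a product-Gaussian-kernel estimator, f_hat(y|x) = sum_j K((x - X_j)/h_x) K((y - Y_j)/h_y) / h_y divided by sum_j K((x - X_j)/h_x), choosing (h_x, h_y) to maximize the leave-one-out log likelihood over a grid. Assumptions: one continuous predictor, one continuous response, and a hand-picked bandwidth grid. The np package uses its own numerical optimization and handles mixed data types, so this is not its implementation.

```python
import math
import random


def gauss(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)


def loo_log_likelihood(xs, ys, hx, hy):
    """Average leave-one-out log f_hat(y_i | x_i) for a product-Gaussian-kernel estimator."""
    n = len(xs)
    total = 0.0
    for i in range(n):
        num = den = 0.0
        for j in range(n):
            if j == i:
                continue
            wx = gauss((xs[i] - xs[j]) / hx)
            num += wx * gauss((ys[i] - ys[j]) / hy) / hy
            den += wx
        if num <= 0.0 or den <= 0.0:
            return -math.inf  # kernel weights underflowed: bandwidths far too small
        total += math.log(num / den)
    return total / n


def select_bandwidths(xs, ys, grid):
    """Grid search for (hx, hy) maximizing the leave-one-out log likelihood."""
    return max(((hx, hy) for hx in grid for hy in grid),
               key=lambda b: loo_log_likelihood(xs, ys, *b))


rng = random.Random(1)
xs = [rng.uniform(0.0, 1.0) for _ in range(100)]
ys = [2.0 * x + rng.gauss(0.0, 0.1) for x in xs]
hx, hy = select_bandwidths(xs, ys, grid=[0.05, 0.1, 0.2, 0.5, 1.0, 5.0])
print(hx, hy)
```

A very large h_x makes the estimator ignore x and collapse to the marginal density of y, so the cross-validated score directly measures how much conditioning helps, which is the natural point of comparison with logistic regression in the binary case.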

related tags

analysis  barron.andrew  bayesian_consistency  bayesian_nonparametrics  books:noted  branching_processes  change-point_problem  chow-liu_trees  clustering  computational_statistics  confidence_sets  cross-validation  data_mining  decision_trees  density_estimation  density_ratio_estimation  dimension_reduction  earthquakes  empirical_processes  ensemble_methods  entropy  ergodic_theory  estimation  exponential_families  factor_analysis  geology  graphical_models  hansen.bruce  hardle.wolfgang  have_read  hayfield.tristen  heard  heavy_tails  histograms  hypothesis_testing  indirect_inference  information_theory  interview  in_NB  kernel_estimators  kernel_methods  kith_and_kin  lafferty.john  laplace_approximation  learning_theory  lei.jing  liu.han  lives_of_the_scientists  machine_learning  manifold_learning  measure_theory  minimax  misspecification  mixing  mixture_models  model_selection  nonparametrics  political_science  prediction  R  racine.jeffrey  random_fields  re:bayes_as_evol  re:g_paper  re:network_differences  re:stacs  re:your_favorite_dsge_sucks  regression  rinaldo.alessandro  robins.james  robust_statistics  rosenblatt.murray  self-promotion  sheu.chyong-hwa  shrinkage  smoothing  sparsity  spatial_statistics  splines  stability_of_learning  statistical_inference_for_stochastic_processes  statistics  stochastic_approximation  stochastic_processes  sugiyama.masashi  survival_analysis  time_series  to:blog  to:NB  to_read  to_teach:complexity-and-inference  to_teach:data-mining  to_teach:undergrad-ADA  two-sample_tests  universal_prediction  vc-dimension  wasserman.larry  wavelets 
