cshalizi + hypothesis_testing   94

Likelihood inference for discriminating between long-memory and change-point models - Yau - 2012 - Journal of Time Series Analysis - Wiley Online Library
"We develop a likelihood ratio (LR) test procedure for discriminating between a short-memory time series with a change-point (CP) and a long-memory (LM) time series. Under the null hypothesis, the time series consists of two segments of short-memory time series with different means and possibly different covariance functions. The location of the shift in the mean is unknown. Under the alternative, the time series has no shift in mean but rather is LM. The LR statistic is defined as the normalized log-ratio of the Whittle likelihood between the CP model and the LM model, which is asymptotically normally distributed under the null. The LR test provides a parametric alternative to the CUSUM test proposed by Berkes et al. (2006). Moreover, the LR test is more general than the CUSUM test in the sense that it is applicable to changes in other marginal or dependence features other than a change-in-mean. We show its good performance in simulations and apply it to two data examples."
to:NB  time_series  change-point_problem  long-range_dependence  statistics  to_teach:undergrad-ADA  hypothesis_testing 
13 days ago by cshalizi
[1204.1563] Generalized Error Exponents for Sparse Sample Goodness of Fit Tests
"We investigate the sparse sample goodness-of-fit problem, where the number of samples $n$ is smaller than the size of the alphabet $m$. The goal of this work is to find an appropriate criterion to analyze statistical tests in this setting. A suitable model for analysis is the high-dimensional model in which both $n$ and $m$ tend to infinity, and $n=o(m)$. We propose a new performance criterion based on large deviation analysis, which generalizes the classical error exponent applicable for large sample problems (in which $m=O(n)$). This new criterion provides insights that are not available from asymptotic consistency or CLT analysis. The main results are:
(i) The best achievable probability of error $P_e$ decays as $-\log(P_e)=(n^2/m)(1+o(1))J$ for some $J>0$.
(ii) A well-known coincidence-based test attains the optimal generalized error exponent.
(iii) The widely used Pearson's chi-square test has $J=0$.
(iv) The contributions (i)-(iii) are established under the assumption that the distribution under the null hypothesis is uniform. For the non-uniform case, a new test is proposed, with a non-zero generalized error exponent."
to:NB  hypothesis_testing  re:LICORS  statistics  large_deviations  goodness-of-fit 
6 weeks ago by cshalizi
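A minimal sketch of a coincidence-style test of the uniform null described in [1204.1563] above. It assumes the statistic is the number of pairwise collisions in the sample and calibrates it by Monte Carlo simulation; the paper's exact statistic and its large-deviation analysis are not reproduced here, and the names `collision_count` and `coincidence_test` are hypothetical.

```python
import numpy as np

def collision_count(sample):
    """Number of unordered pairs of observations that share a symbol."""
    _, counts = np.unique(sample, return_counts=True)
    return int(np.sum(counts * (counts - 1) // 2))

def coincidence_test(sample, m, n_sim=2000, seed=0):
    """Monte Carlo p-value for H0: uniform on {0, ..., m-1}.

    Many collisions count as evidence against H0, because the uniform
    distribution minimises the collision probability sum_i p_i^2.
    """
    rng = np.random.default_rng(seed)
    n = len(sample)
    observed = collision_count(sample)
    # Simulate the collision count under the uniform null.
    null = np.array([collision_count(rng.integers(0, m, size=n))
                     for _ in range(n_sim)])
    p_value = (1 + np.sum(null >= observed)) / (1 + n_sim)
    return observed, p_value
```

The sketch only needs the sample and the alphabet size, so it can be run when $n$ is much smaller than $m$ and most symbols are never observed.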
Taylor & Francis Online :: Statistical Inference on Random Graphs: Comparative Power Analyses via Monte Carlo - Journal of Computational and Graphical Statistics - Volume 20, Issue 2
"We present a comparative power analysis, via Monte Carlo, of various graph invariants used as statistics for testing graph homogeneity versus a “chatter” alternative—the existence of a local region of excessive activity. Our results indicate that statistical inference on random graphs, even in a relatively simple setting, can be decidedly nontrivial. We find that none of the graph invariants considered is uniformly most powerful throughout our space of alternatives. Code for reproducing all the simulation results presented in this article is available online."
to:NB  re:network_differences  statistics  hypothesis_testing  network_data_analysis 
8 weeks ago by cshalizi
[0803.2095] Properties of higher criticism under strong dependence
"The problem of signal detection using sparse, faint information is closely related to a variety of contemporary statistical problems, including the control of false-discovery rate, and classification using very high-dimensional data. Each problem can be solved by conducting a large number of simultaneous hypothesis tests, the properties of which are readily accessed under the assumption of independence. In this paper we address the case of dependent data, in the context of higher criticism methods for signal detection. Short-range dependence has no first-order impact on performance, but the situation changes dramatically under strong dependence. There, although higher criticism can continue to perform well, it can be bettered using methods based on differences of signal values or on the maximum of the data. The relatively inferior performance of higher criticism in such cases can be explained in terms of the fact that, under strong dependence, the higher criticism statistic behaves as though the data were partitioned into very large blocks, with all but a single representative of each block being eliminated from the dataset."
to:NB  statistics  hypothesis_testing  multiple_testing  stochastic_processes 
9 weeks ago by cshalizi
Kaiser , Lahiri , Nordman : Goodness of fit tests for a class of Markov random field models
"This paper develops goodness of fit statistics that can be used to formally assess Markov random field models for spatial data, when the model distributions are discrete or continuous and potentially parametric. Test statistics are formed from generalized spatial residuals which are collected over groups of nonneighboring spatial observations, called concliques. Under a hypothesized Markov model structure, spatial residuals within each conclique are shown to be independent and identically distributed as uniform variables. The information from a series of concliques can be then pooled into goodness of fit statistics. Under some conditions, large sample distributions of these statistics are explicitly derived for testing both simple and composite hypotheses, where the latter involves additional parametric estimation steps. The distributional results are verified through simulation, and a data example illustrates the method for model assessment."
to:NB  to_read  statistics  spatial_statistics  random_fields  goodness-of-fit  hypothesis_testing  re:stacs  markov_models 
10 weeks ago by cshalizi
Meinshausen , Maathuis , Bühlmann : Asymptotic optimality of the Westfall–Young permutation procedure for multiple testing under dependence
"Test statistics are often strongly dependent in large-scale multiple testing applications. Most corrections for multiplicity are unduly conservative for correlated test statistics, resulting in a loss of power to detect true positives. We show that the Westfall–Young permutation method has asymptotically optimal power for a broad class of testing problems with a block-dependence and sparsity structure among the tests, when the number of tests tends to infinity."
to:NB  statistics  multiple_testing  hypothesis_testing  buhlmann.peter 
12 weeks ago by cshalizi
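A minimal sketch of a permutation-based multiplicity adjustment in the spirit of the Westfall–Young entry above. It assumes a two-group comparison with Welch t-statistics and uses the simpler single-step maxT variant, which may differ from the procedure the paper analyses; `welch_t` and `westfall_young_maxt` are hypothetical names.

```python
import numpy as np

def welch_t(x, labels):
    """Column-wise Welch t-statistics for group 1 versus group 0."""
    a, b = x[labels == 1], x[labels == 0]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return (a.mean(axis=0) - b.mean(axis=0)) / se

def westfall_young_maxt(x, labels, n_perm=1000, seed=0):
    """Single-step maxT adjusted p-values from label permutations."""
    rng = np.random.default_rng(seed)
    observed = np.abs(welch_t(x, labels))
    max_null = np.empty(n_perm)
    for b in range(n_perm):
        # Permuting labels keeps the dependence between columns intact.
        max_null[b] = np.abs(welch_t(x, rng.permutation(labels))).max()
    # Adjusted p-value: share of permutations whose maximum statistic
    # is at least as large as each observed statistic.
    exceed = (max_null[:, None] >= observed[None, :]).sum(axis=0)
    return (1 + exceed) / (1 + n_perm)
```

Because every permutation recomputes all statistics jointly, the reference distribution of the maximum reflects the correlation among tests, which is why such corrections can be less conservative than Bonferroni.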
[1202.1377] Statistical significance in high-dimensional linear models
"We propose a method for constructing p-values for general hypotheses in a high-dimensional linear model. The hypotheses can be local for testing a single regression parameter or they may be more global involving several up to all parameters. Furthermore, when considering many hypotheses, we show how to adjust for multiple testing taking dependence among the p-values into account. Our technique is based on Ridge estimation with an additional correction term due to a substantial projection bias in high dimensions. We prove strong error control for our p-values and provide sufficient conditions for detection: for the former, we do not make any assumption on the size of the true underlying regression coefficients. We demonstrate the method in simulated examples and a real data application."
in_NB  statistics  regression  goodness-of-fit  hypothesis_testing  buhlmann.peter 
12 weeks ago by cshalizi
[1202.3775] Kernel-based Conditional Independence Test and Application in Causal Discovery
"Conditional independence testing is an important problem, especially in Bayesian network learning and causal discovery. Due to the curse of dimensionality, testing for conditional independence of continuous variables is particularly challenging. We propose a Kernel-based Conditional Independence test (KCI-test), by constructing an appropriate test statistic and deriving its asymptotic distribution under the null hypothesis of conditional independence. The proposed method is computationally efficient and easy to implement. Experimental results show that it outperforms other methods, especially when the conditioning set is large or the sample size is not very large, in which case other methods encounter difficulties."
statistics  kernel_estimators  independence_testing  hypothesis_testing  causal_inference  in_NB  have_read  to:blog  to_teach:undergrad-ADA 
12 weeks ago by cshalizi
[0809.1053] An impossibility result for process discrimination
"Two series of binary observations $x_1,x_2,...$ and $y_1,y_2,...$ are presented: at each time $n\in\mathbb{N}$ we are given $x_n$ and $y_n$. It is assumed that the sequences are generated independently of each other by two B-processes. We are interested in the question of whether the sequences represent a typical realization of two different processes or of the same one. We demonstrate that this is impossible to decide, in the sense that every discrimination procedure is bound to err with non-negligible frequency when presented with sequences from some B-processes. This contrasts earlier positive results on B-processes, in particular those showing that there are consistent $\bar d$-distance estimates for this class of processes."
to:NB  statistics  time_series  stochastic_processes  ergodic_theory  statistical_inference_for_stochastic_processes  hypothesis_testing 
12 weeks ago by cshalizi
[0810.2276] A generalized portmanteau test of independence between two stationary time series
"We propose generalized portmanteau-type test statistics in the frequency domain to test independence between two stationary time series. The test statistics are formed analogous to the one in Chen and Deo (2004, Econometric Theory 20, 382-416), who extended the applicability of portmanteau goodness-of-fit test to the long memory case. Under the null hypothesis of independence, the asymptotic standard normal distributions of the proposed statistics are derived under fairly mild conditions. In particular, each time series is allowed to possess short memory, long memory or anti-persistence. A simulation study shows that the tests have reasonable size and power properties."
in_NB  statistics  time_series  hypothesis_testing  independence_testing 
12 weeks ago by cshalizi
The optimal discovery procedure: a new approach to simultaneous significance testing - Storey - 2007 - Journal of the Royal Statistical Society: Series B (Statistical Methodology) - Wiley Online Library
"The Neyman–Pearson lemma provides a simple procedure for optimally testing a single hypothesis when the null and alternative distributions are known. This result has played a major role in the development of significance testing strategies that are used in practice. Most of the work extending single-testing strategies to multiple tests has focused on formulating and estimating new types of significance measures, such as the false discovery rate. These methods tend to be based on p-values that are calculated from each test individually, ignoring information from the other tests. I show here that one can improve the overall performance of multiple significance tests by borrowing information across all the tests when assessing the relative significance of each one, rather than calculating p-values for each test individually. The ‘optimal discovery procedure’ is introduced, which shows how to maximize the number of expected true positive results for each fixed number of expected false positive results. The optimality that is achieved by this procedure is shown to be closely related to optimality in terms of the false discovery rate. The optimal discovery procedure motivates a new approach to testing multiple hypotheses, especially when the tests are related. As a simple example, a new simultaneous procedure for testing several normal means is defined; this is surprisingly demonstrated to outperform the optimal single-test procedure, showing that a method which is optimal for single tests may no longer be optimal for multiple tests. Connections to other concepts in statistics are discussed, including Stein's paradox, shrinkage estimation and the Bayesian approach to hypothesis testing."
to:NB  statistics  hypothesis_testing  multiple_comparisons 
february 2012 by cshalizi
Henze : A Multivariate Two-Sample Test Based on the Number of Nearest Neighbor Type Coincidences
"For independent $d$-variate random samples $X_1, \cdots, X_{n_1}$ i.i.d. $f(x), Y_1, \cdots, Y_{n_2}$ i.i.d. $g(x)$, where the densities $f$ and $g$ are assumed to be continuous a.e., consider the number $T$ of all $k$ nearest neighbor comparisons in which observations and their neighbors belong to the same sample. We show that, if $f = g$ a.e., the limiting (normal) distribution of $T$, as $\min(n_1, n_2) \rightarrow \infty, n_1/(n_1 + n_2) \rightarrow \tau, 0 < \tau < 1$, does not depend on $f$. An omnibus procedure for testing the hypothesis $H_0: f = g$ a.e. is obtained by rejecting $H_0$ for large values of $T$. The result applies to a general distance (generated by a norm on $\mathbb{R}^d$) for determining nearest neighbors, and it generalizes to the multisample situation."
to:NB  to_read  statistics  hypothesis_testing  two-sample_tests  re:AoS_project 
february 2012 by cshalizi
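A minimal sketch of the nearest-neighbor coincidence statistic $T$ from the Henze entry above. It assumes Euclidean distance and calibrates $T$ by permuting sample labels rather than by the limiting normal distribution derived in the paper; the function names are hypothetical.

```python
import numpy as np

def knn_indices(z, k):
    """Indices of the k nearest neighbours (Euclidean) of each row of z."""
    d = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # exclude each point from its own neighbours
    return np.argsort(d, axis=1)[:, :k]

def same_sample_count(nbrs, labels):
    """T: number of (point, neighbour) pairs drawn from the same sample."""
    return int(np.sum(labels[nbrs] == labels[:, None]))

def nn_two_sample_test(x, y, k=3, n_perm=500, seed=0):
    """Reject H0: f = g for large T; p-value from label permutations."""
    rng = np.random.default_rng(seed)
    z = np.vstack([x, y])
    labels = np.r_[np.zeros(len(x), dtype=int), np.ones(len(y), dtype=int)]
    nbrs = knn_indices(z, k)  # geometry is fixed; only labels are permuted
    t_obs = same_sample_count(nbrs, labels)
    t_null = np.array([same_sample_count(nbrs, rng.permutation(labels))
                       for _ in range(n_perm)])
    p_value = (1 + np.sum(t_null >= t_obs)) / (1 + n_perm)
    return t_obs, p_value
```

The full pairwise distance matrix costs memory quadratic in the pooled sample size, so this version is only meant for small illustrations.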
[1202.1561] Tree Models for Difference and Change Detection in a Complex Environment
"A new family of tree models is proposed, which we call "differential trees." A differential tree model is constructed from multiple data sets and aims to detect distributional differences between them. The new methodology differs from the existing difference and change detection techniques in its nonparametric nature, model construction from multiple data sets, and applicability to high-dimensional data. Through a detailed study of an arson case in New Zealand, where an individual is known to have been laying vegetation fires within a certain time period, we illustrate how these models can help detect changes in the frequencies of event occurrences and uncover unusual clusters of events in a complex environment."

--- After reading, I think their exposition is needlessly hard to follow, but let me take a stab at it. In an ordinary classification tree, we are interested in the distribution of the class labels Y given the predictors X, i.e., Pr(Y|X), and make splits on X so that (in essence) the conditional entropy H[Y|X] becomes small. This is of course equivalent to making splits so that the divergence of Pr(Y|X) from Pr(Y) is maximized. What they are interested in is not classification but _describing_ how the different classes are distinct, so the relevant distribution is Pr(X|Y), and they want a big divergence between Pr(X) and Pr(X|Y).
to:NB  re:network_differences  statistics  hypothesis_testing  density_estimation  decision_trees  have_read  data_mining  two-sample_tests 
february 2012 by cshalizi
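A small numerical check of the identities behind the note on differential trees above: the entropy reduction H[Y] - H[Y|X] equals the X-averaged Kullback-Leibler divergence of Pr(Y|X) from Pr(Y), and also the Y-averaged divergence of Pr(X|Y) from Pr(X). The joint table is a made-up example, not data from the paper.

```python
import numpy as np

# Hypothetical joint distribution Pr(X = i, Y = j); rows index X, columns Y.
joint = np.array([[0.30, 0.10],
                  [0.05, 0.25],
                  [0.10, 0.20]])
px = joint.sum(axis=1)  # marginal of X
py = joint.sum(axis=0)  # marginal of Y

def entropy(p):
    return -np.sum(p * np.log(p))

def kl(p, q):
    return np.sum(p * np.log(p / q))

# Entropy reduction achieved by conditioning on X.
h_y_given_x = sum(px[i] * entropy(joint[i] / px[i]) for i in range(len(px)))
info_gain = entropy(py) - h_y_given_x

# Average divergence of Pr(Y|X=x) from Pr(Y), weighted by Pr(X=x).
div_y = sum(px[i] * kl(joint[i] / px[i], py) for i in range(len(px)))

# Average divergence of Pr(X|Y=y) from Pr(X), weighted by Pr(Y=y).
div_x = sum(py[j] * kl(joint[:, j] / py[j], px) for j in range(len(py)))

print(info_gain, div_y, div_x)  # all three are the mutual information
```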
f-Divergence Estimation and Two-Sample Homogeneity Test Under Semiparametric Density-Ratio Models
"A density ratio is defined by the ratio of two probability densities. We study the inference problem of density ratios and apply a semiparametric density-ratio estimator to the two-sample homogeneity test. In the proposed test procedure, the $f$-divergence between two probability densities is estimated using a density-ratio estimator. The $f$ -divergence estimator is then exploited for the two-sample homogeneity test. We derive an optimal estimator of $f$-divergence in the sense of the asymptotic variance in a semiparametric setting, and provide a statistic for two-sample homogeneity test based on the optimal estimator. We prove that the proposed test dominates the existing empirical likelihood score test. Through numerical studies, we illustrate the adequacy of the asymptotic theory for finite-sample inference."
to:NB  statistics  density_estimation  information_theory  hypothesis_testing  two-sample_tests 
february 2012 by cshalizi
Nonparametric Tests for Homogeneity Based on Non-Bipartite Matching
"Given a sequence of observations, has a change occurred in the underlying probability distribution with respect to observation order? This problem of detecting change points arises in a variety of applications including health prognostics for mechanical systems, syndromic disease surveillance in geographically dispersed populations, anomaly detection in information networks, and multivariate process control in general. Detecting change points in high-dimensional settings is challenging, and most change-point methods for multidimensional problems rely upon distributional assumptions or the use of observation history to model probability distributions. We present three new nonparametric statistical tests for heterogeneity based on the combinatorial properties of minimum non-bipartite matching (MNBM). The key idea underlying each of these tests is that if a sequence of independent random observations undergoes a change in distribution—either an abrupt “shift” or a gradual “drift”—a MNBM based on inter-point distances tends to produce pairings that are closer in the sequence labeling than would be the case if the observations were drawn from the same distribution. Our tests follow on the work of Rosenbaum (2005) who used MNBM to derive a simple cross-match test statistic for the two-sample problem based on this idea. Similar ideas are present in the minimum spanning tree (MST) test derived by Friedman and Rafsky (1979, 1981). We extend these approaches by utilizing ensembles of orthogonal MNBMs which greatly increase information extraction from the data, leading to tests that compare favorably to parametric procedures while maintaining level and good power properties across distributions."
to:NB  statistics  hypothesis_testing  density_estimation  change-point_problem  two-sample_tests 
january 2012 by cshalizi
[0805.4136] Inference for the dark energy equation of state using Type IA supernova data
"The surprising discovery of an accelerating universe led cosmologists to posit the existence of "dark energy"--a mysterious energy field that permeates the universe. Understanding dark energy has become the central problem of modern cosmology. After describing the scientific background in depth, we formulate the task as a nonlinear inverse problem that expresses the comoving distance function in terms of the dark-energy equation of state. We present two classes of methods for making sharp statistical inferences about the equation of state from observations of Type Ia Supernovae (SNe). First, we derive a technique for testing hypotheses about the equation of state that requires no assumptions about its form and can distinguish among competing theories. Second, we present a framework for computing parametric and nonparametric estimators of the equation of state, with an associated assessment of uncertainty. Using our approach, we evaluate the strength of statistical evidence for various competing models of dark energy. Consistent with current studies, we find that with the available Type Ia SNe data, it is not possible to distinguish statistically among popular dark-energy models, and that, in particular, there is no support in the data for rejecting a cosmological constant. With much more supernova data likely to be available in coming years (e.g., from the DOE/NASA Joint Dark Energy Mission), we address the more interesting question of whether future data sets will have sufficient resolution to distinguish among competing theories."

--- I am biased, because Chris G. and Larry are friends, but this seems to me a model of the modern applied statistics paper: use interesting statistical tools to say something helpful about an important scientific problem on its own terms, rather than distorting the problem until it "looks like a nail".
in_NB  kith_and_kin  cosmology  astronomy  inverse_problems  nonparametrics  estimation  hypothesis_testing  statistics  bootstrap  genovese.christopher  wasserman.larry  have_read 
january 2012 by cshalizi
Phys. Rev. E 84, 051138 (2011): Anomalous diffusion: Testing ergodicity breaking in experimental data
"Recent advances in single-molecule experiments show that various complex systems display nonergodic behavior. In this paper, we show how to test ergodicity and ergodicity breaking in experimental data. Exploiting the so-called dynamical functional, we introduce a simple test which allows us to verify ergodic properties of a real-life process. The test can be applied to a large family of stationary infinitely divisible processes. We check the performance of the test for various simulated processes and apply it to experimental data describing the motion of mRNA molecules inside live Escherichia coli cells. We show that the data satisfy necessary conditions for mixing and ergodicity. The detailed analysis is presented in the supplementary material."
in_NB  to_read  ergodic_theory  hypothesis_testing  stochastic_processes  statistical_inference_for_stochastic_processes 
december 2011 by cshalizi
Quantifying the failure of bootstrap likelihood ratio tests
"When testing geometrically irregular parametric hypotheses, the bootstrap is an intuitively appealing method to circumvent difficult distribution theory. It has been shown, however, that the usual bootstrap is inconsistent in estimating the asymptotic distributions involved in such problems. This paper is concerned with the asymptotic size of likelihood ratio tests when critical values are computed using the inconsistent bootstrap. We clarify how the asymptotic size of such a test can be obtained from the size of the corresponding bootstrap test in the relevant limiting normal experiment. For boundary problems, that is, hypotheses given by convex cones, we show the bootstrap test to always be anticonservative, and we compute the size numerically for different two-dimensional examples. The examples illustrate that the size can be below or above the nominal level, and reveal that the relationship between the size of the test and the geometry of the considered hypotheses is surprisingly subtle."
in_NB  statistics  bootstrap  hypothesis_testing 
december 2011 by cshalizi
Non-parametric detection of meaningless distances in high dimensional data - Ata Kabán - Statistics and Computing, Volume 22, Number 2
"Distance concentration is the phenomenon that, in certain conditions, the contrast between the nearest and the farthest neighbouring points vanishes as the data dimensionality increases. It affects high dimensional data processing, analysis, retrieval, and indexing, which all rely on some notion of distance or dissimilarity. Previous work has characterised this phenomenon in the limit of infinite dimensions. However, real data is finite dimensional, and hence the infinite-dimensional characterisation is insufficient. Here we quantify the phenomenon more precisely, for the possibly high but finite dimensional case in a distribution-free manner, by bounding the tails of the probability that distances become meaningless. As an application, we show how this can be used to assess the concentration of a given distance function in some unknown data distribution solely on the basis of an available data sample from it. This can be used to test and detect problematic cases more rigorously than it is currently possible, and we demonstrate the working of this approach on both synthetic data and ten real-world data sets from different domains."
statistics  probability  curse_of_dimensonality  hypothesis_testing  concentration_of_measure  in_NB  high-dimensional_probability 
december 2011 by cshalizi
[1111.0328] The Average Likelihood Ratio for Large-scale Multiple Testing and Detecting Sparse Mixtures
"Large-scale multiple testing problems require the simultaneous assessment of many p-values. This paper compares several methods to assess the evidence in multiple binomial counts of p-values: the maximum of the binomial counts after standardization (the `higher-criticism statistic'), the maximum of the binomial counts after a log-likelihood ratio transformation (the `Berk-Jones statistic'), and a newly introduced average of the binomial counts after a likelihood ratio transformation. Simulations show that the higher criticism statistic has a superior performance to the Berk-Jones statistic in the case of very sparse alternatives (sparsity coefficient $\beta \gtrapprox 0.75$), while the situation is reversed for $\beta \lessapprox 0.75$. The average likelihood ratio is found to combine the favorable performance of higher criticism in the very sparse case with that of the Berk-Jones statistic in the less sparse case and thus appears to dominate both statistics. Some asymptotic optimality theory is considered but found to set in too slowly to illuminate the above findings, at least for sample sizes up to one million. In contrast, asymptotic approximations to the critical values of the Berk-Jones statistic that have been developed by Wellner and Koltchinskii (2003) and Jager and Wellner (2007) are found to give surprisingly accurate approximations even for quite small sample sizes."
in_NB  statistics  hypothesis_testing  multiple_testing 
november 2011 by cshalizi
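A minimal sketch of the higher-criticism statistic mentioned in [1111.0328] above, computed as the maximum standardized gap between the empirical fraction of p-values below $p_{(i)}$ and its null value $p_{(i)}$. Restricting the maximum to the smallest half of the p-values (`alpha0=0.5`) and clipping p-values away from 0 and 1 are assumptions made here for convenience; conventions vary across papers.

```python
import numpy as np

def higher_criticism(pvals, alpha0=0.5):
    """max over i <= alpha0*n of sqrt(n) * (i/n - p_(i)) / sqrt(p_(i) * (1 - p_(i)))."""
    # Clip to avoid division by zero for p-values that are exactly 0 or 1.
    p = np.sort(np.clip(np.asarray(pvals, dtype=float), 1e-12, 1 - 1e-12))
    n = len(p)
    i = np.arange(1, n + 1)
    z = np.sqrt(n) * (i / n - p) / np.sqrt(p * (1 - p))
    upto = max(1, int(alpha0 * n))  # only the smallest p-values are scanned
    return float(z[:upto].max())
```

Under the global null the p-values are uniform and the statistic stays moderate; a small fraction of very small p-values pushes it up sharply, which is what makes it useful for sparse mixtures.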
Learning from positive and unlabeled examples by enforcing statistical significance
"Given a finite but large set of objects described by a vector of features, only a small subset of which have been labeled as "positive" with respect to a class of interest, we consider the problem of characterizing the positive class. We formalize this as the problem of learning a feature based score function that minimizes the p-value of a non parametric statistical hypothesis test. For linear score functions over the original feature space or over one of its kernelized versions, we provide a solution of this problem computed by a one-class SVM applied on a surrogate dataset obtained by sampling subsets of the overall set of objects and representing them by their average feature-vector shifted by the average feature-vector of the original sample of positive examples. We carry out experiments with this method on the prediction of targets of transcription factors in two different organisms, E. coli and S. cerevisiae. Our method extends enrichment analysis commonly carried out in Bioinformatics and its results outperform common solutions to this problem."
to:NB  machine_learning  hypothesis_testing  semi-supervised_learning 
november 2011 by cshalizi
[1110.3599] Testing over a continuum of null hypotheses
"We introduce a theoretical framework for performing statistical hypothesis testing simultaneously over a fairly general, possibly uncountably infinite, set of null hypotheses. This extends the standard statistical setting for multiple hypotheses testing, which is restricted to a finite set. This work is motivated by numerous modern applications where the observed signal is modeled by a stochastic process over a continuum. As a measure of type I error, we extend the concept of false discovery rate (FDR) to this setting. The FDR is defined as the average ratio of the measure of two random sets, so that its study presents some challenge and is of some intrinsic mathematical interest. Our main result shows how to use the $p$-value process to control the FDR at a nominal level, either under arbitrary dependence of $p$-values, or under the assumption that the finite dimensional distributions of the $p$-value process have positive correlations of a specific type (weak PRDS). Both cases generalize existing results established in the finite setting, the latter one leading to a less conservative procedure. The interest of this approach is demonstrated in several non-parametric examples: testing the mean/signal in a Gaussian white noise model, testing the intensity of a Poisson process and testing the c.d.f. of i.i.d. random variables. Conceptually, an interesting feature of the setting advocated here is that it focuses directly on the intrinsic hypothesis space associated with a testing model on a random process, without referring to an arbitrary discretization."
in_NB  statistics  hypothesis_testing  multiple_testing  stochastic_processes 
october 2011 by cshalizi
[1110.1248] An algorithm to compute the power of Monte Carlo tests with guaranteed precision
"This article presents an algorithm that generates an exact (conservative) confidence interval of a specified length and coverage probability for the power of a Monte Carlo test (such as a bootstrap or permutation test). It is the first method that achieves this aim for almost any Monte Carlo test. The existing research on power estimation for Monte Carlo tests has focused on obtaining as accurate a result as possible for a fixed computational effort. However, the methods proposed do not provide any guarantee of precision, in the sense that they cannot report a confidence interval to accompany their estimate of the power. Conversely in this article the computational effort is random. The algorithm operates until a confidence interval can be constructed that meets the requirements of the user, in terms of length and coverage probability. We show that, surprisingly, by generating two more datasets than what might have been assumed to be sufficient, the expected number of steps required by the algorithm is finite in many cases of practical interest. These include, for instance, any situation where the distribution of the p-value is absolutely continuous or if it is discrete with finite support. The algorithm is implemented in the R package simctest."
statistics  hypothesis_testing  confidence_sets  monte_carlo  bootstrap  in_NB 
october 2011 by cshalizi
[1106.3670] Adjusting for selection bias in testing multiple families of hypotheses
"In many large multiple testing problems the hypotheses are divided into families. Given the data, families with evidence for true discoveries are selected, and hypotheses within them are tested. Neither controlling the error-rate in each family separately nor controlling the error-rate over all hypotheses together can assure that an error-rate is controlled in the selected families. We formulate this concern about selective inference in its generality, for a very wide class of error-rates and for any selection criterion, and present an adjustment of the testing level inside the selected families that retains the average error-rate over the selected families."
multiple_testing  hypothesis_testing  statistics 
june 2011 by cshalizi
[0812.2712] Sequential multiple hypothesis testing in presence of control variables
Each experiment has a control setting; the distribution of responses is independent given control settings; how to design the experiment so as to decide among K hypotheses quickly and reliably?
experimental_design  hypothesis_testing  statistics 
april 2011 by cshalizi
A Multivariate Kolmogorov-Smirnov Test of Goodness of Fit
Does not seem to be implemented in R.  Next time, make into a programming project, and have them compare to simple bootstrapping? --- This copy appears to be on the personal webpage of one of the authors; DOI 10.1016/S0167-7152(97)00020-5
hypothesis_testing  goodness-of-fit  kolmogorov-smirnov-test  statistics  to_teach:undergrad-ADA 
april 2011 by cshalizi
xkcd: Significant
... and this goes on the office doors of statisticians everywhere.
funny:geeky  funny:because_its_true  xkcd  hypothesis_testing  multiple_comparisons  cartoons 
april 2011 by cshalizi
[1102.5750] Neyman-Pearson classification, convexity and stochastic constraints
"Motivated by problems of anomaly detection, this paper implements the Neyman-Pearson paradigm to deal with asymmetric errors in binary classification with a convex loss. Given a finite collection of classifiers, we combine them and obtain a new classifier that satisfies simultaneously the two following properties with high probability: (i) its probability of type I error is below a pre-specified level and (ii), it has probability of type II error close to the minimum possible. The proposed classifier is obtained by solving an optimization problem with an empirical objective and an empirical constraint. New techniques to handle such problems are developed and have consequences on chance constrained programming."
Final version: http://jmlr.csail.mit.edu/papers/v12/rigollet11a.html
in_NB  learning_theory  statistics  hypothesis_testing  convexity  machine_learning  optimization  have_read  rigollet.philippe 
march 2011 by cshalizi
Statistical Inference on Random Graphs; Comparative Power Analysis - Journal of Computational and Graphical Statistics - 0(0):1
"We present a comparative power analysis, via Monte Carlo, of various graph invariants used as statistics for testing graph homogeneity versus a “chatter” alternative—the existence of a local region of excessive activity. Our results indicate that statistical inference on random graphs, even in a relatively simple setting, can be decidedly nontrivial. We find that none of the graph invariants considered is uniformly most powerful throughout our space of alternatives. Code for reproducing all the simulation results presented in this article is available online."
statistics  network_data_analysis  hypothesis_testing  re:smoothing_adjacency_matrices  re:network_differences 
february 2011 by cshalizi
[1012.4401] A Note on a Characterization of Rényi Measures and its Relation to Composite Hypothesis Testing
"The Rényi information measures are characterized in terms of their Shannon counterparts, and properties of the former are recovered from first principle via the associated properties of the latter."  I think if I stared at theorem 1 for a bit longer it would give me a new intuitive sense of what Rényi entropy is, but that's low priority right now...
information_theory  renyi_entropy  via:ded-maxim  in_NB  hypothesis_testing 
december 2010 by cshalizi
"Is Frequentist Testing Vulnerable to the Base-Rate Fallacy?" (Spanos) - Philosophy of Science
"This article calls into question the charge that frequentist testing is susceptible to the base-rate fallacy. It is argued that the apparent similarity between examples like the Harvard Medical School test and frequentist testing is highly misleading. A closer scrutiny reveals that such examples have none of the basic features of a proper frequentist test, such as legitimate data, hypotheses, test statistics, and sampling distributions. Indeed, the relevant error probabilities are replaced with the false positive/negative rates that constitute deductive calculations based on known probabilities among events. As a result, the ampliative dimension of frequentist induction—learning from data about the underlying data-generating mechanism—is missing."
statistics  philosophy_of_science  re:phil-of-bayes_paper  hypothesis_testing  spanos.aris 
october 2010 by cshalizi
Ehm, Kornmeier, Heinrich: Multiple testing along a tree
"Suitable sequentially rejective multiple test procedures allow to “zoom in” on clusters of relevant variables in high-dimensional regression (Meinshausen [7]), or on regions of interest in some search space (Heinrich et al. [3]; Meinshausen et al. [8]). As a common framework for these schemes we propose to consider multiple testing along a tree of hypotheses together with a “keep rejecting until first acceptance” rule. Particular topics addressed in this note are control of the familywise error, and some variants and basic properties of the procedure."
multiple_testing  hypothesis_testing  model_selection  re:AoS_project  to_read 
may 2010 by cshalizi
[1005.1327] Statistical Model Checking: An Overview
"Quantitative properties of stochastic systems are usually specified in logics that allow one to compare the measure of executions satisfying certain temporal properties with thresholds. The model checking problem for stochastic systems with respect to such logics is typically solved by a numerical approach that iteratively computes (or approximates) the exact measure of paths satisfying relevant subformulas; the algorithms themselves depend on the class of systems being analyzed as well as the logic used for specifying the properties. Another approach to solve the model checking problem is to simulate the system for finitely many runs, and use hypothesis testing to infer whether the samples provide a statistical evidence for the satisfaction or violation of the specification. In this short paper, we survey the statistical approach, and outline its main advantages in terms of efficiency, uniformity, and simplicity."
simulation  stochastic_models  model-checking  via:tozier  hypothesis_testing 
may 2010 by cshalizi
10-705 Intermediate Statistics, Fall 2009
Larry's version of the typical masters-level course based on Casella and Berger. Note: half of what he covers is not in Casella and Berger. (For example, he starts with VC theory!)
learning_theory  statistics  estimation  hypothesis_testing  prediction  minimax  bootstrap  model_selection  regression  classifiers  confidence_sets  wasserman.larry  kith_and_kin 
april 2010 by cshalizi
How persuasive is a good fit? A comment on theory testing.
Everything useful in this paper is contained in their Figure 1 and its caption, and even then I think they're incomplete. (In the top left of Figure 1, the "strong support" quadrant, draw another narrow band along the opposite diagonal to the first theory, also going through the small cross of observations: this would be a distinct and incompatible theory which also makes a narrow range of predictions that also match the precisely-measured data.)
methodological_advice  hypothesis_testing  statistics  psychology  via:kass  have_read  re:phil-of-bayes_paper 
april 2010 by cshalizi
Lindsay, Liu: Model Assessment Tools for a Model False World
"a model credibility index, which is designed to serve as a one-number summary measure of model adequacy. We define the index to be the maximum sample size at which samples from the model and those from the true data generating mechanism are nearly indistinguishable. We use standard notions from hypothesis testing to make this definition precise. We use data subsampling to estimate the index" --- To be blogged, after the paper with Andy is done.
statistics  misspecification  re:phil-of-bayes_paper  hypothesis_testing  bootstrap  have_read  to:blog 
april 2010 by cshalizi
Luen, Stark: Testing earthquake predictions
Back-up for the Hough review. Also: might make a good mini-project for the data-mining class, though I'd have to teach about spatio-temporal methods (which I should anyway [but where would the time come?]).
earthquakes  hypothesis_testing  bad_data_analysis  stark.philip  statistics  prediction  have_read  to_teach:data-mining  to_teach:undergrad-ADA 
december 2009 by cshalizi
Evans, Jang: Invariant P-values for model checking
Interesting, but I suspect the bits about approximating an underlying discrete distribution could be lifted...
statistics  hypothesis_testing  model-checking  p-values  re:phil-of-bayes_paper  have_read 
december 2009 by cshalizi
Learning with Finite Memory
Cf. Leo's results on estimation with finite state machines.
statistics  hypothesis_testing  to:NB  have_read 
august 2009 by cshalizi
Superstars without Talent? The Yule Distribution Controversy
"Chung and Cox (1994) provided an intuitively appealing stochastic model indicating that superstars may exist regardless of talent, giving rise to the Yule distribution. We adopt a different empirical approach and test its goodness of fit using a parametric bootstrap and several powerful test statistics. Just like the discrete Pareto distribution, it is overwhelmingly rejected: it is a fairly accurate approximation of the lower quantiles of the superstar distribution but overestimates the snowball effect that makes consumers purchase records of the most successful artists. In other words, the Yule distribution captures stardom, but not superstardom. A generalization of the Yule distribution provides an excellent fit in two of the three data sets." --- We only seem to subscribe with a one-issue delay (?); preprint at http://swopec.hhs.se/hastef/papers/hastef0658.pdf
heavy_tails  inequality  economics_of_superstars  hypothesis_testing  economics  statistics  evisceration  have_read 
july 2009 by cshalizi
Tirvengadum: Linguistic Fingerprints and Literary Fraud
Using the case of an author with a known pseudonym to test methods for establishing identity of authorship. Conclusion: it may be that <
literary_criticism  author-identification  hypothesis_testing  to_teach  via:chl  textual_criticism 
june 2009 by cshalizi
The Likelihood Ratio Test Under Nonstandard Conditions
I very much like the approach of treating the likelihood ratio as an empirical process; why haven't I seen it before? (Also, the state-of-the-art in simulating Gaussian processes must be much better now than what Hansen was doing in '92, which would make this even more practical.)
empirical_processes  hypothesis_testing  statistics  likelihood_ratio_tests  econometrics  time_series  hansen.bruce  have_read 
june 2009 by cshalizi
[0905.4937] A criterion for hypothesis testing for stationary processes
"Given a discrete-valued sample X_1... X_n we wish to test whether it was generated by a process belonging to a family H_0, or it was generated by a process outside H_0. All process distributions are assumed stationary ergodic, and no further probabilistic or parametric assumptions are made. We require the Type I error of the test to be uniformly bounded, while the probability of Type II error has to tend to zero as the sample size increases. For this notion of consistency we provide necessary and sufficient conditions on the family H_0 for the existence of a consistent test."
statistics  statistical_inference_for_stochastic_processes  ergodic_theory  hypothesis_testing  ryabko.daniil  to:NB  to_read 
june 2009 by cshalizi
A consistent nonparametric test of ergodicity for time series with applications
They completely fail to deal with the basic problem that ergodic components are invariant, so that every realization of a stochastic process is always confined to a single component. Hence no test on a single realization has ANY ability to detect non-ergodicity; this could ONLY be done with multiple realizations from the same source.
statistics  time_series  ergodic_theory  nonparametrics  have_read  hypothesis_testing  shot_after_a_fair_trial 
march 2009 by cshalizi
Support of the Null Hypothesis
Aleks Jakulin on the _Journal of Articles in Support of the Null Hypothesis_ (with links to more such in the comments)
hypothesis_testing  paper_writing  social_life_of_the_mind  why_oh_why_cant_we_have_a_better_academic_publishing_system  funny:geeky 
march 2009 by cshalizi