cshalizi + hypothesis_testing 94
Likelihood inference for discriminating between long-memory and change-point models - Yau - 2012 - Journal of Time Series Analysis - Wiley Online Library
13 days ago by cshalizi
"We develop a likelihood ratio (LR) test procedure for discriminating between a short-memory time series with a change-point (CP) and a long-memory (LM) time series. Under the null hypothesis, the time series consists of two segments of short-memory time series with different means and possibly different covariance functions. The location of the shift in the mean is unknown. Under the alternative, the time series has no shift in mean but rather is LM. The LR statistic is defined as the normalized log-ratio of the Whittle likelihood between the CP model and the LM model, which is asymptotically normally distributed under the null. The LR test provides a parametric alternative to the CUSUM test proposed by Berkes et al. (2006). Moreover, the LR test is more general than the CUSUM test in the sense that it is applicable to changes in other marginal or dependence features other than a change-in-mean. We show its good performance in simulations and apply it to two data examples."
to:NB
time_series
change-point_problem
long-range_dependence
statistics
to_teach:undergrad-ADA
hypothesis_testing
13 days ago by cshalizi
[1204.1563] Generalized Error Exponents for Sparse Sample Goodness of Fit Tests
6 weeks ago by cshalizi
"We investigate the sparse sample goodness-of-fit problem, where the number of samples $n$ is smaller than the size of the alphabet $m$. The goal of this work is to find an appropriate criterion to analyze statistical tests in this setting. A suitable model for analysis is the high-dimensional model in which both $n$ and $m$ tend to infinity, and $n=o(m)$. We propose a new performance criterion based on large deviation analysis, which generalizes the classical error exponent applicable for large sample problems (in which $m=O(n)$). This new criterion provides insights that are not available from asymptotic consistency or CLT analysis. The main results are:
(i) The best achievable probability of error $P_e$ decays as $-\log(P_e)=(n^2/m)(1+o(1))J$ for some $J>0$.
(ii) A well-known coincidence-based test attains the optimal generalized error exponent.
(iii) The widely used Pearson's chi-square test has $J=0$.
(iv) The contributions (i)-(iii) are established under the assumption that the distribution under the null hypothesis is uniform. For the non-uniform case, a new test is proposed, with a non-zero generalized error exponent."
to:NB
hypothesis_testing
re:LICORS
statistics
large_deviations
goodness-of-fit
6 weeks ago by cshalizi
Taylor & Francis Online :: Statistical Inference on Random Graphs: Comparative Power Analyses via Monte Carlo - Journal of Computational and Graphical Statistics - Volume 20, Issue 2
8 weeks ago by cshalizi
"We present a comparative power analysis, via Monte Carlo, of various graph invariants used as statistics for testing graph homogeneity versus a “chatter” alternative—the existence of a local region of excessive activity. Our results indicate that statistical inference on random graphs, even in a relatively simple setting, can be decidedly nontrivial. We find that none of the graph invariants considered is uniformly most powerful throughout our space of alternatives. Code for reproducing all the simulation results presented in this article is available online."
to:NB
re:network_differences
statistics
hypothesis_testing
network_data_analysis
8 weeks ago by cshalizi
[0803.2095] Properties of higher criticism under strong dependence
9 weeks ago by cshalizi
"The problem of signal detection using sparse, faint information is closely related to a variety of contemporary statistical problems, including the control of false-discovery rate, and classification using very high-dimensional data. Each problem can be solved by conducting a large number of simultaneous hypothesis tests, the properties of which are readily accessed under the assumption of independence. In this paper we address the case of dependent data, in the context of higher criticism methods for signal detection. Short-range dependence has no first-order impact on performance, but the situation changes dramatically under strong dependence. There, although higher criticism can continue to perform well, it can be bettered using methods based on differences of signal values or on the maximum of the data. The relatively inferior performance of higher criticism in such cases can be explained in terms of the fact that, under strong dependence, the higher criticism statistic behaves as though the data were partitioned into very large blocks, with all but a single representative of each block being eliminated from the dataset."
to:NB
statistics
hypothesis_testing
multiple_testing
stochastic_processes
9 weeks ago by cshalizi
Kaiser, Lahiri, Nordman: Goodness of fit tests for a class of Markov random field models
10 weeks ago by cshalizi
"This paper develops goodness of fit statistics that can be used to formally assess Markov random field models for spatial data, when the model distributions are discrete or continuous and potentially parametric. Test statistics are formed from generalized spatial residuals which are collected over groups of nonneighboring spatial observations, called concliques. Under a hypothesized Markov model structure, spatial residuals within each conclique are shown to be independent and identically distributed as uniform variables. The information from a series of concliques can be then pooled into goodness of fit statistics. Under some conditions, large sample distributions of these statistics are explicitly derived for testing both simple and composite hypotheses, where the latter involves additional parametric estimation steps. The distributional results are verified through simulation, and a data example illustrates the method for model assessment."
to:NB
to_read
statistics
spatial_statistics
random_fields
goodness-of-fit
hypothesis_testing
re:stacs
markov_models
10 weeks ago by cshalizi
Meinshausen, Maathuis, Bühlmann: Asymptotic optimality of the Westfall–Young permutation procedure for multiple testing under dependence
12 weeks ago by cshalizi
"Test statistics are often strongly dependent in large-scale multiple testing applications. Most corrections for multiplicity are unduly conservative for correlated test statistics, resulting in a loss of power to detect true positives. We show that the Westfall–Young permutation method has asymptotically optimal power for a broad class of testing problems with a block-dependence and sparsity structure among the tests, when the number of tests tends to infinity."
to:NB
statistics
multiple_testing
hypothesis_testing
buhlmann.peter
12 weeks ago by cshalizi
[1202.1377] Statistical significance in high-dimensional linear models
12 weeks ago by cshalizi
"We propose a method for constructing p-values for general hypotheses in a high-dimensional linear model. The hypotheses can be local for testing a single regression parameter or they may be more global involving several up to all parameters. Furthermore, when considering many hypotheses, we show how to adjust for multiple testing taking dependence among the p-values into account. Our technique is based on Ridge estimation with an additional correction term due to a substantial projection bias in high dimensions. We prove strong error control for our p-values and provide sufficient conditions for detection: for the former, we do not make any assumption on the size of the true underlying regression coefficients. We demonstrate the method in simulated examples and a real data application."
in_NB
statistics
regression
goodness-of-fit
hypothesis_testing
buhlmann.peter
12 weeks ago by cshalizi
[1202.3775] Kernel-based Conditional Independence Test and Application in Causal Discovery
12 weeks ago by cshalizi
"Conditional independence testing is an important problem, especially in Bayesian network learning and causal discovery. Due to the curse of dimensionality, testing for conditional independence of continuous variables is particularly challenging. We propose a Kernel-based Conditional Independence test (KCI-test), by constructing an appropriate test statistic and deriving its asymptotic distribution under the null hypothesis of conditional independence. The proposed method is computationally efficient and easy to implement. Experimental results show that it outperforms other methods, especially when the conditioning set is large or the sample size is not very large, in which case other methods encounter difficulties."
statistics
kernel_estimators
independence_testing
hypothesis_testing
causal_inference
in_NB
have_read
to:blog
to_teach:undergrad-ADA
12 weeks ago by cshalizi
[0809.1053] An impossibility result for process discrimination
12 weeks ago by cshalizi
"Two series of binary observations $x_1,x_2,\dots$ and $y_1,y_2,\dots$ are presented: at each time $n\in\mathbb{N}$ we are given $x_n$ and $y_n$. It is assumed that the sequences are generated independently of each other by two B-processes. We are interested in the question of whether the sequences represent a typical realization of two different processes or of the same one. We demonstrate that this is impossible to decide, in the sense that every discrimination procedure is bound to err with non-negligible frequency when presented with sequences from some B-processes. This contrasts earlier positive results on B-processes, in particular those showing that there are consistent $\bar d$-distance estimates for this class of processes."
to:NB
statistics
time_series
stochastic_processes
ergodic_theory
statistical_inference_for_stochastic_processes
hypothesis_testing
12 weeks ago by cshalizi
[0810.2276] A generalized portmanteau test of independence between two stationary time series
12 weeks ago by cshalizi
"We propose generalized portmanteau-type test statistics in the frequency domain to test independence between two stationary time series. The test statistics are formed analogous to the one in Chen and Deo (2004, Econometric Theory 20, 382-416), who extended the applicability of portmanteau goodness-of-fit test to the long memory case. Under the null hypothesis of independence, the asymptotic standard normal distributions of the proposed statistics are derived under fairly mild conditions. In particular, each time series is allowed to possess short memory, long memory or anti-persistence. A simulation study shows that the tests have reasonable size and power properties."
in_NB
statistics
time_series
hypothesis_testing
independence_testing
12 weeks ago by cshalizi
The optimal discovery procedure: a new approach to simultaneous significance testing - Storey - 2007 - Journal of the Royal Statistical Society: Series B (Statistical Methodology) - Wiley Online Library
february 2012 by cshalizi
"The Neyman–Pearson lemma provides a simple procedure for optimally testing a single hypothesis when the null and alternative distributions are known. This result has played a major role in the development of significance testing strategies that are used in practice. Most of the work extending single-testing strategies to multiple tests has focused on formulating and estimating new types of significance measures, such as the false discovery rate. These methods tend to be based on p-values that are calculated from each test individually, ignoring information from the other tests. I show here that one can improve the overall performance of multiple significance tests by borrowing information across all the tests when assessing the relative significance of each one, rather than calculating p-values for each test individually. The ‘optimal discovery procedure’ is introduced, which shows how to maximize the number of expected true positive results for each fixed number of expected false positive results. The optimality that is achieved by this procedure is shown to be closely related to optimality in terms of the false discovery rate. The optimal discovery procedure motivates a new approach to testing multiple hypotheses, especially when the tests are related. As a simple example, a new simultaneous procedure for testing several normal means is defined; this is surprisingly demonstrated to outperform the optimal single-test procedure, showing that a method which is optimal for single tests may no longer be optimal for multiple tests. Connections to other concepts in statistics are discussed, including Stein's paradox, shrinkage estimation and the Bayesian approach to hypothesis testing."
to:NB
statistics
hypothesis_testing
multiple_comparisons
february 2012 by cshalizi
Henze: A Multivariate Two-Sample Test Based on the Number of Nearest Neighbor Type Coincidences
february 2012 by cshalizi
"For independent $d$-variate random samples $X_1, \cdots, X_{n_1}$ i.i.d. $f(x)$, $Y_1, \cdots, Y_{n_2}$ i.i.d. $g(x)$, where the densities $f$ and $g$ are assumed to be continuous a.e., consider the number $T$ of all $k$ nearest neighbor comparisons in which observations and their neighbors belong to the same sample. We show that, if $f = g$ a.e., the limiting (normal) distribution of $T$, as $\min(n_1, n_2) \rightarrow \infty$, $n_1/(n_1 + n_2) \rightarrow \tau$, $0 < \tau < 1$, does not depend on $f$. An omnibus procedure for testing the hypothesis $H_0: f = g$ a.e. is obtained by rejecting $H_0$ for large values of $T$. The result applies to a general distance (generated by a norm on $\mathbb{R}^d$) for determining nearest neighbors, and it generalizes to the multisample situation."
to:NB
to_read
statistics
hypothesis_testing
two-sample_tests
re:AoS_project
february 2012 by cshalizi
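--- The statistic $T$ in the Henze abstract above is easy to compute directly, so here is a minimal sketch. Assumptions, none of which come from the paper: the function names are hypothetical, the distance is Euclidean, and the p-value comes from a label-permutation null rather than the limiting normal distribution the abstract describes (whose constants are not given here).

```python
import numpy as np

def nn_coincidence_count(x, y, k=3):
    """Return T, the number of (point, neighbour) pairs among the k nearest
    neighbours in the pooled sample where both points share a sample label."""
    pooled = np.vstack([x, y])
    labels = np.concatenate([np.zeros(len(x), dtype=int),
                             np.ones(len(y), dtype=int)])
    # Pairwise Euclidean distances; a point is never its own neighbour.
    diff = pooled[:, None, :] - pooled[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)
    neighbours = np.argsort(dist, axis=1)[:, :k]
    same = labels[neighbours] == labels[:, None]
    return int(same.sum()), neighbours, labels

def permutation_pvalue(x, y, k=3, n_perm=999, seed=0):
    """Permutation p-value for large T (an assumption standing in for the
    asymptotic normal approximation). The neighbour graph depends only on
    the pooled points, so it is computed once and only labels are shuffled."""
    rng = np.random.default_rng(seed)
    t_obs, neighbours, labels = nn_coincidence_count(x, y, k)
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(labels)
        t_perm = int((perm[neighbours] == perm[:, None]).sum())
        exceed += t_perm >= t_obs
    return t_obs, (1 + exceed) / (1 + n_perm)
```

Rejecting for large $T$ matches the abstract: when $f \neq g$, points tend to have neighbours from their own sample more often than shuffled labels would suggest.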
[1202.1561] Tree Models for Difference and Change Detection in a Complex Environment
february 2012 by cshalizi
"A new family of tree models is proposed, which we call "differential trees." A differential tree model is constructed from multiple data sets and aims to detect distributional differences between them. The new methodology differs from the existing difference and change detection techniques in its nonparametric nature, model construction from multiple data sets, and applicability to high-dimensional data. Through a detailed study of an arson case in New Zealand, where an individual is known to have been laying vegetation fires within a certain time period, we illustrate how these models can help detect changes in the frequencies of event occurrences and uncover unusual clusters of events in a complex environment."
--- After reading, I think their exposition is needlessly hard to follow, but let me take a stab at it. In an ordinary classification tree, we are interested in the distribution of the class labels Y given the predictors X, i.e., Pr(Y|X), and make splits on X so that (in essence) the conditional entropy H[Y|X] becomes small. This is of course equivalent to making splits so that the divergence of Pr(Y|X) from Pr(Y) is maximized. What they are interested in is not classification but _describing_ how the different classes are distinct, so the relevant distribution is Pr(X|Y), and they want a big divergence between Pr(X) and Pr(X|Y).
to:NB
re:network_differences
statistics
hypothesis_testing
density_estimation
decision_trees
have_read
data_mining
two-sample_tests
february 2012 by cshalizi
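--- The equivalence claimed in the note on the differential-trees entry above (small H[Y|X] is the same as a large divergence of Pr(Y|X) from Pr(Y)) follows from H[Y] - H[Y|X] = sum_x Pr(x) KL(Pr(Y|x) || Pr(Y)), with H[Y] fixed by the data. A small numerical check, as a sketch only: the joint table is made up for illustration, and the function names are hypothetical.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits, ignoring zero-probability cells."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    return float((p[mask] * np.log2(p[mask] / q[mask])).sum())

def split_summary(joint):
    """For a joint table Pr(X = leaf, Y = class), return
    (H[Y], H[Y|X], expected KL of Pr(Y|X) from Pr(Y))."""
    joint = np.asarray(joint, dtype=float)
    p_x = joint.sum(axis=1)
    p_y = joint.sum(axis=0)
    cond = joint / p_x[:, None]          # rows are Pr(Y | X = x)
    h_y = entropy(p_y)
    h_y_given_x = sum(px * entropy(row) for px, row in zip(p_x, cond))
    expected_kl = sum(px * kl(row, p_y) for px, row in zip(p_x, cond))
    return h_y, h_y_given_x, expected_kl

# Illustrative joint distribution: three leaves (rows), two classes (columns).
joint = np.array([[0.30, 0.10],
                  [0.05, 0.25],
                  [0.15, 0.15]])
h_y, h_y_given_x, expected_kl = split_summary(joint)
```

Here `h_y - h_y_given_x` and `expected_kl` agree up to floating-point error, so a split that lowers the conditional entropy raises the expected divergence by the same amount.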
Simon and Tibshirani: COMMENT ON “DETECTING NOVEL ASSOCIATIONS IN LARGE DATA SETS” BY RESHEF ET AL, SCIENCE DEC 16, 2011
february 2012 by cshalizi
Since this is publicly online now, I guess I can write that post.
information_theory
statistics
hypothesis_testing
tibshirani.robert
to:blog
independence_testing
february 2012 by cshalizi
f-Divergence Estimation and Two-Sample Homogeneity Test Under Semiparametric Density-Ratio Models
february 2012 by cshalizi
"A density ratio is defined by the ratio of two probability densities. We study the inference problem of density ratios and apply a semiparametric density-ratio estimator to the two-sample homogeneity test. In the proposed test procedure, the $f$-divergence between two probability densities is estimated using a density-ratio estimator. The $f$ -divergence estimator is then exploited for the two-sample homogeneity test. We derive an optimal estimator of $f$-divergence in the sense of the asymptotic variance in a semiparametric setting, and provide a statistic for two-sample homogeneity test based on the optimal estimator. We prove that the proposed test dominates the existing empirical likelihood score test. Through numerical studies, we illustrate the adequacy of the asymptotic theory for finite-sample inference."
to:NB
statistics
density_estimation
information_theory
hypothesis_testing
two-sample_tests
february 2012 by cshalizi
Nonparametric Tests for Homogeneity Based on Non-Bipartite Matching
january 2012 by cshalizi
"Given a sequence of observations, has a change occurred in the underlying probability distribution with respect to observation order? This problem of detecting change points arises in a variety of applications including health prognostics for mechanical systems, syndromic disease surveillance in geographically dispersed populations, anomaly detection in information networks, and multivariate process control in general. Detecting change points in high-dimensional settings is challenging, and most change-point methods for multidimensional problems rely upon distributional assumptions or the use of observation history to model probability distributions. We present three new nonparametric statistical tests for heterogeneity based on the combinatorial properties of minimum non-bipartite matching (MNBM). The key idea underlying each of these tests is that if a sequence of independent random observations undergoes a change in distribution—either an abrupt “shift” or a gradual “drift”—a MNBM based on inter-point distances tends to produce pairings that are closer in the sequence labeling than would be the case if the observations were drawn from the same distribution. Our tests follow on the work of Rosenbaum (2005) who used MNBM to derive a simple cross-match test statistic for the two-sample problem based on this idea. Similar ideas are present in the minimum spanning tree (MST) test derived by Friedman and Rafsky (1979, 1981). We extend these approaches by utilizing ensembles of orthogonal MNBMs which greatly increase information extraction from the data, leading to tests that compare favorably to parametric procedures while maintaining level and good power properties across distributions."
to:NB
statistics
hypothesis_testing
density_estimation
change-point_problem
two-sample_tests
january 2012 by cshalizi
[0805.4136] Inference for the dark energy equation of state using Type IA supernova data
january 2012 by cshalizi
"The surprising discovery of an accelerating universe led cosmologists to posit the existence of "dark energy"--a mysterious energy field that permeates the universe. Understanding dark energy has become the central problem of modern cosmology. After describing the scientific background in depth, we formulate the task as a nonlinear inverse problem that expresses the comoving distance function in terms of the dark-energy equation of state. We present two classes of methods for making sharp statistical inferences about the equation of state from observations of Type Ia Supernovae (SNe). First, we derive a technique for testing hypotheses about the equation of state that requires no assumptions about its form and can distinguish among competing theories. Second, we present a framework for computing parametric and nonparametric estimators of the equation of state, with an associated assessment of uncertainty. Using our approach, we evaluate the strength of statistical evidence for various competing models of dark energy. Consistent with current studies, we find that with the available Type Ia SNe data, it is not possible to distinguish statistically among popular dark-energy models, and that, in particular, there is no support in the data for rejecting a cosmological constant. With much more supernova data likely to be available in coming years (e.g., from the DOE/NASA Joint Dark Energy Mission), we address the more interesting question of whether future data sets will have sufficient resolution to distinguish among competing theories."
--- I am biased, because Chris G. and Larry are friends, but this seems to me a model of the modern applied statistics paper: use interesting statistical tools to say something helpful about an important scientific problem on its own terms, rather than distorting the problem until it "looks like a nail".
in_NB
kith_and_kin
cosmology
astronomy
inverse_problems
nonparametrics
estimation
hypothesis_testing
statistics
bootstrap
genovese.christopher
wasserman.larry
have_read
january 2012 by cshalizi
Phys. Rev. E 84, 051138 (2011): Anomalous diffusion: Testing ergodicity breaking in experimental data
december 2011 by cshalizi
"Recent advances in single-molecule experiments show that various complex systems display nonergodic behavior. In this paper, we show how to test ergodicity and ergodicity breaking in experimental data. Exploiting the so-called dynamical functional, we introduce a simple test which allows us to verify ergodic properties of a real-life process. The test can be applied to a large family of stationary infinitely divisible processes. We check the performance of the test for various simulated processes and apply it to experimental data describing the motion of mRNA molecules inside live Escherichia coli cells. We show that the data satisfy necessary conditions for mixing and ergodicity. The detailed analysis is presented in the supplementary material."
in_NB
to_read
ergodic_theory
hypothesis_testing
stochastic_processes
statistical_inference_for_stochastic_processes
december 2011 by cshalizi
Quantifying the failure of bootstrap likelihood ratio tests
december 2011 by cshalizi
"When testing geometrically irregular parametric hypotheses, the bootstrap is an intuitively appealing method to circumvent difficult distribution theory. It has been shown, however, that the usual bootstrap is inconsistent in estimating the asymptotic distributions involved in such problems. This paper is concerned with the asymptotic size of likelihood ratio tests when critical values are computed using the inconsistent bootstrap. We clarify how the asymptotic size of such a test can be obtained from the size of the corresponding bootstrap test in the relevant limiting normal experiment. For boundary problems, that is, hypotheses given by convex cones, we show the bootstrap test to always be anticonservative, and we compute the size numerically for different two-dimensional examples. The examples illustrate that the size can be below or above the nominal level, and reveal that the relationship between the size of the test and the geometry of the considered hypotheses is surprisingly subtle."
in_NB
statistics
bootstrap
hypothesis_testing
december 2011 by cshalizi
Non-parametric detection of meaningless distances in high dimensional data - Ata Kabán - Statistics and Computing, Volume 22, Number 2
december 2011 by cshalizi
"Distance concentration is the phenomenon that, in certain conditions, the contrast between the nearest and the farthest neighbouring points vanishes as the data dimensionality increases. It affects high dimensional data processing, analysis, retrieval, and indexing, which all rely on some notion of distance or dissimilarity. Previous work has characterised this phenomenon in the limit of infinite dimensions. However, real data is finite dimensional, and hence the infinite-dimensional characterisation is insufficient. Here we quantify the phenomenon more precisely, for the possibly high but finite dimensional case in a distribution-free manner, by bounding the tails of the probability that distances become meaningless. As an application, we show how this can be used to assess the concentration of a given distance function in some unknown data distribution solely on the basis of an available data sample from it. This can be used to test and detect problematic cases more rigorously than it is currently possible, and we demonstrate the working of this approach on both synthetic data and ten real-world data sets from different domains."
statistics
probability
curse_of_dimensonality
hypothesis_testing
concentration_of_measure
in_NB
high-dimensional_probability
december 2011 by cshalizi
[1111.0328] The Average Likelihood Ratio for Large-scale Multiple Testing and Detecting Sparse Mixtures
november 2011 by cshalizi
"Large-scale multiple testing problems require the simultaneous assessment of many p-values. This paper compares several methods to assess the evidence in multiple binomial counts of p-values: the maximum of the binomial counts after standardization (the `higher-criticism statistic'), the maximum of the binomial counts after a log-likelihood ratio transformation (the `Berk-Jones statistic'), and a newly introduced average of the binomial counts after a likelihood ratio transformation. Simulations show that the higher criticism statistic has a superior performance to the Berk-Jones statistic in the case of very sparse alternatives (sparsity coefficient $\beta \gtrapprox 0.75$), while the situation is reversed for $\beta \lessapprox 0.75$. The average likelihood ratio is found to combine the favorable performance of higher criticism in the very sparse case with that of the Berk-Jones statistic in the less sparse case and thus appears to dominate both statistics. Some asymptotic optimality theory is considered but found to set in too slowly to illuminate the above findings, at least for sample sizes up to one million. In contrast, asymptotic approximations to the critical values of the Berk-Jones statistic that have been developed by Wellner and Koltchinskii (2003) and Jager and Wellner (2007) are found to give surprisingly accurate approximations even for quite small sample sizes."
in_NB
statistics
hypothesis_testing
multiple_testing
november 2011 by cshalizi
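--- For reference, the "maximum of the binomial counts after standardization" in the abstract above can be sketched in a few lines. This uses the common Donoho-Jin form of higher criticism evaluated at the ordered p-values; the restriction to the smallest half of the p-values (`alpha0=0.5`) and the function name are assumptions, and the paper's exact definition may differ.

```python
import numpy as np

def higher_criticism(pvalues, alpha0=0.5):
    """Higher-criticism statistic: the largest standardized excess of the
    empirical count of p-values <= p_(i) over its expectation n * p_(i)
    under the global null, taken over the smallest alpha0 fraction."""
    p = np.sort(np.asarray(pvalues, dtype=float))
    n = len(p)
    i = np.arange(1, n + 1)
    # Skip p-values of exactly 0 or 1, where the standardization is undefined.
    keep = (i <= alpha0 * n) & (p > 0) & (p < 1)
    if not keep.any():
        raise ValueError("no usable p-values in the requested range")
    pk, ik = p[keep], i[keep]
    z = np.sqrt(n) * (ik / n - pk) / np.sqrt(pk * (1 - pk))
    return float(z.max())
```

A handful of very small p-values among otherwise uniform ones pushes the maximum up sharply, which is the sparse-signal regime the abstract discusses.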
Learning from positive and unlabeled examples by enforcing statistical significance
november 2011 by cshalizi
"Given a finite but large set of objects described by a vector of features, only a small subset of which have been labeled as "positive" with respect to a class of interest, we consider the problem of characterizing the positive class. We formalize this as the problem of learning a feature based score function that minimizes the p-value of a non parametric statistical hypothesis test. For linear score functions over the original feature space or over one of its kernelized versions, we provide a solution of this problem computed by a one-class SVM applied on a surrogate dataset obtained by sampling subsets of the overall set of objects and representing them by their average feature-vector shifted by the average feature-vector of the original sample of positive examples. We carry out experiments with this method on the prediction of targets of transcription factors in two different organisms, E. coli and S. cerevisiae. Our method extends enrichment analysis commonly carried out in Bioinformatics and its results outperform common solutions to this problem."
to:NB
machine_learning
hypothesis_testing
semi-supervised_learning
november 2011 by cshalizi
[1110.3599] Testing over a continuum of null hypotheses
october 2011 by cshalizi
"We introduce a theoretical framework for performing statistical hypothesis testing simultaneously over a fairly general, possibly uncountably infinite, set of null hypotheses. This extends the standard statistical setting for multiple hypotheses testing, which is restricted to a finite set. This work is motivated by numerous modern applications where the observed signal is modeled by a stochastic process over a continuum. As a measure of type I error, we extend the concept of false discovery rate (FDR) to this setting. The FDR is defined as the average ratio of the measure of two random sets, so that its study presents some challenge and is of some intrinsic mathematical interest. Our main result shows how to use the $p$-value process to control the FDR at a nominal level, either under arbitrary dependence of $p$-values, or under the assumption that the finite dimensional distributions of the $p$-value process have positive correlations of a specific type (weak PRDS). Both cases generalize existing results established in the finite setting, the latter one leading to a less conservative procedure. The interest of this approach is demonstrated in several non-parametric examples: testing the mean/signal in a Gaussian white noise model, testing the intensity of a Poisson process and testing the c.d.f. of i.i.d. random variables. Conceptually, an interesting feature of the setting advocated here is that it focuses directly on the intrinsic hypothesis space associated with a testing model on a random process, without referring to an arbitrary discretization."
in_NB
statistics
hypothesis_testing
multiple_testing
stochastic_processes
october 2011 by cshalizi
[1110.1248] An algorithm to compute the power of Monte Carlo tests with guaranteed precision
october 2011 by cshalizi
"This article presents an algorithm that generates an exact (conservative) confidence interval of a specified length and coverage probability for the power of a Monte Carlo test (such as a bootstrap or permutation test). It is the first method that achieves this aim for almost any Monte Carlo test. The existing research on power estimation for Monte Carlo tests has focused on obtaining as accurate a result as possible for a fixed computational effort. However, the methods proposed do not provide any guarantee of precision, in the sense that they cannot report a confidence interval to accompany their estimate of the power. Conversely in this article the computational effort is random. The algorithm operates until a confidence interval can be constructed that meets the requirements of the user, in terms of length and coverage probability. We show that, surprisingly, by generating two more datasets than what might have been assumed to be sufficient, the expected number of steps required by the algorithm is finite in many cases of practical interest. These include, for instance, any situation where the distribution of the p-value is absolutely continuous or if it is discrete with finite support. The algorithm is implemented in the R package simctest."
statistics
hypothesis_testing
confidence_sets
monte_carlo
bootstrap
in_NB
october 2011 by cshalizi
[1106.3670] Adjusting for selection bias in testing multiple families of hypotheses
june 2011 by cshalizi
"In many large multiple testing problems the hypotheses are divided into families. Given the data, families with evidence for true discoveries are selected, and hypotheses within them are tested. Neither controlling the error-rate in each family separately nor controlling the error-rate over all hypotheses together can assure that an error-rate is controlled in the selected families. We formulate this concern about selective inference in its generality, for a very wide class of error-rates and for any selection criterion, and present an adjustment of the testing level inside the selected families that retains the average error-rate over the selected families."
multiple_testing
hypothesis_testing
statistics
june 2011 by cshalizi
[0812.2712] Sequential multiple hypothesis testing in presence of control variables
april 2011 by cshalizi
Each experiment has a control setting; the distribution of responses is independent given control settings; how to design the experiment so as to decide among K hypotheses quickly and reliably?
experimental_design
hypothesis_testing
statistics
april 2011 by cshalizi
A Multivariate Kolmogorov-Smirnov Test of Goodness of Fit
april 2011 by cshalizi
Does not seem to be implemented in R. Next time, make into a programming project, and have them compare to simple bootstrapping? --- This copy appears to be on the personal webpage of one of the authors; DOI 10.1016/S0167-7152(97)00020-5
hypothesis_testing
goodness-of-fit
kolmogorov-smirnov-test
statistics
to_teach:undergrad-ADA
april 2011 by cshalizi
xkcd: Significant
april 2011 by cshalizi
... and this goes on the office doors of statisticians everywhere.
funny:geeky
funny:because_its_true
xkcd
hypothesis_testing
multiple_comparisons
cartoons
april 2011 by cshalizi
[1102.5750] Neyman-Pearson classification, convexity and stochastic constraints
march 2011 by cshalizi
"Motivated by problems of anomaly detection, this paper implements the Neyman-Pearson paradigm to deal with asymmetric errors in binary classification with a convex loss. Given a finite collection of classifiers, we combine them and obtain a new classifier that satisfies simultaneously the two following properties with high probability: (i) its probability of type I error is below a pre-specified level and (ii), it has probability of type II error close to the minimum possible. The proposed classifier is obtained by solving an optimization problem with an empirical objective and an empirical constraint. New techniques to handle such problems are developed and have consequences on chance constrained programming."
Final version: http://jmlr.csail.mit.edu/papers/v12/rigollet11a.html
in_NB
learning_theory
statistics
hypothesis_testing
convexity
machine_learning
optimization
have_read
rigollet.philippe
march 2011 by cshalizi
Statistical Inference on Random Graphs; Comparative Power Analysis- Journal of Computational and Graphical Statistics - 0(0):1
february 2011 by cshalizi
"We present a comparative power analysis, via Monte Carlo, of various graph invariants used as statistics for testing graph homogeneity versus a “chatter” alternative—the existence of a local region of excessive activity. Our results indicate that statistical inference on random graphs, even in a relatively simple setting, can be decidedly nontrivial. We find that none of the graph invariants considered is uniformly most powerful throughout our space of alternatives. Code for reproducing all the simulation results presented in this article is available online."
statistics
network_data_analysis
hypothesis_testing
re:smoothing_adjacency_matrices
re:network_differences
february 2011 by cshalizi
[1102.2407] Multivariate Goodness of Fit Procedures for Unbinned Data: An Annotated Bibliography
february 2011 by cshalizi
Not read, so I don't know if it's any good.
statistics
hypothesis_testing
goodness-of-fit
february 2011 by cshalizi
Data-driven smooth tests when the hypothesis is composite - University of Twente Publications
december 2010 by cshalizi
Hmmm, do these assumptions hold for power law distributions? (Now corresponding software: "ddst" on CRAN.)
goodness-of-fit
hypothesis_testing
statistics
neyman_smooth_tests
to_teach:undergrad-ADA
to_teach:complexity-and-inference
december 2010 by cshalizi
[1012.4401] A Note on a Characterization of Rényi Measures and its Relation to Composite Hypothesis Testing
december 2010 by cshalizi
"The Rényi information measures are characterized in terms of their Shannon counterparts, and properties of the former are recovered from first principle via the associated properties of the latter." I think if I stared at theorem 1 for a bit longer it would give me a new intuitive sense of what Rényi entropy is, but that's low priority right now...
information_theory
renyi_entropy
via:ded-maxim
in_NB
hypothesis_testing
december 2010 by cshalizi
SSRN-Neyman's Smooth Test and Its Applications in Econometrics by Aurobindo Ghosh, Anil Bera
october 2010 by cshalizi
I can rarely remember such _enthusiasm_ in a statistical paper.
hypothesis_testing
statistics
neyman.jerzy
econometrics
history_of_statistics
have_read
goodness-of-fit
mis-specification_testing
october 2010 by cshalizi
"Is Frequentist Testing Vulnerable to the Base-Rate Fallacy?" (Spanos) - Philosophy of Science
october 2010 by cshalizi
"This article calls into question the charge that frequentist testing is susceptible to the base-rate fallacy. It is argued that the apparent similarity between examples like the Harvard Medical School test and frequentist testing is highly misleading. A closer scrutiny reveals that such examples have none of the basic features of a proper frequentist test, such as legitimate data, hypotheses, test statistics, and sampling distributions. Indeed, the relevant error probabilities are replaced with the false positive/negative rates that constitute deductive calculations based on known probabilities among events. As a result, the ampliative dimension of frequentist induction—learning from data about the underlying data-generating mechanism—is missing."
statistics
philosophy_of_science
re:phil-of-bayes_paper
hypothesis_testing
spanos.aris
october 2010 by cshalizi
[citation needed]» Blog Archive » fourteen questions about selection bias, circularity, nonindependence, etc.
june 2010 by cshalizi
We don't appear to subscribe, but presumably I could write some of them for a copy... --- ETA: Received, thanks to T.D.
fmri
multiple_comparisons
neuroscience
neural_data_analysis
statistics
estimation
hypothesis_testing
june 2010 by cshalizi
Ehm, Kornmeier, Heinrich: Multiple testing along a tree
may 2010 by cshalizi
"Suitable sequentially rejective multiple test procedures allow to “zoom in” on clusters of relevant variables in high-dimensional regression (Meinshausen [7]), or on regions of interest in some search space (Heinrich et al. [3]; Meinshausen et al. [8]). As a common framework for these schemes we propose to consider multiple testing along a tree of hypotheses together with a “keep rejecting until first acceptance” rule. Particular topics addressed in this note are control of the familywise error, and some variants and basic properties of the procedure."
multiple_testing
hypothesis_testing
model_selection
re:AoS_project
to_read
may 2010 by cshalizi
[1005.1327] Statistical Model Checking : An Overview
may 2010 by cshalizi
"Quantitative properties of stochastic systems are usually specified in logics that allow one to compare the measure of executions satisfying certain temporal properties with thresholds. The model checking problem for stochastic systems with respect to such logics is typically solved by a numerical approach that iteratively computes (or approximates) the exact measure of paths satisfying relevant subformulas; the algorithms themselves depend on the class of systems being analyzed as well as the logic used for specifying the properties. Another approach to solve the model checking problem is to _simulate_ the system for finitely many runs, and use _hypothesis testing_ to infer whether the samples provide a _statistical_ evidence for the satisfaction or violation of the specification. In this short paper, we survey the statistical approach, and outline its main advantages in terms of efficiency, uniformity, and simplicity."
simulation
stochastic_models
model-checking
via:tozier
hypothesis_testing
may 2010 by cshalizi
[citation needed]» Blog Archive » the capricious nature of p < .05, or why data peeking is evil
may 2010 by cshalizi
Sampling to apparent significance.
statistics
hypothesis_testing
bad_data_analysis
may 2010 by cshalizi
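A rough illustration of "sampling to apparent significance" from the data-peeking entry above: a minimal simulation sketch, not taken from the linked post. It assumes i.i.d. standard Gaussian observations with known variance, a two-sided z-test at the nominal 5% level, and a look at the data after every 10 observations up to 100; the function name and all parameter values are made up for the example.

```python
import math
import random


def peeking_false_positive_rate(n_sims=2000, n_max=100, peek_every=10,
                                z_crit=1.96, seed=0):
    """Fraction of null experiments declared 'significant' when the
    experimenter tests after every `peek_every` observations and stops
    at the first rejection (all settings are illustrative assumptions)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        total = 0.0
        for n in range(1, n_max + 1):
            # The null hypothesis is true: mean 0, known standard deviation 1.
            total += rng.gauss(0.0, 1.0)
            if n % peek_every == 0:
                z = total / math.sqrt(n)  # z-statistic for the mean so far
                if abs(z) > z_crit:
                    hits += 1  # stop as soon as the result looks significant
                    break
    return hits / n_sims


# Ten looks at the data versus a single look at the end.
peeking_rate = peeking_false_positive_rate(peek_every=10)
single_look_rate = peeking_false_positive_rate(peek_every=100)
print("false positive rate with peeking:   ", peeking_rate)
print("false positive rate with one look:  ", single_look_rate)
```

Under these assumptions the single-look rate should sit near the nominal level, while the peeking rate should come out well above it, since every extra look is another chance to cross the threshold by luck.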
Consistent Nonparametric Tests of Independence
may 2010 by cshalizi
Suitable for use in the PC or FCI algorithms?
statistics
hypothesis_testing
to_read
to:blog
nonparametrics
independence_testing
in_NB
may 2010 by cshalizi
10-705 Intermediate Statistics, Fall 2009
april 2010 by cshalizi
Larry's version of the typical masters-level course based on Casella and Berger. Note: half of what he covers is not in Casella and Berger. (For example, he starts with VC theory!)
learning_theory
statistics
estimation
hypothesis_testing
prediction
minimax
bootstrap
model_selection
regression
classifiers
confidence_sets
wasserman.larry
kith_and_kin
april 2010 by cshalizi
Discrimination Between B-Processes is Impossible
april 2010 by cshalizi
I really don't see how this abstract is compatible with Ryabko's own work!
stochastic_processes
ergodic_theory
hypothesis_testing
ryabko.daniil
to_read
april 2010 by cshalizi
How persuasive is a good fit? A comment on theory testing.
april 2010 by cshalizi
Everything useful in this paper is contained in their Figure 1 and its caption, and even then I think they're incomplete. (In the top left of Figure 1, the "strong support" quadrant, draw another narrow band along the opposite diagonal to the first theory, also going through the small cross of observations: this would be a distinct and incompatible theory which also makes a narrow range of predictions that also match the precisely-measured data.)
methodological_advice
hypothesis_testing
statistics
psychology
via:kass
have_read
re:phil-of-bayes_paper
april 2010 by cshalizi
Lindsay, Liu: Model Assessment Tools for a Model False World
april 2010 by cshalizi
"a model credibility index, which is designed to serve as a one-number summary measure of model adequacy. We define the index to be the maximum sample size at which samples from the model and those from the true data generating mechanism are nearly indistinguishable. We use standard notions from hypothesis testing to make this definition precise. We use data subsampling to estimate the index" --- To be blogged, after the paper with Andy is done.
statistics
misspecification
re:phil-of-bayes_paper
hypothesis_testing
bootstrap
have_read
to:blog
april 2010 by cshalizi
Luen, Stark: Testing earthquake predictions
december 2009 by cshalizi
Back-up for the Hough review. Also: might make a good mini-project for the data-mining class, though I'd have to teach about spatio-temporal methods (which I should anyway [but where would the time come?]).
earthquakes
hypothesis_testing
bad_data_analysis
stark.philip
statistics
prediction
have_read
to_teach:data-mining
to_teach:undergrad-ADA
december 2009 by cshalizi
Evans, Jang: Invariant P-values for model checking
december 2009 by cshalizi
Interesting, but I suspect the bits about approximating an underlying discrete distribution could be lifted...
statistics
hypothesis_testing
model-checking
p-values
re:phil-of-bayes_paper
have_read
december 2009 by cshalizi
[0912.4269] Martingales and p-values as measures of evidence
december 2009 by cshalizi
Weird. Wonder what Mayo would make of this.
martingales
statistics
p-values
hypothesis_testing
re:phil-of-bayes_paper
to:NB
have_read
december 2009 by cshalizi
[0912.4872] Interpretations of Directed Information in Portfolio Theory, Data Compression, and Hypothesis Testing
december 2009 by cshalizi
Giving me flashbacks to ch. 7 of my thesis.
information_theory
statistics
hypothesis_testing
to_read
via:ded-maxim
december 2009 by cshalizi
"Neural correlates of interspecies perspective taking in the post-mortem Atlantic Salmon: An argument for multiple comparisons correction"
september 2009 by cshalizi
fmri
neuroscience
bad_data_analysis
funny:academic
funny:malicious
statistics
hypothesis_testing
multiple_comparisons
to_teach:data-mining
salmon
to:blog
september 2009 by cshalizi
Learning with Finite Memory
august 2009 by cshalizi
Cf. Leo's results on estimation with finite state machines.
statistics
hypothesis_testing
to:NB
have_read
august 2009 by cshalizi
Superstars without Talent? The Yule Distribution Controversy
july 2009 by cshalizi
"Chung and Cox (1994) provided an intuitively appealing stochastic model indicating that superstars may exist regardless of talent, giving rise to the Yule distribution. We adopt a different empirical approach and test its goodness of fit using a parametric bootstrap and several powerful test statistics. Just like the discrete Pareto distribution, it is overwhelmingly rejected: it is a fairly accurate approximation of the lower quantiles of the superstar distribution but overestimates the snowball effect that makes consumers purchase records of the most successful artists. In other words, the Yule distribution captures stardom, but not superstardom. A generalization of the Yule distribution provides an excellent fit in two of the three data sets." --- We only seem to subscribe with a one-issue delay (?); preprint at http://swopec.hhs.se/hastef/papers/hastef0658.pdf
heavy_tails
inequality
economics_of_superstars
hypothesis_testing
economics
statistics
evisceration
have_read
july 2009 by cshalizi
Tirvengadum: Linguistic Fingerprints and Literary Fraud
june 2009 by cshalizi
Using the case of an author with a known pseudonym to test methods for establishing identity of authorship. Conclusion: it may be that <
literary_criticism
author-identification
hypothesis_testing
to_teach
via:chl
textual_criticism
june 2009 by cshalizi
The Likelihood Ratio Test Under Nonstandard Conditions
june 2009 by cshalizi
I very much like the approach of treating the likelihood ratio as an empirical process; why haven't I seen it before? (Also, the state-of-the-art in simulating Gaussian processes must be much better now than what Hansen was doing in '92, which would make this even more practical.)
empirical_processes
hypothesis_testing
statistics
likelihood_ratio_tests
econometrics
time_series
hansen.bruce
have_read
june 2009 by cshalizi
The Statistical Significance of Odd Bits of Information (Bartlett)
june 2009 by cshalizi
Basically a goodness-of-fit test based on fluctuations of the entropy.
statistics
hypothesis_testing
information_theory
bartlett.m.s.
have_read
in_NB
june 2009 by cshalizi
[0905.4937] A criterion for hypothesis testing for stationary processes
june 2009 by cshalizi
"Given a discrete-valued sample X_1... X_n we wish to test whether it was generated by a process belonging to a family H_0, or it was generated by a process outside H_0. All process distributions are assumed stationary ergodic, and no further probabilistic or parametric assumptions are made. We require the Type I error of the test to be uniformly bounded, while the probability of Type II error has to tend to zero as the sample size increases. For this notion of consistency we provide necessary and sufficient conditions on the family H_0 for the existence of a consistent test."
statistics
statistical_inference_for_stochastic_processes
ergodic_theory
hypothesis_testing
ryabko.daniil
to:NB
to_read
june 2009 by cshalizi
A consistent nonparametric test of ergodicity for time series with applications
march 2009 by cshalizi
They completely fail to deal with the basic problem that ergodic components are invariant, so that every realization of a stochastic process is always confined to a single component. Hence no test on a single realization has ANY ability to detect non-ergodicity; this could ONLY be done with multiple realizations from the same source.
statistics
time_series
ergodic_theory
nonparametrics
have_read
hypothesis_testing
shot_after_a_fair_trial
march 2009 by cshalizi
Support of the Null Hypothesis
march 2009 by cshalizi
Aleks Jakulin on the _Journal of Articles in Support of the Null Hypothesis_ (with links to more such in the comments)
hypothesis_testing
paper_writing
social_life_of_the_mind
why_oh_why_cant_we_have_a_better_academic_publishing_system
funny:geeky
march 2009 by cshalizi