[1204.6703] Two SVDs Suffice: Spectral decompositions for probabilistic topic modeling and latent Dirichlet allocation
27 days ago by cshalizi
"Topic models can be seen as a generalization of the clustering problem, in that they posit that observations are generated due to multiple latent factors (e.g. the words in each document are generated as a mixture of several active topics, as opposed to just one). This increased representational power comes at the cost of a more challenging unsupervised learning problem of estimating the topic probability vectors (the distributions over words for each topic), when only the words are observed and the corresponding topics are hidden.
"We provide a simple and efficient learning procedure that is guaranteed to recover the parameters for a wide class of mixture models, including the popular latent Dirichlet allocation (LDA) model. For LDA, the procedure correctly recovers both the topic probability vectors and the prior over the topics, using only trigram statistics (i.e. third order moments, which may be estimated with documents containing just three words). The method, termed Excess Correlation Analysis (ECA), is based on a spectral decomposition of low order moments (third and fourth order) via two singular value decompositions (SVDs). Moreover, the algorithm is scalable since the SVD operations are carried out on k by k matrices, where k is the number of latent factors (e.g. the number of topics), rather than in the d-dimensional observed space (typically d >> k)."
That's a really remarkable claim, and I'd tag it to_be_shot_after_a_fair_trial if it weren't being made by genuinely serious people.
in_NB
to_read
latent_variables
topic_models
text_mining
mixture_models
statistics
machine_learning
cool_if_true
spectral_clustering
"We provide a simple and efficient learning procedure that is guaranteed to recover the parameters for a wide class of mixture models, including the popular latent Dirichlet allocation (LDA) model. For LDA, the procedure correctly recovers both the topic probability vectors and the prior over the topics, using only trigram statistics (i.e. third order moments, which may be estimated with documents containing just three words). The method, termed Excess Correlation Analysis (ECA), is based on a spectral decomposition of low order moments (third and fourth order) via two singular value decompositions (SVDs). Moreover, the algorithm is scalable since the SVD operations are carried out on k by k matrices, where k is the number of latent factors (e.g. the number of topics), rather than in the d-dimensional observed space (typically d >> k)."
That's a really remarkable claim, and I'd tag it to_be_shot_after_a_fair_trial if it weren't being made by genuinely serious people.
27 days ago by cshalizi
A Normal Law for the Plug-in Estimator of Entropy
4 weeks ago by cshalizi
"This paper establishes a sufficient condition for the asymptotic normality of the plug-in estimator of Shannon's entropy defined on a countable alphabet. The sufficient condition covers a range of cases with countably infinite alphabets, for which no normality results were previously known."
in_NB
entropy_estimation
statistics
information_theory
4 weeks ago by cshalizi
[1201.5871] Null models for network data
5 weeks ago by cshalizi
"The analysis of datasets taking the form of simple, undirected graphs continues to gain in importance across a variety of disciplines. Two choices of null model, the logistic-linear model and the implicit log-linear model, have come into common use for analyzing such network data, in part because each accounts for the heterogeneity of network node degrees typically observed in practice. Here we show how these both may be viewed as instances of a broader class of null models, with the property that all members of this class give rise to essentially the same likelihood-based estimates of link probabilities in sparse graph regimes. This facilitates likelihood-based computation and inference, and enables practitioners to choose the most appropriate null model from this family based on application context. Comparative model fits for a variety of network datasets demonstrate the practical implications of our results."
in_NB
network_data_analysis
have_read
statistics
estimation
approximation
re:smoothing_adjacency_matrices
5 weeks ago by cshalizi
The Colonial American Origins of Modern Democratic Thought - Academic and Professional Books - Cambridge University Press
5 weeks ago by cshalizi
"This first examination in almost 40 years of political ideas in the seventeenth-century American colonies reaches some surprising conclusions about the history of democratic theory more generally. The origins of a distinctively modern kind of thinking about democracy can be located, not in revolutionary America and France in the later eighteenth century, but in the tiny New England colonies in the middle seventeenth. The key feature of this democratic rebirth was honoring not only the principle of popular sovereignty through regular elections but also the principle of accountability through non-electoral procedures for the auditing and impeachment of elected officers. By staking its institutional identity entirely on elections, modern democratic thought has misplaced the sense of robust popular control that originally animated it."
in_NB
books:noted
democracy
american_history
re:democratic_cognition
5 weeks ago by cshalizi
Democracy's Ancient Ancestors: Mari and Early Collective Governance - Academic and Professional Books - Cambridge University Press
5 weeks ago by cshalizi
"This volume examines the political landscape of the ancient Near East through the archive of over 3,000 letters found in the royal palace of Mari. These letters display a rich diversity of political actors, encompassing major kingdoms, smaller states and various tribal towns. Mari's unique contribution to the ancient evidence is its view of tribal organization, made possible especially by the fact that its king, Zimri-Lim, was, first of all, a tribal ruler who claimed Mari as an administrative base and source of prestige. These archaic political traditions are not essentially unlike the forms of pre-democratic Greece, and they offer fresh reason to recognize a cultural continuity between the classical world of the Aegean and the older Near East. This book bridges the areas of archaeology, ancient and classical history, early Middle and Near East, and political and social history."
books:noted
in_NB
ancient_history
democracy
5 weeks ago by cshalizi
[1204.3941] A Log-Linear Graphical Model for Inferring Genetic Networks from High-Throughput Sequencing Data
5 weeks ago by cshalizi
"We develop a novel method for estimating high-dimensional Poisson graphical models, the Log-Linear Graphical Model, allowing us to infer networks based on high-throughput sequencing data. Our model assumes that conditional on all other genes, each gene is Poisson, jointly defining a pair-wise Poisson Markov random field. We estimate our genetic networks via neighborhood selection by fitting `1-norm penalized log-linear models, an approach we call the Poisson Graphical Lasso. Additionally, we develop a fast parallel algorithm, permitting us to fit our graphical models to high-dimensional genomic data sets."
in_NB
graphical_models
gene_expression_data_analysis
lasso
network_data_analysis
statistics
regression
5 weeks ago by cshalizi
Xu , McLeod : Further asymptotic properties of the generalized information criterion
5 weeks ago by cshalizi
"Asymptotic properties of the generalized information criterion for model selection are examined and new conditions under which this criterion is overfitting, consistent, or underfitting are derived."
in_NB
model_selection
information_criteria
statistics
5 weeks ago by cshalizi
[1204.2296] Co-clustering for Directed Graphs; the Stochastic Co-Blockmodel and a Spectral Algorithm
6 weeks ago by cshalizi
"This paper extends the spectral clustering algorithm to directed networks in a way that co-clusters or bi-clusters the rows and columns of a graph Laplacian. Co-clustering leverages the increased complexity of asymmetric relationships to gain new insight into the structure of the directed network. To understand this algorithm and to study its asymptotic properties in a canonical setting, we propose the Stochastic Co-Blockmodel to encode co-clustering structure. This is the first statistical model of co-clustering and it is derived using the concept of stochastic equivalence that motivated the original Stochastic Blockmodel. Although directed spectral clustering is not derived from the Stochastic Co-Blockmodel, we show that, asymptotically, the algorithm can estimate the blocks in a high dimensional asymptotic setting in which the number of blocks grows with the number of nodes. The algorithm, model, and asymptotic results can all be extended to bipartite graphs."
in_NB
relational_learning
network_data_analysis
statistics
clustering
community_discovery
spectral_clustering
yu.bin
6 weeks ago by cshalizi
[1204.2523] Concept Modeling with Superwords
6 weeks ago by cshalizi
"In information retrieval, a fundamental goal is to transform a document into concepts that are representative of its content. The term "representative" is in itself challenging to define, and various tasks require different granularities of concepts. In this paper, we aim to model concepts that are sparse over the vocabulary, and that flexibly adapt their content based on other relevant semantic information such as textual structure or associated image features. We explore a Bayesian nonparametric model based on nested beta processes that allows for inferring an unknown number of strictly sparse concepts. The resulting model provides an inherently different representation of concepts than a standard LDA (or HDP) based topic model, and allows for direct incorporation of semantic features. We demonstrate the utility of this representation on multilingual blog data and the Congressional Record."
in_NB
to_read
text_mining
topic_models
fox.emily
guestrin.carlos
kith_and_kin
6 weeks ago by cshalizi
[1204.2762] On the Uniform Asymptotic Validity of Subsampling and the Bootstrap
6 weeks ago by cshalizi
"This paper provides conditions under which subsampling and the bootstrap can be used to construct estimators of the quantiles of the distribution of a root that behave well uniformly over a large class of distributions $mathbf P$. These results are then applied (i) to construct confidence regions that behave well uniformly over $mathbf P$ in the sense that the coverage probability tends to at least the nominal level uniformly over $mathbf P$ and (ii) to construct tests that behave well uniformly over $mathbf P$ in the sense that the size tends to no greater than the nominal level uniformly over $mathbf P$. Without these stronger notions of convergence, the asymptotic approximations to the coverage probability or size may be poor even in very large samples. Specific applications include the multivariate mean, testing moment inequalities, multiple testing, the empirical process, and $U$-statistics."
in_NB
bootstrap
statistics
6 weeks ago by cshalizi
[0802.4192] Maxisets for Model Selection
6 weeks ago by cshalizi
"We address the statistical issue of determining the maximal spaces (maxisets) where model selection procedures attain a given rate of convergence. By considering first general dictionaries, then orthonormal bases, we characterize these maxisets in terms of approximation spaces. These results are illustrated by classical choices of wavelet model collections. For each of them, the maxisets are described in terms of functional spaces. We take a special care of the issue of calculability and measure the induced loss of performance in terms of maxisets."
in_NB
statistics
model_selection
approximation
6 weeks ago by cshalizi
[0802.4363] Estimating the entropy of binary time series: Methodology, some theory and a simulation study
6 weeks ago by cshalizi
"Partly motivated by entropy-estimation problems in neuroscience, we present a detailed and extensive comparison between some of the most popular and effective entropy estimation methods used in practice: The plug-in method, four different estimators based on the Lempel-Ziv (LZ) family of data compression algorithms, an estimator based on the Context-Tree Weighting (CTW) method, and the renewal entropy estimator.
"**Methodology. Three new entropy estimators are introduced. For two of the four LZ-based estimators, a bootstrap procedure is described for evaluating their standard error, and a practical rule of thumb is heuristically derived for selecting the values of their parameters. ** Theory. We prove that, unlike their earlier versions, the two new LZ-based estimators are consistent for every finite-valued, stationary and ergodic process. An effective method is derived for the accurate approximation of the entropy rate of a finite-state HMM with known distribution. Heuristic calculations are presented and approximate formulas are derived for evaluating the bias and the standard error of each estimator. ** Simulation. All estimators are applied to a wide range of data generated by numerous different processes with varying degrees of dependence and memory. Some conclusions drawn from these experiments include: (i) For all estimators considered, the main source of error is the bias. (ii) The CTW method is repeatedly and consistently seen to provide the most accurate results. (iii) The performance of the LZ-based estimators is often comparable to that of the plug-in method. (iv) The main drawback of the plug-in method is its computational inefficiency."
in_NB
to_read
entropy_estimation
information_theory
time_series
statistics
kontoyiannis.ioannis
re:stacs
"**Methodology. Three new entropy estimators are introduced. For two of the four LZ-based estimators, a bootstrap procedure is described for evaluating their standard error, and a practical rule of thumb is heuristically derived for selecting the values of their parameters. ** Theory. We prove that, unlike their earlier versions, the two new LZ-based estimators are consistent for every finite-valued, stationary and ergodic process. An effective method is derived for the accurate approximation of the entropy rate of a finite-state HMM with known distribution. Heuristic calculations are presented and approximate formulas are derived for evaluating the bias and the standard error of each estimator. ** Simulation. All estimators are applied to a wide range of data generated by numerous different processes with varying degrees of dependence and memory. Some conclusions drawn from these experiments include: (i) For all estimators considered, the main source of error is the bias. (ii) The CTW method is repeatedly and consistently seen to provide the most accurate results. (iii) The performance of the LZ-based estimators is often comparable to that of the plug-in method. (iv) The main drawback of the plug-in method is its computational inefficiency."
6 weeks ago by cshalizi
[1204.2581] Modeling Relational Data via Latent Factor Blockmodel
6 weeks ago by cshalizi
"In this paper we address the problem of modeling relational data, which appear in many applications such as social network analysis, recommender systems and bioinformatics. Previous studies either consider latent feature based models but disregarding local structure in the network, or focus exclusively on capturing local structure of objects based on latent blockmodels without coupling with latent characteristics of objects. To combine the benefits of the previous work, we propose a novel model that can simultaneously incorporate the effect of latent features and covariates if any, as well as the effect of latent structure that may exist in the data. To achieve this, we model the relation graph as a function of both latent feature factors and latent cluster memberships of objects to collectively discover globally predictive intrinsic properties of objects and capture latent block structure in the network to improve prediction performance. We also develop an optimization transfer algorithm based on the generalized EM-style strategy to learn the latent factors. We prove the efficacy of our proposed model through the link prediction task and cluster analysis task, and extensive experiments on the synthetic data and several real world datasets suggest that our proposed LFBM model outperforms the other state of the art approaches in the evaluated tasks."
in_NB
network_data_analysis
community_discovery
statistics
inference_to_latent_objects
factor_analysis
relational_learning
6 weeks ago by cshalizi
Colombo , Maathuis , Kalisch , Richardson : Learning high-dimensional directed acyclic graphs with latent and selection variables
7 weeks ago by cshalizi
"We consider the problem of learning causal information between random variables in directed acyclic graphs (DAGs) when allowing arbitrarily many latent and selection variables. The FCI (Fast Causal Inference) algorithm has been explicitly designed to infer conditional independence and causal information in such settings. However, FCI is computationally infeasible for large graphs. We therefore propose the new RFCI algorithm, which is much faster than FCI. In some situations the output of RFCI is slightly less informative, in particular with respect to conditional independence information. However, we prove that any causal information in the output of RFCI is correct in the asymptotic limit. We also define a class of graphs on which the outputs of FCI and RFCI are identical. We prove consistency of FCI and RFCI in sparse high-dimensional settings, and demonstrate in simulations that the estimation performances of the algorithms are very similar. All software is implemented in the R-package pcalg."
--- To complicated to actually teach, but should be mentioned in the lecture notes on causal discovery, along with FCI.
in_NB
have_read
statistics
graphical_models
causal_inference
sparsity
to_teach:undergrad-ADA
--- To complicated to actually teach, but should be mentioned in the lecture notes on causal discovery, along with FCI.
7 weeks ago by cshalizi
[1203.3037] Expanding the Transfer Entropy to Identify Information Subgraphs in Complex Systems
7 weeks ago by cshalizi
"We propose a formal expansion of the transfer entropy to put in evidence irreducible sets of variables which provide information for the future state of each assigned target. Multiplets characterized by an high value will be associated to informational circuits present in the system, with an informational character (synergetic or redundant) which can be associated to the sign of the contribution. We also present preliminary results on fMRI and EEG data sets."
in_NB
graphical_models
information_theory
community_discovery
time_series
re:functional_communities
7 weeks ago by cshalizi
[1203.3083] Clustering in networks with the collapsed Stochastic Block Model
7 weeks ago by cshalizi
"We present an efficient MCMC algorithm to cluster the nodes of a network such that nodes with similar role in the network are clustered together. This is known as block-modelling or block-clustering. We extend the stochastic blockmodel (SBM) of Nowicki & Snijders (2001), by exploiting parameter collapsing to integrate out block parameters. The resulting model defines a posterior over the number of clusters and cluster memberships. Sampling from this model is simpler than from the original SBM as transdimensional MCMC can be avoided. Moreover, our extensions allow the number of clusters to be directly estimated, rather than given as an input parameter. The algorithm is based on the allocation sampler of Nobile & Fearnside (2007). We use synthetic and real data to test the speed and accuracy of our model and algorithm, including the ability to estimate the number of clusters. The algorithm can scale to networks with up to ten thousand nodes."
in_NB
network_data_analysis
community_discovery
statistics
7 weeks ago by cshalizi
[1203.2200] Role-Dynamics: Fast Mining of Large Dynamic Networks
7 weeks ago by cshalizi
"To understand the structural dynamics of a large-scale social, biological or technological network, it may be useful to discover behavioral roles representing the main connectivity patterns present over time. In this paper, we propose a scalable non-parametric approach to automatically learn the structural dynamics of the network and individual nodes. Roles may represent structural or behavioral patterns such as the center of a star, peripheral nodes, or bridge nodes that connect different communities. Our novel approach learns the appropriate structural role dynamics for any arbitrary network and tracks the changes over time. In particular, we uncover the specific global network dynamics and the local node dynamics of a technological, communication, and social network. We identify interesting node and network patterns such as stationary and non-stationary roles, spikes/steps in role-memberships (perhaps indicating anomalies), increasing/decreasing role trends, among many others. Our results indicate that the nodes in each of these networks have distinct connectivity patterns that are non-stationary and evolve considerably over time. Overall, the experiments demonstrate the effectiveness of our approach for fast mining and tracking of the dynamics in large networks. Furthermore, the dynamic structural representation provides a basis for building more sophisticated models and tools that are fast for exploring large dynamic networks."
in_NB
network_data_analysis
data_mining
community_discovery
neville.jennifer
7 weeks ago by cshalizi
[1203.0697] Learning High-Dimensional Mixtures of Graphical Models
7 weeks ago by cshalizi
"We consider the problem of learning mixtures of discrete graphical models in high dimensions and propose a novel method for estimating the mixture components with provable guarantees. The method proceeds mainly in three stages. In the first stage, it estimates the union of the Markov graphs of the mixture components (referred to as the union graph) via a series of rank tests. It then uses this estimated union graph to compute the mixture components via a spectral decomposition method. The spectral decomposition method was originally proposed for latent class models, and we adapt this method for learning the more general class of graphical model mixtures. In the end, the method produces tree approximations of the mixture components via the Chow-Liu algorithm. Our output is thus a tree-mixture model which serves as a good approximation to the underlying graphical model mixture. When the union graph has sparse node separators, we prove that our method has sample and computational complexities scaling as poly(p, d, r), for an r-component mixture of p-variate graphical models, where d is the cardinality of the sample space of each node variable. We also extend our results to the case when the union graph has sparse local separators, which is a weaker criterion than having sparse exact separators, and when the mixture components are in the regime of correlation decay. The computational and sample complexities of our method for this class are significantly improved, since they involve an upper bound on the cardinality of local separators (as opposed to exact separators). Our results push the realm of tractable model classes for high-dimensional learning, which includes the class of tree mixtures."
in_NB
mixture_models
ensemble_methods
graphical_models
machine_learning
to_read
chow-liu_algorithm
7 weeks ago by cshalizi
[1203.0683] A Method of Moments for Mixture Models and Hidden Markov Models
7 weeks ago by cshalizi
"Mixture models are a fundamental tool in applied statistics and machine learning for treating data taken from multiple subpopulations. The current practice for estimating the parameters of such models relies on local search heuristics (e.g., the EM algorithm) which are prone to failure, and existing consistent methods are unfavorable due to their high computational and sample complexity which typically scale exponentially with the number of mixture components. This work develops an efficient method of moments approach to parameter estimation for a broad class of high-dimensional mixture models with many components, including multi-view mixtures of Gaussians (such as mixtures of axis-aligned Gaussians) and hidden Markov models. The new method leads to rigorous unsupervised learning results for mixture models that were not achieved by previous works; and, because of its simplicity, it also constitutes a viable alternative to EM for practical deployment."
Clever: some mixture models can be characterized by expectations, covariances, and third-order mixed moments, so you just need to estimate tensors up to third order, and not very high moments of vectors (which are very noisy) and do some linear algebra. I should probably re-read because I couldn't reproduce this at the board.
in_NB
statistics
estimation
mixture_models
markov_models
state-space_models
have_read
Clever: some mixture models can be characterized by expectations, covariances, and third-order mixed moments, so you just need to estimate tensors up to third order, and not very high moments of vectors (which are very noisy) and do some linear algebra. I should probably re-read because I couldn't reproduce this at the board.
7 weeks ago by cshalizi
[1204.0033] Transforming Graph Representations for Statistical Relational Learning
7 weeks ago by cshalizi
"Relational data representations have become an increasingly important topic due to the recent proliferation of network datasets (e.g., social, biological, information networks) and a corresponding increase in the application of statistical relational learning (SRL) algorithms to these domains. In this article, we examine a range of representation issues for graph-based relational data. Since the choice of relational data representation for the nodes, links, and features can dramatically affect the capabilities of SRL algorithms, we survey approaches and opportunities for relational representation transformation designed to improve the performance of these algorithms. This leads us to introduce an intuitive taxonomy for data representation transformations in relational domains that incorporates link transformation and node transformation as symmetric representation tasks. In particular, the transformation tasks for both nodes and links include (i) predicting their existence, (ii) predicting their label or type, (iii) estimating their weight or importance, and (iv) systematically constructing their relevant features. We motivate our taxonomy through detailed examples and use it to survey and compare competing approaches for each of these tasks. We also discuss general conditions for transforming links, nodes, and features. Finally, we highlight challenges that remain to be addressed."
in_NB
relational_learning
statistics
machine_learning
neville.jennifer
change_of_representation
7 weeks ago by cshalizi
A Kernel Two-Sample Test
7 weeks ago by cshalizi
"We propose a framework for analyzing and comparing distributions, which we use to construct statistical tests to determine if two samples are drawn from different distributions. Our test statistic is the largest difference in expectations over functions in the unit ball of a reproducing kernel Hilbert space (RKHS), and is called the maximum mean discrepancy (MMD). We present two distribution-free tests based on large deviation bounds for the MMD, and a third test based on the asymptotic distribution of this statistic. The MMD can be computed in quadratic time, although efficient linear time approximations are available. Our statistic is an instance of an integral probability metric, and various classical metrics on distributions are obtained when alternative function classes are used in place of an RKHS. We apply our two-sample tests to a variety of problems, including attribute matching for databases using the Hungarian marriage method, where they perform strongly. Excellent performance is also obtained when comparing distributions over graphs, for which these are the first such tests."
in_NB
to_read
hilbert_space
kernel_methods
goodness-of-fit
statistics
concentration_of_measure
probability
two-sample_tests
re:network_differences
7 weeks ago by cshalizi
[0803.0835] Goodness-of-fit tests for Markovian time series models: Central limit theory and bootstrap approximations
8 weeks ago by cshalizi
"New goodness-of-fit tests for Markovian models in time series analysis are developed which are based on the difference between a fully nonparametric estimate of the one-step transition distribution function of the observed process and that of the model class postulated under the null hypothesis. The model specification under the null allows for Markovian models, the transition mechanisms of which depend on an unknown vector of parameters and an unspecified distribution of i.i.d. innovations. Asymptotic properties of the test statistic are derived and the critical values of the test are found using appropriate bootstrap schemes. General properties of the bootstrap for Markovian processes are derived. A new central limit theorem for triangular arrays of weakly dependent random variables is obtained. For the proof of stochastic equicontinuity of multidimensional empirical processes, we use a simple approach based on an anisotropic tiling of the space. The finite-sample behavior of the proposed test is illustrated by some numerical examples and a real-data application is given."
in_NB
statistics
statistical_inference_for_stochastic_processes
bootstrap
markov_models
goodness-of-fit
8 weeks ago by cshalizi
[1203.5829] Ensemble estimators for multivariate entropy estimation
8 weeks ago by cshalizi
"The problem of estimation of density functionals like entropy and mutual information has received much attention in the statistics and information theory communities. A large class of estimators of functionals of the probability density suffer from the curse of dimensionality, wherein the exponent in the MSE rate of convergence decays increasingly slowly as the dimension $d$ of the samples increases. In particular, the rate is often glacially slow of order $O(T^{-{gamma}/{d}})$, where $T$ is the number of samples, and $gamma>0$ is a rate parameter. Examples of such estimators include kernel density estimators, $k$-NN density estimators, $k$-NN entropy estimators, intrinsic dimension estimators and other examples. In this paper, we propose a weighted convex combination of an ensemble of such estimators, where optimal weights can be chosen such that the weighted estimator converges at a much faster dimension invariant rate of $O(T^{-1})$. Furthermore, we show that these optimal weights can be determined by solving a convex optimization problem which can be performed offline and does not require training data. We illustrate the superior performance of our weighted estimator for two important applications: (i) estimating the Panter-Dite distortion-rate factor and (ii) estimating the Shannon entropy for testing the probability distribution of a random sample."
in_NB
ensemble_methods
entropy_estimation
statistics
8 weeks ago by cshalizi
[1203.5974] The Concentration and Stability of the Community Detecting Functions on Random Networks
8 weeks ago by cshalizi
"We propose a general form of community detecting functions for finding the communities or the optimal partition of a random network, and examine the concentration and stability of the function values using the bounded difference martingale method. We derive LDP inequalities for both the general case and several specific community detecting functions: modularity, graph bipartitioning and q-Potts community structure. We also discuss the concentration and stability of community detecting functions on different types of random networks: the sparse and non-sparse networks and some examples such as ER and CL networks."
in_NB
to_read
community_discovery
network_data_analysis
statistics
8 weeks ago by cshalizi
[1203.6093] Consensus clustering in complex networks
8 weeks ago by cshalizi
"The community structure of complex networks reveals both their organization and hidden relationships among their constituents. Most community detection methods currently available are not deterministic, and their results typically depend on the specific random seeds, initial conditions and tie-break rules adopted for their execution. Consensus clustering is used in data analysis to generate stable results out of a set of partitions delivered by stochastic methods. Here we show that consensus clustering can be combined with any existing method in a self-consistent way, enhancing considerably both the stability and the accuracy of the resulting partitions. This framework is also particularly suitable to monitor the evolution of community structure in temporal networks. An application of consensus clustering to a large citation network of physics papers demonstrates its capability to keep track of the birth, death and diversification of topics."
in_NB
community_discovery
network_data_analysis
clustering
data_mining
8 weeks ago by cshalizi
Taylor & Francis Online :: Robustness Diagnosis for Bootstrap Inference - Journal of Computational and Graphical Statistics - Volume 20, Issue 2
8 weeks ago by cshalizi
"We propose a new robustness diagnostic scheme for bootstrap inference procedures. The scheme is adaptive to the data actually observed, applies readily to bootstrap inference output of diverse format, and therefore provides robustness diagnostics practically more relevant than most conventional robustness measures. Specifically, it monitors the sensitivity of the bootstrap distribution of inference output to specially designed omnidirectional data perturbations, and quantifies findings by a standardized measure with the aid of repeated resampling. The resulting measure, displayed in the form of an R-value plot, permits direct comparisons across different bootstrap procedures and across inference output of different types. Numerical examples are presented using both simulated and real-life data to illustrate applications of the scheme to estimation and hypothesis testing problems. This article has supplementary material online."
in_NB
statistics
bootstrap
8 weeks ago by cshalizi
Entropy Estimation in Turing's Perspective
8 weeks ago by cshalizi
"A new nonparametric estimator of Shannon's entropy on a countable alphabet is proposed and analyzed against the well-known plug-in estimator. The proposed estimator is developed based on Turing's formula, which recovers distributional characteristics on the subset of the alphabet not covered by a size-n sample. The fundamental switch in perspective brings about substantial gain in estimation accuracy for every distribution with finite entropy. In general, a uniform variance upper bound is established for the entire class of distributions with finite entropy that decays at a rate of O(ln(n)/n) compared to O([ln(n)]2/n) for the plug-in. In a wide range of subclasses, the variance of the proposed estimator converges at a rate of O(1/n), and this rate of convergence carries over to the convergence rates in mean squared errors in many subclasses. Specifically, for any finite alphabet, the proposed estimator has a bias decaying exponentially in n. Several new bias-adjusted estimators are also discussed."
in_NB
entropy_estimation
statistics
information_theory
8 weeks ago by cshalizi
[1203.5181] $k$-MLE: A fast algorithm for learning statistical mixture models
9 weeks ago by cshalizi
"We describe $k$-MLE, a fast and efficient local search algorithm for learning finite statistical mixtures of exponential families such as Gaussian mixture models. Mixture models are traditionally learned using the expectation-maximization (EM) soft clustering technique that monotonically increases the incomplete (expected complete) likelihood. Given prescribed mixture weights, the hard clustering $k$-MLE algorithm iteratively assigns data to the most likely weighted component and update the component models using Maximum Likelihood Estimators (MLEs). Using the duality between exponential families and Bregman divergences, we prove that the local convergence of the complete likelihood of $k$-MLE follows directly from the convergence of a dual additively weighted Bregman hard clustering. The inner loop of $k$-MLE can be implemented using any $k$-means heuristic like the celebrated Lloyd's batched or Hartigan's greedy swap updates. We then show how to update the mixture weights by minimizing a cross-entropy criterion that implies to update weights by taking the relative proportion of cluster points, and reiterate the mixture parameter update and mixture weight update processes until convergence. Hard EM is interpreted as a special case of $k$-MLE when both the component update and the weight update are performed successively in the inner loop. To initialize $k$-MLE, we propose $k$-MLE++, a careful initialization of $k$-MLE guaranteeing probabilistically a global bound on the best possible complete likelihood."
in_NB
em_algorithm
mixture_models
statistics
machine_learning
clustering
estimation
9 weeks ago by cshalizi
[0803.1628] Component models for large networks
10 weeks ago by cshalizi
"Being among the easiest ways to find meaningful structure from discrete data, Latent Dirichlet Allocation (LDA) and related component models have been applied widely. They are simple, computationally fast and scalable, interpretable, and admit nonparametric priors. In the currently popular field of network modeling, relatively little work has taken uncertainty of data seriously in the Bayesian sense, and component models have been introduced to the field only recently, by treating each node as a bag of out-going links. We introduce an alternative, interaction component model for communities (ICMc), where the whole network is a bag of links, stemming from different components. The former finds both disassortative and assortative structure, while the alternative assumes assortativity and finds community-like structures like the earlier methods motivated by physics. With Dirichlet Process priors and an efficient implementation the models are highly scalable, as demonstrated with a social network from the Last.fm web site, with 670,000 nodes and 1.89 million links."
in_NB
community_discovery
statistics
nonparametrics
clustering
10 weeks ago by cshalizi
"Neural reuse: A fundamental organizational principle of the brain" (Anderson, 2010)
10 weeks ago by cshalizi
BBS target article.
Abstract: "An emerging class of theories concerning the functional structure of the brain takes the reuse of neural circuitry for various cognitive purposes to be a central organizational principle. According to these theories, it is quite common for neural circuits established for one purpose to be exapted (exploited, recycled, redeployed) during evolution or normal development, and be put to different uses, often without losing their original functions. Neural reuse theories thus differ from the usual understanding of the role of neural plasticity (which is, after all, a kind of reuse) in brain organization along the following lines: According to neural reuse, circuits can continue to acquire new uses after an initial or original function is established; the acquisition of new uses need not involve unusual circumstances such as injury or loss of established function; and the acquisition of a new use need not involve (much) local change to circuit structure (e.g., it might involve only the establishment of functional connections to new neural partners). Thus, neural reuse theories offer a distinct perspective on several topics of general interest, such as: the evolution and development of the brain, including (for instance) the evolutionary-developmental pathway supporting primate tool use and human language; the degree of modularity in brain organization; the degree of localization of cognitive function; and the cortical parcellation problem and the prospects (and proper methods to employ) for function to structure mapping. The idea also has some practical implications in the areas of rehabilitative medicine and machine interface design."
in_NB
to_read
fmri
neuroscience
functional_connectivity
modularity
re:functional_communities
neuropsychology
cognitive_science
Abstract: "An emerging class of theories concerning the functional structure of the brain takes the reuse of neural circuitry for various cognitive purposes to be a central organizational principle. According to these theories, it is quite common for neural circuits established for one purpose to be exapted (exploited, recycled, redeployed) during evolution or normal development, and be put to different uses, often without losing their original functions. Neural reuse theories thus differ from the usual understanding of the role of neural plasticity (which is, after all, a kind of reuse) in brain organization along the following lines: According to neural reuse, circuits can continue to acquire new uses after an initial or original function is established; the acquisition of new uses need not involve unusual circumstances such as injury or loss of established function; and the acquisition of a new use need not involve (much) local change to circuit structure (e.g., it might involve only the establishment of functional connections to new neural partners). Thus, neural reuse theories offer a distinct perspective on several topics of general interest, such as: the evolution and development of the brain, including (for instance) the evolutionary-developmental pathway supporting primate tool use and human language; the degree of modularity in brain organization; the degree of localization of cognitive function; and the cortical parcellation problem and the prospects (and proper methods to employ) for function to structure mapping. The idea also has some practical implications in the areas of rehabilitative medicine and machine interface design."
10 weeks ago by cshalizi
Cai : Minimax and Adaptive Inference in Nonparametric Function Estimation
10 weeks ago by cshalizi
"Since Stein’s 1956 seminal paper, shrinkage has played a fundamental role in both parametric and nonparametric inference. This article discusses minimaxity and adaptive minimaxity in nonparametric function estimation. Three interrelated problems, function estimation under global integrated squared error, estimation under pointwise squared error, and nonparametric confidence intervals, are considered. Shrinkage is pivotal in the development of both the minimax theory and the adaptation theory.
"While the three problems are closely connected and the minimax theories bear some similarities, the adaptation theories are strikingly different. For example, in a sharp contrast to adaptive point estimation, in many common settings there do not exist nonparametric confidence intervals that adapt to the unknown smoothness of the underlying function. A concise account of these theories is given. The connections as well as differences among these problems are discussed and illustrated through examples."
in_NB
statistics
estimation
regression
confidence_sets
nonparametrics
shrinkage
cai.t._tony
minimax
"While the three problems are closely connected and the minimax theories bear some similarities, the adaptation theories are strikingly different. For example, in a sharp contrast to adaptive point estimation, in many common settings there do not exist nonparametric confidence intervals that adapt to the unknown smoothness of the underlying function. A concise account of these theories is given. The connections as well as differences among these problems are discussed and illustrated through examples."
10 weeks ago by cshalizi
[0803.3017] Accelerated convergence for nonparametric regression with coarsened predictors
11 weeks ago by cshalizi
"We consider nonparametric estimation of a regression function for a situation where precisely measured predictors are used to estimate the regression curve for coarsened, that is, less precise or contaminated predictors. Specifically, while one has available a sample $(W_1,Y_1),...,(W_n,Y_n)$ of independent and identically distributed data, representing observations with precisely measured predictors, where $mathrm{E}(Y_i|W_i)=g(W_i)$, instead of the smooth regression function $g$, the target of interest is another smooth regression function $m$ that pertains to predictors $X_i$ that are noisy versions of the $W_i$. Our target is then the regression function $m(x)=E(Y|X=x)$, where $X$ is a contaminated version of $W$, that is, $X=W+delta$. It is assumed that either the density of the errors is known, or replicated data are available resembling, but not necessarily the same as, the variables $X$. In either case, and under suitable conditions, we obtain $sqrt{n}$-rates of convergence of the proposed estimator and its derivatives, and establish a functional limit theorem. Weak convergence to a Gaussian limit process implies pointwise and uniform confidence intervals and $sqrt{n}$-consistent estimators of extrema and zeros of $m$. It is shown that these results are preserved under more general models in which $X$ is determined by an explanatory variable. Finite sample performance is investigated in simulations and illustrated by a real data example." --- I can think of potential applications...
in_NB
regression
statistics
nonparametrics
error-in-variables
11 weeks ago by cshalizi
[0803.2986] Perturbation selection and influence measures in local influence analysis
11 weeks ago by cshalizi
"Cook's [J. Roy. Statist. Soc. Ser. B 48 (1986) 133--169] local influence approach based on normal curvature is an important diagnostic tool for assessing local influence of minor perturbations to a statistical model. However, no rigorous approach has been developed to address two fundamental issues: the selection of an appropriate perturbation and the development of influence measures for objective functions at a point with a nonzero first derivative. The aim of this paper is to develop a differential--geometrical framework of a perturbation model (called the perturbation manifold) and utilize associated metric tensor and affine curvatures to resolve these issues. We will show that the metric tensor of the perturbation manifold provides important information about selecting an appropriate perturbation of a model. Moreover, we will introduce new influence measures that are applicable to objective functions at any point. Examples including linear regression models and linear mixed models are examined to demonstrate the effectiveness of using new influence measures for the identification of influential observations."
in_NB
statistics
regression
11 weeks ago by cshalizi
[0803.2984] Conditional density estimation in a regression setting
11 weeks ago by cshalizi
"Regression problems are traditionally analyzed via univariate characteristics like the regression function, scale function and marginal density of regression errors. These characteristics are useful and informative whenever the association between the predictor and the response is relatively simple. More detailed information about the association can be provided by the conditional density of the response given the predictor. For the first time in the literature, this article develops the theory of minimax estimation of the conditional density for regression settings with fixed and random designs of predictors, bounded and unbounded responses and a vast set of anisotropic classes of conditional densities. The study of fixed design regression is of special interest and novelty because the known literature is devoted to the case of random predictors. For the aforementioned models, the paper suggests a universal adaptive estimator which (i) matches performance of an oracle that knows both an underlying model and an estimated conditional density; (ii) is sharp minimax over a vast class of anisotropic conditional densities; (iii) is at least rate minimax when the response is independent of the predictor and thus a bivariate conditional density becomes a univariate density; (iv) is adaptive to an underlying design (fixed or random) of predictors."
in_NB
statistics
nonparametrics
regression
density_estimation
minimax
to_read
to_teach:undergrad-ADA
11 weeks ago by cshalizi
[0803.2999] Rate-optimal estimation for a general class of nonparametric regression models with unknown link functions
11 weeks ago by cshalizi
"This paper discusses a nonparametric regression model that naturally generalizes neural network models. The model is based on a finite number of one-dimensional transformations and can be estimated with a one-dimensional rate of convergence. The model contains the generalized additive model with unknown link function as a special case. For this case, it is shown that the additive components and link function can be estimated with the optimal rate by a smoothing spline that is the solution of a penalized least squares criterion."
in_NB
statistics
regression
neural_networks
nonparametrics
11 weeks ago by cshalizi
[1202.3482] The local geometry of finite mixtures
11 weeks ago by cshalizi
"We introduce a technique to obtain local (bracketing) metric entropy bounds for subsets of a normed vector space from global entropy bounds. Using this method, we establish that for q>=1, the class of convex combinations of q translates of a probability density has finite local doubling dimension under a smoothness assumption. The proof requires a detailed investigation of the local geometry of mixture classes, which is of independent interest."
in_NB
statistics
mixture_models
learning_theory
11 weeks ago by cshalizi
Hochschild, J.L., Weaver, V., Burch, T.: Creating a New Racial Order: How Immigration, Multiracialism, Genomics, and the Young Can Remake Race in America.
11 weeks ago by cshalizi
"
The American racial order--the beliefs, institutions, and practices that organize relationships among the nation's races and ethnicities--is undergoing its greatest transformation since the 1960s. Creating a New Racial Order takes a groundbreaking look at the reasons behind this dramatic change, and considers how different groups of Americans are being affected. Through revealing narrative and striking research, the authors show that the personal and political choices of Americans will be critical to how, and how much, racial hierarchy is redefined in decades to come.
"The authors outline the components that make up a racial order and examine the specific mechanisms influencing group dynamics in the United States: immigration, multiracialism, genomic science, and generational change. Cumulatively, these mechanisms increase heterogeneity within each racial or ethnic group, and decrease the distance separating groups from each other. The authors show that individuals are moving across group boundaries, that genomic science is challenging the whole concept of race, and that economic variation within groups is increasing. Above all, young adults understand and practice race differently from their elders: their formative memories are 9/11, Hurricane Katrina, and Obama's election--not civil rights marches, riots, or the early stages of immigration. Blockages could stymie or distort these changes, however, so the authors point to essential policy and political choices."
in_NB
books:noted
race
sociology
the_american_dilemma
The American racial order--the beliefs, institutions, and practices that organize relationships among the nation's races and ethnicities--is undergoing its greatest transformation since the 1960s. Creating a New Racial Order takes a groundbreaking look at the reasons behind this dramatic change, and considers how different groups of Americans are being affected. Through revealing narrative and striking research, the authors show that the personal and political choices of Americans will be critical to how, and how much, racial hierarchy is redefined in decades to come.
"The authors outline the components that make up a racial order and examine the specific mechanisms influencing group dynamics in the United States: immigration, multiracialism, genomic science, and generational change. Cumulatively, these mechanisms increase heterogeneity within each racial or ethnic group, and decrease the distance separating groups from each other. The authors show that individuals are moving across group boundaries, that genomic science is challenging the whole concept of race, and that economic variation within groups is increasing. Above all, young adults understand and practice race differently from their elders: their formative memories are 9/11, Hurricane Katrina, and Obama's election--not civil rights marches, riots, or the early stages of immigration. Blockages could stymie or distort these changes, however, so the authors point to essential policy and political choices."
11 weeks ago by cshalizi
[1202.2341] Measure concentration through non-Lipschitz observables and functional inequalities
12 weeks ago by cshalizi
"Non-Gaussian concentration estimates are obtained for invariant probability measures of reversible Markov processes. We show that the functional inequalities approach combined with a suitable Lyapunov condition allows us to circumvent the classical Lipschitz assumption of the observables. Our method is general and covers diffusions as well as pure-jump Markov processes on unbounded spaces."
in_NB
concentration_of_measure
markov_models
stochastic_processes
12 weeks ago by cshalizi
[1202.1377] Statistical significance in high-dimensional linear models
12 weeks ago by cshalizi
"We propose a method for constructing p-values for general hypotheses in a high-dimensional linear model. The hypotheses can be local for testing a single regression parameter or they may be more global involving several up to all parameters. Furthermore, when considering many hypotheses, we show how to adjust for multiple testing taking dependence among the p-values into account. Our technique is based on Ridge estimation with an additional correction term due to a substantial projection bias in high dimensions. We prove strong error control for our p-values and provide sufficient conditions for detection: for the former, we do not make any assumption on the size of the true underlying regression coefficients. We demonstrate the method in simulated examples and a real data application."
in_NB
statistics
regression
goodness-of-fit
hypothesis_testing
buhlmann.peter
12 weeks ago by cshalizi
[0804.2487] The ergodic decomposition of asymptotically mean stationary random sources
12 weeks ago by cshalizi
"It is demonstrated how to represent asymptotically mean stationary (AMS) random sources with values in standard spaces as mixtures of ergodic AMS sources. This an extension of the well known decomposition of stationary sources which has facilitated the generalization of prominent source coding theorems to arbitrary, not necessarily ergodic, stationary sources. Asymptotic mean stationarity generalizes the definition of stationarity and covers a much larger variety of real-world examples of random sources of practical interest. It is sketched how to obtain source coding and related theorems for arbitrary, not necessarily ergodic, AMS sources, based on the presented ergodic decomposition."
in_NB
ergodic_theory
to_read
re:almost_none
stochastic_processes
12 weeks ago by cshalizi
[0805.2285] Smoothing-inspired lack-of-fit tests based on ranks
12 weeks ago by cshalizi
"A rank-based test of the null hypothesis that a regressor has no effect on a response variable is proposed and analyzed. This test is identical in structure to the order selection test but with the raw data replaced by ranks. The test is nonparametric in that it is consistent against virtually any smooth alternative, and is completely distribution free for all sample sizes. The asymptotic distribution of the rank-based order selection statistic is obtained and seen to be the same as that of its raw data counterpart. Exact small sample critical values of the test statistic are provided as well. It is shown that the Pitman-Noether efficiency of the proposed rank test compares very favorably with that of the order selection test. In fact, their asymptotic relative efficiency is identical to that of the Wilcoxon signed rank and $t$-tests. An example involving microarray data illustrates the usefulness of the rank test in practice."
in_NB
regression
statistics
model-checking
goodness-of-fit
hart.jeffrey
12 weeks ago by cshalizi
[1202.3323] A new look at shifting regret
12 weeks ago by cshalizi
We investigate extensions of well-known online learning algorithms such as fixed-share of Herbster and Warmuth (1998) or the methods proposed by Bousquet and Warmuth (2002). These algorithms use weight sharing schemes to perform as well as the best sequence of experts with a limited number of changes. Here we show, with a common, general, and simpler analysis, that weight sharing in fact achieves much more than what it was designed for. We use it to simultaneously prove new shifting regret bounds for online convex optimization on the simplex in terms of the total variation distance as well as new bounds for the related setting of adaptive regret. Finally, we exhibit the first logarithmic shifting bounds for exp-concave loss functions on the simplex.
online_learning
to_read
individual_sequence_prediction
non-stationarity
re:growing_ensemble_project
in_NB
low-regret_learning
have_read
12 weeks ago by cshalizi
[1202.3775] Kernel-based Conditional Independence Test and Application in Causal Discovery
12 weeks ago by cshalizi
"Conditional independence testing is an important problem, especially in Bayesian network learning and causal discovery. Due to the curse of dimensionality, testing for conditional independence of continuous variables is particularly challenging. We propose a Kernel-based Conditional Independence test (KCI-test), by constructing an appropriate test statistic and deriving its asymptotic distribution under the null hypothesis of conditional independence. The proposed method is computationally efficient and easy to implement. Experimental results show that it outperforms other methods, especially when the conditioning set is large or the sample size is not very large, in which case other methods encounter difficulties."
statistics
kernel_estimators
independence_testing
hypothesis_testing
causal_inference
in_NB
have_read
to:blog
to_teach:undergrad-ADA
12 weeks ago by cshalizi
Universality of Bayesian Predictions
12 weeks ago by cshalizi
"This paper studies the theoretical properties of Bayesian predictions and shows that under minimal conditions we can derive finite sample bounds for the loss incurred using Bayesian predictions under the Kullback-Leibler divergence. In particular, the concept of universality of predictions is discussed and universality is established for Bayesian predictions in a variety of settings. These include predictions under almost arbitrary loss functions, model averaging, predictions in a non-stationary environment and under model misspecification."
in_NB
to_read
statistics
bayesian_consistency
prediction
misspecification
universal_prediction
12 weeks ago by cshalizi
IEEE Xplore - Complexity-Regularized Tree-Structured Partition for Mutual Information Estimation
12 weeks ago by cshalizi
"A new histogram-based mutual information estimator using data-driven tree-structured partitions (TSP) is presented in this paper. The derived TSP is a solution to a complexity regularized empirical information maximization, with the objective of finding a good tradeoff between the known estimation and approximation errors. A distribution-free concentration inequality for this tree-structured learning problem as well as finite sample performance bounds for the proposed histogram-based solution is derived. It is shown that this solution is density-free strongly consistent and that it provides, with an arbitrary high probability, an optimal balance between the mentioned estimation and approximation errors. Finally, for the emblematic scenario of independence, $I(X;Y)=0$ , it is shown that the TSP estimate converges to zero with $O(e^{-n^{1/3}+ log log n})$."
in_NB
statistics
entropy_estimation
12 weeks ago by cshalizi
[0805.3906] Inference for Multivariate Normal Mixtures
12 weeks ago by cshalizi
"Multivariate normal mixtures provide a flexible model for high-dimensional data. They are widely used in statistical genetics, statistical finance, and other disciplines. Due to the unboundedness of the likelihood function, classical likelihood-based methods, which may have nice practical properties, are inconsistent. In this paper, we recommend a penalized likelihood method for estimating the mixing distribution. We show that the maximum penalized likelihood estimator is strongly consistent when the number of components has a known upper bound. We also explore a convenient EM-algorithm for computing the maximum penalized likelihood estimator. Extensive simulations are conducted to explore the effectiveness and the practical limitations of both the new method and the ratified maximum likelihood estimators. Guidelines are provided based on the simulation results."
have_read
statistics
mixture_models
re:network_model_selection
in_NB
12 weeks ago by cshalizi
[0806.3024] Rates of contraction of posterior distributions based on Gaussian process priors
12 weeks ago by cshalizi
"We derive rates of contraction of posterior distributions on nonparametric or semiparametric models based on Gaussian processes. The rate of contraction is shown to depend on the position of the true parameter relative to the reproducing kernel Hilbert space of the Gaussian process and the small ball probabilities of the Gaussian process. We determine these quantities for a range of examples of Gaussian priors and in several statistical settings. For instance, we consider the rate of contraction of the posterior distribution based on sampling from a smooth density model when the prior models the log density as a (fractionally integrated) Brownian motion. We also consider regression with Gaussian errors and smooth classification under a logistic or probit link function combined with various priors."
in_NB
statistics
bayesian_consistency
gaussian_processes
van_der_vaart.aad
12 weeks ago by cshalizi
[0806.4140] Optimal oracle inequalities for model selection
12 weeks ago by cshalizi
"Model selection is often performed by empirical risk minimization. The quality of selection in a given situation can be assessed by risk bounds, which require assumptions both on the margin and the tails of the losses used. Starting with examples from the 3 basic estimation problems, regression, classification and density estimation, we formulate risk bounds for empirical risk minimization under successively weakening conditions and prove them at a very general level, for general margin and power tail behavior of the excess losses."
in_NB
statistics
learning_theory
cross-validation
model_selection
van_de_geer.sara
12 weeks ago by cshalizi
[0806.3978] Information In The Non-Stationary Case
12 weeks ago by cshalizi
"Information estimates such as the ``direct method'' of Strong et al. (1998) sidestep the difficult problem of estimating the joint distribution of response and stimulus by instead estimating the difference between the marginal and conditional entropies of the response. While this is an effective estimation strategy, it tempts the practitioner to ignore the role of the stimulus and the meaning of mutual information. We show here that, as the number of trials increases indefinitely, the direct (or ``plug-in'') estimate of marginal entropy converges (with probability 1) to the entropy of the time-averaged conditional distribution of the response, and the direct estimate of the conditional entropy converges to the time-averaged entropy of the conditional distribution of the response. Under joint stationarity and ergodicity of the response and stimulus, the difference of these quantities converges to the mutual information. When the stimulus is deterministic or non-stationary the direct estimate of information no longer estimates mutual information, which is no longer meaningful, but it remains a measure of variability of the response distribution across time."
in_NB
statistics
neuroscience
entropy_estimation
kith_and_kin
heard_the_talk
yu.bin
vu.vincent
kass.rob
neural_data_analysis
12 weeks ago by cshalizi
[0808.4042] Statistical models, likelihood, penalized likelihood and hierarchical likelihood
12 weeks ago by cshalizi
"We give an overview of statistical models and likelihood, together with two of its variants: penalized and hierarchical likelihood. The Kullback-Leibler divergence is referred to repeatedly, for defining the misspecification risk of a model, for grounding the likelihood and the likelihood crossvalidation which can be used for choosing weights in penalized likelihood. Families of penalized likelihood and sieves estimators are shown to be equivalent. The similarity of these likelihood with a posteriori distributions in a Bayesian approach is considered."
in_NB
statistics
likelihood
bayesianism
information_theory
12 weeks ago by cshalizi
[0809.5032] Identifiability of parameters in latent structure models with many observed variables
12 weeks ago by cshalizi
"While hidden class models of various types arise in many statistical applications, it is often difficult to establish the identifiability of their parameters. Focusing on models in which there is some structure of independence of some of the observed variables conditioned on hidden ones, we demonstrate a general approach for establishing identifiability utilizing algebraic arguments. A theorem of J. Kruskal for a simple latent-class model with finite state space lies at the core of our results, though we apply it to a diverse set of models. These include mixtures of both finite and nonparametric product distributions, hidden Markov models and random graph mixture models, and lead to a number of new results and improvements to old ones. In the parametric setting, this approach indicates that for such models, the classical definition of identifiability is typically too strong. Instead generic identifiability holds, which implies that the set of nonidentifiable parameters has measure zero, so that parameter inference is still meaningful. In particular, this sheds light on the properties of finite mixtures of Bernoulli products, which have been used for decades despite being known to have nonidentifiable parameters. In the nonparametric setting, we again obtain identifiability only when certain restrictions are placed on the distributions that are mixed, but we explicitly describe the conditions."
in_NB
statistics
identifiability
mixture_models
inference_to_latent_objects
re:homophily_and_confounding
to_read
12 weeks ago by cshalizi
[0810.5302] A class of R'{e}nyi information estimators for multidimensional densities
12 weeks ago by cshalizi
"A class of estimators of the R'{e}nyi and Tsallis entropies of an unknown distribution $f$ in $mathbb{R}^m$ is presented. These estimators are based on the $k$th nearest-neighbor distances computed from a sample of $N$ i.i.d. vectors with distribution $f$. We show that entropies of any order $q$, including Shannon's entropy, can be estimated consistently with minimal assumptions on $f$. Moreover, we show that it is straightforward to extend the nearest-neighbor method to estimate the statistical distance between two distributions using one i.i.d. sample from each."
in_NB
statistics
entropy_estimation
information_theory
renyi_entropy
12 weeks ago by cshalizi
[0810.2276] A generalized portmanteau test of independence between two stationary time series
12 weeks ago by cshalizi
"We propose generalized portmanteau-type test statistics in the frequency domain to test independence between two stationary time series. The test statistics are formed analogous to the one in Chen and Deo (2004, Econometric Theory 20, 382-416), who extended the applicability of portmanteau goodness-of-fit test to the long memory case. Under the null hypothesis of independence, the asymptotic standard normal distributions of the proposed statistics are derived under fairly mild conditions. In particular, each time series is allowed to possess short memory, long memory or anti-persistence. A simulation study shows that the tests have reasonable size and power properties."
in_NB
statistics
time_series
hypothesis_testing
independence_testing
12 weeks ago by cshalizi
[1102.3616] Tight conditions for consistent variable selection in high dimensional nonparametric regression
february 2012 by cshalizi
"We address the issue of variable selection in the regression model with very high ambient dimension, i.e., when the number of covariates is very large. The main focus is on the situation where the number of relevant covariates, called intrinsic dimension, is much smaller than the ambient dimension. Without assuming any parametric form of the underlying regression function, we get tight conditions making it possible to consistently estimate the set of relevant variables. These conditions relate the intrinsic dimension to the ambient dimension and to the sample size. The procedure that is provably consistent under these tight conditions is simple and is based on comparing the empirical Fourier coefficients with an appropriately chosen threshold value."
in_NB
regression
variable_selection
nonparametrics
statistics
february 2012 by cshalizi
[0801.0327] Nonparametric sequential prediction of time series
february 2012 by cshalizi
"Time series prediction covers a vast field of every-day statistical applications in medical, environmental and economic domains. In this paper we develop nonparametric prediction strategies based on the combination of a set of 'experts' and show the universal consistency of these strategies under a minimum of conditions. We perform an in-depth analysis of real-world data sets and show that these nonparametric strategies are more flexible, faster and generally outperform ARMA methods in terms of normalized cumulative prediction error."
in_NB
time_series
nonparametrics
prediction
statistics
to_teach:undergrad-ADA
re:growing_ensemble_project
february 2012 by cshalizi
Efficiently identifying deterministic real-time automata from labeled data
february 2012 by cshalizi
"We develop a novel learning algorithm RTI for identifying a deterministic real-time automaton (DRTA) from labeled time-stamped event sequences. The RTI algorithm is based on the current state of the art in deterministic finite-state automaton (DFA) identification, called evidence-driven state-merging (EDSM). In addition to having a DFA structure, a DRTA contains time constraints between occurrences of consecutive events. Although this seems a small difference, we show that the problem of identifying a DRTA is much more difficult than the problem of identifying a DFA: identifying only the time constraints of a DRTA given its DFA structure is already NP-complete. In spite of this additional complexity, we show that RTI is a correct and complete algorithm that converges efficiently (from polynomial time and data) to the correct DRTA in the limit. To the best of our knowledge, this is the first algorithm that can identify a timed automaton model from time-stamped event sequences.
A straightforward alternative to identifying DRTAs is to identify a DFA that models time implicitly, i.e., a DFA that uses different states for different points in time. Such a DFA can be identified by first sampling the timed sequences using a fixed frequency, and subsequently applying EDSM to the resulting non-timed event sequences. We evaluate the performance of both RTI and this sampling approach experimentally on artificially generated data. In these experiments RTI outperforms the sampling approach significantly. Thus, we show that if we obtain data from a real-time system, it is easier to identify a DRTA from this data than to identify an equivalent DFA."
in_NB
automata_theory
machine_learning
grammar_induction
A straightforward alternative to identifying DRTAs is to identify a DFA that models time implicitly, i.e., a DFA that uses different states for different points in time. Such a DFA can be identified by first sampling the timed sequences using a fixed frequency, and subsequently applying EDSM to the resulting non-timed event sequences. We evaluate the performance of both RTI and this sampling approach experimentally on artificially generated data. In these experiments RTI outperforms the sampling approach significantly. Thus, we show that if we obtain data from a real-time system, it is easier to identify a DRTA from this data than to identify an equivalent DFA."
february 2012 by cshalizi
"Local equivalences of distances between clusterings—a geometric perspective" --- Marina Meilă
february 2012 by cshalizi
"In comparing clusterings, several different distances and indices are in use. We prove that the Misclassification Error distance, the Hamming distance (equivalent to the unadjusted Rand index), and the χ 2 distance between partitions are equivalent in the neighborhood of 0. In other words, if two partitions are very similar, then one distance defines upper and lower bounds on the other and viceversa. The proofs are geometric and rely on the concavity of the distances. The geometric intuitions themselves advance the understanding of the space of all clusterings. To our knowledge, this is the first result of its kind.
Practically, distances are frequently used to compare two clusterings of a set of observations. But the motivation for this work is in the theoretical study of data clustering. Distances between partitions are involved in constructing new methods for cluster validation, determining the number of clusters, and analyzing clustering algorithms. From a probability theory point of view, the present results apply to any pair of finite valued random variables, and provide simple yet tight upper and lower bounds on the χ 2 measure of (in)dependence valid when the two variables are strongly dependent."
in_NB
clustering
data_mining
meila.marina
Practically, distances are frequently used to compare two clusterings of a set of observations. But the motivation for this work is in the theoretical study of data clustering. Distances between partitions are involved in constructing new methods for cluster validation, determining the number of clusters, and analyzing clustering algorithms. From a probability theory point of view, the present results apply to any pair of finite valued random variables, and provide simple yet tight upper and lower bounds on the χ 2 measure of (in)dependence valid when the two variables are strongly dependent."
february 2012 by cshalizi
[0810.5663] Effective Complexity and its Relation to Logical Depth
february 2012 by cshalizi
"Effective complexity measures the information content of the regularities of an object. It has been introduced by M. Gell-Mann and S. Lloyd to avoid some of the disadvantages of Kolmogorov complexity, also known as algorithmic information content. In this paper, we give a precise formal definition of effective complexity and rigorous proofs of its basic properties. In particular, we show that incompressible binary strings are effectively simple, and we prove the existence of strings that have effective complexity close to their lengths. Furthermore, we show that effective complexity is related to Bennett's logical depth: If the effective complexity of a string $x$ exceeds a certain explicit threshold then that string must have astronomically large depth; otherwise, the depth can be arbitrarily small."
in_NB
kith_and_kin
complexity_measures
ay.nihat
algorithmic_information_theory
february 2012 by cshalizi
[0810.5565] Limit Behaviour of Sequential Empirical Measure Processes
february 2012 by cshalizi
"In this paper, we obtain some uniform laws of large numbers and functional central limit theorems for sequential empirical measure processes indexed by classes of product functions satisfying appropriate Vapnik-Chervonenkis properties."
in_NB
empirical_processes
stochastic_processes
statistics
february 2012 by cshalizi
Bootstrapping clustered data - Field - 2007 - Journal of the Royal Statistical Society: Series B (Statistical Methodology) - Wiley Online Library
february 2012 by cshalizi
"Various bootstraps have been proposed for bootstrapping clustered data from one-way arrays. The simulation results in the literature suggest that some of these methods work quite well in practice; the theoretical results are limited and more mixed in their conclusions. For example, McCullagh reached negative conclusions about the use of non-parametric bootstraps for one-way arrays. The purpose of this paper is to extend our understanding of the issues by discussing the effect of different ways of modelling clustered data, the criteria for successful bootstraps used in the literature and extending the theory from functions of the sample mean to include functions of the between and within sums of squares and non-parametric bootstraps to include model-based bootstraps. We determine that the consistency of variance estimates for a bootstrap method depends on the choice of model with the residual bootstrap giving consistency under the transformation model whereas the cluster bootstrap gives consistent estimates under both the transformation and the random-effect model. In addition we note that the criteria based on the distribution of the bootstrap observations are not really useful in assessing consistency."
in_NB
have_read
statistics
bootstrap
to_teach:undergrad-ADA
hierarchical_models
february 2012 by cshalizi
[1202.4294] Prediction of quantiles by statistical learning and application to GDP forecasting
february 2012 by cshalizi
"In this paper, we tackle the problem of prediction and confidence intervals for time series using a statistical learning approach and quantile loss functions. In a first time, we show that the Gibbs estimator (also known as Exponentially Weighted aggregate) is able to predict as well as the best predictor in a given family for a wide set of loss functions. In particular, using the quantile loss function of Koenker and Bassett (1978), this allows to build confidence intervals. We apply these results to the problem of prediction and confidence regions for the French Gross Domestic Product (GDP) growth, with promising results."
in_NB
to_read
prediction
confidence_sets
learning_theory
re:your_favorite_dsge_sucks
re:growing_ensemble_project
february 2012 by cshalizi
[1202.4283] Fast rates in learning with dependent observations
february 2012 by cshalizi
"In this paper we tackle the problem of fast rates in time series forecasting from a statistical learning perspective. In a serie of papers (e.g. Meir 2000, Modha and Masry 1998, Alquier and Wintenberger 2012) it is shown that the main tools used in learning theory with iid observations can be extended to the prediction of time series. The main message of these papers is that, given a family of predictors, we are able to build a new predictor that predicts the series as well as the best predictor in the family, up to a remainder of order $1/sqrt{n}$. It is known that this rate cannot be improved in general. In this paper, we show that in the particular case of the least square loss, and under a strong assumption on the time series (phi-mixing) the remainder is actually of order $1/n$. Thus, the optimal rate for iid variables, see e.g. Tsybakov 2003, and individual sequences, see cite{lugosi} is, for the first time, achieved for uniformly mixing processes. We also show that our method is optimal for aggregating sparse linear combinations of predictors."
--- Assumes observations are in the interval [-B,B] and gets a bound which is O(B^3), and so useless for our purposes.
in_NB
learning_theory
mixing
ergodic_theory
re:your_favorite_dsge_sucks
re:XV_for_mixing
have_read
--- Assumes observations are in the interval [-B,B] and gets a bound which is O(B^3), and so useless for our purposes.
february 2012 by cshalizi
Phys. Rev. E 78, 046102 (2008): Network quotients: Structural skeletons of complex systems
february 2012 by cshalizi
"A defining feature of many large empirical networks is their intrinsic complexity. However, many networks also contain a large degree of structural repetition. An immediate question then arises: can we characterize essential network complexity while excluding structural redundancy? In this article we utilize inherent network symmetry to collapse all redundant information from a network, resulting in a coarse graining which we show to carry the essential structural information of the “parent” network. In the context of algebraic combinatorics, this coarse-graining is known as the “quotient.” We systematically explore the theoretical properties of network quotients and summarize key statistics of a variety of “real-world” quotients with respect to those of their parent networks. In particular, we find that quotients can be substantially smaller than their parent networks yet typically preserve various key functional properties such as complexity (heterogeneity and hub vertices) and communication (diameter and mean geodesic distance), suggesting that quotients constitute the essential structural skeletons of their parent networks. We summarize with a discussion of potential uses of quotients in analysis of biological regulatory networks and ways in which using quotients can reduce the computational complexity of network algorithms."
in_NB
network_data_analysis
february 2012 by cshalizi
[0810.3023] Iterated Regret Minimization: A More Realistic Solution Concept
february 2012 by cshalizi
"For some well-known games, such as the Traveler's Dilemma or the Centipede Game, traditional game-theoretic solution concepts--and most notably Nash equilibrium--predict outcomes that are not consistent with empirical observations. In this paper, we introduce a new solution concept, iterated regret minimization, which exhibits the same qualitative behavior as that observed in experiments in many games of interest, including Traveler's Dilemma, the Centipede Game, Nash bargaining, and Bertrand competition. As the name suggests, iterated regret minimization involves the iterated deletion of strategies that do not minimize regret."
--- Quite astonishingly, no mention at all of low-regret learning!
game_theory
online_learning
have_read
in_NB
halpern.joseph_y.
re:knightian_uncertainty
low-regret_learning
--- Quite astonishingly, no mention at all of low-regret learning!
february 2012 by cshalizi
[1202.3079] Towards minimax policies for online linear optimization with bandit feedback
february 2012 by cshalizi
"We address the online linear optimization problem with bandit feedback. Our contribution is twofold. First, we provide an algorithm (based on exponential weights) with a regret of order $sqrt{d n log N}$ for any finite action set with $N$ actions, under the assumption that the instantaneous loss is bounded by 1. This shaves off an extraneous $sqrt{d}$ factor compared to previous works, and gives a regret bound of order $d sqrt{n log n}$ for any compact set of actions. Without further assumptions on the action set, this last bound is minimax optimal up to a logarithmic factor. Interestingly, our result also shows that the minimax regret for bandit linear optimization with expert advice in $d$ dimension is the same as for the basic $d$-armed bandit with expert advice. Our second contribution is to show how to use the Mirror Descent algorithm to obtain computationally efficient strategies with minimax optimal regret bounds in specific examples. More precisely we study two canonical action sets: the hypercube and the Euclidean ball. In the former case, we obtain the first computationally efficient algorithm with a $d sqrt{n}$ regret, thus improving by a factor $sqrt{d log n}$ over the best known result for a computationally efficient algorithm. In the latter case, our approach gives the first algorithm with a $sqrt{d n log n}$ regret, again shaving off an extraneous $sqrt{d}$ compared to previous works."
in_NB
online_learning
decision_theory
optimization
re:growing_ensemble_project
cesa-bianchi.nicolo
kakade.sham
bubeck.sebastien
february 2012 by cshalizi
Conditional Likelihood Maximisation: A Unifying Framework for Information Theoretic Feature Selection
february 2012 by cshalizi
"We present a unifying framework for information theoretic feature selection, bringing almost two decades of research on heuristic filter criteria under a single theoretical interpretation. This is in response to the question: "what are the implicit statistical assumptions of feature selection criteria based on mutual information?". To answer this, we adopt a different strategy than is usual in the feature selection literature−instead of trying to define a criterion, we derive one, directly from a clearly specified objective function: the conditional likelihood of the training labels. While many hand-designed heuristic criteria try to optimize a definition of feature 'relevancy' and 'redundancy', our approach leads to a probabilistic framework which naturally incorporates these concepts. As a result we can unify the numerous criteria published over the last two decades, and show them to be low-order approximations to the exact (but intractable) optimisation problem. The primary contribution is to show that common heuristics for information based feature selection (including Markov Blanket algorithms as a special case) are approximate iterative maximisers of the conditional likelihood. A large empirical study provides strong evidence to favour certain classes of criteria, in particular those that balance the relative size of the relevancy/redundancy terms. Overall we conclude that the JMI criterion (Yang and Moody, 1999; Meyer et al., 2008) provides the best tradeoff in terms of accuracy, stability, and flexibility with small data samples."
in_NB
information_theory
statistics
variable_selection
model_selection
to_teach:data-mining
to:blog
machine_learning
classifiers
have_read
graphical_models
february 2012 by cshalizi
The Evolution of Cultural Diversity: A Phylogenetic Approach by Ruth Mace - Powell's Books
february 2012 by cshalizi
"Virtually all aspects of human behavior show enormous variation both within and between cultural groups, including material culture, social organization and language. Thousands of distinct cultural groups exist: about 6,000 languages are spoken today, and it is thought that a far greater number of languages existed in the past but became extinct. Using a Darwinian approach, this book seeks to explain this rich cultural variation. There are a number of theoretical reasons to believe that cultural diversification might be tree-like, that is phylogenetic: material and non-material culture is clearly inherited by descendants, there is descent with modification, and languages appear to be hierarchically related. There are also a number of theoretical reasons to believe that cultural evolution is not tree-like: cultural inheritance is not Mendelian and can indeed be vertical, horizontal or oblique, evidence of borrowing abounds, cultures are not necessarily biological populations and can be transient and complex. Here, for the first time, this title tackles these questions of cultural evolution empirically and quantitatively, using a range of case studies from Africa, the Pacific, Europe, Asia and America. A range of powerful theoretical tools developed in evolutionary biology is used to test detailed hypotheses about historical patterns and adaptive functions in cultural evolution. Evidence is amassed from archaeological, linguist and cultural datasets, from both recent and historical or pre-historical time periods. A unifying theme is that the phylogenetic approach is a useful and powerful framework, both for describing the evolutionary history of these traits, and also for testing adaptive hypotheses about their evolution and co-evolution. Contributors include archaeologists, anthropologists, evolutionary biologists and linguists, and this book will be of great interest to all those involved in these areas."
in_NB
books:noted
phylogenetics
evolutionary_biology
human_evolution
cultural_evolution
cultural_transmission
cultural_differences
february 2012 by cshalizi
Phylogenetic Networks - Academic and Professional Books - Cambridge University Press
february 2012 by cshalizi
"The evolutionary history of species is traditionally represented using a rooted phylogenetic tree. However, when reticulate events such as hybridization, horizontal gene transfer or recombination are believed to be involved, phylogenetic networks that can accommodate non-treelike evolution have an important role to play. This book provides the first interdisciplinary overview of phylogenetic networks. Beginning with a concise introduction to both phylogenetic trees and phylogenetic networks, the fundamental concepts and results are then presented for both rooted and unrooted phylogenetic networks. Current approaches and algorithms available for computing phylogenetic networks from different types of datasets are then discussed, accompanied by examples of their application to real biological datasets. The book also summarises the algorithms used for drawing phylogenetic networks, along with the existing software for their computation and evaluation. All datasets, examples and other additional information and links are available from the book's companion website at www.phylogenetic-networks.org."
in_NB
phylogenetics
network_data_analysis
evolutionary_biology
cultural_evolution
books:noted
february 2012 by cshalizi
Sun , Wang , Fang : Regularized k-means clustering of high-dimensional data and its asymptotic consistency
february 2012 by cshalizi
"K-means clustering is a widely used tool for cluster analysis due to its conceptual simplicity and computational efficiency. However, its performance can be distorted when clustering high-dimensional data where the number of variables becomes relatively large and many of them may contain no information about the clustering structure. This article proposes a high-dimensional cluster analysis method via regularized k-means clustering, which can simultaneously cluster similar observations and eliminate redundant variables. The key idea is to formulate the k-means clustering in a form of regularization, with an adaptive group lasso penalty term on cluster centers. In order to optimally balance the trade-off between the clustering model fitting and sparsity, a selection criterion based on clustering stability is developed. The asymptotic estimation and selection consistency of the regularized k-means clustering with diverging dimension is established. The effectiveness of the regularized k-means clustering is also demonstrated through a variety of numerical experiments as well as applications to two gene microarray examples. The regularized clustering framework can also be extended to the general model-based clustering."
in_NB
clustering
statistics
lasso
data_mining
to_teach:data-mining
february 2012 by cshalizi
A General Framework for Dimensionality-Reducing Data Visualization Mapping
february 2012 by cshalizi
"In recent years, a wealth of dimension-reduction techniques for data visualization and preprocessing has been established. Nonparametric methods require additional effort for out-of-sample extensions, because they provide only a mapping of a given finite set of points. In this letter, we propose a general view on nonparametric dimension reduction based on the concept of cost functions and properties of the data. Based on this general principle, we transfer nonparametric dimension reduction to explicit mappings of the data manifold such that direct out-of-sample extensions become possible. Furthermore, this concept offers the possibility of investigating the generalization ability of data visualization to new data points. We demonstrate the approach based on a simple global linear mapping, as well as prototype-based local linear mappings. In addition, we can bias the functional form according to given auxiliary information. This leads to explicit supervised visualization mappings with discriminative properties comparable to state-of-the-art approaches."
in_NB
dimension_reduction
visual_display_of_quantitative_information
data_analysis
data_mining
manifold_learning
to_teach:data-mining
february 2012 by cshalizi
Generalized additive models for location, scale and shape for high dimensional data—a flexible approach based on boosting - Mayr - 2012 - Journal of the Royal Statistical Society: Series C (Applied Statistics) - Wiley Online Library
february 2012 by cshalizi
"Generalized additive models for location, scale and shape (GAMLSSs) are a popular semiparametric modelling approach that, in contrast with conventional generalized additive models, regress not only the expected mean but also every distribution parameter (e.g. location, scale and shape) to a set of covariates. Current fitting procedures for GAMLSSs are infeasible for high dimensional data set-ups and require variable selection based on (potentially problematic) information criteria. The present work describes a boosting algorithm for high dimensional GAMLSSs that was developed to overcome these limitations. Specifically, the new algorithm was designed to allow the simultaneous estimation of predictor effects and variable selection. The algorithm proposed was applied to Munich rental guide data, which are used by landlords and tenants as a reference for the average rent of a flat depending on its characteristics and spatial features. The net rent predictions that resulted from the high dimensional GAMLSSs were found to be highly competitive and covariate-specific prediction intervals showed a major improvement over classical generalized additive models."
in_NB
statistics
regression
additive_models
boosting
february 2012 by cshalizi
[1201.6307] Statistical convergence of Markov experiments to diffusion limits
february 2012 by cshalizi
"Assume that one observes the $k$-th, $2k$-th, ...., $nk$-th value of a Markov chain $X_{1,h},...,X_{nk,h}$. That means we assume that a high frequency Markov chain runs in the background on a very fine time grid but that it is only observed on a coarser grid. This asymptotics reflects a set up occurring in the high frequency statistical analysis for financial data where diffusion approximations are used only for coarser time scales. In this paper we show that under appropriate conditions the L$_1$-distance between the joint distribution of the Markov chain and the distribution of the discretized diffusion limit converges to zero. The result implies that the LeCam deficiency distance between the statistical Markov experiment and its diffusion limit converges to zero. This result can be applied to Euler approximations for the joint distribution of diffusions observed at points $Delta, 2 Delta, ,,,, nDelta$. The joint distribution can be approximated by generating Euler approximations at the points $Delta k^{-1}, 2 Delta k^{-1}, ,,,, nDelta$. Our result implies that under our regularity conditions the Euler approximation is consistent for $n to infty$ if $nk^{-2}to 0$."
in_NB
convergence_of_stochastic_processes
markov_models
stochastic_processes
stochastic_differential_equations
re:almost_none
february 2012 by cshalizi
[1201.6211] On the range of validity of the autoregressive sieve bootstrap
february 2012 by cshalizi
"We explore the limits of the autoregressive (AR) sieve bootstrap, and show that its applicability extends well beyond the realm of linear time series as has been previously thought. In particular, for appropriate statistics, the AR-sieve bootstrap is valid for stationary processes possessing a general Wold-type autoregressive representation with respect to a white noise; in essence, this includes all stationary, purely nondeterministic processes, whose spectral density is everywhere positive. Our main theorem provides a simple and effective tool in assessing whether the AR-sieve bootstrap is asymptotically valid in any given situation. In effect, the large-sample distribution of the statistic in question must only depend on the first and second order moments of the process; prominent examples include the sample mean and the spectral density. As a counterexample, we show how the AR-sieve bootstrap is not always valid for the sample autocovariance even when the underlying process is linear."
in_NB
bootstrap
time_series
statistics
stochastic_processes
february 2012 by cshalizi
[1201.5913] A Component-wise EM Algorithm for Mixtures
february 2012 by cshalizi
"In some situations, EM algorithm shows slow convergence problems. One possible reason is that standard procedures update the parameters simultaneously. In this paper we focus on finite mixture estimation. In this framework, we propose a component-wise EM, which updates the parameters sequentially. We give an interpretation of this procedure as a proximal point algorithm and use it to prove the convergence. Illustrative numerical experiments show how our algorithm compares to EM and a version of the SAGE algorithm."
Huh, is this related to the way that updating partial response functions simultaneously in a GAM can be trouble, and it's better, as in standard backfitting, to update sequentially?
in_NB
statistics
em_algorithm
mixture_models
Huh, is this related to the way that updating partial response functions simultaneously in a GAM can be trouble, and it's better, as in standard backfitting, to update sequentially?
february 2012 by cshalizi
An Alternative Asymptotic Analysis of Residual-Based Statistics
february 2012 by cshalizi
"This paper presents an alternative method to derive the limiting distribution of residual-based statistics. Our method does not impose an explicit assumption of (asymptotic) smoothness of the statistic of interest with respect to the model's parameters and thus is especially useful in cases where such smoothness is difficult to establish. Instead, we use a locally uniform convergence in distribution condition, which is automatically satisfied by residual-based specification test statistics. To illustrate, we derive the limiting distribution of a new functional form specification test for discrete choice models, as well as a runs-based tests for conditional symmetry in dynamic volatility models." (To-teach tag is tentative.)
in_NB
statistics
regression
model-checking
to_teach:undergrad-ADA
february 2012 by cshalizi
The Asymmetric Business Cycle
february 2012 by cshalizi
"The business cycle is a fundamental yet elusive concept in macroeconomics. In this paper, we consider the problem of measuring the business cycle. First, we argue for the output-gap view that the business cycle corresponds to transitory deviations in economic activity away from a permanent, or trend, level. Then we investigate the extent to which a general model-based approach to estimating trend and cycle for the U.S. economy leads to measures of the business cycle that reflect models versus the data. We find empirical support for a nonlinear time series model that produces a business cycle measure with an asymmetric shape across NBER expansion and recession phases. Specifically, this business cycle measure suggests that recessions are periods of relatively large and negative transitory fluctuations in output. However, several close competitors to the nonlinear model produce business cycle measures of widely differing shapes and magnitudes. Given this model-based uncertainty, we construct a model-averaged measure of the business cycle. This measure also displays an asymmetric shape and is closely related to other measures of economic slack such as the unemployment rate and capacity utilization."
--- Worthy, but at the same time makes me want to lock them in a room with a copy of Li and Racine's _Nonparametric Econometrics_, or even _The Elements of Statistical Learning_, and not let them out until they understand it.
in_NB
time_series
statistics
economics
macroeconomics
inference_to_latent_objects
re:your_favorite_dsge_sucks
morley.james
have_read
ensemble_methods
model_selection
--- Worthy, but at the same time makes me want to lock them in a room with a copy of Li and Racine's _Nonparametric Econometrics_, or even _The Elements of Statistical Learning_, and not let them out until they understand it.
february 2012 by cshalizi
On a New Method of Graduation
january 2012 by cshalizi
Whittaker introduces spline smoothing in 1922, complete with the Bayesian derivation. Does not use the word "spline", however --- when did that come in?
in_NB
to_teach:undergrad-ADA
splines
smoothing
regression
statistics
have_read
january 2012 by cshalizi
related tags
1001_nights ⊕ additive_models ⊕ afghanistan ⊕ agent-based_models ⊕ ahmed.amr ⊕ ai ⊕ albers.dave ⊕ algebra ⊕ algebraic_geometry ⊕ algorithmic_information_theory ⊕ allende.salvador ⊕ amaral.luis ⊕ american_history ⊕ ancient_history ⊕ animals ⊕ animal_cognition ⊕ animal_psychology ⊕ antarctica ⊕ anthropology ⊕ apocalypticism ⊕ approximation ⊕ approximation_algorithms ⊕ arbitrage ⊕ arlot.sylvain ⊕ armchair_travel ⊕ art_history ⊕ astronomy ⊕ atay.fatihcan ⊕ attention ⊕ automata_theory ⊕ automated_diagnosis ⊕ ay.nihat ⊕ bad_data_analysis ⊕ barely-comprehensible_metaphysics ⊕ bartlett.m.s. ⊕ bayesianism ⊕ bayesian_consistency ⊕ beer.stafford ⊕ behavioral_ecology ⊕ behavioral_genetics ⊕ bergstrom.carl ⊕ bernstein-von_mises ⊕ bialek.william ⊕ bibliometry ⊕ bickel.david ⊕ bickel.peter ⊕ binder.p.-m. ⊕ biochemical_networks ⊕ biological_computation ⊕ birds ⊕ blei.david ⊕ blogging ⊕ books:noted ⊕ books:recommended ⊕ book_reviews ⊕ boosting ⊕ boots.byron ⊕ bootstrap ⊕ bounded_rationality ⊕ branching_processes ⊕ bubeck.sebastien ⊕ buddhism ⊕ buhlmann.peter ⊕ cai.t._tony ⊕ calibration ⊕ capitalism ⊕ categorization ⊕ catoni.olivier ⊕ causality ⊕ causal_inference ⊕ celisse.alain ⊕ cellular_automata ⊕ central_asia ⊕ central_limit_theorem ⊕ cesa-bianchi.nicolo ⊕ chains_with_complete_connections ⊕ change-point_problem ⊕ change_of_representation ⊕ chatterjee.souav ⊕ chile ⊕ china ⊕ chinese_civilization ⊕ chow-liu_algorithm ⊕ chow-liu_trees ⊕ citation_networks ⊕ classifiers ⊕ clermont.gilles ⊕ clustering ⊕ coarse-graining ⊕ cognitive_science ⊕ coleman.todd ⊕ collective_action ⊕ collective_cognition ⊕ community_discovery ⊕ comparative_history ⊕ complexity ⊕ complexity_measures ⊕ computational_statistics ⊕ concentration_of_measure ⊕ conditional_random_fields ⊕ confidence_sets ⊕ confounding ⊕ contagion ⊕ control_theory ⊕ convergence_of_stochastic_processes ⊕ convexity ⊕ cool_if_true ⊕ copulas ⊕ corporations ⊕ cosmology ⊕ coupling ⊕ covariate_shift ⊕ coveted ⊕ crespi.valentino ⊕ cross-validation ⊕ cultural_appropriation ⊕ cultural_criticism ⊕ cultural_differences ⊕ cultural_evolution ⊕ cultural_exchange ⊕ cultural_transmission ⊕ cultural_universals ⊕ curse_of_dimensionality ⊕ curse_of_dimensonality ⊕ cybenko.george ⊕ cybernetics ⊕ data_analysis ⊕ data_mining ⊕ debowski.lukasz ⊕ debunking ⊕ decision-making ⊕ decision_theory ⊕ del_moral.pierre ⊕ democracy ⊕ density_estimation ⊕ design_for_a_brain ⊕ developmental_biology ⊕ development_economics ⊕ deviation_bounds ⊕ de_haan.laurens ⊕ diaconis.persi ⊕ diffusion_of_innovations ⊕ dimension_estimation ⊕ dimension_reduction ⊕ directed_information ⊕ domingos.pedro ⊕ drees.holger ⊕ dsges ⊕ dynamical_systems ⊕ earthquakes ⊕ ecology ⊕ econometrics ⊕ economics ⊕ education ⊕ empirical_processes ⊕ em_algorithm ⊕ ensemble_methods ⊕ entropy ⊕ entropy_estimation ⊕ entropy_rates ⊕ epidemic_models ⊕ ergodic_theory ⊕ error-in-variables ⊕ estimation ⊕ ethics ⊕ evisceration ⊕ evo-devo ⊕ evolution ⊕ evolutionary_biology ⊕ evolutionary_psychology ⊕ evolution_of_cooperation ⊕ expectation-maximization ⊕ experimental_biology ⊕ experimental_economics ⊕ experimental_psychology ⊕ experimental_sociology ⊕ explanation ⊕ explanation_by_mechanisms ⊕ exploration-exploitation ⊕ exponential_convergence_of_empirical_probabilities ⊕ exponential_families ⊕ exponential_family_random_graphs ⊕ extreme_value_theory ⊕ factor_analysis ⊕ feedback ⊕ filtering ⊕ finance ⊕ financial_markets ⊕ financial_speculation ⊕ fisher.r.a. ⊕ fisher_information ⊕ flocks_and_swarms ⊕ fmri ⊕ foundations_of_statistics ⊕ fox.emily ⊕ fractals ⊕ fraser.d.a.s. ⊕ freund.yoav ⊕ friday_cat_blogging ⊕ functional_connectivity ⊕ galstyan.aram ⊕ galves.antonio ⊕ game_theory ⊕ gaussian_processes ⊕ genetics ⊕ gene_expression_data_analysis ⊕ gene_regulation ⊕ genomics ⊕ genovese.christopher ⊕ geology ⊕ getoor.lise ⊕ geyer.charles ⊕ gibbs_distributions ⊕ gives_economists_a_bad_name ⊕ goernerup.olof ⊕ goodness-of-fit ⊕ gordon.geoffrey_j. ⊕ grammar_induction ⊕ graphical_models ⊕ graph_limits ⊕ graph_theory ⊕ grassberger.peter ⊕ green.peter_j. ⊕ guestrin.carlos ⊕ haavelmo.trygve ⊕ habit ⊕ halpern.joseph_y. ⊕ handcock.mark ⊕ hansen.bruce ⊕ hart.jeffrey ⊕ have_read ⊕ heard_the_talk ⊕ heavy_tails ⊕ heritability ⊕ heteroskedasticity ⊕ hierarchical_models ⊕ high-dimensional_probability ⊕ hilbert_space ⊕ hill.claire ⊕ historical_myths ⊕ history_of_economics ⊕ history_of_ideas ⊕ history_of_mathematics ⊕ history_of_medicine ⊕ history_of_religion ⊕ history_of_science ⊕ history_of_statistics ⊕ history_of_technology ⊕ hjort.nils_lid ⊕ hofling.holger ⊕ hofman.jake ⊕ hofmann.thomas ⊕ homophily ⊕ hooker.giles ⊕ hsu.daniel ⊕ human_evolution ⊕ human_genetics ⊕ hydrodynamic_limits ⊕ hypothesis_testing ⊕ identifiability ⊕ implicit_learning ⊕ incest ⊕ independence_testing ⊕ individual_sequence_prediction ⊕ indonesia ⊕ inference_to_latent_objects ⊕ influence ⊕ information_criteria ⊕ information_geometry ⊕ information_retrieval ⊕ information_theory ⊕ innovation ⊕ institutions ⊕ instrumental_variables ⊕ interacting_particle_systems ⊕ interface_design ⊕ interlocking_directorates ⊕ inverse_problems ⊕ in_NB ⊕ ising_model ⊕ islam ⊕ islamic_civilization ⊕ i_told_you_so ⊕ jaeger.herbert ⊕ janson.svante ⊕ jordan.michael_i. ⊕ k-means ⊕ kakade.sham ⊕ kalisch.markus ⊕ kantz.holger ⊕ kass.rob ⊕ kernel_estimators ⊕ kernel_methods ⊕ kifer.yuri ⊕ kinase ⊕ kirshner.sergey ⊕ kith_and_kin ⊕ kolaczyk.eric ⊕ kolar.mladen ⊕ kontorovich.aryeh ⊕ kontoyiannis.ioannis ⊕ kossinets.gueorgi ⊕ kotelentz.peter_m. ⊕ krivitsky.pavel ⊕ kronecker_graphs ⊕ kurtz.thomas ⊕ kushans ⊕ lafferty.john ⊕ large_deviations ⊕ lasso ⊕ latent_semantic_analysis ⊕ latent_variables ⊕ law_of_the_iterated_logarithm ⊕ learning_theory ⊕ lehmann.erich ⊕ leonardi.florencia ⊕ lerman.kristina ⊕ levina.elizaveta ⊕ levina.liza ⊕ levy_processes ⊕ likelihood ⊕ limit_order_books ⊕ limit_theorems ⊕ linearization_via_hilbert_space ⊕ linear_algebra ⊕ linear_regression ⊕ linguistics ⊕ literary_criticism ⊕ liu.han ⊕ lives_of_the_scientists ⊕ logical_positivism ⊕ logistic_regression ⊕ long-range_dependence ⊕ low-regret-learning ⊕ low-regret_learning ⊕ lyapunov_exponents ⊕ machine_learning ⊕ macroeconomics ⊕ macro_from_micro ⊕ major_transitions_of_evolution ⊕ manifold_learning ⊕ marketing ⊕ markov_models ⊕ martin.john_levi ⊕ martingales ⊕ mason.winter ⊕ mathematical_biology ⊕ mathematics ⊕ matthew_effect ⊕ maya_civilization ⊕ mccallum.andrew ⊕ mean-field_theory ⊕ media_criticism ⊕ medieval_eurasian_history ⊕ meila.marina ⊕ methodology ⊕ meyn.sean_p. ⊕ millenarianism ⊕ minimax ⊕ misspecification ⊕ mixing ⊕ mixture_models ⊕ model-checking ⊕ modeling ⊕ model_averaging ⊕ model_discovery ⊕ model_selection ⊕ moderate_deviations ⊕ modularity ⊕ mongol_empire ⊕ monte_carlo ⊕ morley.james ⊕ moulines.eric ⊕ mountains ⊕ multiple_comparisons ⊕ multiple_testing ⊕ mythology ⊕ narrative ⊕ natural_language_processing ⊕ nelson.richard_r. ⊕ neo-conservatism ⊕ networked_life ⊕ networks ⊕ network_data_analysis ⊕ network_sampling ⊕ neural_coding_and_decoding ⊕ neural_control_of_motion ⊕ neural_data_analysis ⊕ neural_networks ⊕ neuropsychology ⊕ neuroscience ⊕ neville.jennifer ⊕ newman.mark ⊕ neyman.jerzy ⊕ nilsson_jacobi.martin ⊕ noel.hans ⊕ non-equilibrium ⊕ non-stationarity ⊕ nonparametrics ⊕ number_theory ⊕ nyhan.brendan ⊕ online_learning ⊕ ontologies ⊕ op-amps ⊕ optimization ⊕ pakistan ⊕ paleontology ⊕ paninski.liam ⊕ parenting ⊕ particle_filters ⊕ path_dependence ⊕ path_integrals ⊕ perception ⊕ phase_transitions ⊕ philosophy_of_mind ⊕ philosophy_of_science ⊕ phylogenetics ⊕ poincare_recurrence ⊕ point_processes ⊕ political_economy ⊕ pollard.david ⊕ polya.george ⊕ porter.mason ⊕ pr0n ⊕ practices_relating_to_the_transmission_of_genetic_information ⊕ prediction ⊕ predictive_state_representations ⊕ preferences ⊕ probability ⊕ propagation_of_error ⊕ psychoceramics ⊕ race ⊕ radev.dragomir ⊕ raginsky.maxim ⊕ randal.douc ⊕ randomization ⊕ random_fields ⊕ random_graphs ⊕ random_matrix_theory ⊕ random_projections ⊕ rare_event_simulation ⊕ rationality ⊕ re:aggregating_random_graphs ⊕ re:almost_none ⊕ re:AoS_project ⊕ re:bayes_as_evol ⊕ re:democratic_cognition ⊕ re:do-institutions-evolve ⊕ re:functional_communities ⊕ re:growing_ensemble_project ⊕ re:g_paper ⊕ re:homophily_and_confounding ⊕ re:knightian_uncertainty ⊕ re:network_differences ⊕ re:network_model_selection ⊕ re:phil-of-bayes_paper ⊕ re:smoothing_adjacency_matrices ⊕ re:social-networks-as-sensor-networks ⊕ re:spike_train_complexity ⊕ re:sporns_review ⊕ re:stacs ⊕ re:what_is_a_macrostate ⊕ re:XV_for_mixing ⊕ re:XV_for_networks ⊕ re:your_favorite_dsge_sucks ⊕ re:your_favorite_ergm_sucks ⊕ recurrence_times ⊕ regression ⊕ reinforcement_learning ⊕ relational_learning ⊕ renormalization ⊕ renyi_entropy ⊕ richardson.thomas_s. ⊕ rigollet.philippe ⊕ ripley.brian ⊕ robins.james ⊕ robustness ⊕ roman_empire ⊕ rosvall.martin ⊕ roth.camille ⊕ rubin.jonathan ⊕ running_dogs_of_reaction ⊕ sampling ⊕ sarkar.purnamrita ⊕ savage.leonard_j. ⊕ science_as_a_social_process ⊕ science_in_society ⊕ scientific_revolution ⊕ self-organization ⊕ self-promotion ⊕ self-similarity ⊕ separation_of_time-scales ⊕ sex_differences ⊕ shrinkage ⊕ siddiqi.sajid_m. ⊕ signal_processing ⊕ signal_transduction ⊕ silk_road ⊕ simulation ⊕ slime_molds ⊕ smoothing ⊕ socialism ⊕ social_influence ⊕ social_learning ⊕ social_life_of_the_mind ⊕ social_media ⊕ social_networks ⊕ social_psychology ⊕ social_science_methodology ⊕ sociology ⊕ sociology_of_science ⊕ song.le ⊕ sornette.didier ⊕ sparsity ⊕ spatial_statistics ⊕ spectral_clustering ⊕ spectral_estimation ⊕ spectral_gap ⊕ splines ⊕ stability_of_learning ⊕ state-space_models ⊕ state_estimation ⊕ stationarity ⊕ statistical_inference_for_stochastic_processes ⊕ statistical_mechanics ⊕ statistics ⊕ steins_method ⊕ stochastic_differential_equations ⊕ stochastic_processes ⊕ stochastic_volatility ⊕ structural_equations ⊕ sufficiency ⊕ surrogate_data ⊕ sutton.charles ⊕ symbolic_dynamics ⊕ tagging ⊕ text_mining ⊕ theoretical_computer_science ⊕ the_american_dilemma ⊕ the_continuing_crises ⊕ tibshirani.robert ⊕ time_series ⊕ to:blog ⊕ topic_models ⊕ touchette.hugo ⊕ to_be_shot_after_a_fair_trial ⊕ to_read ⊕ to_teach:advanced-stochastic-processes ⊕ to_teach:complexity-and-inference ⊕ to_teach:data-mining ⊕ to_teach:financial-time-series ⊕ to_teach:undergrad-ADA ⊕ transaction_networks ⊕ two-sample_tests ⊕ universal_prediction ⊕ unsupervised_learning ⊕ us_politics ⊕ van_der_vaart.aad ⊕ van_de_geer.sara ⊕ van_handel.ramon ⊕ variable-length_markov_models ⊕ variable_selection ⊕ via:? ⊕ via:ale ⊕ via:ariddell ⊕ via:bookslut ⊕ via:ded-maxim ⊕ via:dsparks ⊕ via:gabriel_rossman ⊕ via:gelman ⊕ via:georg ⊕ via:henry_farrell ⊕ via:kathryn ⊕ via:mind-hacks ⊕ via:shivak ⊕ via:slaniel ⊕ via:timothy-burke ⊕ viral_marketing ⊕ vision ⊕ visual_display_of_quantitative_information ⊕ von_mises.richard ⊕ vu.vincent ⊕ waiting_times ⊕ wald.abraham ⊕ wasserman.larry ⊕ watts.duncan ⊕ weak_dependence ⊕ web ⊕ wiggins.chris ⊕ willett.rebecca ⊕ world_history ⊕ xing.eric ⊕ yarkoni.tal ⊕ yee.danny ⊕ yu.bin ⊕ zanette.damien ⊕ zebrafish ⊕ zenker.sven ⊕ zhang.tong ⊕ zhu.ji ⊕ zoroastrianism ⊕Copy this bookmark: