cshalizi + re:stacs   53

Constructing summary statistics for approximate Bayesian computation: semi-automatic approximate Bayesian computation - Fearnhead - 2012 - Journal of the Royal Statistical Society: Series B (Statistical Methodology) - Wiley Online Library
"Many modern statistical applications involve inference for complex stochastic models, where it is easy to simulate from the models, but impossible to calculate likelihoods. Approximate Bayesian computation (ABC) is a method of inference for such models. It replaces calculation of the likelihood by a step which involves simulating artificial data for different parameter values, and comparing summary statistics of the simulated data with summary statistics of the observed data. Here we show how to construct appropriate summary statistics for ABC in a semi-automatic manner. We aim for summary statistics which will enable inference about certain parameters of interest to be as accurate as possible. Theoretical results show that optimal summary statistics are the posterior means of the parameters. Although these cannot be calculated analytically, we use an extra stage of simulation to estimate how the posterior means vary as a function of the data; and we then use these estimates of our summary statistics within ABC. Empirical results show that our approach is a robust method for choosing summary statistics that can result in substantially more accurate ABC analyses than the ad hoc choices of summary statistics that have been proposed in the literature. We also demonstrate advantages over two alternative methods of simulation-based inference."
to:NB  indirect_inference  estimation  statistics  approximate_bayesian_computation  computational_statistics  to_teach:complexity-and-inference  re:stacs 
13 days ago by cshalizi
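The basic ABC step the abstract describes (simulate data for different parameter values, then compare summary statistics) can be sketched as plain rejection sampling. This is a toy illustration, not the paper's semi-automatic construction: the model N(theta, 1), the flat prior on [-5, 5], the sample mean as the summary statistic, the tolerance, and all function names are assumptions made for the example.

```python
import random
import statistics


def simulate(theta, n, rng):
    """Draw n observations from the toy model N(theta, 1) (an assumed model)."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def abc_rejection(observed, n_draws, tolerance, rng):
    """Keep prior draws whose simulated summary lands near the observed summary."""
    s_obs = statistics.fmean(observed)  # summary statistic: the sample mean
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(-5.0, 5.0)  # hypothetical flat prior
        fake = simulate(theta, len(observed), rng)
        # No likelihood is evaluated; only the summaries are compared.
        if abs(statistics.fmean(fake) - s_obs) < tolerance:
            accepted.append(theta)
    return accepted


rng = random.Random(0)
observed = simulate(2.0, 50, rng)  # pretend these are the real data
posterior_sample = abc_rejection(observed, n_draws=5000, tolerance=0.1, rng=rng)
print(len(posterior_sample), statistics.fmean(posterior_sample))
```

The accepted values of theta act as an approximate posterior sample. How good the approximation is depends on the choice of summary statistic and tolerance, which is the choice the paper sets out to make less ad hoc.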
[1204.1360] Particle filtering in high-dimensional chaotic systems
"We present an efficient particle filtering algorithm for multiscale systems, that is adapted for simple atmospheric dynamics models which are inherently chaotic. Particle filters represent the posterior conditional distribution of the state variables by a collection of particles, which evolves and adapts recursively as new information becomes available. The difference between the estimated state and the true state of the system constitutes the error in specifying or forecasting the state, which is amplified in chaotic systems that have a number of positive Lyapunov exponents. The purpose of the present paper is to show that the homogenization method developed in Imkeller et al. (2011), which is applicable to high dimensional multi-scale filtering problems, along with importance sampling and control methods can be used as a basic and flexible tool for the construction of the proposal density inherent in particle filtering. Finally, we apply the general homogenized particle filtering algorithm developed here to the Lorenz'96 atmospheric model that mimics mid-latitude atmospheric dynamics with microscopic convective processes."
to:NB  particle_filters  chaos  dynamical_systems  state-space_models  state_estimation  re:stacs 
6 weeks ago by cshalizi
[0802.4363] Estimating the entropy of binary time series: Methodology, some theory and a simulation study
"Partly motivated by entropy-estimation problems in neuroscience, we present a detailed and extensive comparison between some of the most popular and effective entropy estimation methods used in practice: The plug-in method, four different estimators based on the Lempel-Ziv (LZ) family of data compression algorithms, an estimator based on the Context-Tree Weighting (CTW) method, and the renewal entropy estimator.
"**Methodology. Three new entropy estimators are introduced. For two of the four LZ-based estimators, a bootstrap procedure is described for evaluating their standard error, and a practical rule of thumb is heuristically derived for selecting the values of their parameters. ** Theory. We prove that, unlike their earlier versions, the two new LZ-based estimators are consistent for every finite-valued, stationary and ergodic process. An effective method is derived for the accurate approximation of the entropy rate of a finite-state HMM with known distribution. Heuristic calculations are presented and approximate formulas are derived for evaluating the bias and the standard error of each estimator. ** Simulation. All estimators are applied to a wide range of data generated by numerous different processes with varying degrees of dependence and memory. Some conclusions drawn from these experiments include: (i) For all estimators considered, the main source of error is the bias. (ii) The CTW method is repeatedly and consistently seen to provide the most accurate results. (iii) The performance of the LZ-based estimators is often comparable to that of the plug-in method. (iv) The main drawback of the plug-in method is its computational inefficiency."
in_NB  to_read  entropy_estimation  information_theory  time_series  statistics  kontoyiannis.ioannis  re:stacs 
6 weeks ago by cshalizi
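For reference, a minimal sketch of the plug-in idea named in the abstract, as I understand it: count the overlapping words of a fixed length, compute the entropy of their empirical distribution, and divide by the word length. The function name and the test sequences are made up for illustration, and the paper's own estimator may differ in details such as bias correction.

```python
from collections import Counter
from math import log2


def plugin_entropy_rate(bits, word_length):
    """Plug-in estimate, in bits per symbol, from overlapping words of one length."""
    words = [tuple(bits[i:i + word_length])
             for i in range(len(bits) - word_length + 1)]
    counts = Counter(words)
    total = len(words)
    # Entropy of the empirical word distribution.
    block_entropy = -sum((c / total) * log2(c / total) for c in counts.values())
    return block_entropy / word_length


constant = [0] * 100        # no randomness at all
alternating = [0, 1] * 50   # deterministic after the first symbol

print(plugin_entropy_rate(constant, 1))
print(plugin_entropy_rate(alternating, 1))
print(plugin_entropy_rate(alternating, 2))
```

The alternating sequence shows why word length matters: with words of length 1 the estimate is 1 bit per symbol, with length 2 it drops to about 0.5, even though the sequence is perfectly predictable once the first symbol is known.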
[1203.5351] Activity driven modeling of dynamic networks
"Network modeling plays a critical role in identifying statistical regularities and structural principles common to many systems. The large majority of recent modeling approaches are connectivity driven, in the sense that the structural pattern of the network is at the basis of the mechanisms ruling the network formation. Connectivity driven models necessarily provide a time-aggregated representation that may fail to describe the instantaneous and fluctuating dynamics of many networks. We address this challenge by defining the activity potential, a time invariant function characterizing the agents' interactions in real-world networks and constructing an activity driven model capable of encoding the instantaneous time description of the network dynamics. The model provides an explanation of structural features such as the presence of hubs, which simply originate from the heterogeneous activity of agents. Additionally, we find that diffusive processes in highly dynamical networks can be described analytically in terms of the activity potential, allowing a quantitative discussion of the biases induced by the time-aggregated network representation in the analysis of dynamical processes in evolving networks."
to:NB  network_data_analysis  networks  stochastic_processes  markov_models  transaction_networks  to_read  re:stacs 
8 weeks ago by cshalizi
Kaiser, Lahiri, Nordman: Goodness of fit tests for a class of Markov random field models
"This paper develops goodness of fit statistics that can be used to formally assess Markov random field models for spatial data, when the model distributions are discrete or continuous and potentially parametric. Test statistics are formed from generalized spatial residuals which are collected over groups of nonneighboring spatial observations, called concliques. Under a hypothesized Markov model structure, spatial residuals within each conclique are shown to be independent and identically distributed as uniform variables. The information from a series of concliques can be then pooled into goodness of fit statistics. Under some conditions, large sample distributions of these statistics are explicitly derived for testing both simple and composite hypotheses, where the latter involves additional parametric estimation steps. The distributional results are verified through simulation, and a data example illustrates the method for model assessment."
to:NB  to_read  statistics  spatial_statistics  random_fields  goodness-of-fit  hypothesis_testing  re:stacs  markov_models 
10 weeks ago by cshalizi
[0803.2963] Consistency of cross validation for comparing regression procedures
"Theoretical developments on cross validation (CV) have mainly focused on selecting one among a list of finite-dimensional models (e.g., subset or order selection in linear regression) or selecting a smoothing parameter (e.g., bandwidth for kernel smoothing). However, little is known about consistency of cross validation when applied to compare between parametric and nonparametric methods or within nonparametric methods. We show that under some conditions, with an appropriate choice of data splitting ratio, cross validation is consistent in the sense of selecting the better procedure with probability approaching 1. Our results reveal interesting behavior of cross validation. When comparing two models (procedures) converging at the same nonparametric rate, in contrast to the parametric case, it turns out that the proportion of data used for evaluation in CV does not need to be dominating in size. Furthermore, it can even be of a smaller order than the proportion for estimation while not affecting the consistency property."
to:NB  statistics  to_read  cross-validation  model_selection  nonparametrics  to_teach:undergrad-ADA  re:stacs 
11 weeks ago by cshalizi
Periodic stripe formation by a Turing mechanism operating at growth zones in the mammalian palate : Nature Genetics : Nature Publishing Group
"We present direct evidence of an activator-inhibitor system in the generation of the regularly spaced transverse ridges of the palate. We show that new ridges, called rugae, that are marked by stripes of expression of Shh (encoding Sonic hedgehog), appear at two growth zones where the space between previously laid rugae increases. However, inter-rugal growth is not absolutely required: new stripes of Shh expression still appeared when growth was inhibited. Furthermore, when a ruga was excised, new Shh expression appeared not at the cut edge but as bifurcating stripes branching from the neighboring stripe of Shh expression, diagnostic of a Turing-type reaction-diffusion mechanism. Genetic and inhibitor experiments identified fibroblast growth factor (FGF) and Shh as components of an activator-inhibitor pair in this system. These findings demonstrate a reaction-diffusion mechanism that is likely to be widely relevant in vertebrate development."
to_read  to:NB  pattern_formation  biology  morphogenesis  reaction-diffusion  turing_mechanism  via:aks  to_teach:complexity-and-inference  re:stacs  experimental_biology  to:blog 
12 weeks ago by cshalizi
Phys. Rev. E 84, 041120 (2011): Building macroscale models from microscale probabilistic models: A general probabilistic approach for nonlinear diffusion and multispecies phenomena
"A discrete agent-based model on a periodic lattice of arbitrary dimension is considered. Agents move to nearest-neighbor sites by a motility mechanism accounting for general interactions, which may include volume exclusion. The partial differential equation describing the average occupancy of the agent population is derived systematically. A diffusion equation arises for all types of interactions and is nonlinear except for the simplest interactions. In addition, multiple species of interacting subpopulations give rise to an advection-diffusion equation for each subpopulation. This work extends and generalizes previous specific results, providing a construction method for determining the transport coefficients in terms of a single conditional transition probability, which depends on the occupancy of sites in an influence region. These coefficients characterize the diffusion of agents in a crowded environment in biological and physical processes."
to:NB  macro_from_micro  agent-based_models  interacting_particle_systems  statistical_mechanics  stochastic_processes  re:stacs 
october 2011 by cshalizi
Phys. Rev. E 84, 016223 (2011): Optimal reconstruction of dynamical systems: A noise amplification approach
"In this work we propose an objective function to guide the search for a state space reconstruction of a dynamical system from a time series of measurements. These statistics can be evaluated on any reconstructed attractor, thereby allowing a direct comparison among different approaches: (uniform or nonuniform) delay vectors, PCA, Legendre coordinates, etc. It can also be used to select the most appropriate parameters of a reconstruction strategy. In the case of delay coordinates this translates into finding the optimal delay time and embedding dimension from the absolute minimum of the advocated cost function. Its definition is based on theoretical arguments on noise amplification, the complexity of the reconstructed attractor, and a direct measure of local stretch which constitutes an irrelevance measure. The proposed method is demonstrated on synthetic and experimental time series."
attractor_reconstruction  dynamical_systems  statistics  to:NB  re:stacs  to_teach:complexity-and-inference  to_read 
july 2011 by cshalizi
[1102.1182] Phase transition in the detection of modules in sparse networks
"We present an asymptotically exact analysis of the problem of detecting communities in sparse random networks. Our results are also applicable to detection of functional modules, partitions, and colorings in noisy planted models. Using a cavity method analysis, we unveil a phase transition from a region where the original group assignment is undetectable to one where detection is possible. In some cases, the detectable region splits into an algorithmically hard region and an easy one. Our approach naturally translates into a practical algorithm for detecting modules in sparse networks, and learning the parameters of the underlying model." --- This is really an EM algorithm, not a Bayesian method.
community_discovery  have_read  kith_and_kin  phase_transitions  network_data_analysis  moore.cris  belief_propagation  re:stacs  to_teach:complexity-and-inference 
february 2011 by cshalizi
Phys. Rev. E 79, 026201 (2009): Complexity measures from interaction structures
"We evaluate information-theoretic quantities that quantify complexity in terms of kth-order statistical dependences that cannot be reduced to interactions among k−1 random variables. Using symbolic dynamics of coupled maps and cellular automata as model systems, we demonstrate that these measures are able to identify complex dynamical regimes."
complexity_measures  information_theory  kith_and_kin  ay.nihat  jost.jurgen  to_read  re:stacs 
september 2010 by cshalizi
[1007.3230] Selecting an exponential random graph model for complex brain networks
Shorter authors: What do you know, all that stuff about how to fit ERGMs to social networks totally works for brain networks too. (I mock, but I'd be flabbergasted if it didn't, if only because there is so little _social_ content in the ERGM formalism...) Also: yay model checking!
neural_data_analysis  network_data_analysis  exponential_family_random_graphs  model_selection  statistics  model-checking  to_read  re:functional_communities  re:stacs 
july 2010 by cshalizi
[0811.3988] Dynamic communities in multichannel data: An application to the foreign exchange market during the 2007--2008 credit crisis
I have seen this technique before. :) (Though, on reflection, if you're going to do everything with the correlation matrix [rather than mutual information], why not just take the inverse correlation matrix and identify edges with non-zero entries there?)
community_discovery  financial_markets  re:stacs  financial_crisis_of_2007--  have_read 
april 2009 by cshalizi
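The inverse-correlation-matrix suggestion in the note above can be sketched as follows. This is only an illustration under assumptions: the 3x3 correlation matrix is a made-up chain in which variables 0 and 2 are related only through variable 1, the threshold for "non-zero" is arbitrary, and with estimated rather than exact correlations some statistical thresholding would be needed instead.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def edges_from_inverse(corr, threshold=1e-9):
    """List pairs (i, j), i < j, whose inverse-correlation entry is non-zero."""
    inverse = invert(corr)
    n = len(corr)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if abs(inverse[i][j]) > threshold]


# Hypothetical chain 0 - 1 - 2: neighbours correlate at 0.5, the ends at 0.25.
corr = [[1.0, 0.5, 0.25],
        [0.5, 1.0, 0.5],
        [0.25, 0.5, 1.0]]
print(edges_from_inverse(corr))
```

Variables 0 and 2 are correlated (0.25), but their entry in the inverse is zero, so only the pairs (0, 1) and (1, 2) come out as edges. For Gaussian variables a zero in the inverse corresponds to conditional independence given the rest, which is the usual motivation for reading edges off the inverse rather than off the correlation matrix itself.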
The LaTeX Font Catalogue – Garamond
Needs extra packages. But I do like the idea of writing the book in Garamond.
latex  fonts  re:stacs  via:logista 
january 2009 by cshalizi
Memory traces in dynamical systems — PNAS
How much information (in the Fisher sense) does the present state of a recurrent dynamical network retain about the history of its inputs? All, or almost all, done for linear-Gaussian systems, but numerical results for nonlinear, non-Gaussian systems would be straightforward in principle.
memory  dynamical_systems  information_theory  complexity_measures  fisher_information  to:NB  to_teach:complexity-and-inference  re:stacs 
december 2008 by cshalizi
