Constructing summary statistics for approximate Bayesian computation: semi-automatic approximate Bayesian computation - Fearnhead - 2012 - Journal of the Royal Statistical Society: Series B (Statistical Methodology) - Wiley Online Library
13 days ago by cshalizi
"Many modern statistical applications involve inference for complex stochastic models, where it is easy to simulate from the models, but impossible to calculate likelihoods. Approximate Bayesian computation (ABC) is a method of inference for such models. It replaces calculation of the likelihood by a step which involves simulating artificial data for different parameter values, and comparing summary statistics of the simulated data with summary statistics of the observed data. Here we show how to construct appropriate summary statistics for ABC in a semi-automatic manner. We aim for summary statistics which will enable inference about certain parameters of interest to be as accurate as possible. Theoretical results show that optimal summary statistics are the posterior means of the parameters. Although these cannot be calculated analytically, we use an extra stage of simulation to estimate how the posterior means vary as a function of the data; and we then use these estimates of our summary statistics within ABC. Empirical results show that our approach is a robust method for choosing summary statistics that can result in substantially more accurate ABC analyses than the ad hoc choices of summary statistics that have been proposed in the literature. We also demonstrate advantages over two alternative methods of simulation-based inference."
to:NB
indirect_inference
estimation
statistics
approximate_bayesian_computation
computational_statistics
to_teach:complexity-and-inference
re:stacs
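A minimal sketch of plain rejection ABC, for orientation on the abstract above. It is not the paper's semi-automatic construction of summary statistics. The toy model (normal data with unknown mean and unit variance), the uniform prior, the sample mean as summary statistic, the tolerance `eps`, and all function names are illustrative assumptions.

```python
import random
import statistics


def simulate(theta, n, rng):
    """Toy simulator (assumed): n draws from Normal(theta, 1)."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def summary(data):
    """Summary statistic (assumed): the sample mean."""
    return statistics.fmean(data)


def abc_rejection(observed, n_draws=5000, eps=0.2, seed=1):
    """Keep prior draws whose simulated summary lands within eps of the observed one."""
    rng = random.Random(seed)
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(-5.0, 5.0)   # draw from the (assumed) uniform prior
        s_sim = summary(simulate(theta, len(observed), rng))
        if abs(s_sim - s_obs) < eps:     # compare summaries; no likelihood is evaluated
            accepted.append(theta)
    return accepted


if __name__ == "__main__":
    observed = simulate(2.0, 50, random.Random(0))
    posterior_draws = abc_rejection(observed)
    print(len(posterior_draws), statistics.fmean(posterior_draws))
```

The accepted draws approximate the posterior given the summary, so the quality of the answer depends on the choice of `summary`, which is the problem the paper addresses.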
[1204.1360] Particle filtering in high-dimensional chaotic systems
6 weeks ago by cshalizi
"We present an efficient particle filtering algorithm for multiscale systems, that is adapted for simple atmospheric dynamics models which are inherently chaotic. Particle filters represent the posterior conditional distribution of the state variables by a collection of particles, which evolves and adapts recursively as new information becomes available. The difference between the estimated state and the true state of the system constitutes the error in specifying or forecasting the state, which is amplified in chaotic systems that have a number of positive Lyapunov exponents. The purpose of the present paper is to show that the homogenization method developed in Imkeller et al. (2011), which is applicable to high dimensional multi-scale filtering problems, along with importance sampling and control methods can be used as a basic and flexible tool for the construction of the proposal density inherent in particle filtering. Finally, we apply the general homogenized particle filtering algorithm developed here to the Lorenz'96 atmospheric model that mimics mid-latitude atmospheric dynamics with microscopic convective processes."
to:NB
particle_filters
chaos
dynamical_systems
state-space_models
state_estimation
re:stacs
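For reference, a sketch of the basic bootstrap particle filter that the abstract's "collection of particles, which evolves and adapts recursively" describes. It is not the homogenized filter of the paper: the proposal here is just the state dynamics, and the scalar AR(1) model with Gaussian observation noise, the parameter values, and the function names are all assumptions chosen to keep the example small.

```python
import math
import random


def simulate_ar1(T, phi=0.9, sig_x=1.0, sig_y=1.0, seed=1):
    """Assumed toy model: x_t = phi*x_{t-1} + N(0, sig_x^2), y_t = x_t + N(0, sig_y^2)."""
    rng = random.Random(seed)
    x, xs, ys = 0.0, [], []
    for _ in range(T):
        x = phi * x + rng.gauss(0.0, sig_x)
        xs.append(x)
        ys.append(x + rng.gauss(0.0, sig_y))
    return xs, ys


def bootstrap_filter(ys, n_particles=500, phi=0.9, sig_x=1.0, sig_y=1.0, seed=0):
    """Return the filtered mean of x_t given y_1..y_t for each t."""
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    means = []
    for y in ys:
        # Propagate each particle through the state dynamics (the proposal).
        particles = [phi * p + rng.gauss(0.0, sig_x) for p in particles]
        # Weight by the Gaussian observation likelihood, up to a constant.
        weights = [math.exp(-0.5 * ((y - p) / sig_y) ** 2) for p in particles]
        total = sum(weights)
        if total == 0.0:  # every weight underflowed; fall back to uniform weights
            weights = [1.0] * n_particles
            total = float(n_particles)
        means.append(sum(w * p for w, p in zip(weights, particles)) / total)
        # Multinomial resampling in proportion to the weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return means
```

In high-dimensional chaotic systems the weights of this naive filter tend to collapse onto a few particles, which is why the construction of a better proposal density matters.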
[0802.4363] Estimating the entropy of binary time series: Methodology, some theory and a simulation study
6 weeks ago by cshalizi
"Partly motivated by entropy-estimation problems in neuroscience, we present a detailed and extensive comparison between some of the most popular and effective entropy estimation methods used in practice: The plug-in method, four different estimators based on the Lempel-Ziv (LZ) family of data compression algorithms, an estimator based on the Context-Tree Weighting (CTW) method, and the renewal entropy estimator.
Methodology. Three new entropy estimators are introduced. For two of the four LZ-based estimators, a bootstrap procedure is described for evaluating their standard error, and a practical rule of thumb is heuristically derived for selecting the values of their parameters. Theory. We prove that, unlike their earlier versions, the two new LZ-based estimators are consistent for every finite-valued, stationary and ergodic process. An effective method is derived for the accurate approximation of the entropy rate of a finite-state HMM with known distribution. Heuristic calculations are presented and approximate formulas are derived for evaluating the bias and the standard error of each estimator. Simulation. All estimators are applied to a wide range of data generated by numerous different processes with varying degrees of dependence and memory. Some conclusions drawn from these experiments include: (i) For all estimators considered, the main source of error is the bias. (ii) The CTW method is repeatedly and consistently seen to provide the most accurate results. (iii) The performance of the LZ-based estimators is often comparable to that of the plug-in method. (iv) The main drawback of the plug-in method is its computational inefficiency."
in_NB
to_read
entropy_estimation
information_theory
time_series
statistics
kontoyiannis.ioannis
re:stacs
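Of the estimators compared in the abstract above, the plug-in method is the simplest to write down. The sketch below is one common form of it (empirical entropy of overlapping words, divided by word length); the function name and the overlapping-window convention are assumptions, and the LZ, CTW and renewal estimators are not attempted here.

```python
import math
from collections import Counter


def plugin_entropy_rate(bits, word_len):
    """Plug-in estimate of the entropy rate, in bits per symbol.

    Counts overlapping words of length word_len, takes the Shannon entropy of
    their empirical distribution, and divides by word_len.
    """
    n_words = len(bits) - word_len + 1
    if n_words <= 0:
        raise ValueError("sequence shorter than word length")
    counts = Counter(tuple(bits[i:i + word_len]) for i in range(n_words))
    h = -sum((c / n_words) * math.log2(c / n_words) for c in counts.values())
    return h / word_len
```

A strictly alternating sequence has zero entropy rate, yet `plugin_entropy_rate([0, 1] * 50, 1)` reports one bit per symbol, and the estimate only falls as `word_len` grows. This is a small instance of the bias the abstract identifies as the main source of error.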
[1203.5351] Activity driven modeling of dynamic networks
8 weeks ago by cshalizi
"Network modeling plays a critical role in identifying statistical regularities and structural principles common to many systems. The large majority of recent modeling approaches are connectivity driven, in the sense that the structural pattern of the network is at the basis of the mechanisms ruling the network formation. Connectivity driven models necessarily provide a time-aggregated representation that may fail to describe the instantaneous and fluctuating dynamics of many networks. We address this challenge by defining the activity potential, a time invariant function characterizing the agents' interactions in real-world networks and constructing an activity driven model capable of encoding the instantaneous time description of the network dynamics. The model provides an explanation of structural features such as the presence of hubs, which simply originate from the heterogeneous activity of agents. Additionally, we find that diffusive processes in highly dynamical networks can be described analytically in terms of the activity potential, allowing a quantitative discussion of the biases induced by the time-aggregated network representation in the analysis of dynamical processes in evolving networks."
to:NB
network_data_analysis
networks
stochastic_processes
markov_models
transaction_networks
to_read
re:stacs
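A rough sketch of what an activity-driven generator might look like. The abstract does not spell out the generative rule, so the rule used here is an assumption: at each step node i fires with probability `activities[i]`, a firing node links to `m` distinct other nodes chosen uniformly at random, and all edges are erased before the next step. Function names are hypothetical.

```python
import random


def activity_driven_snapshots(activities, m=2, steps=10, seed=0):
    """Return a list of edge sets, one per time step (assumed rule, see lead-in)."""
    rng = random.Random(seed)
    n = len(activities)
    snapshots = []
    for _ in range(steps):
        edges = set()  # edges do not persist between steps
        for i, a in enumerate(activities):
            if rng.random() < a:
                others = [j for j in range(n) if j != i]
                for j in rng.sample(others, m):
                    edges.add((min(i, j), max(i, j)))  # store undirected edges as (low, high)
        snapshots.append(edges)
    return snapshots


def aggregated_degrees(snapshots, n):
    """Degree of each node in the union (time-aggregated) network."""
    union = set().union(*snapshots)
    deg = [0] * n
    for i, j in union:
        deg[i] += 1
        deg[j] += 1
    return deg
```

With one highly active node among many quiet ones, that node becomes a hub in the aggregated network even though no step uses degree information, which is the kind of effect the abstract attributes to heterogeneous activity.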
Kaiser, Lahiri, Nordman: Goodness of fit tests for a class of Markov random field models
10 weeks ago by cshalizi
"This paper develops goodness of fit statistics that can be used to formally assess Markov random field models for spatial data, when the model distributions are discrete or continuous and potentially parametric. Test statistics are formed from generalized spatial residuals which are collected over groups of nonneighboring spatial observations, called concliques. Under a hypothesized Markov model structure, spatial residuals within each conclique are shown to be independent and identically distributed as uniform variables. The information from a series of concliques can be then pooled into goodness of fit statistics. Under some conditions, large sample distributions of these statistics are explicitly derived for testing both simple and composite hypotheses, where the latter involves additional parametric estimation steps. The distributional results are verified through simulation, and a data example illustrates the method for model assessment."
to:NB
to_read
statistics
spatial_statistics
random_fields
goodness-of-fit
hypothesis_testing
re:stacs
markov_models
[0803.2963] Consistency of cross validation for comparing regression procedures
11 weeks ago by cshalizi
"Theoretical developments on cross validation (CV) have mainly focused on selecting one among a list of finite-dimensional models (e.g., subset or order selection in linear regression) or selecting a smoothing parameter (e.g., bandwidth for kernel smoothing). However, little is known about consistency of cross validation when applied to compare between parametric and nonparametric methods or within nonparametric methods. We show that under some conditions, with an appropriate choice of data splitting ratio, cross validation is consistent in the sense of selecting the better procedure with probability approaching 1. Our results reveal interesting behavior of cross validation. When comparing two models (procedures) converging at the same nonparametric rate, in contrast to the parametric case, it turns out that the proportion of data used for evaluation in CV does not need to be dominating in size. Furthermore, it can even be of a smaller order than the proportion for estimation while not affecting the consistency property."
to:NB
statistics
to_read
cross-validation
model_selection
nonparametrics
to_teach:undergrad-ADA
re:stacs
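To make the setting of the abstract above concrete, here is a sketch of using a single data split to choose between a parametric procedure (a least-squares line) and a nonparametric one (k-nearest-neighbour regression). The two procedures, the 50/50 split, the simulated data and all names are assumptions for illustration; the paper's results concern how the splitting ratio should be chosen, which this sketch does not address.

```python
import math
import random


def fit_linear(xs, ys):
    """Least-squares line; returns a prediction function."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)


def fit_knn(xs, ys, k=5):
    """k-nearest-neighbour regression; returns a prediction function."""
    pairs = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / k

    return predict


def holdout_compare(xs, ys, procedures, train_frac=0.5):
    """Fit each procedure on the first part of the data, score MSE on the rest."""
    n_train = int(train_frac * len(xs))
    scores = {}
    for name, fit in procedures.items():
        f = fit(xs[:n_train], ys[:n_train])
        errs = [(f(x) - y) ** 2 for x, y in zip(xs[n_train:], ys[n_train:])]
        scores[name] = sum(errs) / len(errs)
    return min(scores, key=scores.get), scores


def make_data(n, truth, noise=0.1, seed=0):
    """Simulated data (assumed): x uniform on [0, 2*pi], y = truth(x) + Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
    ys = [truth(x) + rng.gauss(0.0, noise) for x in xs]
    return xs, ys
```

For example, `holdout_compare(*make_data(300, math.sin), {"linear": fit_linear, "knn": fit_knn})` returns the name of the procedure with the lower held-out error together with both scores.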
Periodic stripe formation by a Turing mechanism operating at growth zones in the mammalian palate : Nature Genetics : Nature Publishing Group
12 weeks ago by cshalizi
"We present direct evidence of an activator-inhibitor system in the generation of the regularly spaced transverse ridges of the palate. We show that new ridges, called rugae, that are marked by stripes of expression of Shh (encoding Sonic hedgehog), appear at two growth zones where the space between previously laid rugae increases. However, inter-rugal growth is not absolutely required: new stripes of Shh expression still appeared when growth was inhibited. Furthermore, when a ruga was excised, new Shh expression appeared not at the cut edge but as bifurcating stripes branching from the neighboring stripe of Shh expression, diagnostic of a Turing-type reaction-diffusion mechanism. Genetic and inhibitor experiments identified fibroblast growth factor (FGF) and Shh as components of an activator-inhibitor pair in this system. These findings demonstrate a reaction-diffusion mechanism that is likely to be widely relevant in vertebrate development."
to_read
to:NB
pattern_formation
biology
morphogenesis
reaction-diffusion
turing_mechanism
via:aks
to_teach:complexity-and-inference
re:stacs
experimental_biology
to:blog
Improved Predictions of Lynx Trappings Using a Biological Model
january 2012 by cshalizi
Sweet. (Bayesian estimation seems like overkill here, however, especially since predictions are made only from point estimates.)
in_NB
have_read
to_teach:undergrad-ADA
to_teach:complexity-and-inference
re:stacs
dynamical_systems
stochastic_processes
statistical_inference_for_stochastic_processes
statistics
time_series
via:gelman
Hive Plots - Linear Layout for Network Visualization - Visually Interpreting Network Structure and Content Made Possible
december 2011 by cshalizi
Examine carefully. God knows hairballs are not very useful. There's apparently an R package.
in_NB
to_read
network_data_analysis
visual_display_of_quantitative_information
via:dsparks
to_teach:complexity-and-inference
re:stacs
Phys. Rev. E 84, 041120 (2011): Building macroscale models from microscale probabilistic models: A general probabilistic approach for nonlinear diffusion and multispecies phenomena
october 2011 by cshalizi
"A discrete agent-based model on a periodic lattice of arbitrary dimension is considered. Agents move to nearest-neighbor sites by a motility mechanism accounting for general interactions, which may include volume exclusion. The partial differential equation describing the average occupancy of the agent population is derived systematically. A diffusion equation arises for all types of interactions and is nonlinear except for the simplest interactions. In addition, multiple species of interacting subpopulations give rise to an advection-diffusion equation for each subpopulation. This work extends and generalizes previous specific results, providing a construction method for determining the transport coefficients in terms of a single conditional transition probability, which depends on the occupancy of sites in an influence region. These coefficients characterize the diffusion of agents in a crowded environment in biological and physical processes."
to:NB
macro_from_micro
agent-based_models
interacting_particle_systems
statistical_mechanics
stochastic_processes
re:stacs
Randomization Tests for Distinguishing Social Influence and Homophily Effects
october 2011 by cshalizi
Assumes all homophilous traits are measured, I believe.
re:homophily_and_confounding
homophily
social_influence
causal_inference
network_data_analysis
have_read
neville.jennifer
in_NB
re:stacs
to_teach:complexity-and-inference
bootstrap
Phys. Rev. E 84, 016223 (2011): Optimal reconstruction of dynamical systems: A noise amplification approach
july 2011 by cshalizi
"In this work we propose an objective function to guide the search for a state space reconstruction of a dynamical system from a time series of measurements. These statistics can be evaluated on any reconstructed attractor, thereby allowing a direct comparison among different approaches: (uniform or nonuniform) delay vectors, PCA, Legendre coordinates, etc. It can also be used to select the most appropriate parameters of a reconstruction strategy. In the case of delay coordinates this translates into finding the optimal delay time and embedding dimension from the absolute minimum of the advocated cost function. Its definition is based on theoretical arguments on noise amplification, the complexity of the reconstructed attractor, and a direct measure of local stretch which constitutes an irrelevance measure. The proposed method is demonstrated on synthetic and experimental time series."
attractor_reconstruction
dynamical_systems
statistics
to:NB
re:stacs
to_teach:complexity-and-inference
to_read
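The delay coordinates mentioned in the abstract above, with their two parameters (delay time and embedding dimension), can be sketched as follows. Only the uniform-delay construction is shown; the paper's noise-amplification cost function for choosing the parameters is not implemented. The function name and the forward-in-time convention are assumptions.

```python
def delay_embed(series, dim, tau):
    """Return uniform delay vectors (x_t, x_{t+tau}, ..., x_{t+(dim-1)*tau})."""
    if dim < 1 or tau < 1:
        raise ValueError("dim and tau must be positive integers")
    span = (dim - 1) * tau  # how far the last coordinate reaches past the first
    return [tuple(series[t + k * tau] for k in range(dim))
            for t in range(len(series) - span)]
```

A series shorter than `(dim - 1) * tau + 1` yields an empty list, so candidate parameter pairs can be scanned without special-casing short inputs.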
[1102.1182] Phase transition in the detection of modules in sparse networks
february 2011 by cshalizi
"We present an asymptotically exact analysis of the problem of detecting communities in sparse random networks. Our results are also applicable to detection of functional modules, partitions, and colorings in noisy planted models. Using a cavity method analysis, we unveil a phase transition from a region where the original group assignment is undetectable to one where detection is possible. In some cases, the detectable region splits into an algorithmically hard region and an easy one. Our approach naturally translates into a practical algorithm for detecting modules in sparse networks, and learning the parameters of the underlying model." --- This is really an EM algorithm, not a Bayesian method.
community_discovery
have_read
kith_and_kin
phase_transitions
network_data_analysis
moore.cris
belief_propagation
re:stacs
to_teach:complexity-and-inference
Frailty effects in networks: comparison and identification of individual heterogeneity versus preferential attachment in evolving networks - Blasio - 2011 - Journal of the Royal Statistical Society: Series C (Applied Statistics) - Wiley Online Library
january 2011 by cshalizi
network_data_analysis
preferential_attachment
to_teach:complexity-and-inference
re:stacs
Phys. Rev. E 79, 026201 (2009): Complexity measures from interaction structures
september 2010 by cshalizi
"We evaluate information-theoretic quantities that quantify complexity in terms of kth-order statistical dependences that cannot be reduced to interactions among k−1 random variables. Using symbolic dynamics of coupled maps and cellular automata as model systems, we demonstrate that these measures are able to identify complex dynamical regimes."
complexity_measures
information_theory
kith_and_kin
ay.nihat
jost.jurgen
to_read
re:stacs
[1007.3230] Selecting an exponential random graph model for complex brain networks
july 2010 by cshalizi
Shorter authors: What do you know, all that stuff about how to fit ERGMs to social networks totally works for brain networks too. (I mock, but I'd be flabbergasted if it didn't, if only because there is so little _social_ content in the ERGM formalism...) Also: yay model checking!
neural_data_analysis
network_data_analysis
exponential_family_random_graphs
model_selection
statistics
model-checking
to_read
re:functional_communities
re:stacs
Qi, Zhao: Asymptotic efficiency and finite-sample properties of the generalized profiling estimation of parameters in ordinary differential equations
december 2009 by cshalizi
Results on the Hooker/Ramsey method for estimating ODEs.
time_series
estimation_of_dynamical_systems
statistics
estimation
splines
re:stacs
[0910.2034] Strategies for Online Inference of Model-Based Clustering in large Networks
october 2009 by cshalizi
Can't tell what they're actually doing (other than tweaking estimation procedures). Read carefully.
community_discovery
network_data_analysis
em_algorithm
to_read
re:stacs
[0906.0612] Community detection in graphs
june 2009 by cshalizi
Review paper. From a _very_ superficial glance, looks good.
network_data_analysis
community_discovery
to_teach:complexity-and-inference
re:stacs
[0811.3988] Dynamic communities in multichannel data: An application to the foreign exchange market during the 2007--2008 credit crisis
april 2009 by cshalizi
I have seen this technique before. :) (Though, on reflection, if you're going to do everything with the correlation matrix [rather than mutual information], why not just take the inverse correlation matrix and identify edges with non-zero entries there?)
community_discovery
financial_markets
re:stacs
financial_crisis_of_2007--
have_read
The LaTeX Font Catalogue – Garamond
january 2009 by cshalizi
Needs extra packages. But I do like the idea of writing the book in Garamond.
latex
fonts
re:stacs
via:logista
Memory traces in dynamical systems — PNAS
december 2008 by cshalizi
How much information (in the Fisher sense) does the present state of a recurrent dynamical network retain about the history of its inputs? All, or almost all, done for linear-Gaussian systems, but numerical results for nonlinear, non-Gaussian systems would be straightforward in principle.
memory
dynamical_systems
information_theory
complexity_measures
fisher_information
to:NB
to_teach:complexity-and-inference
re:stacs
[0812.1242] Mapping change in large networks
december 2008 by cshalizi
Heard Martin talk about this at SFI last week. Nice, though I think the MDL frame-tale needs some work.
The "alluvial diagrams" are very pretty.
minimum_description_length
rosvall.martin
bergstrom.carl
kith_and_kin
network_data_analysis
have_read
re:network_differences
community_discovery
visual_display_of_quantitative_information
bootstrap
statistics
clustering
hypothesis_testing
re:stacs
to_teach:complexity-and-inference
citation_networks
bibliometry
Statistical Analysis of Complex Systems Models - Cambridge University Press
may 2008 by cshalizi
Great, I have an ISBN! Now I just need to finish writing...
re:stacs
On the Non-parametric Prediction of Conditionally Stationary Sequences
february 2008 by cshalizi
Kernel-based non-parametric prediction of not-too-non-stationary processes.
stochastic_processes
prediction
nonparametrics
time_series
ergodic_theory
re:stacs
caires.s.
ferreira.j.a.
have_read