cshalizi + to:nb   1142

[1205.3845] Forecasting with Historical Data or Process Knowledge under Misspecification: A Comparison
"When faced with the task of forecasting a dynamic system, practitioners often have available historical data, knowledge of the system, or a combination of both. While intuition dictates that perfect knowledge of the system should in theory yield perfect forecasting, often knowledge of the system is only partially known, known up to parameters, or known incorrectly. In contrast, forecasting using previous data without any process knowledge might result in accurate prediction for simple systems, but will fail for highly nonlinear and chaotic systems. In this paper, the authors demonstrate how even in chaotic systems, forecasting with historical data is preferable to using process knowledge if this knowledge exhibits certain forms of misspecification. Through an extensive simulation study, a range of misspecification and forecasting scenarios are examined with the goal of gaining an improved understanding of the circumstances under which forecasting from historical data is to be preferred over using process knowledge."
to:NB  to_read  prediction  time_series  misspecification  re:growing_ensemble_project 
8 days ago by cshalizi
Phys. Rev. Lett. 108, 200601 (2012): Number of Relevant Directions in Principal Component Analysis and Wishart Random Matrices
"We compute analytically, for large N, the probability P(N+,N) that a N×N Wishart random matrix has N+ eigenvalues exceeding a threshold Nζ, including its large deviation tails. This probability plays a benchmark role when performing the principal component analysis of a large empirical data set. We find that P(N+,N)≈exp⁡[-βN2ψζ(N+/N)], where β is the Dyson index of the ensemble and ψζ(κ) is a rate function that we compute explicitly in the full range 0≤κ≤1 and for any ζ. The rate function ψζ(κ) displays a quadratic behavior modulated by a logarithmic singularity close to its minimum κ⋆(ζ). This is shown to be a consequence of a phase transition in an associated Coulomb gas problem. The variance Δ(N) of the number of relevant components is also shown to grow universally (independent of ζ) as Δ(N)∼(βπ2)-1ln⁡N for large N."
to:NB  to_read  principal_components  large_deviations  random_matrices  stochastic_processes  high-dimensional_probability  re:g_paper  phase_transitions 
8 days ago by cshalizi
Lam, Yao: Factor modeling for high-dimensional time series: Inference for the number of factors
"This paper deals with the factor modeling for high-dimensional time series based on a dimension-reduction viewpoint. Under stationary settings, the inference is simple in the sense that both the number of factors and the factor loadings are estimated in terms of an eigenanalysis for a nonnegative definite matrix, and is therefore applicable when the dimension of time series is on the order of a few thousands. Asymptotic properties of the proposed method are investigated under two settings: (i) the sample size goes to infinity while the dimension of time series is fixed; and (ii) both the sample size and the dimension of time series go to infinity together. In particular, our estimators for zero-eigenvalues enjoy faster convergence (or slower divergence) rates, hence making the estimation for the number of factors easier. In particular, when the sample size and the dimension of time series go to infinity together, the estimators for the eigenvalues are no longer consistent. However, our estimator for the number of the factors, which is based on the ratios of the estimated eigenvalues, still works fine. Furthermore, this estimation shows the so-called “blessing of dimensionality” property in the sense that the performance of the estimation may improve when the dimension of time series increases. A two-step procedure is investigated when the factors are of different degrees of strength. Numerical illustration with both simulated and real data is also reported."
to:NB  dimension_reduction  factor_analysis  time_series  high-dimensional_statistics  inference_to_latent_objects 
10 days ago by cshalizi
Wang, Phillips: A specification test for nonlinear nonstationary models
"We provide a limit theory for a general class of kernel smoothed U-statistics that may be used for specification testing in time series regression with nonstationary data. The test framework allows for linear and nonlinear models with endogenous regressors that have autoregressive unit roots or near unit roots. The limit theory for the specification test depends on the self-intersection local time of a Gaussian process. A new weak convergence result is developed for certain partial sums of functions involving nonstationary time series that converges to the intersection local time process. This result is of independent interest and is useful in other applications. Simulations examine the finite sample performance of the test."
to:NB  time_series  non-stationarity  model-checking  statistics  misspecification 
10 days ago by cshalizi
Rigollet: Kullback–Leibler aggregation and misspecified generalized linear models
"In a regression setup with deterministic design, we study the pure aggregation problem and introduce a natural extension from the Gaussian distribution to distributions in the exponential family. While this extension bears strong connections with generalized linear models, it does not require identifiability of the parameter or even that the model on the systematic component is true. It is shown that this problem can be solved by constrained and/or penalized likelihood maximization and we derive sharp oracle inequalities that hold both in expectation and with high probability. Finally all the bounds are proved to be optimal in a minimax sense."
to:NB  regression  ensemble_methods  statistics 
10 days ago by cshalizi
[1205.3703] Generic chaining and the l1-penalty
"We address the choice of the tuning parameter $lambda$ in $ell_1$-penalized M-estimation. Our main concern is models which are highly nonlinear, such as the Gaussian mixture model. The number of parameters $p$ is moreover large, possibly larger than the number of observations $n$. The generic chaining technique of Talagrand[2005] is tailored for this problem. It leads to the choice $lambda asymp sqrt {log p / n}$, as in the standard Lasso procedure (which concerns the linear model and least squares loss)."
to:NB  to_read  statistics  empirical_processes  high-dimensional_statistics  van_de_geer.sara 
11 days ago by cshalizi
Phys. Rev. Lett. 108, 200403 (2012): Time Asymmetry of Probabilities Versus Relativistic Causal Structure: An Arrow of Time
"There is an incompatibility between the symmetries of causal structure in relativity theory and the signaling abilities of probabilistic devices with inputs and outputs: while time reversal in relativity will not introduce the ability to signal between spacelike separated regions, this is not the case for probabilistic devices with spacelike separated input-output pairs. We explicitly describe a nonsignaling device which becomes a perfect signaling device under time reversal, where time reversal can be conceptualized as playing backwards a videotape of an agent manipulating the device. This leads to an arrow of time that is identifiable when studying the correlations of events for spacelike separated regions. Somewhat surprisingly, although the time reversal of Popescu-Rohrlich boxes also allows agents to signal, it does not yield a perfect signaling device. Finally, we realize time reversal using postselection, which could to lead experimental implementation."
to:NB  causality  physics  relativity  arrow_of_time  to_read 
12 days ago by cshalizi
[1205.3208] A New Family of Generalized 3D Cat Maps
"Since the 1990s chaotic cat maps are widely used in data encryption, for their very complicated dynamics within a simple model and desired characteristics related to requirements of cryptography. The number of cat map parameters and the map period length after discretization are two major concerns in many applications for security reasons. In this paper, we propose a new family of 36 distinctive 3D cat maps with different spatial configurations taking existing 3D cat maps [1]-[4] as special cases. Our analysis and comparisons show that this new 3D cat maps family has more independent map parameters and much longer averaged period lengths than existing 3D cat maps. The presented cat map family can be extended to higher dimensional cases."

(to_teach tags for classes which use the cat map as an example)
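
For the classroom use, a minimal sketch of the classic 2D Arnold cat map on an n-by-n grid, together with its period after discretization (the quantity the abstract flags as a security concern). This is the textbook 2D map with matrix [[1, 1], [1, 2]], not one of the paper's 3D maps; the function names are my own.

```python
def cat_map(x, y, n):
    """One step of the discretized 2D cat map: (x, y) -> (x + y, x + 2y) mod n."""
    return (x + y) % n, (x + 2 * y) % n


def cat_map_period(n):
    """Smallest k >= 1 such that k applications of cat_map fix every grid point.

    Tracks powers of the matrix [[1, 1], [1, 2]] modulo n until the
    identity appears. The matrix has determinant 1, so it is invertible
    modulo n and the loop always terminates.
    """
    identity = (1 % n, 0, 0, 1 % n)
    a, b, c, d = 1 % n, 1 % n, 1 % n, 2 % n
    period = 1
    while (a, b, c, d) != identity:
        # Multiply the current power on the right by [[1, 1], [1, 2]].
        a, b, c, d = (a + b) % n, (a + 2 * b) % n, (c + d) % n, (c + 2 * d) % n
        period += 1
    return period


# Periods for a few small grid sizes; they vary irregularly with n.
periods = {n: cat_map_period(n) for n in (2, 3, 5, 7, 10)}
```

Iterating `cat_map` over every pixel of an image `cat_map_period(n)` times returns the original image, which makes a nice in-class demonstration.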
to:NB  cat_map  dynamical_systems  cryptography  to_teach:complexity-and-inference  to_teach:statcomp  to_teach:undergrad-ADA 
12 days ago by cshalizi
Quantitative patterns of stylistic influence in the evolution of literature
"Literature is a form of expression whose temporal structure, both in content and style, provides a historical record of the evolution of culture. In this work we take on a quantitative analysis of literary style and conduct the first large-scale temporal stylometric study of literature by using the vast holdings in the Project Gutenberg Digital Library corpus. We find temporal stylistic localization among authors through the analysis of the similarity structure in feature vectors derived from content-free word usage, nonhomogeneous decay rates of stylistic influence, and an accelerating rate of decay of influence among modern authors. Within a given time period we also find evidence for stylistic coherence with a given literary topic, such that writers in different fields adopt different literary styles. This study gives quantitative support to the notion of a literary “style of a time” with a strong trend toward increasingly contemporaneous stylistic influence."

It'll be interesting to see how they handle the bias induced by selective retention.
to:NB  to_read  literary_history  text_mining  kith_and_kin  rockmore.dan  krakuer.david 
13 days ago by cshalizi
Archaeology as a social science
"Because of advances in methods and theory, archaeology now addresses issues central to debates in the social sciences in a far more sophisticated manner than ever before. Coupled with methodological innovations, multiscalar archaeological studies around the world have produced a wealth of new data that provide a unique perspective on long-term changes in human societies, as they document variation in human behavior and institutions before the modern era. We illustrate these points with three examples: changes in human settlements, the roles of markets and states in deep history, and changes in standards of living. Alternative pathways toward complexity suggest how common processes may operate under contrasting ecologies, populations, and economic integration."
to:NB  archaeology  social_science_methodology 
13 days ago by cshalizi
Constructing summary statistics for approximate Bayesian computation: semi-automatic approximate Bayesian computation - Fearnhead - 2012 - Journal of the Royal Statistical Society: Series B (Statistical Methodology) - Wiley Online Library
"Many modern statistical applications involve inference for complex stochastic models, where it is easy to simulate from the models, but impossible to calculate likelihoods. Approximate Bayesian computation (ABC) is a method of inference for such models. It replaces calculation of the likelihood by a step which involves simulating artificial data for different parameter values, and comparing summary statistics of the simulated data with summary statistics of the observed data. Here we show how to construct appropriate summary statistics for ABC in a semi-automatic manner. We aim for summary statistics which will enable inference about certain parameters of interest to be as accurate as possible. Theoretical results show that optimal summary statistics are the posterior means of the parameters. Although these cannot be calculated analytically, we use an extra stage of simulation to estimate how the posterior means vary as a function of the data; and we then use these estimates of our summary statistics within ABC. Empirical results show that our approach is a robust method for choosing summary statistics that can result in substantially more accurate ABC analyses than the ad hoc choices of summary statistics that have been proposed in the literature. We also demonstrate advantages over two alternative methods of simulation-based inference."
to:NB  indirect_inference  estimation  statistics  approximate_bayesian_computation  computational_statistics  to_teach:complexity-and-inference  re:stacs 
13 days ago by cshalizi
Estimating the Causal Effects of Social Interaction with Endogenous Networks
"Identifying causal effects attributable to network membership is a key challenge in empirical studies of social networks. In this article, we examine the consequences of endogeneity for inferences about the effects of networks on network members’ behavior. Using the House office lottery (in which newly elected members select their office spaces in a randomly chosen order) as an instrumental variable to estimate the causal impact of legislative networks on roll call behavior and cosponsorship decisions in the 105th–112th Houses, we find no evidence that office proximity affects patterns of legislative behavior. These results contrast with decades of congressional scholarship and recent empirical studies. Our analysis demonstrates the importance of accounting for selection processes and omitted variables in estimating the causal impact of networks."
to:NB  causal_inference  re:critique_of_diffusion  social_influence  congress  network_data_analysis  social_networks  homophily  re:homophily_and_confounding 
13 days ago by cshalizi
[1205.2609] Which Spatial Partition Trees are Adaptive to Intrinsic Dimension?
"Recent theory work has found that a special type of spatial partition tree - called a random projection tree - is adaptive to the intrinsic dimension of the data from which it is built. Here we examine this same question, with a combination of theory and experiments, for a broader class of trees that includes k-d trees, dyadic trees, and PCA trees. Our motivation is to get a feel for (i) the kind of intrinsic low dimensional structure that can be empirically verified, (ii) the extent to which a spatial partition can exploit such structure, and (iii) the implications for standard statistical tasks such as regression, vector quantization, and nearest neighbor search."
to:NB  decision_trees  prediction  regression  statistics  dimension_reduction  machine_learning 
13 days ago by cshalizi
[1205.2736] How Visibility and Divided Attention Constrain Social Contagion
"How far and how fast does information spread in social media? Researchers have recently examined a number of factors that affect information diffusion in online social networks, including: the novelty of information, users' activity levels, who they pay attention to, and how they respond to friends' recommendations. Using URLs as markers of information, we carry out a detailed study of retweeting, the primary mechanism by which information spreads on the Twitter follower graph. Our empirical study examines how users respond to an incoming stimulus, i.e., a tweet (message) from a friend, and reveals that %retweeting behavior is constrained by a few simple principles. the "principle of least effort" combined with limited attention plays a dominant role in retweeting behavior. Specifically, we observe that users retweet information when it is most visible, such as when it near the top of their Twitter stream. Moreover, our measurements quantify how a user's limited attention is divided among incoming tweets, providing novel evidence that highly connected individuals are less likely to propagate an arbitrary tweet. Our study indicates that the finite ability to process incoming information constrains social contagion, and we conclude that rapid decay of visibility is the primary barrier to information propagation online."
to:NB  social_contagion  networked_life  epidemiology_of_representations 
13 days ago by cshalizi
Likelihood inference for discriminating between long-memory and change-point models - Yau - 2012 - Journal of Time Series Analysis - Wiley Online Library
"We develop a likelihood ratio (LR) test procedure for discriminating between a short-memory time series with a change-point (CP) and a long-memory (LM) time series. Under the null hypothesis, the time series consists of two segments of short-memory time series with different means and possibly different covariance functions. The location of the shift in the mean is unknown. Under the alternative, the time series has no shift in mean but rather is LM. The LR statistic is defined as the normalized log-ratio of the Whittle likelihood between the CP model and the LM model, which is asymptotically normally distributed under the null. The LR test provides a parametric alternative to the CUSUM test proposed by Berkes et al. (2006). Moreover, the LR test is more general than the CUSUM test in the sense that it is applicable to changes in other marginal or dependence features other than a change-in-mean. We show its good performance in simulations and apply it to two data examples."
to:NB  time_series  change-point_problem  long-range_dependence  statistics  to_teach:undergrad-ADA  hypothesis_testing 
13 days ago by cshalizi
Measures of mutual and causal dependence between two time series
"New measures are proposed for mutual and causal dependence between two time series, based on information theoretical ideas. The measure of mutual dependence is shown to be the sum of the measure of unidirectional causal dependence from the first time series to the second, the measure of unidirectional causal dependence from the second to the first, and the measure of instantaneous causal dependence. The measures are applicable to any kind of time series: continuous, discrete, or categorical."
to:NB  causality  information_theory  stochastic_processes  rissanen.jorma  via:coleman 
17 days ago by cshalizi
[1205.2265] Efficient Constrained Regret Minimization
"Online learning constitutes a mathematical framework to analyze sequential decision making problems in adversarial environments. The learner repeatedly chooses an action, the environment responds with an outcome, and then the learner receives a reward for the played action. The goal of the learner is to maximize his total reward. However, there are situations in which, in addition to maximizing the cumulative reward, there are some additional constraints/goals on the sequence of decisions that must be satisfied by the learner. For example, in textit{online marketing}, simultaneously maximizing the cumulative reward and the number of buyers to take advantage of word-of-mouth advertising for future marketing seems to be a more ambitious goal than only maximizing cumulative reward. As another example, learning from costly expert advice captures more realistic settings than the original setting in applications such as routing in networks with power constraint. In this paper we study an extension to the online learning where the learner aims to maximize the total reward given that some additional constraints need to be satisfied. We propose Lagrangian exponentially weighted average (textbf{LEWA}) algorithm, an efficient algorithm to solve constrained online learning, which is a primal dual variant of the well known exponentially weighted average algorithm and inspired by the theory of Lagrangian method in constrained optimization. We establish the regret and the violation of the constraint bounds in full information and bandit feedback models."
to:NB  low-regret_learning  optimization  machine_learning 
17 days ago by cshalizi
Neural Circuit Reconfiguration by Social Status
"The social rank of an animal is distinguished by its behavior relative to others in its community. Although social-status-dependent differences in behavior must arise because of differences in neural function, status-dependent differences in the underlying neural circuitry have only begun to be described. We report that dominant and subordinate crayfish differ in their behavioral orienting response to an unexpected unilateral touch, and that these differences correlate with functional differences in local neural circuits that mediate the responses. The behavioral differences correlate with simultaneously recorded differences in leg depressor muscle EMGs and with differences in the responses of depressor motor neurons recorded in reduced, in vitro preparations from the same animals. The responses of local serotonergic interneurons to unilateral stimuli displayed the same status-dependent differences as the depressor motor neurons. These results indicate that the circuits and their intrinsic serotonergic modulatory components are configured differently according to social status, and that these differences do not depend on a continuous descending signal from higher centers."
to:NB  neuroscience  experimental_biology  experimental_sociology  crustaceans  social_neuroscience 
18 days ago by cshalizi
[1204.3863] The mechanics of stochastic slowdown in evolutionary games
"We study the stochastic dynamics of evolutionary games, and focus on the so-called `stochastic slowdown' effect, previously observed in (Altrock et. al, 2010) for simple evolutionary dynamics. Slowdown here refers to the fact that a beneficial mutation may take longer to fixate than a neutral one. More precisely, the fixation time conditioned on the mutant taking over can show a maximum at intermediate selection strength. We show that this phenomenon is present in the prisoner's dilemma, and also discuss counterintuitive slowdown and speedup in coexistence games. In order to establish the microscopic origins of these phenomena, we calculate the average sojourn times. This allows us to identify the transient states which contribute most to the slowdown effect, and enables us to provide an understanding of slowdown in the takeover of a small group of cooperators by defectors: Defection spreads quickly initially, but the final steps to takeover can be delayed significantly. The analysis of coexistence games reveals even more intricate behavior. In small populations, the conditional average fixation time can show multiple extrema as a function of the selection strength, e.g., slowdown, speedup, and slowdown again. We classify two-player games with respect to the possibility to observe non-monotonic behavior of the conditional average fixation time as a function of selection strength."
to:NB  evolutionary_game_theory  re:do-institutions-evolve 
18 days ago by cshalizi
[1205.0241] Counterfactual Graphical Models for Mediation Analysis via Path-Specific Effects
"Potential outcome counterfactuals represent variation in the outcome of interest after a hypothetical treatment or intervention is performed. Causal graphical models are a concise, intuitive way of representing causal assumptions, including independence constraints among such counterfactuals. Much of modern causal inference is concerned with expressing cause effect relationships of interest in counterfactual form, showing how the resulting counterfactuals can be identified (that is expressed in terms of available data, using domain-specific causal assumptions), and subsequently estimated using statistical methods. In this paper we will use causal graphical models to analyze the identification problem of the so-called emph{path-specific effects}, that is effects of treatment on outcome along certain specified causal paths. Such effects arise in mediation analysis settings where it's important to distinguish direct and indirect effects of treatment. We review existing results on path-specific effects in the fully observable, static treatment setting, and extend them to settings with time-varying treatments, and latent variables."
to:NB  causal_inference  shpister.ilya  graphical_models 
18 days ago by cshalizi
[1204.5421] Epidemics on a stochastic model of temporal network
"Contacts between individuals serve as pathways where infections may propagate. These contact patterns can be represented by network structures. Static structures have been the common modeling paradigm but recent results suggest that temporal structures play different roles to regulate the spread of infections or infection-like dynamics. On temporal networks a vertex is active only at certain moments and inactive otherwise such that a contact is not continuously available. In several empirical networks, the time between two consecutive vertex-activation events typically follows heterogeneous activity (e.g. bursts). In this chapter, we present a simple and intuitive stochastic model of a temporal network and investigate how epidemics co-evolves with the temporal structures, focusing on the growth dynamics of the epidemics. The model assumes no underlying topological structure and is only constrained by the time between two consecutive events of vertex activation. The main observation is that the speed of the infection spread is different in case of heterogeneous and homogeneous temporal patterns but the differences depend on the stage of the epidemics. In comparison to the homogeneous scenario, the power law case results in a faster growth in the beginning but turns out to be slower after a certain time, taking several time steps to reach the whole network."
to:NB  networks  epidemic_models  re:social-networks-as-sensor-networks 
18 days ago by cshalizi
[1205.1828] The Natural Gradient by Analogy to Signal Whitening, and Recipes and Tricks for its Use
"The natural gradient allows for more efficient gradient descent by removing dependencies and biases inherent in a function's parameterization. Several papers present the topic thoroughly and precisely. It remains a very difficult idea to get your head around however. The intent of this note is to provide simple intuition for the natural gradient and its use. We review how an ill conditioned parameter space can undermine learning, introduce the natural gradient by analogy to the more widely understood concept of signal whitening, and present tricks and specific prescriptions for applying the natural gradient to learning problems."

Does this ever mention the phrase "Fisher information"?
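
For reference, the usual definition: the natural gradient is the ordinary gradient premultiplied by the inverse Fisher information matrix. A minimal sketch for fitting a univariate Gaussian parametrized by (mu, sigma), where the Fisher information is diag(1/sigma^2, 2/sigma^2). The function name, step size, and toy data are my own assumptions, not taken from the note.

```python
import statistics


def natural_gradient_fit(data, mu=0.0, sigma=1.0, lr=0.5, steps=200):
    """Maximize the average Gaussian log-likelihood by natural gradient ascent."""
    mean_x = statistics.fmean(data)
    for _ in range(steps):
        # Mean squared deviation around the current mu.
        s = statistics.fmean((x - mu) ** 2 for x in data)
        # Ordinary gradient of the average log-likelihood.
        grad_mu = (mean_x - mu) / sigma ** 2
        grad_sigma = -1.0 / sigma + s / sigma ** 3
        # Premultiply by the inverse Fisher information,
        # diag(sigma^2, sigma^2 / 2), to get the natural gradient.
        nat_mu = sigma ** 2 * grad_mu
        nat_sigma = (sigma ** 2 / 2.0) * grad_sigma
        mu += lr * nat_mu
        sigma += lr * nat_sigma
    return mu, sigma


toy_data = [1.0, 2.0, 4.0, 7.0]
mu_hat, sigma_hat = natural_gradient_fit(toy_data)
```

After the rescaling, the update for mu no longer depends on sigma at all, which is the kind of parameterization independence the abstract advertises.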
to:NB  optimization  statistics  estimation  fisher_information  information_geometry 
18 days ago by cshalizi
[1203.3504] On Measurement Bias in Causal Inference
"This paper addresses the problem of measurement errors in causal inference and highlights several algebraic and graphical methods for eliminating systematic bias induced by such errors. In particulars, the paper discusses the control of partially observable confounders in parametric and non parametric models and the computational problem of obtaining bias-free effect estimates in such models."
to:NB  causal_inference  inference_to_latent_objects  pearl.judea  to_teach:undergrad-ADA  statistics  error_in_variables  via:arthegall 
18 days ago by cshalizi
Schlozman, K. and Verba, S., Brady, H.: The Unheavenly Chorus: Unequal Political Voice and the Broken Promise of American Democracy.
"The Unheavenly Chorus is the first book to look at the political participation of individual citizens alongside the political advocacy of thousands of organized interests--membership associations such as unions, professional associations, trade associations, and citizens groups, as well as organizations like corporations, hospitals, and universities. Drawing on numerous in-depth surveys of members of the public as well as the largest database of interest organizations ever created--representing more than thirty-five thousand organizations over a twenty-five-year period--this book conclusively demonstrates that American democracy is marred by deeply ingrained and persistent class-based political inequality. The well educated and affluent are active in many ways to make their voices heard, while the less advantaged are not. This book reveals how the political voices of organized interests are even less representative than those of individuals, how political advantage is handed down across generations, how recruitment to political activity perpetuates and exaggerates existing biases, how political voice on the Internet replicates these inequalities--and more."
to:NB  inequality  democracy  us_politics  political_science  re:democratic_cognition  books:noted 
18 days ago by cshalizi
Lai, Huang, Lee: Fixed and random effects selection in nonparametric additive mixed models
"This paper considers the problem of model selection in a nonparametric additive mixed modeling framework. The fixed effects are modeled nonparametrically using truncated series expansions with B-spline basis. Estimation and selection of such nonparametric fixed effects are simultaneously achieved by using the adaptive group lasso methodology, while the random effects are selected by a traditional backward selection mechanism. To facilitate the automatic selection of model dimension, computable expressions for the degrees of freedom for both the fixed and random effects components are derived, and the Bayesian Information criterion (BIC) is used to select the final model choice. Theoretically it is shown that this BIC model selection method is consistent, while computationally a practical algorithm is developed for solving the optimization problem involved. Simulation results show that the proposed methodology is often capable of selecting the correct significant fixed and random effects components, especially when the sample size and/or signal to noise ratio are not too small. The new method is also applied to two real data sets."
to:NB  regression  additive_models  statistics 
19 days ago by cshalizi
[1205.1406] Graph Prediction in a Low-Rank and Autoregressive Setting
"We study the problem of prediction for evolving graph data. We formulate the problem as the minimization of a convex objective encouraging sparsity and low-rank of the solution, that reflect natural graph properties. The convex formulation allows to obtain oracle inequalities and efficient solvers. We provide empirical results for our algorithm and comparison with competing methods, and point out two open questions related to compressed sensing and algebra of low-rank and sparse matrices."
to:NB  network_data_analysis  prediction  statistics  low-rank_approximation 
19 days ago by cshalizi
Using Internet Data for Economic Research
"The data used by economists can be broadly divided into two categories. First, structured datasets arise when a government agency, trade association, or company can justify the expense of assembling records. The Internet has transformed how economists interact with these datasets by lowering the cost of storing, updating, distributing, finding, and retrieving this information. Second, some economic researchers affirmatively collect data of interest. For researcher-collected data, the Internet opens exceptional possibilities both by increasing the amount of information available for researchers to gather and by lowering researchers' costs of collecting information. In this paper, I explore the Internet's new datasets, present methods for harnessing their wealth, and survey a sampling of the research questions these data help to answer. The first section of this paper discusses "scraping" the Internet for data—that is, collecting data on prices, quantities, and key characteristics that are already available on websites but not yet organized in a form useful for economic research. A second part of the paper considers online experiments, including experiments that the economic researcher observes but does not control (for example, when Amazon or eBay alters site design or bidding rules); and experiments in which a researcher participates in design, including those conducted in partnership with a company or website, and online versions of laboratory experiments. Finally, I discuss certain limits to this type of data collection, including both "terms of use" restrictions on websites and concerns about privacy and confidentiality."
to:NB  economics  data_sets  web  re:your_favorite_dsge_sucks 
20 days ago by cshalizi
Accurately estimating neuronal correlation requires a new spike-sorting paradigm
"Neurophysiology is increasingly focused on identifying coincident activity among neurons. Strong inferences about neural computation are made from the results of such studies, so it is important that these results be accurate. However, the preliminary step in the analysis of such data, the assignment of spike waveforms to individual neurons (“spike-sorting”), makes a critical assumption which undermines the analysis: that spikes, and hence neurons, are independent. We show that this assumption guarantees that coincident spiking estimates such as correlation coefficients are biased. We also show how to eliminate this bias. Our solution involves sorting spikes jointly, which contrasts with the current practice of sorting spikes independently of other spikes. This new “ensemble sorting” yields unbiased estimates of coincident spiking, and permits more data to be analyzed with confidence, improving the quality and quantity of neurophysiological inferences. These results should be of interest outside the context of neuronal correlations studies. Indeed, simultaneous recording of many neurons has become the rule rather than the exception in experiments, so it is essential to spike sort correctly if we are to make valid inferences about any properties of, and relationships between, neurons."
to:NB  heard_the_talk  neuroscience  neural_data_analysis  ventura.valerie  kith_and_kin  statistics  inference_to_latent_objects 
20 days ago by cshalizi
Clarke, Clarke: Prediction in several conventional contexts
"We review predictive techniques from several traditional branches of statistics. Starting with prediction based on the normal model and on the empirical distribution function, we proceed to techniques for various forms of regression and classification. Then, we turn to time series, longitudinal data, and survival analysis. Our focus throughout is on the mechanics of prediction more than on the properties of predictors."

(to_teach tags are tentative.)
to:NB  prediction  statistics  classifiers  regression  to_teach:undergrad-ADA  to_teach:data-mining 
20 days ago by cshalizi
No-Regret Learning and a Mechanism for Distributed Multiagent Planning
"We develop a novel mechanism for coordinated, distributed multiagent planning. We consider problems stated as a col- lection of single-agent planning problems coupled by com- mon soft constraints on resource consumption. (Resources may be real or fictitious, the latter introduced as a tool for factoring the problem). A key idea is to recast the dis- tributed planning problem as learning in a repeated game between the original agents and a newly introduced group of adversarial agents who influence prices for the resources. The adversarial agents benefit from arbitrage: that is, their incentive is to uncover violations of the resource usage con- straints and, by selfishly adjusting prices, encourage the original agents to avoid plans that cause such violations. If all agents employ no-regret learning algorithms in the course of this repeated interaction, we are able to show that our mechanism can achieve design goals such as social op- timality (efficiency), budget balance, and Nash-equilibrium convergence to within an error which approaches zero as the agents gain experience. In particular, the agents’ average plans converge to a socially optimal solution for the original planning task. We present experiments in a simulated net- work routing domain demonstrating our method’s ability to reliably generate sound plans."
online_learning  economics  markets_as_collective_calculating_devices  re:knightian_uncertainty  gordon.geoff  to:NB  low-regret_learning 
20 days ago by cshalizi
Ehm, Gneiting: Local proper scoring rules of order two
"Scoring rules assess the quality of probabilistic forecasts, by assigning a numerical score based on the predictive distribution and on the event or value that materializes. A scoring rule is proper if it encourages truthful reporting. It is local of order k if the score depends on the predictive density only through its value and the values of its derivatives of order up to k at the realizing event. Complementing fundamental recent work by Parry, Dawid and Lauritzen, we characterize the local proper scoring rules of order 2 relative to a broad class of Lebesgue densities on the real line, using a different approach. In a data example, we use local and nonlocal proper scoring rules to assess statistically postprocessed ensemble weather forecasts."
to:NB  prediction  scoring_rules  statistics  gneiting.tilmann 
21 days ago by cshalizi
Dawid, Lauritzen, Parry: Proper local scoring rules on discrete sample spaces
"A scoring rule is a loss function measuring the quality of a quoted probability distribution Q for a random variable X, in the light of the realized outcome x of X; it is proper if the expected score, under any distribution P for X, is minimized by quoting Q = P. Using the fact that any differentiable proper scoring rule on a finite sample space is the gradient of a concave homogeneous function, we consider when such a rule can be local in the sense of depending only on the probabilities quoted for points in a nominated neighborhood of x. Under mild conditions, we characterize such a proper local scoring rule in terms of a collection of homogeneous functions on the cliques of an undirected graph on the space . A useful property of such rules is that the quoted distribution Q need only be known up to a scale factor. Examples of the use of such scoring rules include Besag’s pseudo-likelihood and Hyvärinen’s method of ratio matching."
to:NB  prediction  scoring_rules  statistics  lauritzen.steffen  dawid.philip 
21 days ago by cshalizi
Parry, Dawid, Lauritzen: Proper local scoring rules
"We investigate proper scoring rules for continuous distributions on the real line. It is known that the log score is the only such rule that depends on the quoted density only through its value at the outcome that materializes. Here we allow further dependence on a finite number m of derivatives of the density at the outcome, and describe a large class of such m-local proper scoring rules: these exist for all even m but no odd m. We further show that for m ≥ 2 all such m-local rules can be computed without knowledge of the normalizing constant of the distribution."
to:NB  prediction  scoring_rules  lauritzen.steffen  dawid.philip  statistics 
21 days ago by cshalizi
[0805.1404] Adaptive estimation of a distribution function and its density in sup-norm loss by wavelet and spline projections
"Given an i.i.d. sample from a distribution $F$ on $mathbb{R}$ with uniformly continuous density $p_0$, purely data-driven estimators are constructed that efficiently estimate $F$ in sup-norm loss and simultaneously estimate $p_0$ at the best possible rate of convergence over H"older balls, also in sup-norm loss. The estimators are obtained by applying a model selection procedure close to Lepski's method with random thresholds to projections of the empirical measure onto spaces spanned by wavelets or $B$-splines. The random thresholds are based on suprema of Rademacher processes indexed by wavelet or spline projection kernels. This requires Bernstein-type analogs of the inequalities in Koltchinskii [Ann. Statist. 34 (2006) 2593-2656] for the deviation of suprema of empirical processes from their Rademacher symmetrizations."
to:NB  density_estimation  wavelets  splines  statistics  empirical_processes 
22 days ago by cshalizi
Testing parametric conditional distributions using the nonparametric smoothing method
"This paper proposes a new goodness-of-fit test for parametric conditional probability distributions using the nonparametric smoothing methodology. An asymptotic normal distribution is established for the test statistic under the null hypothesis of correct specification of the parametric distribution. The test is shown to have power against local alternatives converging to the null at certain rates. The test can be applied to testing for possible misspecifications in a wide variety of parametric models. A bootstrap procedure is provided for obtaining more accurate critical values for the test. Monte Carlo simulations show that the test has good power against some common alternatives."
to:NB  misspecification  density_estimation  smoothing  statistics  to_teach:undergrad-ADA 
22 days ago by cshalizi
[1204.6441] "I Wanted to Predict Elections with Twitter and all I got was this Lousy Paper" -- A Balanced Survey on Election Prediction using Twitter Data
"Predicting X from Twitter is a popular fad within the Twitter research subculture. It seems both appealing and relatively easy. Among such kind of studies, electoral prediction is maybe the most attractive, and at this moment there is a growing body of literature on such a topic. This is not only an interesting research problem but, above all, it is extremely difficult. However, most of the authors seem to be more interested in claiming positive results than in providing sound and reproducible methods. It is also especially worrisome that many recent papers seem to only acknowledge those studies supporting the idea of Twitter predicting elections, instead of conducting a balanced literature review showing both sides of the matter. After reading many of such papers I have decided to write such a survey myself. Hence, in this paper, every study relevant to the matter of electoral prediction using social media is commented. From this review it can be concluded that the predictive power of Twitter regarding elections has been greatly exaggerated, and that hard research problems still lie ahead."
to:NB  social_media  data_mining  prediction  have_read 
24 days ago by cshalizi
Towards Integrative Causal Analysis of Heterogeneous Data Sets and Studies
"We present methods able to predict the presence and strength of conditional and unconditional dependencies (correlations) between two variables Y and Z never jointly measured on the same samples, based on multiple data sets measuring a set of common variables. The algorithms are specializations of prior work on learning causal structures from overlapping variable sets. This problem has also been addressed in the field of statistical matching. The proposed methods are applied to a wide range of domains and are shown to accurately predict the presence of thousands of dependencies. Compared against prototypical statistical matching algorithms and within the scope of our experiments, the proposed algorithms make predictions that are better correlated with the sample estimates of the unknown parameters on test data ; this is particularly the case when the number of commonly measured variables is low.
"The enabling idea behind the methods is to induce one or all causal models that are simultaneously consistent with (fit) all available data sets and prior knowledge and reason with them. This allows constraints stemming from causal assumptions (e.g., Causal Markov Condition, Faithfulness) to propagate. Several methods have been developed based on this idea, for which we propose the unifying name Integrative Causal Analysis (INCA). A contrived example is presented demonstrating the theoretical potential to develop more general methods for co-analyzing heterogeneous data sets. The computational experiments with the novel methods provide evidence that causally-inspired assumptions such as Faithfulness often hold to a good degree of approximation in many real systems and could be exploited for statistical inference. Code, scripts, and data are available at www.mensxmachina.org."
to:NB  to_read  causal_inference  graphical_models  to_teach:undergrad-ADA 
25 days ago by cshalizi
Consistent Model Selection Criteria on High Dimensions
"Asymptotic properties of model selection criteria for high-dimensional regression models are studied where the dimension of covariates is much larger than the sample size. Several sufficient conditions for model selection consistency are provided. Non-Gaussian error distributions are considered and it is shown that the maximal number of covariates for model selection consistency depends on the tail behavior of the error distribution. Also, sufficient conditions for model selection consistency are given when the variance of the noise is neither known nor estimated consistently. Results of simulation studies as well as real data analysis are given to illustrate that finite sample performances of consistent model selection criteria can be quite different."
to:NB  model_selection  statistics  high-dimensional_probability 
25 days ago by cshalizi
"The huge Package for High-dimensional Undirected Graph Estimation in R"
"We describe an R package named huge which provides easy-to-use functions for estimating high dimensional undirected graphs from data. This package implements recent results in the literature, including Friedman et al. (2007), Liu et al. (2009, 2012) and Liu et al. (2010). Compared with the existing graph estimation package glasso, the huge package provides extra features: (1) instead of using Fortan, it is written in C, which makes the code more portable and easier to modify; (2) besides fitting Gaussian graphical models, it also provides functions for fitting high dimensional semiparametric Gaussian copula models; (3) more functions like data-dependent model selection, data generation and graph visualization; (4) a minor convergence problem of the graphical lasso algorithm is corrected; (5) the package allows the user to apply both lossless and lossy screening rules to scale up large-scale problems, making a tradeoff between computational and statistical efficiency."
to:NB  to_teach:undergrad-ADA  graphical_models  statistics  kith_and_kin  wasserman.larry  roeder.kathryn  liu.han 
25 days ago by cshalizi
Unconscious Relational Inference Recruits the Hippocampus
"Relational inference denotes the capacity to encode, flexibly retrieve, and integrate multiple memories to combine past experiences to update knowledge and improve decision-making in new situations. Although relational inference is thought to depend on the hippocampus and consciousness, we now show in young, healthy men that it may occur outside consciousness but still recruits the hippocampus. In temporally distinct and unique subliminal episodes, we presented word pairs that either overlapped (“winter–red”, “red–computer”) or not. Effects of unconscious relational inference emerged in reaction times recorded during unconscious encoding and in the outcome of decisions made 1 min later at test, when participants judged the semantic relatedness of two supraliminal words. These words were either episodically related through a common word (“winter–computer” related through “red”) or unrelated. Hippocampal activity increased during the unconscious encoding of overlapping versus nonoverlapping word pairs and during the unconscious retrieval of episodically related versus unrelated words. Furthermore, hippocampal activity during unconscious encoding predicted the outcome of decisions made at test. Hence, unconscious inference may influence decision-making in new situations."

Relations represented spatially?
to:NB  neuroscience  experimental_psychology 
25 days ago by cshalizi
Larger than Life: Digital Creatures in a Family of Two-Dimensional Cellular Automata (Evans, 2001)
"We introduce the Larger than Life family of two-dimensional two-state cellular automata that generalize certain nearest neighbor outer totalistic cellular automaton rules to large neighborhoods. We describe linear and quadratic rescalings of John Conway's celebrated Game of Life to these large neighborhood cellular automaton rules and present corresponding generalizations of Life's famous gliders and spaceships. We show that, as is becoming well known for nearest neighbor cellular automaton rules, these ``digital creatures'' are ubiquitous for certain parameter values."

(Meta-comment: jeez, guys, how hard is it to re-direct old URLs? Or at least to have a working search box?)
cellular_automata  conways_life  have_read  to:NB  evans.kellie_m. 
28 days ago by cshalizi
[1204.6265] Statistical inference for dynamical systems: a review
"The topic of statistical inference for dynamical systems has been studied extensively across several fields. In this survey we focus on the problem of parameter estimation for non-linear dynamical systems. Our objective is to place results across distinct disciplines in a common setting and highlight opportunities for further research."
to:NB  to_read  statistical_inference_for_stochastic_processes  dynamical_systems  statistics  time_series  state-space_models  state-space_reconstruction  pillai.natesh  via:ded-maxim 
28 days ago by cshalizi
[1006.1015] Computational Tools for Evaluating Phylogenetic and Hierarchical Clustering Trees
"Inferential summaries of tree estimates are useful in the setting of evolutionary biology, where phylogenetic trees have been built from DNA data since the 1960's. In bioinformatics, psychometrics and data mining, hierarchical clustering techniques output the same mathematical objects, and practitioners have similar questions about the stability and `generalizability' of these summaries. This paper provides an implementation of the geometric distance between trees developed by Billera, Holmes and Vogtmann (2001) [BHV] equally applicable to phylogenetic trees and hieirarchical clustering trees, and shows some of the applications in statistical inference for which this distance can be useful. In particular, since BHV have shown that the space of trees is negatively curved (a CAT(0) space), a natural representation of a collection of trees is a tree. We compare this representation to the Euclidean approximations of treespace made available through Multidimensional Scaling of the matrix of distances between trees. We also provide applications of the distances between trees to hierarchical clustering trees constructed from microarrays. Our method gives a new way of evaluating the influence both of certain columns (positions, variables or genes) and of certain rows (whether species, observations or arrays)."
to:NB  clustering  hierarchical_structure  holmes.susan  data_mining  statistics  to_teach:data-mining  gene_expression_data_analysis  via:ryan_t 
4 weeks ago by cshalizi
[1204.5935] On the prevalence of non-Gibbsian states in mathematical physics
"Gibbs measures are the main object of study in equilibrium statistical mechanics, and are used in many other contexts, including dynamical systems and ergodic theory, and spatial statistics. However, in a large number of natural instances one encounters measures that are not of Gibbsian form. We present here a number of examples of such non-Gibbsian measures, and discuss some of the underlying mathematical and physical issues to which they gave rise."
to:NB  statistical_mechanics  stochastic_processes  gibbs_distributions 
4 weeks ago by cshalizi
[1204.6023] Finite Evolutionary Processes
"We consider the evolution of large but finite populations on arbitrary fitness landscapes. We describe the evolutionary process by a Markov, Moran process. We show that to $mathcal O(1/N)$, the time-averaged fitness is lower for the finite population than it is for the infinite population. We also show that fluctuations in the number of individuals for a given genotype can be proportional to a power of the inverse of the mutation rate. Finally, we show that the probability for the system to take a given path through the fitness landscape can be non-monotonic in system size."
to:NB  evolutionary_biology  stochastic_processes 
4 weeks ago by cshalizi
Civilizing the Economy - Academic and Professional Books - Cambridge University Press
"When a handful of people thrive while whole industries implode and millions suffer, it is clear that something is wrong with our economy. The wealth of the few is disconnected from the misery of the many. In Civilizing the Economy, Marvin Brown traces the origin of this economics of dissociation to early capitalism, showing how this is illustrated in Adam Smith's denial of the central role of slavery in wealth creation. In place of the Smithian economics of property, Brown proposes that we turn to the original meaning of economics as household management. He presents a new framework for the global economy that reframes its purpose as the making of provisions instead of the accumulation of property. This bold new vision establishes the civic sphere as the platform for organizing an inclusive economy and as a way to move toward a more just and sustainable world."
to:NB  books:noted  economics  to_be_shot_after_a_fair_trial 
4 weeks ago by cshalizi
Graphlets: a Spectral Perspective for Graph Limits
"Graphlets give a spectral approach to graph limits for general graph sequences in a framework that unifies previous disparate approaches for dealing with dense graphs and sparse graphs. We will show that the con- vergence to graphlets under the appropriate spectral distance is equivalent to the convergence using the (normalized) cut distance. We then examine the geometry of graphlets, illustrated by examples of several families of graphlets and, in particular, graphlets with low ranks. We further dis- cuss a number of usages of graphlets, including universal scalable bases, universal embeddings vis heat kernels and the preservation of Cheeger cuts."

ETA: This is so not an easy read. I like what I understand, but I definitely have to make another attack on it.
to:NB  to_read  graph_theory  graph_limits  re:smoothing_adjacency_matrices  re:network_differences  chung.fan  via:alessandro  graph_spectra 
4 weeks ago by cshalizi
Organic Synthesis via Irradiation and Warming of Ice Grains in the Solar Nebula
"Complex organic compounds, including many important to life on Earth, are commonly found in meteoritic and cometary samples, though their origins remain a mystery. We examined whether such molecules could be produced within the solar nebula by tracking the dynamical evolution of ice grains in the nebula and recording the environments to which they were exposed. We found that icy grains originating in the outer disk, where temperatures were less than 30 kelvin, experienced ultraviolet irradiation exposures and thermal warming similar to that which has been shown to produce complex organics in laboratory experiments. These results imply that organic compounds are natural by-products of protoplanetary disk evolution and should be important ingredients in the formation of all planetary systems, including our own."

Cf. Ken MacLeod's cometary Lucretian gods.
to:NB  origins_of_life  biochemistry  astrobiology  astronomy 
4 weeks ago by cshalizi
Analytic Thinking Promotes Religious Disbelief
"Scientific interest in the cognitive underpinnings of religious belief has grown in recent years. However, to date, little experimental research has focused on the cognitive processes that may promote religious disbelief. The present studies apply a dual-process model of cognitive processing to this problem, testing the hypothesis that analytic processing promotes religious disbelief. Individual differences in the tendency to analytically override initially flawed intuitions in reasoning were associated with increased religious disbelief. Four additional experiments provided evidence of causation, as subtle manipulations known to trigger analytic processing also encouraged religious disbelief. Combined, these studies indicate that analytic processing is one factor (presumably among several) that promotes religious disbelief. Although these findings do not speak directly to conversations about the inherent rationality, value, or truth of religious beliefs, they illuminate one cognitive factor that may influence such discussions."

The part of me which imprinted on _Why I Am Not a Christian_ is chortling. Another part of me, however, is wondering how hard it would be to write "Analytic Thinking Promotes Disbelief in Psychological Studies".
to:NB  to_read  experimental_psychology  cognitive_science  religion 
4 weeks ago by cshalizi
Origins and Genetic Legacy of Neolithic Farmers and Hunter-Gatherers in Europe
"The farming way of life originated in the Near East some 11,000 years ago and had reached most of the European continent 5000 years later. However, the impact of the agricultural revolution on demography and patterns of genomic variation in Europe remains unknown. We obtained 249 million base pairs of genomic DNA from ~5000-year-old remains of three hunter-gatherers and one farmer excavated in Scandinavia and find that the farmer is genetically most similar to extant southern Europeans, contrasting sharply to the hunter-gatherers, whose distinct genetic signature is most similar to that of extant northern Europeans. Our results suggest that migration from southern Europe catalyzed the spread of agriculture and that admixture in the wake of this expansion eventually shaped the genomic landscape of modern-day Europe."

It's cool that they can do that, but that last sentence is, well, more than a bit of a reach.
to:NB  human_genetics  historical_genetics  archaeology 
4 weeks ago by cshalizi
"Network Coevolution and Democracy: A Spatial Econometric Approach" by Aya Kachi
"Regime transitions are contagious according to the diffusion-of-democracy literature: a country's regime is affected by others' through various predefined networks (e.g. geographical proximity), as well as by the country's own political, economic and social attributes (e.g. GDP levels). My account departs from the existing diffusion theory by allowing for countries' self-selection into peer regime networks based on their democracy levels in the past. For example, a country can form stronger dependency ties with countries that demonstrated similar democracy levels in the past (homophily). In the longitudinal setting, the traditional diffusion mechanism with the presence of self-selection generates the "co-evolutionary dynamic" between country networks and democracy levels. With this recursive feedback process between tie formation and democracy levels, it becomes extremely difficult to evaluate empirically how each country's level of democracy is determined, because we need to distinguish the following three processes statistically. First, country-specific attributes determine the level of democracy as in the earliest democratization studies. Second, other states' democracy levels also predict a country's regime as demonstrated in the conventional diffusion studies. Finally with my theory of endogenous network formation, the seeming diffusion effect is partially a consequence of their self-selection into peer networks. A newer spatial econometric model, an "M-STAR + Co-Evolution" model, is one of the first that allows us to test for all of these three dynamics behind democratization. In my first-cut analysis, I find that all three processes indeed exist."

ETA: It's good to recognize the problem exists, but the model used here does not make it go away, and still fails to identify the influence effect (if one exists).
to:NB  to_read  political_science  network_data_analysis  homophily  contagion  re:critique_of_diffusion  democracy 
4 weeks ago by cshalizi
On the Relation Between Encoding and Decoding of Neuronal Spikes
"Neural coding is a field of study that concerns how sensory information is represented in the brain by networks of neurons. The link between external stimulus and neural response can be studied from two parallel points of view. The first, neural encoding, refers to the mapping from stimulus to response. It focuses primarily on understanding how neurons respond to a wide variety of stimuli and constructing models that accurately describe the stimulus-response relationship. Neural decoding refers to the reverse mapping, from response to stimulus, where the challenge is to reconstruct a stimulus from the spikes it evokes. Since neuronal response is stochastic, a one-to-one mapping of stimuli into neural responses does not exist, causing a mismatch between the two viewpoints of neural coding. Here we use these two perspectives to investigate the question of what rate coding is, in the simple setting of a single stationary stimulus parameter and a single stationary spike train represented by a renewal process. We show that when rate codes are defined in terms of encoding, that is, the stimulus parameter is mapped onto the mean firing rate, the rate decoder given by spike counts or the sample mean does not always efficiently decode the rate codes, but it can improve efficiency in reading certain rate codes when correlations within a spike train are taken into account."
to:NB  to_read  neural_coding_and_decoding  kith_and_kin  koyama.shinsuke 
4 weeks ago by cshalizi
Recent Trends in Top Income Shares in the United States: Reconciling Estimates from March CPS and IRS Tax Return Data
"Although most U.S. income inequality research is based on public use March CPS data, a new wave of research using IRS tax return data reports substantially faster inequality growth for recent years. We show that these apparently inconsistent estimates are largely reconciled when the income distribution and inequality are defined the same way. Using internal CPS data for 1967 to 2006, we show that CPS-based estimates of top income shares are similar to IRS data-based estimates reported by Piketty and Saez (2003). Our results imply that income inequality changes since 1993 are largely driven by changes in incomes of the top 1%."
to:NB  economics  inequality  class_struggles_in_america 
4 weeks ago by cshalizi
The Role of Copulas in the Housing Crisis - Review of Economics and Statistics - Abstract
"Due to its simplicity and familiarity, the Gaussian copula is popular in calculating risk in collaterized debt obligations, but it imposes asymptotic independence such that extreme events appear to be unrelated. This restriction might be innocuous in normal times, but during extreme events, such as the housing crisis, the Gaussian copula might be inappropriate. This paper explores various copula specifications and finds that the degree to which housing prices are related based on the Gaussian copula is too small compared with real housing price data."
to:NB  mortgage_crisis  financial_crisis_of_2007--  finance  copulas  bad_data_analysis  mea_copula  mea_maxima_copula 
4 weeks ago by cshalizi
[1204.5540] Learning Graph Structure in Discrete Markov Random Fields
"We present a general algorithm for learning the structure of discrete Markov random fields from i.i.d. samples. Several algorithms have been proposed for structure learning algorithms earlier and each of these address the learning problem under different assumptions. Our algorithm provides a unified view in the following sense: when our algorithm is applied to each of the special cases, it results in a the same computational complexity as earlier algorithms. More importantly, our approach also provides a new low-computational complexity algorithm for the case of Ising models where the underlying graph is the Erdos-Renyi random graph G(p,c/p)."

When would you ever want to learn an Ising model on an E-R graph?
to:NB  graphical_models  machine_learning  networks 
4 weeks ago by cshalizi
Multiple dynamic representations in the motor cortex during sensorimotor learning : Nature : Nature Publishing Group
"The mechanisms linking sensation and action during learning are poorly understood. Layer 2/3 neurons in the motor cortex might participate in sensorimotor integration and learning; they receive input from sensory cortex and excite deep layer neurons, which control movement. Here we imaged activity in the same set of layer 2/3 neurons in the motor cortex over weeks, while mice learned to detect objects with their whiskers and report detection with licking. Spatially intermingled neurons represented sensory (touch) and motor behaviours (whisker movements and licking). With learning, the population-level representation of task-related licking strengthened. In trained mice, population-level representations were redundant and stable, despite dynamism of single-neuron representations. The activity of a subpopulation of neurons was consistent with touch driving licking behaviour. Our results suggest that ensembles of motor cortex neurons couple sensory input to multiple, related motor programs during learning."
to:NB  learning_theory  neuroscience  functional_connectivity  experimental_biology 
4 weeks ago by cshalizi
[1204.5633] Noncentral Limit Theorem and the Bootstrap for Quantiles of Dependent Data
"We will show under minimal conditions on differentiability and dependence that the central limit theorem for quantiles holds and that the block bootstrap is weakly consistent. Under slightly stronger conditions, the bootstrap is strongly consistent. Without the differentiability condition, quantiles might have a non-normal asymptotic distribution and the bootstrap might fail."
to:NB  bootstrap  statistics  statistical_inference_for_stochastic_processes 
4 weeks ago by cshalizi
[1204.5721] Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems
"Multi-armed bandit problems are the most basic examples of sequential decision problems with an exploration-exploitation trade-off. This is the balance between staying with the option that gave highest payoffs in the past and exploring new options that might give higher payoffs in the future. Although the study of bandit problems dates back to the Thirties, exploration-exploitation trade-offs arise in several modern applications, such as ad placement, website optimization, and packet routing. Mathematically, a multi-armed bandit is defined by the payoff process associated with each option. In this survey, we focus on two extreme cases in which the analysis of regret is particularly simple and elegant: i.i.d. payoffs and adversarial payoffs. Besides the basic setting of finitely many actions, we also analyze some of the most important variants and extensions, such as the contextual bandit model."
to:NB  individual_sequence_prediction  online_learning  bandit_problems  re:knightian_uncertainty  low-regret_learning 
4 weeks ago by cshalizi
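As a concrete instance of the exploration-exploitation trade-off in the stochastic (i.i.d.) setting described in the bandit survey above, here is a minimal sketch of the standard UCB1 index policy on Bernoulli arms. The function name `ucb1`, the arm means, and the horizon are illustrative assumptions, not taken from the survey.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Run UCB1 on Bernoulli arms; return (pull counts, pseudo-regret).

    arm_means are hypothetical success probabilities, unknown to the policy.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k   # number of pulls per arm
    sums = [0.0] * k   # total observed reward per arm
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull every arm once to initialise the estimates
        else:
            # Empirical mean plus an exploration bonus that shrinks as the
            # arm is pulled more often.
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    # Pseudo-regret: expected payoff lost relative to always pulling the best arm.
    best = max(arm_means)
    regret = sum(c * (best - m) for c, m in zip(counts, arm_means))
    return counts, regret


if __name__ == "__main__":
    pulls, regret = ucb1([0.2, 0.8], horizon=2000, seed=1)
    print(pulls, regret)
```

The bonus term is what forces occasional exploration of arms that currently look worse; the adversarial setting mentioned in the abstract needs a randomised policy instead and is not covered by this sketch.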
[1204.5584] Physics of Large Deviation
"A large deviation function mathematically characterizes the statistical property of atypical events. Recently, in non-equilibrium statistical mechanics, large deviation functions have been used to describe universal laws such as the fluctuation theorem. Despite such significance, large deviation functions have not been easily obtained in laboratory experiments. Thus, in order to understand the physical significance of large deviation functions, it is necessary to consider their experimental measurability in greater detail. This aspect of large deviation is discussed with the presentation of a future problem."
to:NB  large_deviations  statistical_mechanics 
4 weeks ago by cshalizi
Game-powered machine learning
"Searching for relevant content in a massive amount of multimedia information is facilitated by accurately annotating each image, video, or song with a large number of relevant semantic keywords, or tags. We introduce game-powered machine learning, an integrated approach to annotating multimedia content that combines the effectiveness of human computation, through online games, with the scalability of machine learning. We investigate this framework for labeling music. First, a socially-oriented music annotation game called Herd It collects reliable music annotations based on the “wisdom of the crowds.” Second, these annotated examples are used to train a supervised machine learning system. Third, the machine learning system actively directs the annotation games to collect new data that will most benefit future model iterations. Once trained, the system can automatically annotate a corpus of music much larger than what could be labeled using human computation alone. Automatically annotated songs can be retrieved based on their semantic relevance to text-based queries (e.g., “funky jazz with saxophone,” “spooky electronica,” etc.). Based on the results presented in this paper, we find that actively coupling annotation games with machine learning provides a reliable and scalable approach to making searchable massive amounts of multimedia data."

--- This is more than a bit of a stunt, but it points in an interesting direction.
to:NB  to_read  data_mining  collective_cognition  active_learning  tagging  classifiers  re:democratic_cognition 
4 weeks ago by cshalizi
Phase Transitions in Machine Learning - Academic and Professional Books - Cambridge University Press
Should ask C.M. if this is worth bothering with.

"Phase transitions typically occur in combinatorial computational problems and have important consequences, especially with the current spread of statistical relational learning as well as sequence learning methodologies. In Phase Transitions in Machine Learning the authors begin by describing in detail this phenomenon, and the extensive experimental investigation that supports its presence. They then turn their attention to the possible implications and explore appropriate methods for tackling them. Weaving together fundamental aspects of computer science, statistical physics and machine learning, the book provides sufficient mathematics and physics background to make the subject intelligible to researchers in AI and other computer science communities. Open research issues are also discussed, suggesting promising directions for future research."
to:NB  books:noted  machine_learning  computational_complexity  phase_transitions 
5 weeks ago by cshalizi
Protein Interaction Networks - Academic and Professional Books - Cambridge University Press
"The analysis of protein-protein interactions is fundamental to the understanding of cellular organization, processes, and functions. Recent large-scale investigations of protein-protein interactions using such techniques as two-hybrid systems, mass spectrometry, and protein microarrays have enriched the available protein interaction data and facilitated the construction of integrated protein-protein interaction networks. The resulting large volume of protein-protein interaction data has posed a challenge to experimental investigation. This book provides a comprehensive understanding of the computational methods available for the analysis of protein-protein interaction networks. It offers an in-depth survey of a range of approaches, including statistical, topological, data-mining, and ontology-based methods. The author discusses the fundamental principles underlying each of these approaches and their respective benefits and drawbacks, and she offers suggestions for future research."
to:NB  books:noted  biochemical_networks 
5 weeks ago by cshalizi
[1204.3915] Theory and Inference for a Class of Observation-driven Models with Application to Time Series of Counts
"This paper studies theory and inference related to a class of time series models that incorporates nonlinear dynamics. It is assumed that the observations follow a one-parameter exponential family of distributions given an accompanying process that evolves as a function of lagged observations. We employ an iterated random function approach and a special coupling technique to show that, under suitable conditions on the parameter space, the conditional mean process is a geometric moment contracting Markov chain and that the observation process is absolutely regular with geometrically decaying coefficients. Moreover the asymptotic theory of the maximum likelihood estimates of the parameters is established under some mild assumptions. These models are applied to two examples; the first is the number of transactions per minute of Ericsson stock and the second is related to return times of extreme events of Goldman Sachs Group stock."

--- Without reading beyond the abstract, I'm guessing chains with complete connections.
to:NB  time_series  markov_models  statistics 
5 weeks ago by cshalizi
[1204.3946] The Dynamics of Influence Systems
"Influence systems form a large class of multiagent systems designed to model how influence, broadly defined, spreads across a dynamic network. We build a general analytical framework which we then use to prove that, while sometimes chaotic, influence dynamics is almost always asymptotically periodic. Besides resolving the dynamics of a popular family of multiagent systems, the other contribution of this work is to introduce a new type of renormalization-based bifurcation analysis for multiagent systems."
to:NB  influence  agent-based_models  dynamical_systems  chazelle.bernard 
5 weeks ago by cshalizi
Zipes, J.: The Irresistible Fairy Tale: The Cultural and Social History of a Genre.
"If there is one genre that has captured the imagination of people in all walks of life throughout the world, it is the fairy tale. Yet we still have great difficulty understanding how it originated, evolved, and spread--or why so many people cannot resist its appeal, no matter how it changes or what form it takes. In this book, renowned fairy-tale expert Jack Zipes presents a provocative new theory about why fairy tales were created and retold--and why they became such an indelible and infinitely adaptable part of cultures around the world.
"Drawing on cognitive science, evolutionary theory, anthropology, psychology, literary theory, and other fields, Zipes presents a nuanced argument about how fairy tales originated in ancient oral cultures, how they evolved through the rise of literary culture and print, and how, in our own time, they continue to change through their adaptation in an ever-growing variety of media. In making his case, Zipes considers a wide range of fascinating examples, including fairy tales told, collected, and written by women in the nineteenth century; Catherine Breillat's film adaptation of Perrault's "Bluebeard"; and contemporary fairy-tale drawings, paintings, sculptures, and photographs that critique canonical print versions.
"While we may never be able to fully explain fairy tales, The Irresistible Fairy Tale provides a powerful theory of how and why they evolved--and why we still use them to make meaning of our lives."
to:NB  books:noted  mythology  fairy_tales  literary_criticism 
5 weeks ago by cshalizi
Life Behind the Lobby: Indian American Motel Owners and the American Dream - Pawan Dhingra
"Indian Americans own about half of all the motels in the United States. Even more remarkable, most of these motel owners come from the same region in India and—although they are not all related—seventy percent of them share the surname of Patel. Most of these motel owners arrived in the United States with few resources and, broadly speaking, they are self-employed, self-sufficient immigrants who have become successful—they live the American dream.
"However, framing this group as embodying the American dream has profound implications. It perpetuates the idea of American exceptionalism—that this nation creates opportunities for newcomers unattainable elsewhere—and also downplays the inequalities of race, gender, culture, and globalization immigrants continue to face. Despite their dominance in the motel industry, Indian American moteliers are concentrated in lower- and mid-budget markets. Life Behind the Lobby explains Indian Americans' simultaneous accomplishments and marginalization and takes a close look at their own role in sustaining that duality."
to:NB  books:noted  ethnography  sociology  something_about_america  india  immigration 
5 weeks ago by cshalizi
Xiao, Wu: Covariance matrix estimation for stationary time series
"We obtain a sharp convergence rate for banded covariance matrix estimates of stationary processes. A precise order of magnitude is derived for spectral radius of sample covariance matrices. We also consider a thresholded covariance matrix estimator that can better characterize sparsity if the true covariance matrix is sparse. As our main tool, we implement Toeplitz [Math. Ann. 70 (1911) 351–376] idea and relate eigenvalues of covariance matrices to the spectral densities or Fourier transforms of the covariances. We develop a large deviation result for quadratic forms of stationary processes using m-dependence approximation, under the framework of causal representation and physical dependence measures."
to:NB  time_series  statistics  estimation  variance_estimation 
6 weeks ago by cshalizi
Arias-Castro, Bubeck, Lugosi: Detection of correlations
"We consider the hypothesis testing problem of deciding whether an observed high-dimensional vector has independent normal components or, alternatively, if it has a small subset of correlated components. The correlated components may have a certain combinatorial structure known to the statistician. We establish upper and lower bounds for the worst-case (minimax) risk in terms of the size of the correlated subset, the level of correlation, and the structure of the class of possibly correlated sets. We show that some simple tests have near-optimal performance in many cases, while the generalized likelihood ratio test is suboptimal in some important cases."
to:NB  statistics  factor_analysis 
6 weeks ago by cshalizi
Bai, Li: Statistical analysis of factor models of high dimension
"This paper considers the maximum likelihood estimation of factor models of high dimension, where the number of variables (N) is comparable with or even greater than the number of observations (T). An inferential theory is developed. We establish not only consistency but also the rate of convergence and the limiting distributions. Five different sets of identification conditions are considered. We show that the distributions of the MLE estimators depend on the identification restrictions. Unlike the principal components approach, the maximum likelihood estimator explicitly allows heteroskedasticities, which are jointly estimated with other parameters. Efficiency of MLE relative to the principal components method is also considered."
to:NB  to_read  factor_analysis  statistics  high-dimensional_statistics 
6 weeks ago by cshalizi
The Global Diffusion of Public Policies: Social Construction, Coercion, Competition, or Learning? - Annual Review of Sociology, 33(1):449
"Social scientists have sketched four distinct theories to explain a phenomenon that appears to have ramped up in recent years, the diffusion of policies across countries. Constructivists trace policy norms to expert epistemic communities and international organizations, who define economic progress and human rights. Coercion theorists point to powerful nation-states, and international financial institutions, that threaten sanctions or promise aid in return for fiscal conservatism, free trade, etc. Competition theorists argue that countries compete to attract investment and to sell exports by lowering the cost of doing business, reducing constraints on investment, or reducing tariff barriers in the hope of reciprocity. Learning theorists suggest that countries learn from their own experiences and, as well, from the policy experiments of their peers. We review the large body of research from sociologists and political scientists, as well as the growing body of work from economists and psychologists, pointing to the diverse mechanisms that are theorized and to promising avenues for distinguishing among causal mechanisms."
to:NB  political_science  political_economy  re:critique_of_diffusion 
6 weeks ago by cshalizi
Contagion or Confusion? Why Conflicts Cluster in Space - Buhaug - 2008 - International Studies Quarterly - Wiley Online Library
"Civil wars cluster in space as well as time. In this study, we develop and evaluate empirically alternative explanations for this observed clustering. We consider whether the spatial pattern of intrastate conflict simply stems from a similar distribution of relevant country attributes or whether conflicts indeed constitute a threat to other proximate states. Our results strongly suggest that there is a genuine neighborhood effect of armed conflict, over and beyond what individual country characteristics can account for. We then examine whether the risk of contagion depends on the degree of exposure to proximate conflicts. Contrary to common expectations, this appears not to be the case. Rather, we find that conflict is more likely when there are ethnic ties to groups in a neighboring conflict and that contagion is primarily a feature of separatist conflicts. This suggests that transnational ethnic linkages constitute a central mechanism of conflict contagion."
to:NB  contagion  political_science  war  re:critique_of_diffusion 
6 weeks ago by cshalizi
[1204.2612] Computing bounds for entropy of stationary Z^d Markov random fields
"For any stationary $\mathbb{Z}^d$-Gibbs measure that satisfies strong spatial mixing, we obtain sequences of upper and lower approximations that converge to its entropy. In the case, $d=2$, these approximations are efficient in the sense that the approximations are accurate to within $\epsilon$ and can be computed in time polynomial in $1/\epsilon$."
to:NB  information_theory  markov_models  stochastic_processes  entropy 
6 weeks ago by cshalizi
Space–time modelling of coupled spatiotemporal environmental variables - Ippoliti - 2012 - Journal of the Royal Statistical Society: Series C (Applied Statistics) - Wiley Online Library
"dynamic factor model for spatiotemporal coupled environmental variables. The model is proposed in a state space formulation which, through Kalman recursions, allows a unified approach to prediction and estimation. Full probabilistic inference for the model parameters is facilitated by adapting standard Markov chain Monte Carlo algorithms for dynamic linear models to our model formulation. The predictive ability of the model is discussed for two different data sets with variables measured at two different scales. Some possibilities for further research are also outlined."
to:NB  spatial_statistics  state-space_models  statistics 
6 weeks ago by cshalizi
Local polynomial regression for symmetric positive definite matrices - Yuan - 2012 - Journal of the Royal Statistical Society: Series B (Statistical Methodology) - Wiley Online Library
"Local polynomial regression has received extensive attention for the non-parametric estimation of regression functions when both the response and the covariate are in Euclidean space. However, little has been done when the response is in a Riemannian manifold. We develop an intrinsic local polynomial regression estimate for the analysis of symmetric positive definite matrices as responses that lie in a Riemannian manifold with covariate in Euclidean space. The primary motivation and application of the methodology proposed is in computer vision and medical imaging. We examine two commonly used metrics, including the trace metric and the log-Euclidean metric on the space of symmetric positive definite matrices. For each metric, we develop a cross-validation bandwidth selection method, derive the asymptotic bias, variance and normality of the intrinsic local constant and local linear estimators, and compare their asymptotic mean-square errors. Simulation studies are further used to compare the estimators under the two metrics and to examine their finite sample performance. We use our method to detect diagnostic differences between diffusion tensors along fibre tracts in a study of human immunodeficiency virus."
to:NB  variance_estimation  statistics  regression  nonparametrics  kernel_estimators 
6 weeks ago by cshalizi
Phys. Rev. E 85, 031129 (2012): Entropy production and Kullback-Leibler divergence between stationary trajectories of discrete systems
"The irreversibility of a stationary time series can be quantified using the Kullback-Leibler divergence (KLD) between the probability of observing the series and the probability of observing the time-reversed series. Moreover, this KLD is a tool to estimate entropy production from stationary trajectories since it gives a lower bound to the entropy production of the physical process generating the series. In this paper we introduce analytical and numerical techniques to estimate the KLD between time series generated by several stochastic dynamics with a finite number of states. We examine the accuracy of our estimators for a specific example, a discrete flashing ratchet, and investigate how close the KLD is to the entropy production depending on the number of degrees of freedom of the system that are sampled in the trajectories."
to:NB  stochastic_processes  information_theory 
6 weeks ago by cshalizi
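To make the idea in the entropy-production entry above concrete, here is a minimal sketch of a plug-in estimate of the KLD between a discrete series and its time reverse, using only two-symbol blocks. This is an illustrative simplification, not the paper's estimators: it matches the KLD rate only under the assumption that the series is a stationary first-order Markov chain. The names `pair_kld` and `simulate_cycle` and the three-state ring example are hypothetical.

```python
import math
import random
from collections import Counter


def pair_kld(series):
    """Plug-in KLD (in nats) between forward and time-reversed pair statistics."""
    pairs = Counter(zip(series, series[1:]))
    n = sum(pairs.values())
    kld = 0.0
    for (a, b), count in pairs.items():
        reverse_count = pairs.get((b, a), 0)
        if reverse_count == 0:
            # A transition never seen in reverse makes the plug-in estimate infinite.
            return math.inf
        kld += (count / n) * math.log(count / reverse_count)
    return kld


def simulate_cycle(p_forward, steps, seed=0):
    """Random walk on a 3-state ring: step +1 with prob p_forward, else -1."""
    rng = random.Random(seed)
    state, path = 0, [0]
    for _ in range(steps):
        state = (state + (1 if rng.random() < p_forward else -1)) % 3
        path.append(state)
    return path


if __name__ == "__main__":
    print(pair_kld(simulate_cycle(0.5, 20000)))  # unbiased walk: reversible
    print(pair_kld(simulate_cycle(0.9, 20000)))  # biased walk: irreversible
```

The unbiased walk should give an estimate near zero and the biased one a clearly positive value. Longer blocks would be needed for series with memory beyond one step.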
[1204.0608] Mixing times in evolutionary game dynamics
"Without mutation and migration, evolutionary dynamics ultimately leads to the extinction of all but one species. Such fixation processes are well understood and can be characterized analytically with methods from statistical physics. However, many biological arguments focus on stationary distributions in a mutation-selection equilibrium. Here, we address the equilibration time required to reach stationarity in the presence of mutation; this is known as the mixing time in the theory of Markov processes. We show that mixing times in evolutionary games have the opposite behaviour from fixation times when the intensity of selection increases: In coordination games with bistabilities, the fixation time decreases, but the mixing time increases. In coexistence games with metastable states, the fixation time increases, but the mixing time decreases. Our results are based on simulations and the WKB approximation of the master equation."
to:NB  evolutionary_game_theory  markov_models  mixing  re:do-institutions-evolve  stochastic_processes 
6 weeks ago by cshalizi
Quantifying the weight of evidence from a forensic fingerprint comparison: a new paradigm - Neumann - 2012 - Journal of the Royal Statistical Society: Series A (Statistics in Society) - Wiley Online Library
"The fingerprint has, with considerable justification, come to be regarded as the acme of forensic identification. Over the last century, millions of cases have been resolved world wide because of marks left at crime scenes. The comparison methodology has not evolved greatly during its history and it is universal practice to present fingerprint evidence to a court as a categoric opinion of identification or exclusion, or to classify the evidence as inconclusive and not to report it. There has been a growing movement to supplement the fingerprint examination process by one that has a statistical model, supported by appropriate databases for calculating numerical measures of weight of evidence. The movement calls for the establishment of a logical framework for informing conclusions, based on explicit assumptions and data and open to revision and improvement. The aim is to enable the numerical evaluation of evidence that would currently be reported as a categorical identification and also of evidence that would currently be classified as inconclusive. The paper presents the results of a project carried out by the Forensic Science Service that aims to attain this goal. After a historical review, we describe a formal model for assigning numerical values to configurations of minutiae in fingerprints. We describe how the parameters of the model have been optimized to take account of interoperator variability and distortion of the finger pad, and we present the results of a substantial validation experiment that was based on searches that have been carried out on the US national fingerprint database of approximately 600 million fingerprints."
to:NB  fingerprints  statistics 
6 weeks ago by cshalizi
plagues_and_peoples  poetry  poincare_recurrence  point_processes  polanyi.michael  policy_analysis_as_a_social_process  politi.antonio  political_economy  political_networks  political_philosophy  political_science  politkovskaya.anna  popular_science  porter.mason  post-soviet_politics  practices_relating_to_the_transmission_of_genetic_information  pragmatics  pre-validation  prediction  prediction_trees  predictive_state_representations  primates  primo.david  principal_components  prisoners_dilemma  privacy  privatization  probability  programming  progressive_forces  prosthetics  protest  psychiatry  psychoceramica  psychology  psychometrics  publication_bias  public_policy  quantum_mechanics  quasi-species_models  R  race_in_America  racism  racist_idiocy  rademacher_complexity  radev.dragomir  raginsky.maxim  randomness  random_boolean_networks  random_fields  random_graphs  random_matrices  random_matrix_theory  random_time_changes  random_walks  rare_event_simulation  rat_whiskers  re:almost  re:almost_none  re:anti-tsallis  re:AoS_project  re:bayes_as_evol  re:critique_of_diffusion  re:democratic_cognition  re:do-institutions-evolve  re:friday_cat-blogging  re:functional_communities  re:growing_ensemble_project  re:g_paper  re:heart-waves-project  re:homophily_and_confounding  re:knightian_uncertainty  re:LICORS  re:LoB_project  re:naive-semi-supervised  re:network_differences  re:phil-of-bayes_paper  re:smoothing_adjacency_matrices  re:social-networks-as-sensor-networks  re:stacs  re:what_is_a_macrostate  re:what_is_the_right_null_model_for_linear_regression  re:XV_for_mixing  re:your_favorite_dsge_sucks  re:your_favorite_ergm_sucks  reaction-diffusion  reciprocity  recurrence_times  reductionism  regression  regulation  reichardt.joerg  reinforcement_learning  relativity  religion  renaissance_history  renormalization  replication  replicator_dynamics  representation  review_papers  rhetoric  richards.joey  rinaldo.alessandro  risk_assessment  
risk_perception  risk_vs_uncertainty  rissanen.jorma  robin.corey  robins.james  robots_and_robotics  robustness  robust_statistics  rockmore.dan  roeder.kathryn  roman_empire  ryabko.b._ya.  ryabko.daniil  sarwate.anand  satisficing  schafer.chad  schelling_model  science_as_a_social_process  science_fiction  science_journalism  science_studies  scientific_revolution  scoring_rules  self-fulfilling_prophecy  self-organization  self-organized_criticality  self-replication  semantics_from_syntax  semi-supervised_learning  sensor_networks  serotonin  set_theory  sex_differences  shiller.robert  shot_after_a_fair_trial  shpister.ilya  signal_processing  signal_transduction  simulation  simulation-based_inference  singh.aarti  singular_value_decomposition  smith.eric  smoking  smola.alex  smoothing  smyth.padhraic  socialism  socialist_calculation_debate  social_cognition  social_construction  social_contagion  social_engineering  social_influence  social_life_of_the_mind  social_media  social_mobility  social_movements  social_networks  social_neuroscience  social_norms  social_psychology  social_science_methodology  social_theory  sociology  sociology_of_science  sole.ricard  something_about_america  song_dynasty  south-east_asia  space_exploration  space_station  sparsity  spatial_statistics  spectral_clustering  spectral_methods  sperber.dan  splines  stability_of_learning  standardized_testing  state-building  state-space_models  state-space_reconstruction  state_estimation  statistical_inference_for_stochastic_processes  statistical_mechanics  statistics  stepping_stone_model  stigler.stephen  stiglitz.joseph  stochastic_differential_equations  stochastic_models  stochastic_processes  stotz.karola  strategic_position_in_networks  strength_of_weak_ties  stress  structured_data  sufficiency  suhay.liz  superheroes  supervenience  surprisingly_not_via:warrenellis  sutherland.william_j  symbolic_dynamics  symbols_from_dynamics  synchronization  synchronizing_words  
systems_identification  tagging  tang_dynasty  tao.terence  taxes  technological_change  technological_unemployment  teeth  tenenbaum.joshua  terrorism  text_mining  theoretical_biology  theoretical_computer_science  theory_of_mind  thermodynamics  thermodynamic_formalism  the_american_dilemma  the_continuing_crises  the_great_transformation  the_nightmare_from_which_we_are_trying_to_awake  the_organism_as_an_adaptive_control_system  the_present_before_it_was_widely_distributed  the_running-dogs_of_reaction  tibshirani.robert  time_rescaling  time_series  titan  to:blog  to:NB  topic_models  topological_defects  total_factor_productivity  touchette.hugo  to_be_shot_after_a_fair_trial  to_read  to_teach  to_teach:complexity-and-inference  to_teach:data-mining  to_teach:statcomp  to_teach:undergrad-ADA  track_down_references  trade_in_antiquity  tradition  transaction_networks  translation  transmission_of_inequality  travelers'_tales  triadic_closure  tsallis_statistics  turbulence  turing_mechanism  turkey  turner.adair  tutorials  twitter  two-sample_tests  ulam.stanislaw  unions  universal_prediction  university_industrial_complex  us_politics  utter_stupidity  vaccination  vagueness  value_of_information  van_alystne.marshall  van_der_vaart.aad  van_de_geer.sara  van_handel.ramon  van_roy.benjamin  variable_selection  variance_estimation  variational_inference  vc-dimension  ventura.valerie  via:?  
via:aaronsw  via:afinetheorem  via:aks  via:aleks  via:alessandro  via:anand  via:anoopsarkar  via:arthegall  via:asarwate  via:a_fine_theorem  via:bruces  via:chl  via:coleman  via:ded-maxim  via:dsquared  via:flaxman  via:flint_riemen  via:gmg  via:henry_farrell  via:ICCI  via:joncgoodwin  via:justin  via:klk  via:krugman  via:mathbabe  via:mejn  via:merriam  via:mind-hacks  via:moritz-heene  via:mraginsky  via:neuroanthropology  via:nick-watkins  via:orzelc  via:rortybomb  via:ryan_t  via:scotte  via:shivak  via:suresh  via:tomslee  via:tsuomela  via:vaguery  via:wiggins  videogames  vietnam_war  violence  vishwanathan.s.v.n.  vision  visual_display_of_quantitative_information  von_hippel.eric  von_neumann.john  voter_model  voting  vul.edward  wahba.grace  wainright.martin  waiting_times  wallace.david  war  warmuth.manfred  wasserman.larry  watts.duncan  wavelets  web  weiss.benjamin  welling.max  whats_gone_wrong_with_america  why_oh_why_cant_we_have_a_better_academic_publishing_system  why_oh_why_cant_we_have_a_better_press_corps  wiggins.chris  willett.rebecca  world_history  WWI  young.j.z.  yu.bin  zhang.tong 
