cshalizi + heavy_tails   69

On robust tail index estimation for linear long-memory processes - Beran - 2012 - Journal of Time Series Analysis - Wiley Online Library
"We consider robust estimation of the tail index α for linear long-memory processes with i.i.d. innovations εj following a symmetric α-stable law (1 < α < 2) and coefficients aj ∼ c·j−β. Estimates based on the left and right tail respectively are obtained together with a combined statistic with improved efficiency, and a test statistic comparing both tails. Asymptotic results are derived. Simulations illustrate the finite sample performance."
to:NB  heavy_tails  time_series  statistics  beran.jan 
8 weeks ago by cshalizi
[1203.0738] Avalanche analysis from multi-electrode ensemble recordings in cat, monkey and human cerebral cortex during wakefulness and sleep
"Self-organized critical states are found in many natural systems, from earthquakes to forest fires, they have also been found in neural systems, particularly, in neuronal cultures. However, the presence of critical states in the awake brain remains controversial. Here, we compared avalanche analyses performed on different in vivo preparations during wakefulness, slow-wave sleep and REM sleep, in cat parietal cortex (8 electrodes), monkey motor cortex (64/96 electrodes) and human temporal cortex (96 electrodes) in epileptic patients. In neuronal avalanches defined from units (up to 152 single units), the size of avalanches never clearly scaled as power-law, but rather scaled exponentially or displayed intermediate scaling. We also analyzed the dynamics of local field potentials (LFPs) and in particular LFP negative peaks (nLFPs) among the different electrodes (up to 96 sites in temporal cortex or up to 128 sites in adjacent motor and pre-motor cortices). In this case, the avalanches defined from nLFPs displayed power-law scaling in double logarithmic representations, as reported previously in monkey. However, avalanche defined as positive LFP (pLFP) peaks, which are not related to neuronal firing, also displayed apparent power-law scaling. Closer examination of this scaling using the more reliable cumulative distribution function (CDF) and other rigorous statistical measures, did not confirm power-law scaling. The same pattern was seen for cats, monkey and human, as well as for different brain states of wakefulness and sleep. We also tested other alternative distributions. While simple exponentials yielded very good fits of the avalanche dynamics, the "sum of exponentials" provided the best fit to the data. Collectively, these results show no clear evidence for power-law scaling or self-organized critical states in the awake and sleeping brain of mammals, from cat to man."

Impressions from a quick scan: yes, those are not power laws (way too curved), but no, you cannot use R^2 like that --- and in fact we explained why, in that paper you cite. Oy.
to:NB  self-organized_criticality  neuroscience  to_read  heavy_tails 
11 weeks ago by cshalizi
The Power (Law) of Twitter - NYTimes.com
And here I was worried from the headline that I might have to call out Uncle Paul.
twitter  social_media  heavy_tails  krugman.paul 
february 2012 by cshalizi
Heavy tail phenomenon and convergence to stable laws for iterated Lipschitz maps
Too many math symbols to copy the abstract. Shorter: iterating randomly chosen Lipschitz maps can lead to time-averages converging to a heavy-tailed distribution.
to:NB  to_read  heavy_tails  stochastic_processes  dynamical_systems  to_teach:complexity-and-inference 
november 2011 by cshalizi
"Dynamic threshold modeling of budget changes"
"A family of models was given to explain how the public budgeting process, as a multi-stage institutional decision making mechanism transforms the stimuli characterized by Gaussian distribution to skew, power law distributions. While the annual change is generally incremental, deviations from this incremental changes are more frequent, than the Gaussian distribution suggests. A set of threshold models, reflecting error-accumulation and friction, was suggested. The three-threshold model seems to be good to describe appropriately the basic statistical features of the data."
have_read  heavy_tails  political_science  via:blyth 
november 2011 by cshalizi
Corrections to the Central Limit Theorem for Heavy-Tailed Probability Densities - Journal of Theoretical Probability, Volume 24, Number 4
"Classical Edgeworth expansions provide asymptotic correction terms to the Central Limit Theorem (CLT) up to an order that depends on the number of moments available. In this paper, we provide subsequent correction terms beyond those given by a standard Edgeworth expansion in the general case of regularly varying distributions with diverging moments (beyond the second). The subsequent terms can be expressed in a simple closed form in terms of certain special functions (Dawson’s integral and parabolic cylinder functions), and there are qualitative differences depending on whether the number of moments available is even, odd, or not an integer, and whether the distributions are symmetric or not. If the increments have an even number of moments, then additional logarithmic corrections must also be incorporated in the expansion parameter. An interesting feature of our correction terms for the CLT is that they become dominant outside the central region and blend naturally with known large-deviation asymptotics when these are applied formally to the spatial scales of the CLT."
to:NB  re:almost_none  heavy_tails  central_limit_theorem  large_deviations 
november 2011 by cshalizi
Goerg : Lambert W random variables—a new family of generalized skewed distributions with applications to risk estimation
"Originating from a system theory and an input/output point of view, I introduce a new class of generalized distributions. A parametric nonlinear transformation converts a random variable X into a so-called Lambert W random variable Y, which allows a very flexible approach to model skewed data. Its shape depends on the shape of X and a skewness parameter γ. In particular, for symmetric X and nonzero γ the output Y is skewed. Its distribution and density function are particular variants of their input counterparts. Maximum likelihood and method of moments estimators are presented, and simulations show that in the symmetric case additional estimation of γ does not affect the quality of other parameter estimates. Applications in finance and biomedicine show the relevance of this class of distributions, which is particularly useful for slightly skewed data. A practical by-result of the Lambert W framework: data can be “unskewed.”

The R package LambertW developed by the author is publicly available (CRAN)."

I'm very proud.
to:NB  kith_and_kin  goerg.georg_m.  statistics  estimation  distributions  heavy_tails 
october 2011 by cshalizi
Maximum Kernel Likelihood Estimation - Journal of Computational and Graphical Statistics - 17(4):976
"We introduce an estimator for the population mean based on maximizing likelihoods formed by parameterizing a kernel density estimate. Due to these origins, we have dubbed the estimator the maximum kernel likelihood estimate (MKLE). A speedy computational method to compute the MKLE based on binning is implemented in a simulation study which shows that the MKLE at an optimal bandwidth is decidedly superior in terms of efficiency to the sample mean and other measures of location for heavy tailed symmetric distributions. An empirical rule and a computational method to estimate this optimal bandwidth are developed and used to construct bootstrap confidence intervals for the population mean. We show that the intervals have approximately nominal coverage and have significantly smaller average width than the standard t and z intervals. Finally, we develop some mathematical properties for a very close approximation to the MKLE called the kernel mean. In particular, we demonstrate that the kernel mean is indeed unbiased for the population mean for symmetric distributions."
statistics  estimation  kernel_estimators  to:NB  heavy_tails 
october 2011 by cshalizi
Temporary Employment Agencies Make the World Smaller
"This paper investigates how employment intermediaries affected the inter-firm network of worker mobility in an region of Italy in response of the reform that first allowed for temporary employment agencies in 1997. We map worker reallocations from a matched employer-employee dataset onto a directed graph, where vertices indicate firms, and links denote transfers of workers between firms. Using network-based methodologies we find that temporary employment agencies significantly increase network integration and practicability, while fastly increasing control over hiring channels. The policy implications of the results are discussed, highlighting the potential of network analysis as monitoring tool for regional and local labour markets."
networks  labor  economics  heavy_tails  to:NB 
october 2011 by cshalizi
[1108.0833] Temporal statistical analysis on human article creation patterns
Sadly, in this case fitting crappy power laws to the works of Gene Stanley and Laszlo Barabasi is not an _intentional_ joke.
bad_data_analysis  heavy_tails  barabasi.albert-laszlo  stanley.h._eugene  newman.mark  su.shi  have_read  blogged 
august 2011 by cshalizi
Network structure of production
"Complex social networks have received increasing attention from researchers. Recent work has focused on mechanisms that produce scale-free networks. We theoretically and empirically characterize the buyer–supplier network of the US economy and find that purely scale-free models have trouble matching key attributes of the network. We construct an alternative model that incorporates realistic features of firms’ buyer–supplier relationships and estimate the model’s parameters using microdata on firms’ self-reported customers. This alternative framework is better able to match the attributes of the actual economic network and aids in further understanding several important economic phenomena."
to_read  networks  economics  to_teach:complexity-and-inference  heavy_tails 
march 2011 by cshalizi
Phys. Rev. E 83, 031123 (2011): Weibull-type limiting distribution for replicative systems
"The Weibull function is widely used to describe skew distributions observed in nature. However, the origin of this ubiquity is not always obvious to explain. In the present paper, we consider the well-known Galton-Watson branching process describing simple replicative systems. The shape of the resulting distribution, about which little has been known, is found essentially indistinguishable from the Weibull form in a wide range of the branching parameter; this can be seen from the exact series expansion for the cumulative distribution, which takes a universal form. We also find that the branching process can be mapped into a process of aggregation of clusters. In the branching and aggregation process, the number of events considered for branching and aggregation grows cumulatively in time, whereas, for the binomial distribution, an independent event occurs at each time with a given success probability."
branching_processes  heavy_tails  in_NB  re:aggregating_random_graphs 
march 2011 by cshalizi
Mikosch , Resnick , Rootzén , Stegeman : Is Network Traffic Approximated by Stable Lévy Motion or Fractional Brownian Motion?
"Cumulative broadband network traffic is often thought to be well modeled by fractional Brownian motion (FBM). However, some traffic measurements do not show an agreement with the Gaussian marginal distribution assumption. We show that if connection rates are modest relative to heavy tailed connection length distribution tails, then stable Lévy motion is a sensible approximation to cumulative traffic over a time period. If connection rates are large relative to heavy tailed connection length distribution tails, then FBM is the appropriate approximation. The results are framed as limit theorems for a sequence of cumulative input processes whose connection rates are varying in such a way as to remove or induce long range dependence."
heavy_tails  stochastic_processes  convergence_of_stochastic_processes  re:almost_none  long-range_dependence 
january 2011 by cshalizi
The Power Law Shop
"I went to a physics conference, and all I got was a lousy power law"
funny:geeky  funny:malicious  heavy_tails  statistics  bad_data_analysis  porter.mason  via:aaron_clauset 
september 2010 by cshalizi
Chaos, Complexity, and Inference, Lecture 24: Contagion on Networks
I'll have to revise this (in light of arxiv:1004.4704, no less!), but it was a very fun lecture to write and give, and covers the essential points. (& of course blew most of the kids' minds.)
epidemiology  epidemiology_of_ideas  epidemic_models  plagues_and_peoples  bubonic_plague  percolation  mongol_empire  world_history  medieval_eurasian_history  heavy_tails  self-promotion  networks  contagion  influence  branching_processes 
june 2010 by cshalizi
[1004.3138] Statistical Analysis of Global Connectivity and Activity Distributions in Cellular Networks
"we analyze a comprehensive data set of protein-protein and transcriptional regulatory interaction networks in yeast, an E. coli metabolic network, and gene activity profiles for different metabolic states in both organisms. We show that in all cases the networks have a heavy-tailed distribution, but most of them present significant differences from a power-law model according to a stringent statistical test. Those few data sets that have a statistically significant fit with a power-law model follow other distributions equally well. Thus, while our analysis supports that both global connectivity interaction networks and activity distributions are heavy-tailed, they are not generally described by any specific distribution model, leaving space for further inferences on generative models."
have_read  biochemical_networks  heavy_tails  blogged 
april 2010 by cshalizi
[1003.2159] Central Limit Theorem and Large Deviations for truncated heavy-tailed random vectors
"the extent to which truncated heavy tailed random vectors retain the characteristic features of heavy tailed random vectors, is answered from the point of views of the central limit theorem and the large deviations behavior. The analysis of the central limit behavior of the partial sums of observations coming from a heavy-tailed model is done for random vectors taking values in a separable Banach space. For the large deviations analysis, the random vectors are assumed to be R^d-valued. It turns out that there are two regimes depending on the growth rate of the truncating threshold, so that in one regime, much of the heavy tailedness is retained, while in the other regime, the same is lost."
heavy_tails  large_deviations  probability 
march 2010 by cshalizi
[cond-mat/0009219] Renormalization Group and Probability Theory
Understanding phase transitions probabilistically, as places where the failure of mixing makes the ordinary central limit theorem break down, and non-Gaussian, heavy-tailed distributions appear for macroscopic averages. (I think I bookmarked this in 2000 and then forgot about it... and making me find it again is the only good thing about refereeing this **** paper, grumble.)
probability  heavy_tails  phase_transitions  renormalization  limit_theorems  random_fields  statistical_mechanics  to_teach:complexity-and-inference  have_read 
august 2009 by cshalizi
Universal Generation of Statistical Self-Similarity: A Randomized Central Limit Theorem
Sounds suspiciously like they're rediscovering the connection between random walks and stable distributions.
heavy_tails  central_limit_theorem  to_be_shot_after_a_fair_trial 
july 2009 by cshalizi
Superstars without Talent? The Yule Distribution Controversy
"Chung and Cox (1994) provided an intuitively appealing stochastic model indicating that superstars may exist regardless of talent, giving rise to the Yule distribution. We adopt a different empirical approach and test its goodness of fit using a parametric bootstrap and several powerful test statistics. Just like the discrete Pareto distribution, it is overwhelmingly rejected: it is a fairly accurate approximation of the lower quantiles of the superstar distribution but overestimates the snowball effect that makes consumers purchase records of the most successful artists. In other words, the Yule distribution captures stardom, but not superstardom. A generalization of the Yule distribution provides an excellent fit in two of the three data sets." --- We only seem to subscribe with a one-issue delay (?); preprint at http://swopec.hhs.se/hastef/papers/hastef0658.pdf
heavy_tails  inequality  economics_of_superstars  hypothesis_testing  economics  statistics  evisceration  have_read 
july 2009 by cshalizi
Multiplicative Noise and Second Order Phase Transitions
"The scale-free distribution of cluster sizes in continuous phase transitions is linked to the law of proportional effect. A numerical study of a two-dimensional Ising model suggests that a cluster size undergoes a multiplicative birth-death process. At the transition the ratio between birth and death rates approaches unity for large clusters, and the resulting steady state shows a power-law behavior. The percolation dynamic, on the other hand, yields a geometric phase transition without ergodicity breaking, where large-scale merging and splitting of clusters dominate the distribution. Instead of short-range birth-death jumps, the percolation transition is characterized by Lévi [sic] flights along the cluster-size axis."
phase_transitions  statistical_mechanics  stochastic_processes  heavy_tails  to_teach:complexity-and-inference  re:almost_none 
july 2009 by cshalizi
Comment on ``Coexistence of Self-Organized Criticality and Intermittent Turbulence in the Solar Corona''
Shorter Watkins-Chapman-Rosenberg: It is vain to posit two mechanisms to explain two effects, when one of them will produce both effects. --- This seems like a sound instance of Occam's Razor (as they say themselves), but it is not clear to me how to formalize this in either the compact-description way or in Kevin Kelly's.
self-organized_criticality  turbulence  plasmas  heavy_tails  kith_and_kin  occams_razor 
july 2009 by cshalizi
[0906.3202] Distance Is Not Dead: Social Interaction and Geographical Distance in the Internet Era
Well, their power law estimation is bad, of course, but more to the point I don't think they're really dealing with an interesting version of the thesis they set out to undermine. (At the very least: even if geography was irrelevant for Internet users, the latter are not uniformly distributed geographically.) The pictures of the diffusion of baby names are cool, though.
geography  the_internet  diffusion_of_innovations  epidemiology_of_representations  social_networks  heavy_tails  shot_after_a_fair_trial  re:critique_of_diffusion  re:social-networks-as-sensor-networks 
june 2009 by cshalizi
[0903.2533] An evolutionary model of long tailed distributions in the social sciences
This is the Yule-Simon model with a limited memory effect. The rank-size plots (i.e., empirical CDFs) they show make me pretty sure they're not producing power laws, though they may be power laws with exponential truncation; since their stats are bad, it's hard to say. Should make good take-home-final fodder, however.
heavy_tails  shot_after_a_fair_trial  to_teach:complexity-and-inference  statistics  to:blog  have_read 
march 2009 by cshalizi
"Tail events": phrase considered harmful
"As I regularly find myself having to remind cadet risk managers with newly-minted PhDs in financial econometrics, the Great Depression did actually happen; it wasn't just a particularly innaccurate observation of the underlying 4% rate of return on equities."
dsquared  finance  heavy_tails  economics 
february 2009 by cshalizi
Structure+Strangeness: Power laws in the mist
Hallucinating power-laws in the interactome. Complete with a sound re-analysis of the data!
heavy_tails  bioinformatics  molecular_biology  interactome  clauset.aaron  barabasi.albert-laszlo  bad_data_analysis 
october 2008 by cshalizi
Steve Klepper
Vigorously critiqued Lamberson during his talk
economics  industrial_organization  heavy_tails  to:NB 
february 2008 by cshalizi
