cshalizi + visual_display_of_quantitative_information   60

PM's Question Time: The price elasticity of labor-saving devices
"Fourth, the presentist bias in this chart is extreme in two ways. First, we forget things that we don't count as "technology" anymore (e.g., toilets, coal furnaces, sewing machines), and so they are left off. Second, we don't know what innovations are at low levels of adoption right now--imagine someone in 1960 trying to predict the adoption arc for personal computers!--and so our current rates of adoption are vastly overestimated compared to what the same chart will look like in 50 years."
to:blog  visual_display_of_quantitative_information  technological_change  the_present_before_it_was_widely_distributed 
7 weeks ago by cshalizi
Taylor & Francis Online :: Graphical Diagnostics for Markov Models for Categorical Data - Journal of Computational and Graphical Statistics - Volume 20, Issue 2
"Markov models are widely used as a method for describing categorical data that exhibit stationary and nonstationary autocorrelation. However, diagnostic methods are a largely overlooked topic for Markov models. We introduce two types of residuals for this purpose: one for assessing the length of runs between state changes, and the other for assessing the frequency with which the process moves from any given state to the other states. Methods for calculating the sampling distribution of both types of residuals are presented, enabling objective interpretation through graphical summaries. The graphical summaries are formed using a modification of the probability integral transformation that is applicable for discrete data. Residuals from simulated datasets are presented to demonstrate when the model is, and is not, adequate for the data. The two types of residuals are used to highlight inadequacies of a model posed for real data on seabed fauna from the marine environment."
to:NB  visual_display_of_quantitative_information  statistics  markov_models  to_teach:undergrad-ADA 
8 weeks ago by cshalizi
Taylor & Francis Online :: Dissimilarity Plots: A Visual Exploration Tool for Partitional Clustering - Journal of Computational and Graphical Statistics - Volume 20, Issue 2
"For hierarchical clustering, dendrograms are a convenient and powerful visualization technique. Although many visualization methods have been suggested for partitional clustering, their usefulness deteriorates quickly with increasing dimensionality of the data and/or they fail to represent structure between and within clusters simultaneously. In this article we extend (dissimilarity) matrix shading with several reordering steps based on seriation techniques. Both ideas, matrix shading and reordering, have been well known for a long time. However, only recent algorithmic improvements allow us to solve or approximately solve the seriation problem efficiently for larger problems. Furthermore, seriation techniques are used in a novel stepwise process (within each cluster and between clusters) which leads to a visualization technique that is able to present the structure between clusters and the micro-structure within clusters in one concise plot. This not only allows us to judge cluster quality but also makes misspecification of the number of clusters apparent. We give a detailed discussion of the construction of dissimilarity plots and demonstrate their usefulness with several examples. Experiments show that dissimilarity plots scale very well with increasing data dimensionality."
to:NB  visual_display_of_quantitative_information  clustering  data_mining  to_teach:data-mining 
8 weeks ago by cshalizi
Taylor & Francis Online :: Functional Boxplots - Journal of Computational and Graphical Statistics - Volume 20, Issue 2
"This article proposes an informative exploratory tool, the functional boxplot, for visualizing functional data, as well as its generalization, the enhanced functional boxplot. Based on the center outward ordering induced by band depth for functional data, the descriptive statistics of a functional boxplot are: the envelope of the 50% central region, the median curve, and the maximum non-outlying envelope. In addition, outliers can be detected in a functional boxplot by the 1.5 times the 50% central region empirical rule, analogous to the rule for classical boxplots. The construction of a functional boxplot is illustrated on a series of sea surface temperatures related to the El Niño phenomenon and its outlier detection performance is explored by simulations. As applications, the functional boxplot and enhanced functional boxplot are demonstrated on children growth data and spatio-temporal U.S. precipitation data for nine climatic regions, respectively. This article has supplementary material online."
to:NB  visual_display_of_quantitative_information  statistics  functional_data_analysis 
8 weeks ago by cshalizi
Four Ways to Slice Obama’s 2013 Budget Proposal - Interactive Feature - NYTimes.com
Not sure how useful this is as an actual visualization, but very nice as eye candy. (And, if you look at the department totals, as an illustration of "an insurance company with an army".)
visual_display_of_quantitative_information  us_politics  economic_policy  via:flowing_data  to:blog 
february 2012 by cshalizi
A General Framework for Dimensionality-Reducing Data Visualization Mapping
"In recent years, a wealth of dimension-reduction techniques for data visualization and preprocessing has been established. Nonparametric methods require additional effort for out-of-sample extensions, because they provide only a mapping of a given finite set of points. In this letter, we propose a general view on nonparametric dimension reduction based on the concept of cost functions and properties of the data. Based on this general principle, we transfer nonparametric dimension reduction to explicit mappings of the data manifold such that direct out-of-sample extensions become possible. Furthermore, this concept offers the possibility of investigating the generalization ability of data visualization to new data points. We demonstrate the approach based on a simple global linear mapping, as well as prototype-based local linear mappings. In addition, we can bias the functional form according to given auxiliary information. This leads to explicit supervised visualization mappings with discriminative properties comparable to state-of-the-art approaches."
in_NB  dimension_reduction  visual_display_of_quantitative_information  data_analysis  data_mining  manifold_learning  to_teach:data-mining 
february 2012 by cshalizi
[1111.1855] Fréchet means of curves for signal averaging and application to ECG data analysis
"Signal averaging is the process that consists in computing a mean shape from a set of noisy signals. In the presence of geometric variability in time in the data, the usual Euclidean mean of the raw data yields a mean pattern that does not reflect the typical shape of the observed signals. In this setting, it is necessary to use alignment techniques for a precise synchronization of the signals, and then to average the aligned data to obtain a consistent mean shape. In this paper, we study the numerical performances of Fréchet means of curves which are extensions of the usual Euclidean mean to spaces endowed with non-Euclidean metrics. This yields a new algorithm for signal averaging without a reference template. We apply this approach to the estimation of a mean heart cycle from ECG records."
to:NB  statistics  data_analysis  visual_display_of_quantitative_information 
november 2011 by cshalizi
IDV User Experience: Shipping Mix
I'm not 100% sure what we're supposed to learn from this, other than that shipping concentrates at ports and straits, but it's cute.
visual_display_of_quantitative_information  maps  trade  logistics  via:schweitzer 
october 2011 by cshalizi
[1110.3917] How to Evaluate Dimensionality Reduction? - Improving the Co-ranking Matrix
"The growing number of dimensionality reduction methods available for data visualization has recently inspired the development of quality assessment measures, in order to evaluate the resulting low-dimensional representation independently from a method's inherent criteria. Several (existing) quality measures can be (re)formulated based on the so-called co-ranking matrix, which subsumes all rank errors (i.e. differences between the ranking of distances from every point to all others, comparing the low-dimensional representation to the original data). The measures are often based on the partitioning of the co-ranking matrix into 4 submatrices, divided at the K-th row and column, calculating a weighted combination of the sums of each submatrix. Hence, the evaluation process typically involves plotting a graph over several (or even all possible) settings of the parameter K. Considering simple artificial examples, we argue that this parameter controls two notions at once, that need not necessarily be combined, and that the rectangular shape of submatrices is disadvantageous for an intuitive interpretation of the parameter. We debate that quality measures, as general and flexible evaluation tools, should have parameters with a direct and intuitive interpretation as to which specific error types are tolerated or penalized. Therefore, we propose to replace K with two parameters to control these notions separately, and introduce a differently shaped weighting on the co-ranking matrix. The two new parameters can then directly be interpreted as a threshold up to which rank errors are tolerated, and a threshold up to which the rank-distances are significant for the evaluation. Moreover, we propose a color representation of local quality to visually support the evaluation process for a given mapping, where every point in the mapping is colored according to its local contribution to the overall quality."
--- Look at this carefully, and see if it could be taught in data mining (and whether it's worth doing so.)
to:NB  dimension_reduction  statistics  data_analysis  visual_display_of_quantitative_information  to_teach:data-mining 
october 2011 by cshalizi
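A minimal sketch of the co-ranking matrix described in the abstract above, for anyone weighing whether it is teachable. It counts, for every ordered pair of points, the neighbour rank in the original data against the neighbour rank in the low-dimensional map. The function names, the use of Euclidean distance, and the tie-breaking by index are all assumptions of this sketch, not taken from the paper; `neighbourhood_preservation` is just the normalized sum of the upper-left K-by-K block, one of the simple K-based summaries the abstract alludes to, not the two-parameter measure the authors propose.

```python
import numpy as np


def neighbor_ranks(X):
    """ranks[i, j] = rank of point j among the neighbours of point i (1 = nearest)."""
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))      # Euclidean distances (an assumption)
    np.fill_diagonal(D, -1.0)                   # each point sorts first in its own row (rank 0)
    order = np.argsort(D, axis=1, kind="stable")  # ties broken by index
    n = D.shape[0]
    ranks = np.empty((n, n), dtype=int)
    ranks[np.arange(n)[:, None], order] = np.arange(n)[None, :]
    return ranks


def coranking_matrix(X_high, X_low):
    """Q[k-1, l-1] = number of ordered pairs (i, j), i != j, where j has
    rank k around i in the original data and rank l around i in the map."""
    r_high = neighbor_ranks(X_high)
    r_low = neighbor_ranks(X_low)
    n = r_high.shape[0]
    off_diag = ~np.eye(n, dtype=bool)           # drop the (i, i) pairs
    Q = np.zeros((n - 1, n - 1), dtype=int)
    np.add.at(Q, (r_high[off_diag] - 1, r_low[off_diag] - 1), 1)
    return Q


def neighbourhood_preservation(Q, K):
    """Average fraction of each point's K nearest neighbours that are kept
    among its K nearest neighbours in the map (upper-left K x K block of Q)."""
    n = Q.shape[0] + 1
    return Q[:K, :K].sum() / (K * n)
```

A perfect embedding puts all the mass on the diagonal of Q, so anything off the diagonal is a rank error; plotting `neighbourhood_preservation(Q, K)` against K gives the kind of curve the abstract says is usually examined.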
R Graph Gallery - Donations Welcome - Romain Francois, Professional R Enthusiast
The R Graph Gallery is an under-utilized resource, and sending a little money Romain's way is not a bad thing.
R  programming  to_teach:statcomp  statistics  visual_display_of_quantitative_information 
october 2011 by cshalizi
Visualization methods for longitudinal social networks and stochastic actor-oriented modeling
"As a consequence of the rising interest in longitudinal social networks and their analysis, there is also an increasing demand for tools to visualize them. We argue that similar adaptations of state-of-the-art graph-drawing methods can be used to visualize both, longitudinal networks and predictions of stochastic actor-oriented models (SAOMs), the most prominent approach for analyzing such networks. The proposed methods are illustrated on a longitudinal network of acquaintanceship among university freshmen."
social_networks  network_data_analysis  visual_display_of_quantitative_information  statistics 
july 2011 by cshalizi
Disease Maps: Epidemics on the Ground, Koch
"In the seventeenth century, a map of the plague suggested a radical idea—that the disease was carried and spread by humans. In the nineteenth century, maps of cholera cases were used to prove its waterborne nature. More recently, maps charting the swine flu pandemic caused worldwide panic and sent shockwaves through the medical community. In Disease Maps, Tom Koch contends that to understand epidemics and their history we need to think about maps of varying scale, from the individual body to shared symptoms evidenced across cities, nations, and the world."
books:noted  maps  epidemiology  history_of_science  history_of_medicine  contagion  plague  visual_display_of_quantitative_information  disease  medicine 
july 2011 by cshalizi
The Washington Monthly - The Magazine - The Information Sage
This confirms my sense from his books that Tufte is probably a complete pain in the ass to deal with, though genius must be excused much.  (Also, I will now take odds that he will succumb to the Brain Eater within 10 years, which will be much more of a tragedy than usual.)
tufte.edward  visual_display_of_quantitative_information  via:?  cult_followings  to:blog 
may 2011 by cshalizi
Visualizing the Bubble « Sustainable Cities and Transport
Looks like there ought to be some sort of data collapse possible here (a common time trend multiplied by some city-specific noise?).
to_teach:undergrad-ADA  visual_display_of_quantitative_information  mortgage_crisis 
march 2011 by cshalizi
Building a Better Word Cloud « Zero Intelligence Agents
I like the point that the axes in a plot should _mean_ something.  Not sure that these are the best choices however --- what if I want to just deal with one document, or for that matter with three?
visual_display_of_quantitative_information  text_mining 
february 2011 by cshalizi
Information-Theoretic Methods for the Visual Analysis of Climate and Flow Data (Tutorial Slides)
My reaction to this must, I imagine, be a little bit like how a proud parent feels when they hear from someone else about their child doing something worthwhile.
visual_display_of_quantitative_information  complexity_measures  computational_mechanics  via:georg  janicke.heiki  to:blog 
october 2010 by cshalizi
[1003.0529] A Unified Algorithmic Framework for Multi-Dimensional Scaling
"In this paper, we propose a unified algorithmic framework for solving many known variants of MDS. Our algorithm is a simple iterative scheme with guaranteed convergence, and is modular; by changing the internals of a single subroutine in the algorithm, we can switch cost functions and target spaces easily. In addition to the formal guarantees of convergence, our algorithms are accurate; in most cases, they converge to better quality solutions than existing methods, in comparable time."
multidimensional_scaling  dimension_reduction  visual_display_of_quantitative_information  to_teach:data-mining  data_mining 
march 2010 by cshalizi
[0911.3349] Seeing Science
"The ability to represent scientific data and concepts visually is becoming increasingly important due to the unprecedented exponential growth of computational power during the present digital age. The data sets and simulations scientists in all fields can now create are literally thousands of times as large as those created just 20 years ago. Historically successful methods for data visualization can, and should, be applied to today's huge data sets, but new approaches, also enabled by technology, are needed as well. Increasingly, "modular craftsmanship" will be applied, as relevant functionality from the graphically and technically best tools for a job are combined as-needed, without low-level programming."
visual_display_of_quantitative_information  have_read  automating_craft 
november 2009 by cshalizi
Visualizing Empires Decline
It's a start, but: where's China? Russia? The Ottomans and the Habsburgs? The US? Japan? Holland? (Also: what're the units, area or population?)
visual_display_of_quantitative_information  imperialism  world_history  via:idlethink 
november 2009 by cshalizi
Schneier on Security: Police Data Mining Done Right
Sounds more like straight-up visualization than data-mining --- not that that's bad! The human visual cortex is a powerful pattern-recognition technology, albeit largely undocumented and without any basis in existing theory.
data-mining  police  to_teach:data-mining  visual_display_of_quantitative_information 
june 2009 by cshalizi
Invariant co-ordinate selection
"A general method for exploring multivariate data by comparing different estimates of multivariate scatter is presented. The method is based on the eigenvalue–eigenvector decomposition of one scatter matrix relative to another. In particular, it is shown that the eigenvectors can be used to generate an affine invariant co-ordinate system for the multivariate data. Consequently, we view this method as a method for invariant co-ordinate selection."
statistics  data_analysis  visual_display_of_quantitative_information  principal_components 
june 2009 by cshalizi
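A rough sketch of the idea in the abstract above: decompose one scatter matrix relative to another and use the eigenvectors as new coordinates. The abstract does not say which pair of scatter matrices to use; the pairing here (ordinary covariance and a fourth-moment scatter) is an assumption, as are the function name and the normalizing constants. The code whitens by the first scatter and then diagonalizes the second on the whitened data, which gives the same eigenvalues as the relative decomposition. It assumes at least two columns and a nonsingular covariance.

```python
import numpy as np


def invariant_coordinates(X):
    """Return (Z, eigenvalues): data in invariant coordinates and the
    eigenvalues of the second scatter relative to the first, descending."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    Xc = X - X.mean(axis=0)

    # First scatter: ordinary covariance. Whiten with its symmetric
    # inverse square root so the whitened data have identity covariance.
    evals1, evecs1 = np.linalg.eigh(np.cov(Xc, rowvar=False))
    W = evecs1 @ np.diag(evals1 ** -0.5) @ evecs1.T
    Y = Xc @ W

    # Second scatter (an assumed choice): fourth-moment scatter, i.e. the
    # covariance of the whitened data weighted by squared distance from the mean.
    r2 = (Y ** 2).sum(axis=1)
    S2 = (Y * r2[:, None]).T @ Y / (n * (p + 2))

    # Eigenvectors of S2 on whitened data give the invariant coordinate axes.
    evals2, evecs2 = np.linalg.eigh(S2)
    order = np.argsort(evals2)[::-1]
    return Y @ evecs2[:, order], evals2[order]
```

Applying an invertible affine map to `X` should leave the returned coordinates unchanged up to the sign of each column (provided the eigenvalues are distinct), which is the sense of "invariant" here; coordinates with extreme eigenvalues are the ones worth plotting first.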
Grammar of Graphics 2 (R)
Nice-looking graphics system for R; draft book and R package.
R  visual_display_of_quantitative_information  via:kjhealy  books:noted 
january 2009 by cshalizi
Wordle - My mind as a c. 1965 book cover design
I like the fact that the largest single element is the one reminding me to transfer things to the notebooks...
social_media  pretty_pictures  visual_display_of_quantitative_information  via:vaguery 
june 2008 by cshalizi
Skyeome.net » Blog Archive » Fashionable Networks
Skye is right; compared to what actual designers produce, our graphics are _painfully ugly_ and _uncompelling_. How can we do better?
visual_display_of_quantitative_information  design  bender-de_moll.skye 
april 2008 by cshalizi
WikiPediaVision (beta)
Watch where people are editing Wikipedia, as they edit Wikipedia
visual_display_of_quantitative_information  funny:geeky  wikipedia  via:logista 
october 2007 by cshalizi
The Topography of Poverty in the United States: A Spatial Analysis Using County-Level Data From the Community Health Status Indicators Project
"A distinctive north–south demarcation of low versus high poverty concentrations was found, along with isolated pockets of high and low poverty within areas in which the predominant poverty rates were opposite. This pattern can be described as following
statistics  poverty  inequality  america  american_south  visual_display_of_quantitative_information  via:john-burke  to_teach:data-mining  to:blog 
october 2007 by cshalizi

