cshalizi + re:smoothing_adjacency_matrices 27
Graphlets: a Spectral Perspective for Graph Limits
4 weeks ago by cshalizi
"Graphlets give a spectral approach to graph limits for general graph sequences in a framework that unifies previous disparate approaches for dealing with dense graphs and sparse graphs. We will show that the con- vergence to graphlets under the appropriate spectral distance is equivalent to the convergence using the (normalized) cut distance. We then examine the geometry of graphlets, illustrated by examples of several families of graphlets and, in particular, graphlets with low ranks. We further dis- cuss a number of usages of graphlets, including universal scalable bases, universal embeddings vis heat kernels and the preservation of Cheeger cuts."
ETA: This is so not an easy read. I like what I understand, but I definitely have to make another attack on it.
to:NB
to_read
graph_theory
graph_limits
re:smoothing_adjacency_matrices
re:network_differences
chung.fan
via:alessandro
graph_spectra
ETA: This is so not an easy read. I like what I understand, but I definitely have to make another attack on it.
4 weeks ago by cshalizi
[1201.5871] Null models for network data
5 weeks ago by cshalizi
"The analysis of datasets taking the form of simple, undirected graphs continues to gain in importance across a variety of disciplines. Two choices of null model, the logistic-linear model and the implicit log-linear model, have come into common use for analyzing such network data, in part because each accounts for the heterogeneity of network node degrees typically observed in practice. Here we show how these both may be viewed as instances of a broader class of null models, with the property that all members of this class give rise to essentially the same likelihood-based estimates of link probabilities in sparse graph regimes. This facilitates likelihood-based computation and inference, and enables practitioners to choose the most appropriate null model from this family based on application context. Comparative model fits for a variety of network datasets demonstrate the practical implications of our results."
in_NB
network_data_analysis
have_read
statistics
estimation
approximation
re:smoothing_adjacency_matrices
5 weeks ago by cshalizi
[1202.3123] Right-convergence of sparse random graphs
february 2012 by cshalizi
"The paper is devoted to the problem of establishing right-convergence of sparse random graphs. This concerns the convergence of the logarithm of number of homomorphisms from graphs or hyper-graphs $G_N, Nge 1$ to some target graph $W$. The theory of dense graph convergence, including random dense graphs, is now well understood, but its counterpart for sparse random graphs presents some fundamental difficulties. Phrased in the statistical physics terminology, the issue is the existence of the log-partition function limits, also known as free energy limits, appropriately normalized for the Gibbs distribution associated with $W$. In this paper we prove that the sequence of sparse ER graphs is right-converging when the tensor product associated with the target graph $W$ satisfies certain convexity property. We treat the case of discrete and continuous target graphs $W$. The latter case allows us to prove a special case of Talagrand's recent conjecture (more accurately stated as level III Research Problem 6.7.2 in his recent book), concerning the existence of the limit of the measure of a set obtained from $R^N$ by intersecting it with linearly in $N$ many subsets, generated according to some common probability law.
Our proof is based on the interpolation technique, introduced first by Guerra and Toninelli and developed further in a series of papers. Specifically, Bayati et al establish the right-convergence property for Erdos-Renyi graphs for some special cases of $W$. In this paper most of the results in this paper follow as a special case of our main theorem."
to:NB
to_read
graph_theory
graph_limits
re:smoothing_adjacency_matrices
Our proof is based on the interpolation technique, introduced first by Guerra and Toninelli and developed further in a series of papers. Specifically, Bayati et al establish the right-convergence property for Erdos-Renyi graphs for some special cases of $W$. In this paper most of the results in this paper follow as a special case of our main theorem."
february 2012 by cshalizi
Bickel , Chen , Levina : The method of moments and degree distributions for network models
november 2011 by cshalizi
"Probability models on graphs are becoming increasingly important in many applications, but statistical tools for fitting such models are not yet well developed. Here we propose a general method of moments approach that can be used to fit a large class of probability models through empirical counts of certain patterns in a graph. We establish some general asymptotic properties of empirical graph moments and prove consistency of the estimates as the graph size grows for all ranges of the average degree including Omega(1). Additional results are obtained for the important special case of degree distributions."
After reading this, I note that they do not go through even one example of actually estimating anything. I think this is because the inversion from moments to graphons, while mathematically well-defined, is hellish to calculate (and probably very numerically unstable).
network_data_analysis
statistics
estimation
bickel.peter
levina.elizaveta
re:smoothing_adjacency_matrices
in_NB
have_read
After reading this, I note that they do not go through even one example of actually estimating anything. I think this is because the inversion from moments to graphons, while mathematically well-defined, is hellish to calculate (and probably very numerically unstable).
november 2011 by cshalizi
Banff blog « An Ergodic Walk
october 2011 by cshalizi
Sounds delightful (and Banff is beautiful).
conferences
statistics
learning_theory
statistical_inference_for_stochastic_processes
track_down_references
markov_models
network_data_analysis
estimation
van_handel.ramon
nonparametrics
re:smoothing_adjacency_matrices
information_theory
october 2011 by cshalizi
[1110.5383] Quilting Stochastic Kronecker Product Graphs to Generate Multiplicative Attribute Graphs
october 2011 by cshalizi
"We describe the first sub-quadratic sampling algorithm for the Multiplicative Attribute Graph Model (MAGM) of Kim and Leskovec (2010). We exploit the close connection between MAGM and the Kronecker Product Graph Model (KPGM) of Leskovec et al. (2010), and show that to sample a graph from a MAGM it suffices to sample small number of KPGM graphs and emph{quilt} them together. Under mild technical conditions our algorithm runs in $O((log_2(n))^3 |E|)$ time, where $n$ is the number of nodes and $|E|$ is the number of edges in the sampled graph. We demonstrate the scalability of our algorithm via extensive empirical evaluation; we can sample a MAGM graph with 8 million nodes and 20 billion edges in under 6 hours."
in_NB
to_read
network_data_analysis
re:smoothing_adjacency_matrices
october 2011 by cshalizi
Orbanz : Projective limit random probabilities on Polish spaces
october 2011 by cshalizi
"A pivotal problem in Bayesian nonparametrics is the construction of prior distributions on the space M(V) of probability measures on a given domain V. In principle, such distributions on the infinite-dimensional space M(V) can be constructed from their finite-dimensional marginals—the most prominent example being the construction of the Dirichlet process from finite-dimensional Dirichlet distributions. This approach is both intuitive and applicable to the construction of arbitrary distributions on M(V), but also hamstrung by a number of technical difficulties. We show how these difficulties can be resolved if the domain V is a Polish topological space, and give a representation theorem directly applicable to the construction of any probability distribution on M(V) whose first moment measure is well-defined. The proof draws on a projective limit theorem of Bochner, and on properties of set functions on Polish spaces to establish countable additivity of the resulting random probabilities."
to:NB
probability
stochastic_processes
measure_theory
orbanz.peter
to_read
re:smoothing_adjacency_matrices
high-dimensional_probability
october 2011 by cshalizi
[1110.4088] Infinitely exchangeable random graphs generated from a Poisson point process on monotone sets and applications to cluster analysis for networks
october 2011 by cshalizi
"We construct an infinitely exchangeable process on the set $cate$ of subsets of the power set of the natural numbers $mathbb{N}$ via a Poisson point process with mean measure $Lambda$ on the power set of $mathbb{N}$. Each $Eincate$ has a least monotone cover in $catf$, the collection of monotone subsets of $cate$, and every monotone subset maps to an undirected graph $Gincatg$, the space of undirected graphs with vertex set $mathbb{N}$. We show a natural mapping $caterightarrowcatfrightarrowcatg$ which induces an infinitely exchangeable measure on the projective system $catg^{rest}$ of graphs $catg$ under permutation and restriction mappings given an infinitely exchangeable family of measures on the projective system $cate^{rest}$ of subsets with permutation and restriction maps. We show potential connections of this process to applications in cluster analysis, machine learning, classification and Bayesian inference."
to:NB
to_read
stochastic_processes
networks
network_data_analysis
point_processes
graph_limits
re:smoothing_adjacency_matrices
re:your_favorite_ergm_sucks
october 2011 by cshalizi
[1110.3225] Mining Patterns in Networks using Homomorphism
october 2011 by cshalizi
"In recent years many algorithms have been developed for finding patterns in graphs and networks. A disadvantage of these algorithms is that they use subgraph isomorphism to determine the support of a graph pattern; subgraph isomorphism is a well-known NP complete problem. In this paper, we propose an alternative approach which mines tree patterns in networks by using subgraph homomorphism. The advantage of homomorphism is that it can be computed in polynomial time, which allows us to develop an algorithm that mines tree patterns in arbitrary graphs in incremental polynomial time. Homomorphism however entails two problems not found when using isomorphism: (1) two patterns of different size can be equivalent; (2) patterns of unbounded size can be frequent. In this paper we formalize these problems and study solutions that easily fit within our algorithm."
in_NB
to_read
re:smoothing_adjacency_matrices
network_data_analysis
data_mining
graph_theory
october 2011 by cshalizi
Weisfiler-Lehman Graph Kernels
october 2011 by cshalizi
"In this article, we propose a family of efficient kernels for large graphs with discrete node labels. Key to our method is a rapid feature extraction scheme based on the Weisfeiler-Lehman test of isomorphism on graphs. It maps the original graph to a sequence of graphs, whose node attributes capture topological and label information. A family of kernels can be defined based on this Weisfeiler-Lehman sequence of graphs, including a highly efficient kernel comparing subtree-like patterns. Its runtime scales only linearly in the number of edges of the graphs and the length of the Weisfeiler-Lehman graph sequence. In our experimental evaluation, our kernels outperform state-of-the-art graph kernels on several graph classification benchmark data sets in terms of accuracy and runtime. Our kernels open the door to large-scale applications of graph kernels in various disciplines such as computational biology and social network analysis."
in_NB
network_data_analysis
kernel_methods
data_analysis
graph_limits
machine_learning
re:smoothing_adjacency_matrices
to_read
october 2011 by cshalizi
[1106.2125] Bootstrapping data arrays of arbitrary order
june 2011 by cshalizi
"In this paper we study a bootstrap strategy for estimating the variance of a mean taken over large multifactor crossed random effects data sets. We apply bootstrap reweighting independently to the levels of each factor, giving each observation the product of its factor weights. No exact bootstrap exists for this problem (McCullagh (2000)). We show that the proposed bootstrap is mildly conservative, under sufficient conditions that allow very unbalanced and heteroscedastic inputs. Earlier results for a resampling bootstrap only apply to two factors and are not suitable to online computation. The proposed reweighting approach can be implemented in parallel and online settings. The results for this method apply to any number of factors. The method is illustrated using a 3 factor data set of comment lengths from Facebook."
bootstrap
statistics
eckles.dean
owen.art
have_read
network_data_analysis
re:smoothing_adjacency_matrices
to:blog
june 2011 by cshalizi
Owen : The pigeonhole bootstrap
march 2011 by cshalizi
"Recently there has been much interest in data that, in statistical language, may be described as having a large crossed and severely unbalanced random effects structure. Such data sets arise for recommender engines and information retrieval problems. Many large bipartite weighted graphs have this structure too. We would like to assess the stability of algorithms fit to such data. Even for linear statistics, a naive form of bootstrap sampling can be seriously misleading and McCullagh [Bernoulli 6 (2000) 285–301] has shown that no bootstrap method is exact. We show that an alternative bootstrap separately resampling rows and columns of the data matrix satisfies a mean consistency property even in heteroscedastic crossed unbalanced random effects models. This alternative does not require the user to fit a crossed random effects model to the data."
bootstrap
network_data_analysis
via:deaneckles
re:smoothing_adjacency_matrices
have_read
march 2011 by cshalizi
Mccullagh : Resampling and exchangeable arrays
march 2011 by cshalizi
But, ummm, you need to make sure your resampling plan respects the dependence structure. I'm pretty sure that I could use this to "prove" that you couldn't use resampling to get standard errors for the mean of a stationary time series. Something very weird here. To re-read.
bootstrap
network_data_analysis
statistics
via:deaneckles
re:smoothing_adjacency_matrices
have_read
march 2011 by cshalizi
[1102.2650] Estimating and Understanding Exponential Random Graph Models
february 2011 by cshalizi
Rigorous results on conditions under which mean field approximations become exact for large ERGMs, and consequently they end up looking like Erdos-Renyi graphs (at some effective density); these come from clever large-deviations arguments. To over-simplify some pretty theorems, the cases are (1) all interactions between edges are positive, so introducing any positive density of edges at all tends to produce a blow-up towards a nearly complete graph (where edges are independent); or (2) all interactions are extremely weak, and consequently negligible in the large-scale limit.
in_NB
exponential_family_random_graphs
network_data_analysis
large_deviations
mean-field_theory
statistics
stochastic_processes
re:almost_none
graph_limits
chatterjee.souav
diaconis.persi
statistical_mechanics
have_read
re:smoothing_adjacency_matrices
re:your_favorite_ergm_sucks
february 2011 by cshalizi
Statistical Inference on Random Graphs; Comparative Power Analysis- Journal of Computational and Graphical Statistics - 0(0):1
february 2011 by cshalizi
"We present a comparative power analysis, via Monte Carlo, of various graph invariants used as statistics for testing graph homogeneity versus a “chatter” alternative—the existence of a local region of excessive activity. Our results indicate that statistical inference on random graphs, even in a relatively simple setting, can be decidedly nontrivial. We find that none of the graph invariants considered is uniformly most powerful throughout our space of alternatives. Code for reproducing all the simulation results presented in this article is available online."
statistics
network_data_analysis
hypothesis_testing
re:smoothing_adjacency_matrices
re:network_differences
february 2011 by cshalizi
[1011.2526] Ergodic Theory on Stationary Random Graphs
november 2010 by cshalizi
"A stationary random graph is a random rooted graph whose distribution is invariant under re-rooting along the simple random walk. We adapt the entropy technique developed for Cayley graphs and show in particular that stationary random graphs of subexponential growth are almost surely Liouville, that is, admit no non constant bounded harmonic function. Applications include the uniform infinite planar quadrangulation and long-range percolation clusters."
ergodic_theory
random_graphs
re:smoothing_adjacency_matrices
have_read
november 2010 by cshalizi
[1011.3552] Polytopes from Subgraph Statistics
november 2010 by cshalizi
"We study polytopes that are convex hulls of vectors of subgraph densities. Many graph theoretical questions can be expressed in terms of these polytopes, and statisticians use them to understand exponential random graph models.
Relations among their Ehrhart polynomials are described, their duals are applied to certify that polynomials are non-negative, and we find some of their faces.
For the general picture we inscribe cyclic polytopes in them and calculate volumes. From the volume calculations we conjecture that a variation of the Selberg integral indexed by Schur polynomials has a combinatorial formula. We inscribe polynomially parametrized sets, called curvy zonotopes, in the polytopes and show that they approximate the polytopes arbitrarily close."
re:smoothing_adjacency_matrices
networks
network_data_analysis
algebraic_statistics
Relations among their Ehrhart polynomials are described, their duals are applied to certify that polynomials are non-negative, and we find some of their faces.
For the general picture we inscribe cyclic polytopes in them and calculate volumes. From the volume calculations we conjecture that a variation of the Selberg integral indexed by Schur polynomials has a combinatorial formula. We inscribe polynomially parametrized sets, called curvy zonotopes, in the polytopes and show that they approximate the polytopes arbitrarily close."
november 2010 by cshalizi
[1005.1136] Random graphs with a given degree sequence
july 2010 by cshalizi
Per Newman, Strogatz & Watts (2001; cond-mat/0007235), these graphs are very natural null models for networks, so this may be useful for getting analytic theories for such tests.
random_graphs
network_data_analysis
diaconis.persi
chatterjee.souav
via:ale
re:smoothing_adjacency_matrices
graph_limits
have_read
july 2010 by cshalizi
related tags
algebraic_statistics ⊕ approximation ⊕ bickel.peter ⊕ blogging ⊕ bootstrap ⊕ chatterjee.souav ⊕ chung.fan ⊕ community_discovery ⊕ conferences ⊕ data_analysis ⊕ data_mining ⊕ diaconis.persi ⊕ eckles.dean ⊕ ergodic_decomposition ⊕ ergodic_theory ⊕ estimation ⊕ exchangeable_arrays ⊕ exchangeable_sequences ⊕ exponential_family_random_graphs ⊕ graph_limits ⊕ graph_spectra ⊕ graph_theory ⊕ have_read ⊕ high-dimensional_probability ⊕ hypothesis_testing ⊕ inference_to_latent_objects ⊕ information_theory ⊕ in_NB ⊕ janson.svante ⊕ kernel_methods ⊕ large_deviations ⊕ lauritzen.steffen ⊕ learning_theory ⊕ levina.elizaveta ⊕ levina.liza ⊕ machine_learning ⊕ manifold_learning ⊕ markov_models ⊕ mean-field_theory ⊕ measure_theory ⊕ networks ⊕ network_data_analysis ⊕ nonparametrics ⊕ orbanz.peter ⊕ owen.art ⊕ point_processes ⊕ probability ⊕ random_graphs ⊕ re:almost_none ⊕ re:network_differences ⊕ re:smoothing_adjacency_matrices ⊖ re:your_favorite_ergm_sucks ⊕ renormalization ⊕ snijders.tom ⊕ spectral_clustering ⊕ statistical_inference_for_stochastic_processes ⊕ statistical_mechanics ⊕ statistics ⊕ stochastic_processes ⊕ sufficiency ⊕ to:blog ⊕ to:NB ⊕ to_read ⊕ track_down_references ⊕ van_handel.ramon ⊕ via:ale ⊕ via:alessandro ⊕ via:deaneckles ⊕ zhu.ji ⊕Copy this bookmark: