cshalizi + community_discovery   88

[1204.2255] Identifying edge clusters in networks via edge graphlet degree vectors (edge-GDVs) and edge-GDV-similarities
"Inference of new biological knowledge, e.g., prediction of protein function, from protein-protein interaction (PPI) networks has received attention in the post-genomic era. A popular strategy has been to cluster the network into functionally coherent groups of proteins and predict protein function from the clusters. Traditionally, network research has focused on clustering of nodes. However, why favor nodes over edges, when clustering of edges may be preferred? For example, nodes belong to multiple functional groups, but clustering of nodes typically cannot capture the group overlap, while clustering of edges can. Clustering of adjacent edges that share many neighbors was proposed recently, outperforming different node clustering methods. However, since some biological processes can have characteristic "signatures" throughout the network, not just locally, it may be of interest to consider edges that are not necessarily adjacent. Hence, we design a sensitive measure of the "topological similarity" of edges that can deal with edges that are not necessarily adjacent. We cluster edges that are similar according to our measure in different baker's yeast PPI networks, outperforming existing node and edge clustering approaches."
to:NB  network_data_analysis  community_discovery 
6 weeks ago by cshalizi
[1204.2296] Co-clustering for Directed Graphs; the Stochastic Co-Blockmodel and a Spectral Algorithm
"This paper extends the spectral clustering algorithm to directed networks in a way that co-clusters or bi-clusters the rows and columns of a graph Laplacian. Co-clustering leverages the increased complexity of asymmetric relationships to gain new insight into the structure of the directed network. To understand this algorithm and to study its asymptotic properties in a canonical setting, we propose the Stochastic Co-Blockmodel to encode co-clustering structure. This is the first statistical model of co-clustering and it is derived using the concept of stochastic equivalence that motivated the original Stochastic Blockmodel. Although directed spectral clustering is not derived from the Stochastic Co-Blockmodel, we show that, asymptotically, the algorithm can estimate the blocks in a high dimensional asymptotic setting in which the number of blocks grows with the number of nodes. The algorithm, model, and asymptotic results can all be extended to bipartite graphs."
in_NB  relational_learning  network_data_analysis  statistics  clustering  community_discovery  spectral_clustering  yu.bin 
6 weeks ago by cshalizi
[1204.2581] Modeling Relational Data via Latent Factor Blockmodel
"In this paper we address the problem of modeling relational data, which appear in many applications such as social network analysis, recommender systems and bioinformatics. Previous studies either consider latent feature based models but disregarding local structure in the network, or focus exclusively on capturing local structure of objects based on latent blockmodels without coupling with latent characteristics of objects. To combine the benefits of the previous work, we propose a novel model that can simultaneously incorporate the effect of latent features and covariates if any, as well as the effect of latent structure that may exist in the data. To achieve this, we model the relation graph as a function of both latent feature factors and latent cluster memberships of objects to collectively discover globally predictive intrinsic properties of objects and capture latent block structure in the network to improve prediction performance. We also develop an optimization transfer algorithm based on the generalized EM-style strategy to learn the latent factors. We prove the efficacy of our proposed model through the link prediction task and cluster analysis task, and extensive experiments on the synthetic data and several real world datasets suggest that our proposed LFBM model outperforms the other state of the art approaches in the evaluated tasks."
in_NB  network_data_analysis  community_discovery  statistics  inference_to_latent_objects  factor_analysis  relational_learning 
6 weeks ago by cshalizi
[1204.1002] Fast Multi-Scale Detection of Relevant Communities
"Nowadays, networks are almost ubiquitous. In the past decade, community detection received an increasing interest as a way to uncover the structure of networks by grouping nodes into communities more densely connected internally than externally. Yet most of the effective methods available do not consider the potential levels of organisation, or scales, a network may encompass and are therefore limited. In this paper we present a method compatible with global and local criteria that enables fast multi-scale community detection. The method is derived in two algorithms, one for each type of criterion, and implemented with 6 known criteria. Uncovering communities at various scales is a computationally expensive task. Therefore this work puts a strong emphasis on the reduction of computational complexity. Some heuristics are introduced for speed-up purposes. Experiments demonstrate the efficiency and accuracy of our method with respect to each algorithm and criterion by testing them against large generated multi-scale networks. This study also offers a comparison between criteria and between the global and local approaches."
to:NB  community_discovery  data_mining 
7 weeks ago by cshalizi
[1203.3037] Expanding the Transfer Entropy to Identify Information Subgraphs in Complex Systems
"We propose a formal expansion of the transfer entropy to put in evidence irreducible sets of variables which provide information for the future state of each assigned target. Multiplets characterized by an high value will be associated to informational circuits present in the system, with an informational character (synergetic or redundant) which can be associated to the sign of the contribution. We also present preliminary results on fMRI and EEG data sets."
in_NB  graphical_models  information_theory  community_discovery  time_series  re:functional_communities 
7 weeks ago by cshalizi
[1203.3083] Clustering in networks with the collapsed Stochastic Block Model
"We present an efficient MCMC algorithm to cluster the nodes of a network such that nodes with similar role in the network are clustered together. This is known as block-modelling or block-clustering. We extend the stochastic blockmodel (SBM) of Nowicki & Snijders (2001), by exploiting parameter collapsing to integrate out block parameters. The resulting model defines a posterior over the number of clusters and cluster memberships. Sampling from this model is simpler than from the original SBM as transdimensional MCMC can be avoided. Moreover, our extensions allow the number of clusters to be directly estimated, rather than given as an input parameter. The algorithm is based on the allocation sampler of Nobile & Fearnside (2007). We use synthetic and real data to test the speed and accuracy of our model and algorithm, including the ability to estimate the number of clusters. The algorithm can scale to networks with up to ten thousand nodes."
in_NB  network_data_analysis  community_discovery  statistics 
7 weeks ago by cshalizi
[1203.2200] Role-Dynamics: Fast Mining of Large Dynamic Networks
"To understand the structural dynamics of a large-scale social, biological or technological network, it may be useful to discover behavioral roles representing the main connectivity patterns present over time. In this paper, we propose a scalable non-parametric approach to automatically learn the structural dynamics of the network and individual nodes. Roles may represent structural or behavioral patterns such as the center of a star, peripheral nodes, or bridge nodes that connect different communities. Our novel approach learns the appropriate structural role dynamics for any arbitrary network and tracks the changes over time. In particular, we uncover the specific global network dynamics and the local node dynamics of a technological, communication, and social network. We identify interesting node and network patterns such as stationary and non-stationary roles, spikes/steps in role-memberships (perhaps indicating anomalies), increasing/decreasing role trends, among many others. Our results indicate that the nodes in each of these networks have distinct connectivity patterns that are non-stationary and evolve considerably over time. Overall, the experiments demonstrate the effectiveness of our approach for fast mining and tracking of the dynamics in large networks. Furthermore, the dynamic structural representation provides a basis for building more sophisticated models and tools that are fast for exploring large dynamic networks."
in_NB  network_data_analysis  data_mining  community_discovery  neville.jennifer 
7 weeks ago by cshalizi
[1203.5974] The Concentration and Stability of the Community Detecting Functions on Random Networks
"We propose a general form of community detecting functions for finding the communities or the optimal partition of a random network, and examine the concentration and stability of the function values using the bounded difference martingale method. We derive LDP inequalities for both the general case and several specific community detecting functions: modularity, graph bipartitioning and q-Potts community structure. We also discuss the concentration and stability of community detecting functions on different types of random networks: the sparse and non-sparse networks and some examples such as ER and CL networks."
in_NB  to_read  community_discovery  network_data_analysis  statistics 
8 weeks ago by cshalizi
[1203.6093] Consensus clustering in complex networks
"The community structure of complex networks reveals both their organization and hidden relationships among their constituents. Most community detection methods currently available are not deterministic, and their results typically depend on the specific random seeds, initial conditions and tie-break rules adopted for their execution. Consensus clustering is used in data analysis to generate stable results out of a set of partitions delivered by stochastic methods. Here we show that consensus clustering can be combined with any existing method in a self-consistent way, enhancing considerably both the stability and the accuracy of the resulting partitions. This framework is also particularly suitable to monitor the evolution of community structure in temporal networks. An application of consensus clustering to a large citation network of physics papers demonstrates its capability to keep track of the birth, death and diversification of topics."
in_NB  community_discovery  network_data_analysis  clustering  data_mining 
8 weeks ago by cshalizi
[0803.1628] Component models for large networks
"Being among the easiest ways to find meaningful structure from discrete data, Latent Dirichlet Allocation (LDA) and related component models have been applied widely. They are simple, computationally fast and scalable, interpretable, and admit nonparametric priors. In the currently popular field of network modeling, relatively little work has taken uncertainty of data seriously in the Bayesian sense, and component models have been introduced to the field only recently, by treating each node as a bag of out-going links. We introduce an alternative, interaction component model for communities (ICMc), where the whole network is a bag of links, stemming from different components. The former finds both disassortative and assortative structure, while the alternative assumes assortativity and finds community-like structures like the earlier methods motivated by physics. With Dirichlet Process priors and an efficient implementation the models are highly scalable, as demonstrated with a social network from the Last.fm web site, with 670,000 nodes and 1.89 million links."
in_NB  community_discovery  statistics  nonparametrics  clustering 
10 weeks ago by cshalizi
[1112.6028] The entropy of stochastic blockmodel ensembles
"Stochastic blockmodels are generative network models where the vertices are separated into discrete groups, and the probability of an edge existing between two vertices is determined solely by their group membership. In this paper, we derive expressions for the entropy of stochastic blockmodel ensembles. We consider several ensemble variants, including the traditional model as well as the newly introduced degree-corrected version [Karrer et al. Phys. Rev. E {bf 83}, 016107 (2011)], which imposes a degree sequence on the vertices, in addition to the block structure. The imposed degree sequence is implemented both as "soft" constraints, where only the expected degrees are imposed, and as "hard" constraints, where they are required to be the same on all samples of the ensemble. We also consider generalizations to multigraphs and directed graphs. We illustrate one of many applications of this measure by directly deriving a log-likelihood function from the entropy expression, and using it to infer latent block structure in observed data. Due to the general nature of the ensembles considered, the method works well for ensembles with intrinsic degree correlations (i.e. with entropic origin) as well as extrinsic degree correlations, which go beyond the block structure."
in_NB  network_data_analysis  exponential_family_random_graphs  re:your_favorite_ergm_sucks  community_discovery 
january 2012 by cshalizi
[1112.3308] Spatial correlations in attribute communities
"Community detection is an important tool for exploring and classifying the properties of large complex networks and should be of great help for spatial networks. Indeed, in addition to their location, nodes in spatial networks can have attributes such as the language for individuals, or any other socio-economical feature that we would like to identify in communities. We discuss in this paper a crucial aspect which was not considered in previous studies which is the possible existence of correlations between space and attributes. Introducing a simple toy model in which both space and node attributes are considered, we discuss the effect of space-attribute correlations on the results of various community detection methods proposed for spatial networks in this paper and in previous studies. When space is irrelevant, our model is equivalent to the stochastic block model which has been shown to display a detectability-non detectability transition. In the regime where space dominates the link formation process, most methods can fail to recover the communities, an effect which is particularly marked when space-attributes correlations are strong. In this latter case, community detection methods which remove the spatial component of the network can miss a large part of the community structure and can lead to incorrect results."
in_NB  to_read  statistics  community_discovery  network_data_analysis  spatial_statistics 
december 2011 by cshalizi
[1112.1831] Finding Overlapping Communities in Social Networks: Toward a Rigorous Approach
"A "community" in a social network is usually understood to be a group of nodes more densely connected with each other than with the rest of the network. This is an important concept in most domains where networks arise: social, technological, biological, etc. For many years algorithms for finding communities implicitly assumed communities are nonoverlapping (leading to use of clustering-based approaches) but there is increasing interest in finding overlapping communities. A barrier to finding communities is that the solution concept is often defined in terms of an NP-complete problem such as Clique or Hierarchical Clustering.
This paper seeks to initiate a rigorous approach to the problem of finding overlapping communities, where "rigorous" means that we clearly state the following: (a) the object sought by our algorithm (b) the assumptions about the underlying network (c) the (worst-case) running time.
Our assumptions about the network lie between worst-case and average-case. An average case analysis would require a precise probabilistic model of the network, on which there is currently no consensus. However, some plausible assumptions about network parameters can be gleaned from a long body of work in the sociology community spanning five decades focusing on the study of individual communities and ego-centric networks. Thus our assumptions are somewhat "local" in nature. Nevertheless they suffice to permit a rigorous analysis of running time of algorithms that recover global structure.
Our algorithms use random sampling similar to that in property testing and algorithms for dense graphs. However, our networks are not necessarily dense graphs, not even in local neighborhoods.
Our algorithms explore a local-global relationship between ego-centric and socio-centric networks that we hope will provide a fruitful framework for future work both in computer science and sociology."
in_NB  community_discovery  theoretical_computer_science  network_data_analysis 
december 2011 by cshalizi
[1112.1115] Social-Topical Affiliations: The Interplay between Structure and Popularity
"Information popularity and social relationships are intimately connected. However, measuring the extent to which they affect each other has remained an open question. Because we now have access to rich and large data sets from online social networks, we can begin to quantitatively understand the interplay between them. We examine the interface of two decisive structures forming the backbone of online social media: the graph structure of social networks - who is friends with whom - and the set structure of topical affiliations - who talks about what. In studying this interface, we identify key relationships whereby each of these structures can be understood in terms of the other. The context for our study is Twitter, where we look at the social network of both follower relationships and communication relationships, alongside the affiliations outlined by the hashtags used by people to label their communications. On Twitter, we demonstrate how the hashtags that a user adopts can be used to predict their social relationships, and also how the social relationships between the adopters of a hashtag can be used to predict the future popularity of that hashtag. Importantly, we find that both relationships are driven by highly computationally simple structural determinants. While our analysis focuses on Twitter, we view our analysis of social-topical affiliations as broadly applicable to a host of diverse affiliations, including the movies people watch, the brands people like, or the locations people frequent."
in_NB  network_data_analysis  social_media  text_mining  community_discovery 
december 2011 by cshalizi
Phys. Rev. E 84, 056111 (2011): Exploring the structural regularities in networks
"In this paper, we consider the problem of exploring structural regularities of networks by dividing the nodes of a network into groups such that the members of each group have similar patterns of connections to other groups. Specifically, we propose a general statistical model to describe network structure. In this model, a group is viewed as a hidden or unobserved quantity and it is learned by fitting the observed network data using the expectation-maximization algorithm. Compared with existing models, the most prominent strength of our model is the high flexibility. This strength enables it to possess the advantages of existing models and to overcome their shortcomings in a unified way. As a result, not only can broad types of structure be detected without prior knowledge of the type of intrinsic regularities existing in the target network, but also the type of identified structure can be directly learned from the network. Moreover, by differentiating outgoing edges from incoming edges, our model can detect several types of structural regularities beyond competing models. Tests on a number of real world and artificial networks demonstrate that our model outperforms the state-of-the-art model in shedding light on the structural regularities of networks, including the overlapping community structure, multipartite structure, and several other types of structure, which are beyond the capability of existing models." [At a guess from the abstract: block models]
to:NB  network_data_analysis  community_discovery 
december 2011 by cshalizi
[0811.4208] An information-theoretic derivation of min-cut based clustering
"Min-cut clustering, based on minimizing one of two heuristic cost-functions proposed by Shi and Malik, has spawned tremendous research, both analytic and algorithmic, in the graph partitioning and image segmentation communities over the last decade. It is however unclear if these heuristics can be derived from a more general principle facilitating generalization to new problem settings. Motivated by an existing graph partitioning framework, we derive relationships between optimizing relevance information, as defined in the Information Bottleneck method, and the regularized cut in a K-partitioned graph. For fast mixing graphs, we show that the cost functions introduced by Shi and Malik can be well approximated as the rate of loss of predictive information about the location of random walkers on the graph. For graphs generated from a stochastic algorithm designed to model community structure, the optimal information theoretic partition and the optimal min-cut partition are shown to be the same with high probability."
in_NB  kith_and_kin  wiggins.chris  community_discovery 
december 2011 by cshalizi
[1111.6115] Discovering Network Structure Beyond Communities
"To understand the formation, evolution, and function of complex systems, it is crucial to understand the internal organization of their interaction networks. Partly due to the impossibility of visualizing large complex networks, resolving network structure remains a challenging problem. Here we overcome this difficulty by combining the visual pattern recognition ability of humans with the high processing speed of computers to develop an exploratory method for discovering groups of nodes characterized by common network properties, including but not limited to communities of densely connected nodes. Without any prior information about the nature of the groups, the method simultaneously identifies the number of groups, the group assignment, and the properties that define these groups. The results of applying our method to real networks suggest the possibility that most group structures lurk undiscovered in the fast-growing inventory of social, biological, and technological networks of scientific interest."
in_NB  network_data_analysis  community_discovery 
december 2011 by cshalizi
[1111.2018] Intrinsically Dynamic Network Communities
"Community finding algorithms for networks have recently been extended to dynamic data. Most of these recent methods aim at exhibiting community partitions from successive graph snapshots and thereafter connecting or smoothing these partitions using clever time-dependent features and sampling techniques. These approaches are nonetheless achieving longitudinal rather than dynamic community detection. We assume that communities are fundamentally defined by the repetition of interactions among a set of nodes over time. According to this definition, analyzing the data by considering successive snapshots induces a significant loss of information: we suggest that it blurs essentially dynamic phenomena - such as communities based on repeated inter-temporal interactions, nodes switching from a community to another across time, or the possibility that a community survives while its members are being integrally replaced over a longer time period. We propose a formalism which aims at tackling this issue in the context of time-directed datasets (such as citation networks), and present several illustrations on both empirical and synthetic dynamic networks. We eventually introduce intrinsically dynamic metrics to qualify temporal community structure and emphasize their possible role as an estimator of the quality of the community detection - taking into account the fact that various empirical contexts may call for distinct `community' definitions and detection criteria."
in_NB  community_discovery  network_data_analysis  transaction_networks  roth.camille 
november 2011 by cshalizi
Evolving Cluster Mixed-Membership Blockmodel for Time-Evolving Networks
"Time-evolving networks are a natural presentation for dynamic social and biological interactions. While latent space models are gaining popularity in network modeling and analysis, previous works mostly ignore networks with temporal behavior and multi-modal actor roles. Furthermore, prior knowledge, such as division and grouping of social actors or biological specificity of molecular functions, has not been systematically exploited in network modeling. In this paper, we develop a network model featuring a state space mixture prior that tracks complex actor latent role changes through time. We provide a fast variational inference algorithm for learning our model, and validate it with simulations and held-out likelihood comparisons on real-world time-evolving networks. Finally, we demonstrate our model's utility as a network analysis tool, by applying it to United States Congress voting data. "
in_NB  to_read  community_discovery  network_data_analysis  machine_learning  xing.eric 
november 2011 by cshalizi
Multiscale Community Blockmodel for Network Exploration
"Real world networks exhibit a complex set of phenomena such as underlying hierarchical organization, multiscale interaction, and varying topologies of communities. Most existing methods do not adequately capture the intrinsic interplay among such phenomena. We propose a nonparametric Multiscale Community Blockmodel (MSCB) to model the generation of hierarchies in social communities, selective membership of actors to subsets of these communities, and the resultant networks due to within- and cross- community interactions. By using the nested Chinese Restaurant Process, our model automatically infers the hierarchy structure from the data. We develop a collapsed Gibbs sampling algorithm for posterior inference, conduct extensive validation using synthetic networks, and demonstrate the utility of our model in real-world datasets such as predator-prey networks and citation networks. "
in_NB  to_read  community_discovery  network_data_analysis  machine_learning  statistics  xing.eric 
november 2011 by cshalizi
[1110.6517] Classification and estimation in the Stochastic Block Model based on the empirical degrees
"The Stochastic Block Model (Holland et al., 1983) is a mixture model for heterogeneous network data. Unlike the usual statistical framework, new nodes give additional information about the previous ones in this model. Thereby the distribution of the degrees concentrates in points conditionally on the node class. We show under a mild assumption that classification, estimation and model selection can actually be achieved with no more than the empirical degree data. We provide an algorithm able to process very large networks and consistent estimators based on it. In particular, we prove a bound of the probability of misclassification of at least one node, including when the number of classes grows."
in_NB  to_read  network_data_analysis  community_discovery  statistics 
november 2011 by cshalizi
[1110.2515] Normalized Mutual Information to evaluate overlapping community finding algorithms
"Given the increasing popularity of algorithms for overlapping clustering, in particular in social network analysis, quantitative measures are needed to measure the accuracy of a method. Given a set of true clusters, and the set of clusters found by an algorithm, these sets of clusters must be compared to see how similar or different the sets are. A normalized measure is desirable in many contexts, for example assigning a value of 0 where the two sets are totally dissimilar, and 1 where they are identical. A measure based on normalized mutual information, [1], has recently become popular. We demonstrate unintuitive behaviour of this measure, and show how this can be corrected by using a more conventional normalization. We compare the results to that of other measures, such as the Omega index [2]."
in_NB  community_discovery  information_theory  clustering  data_mining 
october 2011 by cshalizi
[0812.2184] Protein-Interaction-Networks: More than mere modules
"Cellular function is widely believed to be organized in a modular fashion. On all scales and at all levels of complexity, relatively independent sub-units perform relatively independent sub-tasks of biological function. This functional modularity must be reflected in the topology of molecular networks. But how a functional module should be represented in an interaction network is an open question. In protein-interaction networks (PIN), one can identify a protein-complex as a module on a small scale, i.e. modules are understood as densely linked, resp. interacting, groups of proteins, that are only sparsely interacting with the rest of the network.
In this contribution, we show that extrapolating this concept of cohesively linked clusters of proteins as modules to the scale of the entire PIN inevitable misses important and functionally relevant structure inherent in the network. As an alternative, we introduce a novel way of decomposing a network into functional roles and show that this represents network structure and function more efficiently. This finding should have a profound impact on all module assisted methods of protein function prediction and should shed new light on how functional modules can be represented in molecular interaction networks in general."
to:NB  community_discovery  biochemical_networks  reichardt.joerg 
october 2011 by cshalizi
[0812.1072] Multiresolution community detection for megascale networks by information-based replica correlations
"We use a Potts model community detection algorithm to accurately and quantitatively evaluate the hierarchical or multiresolution structure of a graph. Our multiresolution algorithm calculates correlations among multiple copies ("replicas") of the same graph over a range of resolutions. Significant multiresolution structures are identified by strongly correlated replicas. The average normalized mutual information, the variation of information, and other measures in principle give a quantitative estimate of the "best" resolutions and indicate the relative strength of the structures in the graph. Because the method is based on information comparisons, it can in principle be used with any community detection model that can examine multiple resolutions. Our approach may be extended to other optimization problems. As a local measure, our Potts model avoids the "resolution limit" that affects other popular models. With this model, our community detection algorithm has an accuracy that ranks among the best of currently available methods. Using it, we can examine graphs over 40 million nodes and more than one billion edges. We further report that the multiresolution variant of our algorithm can solve systems of at least 200000 nodes and 10 million edges on a single processor with exceptionally high accuracy. For typical cases, we find a super-linear scaling, O(L^{1.3}) for community detection and O(L^{1.3} log N) for the multiresolution algorithm where L is the number of edges and N is the number of nodes in the system."
in_NB  community_discovery 
october 2011 by cshalizi
[1108.2228] A consistent dot product embedding for stochastic blockmodel graphs
"We present a method to estimate block membership of nodes in a random graph generated by a stochastic blockmodel. We use an embedding procedure motivated by the random dot product graph model, a particular example of the latent position model. The embedded vectors are clustered through minimization of a mean square error/criteria. We prove that this method is consistent for assigning nodes to blocks, as only a negligible number of nodes will be mis-assigned. We prove consistency of the method for directed and undirected graphs. The consistent block assignment makes possible consistent parameter estimation for a stochastic blockmodel. We extend the result for when the number of blocks grows slowly with the number of nodes. Our method is also computationally feasible even for very large graphs."
community_discovery  network_data_analysis  in_NB  statistics  re:homophily_and_confounding 
august 2011 by cshalizi
Dynamic Infinite Relational Model for Time-varying Relational Data Analysis
From a quick scan, I can't tell if this is a version of a paper I refereed for another conference, or just everyone condensing on the same idea at once.
graphical_models  community_discovery  network_data_analysis  machine_learning  in_NB  relational_learning 
february 2011 by cshalizi
[1102.1182] Phase transition in the detection of modules in sparse networks
"We present an asymptotically exact analysis of the problem of detecting communities in sparse random networks. Our results are also applicable to detection of functional modules, partitions, and colorings in noisy planted models. Using a cavity method analysis, we unveil a phase transition from a region where the original group assignment is undetectable to one where detection is possible. In some cases, the detectable region splits into an algorithmically hard region and an easy one. Our approach naturally translates into a practical algorithm for detecting modules in sparse networks, and learning the parameters of the underlying model." --- This is really an EM algorithm, not a Bayesian method.
community_discovery  have_read  kith_and_kin  phase_transitions  network_data_analysis  moore.cris  belief_propagation  re:stacs  to_teach:complexity-and-inference 
february 2011 by cshalizi
Mariadassou, Robin, Vacher: Uncovering latent structure in valued graphs: A variational approach
"As more and more network-structured data sets are available, the statistical analysis of valued graphs has become common place. Looking for a latent structure is one of the many strategies used to better understand the behavior of a network. Several methods already exist for the binary case.

We present a model-based strategy to uncover groups of nodes in valued graphs. This framework can be used for a wide span of parametric random graphs models and allows to include covariates. Variational tools allow us to achieve approximate maximum likelihood estimation of the parameters of these models. We provide a simulation study showing that our estimation method performs well over a broad range of situations. We apply this method to analyze host–parasite interaction networks in forest ecosystems."
network_data_analysis  inference_to_latent_objects  community_discovery  statistics  estimation 
august 2010 by cshalizi
Stability of graph communities across time scales — PNAS
This sounds an _awful lot_ like Lee & Wasserman in arxiv:0811.0121.  Also: no actual time.
community_discovery  to_be_shot_after_a_fair_trial 
july 2010 by cshalizi
A nonparametric view of network models and Newman–Girvan and other modularities — PNAS
Well --- steps towards nonparametric network models might be fairer. But the results on recovering the community structure of block models _are_ genuinely interesting.
network_data_analysis  community_discovery  spectral_clustering  bickel.peter  have_read 
december 2009 by cshalizi
[0908.0449] On the Stability of Community Detection Algorithms on Longitudinal Citation Data
Ummm --- do people really try standard community discovery algorithms on citation networks? It would seem like an obviously bad idea. (Collaboration networks are another story.) Clustering by co-citation (in either direction) is different, no?
Anyway: should read this.
community_discovery  network_data_analysis 
august 2009 by cshalizi
SSRN-Party Polarization in Congress: A Social Networks Approach by Andrew Waugh, Liuyi Pei, James Fowler, Peter Mucha, Mason Porter
Need to re-examine the polarization bit. I suspect it's not actually incompatible with a sensible story about how polarization has actually grown.
social_networks  community_discovery  congress  us_politics  political_science  re:donor_networks  via:henry_farrell 
july 2009 by cshalizi
[0811.3988] Dynamic communities in multichannel data: An application to the foreign exchange market during the 2007--2008 credit crisis
I have seen this technique before. :) (Though, on reflection, if you're going to do everything with the correlation matrix [rather than mutual information], why not just take the inverse correlation matrix and identify edges with non-zero entries there?)
community_discovery  financial_markets  re:stacs  financial_crisis_of_2007--  have_read 
april 2009 by cshalizi
About XStructure
Interface to arxiv via some kind of hierarchical clustering of the citation graph. (Can't find details.) Interesting but doesn't look all that useful (yet).
community_discovery  hierarchical_structure  information_retrieval  arxiv 
july 2008 by cshalizi
« earlier      

related tags

arxiv  belief_propagation  bergstrom.carl  bibliometry  bickel.peter  biochemical_networks  blogging  books:recommended  bootstrap  citation_networks  clauset.aaron  clustering  community_discovery  computational_complexity  congress  d'souza.raissa  data_mining  em_algorithm  estimation  exponential_family_random_graphs  factor_analysis  finance  financial_crisis_of_2007--  financial_markets  functional_connectivity  graphical_models  graph_theory  have_read  heard_the_talk  hierarchical_structure  hypothesis_testing  inference_to_latent_objects  information_retrieval  information_theory  in_NB  kith_and_kin  latent_variables  learning  learning_theory  leone.michele  lerman.kristina  levina.liza  machine_learning  markov_models  minimum_description_length  moore.cris  networks  network_data_analysis  neuroscience  neville.jennifer  newman.mark  nonparametrics  phase_transitions  political_science  principal_components  random_walks  re:donor_networks  re:functional_communities  re:homophily_and_confounding  re:network_differences  re:sensor-networks-as-social-networks  re:smoothing_adjacency_matrices  re:stacs  re:your_favorite_ergm_sucks  reichardt.joerg  relational_learning  rosvall.martin  roth.camille  snijders.tom  social_media  social_networks  sociology_of_science  spatial_statistics  spectral_clustering  statistical_mechanics  statistics  text_mining  theoretical_computer_science  time_series  to:NB  to_be_shot_after_a_fair_trial  to_read  to_teach:complexity-and-inference  to_teach:data-mining  transaction_networks  us_politics  via:ale  via:henry_farrell  via:mason  via:mejn  visual_display_of_quantitative_information  wiggins.chris  xing.eric  yu.bin  zhu.ji 

Copy this bookmark:



description:


tags: