JMLR: Workshop and Conference Proceedings vol 30 (2013) 1–62
A Tensor Spectral Approach to Learning Mixed Membership Community Models
Animashree Anandkumar email@example.com
University of California, Irvine
Rong Ge firstname.lastname@example.org
Daniel Hsu email@example.com
Microsoft Research

Sham M. Kakade firstname.lastname@example.org
Microsoft Research

Abstract

Detecting hidden communities from observed interactions is a classical problem. Theoretical analysis of community detection has so far been mostly limited to models with non-overlapping communities such as the stochastic block model. In this paper, we provide guaranteed community detection for a family of probabilistic network models with overlapping communities, termed the mixed membership Dirichlet model, first introduced in Airoldi et al. (2008). This model allows nodes to have fractional memberships in multiple communities and assumes that the community memberships are drawn from a Dirichlet distribution. Moreover, it contains the stochastic block model as a special case.
We propose a unified approach to learning communities in these models via tensor spectral decomposition. Our estimator uses a low-order moment tensor of the observed network, consisting of 3-star counts. Our learning method is based on simple linear algebraic operations such as singular value decomposition and tensor power iterations. We provide guaranteed recovery of community memberships and model parameters, and present a careful finite sample analysis of our learning method. Additionally, our results match the best known scaling requirements for the special case of the (homogeneous) stochastic block model.
Keywords: Community detection, spectral methods, tensor methods, moment-based estimation, mixed membership models.
1. Introduction

Studying communities forms an integral part of social network analysis. A community generally refers to a group of individuals with shared interests (e.g. music, sports) or relationships (e.g. friends, co-workers). Various probabilistic and non-probabilistic network models attempt to explain community formation. In addition, they also attempt to quantify interactions and the extent of overlap between different communities, relative sizes among the communities, and various other network properties. Studying such community models is also of interest in other domains, e.g. in biological networks.

Part of this work was done when AA and RG were visiting MSR New England. AA is supported in part by the NSF Career award CCF-1254106, NSF Award CCF-1219234, AFOSR Award FA9550-10-1-0310 and the ARO Award W911NF-12-1-0404.
While there exists a vast literature on community models, learning these models is typically challenging, and various heuristics such as Markov Chain Monte Carlo (MCMC) or variational expectation maximization (EM) are employed in practice. Such heuristics tend to be unreliable and scale poorly for large networks. On the other hand, community models with guaranteed learning methods tend to be restrictive. A popular class of probabilistic models, termed stochastic block models, has been widely studied and enjoys strong theoretical learning guarantees, e.g. (White et al., 1976; Holland et al., 1983; Fienberg et al., 1985; Wang and Wong, 1987; Snijders and Nowicki, 1997; McSherry, 2001). However, these models posit that an individual belongs to a single community, which does not hold in most real settings (Palla et al., 2005).
In this paper, we consider a class of mixed membership community models, originally introduced by Airoldi et al. (2008), and recently employed by Xing et al. (2010) and Gopalan et al. (2012). This model has been shown to be effective in many real-world settings, but so far, no learning approach with provable guarantees exists. In this paper, we provide a novel approach for learning these models and establish regimes where the communities can be recovered efficiently. The mixed membership community model of Airoldi et al.
(2008) has a number of attractive properties. It retains many of the convenient properties of the stochastic block model. For instance, the edges are assumed to be conditionally independent given the community memberships of the nodes in the network. At the same time, it allows communities to overlap and every individual to be fractionally involved in different communities. It includes the stochastic block model as a special case (corresponding to zero overlap among the different communities). This enables us to compare our learning guarantees with existing works for stochastic block models, and also to study how the extent of overlap among different communities affects the learning performance.
1.1. Summary of Results

We now summarize the main contributions of this paper. We propose a novel approach for learning the mixed membership community models of Airoldi et al. (2008). Our approach is a method-of-moments estimator and incorporates tensor spectral decomposition techniques.
We provide guarantees for our approach under a set of sufficient conditions. Finally, we compare our results to existing ones for the special case of the stochastic block model, where nodes belong to a single community.
Learning general mixed membership models: We present a unified approach for the mixed membership model of Airoldi et al. (2008). The extent of overlap between different communities in this model class is controlled (roughly) through a single scalar parameter, the Dirichlet concentration parameter α0 := Σᵢ αᵢ, where the community membership vectors are drawn from the Dirichlet distribution Dir(α). When α0 → 0, the mixed membership model degenerates to a stochastic block model. We propose a unified learning method for this class of models, and provide explicit scaling requirements in terms of the extent of community overlaps (through α0), the network size n, the number of communities k, and the average edge connectivity across various communities.
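To see how α0 controls the extent of overlap, consider the following minimal sketch (ours, not from the paper; it assumes NumPy, and the variable names and parameter choices are illustrative). It samples membership vectors from Dir(α0 · α̂) at several concentrations:

```python
import numpy as np

rng = np.random.default_rng(0)
k = 5                       # number of communities
alpha_hat = np.ones(k) / k  # relative expected community sizes (uniform here)

for alpha0 in [0.01, 1.0, 100.0]:
    # Dirichlet parameter alpha = alpha0 * alpha_hat
    pis = rng.dirichlet(alpha0 * alpha_hat, size=3)  # 3 sample membership vectors
    print(f"alpha0 = {alpha0}:")
    print(np.round(pis, 3))
```

For α0 = 0.01, each sampled row is close to a coordinate basis vector (a single dominant community, matching the stochastic block model limit), whereas for α0 = 100 the rows concentrate around α̂ (heavily mixed memberships).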
For instance, consider the special case where p is the probability of an intra-community edge and q is the probability of an inter-community edge. When the average community sizes are equal, we require that

\[
n = \tilde{\Omega}\big(k^2 (\alpha_0 + 1)^2\big), \qquad \frac{p - q}{\sqrt{p}} = \tilde{\Omega}\left(\frac{(\alpha_0 + 1)\, k}{n^{1/2}}\right). \tag{1}
\]
Thus, we require n to be large enough compared to the number of communities k, and the separation p − q to be large enough, so that the learning method can distinguish the different communities. Moreover, the scaling requirements become more stringent as α0 increases. This is intuitive since it is harder to learn communities with more overlap, and we quantify this scaling. We also provide error bounds for estimating the various parameters of the mixed membership model. Lastly, we establish zero-error guarantees for support recovery: our learning method correctly identifies (w.h.p.) all the significant memberships of a node, and also identifies the set of communities where a node does not have a strong presence.
Learning Stochastic Block Models and Comparison with Previous Results:
For the special case of stochastic block models (α0 → 0), the scaling requirements in (1) reduce to

\[
n = \tilde{\Omega}(k^2), \qquad \frac{p - q}{\sqrt{p}} = \tilde{\Omega}\left(\frac{k}{n^{1/2}}\right). \tag{2}
\]

The above requirements match the best known bounds (up to poly-log factors), and were previously achieved by Yudong et al. (2012) via convex optimization. In contrast, we propose an iterative non-convex approach involving tensor power iterations and linear algebraic techniques, and obtain similar guarantees for the stochastic block model. For a detailed comparison of learning guarantees under various methods for learning stochastic block models, see (Yudong et al., 2012).
Thus, we provide guaranteed recovery of the communities under the mixed membership model, and our scaling requirements in (1) explicitly incorporate the extent of community overlaps. Many real-world networks involve sparse community memberships, and the total number of communities is typically much larger than the extent of membership of a single individual, e.g. the hobbies/interests of a person, the university/company networks that a person belongs to, the set of transcription factors regulating a gene, and so on. In this regime of practical interest, where α0 = Θ(1), the scaling requirements in (1) match those of the stochastic block model in (2) (up to poly-log factors) without any degradation in learning performance. Thus, we establish that learning community models with sparse community memberships is akin to learning stochastic block models, and we present a unified learning approach and analysis for these models. To the best of our knowledge, this work is the first to establish polynomial-time learning guarantees for probabilistic network models with overlapping communities, and we provide a fast, iterative learning approach based on linear algebraic techniques and tensor power iterations.
The notation Ω̃(·), Õ(·) denotes Ω(·), O(·) up to poly-log factors.
There are many methods which achieve the best known scaling for n in (2), but have worse scaling for the separation p − q; this includes variants of the spectral clustering method, e.g. (Chaudhuri et al., 2012). See (Yudong et al., 2012) for a detailed comparison.
1.2. Overview of Techniques

We now describe the main techniques employed in our learning approach and in establishing the recovery guarantees.
Method of moments and subgraph counts: We propose an efficient learning algorithm based on low-order moments, viz., counts of small subgraphs. Specifically, we employ a third-order tensor which counts the number of 3-stars in the observed network. A 3-star is a star graph with three leaves, and we count the occurrences of such 3-stars across different groups of nodes. We establish that the (suitably adjusted) 3-star count tensor has a simple relationship with the model parameters when the network is drawn from a mixed membership community model. In particular, we propose a multi-linear transformation (termed whitening) under which the canonical polyadic (CP) decomposition of the tensor yields the model parameters and the community vectors. Note that the decomposition of a general tensor into its rank-one components is referred to as its CP decomposition (Kolda and Bader, 2009) and is in general NP-hard (Hillar and Lim, 2012). However, we reduce our learning problem to an orthogonal symmetric tensor decomposition, for which tractable methods exist, as described below.
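As an illustration of the 3-star counts (not the full estimator, which also involves adjustment and whitening), the following sketch is ours, not from the paper; the function name `third_moment_3stars` and the even four-way node split are assumptions. For each node x in a set X, it accumulates the outer product of x's edge vectors into three disjoint node sets A, B, C:

```python
import numpy as np

def third_moment_3stars(G, X, A, B, C):
    """Empirical 3-star count tensor.

    G is the n x n (0/1) adjacency matrix; X, A, B, C are disjoint index
    sets. Entry [a, b, c] is the fraction of nodes x in X linking to all
    three of a in A, b in B, c in C, i.e. forming a 3-star centered at x.
    """
    T = np.zeros((len(A), len(B), len(C)))
    for x in X:
        # outer product of the three edge vectors of the center node x
        T += np.einsum('a,b,c->abc', G[x, A], G[x, B], G[x, C])
    return T / len(X)

# toy usage: random graph on 40 nodes, split into four equal parts
rng = np.random.default_rng(0)
G = (rng.random((40, 40)) < 0.3).astype(float)
parts = np.array_split(np.arange(40), 4)
T = third_moment_3stars(G, *parts)
print(T.shape)  # (10, 10, 10)
```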
Tensor spectral decomposition via power iterations: Our tensor decomposition method is based on the popular tensor power iteration; see e.g. (Anandkumar et al., 2012a).
It is a simple iterative method to compute the stable eigen-pairs of a tensor. In this paper, we propose various modifications to the basic power method to strengthen the recovery guarantees under perturbations. For instance, we introduce a novel adaptive deflation technique. We optimize performance for the regime where the community overlaps are small.
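For intuition, here is a minimal sketch of the basic tensor power method with standard (non-adaptive) deflation, assuming a symmetric k × k × k input tensor; the function name and the restart/iteration counts are our own choices, and the paper's adaptive deflation variant strengthens this baseline under perturbations:

```python
import numpy as np

def tensor_power_method(T, n_restarts=10, n_iters=100):
    """Recover eigenpairs of a symmetric 3-tensor T via power iteration
    with deflation. Returns lists of eigenvalues and eigenvectors."""
    k = T.shape[0]
    T = T.copy()
    eigvals, eigvecs = [], []
    rng = np.random.default_rng(0)
    for _ in range(k):
        best_lam, best_v = -np.inf, None
        for _ in range(n_restarts):
            v = rng.normal(size=k)
            v /= np.linalg.norm(v)
            for _ in range(n_iters):
                # power update: v <- T(I, v, v) / ||T(I, v, v)||
                w = np.einsum('ijk,j,k->i', T, v, v)
                nrm = np.linalg.norm(w)
                if nrm < 1e-12:  # tensor numerically exhausted
                    break
                v = w / nrm
            lam = np.einsum('ijk,i,j,k->', T, v, v, v)
            if lam > best_lam:
                best_lam, best_v = lam, v
        eigvals.append(best_lam)
        eigvecs.append(best_v)
        # deflate: subtract the recovered rank-one component
        T -= best_lam * np.einsum('i,j,k->ijk', best_v, best_v, best_v)
    return eigvals, eigvecs

# sanity check on an orthogonally decomposable tensor
v1, v2 = np.array([1.0, 0, 0]), np.array([0, 1.0, 0])
T = 2 * np.einsum('i,j,k->ijk', v1, v1, v1) + np.einsum('i,j,k->ijk', v2, v2, v2)
lams, vs = tensor_power_method(T)
print(np.round(lams, 3))  # approximately [2., 1., 0.]
```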
Sample analysis: We establish that our learning approach correctly recovers the model parameters and the community memberships of all nodes under exact moments. We then carry out a careful analysis of the empirical graph moments, computed from the network observations. We establish tensor concentration bounds and also control the perturbation of the various quantities used by our learning algorithm via the matrix Bernstein inequality (Tropp, 2012, Thm. 1.4) and other inequalities. We impose the scaling requirements in (1) for the various concentration bounds to hold.
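For reference, the concentration tool invoked here is the matrix Bernstein inequality (Tropp, 2012, Thm. 1.4): if X_1, ..., X_m are independent, self-adjoint d × d random matrices with E[X_j] = 0 and λmax(X_j) ≤ R almost surely, then for all t ≥ 0,

\[
\Pr\!\Big[\lambda_{\max}\Big(\sum_j X_j\Big) \ge t\Big] \le d \cdot \exp\!\left(\frac{-t^2/2}{\sigma^2 + Rt/3}\right), \qquad \sigma^2 := \Big\|\sum_j \mathbb{E}[X_j^2]\Big\|.
\]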
1.3. Related Work

Many algorithms provide learning guarantees for stochastic block models. A popular method is based on spectral clustering (McSherry, 2001), where community memberships are inferred through projection onto the spectrum of the Laplacian matrix (or its variants).
This method is fast and easy to implement (via singular value decomposition). There are many variants of this method, e.g. the work of Chaudhuri et al. (2012) employs a normalized Laplacian matrix to handle degree heterogeneities. In contrast, the work of Yudong et al. (2012) uses convex optimization techniques via semi-definite programming for learning block models. For a detailed comparison of learning guarantees under various methods for learning stochastic block models, see Yudong et al. (2012). Recently, some non-probabilistic approaches to overlapping community models have been introduced by Arora et al. (2012) and Balcan et al. (2012). However, their setting is considerably different from the one in this paper. We leverage the recent developments of Anandkumar et al. (2012c,a,b) for learning topic models and other latent variable models based on the method of moments. These works consider learning such models from second- and third-order observed moments through linear algebraic and tensor-based techniques. We exploit the tensor power iteration method of Anandkumar et al. (2012b) and provide additional improvements to obtain stronger recovery guarantees. Moreover, the sample analysis is quite different in the community setting compared to the other latent variable models analyzed in previous works.
The community membership vectors πu are drawn i.i.d. from the Dirichlet distribution Dir(α) with parameter vector α, whose density is

\[
\mathbb{P}[\pi] = \frac{\Gamma(\alpha_0)}{\prod_{i \in [k]} \Gamma(\alpha_i)} \prod_{i \in [k]} \pi[i]^{\alpha_i - 1}, \qquad \alpha_0 := \sum_i \alpha_i,
\]

where Γ(·) is the Gamma function and the ratio of Gamma functions serves as the normalization constant.
Let α̂ denote the normalized parameter vector α/α0, where α0 := Σᵢ αᵢ. In particular, note that α̂ is a probability vector: Σᵢ α̂ᵢ = 1. Intuitively, α̂ denotes the relative expected sizes of the communities (since E[n⁻¹ Σ_{u∈[n]} πu[i]] = α̂ᵢ). Let α̂max be the largest entry of α̂, and α̂min the smallest. Our learning guarantees will depend on these parameters.
The stochastic block model is a limiting case of the mixed membership model obtained by setting the Dirichlet parameter to α = α0 · α̂, where the probability vector α̂ is held fixed and α0 → 0.
Our analysis can easily be extended to weighted adjacency matrices with bounded entries.
We consider directed networks in this paper, but note that the results also hold for undirected community models, where P is a symmetric matrix and an edge (u, v) is formed with probability πu⊤ P πv = πv⊤ P πu.
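To make the generative process concrete, the following sketch is ours (the helper `sample_network` and the toy parameter choices are illustrative, not from the paper); it samples a directed adjacency matrix where edge (u, v) appears with probability πu⊤ P πv:

```python
import numpy as np

def sample_network(Pi, P, rng):
    """Sample a directed adjacency matrix for the mixed membership model.

    Pi: n x k matrix whose row u is the membership vector pi_u.
    P:  k x k community connectivity matrix with entries in [0, 1].
    Edge u -> v is Bernoulli with probability pi_u^T P pi_v.
    """
    probs = Pi @ P @ Pi.T           # n x n matrix of edge probabilities
    np.fill_diagonal(probs, 0.0)    # no self-loops
    return (rng.random(probs.shape) < probs).astype(int)

# toy usage: n = 100 nodes, k = 3 communities, Dirichlet memberships
rng = np.random.default_rng(0)
n, k, alpha0 = 100, 3, 0.5
Pi = rng.dirichlet(alpha0 * np.ones(k) / k, size=n)
P = 0.05 + 0.25 * np.eye(k)  # q = 0.05 inter-community, p = 0.30 intra-community
G = sample_network(Pi, P, rng)
print(G.sum(), "edges")
```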