FREE ELECTRONIC LIBRARY - Dissertations, online materials

Pages:   || 2 |

«JMLR: Workshop and Conference Proceedings vol 30 (2013) 1–62 A Tensor Spectral Approach to Learning Mixed Membership Community Models Animashree ...»

-- [ Page 1 ] --

JMLR: Workshop and Conference Proceedings vol 30 (2013) 1–62

A Tensor Spectral Approach to Learning Mixed Membership

Community Models

Animashree Anandkumar a.anandkumar@uci.edu

University of California, Irvine

Rong Ge rongge@cs.princeton.edu

Princeton University

Daniel Hsu dahsu@microsoft.com

Microsoft Research Sham M. Kakade skakade@microsoft.com Microsoft Research Abstract Detecting hidden communities from observed interactions is a classical problem. Theoretical analysis of community detection has so far been mostly limited to models with non-overlapping communities such as the stochastic block model. In this paper, we provide guaranteed community detection for a family of probabilistic network models with overlapping communities, termed as the mixed membership Dirichlet model, first introduced in Airoldi et al. (2008). This model allows for nodes to have fractional memberships in multiple communities and assumes that the community memberships are drawn from a Dirichlet distribution. Moreover, it contains the stochastic block model as a special case.

We propose a unified approach to learning communities in these models via a tensor spectral decomposition approach. Our estimator uses low-order moment tensor of the observed network, consisting of 3-star counts. Our learning method is based on simple linear algebraic operations such as singular value decomposition and tensor power iterations. We provide guaranteed recovery of community memberships and model parameters, and present a careful finite sample analysis of our learning method. Additionally, our results match the best known scaling requirements for the special case of the (homogeneous) stochastic block model.

Keywords: Community detection, spectral methods, tensor methods, moment-based estimation, mixed membership models.

1. Introduction1 Studying communities forms an integral part of social network analysis. A community generally refers to a group of individuals with shared interests (e.g. music, sports), or relationships (e.g. friends, co-workers). Various probabilistic and non-probabilistic network models attempt to explain community formation. In addition, they also attempt to quantify interactions and the extent of overlap between different communities, relative sizes among Part of this work was done when AA and RG were visiting MSR New England. AA is supported in part by the NSF Career award CCF-1254106, NSF Award CCF-1219234, AFOSR Award FA9550-10-1-0310 and the ARO Award W911NF-12-1-0404.

–  –  –

the communities, and various other network properties. Studying such community models is also of interest in other domains, e.g. in biological networks.

While there exists a vast literature on community models, learning these models is typically challenging, and various heuristics such as Markov Chain Monte Carlo (MCMC) or variational expectation maximization (EM) are employed in practice. Such heuristics tend to be unreliable and scale poorly for large networks. On the other hand, community models with guaranteed learning methods tend to be restrictive. A popular class of probabilistic models, termed as the stochastic blockmodels, have been widely studied and enjoy strong theoretical learning guarantees, e.g. (White et al., 1976; Holland et al., 1983; Fienberg et al., 1985; Wang and Wong, 1987; Snijders and Nowicki, 1997; McSherry, 2001). However, they posit that an individual belongs to a single community, which does not hold in most real settings (Palla et al., 2005).

In this paper, we consider a class of mixed membership community models, originally introduced by Airoldi et al. (2008), and recently employed by Xing et al. (2010) and Gopalan et al. (2012). This model has been shown to be effective in many real-world settings, but so far, no learning approach exists with provable guarantees. In this paper, we provide a novel learning approach for learning these models and establish regimes where the communities can be recovered efficiently. The mixed membership community model of Airoldi et al.

(2008) has a number of attractive properties. It retains many of the convenient properties of the stochastic block model. For instance, conditional independence of the edges is assumed, given the community memberships of the nodes in the network. At the same time, it allows for communities to overlap, and for every individual to be fractionally involved in different communities. It includes the stochastic block model as a special case (corresponding to zero overlap among the different communities). This enables us to compare our learning guarantees with existing works for stochastic block models, and also study how the extent of overlap among different communities affects the learning performance.

1.1. Summary of Results We now summarize the main contributions of this paper. We propose a novel approach for learning mixed membership community models of Airoldi et al. (2008). Our approach is a method-of-moments estimator and incorporates tensor spectral decomposition techniques.

We provide guarantees for our approach under a set of sufficient conditions. Finally, we compare our results to existing ones for the special case of the stochastic block model, where nodes belong to a single community.

Learning general mixed membership models: We present a unified approach for the mixed membership model of Airoldi et al. (2008). The extent of overlap between different communities in this model class is controlled (roughly) through a single scalar parameter, termed as the Dirichlet concentration parameter α0 := i αi, when the community membership vectors are drawn from the Dirichlet distribution Dir(α). When α0 → 0, the mixed membership model degenerates to a stochastic block model. We propose a unified learning method for the class of mixed membership models. We provide explicit scaling requirements in terms of the extent of community overlaps (through α0 ), the network size n, the number of communities k, and the average edge connectivity across various communities.

For instance, for the special case, where p is the probability of an intra-community edge, A Tensor Spectral Approach to Learning Mixed Membership Community Models and q corresponds to the probability of inter-community connectivity, when the average community sizes are equal, we require that2

–  –  –

Thus, we require n to be large enough compared to the number of communities k, and for the separation p−q to be large enough, so that the learning method can distinguish the different communities. Moreover, we see that the scaling requirements become more stringent as α0 increases. This is intuitive since it is harder to learn communities with more overlap, and we quantify this scaling. We also quantify the error bounds for estimating various parameters of the mixed membership model. Lastly, we establish zero-error guarantees for support recovery: our learning method correctly identifies (w.h.p) all the significant memberships of a node and also identifies the set of communities where a node does not have a strong presence.

Learning Stochastic Block Models and Comparison with Previous Results:

For the special case of stochastic block models (α0 → 0), the scaling requirements in (2) reduces to p−q k ˜ ˜ n = Ω(k 2 ), √ =Ω, (2) 1/2 p n The above requirements match the best known bounds3 (up to poly-log factors), and were previously achieved by Yudong et al. (2012) via convex optimization. In contrast, we propose an iterative non-convex approach involving tensor power iterations and linear algebraic techniques, and obtain similar guarantees for the stochastic block model. For a detailed comparison of learning guarantees under various methods for learning stochastic block models, see (Yudong et al., 2012).

Thus, we provide guaranteed recovery of the communities under the mixed membership model, and our scaling requirements in (1) explicitly incorporate the extent of community overlaps. Many real-world networks involve sparse community memberships and the total number of communities is typically much larger than the extent of membership of a single individual, e.g. hobbies/interests of a person, university/company networks that a person belongs to, the set of transcription factors regulating a gene, and so on. Thus, we see that in this regime of practical interest, where α0 = Θ(1), the scaling requirements in (1) match those of the stochastic block model in (2) (up to polylog factors) without any degradation in learning performance. Thus, we establish that learning community models with sparse community memberships is akin to learning stochastic block models, and we present a unified learning approach and analysis for these models. To the best of our knowledge, this work is the first to establish polynomial time learning guarantees for probabilistic network models with overlapping communities, and we provide a fast and an iterative learning approach through linear algebraic techniques and tensor power iterations.

˜ ˜ The notation Ω(·), O(·) denotes Ω(·), O(·) up to poly-log factors.

There are many methods which achieve the best known scaling for n in (2), but have worse scaling for the separation p − q. This includes variants of the spectral clustering method, e.g. (Chaudhuri et al., 2012).

See (Yudong et al., 2012) for a detailed comparison.

Anandkumar Ge Hsu Kakade

1.2. Overview of Techniques We now describe the main techniques employed in our learning approach and in establishing the recovery guarantees.

Method of moments and subgraph counts: We propose an efficient learning algorithm based on low order moments, viz., counts of small subgraphs. Specifically, we employ a third-order tensor which counts the number of 3-stars in the observed network. A 3star is a star graph with three leaves and we count the occurrences of such 3-stars across different groups of nodes. We establish that (suitably adjusted) 3-star count tensor has a simple relationship with the model parameters, when the network is drawn from a mixed membership community model. In particular, we propose a multi-linear transformation (termed as whitening) under which the canonical polyadic (CP) decomposition of the tensor yields the model parameters and the community vectors. Note that the decomposition of a general tensor into its rank-one components is referred to as its CP decomposition (Kolda and Bader, 2009) and is in general NP-hard (Hillar and Lim, 2012). However, we reduce our learning problem to an orthogonal symmetric tensor decomposition, for which tractable decomposition exists, as described below.

Tensor spectral decomposition via power iterations: Our tensor decomposition method is based on the popular tensor power iterations, e.g. see (Anandkumar et al., 2012a).

It is a simple iterative method to compute the stable eigen-pairs of a tensor. In this paper, we propose various modifications to the basic power method to strengthen the recovery guarantees under perturbations. For instance, we introduce a novel adaptive deflation techniques. We optimize performance for the regime where the community overlaps are small.

Sample analysis: We establish that our learning approach correctly recovers the model parameters and the community memberships of all nodes under exact moments. We then carry out a careful analysis of the empirical graph moments, computed using the network observations. We establish tensor concentration bounds and also control the perturbation of the various quantities used by our learning algorithm via matrix Bernstein’s inequality (Tropp, 2012, thm. 1.4) and other inequalities. We impose the scaling requirements in (1) for various concentration bounds to hold.

1.3. Related Work Many algorithms provide learning guarantees for stochastic block models. A popular method is based on spectral clustering (McSherry, 2001), where community memberships are inferred through projection onto the spectrum of the Laplacian matrix (or its variants).

This method is fast and easy to implement (via singular value decomposition). There are many variants of this method, e.g. the work by Chaudhuri et al. (2012) employs normalized Laplacian matrix to handle degree heterogeneities. In contrast, the work of (Yudong et al., 2012) uses convex optimization techniques via semi-definite programming learning block models. For a detailed comparison of learning guarantees under various methods for learning stochastic block models, see Yudong et al. (2012). Recently, some non-probabilistic approaches have been introduced with overlapping community models by Arora et al. (2012) and Balcan et al. (2012). However, their setting is considerably different than the one in this A Tensor Spectral Approach to Learning Mixed Membership Community Models paper. We leverage the recent developments from Anandkumar et al. (2012c,a,b) for learning topic models and other latent variable models based on the method of moments. They consider learning these models from second- and third-order observed moments through linear algebraic and tensor-based techniques. We exploit the tensor power iteration method of Anandkumar et al. (2012b) and provide additional improvements to obtain stronger recovery guarantees. Moreover, the sample analysis is quite different in the community setting compared to other latent variable models analyzed in the previous works.

–  –  –

where Γ(·) is the Gamma function and the ratio of the Gamma function serves as the normalization constant.

Let α denote the normalized parameter vector α/α0, where α0 := i αi. In particular, note that α is a probability vector: i αi = 1. Intuitively, α denotes the relative expected sizes of the communities (since E[n−1 u∈[n] πu [i]] = αi ). Let αmax be the largest entry in α, and αmin be the smallest entry. Our learning guarantees will depend on these parameters.

The stochastic block model is a limiting case of the mixed membership model when the Dirichlet parameter is α = α0 · α, where the probability vector α is held fixed and α0 → 0.

Our analysis can easily be extended to weighted adjacency matrices with bounded entries.

We consider directed networks in this paper, but note that the results also hold for undirected community models, where P is a symmetric matrix, and an edge (u, v) is formed with probability πu P πv = πv P πu.

Anandkumar Ge Hsu Kakade

Pages:   || 2 |

Similar works:

«THE ASSERT SET OF TOOLS FOR ENGINEERING (TASTE): DEMONSTRATOR, HW/SW CODESIGN, AND FUTURE Marc Pollina (1), Yann Leclerc (1), Eric Conquet (2), Maxime Perrotin(2), Guy Bois (3), Laurent Moss (3) (1) M3Systems, 26, rue du soleil levant, 31410 Lavernose, France, Email: {pollina,leclerc}@m3systems.net (2) ESA-ESTEC, Noordwijk, The Netherlands, Email: {eric.conquet, maxime.perrotin}@esa.int (3) Space Codesign Systems Inc., 450 St-Pierre, suite 1010, Montreal, Quebec, H2Y 2M9 Canada, Email:...»

«The Needles-In-Haystack Problem Katherine Moreland1 and Klaus Truemper2 1 The MITRE Corporation, McLean, VA 22102, U.S.A. 2 Department of Computer Science, University of Texas at Dallas, Richardson, TX 75083, U.S.A. Abstract. We consider a new data mining problem of detecting the members of a rare class of data, the needles, that have been hidden in a set of records, the haystack. Besides the haystack, a single instance of a needle is given. It is assumed that members of the needle class are...»

«B 1981 Suppliment tal-Gazzetta tal-Gvern ta’ Malta, Nru. 17,907, 11 ta’ April, 2006 Taqsima B A.L. 99 ta’ l-2006 ATT DWAR L-EDUKAZZJONI (KAP. 327) Ir-Regolamenti ta’ l-2006 tal-Kors għall-Postgraduate Diploma – P.G. Dip. – u għall-Grad ta’ Master of Science – M.Sc. – in Integrated Product Development taħt il-patroċinju tal-Fakulta’ ta’ l-Inġinerija BIS-SAĦĦA tas-setgħat mogħtija lilu bl-artikoli 30 (5) u 31 (6) ta’ l-Att dwar l-Edukazzjoni (Kap. 327),...»

«BA World Languages Frequently Asked Questions Question: I am currently studying both French and German for my leaving cert and I was wondering if this course will allow me to continue my study of both these languages at advanced level or can only one language be taken from advanced level? I was also wondering if I could possibly study three languages with this course? Answer: In First Arts (World Languages) you can take both 15 credits of French (for which all students must have a pass in LC)...»

«Proyecto PNUD ARG 07/G35 “Manejo Sustentable de Ecosistemas Áridos y Semiáridos para el Control de la Desertificación en la Patagonia” Informe final GEF Proyecto: “Desarrollo del circuito productivo de fibras de Guanaco en la Estepa Patagónica” “Manejo Sustentable de Ecosistemas Áridos y Semiáridos para el Control de la Desertificación en la Patagonia” PNUD-GEF ARG07/G35, Secretaría de Ambiente y Desarrollo sustentable de la Nación. Índice 1. Introducción..2 2. Camélidos...»

«TEN KEY QUESTIONS TO ASK BEFORE JOINING A BOARD CARL E. METZGER AND BRIAN H. MUKHERJEE, GOODWIN PROCTER LLP Congratulations. You have been invited to serve on a company’s board of directors. The invitation no doubt acknowledges the value of your skills and experience, and you are excited about the idea of helping the company find success. There are some important issues to consider, however, before taking up the offer. Simply put, serving on a board of directors has never been more risky....»

«Kalamullah.Com Contents Publisher’s Foreword 10 Introduction 17 They did not benefit 19 1. What are we going to learn? 22 2. Why do we search for skills? 24 3. Improve yourself 28 4. Do not cry over spilt milk 32 5. Be unique 35 6. Who is the most beloved to you? 39 7. Enjoy the skills 48 8. With the poor 52 9. With women 55 10. With children. 61 11. With slaves and servants 66 12. With adversaries 69 13. 5 With animals 78 14. A hundred ways to win people’s hearts 82 15. Purify your...»

«1 WHITE PRIVILEGE AND WHITE GUILT An Analysis of “White Privilege and White Guilt” Simone Kirwan Faculty of Education University of Manitoba WHITE PRIVILEGE AND WHITE GUILT Introduction In the recent sociological literature, there is increasing interest in ‘white privilege’ and ‘white guilt’. Both these concepts have affected me personally and professionally. Thus, when I began reading about white privilege, I became very interested in this idea. However, I was not completed...»

«STRATEGIC PLAN (Revised, May 2007) MISSION STATEMENT The Department of Chemistry and Biochemistry is committed to providing high quality education in the chemical and biochemical sciences for undergraduate and graduate students, producing research contributions that are recognized nationally and internationally, and making service contributions at all levels. VISION STATEMENT The Department of Chemistry and Biochemistry aspires to advance our reputation to top 50 national status, noted for...»

«Building with god: Battling adversity inside (nehemiah 5:1-13) Introduction The last time we were with Nehemiah he was dealing with adversity from outside the camp. The problems were the danger of physical harm from enemies and discouragement due to ridicule, reviling, and the difficulty of working in the midst of tremendous rubble and ongoing threats. Why were their enemies threatened? Rebuilding efforts were bearing fruit and drawing the attention and spiteful commentary of surrounding...»

«What is algebraic in process theory? Luttik, S.P. Published: 01/01/2006 Document Version Publisher’s PDF, also known as Version of Record (includes final page, issue and volume numbers) Please check the document version of this publication: • A submitted manuscript is the author’s version of the article upon submission and before peer-review. There can be important differences between the submitted version and the official published version of record. People interested in the research are...»

«WWW.YAZDANPRESS.COM WWW.YAZDANPRESS.COM er!™ g Everything Easi Makin ging Mana NE ALL-IN-O 10 1 BOOKS IN • Managing • Leadership • Project Management • Leading Business Change • Conflict Resolution At Work • Critical Conversations • The Leadership Brain • Performance Appraisals • Communicating Effectively • Managing Teams WWW.YAZDANPRESS.COM WWW.YAZDANPRESS.COM Get More and Do More at Dummies.com® Start with FREE Cheat Sheets Cheat Sheets include • Checklists • Charts...»

<<  HOME   |    CONTACTS
2016 www.dissertation.xlibx.info - Dissertations, online materials

Materials of this site are available for review, all rights belong to their respective owners.
If you do not agree with the fact that your material is placed on this site, please, email us, we will within 1-2 business days delete him.