Clustering Social Networks


Clustering Social Networks. Isabelle Stanton, University of Virginia. Joint work with Nina Mishra, Robert Schreiber, and Robert E. Tarjan. Outline: Motivation; Previous Work; Combinatorial properties; ρ-champions; An algorithm; Evaluation of the algorithm.

Transcript of Clustering Social Networks

  • Clustering Social Networks

    Isabelle Stanton, University of Virginia

    Joint work with Nina Mishra, Robert Schreiber, and Robert E. Tarjan

  • Outline

    Motivation
    Previous Work
    Combinatorial properties
    ρ-champions
    An algorithm
    Evaluation of the algorithm

  • Motivation

    Many large social networks:

    A fundamental problem is finding communities automatically
    Viral and targeted marketing
    Help form stronger communities

  • Previous Work

    Modularity: compares the edge distribution with the expected distribution of a random graph with the same degrees
    M.E.J. Newman 2002
    Spectral methods: cut the graph based on eigenvectors of the matrix
    Kannan, Vempala, Vetta 2000; Spielman and Teng 1996; Shi and Malik 2000; Kempe and McSherry 2004; Karypis and Kumar 1998; and many others
    Both require disjoint partitions of all elements

  • Communities in Social Networks

    Disjoint partitionings are not good for social networks

  • (α, β)-Clusters

    C is an (α, β)-cluster if:
    Internally dense: every vertex in the cluster neighbors at least a β fraction of the cluster
    Externally sparse: every vertex outside the cluster neighbors at most an α fraction of the cluster
    Figure examples: (1/4, 1) and (1/4, 3/4)
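    The two conditions on this slide can be checked directly. The sketch below is a minimal illustration, not the authors' code. It assumes the graph is a dict mapping each vertex to its set of neighbors, and it assumes a vertex counts as its own neighbor (so that a clique has β = 1); the names `closed_neighbors`, `is_alpha_beta_cluster`, and the toy `graph` are hypothetical.

```python
def closed_neighbors(graph, v):
    """Neighbors of v plus v itself (assumed convention: a vertex neighbors itself)."""
    return graph[v] | {v}


def is_alpha_beta_cluster(graph, cluster, alpha, beta):
    """Return True if `cluster` is internally dense and externally sparse."""
    cluster = set(cluster)
    size = len(cluster)
    if size == 0:
        return False
    for v in graph:
        inside = len(closed_neighbors(graph, v) & cluster)
        if v in cluster:
            # Internally dense: v must neighbor at least beta * |C| of the cluster.
            if inside < beta * size:
                return False
        elif inside > alpha * size:
            # Externally sparse: v may neighbor at most alpha * |C| of the cluster.
            return False
    return True


# Toy graph (hypothetical): two triangles joined by the single edge 2-3.
graph = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}

print(is_alpha_beta_cluster(graph, {0, 1, 2}, alpha=0.5, beta=1.0))
print(is_alpha_beta_cluster(graph, {0, 1, 2}, alpha=0.25, beta=1.0))
```

    In the toy graph, vertex 3 neighbors one of the three vertices in {0, 1, 2}, so the triangle passes with α = 0.5 and fails with α = 0.25.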

  • Previous Work: (α, β)-clusters

    Solved areas:

    (1 - ε, 1): Tsukiyama et al., Johnson et al.
    (0, β): connected components
    (β(1 - ε), β): Abello et al., Hartuv and Shamir
    β > 1/2 + (ρ + α)/2: our work

  • Fundamental Questions

    How many (α, β)-clusters can a graph contain? Depends on α and β
    Can (α, β)-clusters overlap? Yes, and there are bounds
    Can (α, β)-clusters contain other (α, β)-clusters? Yes, but it can be prevented

  • ρ-Champions

    Wes Anderson
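    The speaker notes define a ρ-champion as a member of an (α, β)-cluster C that has at most ρ|C| neighbors outside C, with ρ < 1. A minimal sketch of that test follows, assuming a dict-of-sets adjacency representation; the function name `rho_champions` and the toy `graph` are hypothetical and only illustrate the definition.

```python
def rho_champions(graph, cluster, rho):
    """Return the members of `cluster` with at most rho * |cluster| outside neighbors."""
    cluster = set(cluster)
    limit = rho * len(cluster)
    return {c for c in cluster if len(graph[c] - cluster) <= limit}


# Toy graph (hypothetical): two triangles joined by the single edge 2-3.
graph = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}

print(sorted(rho_champions(graph, {0, 1, 2}, rho=0.3)))
print(sorted(rho_champions(graph, {0, 1, 2}, rho=0.5)))
```

    With ρ = 0.3 the limit is 0.9 outside neighbors, so vertex 2 (which has one neighbor outside the triangle) does not qualify; with ρ = 0.5 all three vertices do.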

  • Intuition behind the Algorithm

    Let c be a ρ-champion of a cluster C

    If v is in C, then v and c share at least (2β - 1)|C| neighbors

    If v is outside C, then v and c share at most (ρ + α)|C| neighbors
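    Both bounds follow from counting, as the sketch below shows. The notation N(x) for the neighbor set of x is an assumption introduced here (with a vertex counted as its own neighbor); it does not appear on the slides.

```latex
\begin{align*}
v \in C:\quad |N(v) \cap N(c)|
  &\ge |N(v) \cap C| + |N(c) \cap C| - |C|
   \ge \beta|C| + \beta|C| - |C| = (2\beta - 1)|C| \\
v \notin C:\quad |N(v) \cap N(c)|
  &\le |N(v) \cap C| + |N(c) \setminus C|
   \le \alpha|C| + \rho|C| = (\rho + \alpha)|C|
\end{align*}
```

    The two cases can be told apart whenever (2β - 1) > (ρ + α), which is the same condition as β > 1/2 + (ρ + α)/2.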


  • Algorithm

    Input: α, β, G, s = size of cluster
    Output: All (α, β)-clusters with ρ-champions

    for each c in V do
        C = ∅
        for each v within two steps of c do
            if v and c share at least (2β - 1)s neighbors then add v to C
        if C is an (α, β)-cluster then output C
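    The pseudocode above can be sketched in Python as follows. This is an illustrative, unoptimized version, not the authors' implementation: it assumes a dict-of-sets adjacency representation and that a vertex counts as its own neighbor, and it collects results in a set so that a cluster reached from several champions is reported once (a choice made here, not stated on the slide). All function names and the toy `graph` are hypothetical.

```python
def closed(graph, v):
    """Neighbors of v plus v itself (assumed convention)."""
    return graph[v] | {v}


def is_cluster(graph, cluster, alpha, beta):
    """Check internal density (beta) and external sparsity (alpha)."""
    size = len(cluster)
    if size == 0:
        return False
    for v in graph:
        inside = len(closed(graph, v) & cluster)
        if v in cluster and inside < beta * size:
            return False
        if v not in cluster and inside > alpha * size:
            return False
    return True


def find_clusters(graph, alpha, beta, s):
    """Try every vertex c as a champion and keep the candidate sets that pass."""
    threshold = (2 * beta - 1) * s
    found = set()
    for c in graph:
        nc = closed(graph, c)
        # Vertices within two steps of c.
        candidates = set()
        for u in nc:
            candidates |= closed(graph, u)
        # Keep the vertices that share enough neighbors with c.
        cluster = frozenset(
            v for v in candidates if len(closed(graph, v) & nc) >= threshold
        )
        if is_cluster(graph, cluster, alpha, beta):
            found.add(cluster)
    return found


# Toy graph (hypothetical): two triangles joined by the single edge 2-3.
graph = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}

for cluster in sorted(find_clusters(graph, alpha=0.5, beta=1.0, s=3), key=sorted):
    print(sorted(cluster))
```

    On the toy graph with α = 0.5, β = 1 and s = 3, every vertex of a triangle recovers that triangle, so the two triangles are the only sets reported.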

  • Algorithmic Guarantees

    Claim: Our algorithm will find all clusters where β > 1/2 + (ρ + α)/2
    Runs in O(d^0.7 n^1.9 + n^(2+o(1))) time, where d is the average degree
    d is small for social networks, so the running time is roughly O(n^2)

  • Evaluation

    Do ρ-champions exist in real graphs?

    Tsukiyama's algorithm finds all maximal cliques ((1 - ε, 1)-clusters) in a graph
    We compare our algorithm's output with Tsukiyama's ground truth

  • HEP Co-Author Dataset Results

    Found 115 of 126 clusters (~90%)

  • Theory Co-Author Dataset Results

    Found 797 of 854 clusters (~93%)

  • LiveJournal Dataset Results

    Too big to run Tsukiyama. Found 4289 clusters; 876 have large ρ-champions

  • Future Work

    Algorithms for β < 1/2
    Relaxing the ρ-champion restriction
    Weighted and directed graphs
    Decentralized algorithms
    Streaming algorithms

  • Conclusions

    Defined (α, β)-clusters
    Explored some combinatorial properties
    Introduced ρ-champions
    Developed an algorithm for a subset of the problem

  • Timing

    * Estimated running time: 25 weeks
    All experiments were written in Python and run on a machine with two dual-core 3 GHz Intel Xeons and 16 GB of RAM

  • Datasets

    High Energy Physics co-authorship graph
    Theory co-authorship graph
    A subset of

    (v) = the neighbors and neighbors' neighbors of v

  • Combinatorial Properties - Overlaps

    Let A and B be (α, β)-clusters with |A| = |B|
    Theorem: A and B overlap by at most (1 - (β - α))|A| vertices
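    One way to see where the overlap bound comes from is the counting argument below. It is a sketch under stated assumptions rather than the authors' proof: it assumes A ≠ B, picks any vertex v in A but not in B, and uses N(v) for the neighbor set of v (notation introduced here).

```latex
\begin{align*}
|N(v) \cap A| &\ge \beta|A|
  && \text{($v \in A$, internal density of $A$)} \\
|N(v) \cap A \cap B| &\le |N(v) \cap B| \le \alpha|B| = \alpha|A|
  && \text{($v \notin B$, external sparsity of $B$)} \\
|A \setminus B| &\ge |N(v) \cap (A \setminus B)| \ge (\beta - \alpha)|A| \\
|A \cap B| &= |A| - |A \setminus B| \le (1 - (\beta - \alpha))|A|
\end{align*}
```

    For example, with β = 1 and α = 1/4, two distinct clusters of equal size can share at most a quarter of their vertices.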

  • Previous Work - Modularity

    Compares the edge distribution with the expected distribution of a random graph with the same degrees
    Many competitive methods developed
    Inherently defined as a partitioning
    Introduced by Newman (2002)

    Introduce what a community is: internally dense, externally sparse.
    Explain viral marketing.
    Graphs are interesting to sociologists, anthropologists, etc. for studying how societies work.
    Partitions require that every vertex is in exactly one cluster.
    Use this to explain why partitionings are bad.
    Natural way of describing communities.
    Allows overlapping clusters.
    Does not require that every element be clustered.

    If every member of the cluster has more neighbours outside the cluster than inside the cluster, then what defines the cluster? We argue that ρ-champions are a natural assumption for communities because they assume there is someone who is more into the cluster than anything else and defines the cluster. Formally, a ρ-champion is someone who is in an (α, β)-cluster C and has at most ρ|C| neighbours outside C, where ρ < 1.
    If (2β - 1) > (ρ + α) then we can distinguish between these two sets.
    We run this for the sizes of interest.
    We filtered Tsukiyama's output so that it was a reasonable comparison. Only cliques of size greater than 5 with α < 0.5 were considered.
    Talk about how we didn't find the ones without predicted ρ-champions.
    Same story here.
    Estimated 25 weeks. We found 4289 LJ cliques; 876 have ρ-champions beyond the predicted limit.
    Streaming: social networks are not static; how does this affect the algorithm?
    The following can be proved.
    This theorem shows that clusters do not arbitrarily overlap.
    Intuitively, finds areas with more edges than expected internally and fewer than expected externally.