Post on 14-Dec-2015
KDD'13 1
Denser than the Densest Subgraph:
Extracting Optimal Quasi-Cliques with Quality Guarantees
Charalampos (Babis) E. Tsourakakis charalampos.tsourakakis@aalto.fi
KDD 2013
Francesco Bonchi, Yahoo! Research
Aristides Gionis, Aalto University
Francesco Gullo, Yahoo! Research
Maria Tsiarli, University of Pittsburgh
Denser than the densest
The densest subgraph problem is very popular in practice. However, it is not what we want for many applications.
δ = edge density, D = diameter, τ = triangle density
Graph mining applications
Thematic communities and spam link farms [Gibson, Kumar, Tomkins ’05]
Graph visualization [Alvarez-Hamelin et al. ’05]
Real-time story identification [Angel et al. ’12]
Motif detection [Batzoglou Lab ’06]
Epilepsy prediction [Iasemidis et al. ’01]
Finding correlated genes [Horvath et al.]
Many more...
Measures
Clique: each vertex in S connects to every other vertex in S.
α-Quasi-clique: the set S has at least α|S|(|S|-1)/2 edges.
k-core: every vertex connects to at least k other vertices in S.
K4
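The three measures above can be checked directly on a small graph. The following is a minimal sketch, assuming the graph is stored as a dictionary `adj` mapping each vertex to the set of its neighbours; the function names and the example graph (K4 plus one pendant vertex) are illustrative choices, not part of the talk.

```python
def induced_edge_count(adj, S):
    """Number of edges with both endpoints in S."""
    S = set(S)
    # Each induced edge is counted once from each endpoint.
    return sum(len(adj[v] & S) for v in S) // 2

def is_clique(adj, S):
    """Every vertex in S is connected to every other vertex in S."""
    n = len(S)
    return induced_edge_count(adj, S) == n * (n - 1) // 2

def is_quasi_clique(adj, S, alpha):
    """S has at least alpha * |S| * (|S| - 1) / 2 induced edges."""
    n = len(S)
    return induced_edge_count(adj, S) >= alpha * n * (n - 1) / 2

def is_k_core(adj, S, k):
    """Every vertex in S has at least k neighbours inside S."""
    S = set(S)
    return all(len(adj[v] & S) >= k for v in S)

# Hypothetical example: K4 on vertices 0..3, plus vertex 4 attached to vertex 0.
adj = {0: {1, 2, 3, 4}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2}, 4: {0}}
```

On this example, `{0, 1, 2, 3}` is a clique and a 3-core, while the full vertex set has 7 of 10 possible edges, so it is a 0.6-quasi-clique but not a 0.8-quasi-clique.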
Contributions
General framework which subsumes popular density functions.
Optimal quasi-cliques.
An algorithm with additive error guarantees and a local-search heuristic.
Variants
  Top-k optimal quasi-cliques
  Successful team formation
Contributions
Experimental evaluation
  Synthetic graphs
  Real graphs
Applications
  Successful team formation of computer scientists
  Highly-correlated genes from microarray datasets
First, some related work.
Cliques
K4
Maximum clique problem: find a clique of maximum possible size. NP-complete problem.
Unless P=NP, there cannot be a polynomial-time algorithm that approximates the maximum clique problem within a factor better than n^(1-ε) for any ε>0 [Håstad ‘99].
(Some) Density Functions
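Edge density δ(S) = e[S] / (|S| choose 2), where e[S] is the number of edges induced by S: a single edge always achieves the maximum possible δ(S).
Average degree d(S) = 2e[S]/|S|: densest subgraph problem.
Average degree with |S| = k: k-densest subgraph problem.
Average degree with |S| ≥ k (|S| ≤ k): DalkS (DamkS).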
Densest Subgraph Problem
Maximize average degree
Solvable in polynomial time
  Max flows (Goldberg)
  LP relaxation (Charikar)
Fast ½-approximation algorithm (Charikar)
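The fast ½-approximation is usually described as greedy peeling: repeatedly delete a minimum-degree vertex and keep the intermediate vertex set with the largest average degree. Below is a minimal sketch under the assumption that the graph is a dictionary `adj` from vertex to neighbour set with mutually comparable vertex labels (for example integers); the function name is hypothetical. It uses a binary heap, so it runs in O(m log n) rather than linear time.

```python
import heapq

def greedy_densest_subgraph(adj):
    """Peel minimum-degree vertices; return (best vertex set, its average degree)."""
    deg = {v: len(nbrs) for v, nbrs in adj.items()}  # degree inside the live graph
    alive = set(adj)
    m = sum(deg.values()) // 2                       # edges among live vertices
    heap = [(d, v) for v, d in deg.items()]
    heapq.heapify(heap)
    removed = []                                     # peeling order
    best_avg = 2 * m / len(alive) if alive else 0.0
    best_removed = 0                                 # prefix of `removed` giving the best set
    while alive:
        d, v = heapq.heappop(heap)
        if v not in alive or d != deg[v]:
            continue                                 # stale heap entry
        alive.remove(v)
        removed.append(v)
        m -= deg[v]
        for u in adj[v]:
            if u in alive:
                deg[u] -= 1
                heapq.heappush(heap, (deg[u], u))
        if alive:
            avg = 2 * m / len(alive)
            if avg > best_avg:
                best_avg, best_removed = avg, len(removed)
    return set(adj) - set(removed[:best_removed]), best_avg

# Hypothetical example: K4 on vertices 0..3, plus vertex 4 attached to vertex 0.
example = {0: {1, 2, 3, 4}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2}, 4: {0}}
best_set, best_avg = greedy_densest_subgraph(example)
```

On the example graph the pendant vertex is peeled first, after which the remaining K4 has average degree 3, the best value seen during peeling.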
k-Densest subgraph
The k-densest subgraph problem is NP-hard.
Feige, Kortsarz, Peleg; Bhaskara, Charikar, Chlamtac, Vijayaraghavan; Asahiro et al.; Andersen; Khuller, Saha [approximation algorithms]; Khot [no PTAS].
Quasicliques
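A set S of vertices is an α-quasiclique if e[S] ≥ α|S|(|S|-1)/2, where e[S] is the number of edges induced by S, i.e., if its edge density is at least α.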
[Uno ’10] introduces an algorithm to enumerate all α-quasicliques.
Edge-Surplus Framework
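For a set of vertices S define f_α(S) = g(e[S]) - α h(|S|),
where e[S] is the number of edges induced by S, g and h are both strictly increasing, and α > 0.
Optimal (α,g,h)-edge-surplus problem: find S* such that f_α(S*) ≥ f_α(S) for all S ⊆ V.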
Edge-Surplus Framework
When g(x) = h(x) = log(x) and α = 1, the optimal (α,g,h)-edge-surplus problem becomes max_S log(e[S]/|S|), which is the densest subgraph problem.
When g(x) = x and h(x) = 0 if x = k, +∞ otherwise, we get the k-densest subgraph problem.
Edge-Surplus Framework
When g(x) = x and h(x) = x(x-1)/2, we obtain max_S e[S] - α|S|(|S|-1)/2, where e[S] is the number of edges induced by S, which we define as the optimal quasiclique (OQC) problem.
Theorem 1: Let g(x) = x and h(x) be concave. Then the optimal (α,g,h)-edge-surplus problem is solvable in polynomial time. However, this family is not well suited for applications, as it returns most of the graph.
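To make the edge-surplus objective concrete, here is a small sketch that evaluates g(e[S]) - α h(|S|) for arbitrary g and h, with the OQC choice g(x) = x, h(x) = x(x-1)/2 as a special case. The adjacency-dictionary representation, the function names, and the convention that the empty set has value 0 are assumptions made for this illustration.

```python
import math

def edge_surplus(adj, S, alpha, g, h):
    """Evaluate g(e[S]) - alpha * h(|S|); the empty set is given value 0 by convention."""
    S = set(S)
    if not S:
        return 0.0
    e = sum(len(adj[v] & S) for v in S) // 2   # edges induced by S
    return g(e) - alpha * h(len(S))

def oqc_objective(adj, S, alpha):
    """OQC instance: g(x) = x, h(x) = x(x-1)/2."""
    return edge_surplus(adj, S, alpha, g=lambda x: x, h=lambda x: x * (x - 1) / 2)

# Hypothetical example: K4 on vertices 0..3, plus vertex 4 attached to vertex 0.
adj = {0: {1, 2, 3, 4}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2}, 4: {0}}

k4_value = oqc_objective(adj, {0, 1, 2, 3}, alpha=0.5)        # 6 - 0.5 * 6
whole_value = oqc_objective(adj, {0, 1, 2, 3, 4}, alpha=0.5)  # 7 - 0.5 * 10

# Densest-subgraph instance: g = h = log, alpha = 1 gives log(e[S] / |S|).
# Note that this choice is undefined for sets with no induced edge.
log_value = edge_surplus(adj, {0, 1, 2, 3}, 1, math.log, math.log)
```

With α = 0.5 the K4 scores higher than the whole graph, which illustrates how the quadratic penalty on |S| favours small, dense sets.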
Hardness of OQC
Conjecture: finding a planted clique C of size in a random binomial graph is hard.
Let . Then,
Multiplicative approximation algorithms
Notice that in general the optimal value can be negative.
We can obtain guarantees for a shifted objective, but the shift introduces a large additive error that makes the algorithm almost useless except for very special graphs.
Other types of guarantees are more suitable.
Optimal Quasicliques
Additive error approximation algorithm
S_n ← V
For i = n downto 1
  Let v be the smallest-degree vertex in G[S_i].
  S_{i-1} ← S_i \ {v}
Output the set S_i that maximizes e[S_i] - α|S_i|(|S_i|-1)/2.
Theorem: Running time: O(n+m). However, it would be nice to have running time O(|output|).
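The peeling procedure on this slide can be sketched as follows, assuming the objective is the OQC edge surplus (induced edges minus α times |S|(|S|-1)/2) and the graph is an adjacency dictionary. The function name is hypothetical, and this version finds the minimum-degree vertex by a plain scan, so it is quadratic rather than the O(n+m) stated on the slide; a bucket queue over degrees would be needed for linear time.

```python
def greedy_oqc(adj, alpha):
    """Peel min-degree vertices; return (best intermediate set, its edge surplus)."""
    def surplus(num_edges, size):
        return num_edges - alpha * size * (size - 1) / 2

    deg = {v: len(nbrs) for v, nbrs in adj.items()}  # degree inside the live graph
    alive = set(adj)
    m = sum(deg.values()) // 2                       # edges among live vertices
    best_val, best_set = surplus(m, len(alive)), set(alive)
    while len(alive) > 1:
        v = min(alive, key=deg.__getitem__)          # O(n) scan, kept simple on purpose
        alive.remove(v)
        m -= deg[v]
        for u in adj[v]:
            if u in alive:
                deg[u] -= 1
        val = surplus(m, len(alive))
        if val > best_val:
            best_val, best_set = val, set(alive)
    return best_set, best_val

# Hypothetical example: K4 on vertices 0..3, plus vertex 4 attached to vertex 0.
example = {0: {1, 2, 3, 4}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2}, 4: {0}}
oqc_set, oqc_val = greedy_oqc(example, alpha=0.5)
```

On the example the surplus values along the peeling sequence are 2.0, 3.0, 1.5 and 0.5, so the K4 reached after removing the pendant vertex is returned.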
Optimal Quasicliques
Local Search Heuristic
1. Initialize S with a random vertex.
2. For t = 1 to Tmax
   1. Keep expanding S by adding, one at a time, a vertex v ∉ S such that f_α(S ∪ {v}) ≥ f_α(S).
   2. If not possible, see whether there exists a vertex u ∈ S such that f_α(S \ {u}) ≥ f_α(S).
      1. If yes, remove it. Go back to the previous step.
      2. If not, stop and output S.
Here f_α(S) = e[S] - α|S|(|S|-1)/2 is the OQC objective.
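A minimal sketch of the local search follows, assuming the objective is the OQC edge surplus and the graph is an adjacency dictionary. Two choices here are assumptions of this sketch rather than details taken from the slide: the starting vertex is passed in as `seed` instead of being drawn at random (which keeps the example reproducible), and a vertex is removed only when that strictly increases the objective, so that a single-vertex set is never emptied. The function and parameter names are hypothetical.

```python
def local_search_oqc(adj, alpha, seed, t_max=50):
    """Alternate between growing S and removing one vertex, for at most t_max rounds."""
    S = {seed}
    # inside[v] = number of neighbours of v that are currently in S
    inside = {v: (1 if seed in adj[v] else 0) for v in adj}

    def move(v, delta):
        """delta = +1 adds v to S, delta = -1 removes it; neighbour counts follow."""
        if delta > 0:
            S.add(v)
        else:
            S.remove(v)
        for u in adj[v]:
            inside[u] += delta

    for _ in range(t_max):
        # Step 1: keep adding vertices that do not decrease the objective.
        # Adding v changes the objective by inside[v] - alpha * |S|.
        grew = True
        while grew:
            grew = False
            for v in adj:
                if v not in S and inside[v] >= alpha * len(S):
                    move(v, +1)
                    grew = True
                    break
        # Step 2: remove one vertex if that strictly increases the objective.
        # Removing u changes the objective by alpha * (|S| - 1) - inside[u].
        victim = next((u for u in S if alpha * (len(S) - 1) > inside[u]), None)
        if victim is None:
            break                       # no improving move left: local optimum
        move(victim, -1)
    return S

# Hypothetical example: K4 on vertices 0..3, plus vertex 4 attached to vertex 0.
example = {0: {1, 2, 3, 4}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2}, 4: {0}}
from_pendant = local_search_oqc(example, alpha=0.5, seed=4)
from_clique = local_search_oqc(example, alpha=0.5, seed=1)
```

Starting from the pendant vertex, the search first absorbs the whole graph and then drops the pendant vertex again; starting inside the K4 it never adds the pendant vertex at all. Both runs end at the K4.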
Experiments
             |S|                  δ                   D              τ
             DS     M1    M2     DS    M1    M2      DS  M1  M2     DS    M1    M2
Wiki ‘05     24.5K  451   321    0.26  0.43  0.48    3   3   2      0.02  0.06  0.11
Youtube      1.9K   124   119    0.05  0.46  0.49    4   2   2      0.02  0.12  0.14
Constrained Optimal Quasicliques
Given a set of vertices Q, find an optimal quasiclique S that contains Q.
Lemma: NP-hard problem.
Observation: Easy to adapt our efficient algorithms to this setting.
  Local Search: Initialize S with Q and never remove a vertex if it belongs to Q.
  Greedy: Never peel off a vertex from Q.
Application 1
Suppose that a set Q of scientists wants to organize a workshop. How do they invite other scientists to participate in the workshop so that the set of all participants, including Q, has similar interests?
Application 2
Given a microarray dataset and a set of genes Q, find a set of genes S that includes Q and whose members are all highly correlated.
Co-expression network
  Measure gene expression across multiple samples
  Create correlation matrix
  Edges between genes if their correlation is > ρ.
A dense subgraph in a co-expression network corresponds to a set of highly correlated genes.
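The construction of the co-expression network can be sketched in a few lines. This example assumes Pearson correlation as the correlation measure and uses made-up gene names and expression values; neither is specified on the slide. Genes with constant expression are given correlation 0 here to avoid a division by zero.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0

def coexpression_network(expr, rho):
    """expr maps gene -> expression values (one per sample); edge if correlation > rho."""
    genes = list(expr)
    adj = {g: set() for g in genes}
    for i, a in enumerate(genes):
        for b in genes[i + 1:]:
            if pearson(expr[a], expr[b]) > rho:
                adj[a].add(b)
                adj[b].add(a)
    return adj

# Hypothetical data: three genes measured across four samples.
expr = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 9.0],   # rises with g1
    "g3": [4.0, 3.0, 2.0, 1.0],   # falls as g1 rises
}
network = coexpression_network(expr, rho=0.9)
```

The resulting adjacency dictionary has the same shape as the graphs used by dense-subgraph routines, so a quasi-clique search can be run on it directly.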
Future Work
Hardness
Analysis of local search algorithm
Other algorithms with additive approximation guarantees
Study the natural family of objectives