Principal Component Analysis and Matrix Factorizations for Learning (Part 2), Chris Ding, ICML 2005 Tutorial


Page 1

Part 2. Spectral Clustering from Matrix Perspective

A brief tutorial emphasizing recent developments

(A more detailed tutorial was given at ICML'04)

Page 2

From PCA to Spectral Clustering Using Generalized Eigenvectors

Consider the kernel matrix: $W_{ij} = \langle \phi(x_i), \phi(x_j) \rangle$

In Kernel PCA we compute the eigenvectors: $W v = \lambda v$

Generalized eigenvector: $W q = \lambda D q$, where $D = \mathrm{diag}(d_1, \ldots, d_n)$ and $d_i = \sum_j w_{ij}$

This leads to Spectral Clustering!
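
A minimal NumPy/SciPy sketch (not part of the original slides) of the generalized eigenproblem above, assuming a precomputed symmetric similarity matrix W with positive row sums:

```python
import numpy as np
from scipy.linalg import eigh

def generalized_eigenvectors(W):
    """Solve W q = lambda D q with D = diag(row sums of W).

    Returns eigenvalues/eigenvectors sorted by decreasing eigenvalue."""
    d = W.sum(axis=1)
    D = np.diag(d)
    # eigh solves the symmetric-definite generalized problem A v = lambda B v
    vals, vecs = eigh(W, D)
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]
```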

Page 3

Indicator Matrix Quadratic Clustering Framework

Unsigned cluster indicator matrix $H = (h_1, \ldots, h_K)$

Kernel K-means clustering:
$\max_H \mathrm{Tr}(H^T W H)$, s.t. $H^T H = I$, $H \ge 0$

K-means: $W = X^T X$;  Kernel K-means: $W = (\langle \phi(x_i), \phi(x_j) \rangle)$

Spectral clustering (normalized cut):
$\max_H \mathrm{Tr}(H^T W H)$, s.t. $H^T D H = I$, $H \ge 0$
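
A small illustrative sketch (mine, not the tutorial's code): build the unsigned indicator matrix H from integer cluster labels and evaluate the quadratic objective $\mathrm{Tr}(H^T W H)$:

```python
import numpy as np

def indicator_matrix(labels, K):
    """H[:, k] is the normalized indicator of cluster k, so that H^T H = I."""
    labels = np.asarray(labels)
    H = np.zeros((len(labels), K))
    for k in range(K):
        members = (labels == k)
        H[members, k] = 1.0 / np.sqrt(members.sum())
    return H

def trace_objective(W, labels, K):
    H = indicator_matrix(labels, K)
    return np.trace(H.T @ W @ H)
```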

Page 4

Brief Introduction to Spectral Clustering (Laplacian matrix based clustering)

Page 5

Some historical notes

• Fiedler, 1973, 1975, graph Laplacian matrix
• Donath & Hoffman, 1973, bounds
• Hall, 1970, Quadratic Placement (embedding)
• Pothen, Simon, Liou, 1990, spectral graph partitioning (many related papers thereafter)
• Hagen & Kahng, 1992, Ratio-cut
• Chan, Schlag & Zien, multi-way Ratio-cut
• Chung, 1997, Spectral graph theory book
• Shi & Malik, 2000, Normalized Cut

Page 6

Spectral Gold-Rush of 2001
9 papers on spectral clustering

• Meila & Shi, AI-Stat 2001. Random Walk interpretation of Normalized Cut
• Ding, He & Zha, KDD 2001. Perturbation analysis of Laplacian matrix on sparsely connected graphs
• Ng, Jordan & Weiss, NIPS 2001. K-means algorithm on the embedded eigen-space
• Belkin & Niyogi, NIPS 2001. Spectral Embedding
• Dhillon, KDD 2001. Bipartite graph clustering
• Zha et al, CIKM 2001. Bipartite graph clustering
• Zha et al, NIPS 2001. Spectral Relaxation of K-means
• Ding et al, ICDM 2001. MinMaxCut, uniqueness of relaxation
• Gu et al, 2001. K-way relaxation of NormCut and MinMaxCut

Page 7

Spectral Clustering

min cutsize, without explicit size constraints

Need to balance sizes

But where to cut?

Page 8

Graph Clustering

max within-cluster similarities (weights): $s(A,A) = \sum_{i \in A} \sum_{j \in A} w_{ij}$

min between-cluster similarities (weights): $s(A,B) = \sum_{i \in A} \sum_{j \in B} w_{ij}$

Balance weight

Balance size

Balance volume

Page 9

Clustering Objective Functions

• Ratio Cut: $J_{Rcut}(A,B) = \frac{s(A,B)}{|A|} + \frac{s(A,B)}{|B|}$

• Normalized Cut: $J_{Ncut}(A,B) = \frac{s(A,B)}{d_A} + \frac{s(A,B)}{d_B} = \frac{s(A,B)}{s(A,A)+s(A,B)} + \frac{s(A,B)}{s(B,B)+s(A,B)}$

• Min-Max-Cut: $J_{MMC}(A,B) = \frac{s(A,B)}{s(A,A)} + \frac{s(A,B)}{s(B,B)}$

where $s(A,B) = \sum_{i \in A} \sum_{j \in B} w_{ij}$ and $d_A = \sum_{i \in A} d_i$.
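
A hedged sketch (mine, not from the slides) that evaluates the three 2-way objectives for a given partition of the nodes into index sets A and B:

```python
import numpy as np

def cut_objectives(W, A, B):
    """W: symmetric similarity matrix; A, B: index arrays of the two clusters."""
    sAB = W[np.ix_(A, B)].sum()
    sAA = W[np.ix_(A, A)].sum()
    sBB = W[np.ix_(B, B)].sum()
    dA = W[A, :].sum()                     # volume of A (sum of degrees in A)
    dB = W[B, :].sum()
    J_rcut = sAB / len(A) + sAB / len(B)
    J_ncut = sAB / dA + sAB / dB
    J_mmc = sAB / sAA + sAB / sBB
    return J_rcut, J_ncut, J_mmc
```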

Page 10

Normalized Cut (Shi & Malik, 2000)

Min similarity between A & B: $s(A,B) = \sum_{i \in A} \sum_{j \in B} w_{ij}$

Balance weights

Cluster indicator:
$q(i) = \begin{cases} \sqrt{d_B / (d\, d_A)} & \text{if } i \in A \\ -\sqrt{d_A / (d\, d_B)} & \text{if } i \in B \end{cases}$

$J_{Ncut}(A,B) = \frac{s(A,B)}{d_A} + \frac{s(A,B)}{d_B}$, with $d_A = \sum_{i \in A} d_i$ and $d = \sum_{i \in G} d_i$

Normalization: $q^T D q = 1$, $q^T D e = 0$. Substituting q leads to $J_{Ncut}(q) = q^T (D - W) q$

$\min_q \; q^T (D - W) q + \lambda (q^T D q - 1)$

The solution is an eigenvector of $(D - W) q = \lambda D q$
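
A minimal sketch (my own, assuming a dense symmetric W on a connected graph) of the resulting 2-way normalized cut: solve the generalized eigenproblem and split by the sign of the second eigenvector (the first is the trivial constant one):

```python
import numpy as np
from scipy.linalg import eigh

def ncut_bipartition(W):
    d = W.sum(axis=1)
    D = np.diag(d)
    L = D - W                               # graph Laplacian
    vals, vecs = eigh(L, D)                 # (D - W) q = lambda D q
    q2 = vecs[:, 1]                         # second-smallest eigenvalue's eigenvector
    A = np.where(q2 >= 0)[0]
    B = np.where(q2 < 0)[0]
    return A, B
```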

Page 11

A simple example: 2 dense clusters, with sparse connections between them.

[Figure: adjacency matrix and eigenvector q2]

Page 12

K-way Spectral Clustering (K ≥ 2)

Page 13

K-way Clustering Objectives

• Ratio Cut:
$J_{Rcut}(C_1, \ldots, C_K) = \sum_{k<l} \left( \frac{s(C_k, C_l)}{|C_k|} + \frac{s(C_k, C_l)}{|C_l|} \right) = \sum_k \frac{s(C_k, G - C_k)}{|C_k|}$

• Normalized Cut:
$J_{Ncut}(C_1, \ldots, C_K) = \sum_{k<l} \left( \frac{s(C_k, C_l)}{d_k} + \frac{s(C_k, C_l)}{d_l} \right) = \sum_k \frac{s(C_k, G - C_k)}{d_k}$

• Min-Max-Cut:
$J_{MMC}(C_1, \ldots, C_K) = \sum_{k<l} \left( \frac{s(C_k, C_l)}{s(C_k, C_k)} + \frac{s(C_k, C_l)}{s(C_l, C_l)} \right) = \sum_k \frac{s(C_k, G - C_k)}{s(C_k, C_k)}$
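
A hedged sketch (mine) of the K-way objectives, using the second form above with $s(C_k, G - C_k)$ divided by size, volume, or within-cluster similarity:

```python
import numpy as np

def kway_objectives(W, labels):
    labels = np.asarray(labels)
    J_rcut = J_ncut = J_mmc = 0.0
    for k in np.unique(labels):
        Ck = np.where(labels == k)[0]
        rest = np.where(labels != k)[0]
        cut = W[np.ix_(Ck, rest)].sum()     # s(C_k, G - C_k)
        J_rcut += cut / len(Ck)             # divide by |C_k|
        J_ncut += cut / W[Ck, :].sum()      # divide by d_k (volume of C_k)
        J_mmc += cut / W[np.ix_(Ck, Ck)].sum()   # divide by s(C_k, C_k)
    return J_rcut, J_ncut, J_mmc
```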

Page 14

K-way Spectral Relaxation

Unsigned cluster indicators:
$h_1 = (1 \cdots 1,\ 0 \cdots 0,\ 0 \cdots 0)^T$
$h_2 = (0 \cdots 0,\ 1 \cdots 1,\ 0 \cdots 0)^T$
$\cdots$
$h_k = (0 \cdots 0,\ 0 \cdots 0,\ 1 \cdots 1)^T$

Re-write:

$J_{Rcut}(h_1, \ldots, h_k) = \frac{h_1^T (D-W) h_1}{h_1^T h_1} + \cdots + \frac{h_k^T (D-W) h_k}{h_k^T h_k}$

$J_{Ncut}(h_1, \ldots, h_k) = \frac{h_1^T (D-W) h_1}{h_1^T D h_1} + \cdots + \frac{h_k^T (D-W) h_k}{h_k^T D h_k}$

$J_{MMC}(h_1, \ldots, h_k) = \frac{h_1^T (D-W) h_1}{h_1^T W h_1} + \cdots + \frac{h_k^T (D-W) h_k}{h_k^T W h_k}$

Page 15

K-way Normalized Cut Spectral Relaxation

Unsigned cluster indicators:
$y_k = D^{1/2} (0 \cdots 0,\ 1 \cdots 1,\ 0 \cdots 0)^T / \| D^{1/2} h_k \|$

Re-write:
$J_{Ncut}(y_1, \ldots, y_k) = y_1^T (I - \tilde{W}) y_1 + \cdots + y_k^T (I - \tilde{W}) y_k = \mathrm{Tr}\big( Y^T (I - \tilde{W}) Y \big)$, where $\tilde{W} = D^{-1/2} W D^{-1/2}$

Optimize: $\min_Y \mathrm{Tr}\big( Y^T (I - \tilde{W}) Y \big)$, subject to $Y^T Y = I$

By K. Fan's theorem, the optimal solution is given by eigenvectors: $Y = (v_1, v_2, \ldots, v_k)$, $(I - \tilde{W}) v_k = \lambda_k v_k$,
$(D - W) u_k = \lambda_k D u_k$, $u_k = D^{-1/2} v_k$

$\lambda_1 + \cdots + \lambda_k \le \min J_{Ncut}(y_1, \ldots, y_k)$  (Gu, et al, 2001)
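
A minimal sketch (my own, under the usual connected-graph assumption) of this relaxation: take the K smallest eigenvectors of $I - \tilde{W}$ and map them back via $u_k = D^{-1/2} v_k$:

```python
import numpy as np

def ncut_embedding(W, K):
    d = W.sum(axis=1)
    d_isqrt = 1.0 / np.sqrt(d)
    W_tilde = d_isqrt[:, None] * W * d_isqrt[None, :]
    vals, vecs = np.linalg.eigh(np.eye(len(d)) - W_tilde)
    V = vecs[:, :K]                         # K smallest eigenvalues of I - W~
    U = d_isqrt[:, None] * V                # u_k = D^{-1/2} v_k
    return vals[:K], U
```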

Page 16

K-way Spectral Clustering is difficult

• Spectral clustering is best applied to 2-way clustering
  – positive entries for one cluster
  – negative entries for the other cluster
• For K-way (K > 2) clustering
  – Positive and negative signs make cluster assignment difficult
  – Recursive 2-way clustering
  – Low-dimension embedding: project the data onto the eigenvector subspace, then use another clustering method such as K-means to cluster the data (Ng et al; Zha et al; Bach & Jordan, etc.); a sketch follows below
  – Linearized cluster assignment using spectral ordering and cluster crossing
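
A hedged sketch of the "embedding + K-means" route (in the spirit of Ng, Jordan & Weiss; the row-normalization step is my assumption, not prescribed by the slides):

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def spectral_kmeans(W, K):
    d = W.sum(axis=1)
    d_isqrt = 1.0 / np.sqrt(d)
    W_tilde = d_isqrt[:, None] * W * d_isqrt[None, :]
    _, vecs = np.linalg.eigh(W_tilde)
    V = vecs[:, -K:]                        # K largest eigenvectors of W~
    V = V / np.linalg.norm(V, axis=1, keepdims=True)   # row-normalize
    _, labels = kmeans2(V, K, minit='++')   # K-means in the embedded space
    return labels
```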

Page 17

Scaled PCA: a Unified Framework for clustering and ordering

• Scaled PCA has two optimality properties
  – Distance-sensitive ordering
  – Min-max principle clustering
• SPCA on a contingency table ⇒ Correspondence Analysis
  – Simultaneous ordering of rows and columns
  – Simultaneous clustering of rows and columns

Page 18

Scaled PCA

Similarity matrix $S = (s_{ij})$ (generated from $XX^T$)

Nonlinear re-scaling: $\tilde{S} = D^{-1/2} S D^{-1/2}$, $\tilde{s}_{ij} = s_{ij} / (s_{i.}\, s_{.j})^{1/2}$, where $D = \mathrm{diag}(d_1, \ldots, d_n)$ and $d_i = s_{i.}$

Apply SVD on $\tilde{S}$ ⇒
$S = D^{1/2} \tilde{S} D^{1/2} = D^{1/2} \Big[ \sum_k \lambda_k z_k z_k^T \Big] D^{1/2} = \sum_k \lambda_k\, D q_k q_k^T D$

$q_k = D^{-1/2} z_k$ is the scaled principal component

Subtract the trivial component: $\lambda_0 = 1$, $z_0 = d^{1/2} / s_{..}^{1/2}$, $q_0 = 1$

⇒ $S - d d^T / s_{..} = \sum_{k=1} \lambda_k\, D q_k q_k^T D$

(Ding, et al, 2002)
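
A minimal sketch (mine, assuming a nonnegative symmetric similarity matrix S) of scaled PCA: rescale S, take the eigen-decomposition of $\tilde{S}$, and drop the trivial component ($\lambda_0 = 1$, $q_0 = 1$):

```python
import numpy as np

def scaled_pca(S, n_components=2):
    d = S.sum(axis=1)                       # d_i = s_i.
    d_isqrt = 1.0 / np.sqrt(d)
    S_tilde = d_isqrt[:, None] * S * d_isqrt[None, :]
    vals, vecs = np.linalg.eigh(S_tilde)    # ascending eigenvalues
    idx = np.argsort(vals)[::-1][1:n_components + 1]   # skip the trivial top one
    Q = d_isqrt[:, None] * vecs[:, idx]     # q_k = D^{-1/2} z_k
    return vals[idx], Q
```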

Page 19

Scaled PCA on a Rectangular Matrix ⇒ Correspondence Analysis

Nonlinear re-scaling: $\tilde{P} = D_r^{-1/2} P D_c^{-1/2}$, $\tilde{p}_{ij} = p_{ij} / (p_{i.}\, p_{.j})^{1/2}$

Apply SVD on $\tilde{P}$ and subtract the trivial component:

$P - r c^T / p_{..} = \sum_{k=1} \lambda_k\, D_r f_k g_k^T D_c$

where $r = (p_{1.}, \ldots, p_{n.})^T$ and $c = (p_{.1}, \ldots, p_{.n})^T$

$f_k = D_r^{-1/2} u_k$ and $g_k = D_c^{-1/2} v_k$ are the scaled row and column principal components (standard coordinates in CA)
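
A hedged sketch of correspondence analysis along these lines (my own code, not the tutorial's): rescale the contingency table, run an SVD, and drop the trivial leading pair:

```python
import numpy as np

def correspondence_analysis(counts, n_components=2):
    P = counts / counts.sum()               # joint probabilities p_ij
    r = P.sum(axis=1)                       # row masses p_i.
    c = P.sum(axis=0)                       # column masses p_.j
    P_tilde = P / np.sqrt(np.outer(r, c))
    U, svals, Vt = np.linalg.svd(P_tilde, full_matrices=False)
    # the first singular triplet (singular value 1) is the trivial component
    F = U[:, 1:n_components + 1] / np.sqrt(r)[:, None]      # row coordinates f_k
    G = Vt[1:n_components + 1, :].T / np.sqrt(c)[:, None]   # column coordinates g_k
    return svals[1:n_components + 1], F, G
```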

Page 20

Correspondence Analysis (CA)

• Mainly used in graphical display of data
• Popular in France (Benzécri, 1969)
• Long history
  – Simultaneous row and column regression (Hirschfeld, 1935)
  – Reciprocal averaging (Richardson & Kuder, 1933; Horst, 1935; Fisher, 1940; Hill, 1974)
  – Canonical correlations, dual scaling, etc.
• Formulation is a bit complicated ("convoluted", Jolliffe, 2002, p.342)
• "A neglected method" (Hill, 1974)

Page 21

Clustering of Bipartite Graphs (rectangular matrix)

Simultaneous clustering of rows and columns of a contingency table (adjacency matrix B)

Examples of bipartite graphs

• Information Retrieval: word-by-document matrix

• Market basket data: transaction-by-item matrix

• DNA Gene expression profiles

• Protein vs protein-complex

Page 22

Bipartite Graph Clustering

Clustering indicators for rows and columns:
$f(i) = \begin{cases} 1 & \text{if } r_i \in R_1 \\ -1 & \text{if } r_i \in R_2 \end{cases}$
$g(i) = \begin{cases} 1 & \text{if } c_i \in C_1 \\ -1 & \text{if } c_i \in C_2 \end{cases}$

$B = \begin{pmatrix} B_{R_1,C_1} & B_{R_1,C_2} \\ B_{R_2,C_1} & B_{R_2,C_2} \end{pmatrix}$,
$W = \begin{pmatrix} 0 & B \\ B^T & 0 \end{pmatrix}$,
$q = \begin{pmatrix} f \\ g \end{pmatrix}$

Substitute and obtain:
$J_{MMC}(C_1, C_2; R_1, R_2) = \frac{s(W_{12})}{s(W_{11})} + \frac{s(W_{12})}{s(W_{22})}$

f, g are determined by:
$\begin{pmatrix} 0 & B \\ B^T & 0 \end{pmatrix} \begin{pmatrix} f \\ g \end{pmatrix} = \lambda \begin{pmatrix} D_r & 0 \\ 0 & D_c \end{pmatrix} \begin{pmatrix} f \\ g \end{pmatrix}$
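
A minimal sketch (assumptions mine) of this bipartite co-clustering: the coupled eigenproblem above can be solved through an SVD of $D_r^{-1/2} B D_c^{-1/2}$, splitting rows and columns by the sign of the second singular vectors:

```python
import numpy as np

def bipartite_bipartition(B):
    dr = B.sum(axis=1)                      # row degrees
    dc = B.sum(axis=0)                      # column degrees
    B_tilde = B / np.sqrt(np.outer(dr, dc))
    U, svals, Vt = np.linalg.svd(B_tilde, full_matrices=False)
    f = U[:, 1] / np.sqrt(dr)               # second left singular vector -> f
    g = Vt[1, :] / np.sqrt(dc)              # second right singular vector -> g
    R1, R2 = np.where(f >= 0)[0], np.where(f < 0)[0]
    C1, C2 = np.where(g >= 0)[0], np.where(g < 0)[0]
    return (R1, R2), (C1, C2)
```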

Page 23

Spectral Clustering of Bipartite Graphs

Simultaneous clustering of rows and columns (adjacency matrix B)

$J_{MMC}(C_1, C_2; R_1, R_2) = \frac{s(B_{R_1,C_2}) + s(B_{R_2,C_1})}{2\, s(B_{R_1,C_1})} + \frac{s(B_{R_1,C_2}) + s(B_{R_2,C_1})}{2\, s(B_{R_2,C_2})}$

cut: min between-cluster sum of weights: $s(R_1,C_2)$, $s(R_2,C_1)$

max within-cluster sum of weights: $s(R_1,C_1)$, $s(R_2,C_2)$

$s(B_{R_1,C_2}) = \sum_{r_i \in R_1} \sum_{c_j \in C_2} b_{ij}$

(Ding, AI-STAT 2003)

Page 24

Internet Newsgroups

Simultaneous clustering of documents and words

Page 25

Embedding in Principal Subspace

Cluster Self-Aggregation (proved in perturbation analysis)

(Hall, 1970, "quadratic placement" (embedding) of a graph)

Page 26

Spectral Embedding: Self-aggregation

(Ding, 2004)

• Compute K eigenvectors of the Laplacian
• Embed objects in the K-dimensional eigenspace

Page 27

Spectral embedding is not topology preserving

700 3-D data points form 2 interlocking rings

In eigenspace, they shrink and separate

Page 28

Spectral Embedding

(Ding, 2004)

Simplex Embedding Theorem. Objects self-aggregate to K centroids. The centroids are located on the K corners of a simplex.

• The simplex consists of K basis vectors plus the coordinate origin
• The simplex is rotated by an orthogonal transformation T
• T is determined by perturbation analysis

Page 29

Perturbation Analysis

Assume the data has 3 dense clusters $C_1$, $C_2$, $C_3$, sparsely connected.

$W = \begin{pmatrix} W_{11} & W_{12} & W_{13} \\ W_{21} & W_{22} & W_{23} \\ W_{31} & W_{32} & W_{33} \end{pmatrix}$

$\hat{W} = D^{-1/2} W D^{-1/2}$, $\hat{W} z = \lambda z$; equivalently $W q = \lambda D q$ with $q = D^{-1/2} z$

Off-diagonal blocks are between-cluster connections, assumed small and are treated as a perturbation

(Ding et al, KDD’01)

Page 30

Spectral Perturbation Theorem

Spectral perturbation matrix:

$\bar{\Gamma} = \begin{pmatrix} h_1 & -s_{12} & \cdots & -s_{1K} \\ -s_{21} & h_2 & \cdots & -s_{2K} \\ \vdots & \vdots & \ddots & \vdots \\ -s_{K1} & -s_{K2} & \cdots & h_K \end{pmatrix}$, with $s_{pq} = s(C_p, C_q)$ and $h_k = \sum_{p \ne k} s_{kp}$

Orthogonal transform matrix $T = (t_1, \ldots, t_K)$

T is determined by:
$\Gamma\, t_k = \lambda_k t_k$, where $\Gamma = \Omega^{-1/2} \bar{\Gamma}\, \Omega^{-1/2}$ and $\Omega = \mathrm{diag}(\rho(C_1), \ldots, \rho(C_K))$

Page 31

Connectivity Network

$C_{ij} = \begin{cases} 1 & \text{if } i, j \text{ belong to the same cluster} \\ 0 & \text{otherwise} \end{cases}$

Scaled PCA provides: $C \cong \sum_{k=1}^{K} \lambda_k\, D q_k q_k^T D$

Green's function: $C \approx G \equiv \sum_{k=2}^{K} \frac{q_k q_k^T}{1 - \lambda_k}$

Projection matrix: $C \approx P \equiv \sum_{k=1}^{K} q_k q_k^T$

(Ding et al, 2002)
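
A hedged sketch (my own) that approximates the connectivity matrix C from the leading scaled principal components, using the projection-matrix form $C \approx \sum_k q_k q_k^T$:

```python
import numpy as np

def connectivity_estimate(W, K):
    d = W.sum(axis=1)
    d_isqrt = 1.0 / np.sqrt(d)
    W_tilde = d_isqrt[:, None] * W * d_isqrt[None, :]
    vals, vecs = np.linalg.eigh(W_tilde)
    Z = vecs[:, -K:]                        # top-K eigenvectors z_k
    Q = d_isqrt[:, None] * Z                # q_k = D^{-1/2} z_k
    return Q @ Q.T                          # C ~ sum_k q_k q_k^T
```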

Page 32

1st order Perturbation: Example 1

Between-cluster connections suppressed

Within-cluster connections enhanced

[Figure: similarity matrix W and the corresponding connectivity matrix, showing the effects of self-aggregation]

$\lambda_2 = 0.300,\ 0.268$

1st order solution

Page 33

Optimality Properties of Scaled PCA

Scaled principal components have optimality properties:

Ordering
– Adjacent objects along the order are similar
– Far-away objects along the order are dissimilar
– The optimal solution for the permutation indexes is given by scaled PCA

Clustering
– Maximize within-cluster similarity
– Minimize between-cluster similarity
– The optimal solution for the cluster membership indicators is given by scaled PCA

Page 34

Spectral Graph Ordering

(Hall, 1970), "quadratic placement of a graph":

Find a coordinate x to minimize
$J(x) = \sum_{ij} (x_i - x_j)^2 w_{ij} = x^T (D - W) x$

The solution is given by eigenvectors of the Laplacian.

(Barnard, Pothen, Simon, 1993), envelope reduction of a sparse matrix: find an ordering such that the envelope is minimized:
$\min \sum_i \max_j |i - j|\, w_{ij} \;\Rightarrow\; \min \sum_{ij} (x_i - x_j)^2 w_{ij}$

Page 35

Distance Sensitive Ordering

Given a graph, find an optimal ordering of the nodes.

$\pi = (\pi_1, \ldots, \pi_n)$: permutation indexes

$J_d(\pi) = \sum_{i=1}^{n-d} w_{\pi_i, \pi_{i+d}}$

$\min_\pi J(\pi) = \sum_{d=1}^{n-1} d^2 J_d(\pi)$

The larger the distance, the larger the weight (penalty).

[Figure: arcs linking nodes at distance d along the order; e.g. $J_{d=2}(\pi)$ includes $w_{\pi_1, \pi_3}$]

Page 36

Distance Sensitive Ordering

$J(\pi) = \sum_{i,j} (i - j)^2\, w_{\pi_i \pi_j} = \sum_{i,j} (\pi_i^{-1} - \pi_j^{-1})^2\, w_{ij}$

Define shifted and rescaled inverse permutation indexes:
$q_i = \frac{\pi_i^{-1} - (n+1)/2}{n/2} \in \left\{ \frac{1-n}{n}, \frac{3-n}{n}, \ldots, \frac{n-1}{n} \right\}$

Then
$J(\pi) = \frac{n^2}{8} \sum_{i,j} (q_i - q_j)^2\, w_{ij} = \frac{n^2}{4}\; q^T (D - W)\, q$

Page 37

Distance Sensitive Ordering

Once $q_2$ is computed, since
$q_2(i) < q_2(j) \;\Rightarrow\; \pi_i^{-1} < \pi_j^{-1}$,
$\pi_i^{-1}$ can be uniquely recovered from $q_2$.

Implementation: sorting $q_2$ induces the ordering $\pi$.
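
A minimal sketch (mine) of the resulting ordering procedure: compute the second generalized eigenvector of $(D - W) q = \lambda D q$ and sort it:

```python
import numpy as np
from scipy.linalg import eigh

def spectral_ordering(W):
    d = W.sum(axis=1)
    D = np.diag(d)
    vals, vecs = eigh(D - W, D)             # generalized Laplacian eigenproblem
    q2 = vecs[:, 1]                         # eigenvector of 2nd smallest eigenvalue
    return np.argsort(q2)                   # node ordering pi induced by sorting q2
```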

Page 38

Re-ordering of Genes and Tissues

$r = \frac{J(\pi)}{J(\mathrm{random})}$,  $r_{d=1} = \frac{J_{d=1}(\pi)}{J_{d=1}(\mathrm{random})}$

$r = 0.18$,  $r_{d=1} = 3.39$

Page 39

Spectral clustering vs Spectral ordering

• The continuous approximations of both integer programming problems are given by the same eigenvector.

• Different problems can have the same continuous approximate solution.

• Quality of the approximation:

Ordering: better quality, since the solution relaxes from a set of evenly spaced discrete values.

Clustering: lower quality, since the solution relaxes from only 2 discrete values.

Page 40

Linearized Cluster Assignment

• Spectral ordering on the connectivity network
• Cluster crossing
  – Sum of similarities along the anti-diagonal
  – Gives a 1-D curve with valleys and peaks
  – Divide the valleys and peaks into clusters

This turns spectral clustering into a 1-D clustering problem.

Page 41

Cluster overlap and crossing

• Cluster overlap: given similarity W and clusters A, B,
$s(A,B) = \sum_{i \in A} \sum_{j \in B} w_{ij}$

• Cluster crossing computes a smaller fraction of the cluster overlap.

• Cluster crossing depends on an ordering o. It sums the weights crossing site i along the order:
$\rho(i) = \sum_{j=1}^{m} w\big( o(i-j),\, o(i+j) \big)$

• This is a sum along the anti-diagonals of W.
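
A hedged sketch (my own) of the cluster-crossing profile: reorder W by the spectral order o, then sum a short anti-diagonal band around each site i; valleys in the profile suggest cluster boundaries:

```python
import numpy as np

def cluster_crossing(W, order, m=5):
    Wo = W[np.ix_(order, order)]            # similarity matrix in spectral order
    n = len(order)
    rho = np.zeros(n)
    for i in range(n):
        for j in range(1, m + 1):
            if i - j >= 0 and i + j < n:
                rho[i] += Wo[i - j, i + j]  # anti-diagonal entries around site i
    return rho
```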

Page 42

cluster crossing

Page 43

K-way Clustering Experiments

Accuracy of clustering results:

Method   | Linearized Assignment | Recursive 2-way clustering | Embedding + K-means
Data A   | 89.0%                 | 82.8%                      | 75.1%
Data B   | 75.7%                 | 67.2%                      | 56.4%

Page 44

Some Additional Advanced/Related Topics

• Random walks and normalized cut
• Semi-definite programming
• Sub-sampling in spectral clustering
• Extending to semi-supervised classification
• Green's function approach
• Out-of-sample embedding