Provable Deterministic Leverage Score Sampling

Dimitris Papailiopoulos (UC Berkeley), Anastasios Kyrillidis (EPFL)

Christos Boutsidis (Yahoo Labs)

KDD

New York, New York

August 27th, 2014

Singular Value Decomposition

m × n matrix A

k < ρ = rank(A)

Low-rank matrix approximation problem:

$$\min_{X \in \mathbb{R}^{m \times n},\ \mathrm{rank}(X) \le k} \|A - X\|_F$$

Singular Value Decomposition (SVD):

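$$A = U \Sigma V^T = \underbrace{\begin{pmatrix} U_k & U_{\rho-k} \end{pmatrix}}_{m \times \rho} \underbrace{\begin{pmatrix} \Sigma_k & 0 \\ 0 & \Sigma_{\rho-k} \end{pmatrix}}_{\rho \times \rho} \underbrace{\begin{pmatrix} V_k^T \\ V_{\rho-k}^T \end{pmatrix}}_{\rho \times n}$$

$U_k \in \mathbb{R}^{m \times k}$, $\Sigma_k \in \mathbb{R}^{k \times k}$, and $V_k \in \mathbb{R}^{n \times k}$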

Solution via Eckart-Young Theorem

$$A_k = U_k \Sigma_k V_k^T = A V_k V_k^T$$

Computing the SVD takes $O(mn \min\{m, n\})$ time.
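The Eckart-Young solution can be checked numerically. The following is a minimal NumPy sketch, not code from the talk; the helper name `best_rank_k` and the random test matrix are assumptions made for illustration.

```python
import numpy as np

def best_rank_k(A, k):
    """Return the best rank-k approximation A_k and the top-k right singular vectors V_k."""
    # Thin SVD: U is m x r, s holds r singular values (descending), Vt is r x n, r = min(m, n).
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    Vk = Vt[:k].T                      # n x k
    Ak = (U[:, :k] * s[:k]) @ Vt[:k]   # U_k Sigma_k V_k^T
    return Ak, Vk

rng = np.random.default_rng(0)         # arbitrary seed for a reproducible toy matrix
A = rng.standard_normal((8, 6))
Ak, Vk = best_rank_k(A, k=2)
# The two expressions for A_k on the slide agree: U_k Sigma_k V_k^T == A V_k V_k^T.
print(np.allclose(Ak, A @ Vk @ Vk.T))
```

The Frobenius error of this approximation equals the square root of the sum of the squared discarded singular values, which is what the minimization problem above attains.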

The Column Subset Selection Problem (CSSP)

Definition

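Let $A \in \mathbb{R}^{m \times n}$ and let $c < n$ be a sampling parameter. Find $c$ columns of $A$, denoted as $C \in \mathbb{R}^{m \times c}$, that minimize

$$\|A - CC^{\dagger}A\|_F \quad \text{or} \quad \|A - CC^{\dagger}A\|_2,$$

where $C^{\dagger}$ denotes the Moore-Penrose pseudo-inverse.

CSSP gives a low-rank matrix factorization of $A$: $A \approx CX$, with $X = C^{\dagger}A$.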
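To make the objective concrete, here is a small sketch that evaluates the CSSP residual for a given set of column indices. The function name `cssp_error` and its `norm` argument are hypothetical, chosen for this example only.

```python
import numpy as np

def cssp_error(A, cols, norm="fro"):
    """Residual norm ||A - C C^+ A|| for the columns of A indexed by `cols`.

    `norm` is passed to numpy.linalg.norm: "fro" (Frobenius) or 2 (spectral).
    """
    C = A[:, cols]                  # m x c matrix of selected columns
    X = np.linalg.pinv(C) @ A       # X = C^+ A, so that A is approximated by C X
    return np.linalg.norm(A - C @ X, ord=norm)

rng = np.random.default_rng(0)      # arbitrary seed for a toy matrix
A = rng.standard_normal((5, 4))
print(cssp_error(A, [0, 2]), cssp_error(A, [0, 2], norm=2))
```

Since $CC^{\dagger}$ is the orthogonal projector onto the span of the selected columns, the residual depends only on that span and is zero once the selected columns span the column space of $A$.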

Motivation

Consider applying this to date-by-stock matrices.

Returns the most important stocks in the portfolio.

Interpretable matrix decompositions in general.

Prior work on CSSP

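| # | $c$ | $\|A - CC^{\dagger}A\|_F^2 \le$ | Running time |
|---|-----|------|------|
| 1 | $k/\epsilon^2$ | $\|A - A_k\|_F^2 + \epsilon\|A\|_F^2$ | $\mathrm{nnz}(A)$ |
| 2 | $(k \log k)/\epsilon^2$ | $(1 + \epsilon)\|A - A_k\|_F^2$ | $mn^2$ |
| 3 | $(k \log k)/\epsilon^2$ | $(1 + \epsilon)\|A - A_k\|_F^2$ | $mnk^2 \log k$ |
| 4 | $k/\epsilon$ | $(1 + \epsilon)\|A - A_k\|_F^2$ | $mnk/\epsilon$ |
| 5 | $k/\epsilon$ | $(1 + \epsilon)\|A - A_k\|_F^2$ | $m^3nk/\epsilon$ |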

References:
1 Frieze, Kannan, Vempala. FOCS, 2003.
2 Drineas, Mahoney, Muthukrishnan. RANDOM, 2006.
3 Deshpande, Rademacher, Vempala, Wang. SODA, 2006.
4 Boutsidis, Drineas, Magdon-Ismail. FOCS, 2011.
5 Guruswami, Sinop. SODA, 2012.

There are more results in the linear algebra literature focusing on the spectral norm version of the CSSP.

Leverage scores and randomized sampling

Drineas, Mahoney, and Muthukrishnan. RANDOM, 2006.

Definition

[Leverage scores] Let $V_k \in \mathbb{R}^{n \times k}$ contain the top $k$ right singular vectors of an $m \times n$ matrix $A$ with rank $\rho = \mathrm{rank}(A) \ge k$. Then, the (rank-$k$) leverage score of the $i$-th column of $A$ is defined as

$$\ell_i^{(k)} = \|[V_k]_{i,:}\|_2^2, \quad i = 1, 2, \dots, n.$$

For a target rank $k < \mathrm{rank}(A)$, define a probability distribution over the columns of $A$: $p_i = \ell_i^{(k)}/k$.

In $c$ independent and identically distributed trials, sample with replacement $c$ columns from $A$.

For $c = O(k \log k/\epsilon^2)$ and with constant probability: $\|A - CC^{\dagger}A\|_F \le (1 + \epsilon)\,\|A - A_k\|_F$.
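The definition and the sampling step above can be sketched as follows. This is an illustrative NumPy version under the stated definition only; the function names `leverage_scores` and `sample_columns` are assumptions, and the sketch makes no attempt to reproduce any rescaling or other details of the cited algorithm.

```python
import numpy as np

def leverage_scores(A, k):
    """Rank-k leverage scores: squared Euclidean norms of the rows of V_k."""
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    # Rows of Vt[:k] are the top-k right singular vectors; column i of Vt[:k] is row i of V_k.
    return np.sum(Vt[:k] ** 2, axis=0)

def sample_columns(A, k, c, rng):
    """Draw c column indices i.i.d. with replacement, with probabilities p_i = score_i / k."""
    p = leverage_scores(A, k) / k   # sums to 1 because V_k has k orthonormal columns
    idx = rng.choice(A.shape[1], size=c, replace=True, p=p)
    return A[:, idx], idx

rng = np.random.default_rng(0)      # arbitrary seed for a toy example
A = rng.standard_normal((20, 30))
C, idx = sample_columns(A, k=3, c=12, rng=rng)
print(C.shape, idx)
```

Because sampling is with replacement, the same column can appear more than once in `C`; duplicates do not change the projector $CC^{\dagger}$.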

Deterministic leverage score sampling [Jolliffe, 1972]

Compute the leverage scores of A w.r.t. some k .

Pick the c columns with the largest leverage scores.

Nice empirical results.

No theoretical analysis.

Contribution of this talk: theoretical analysis of deterministic leverage score sampling.

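Deterministic leverage score sampling [revisited]

Input: $A \in \mathbb{R}^{m \times n}$, $k$, $\theta$ ($0 < \theta < k$)
- Compute $V_k \in \mathbb{R}^{n \times k}$ (via SVD).
- Compute the leverage scores:
  for $i = 1, 2, \dots, n$
    $\ell_i^{(k)} = \|[V_k]_{i,:}\|_2^2$
  end for
- Without loss of generality, let the $\ell_i^{(k)}$'s be sorted:
  $$\ell_1^{(k)} \ge \cdots \ge \ell_i^{(k)} \ge \ell_{i+1}^{(k)} \ge \cdots \ge \ell_n^{(k)}.$$
- Find index $c \in \{1, \dots, n\}$ such that:
  $$c = \arg\min_c \left( \sum_{i=1}^{c} \ell_i^{(k)} > \theta \right)$$
- If $c < k$, set $c = k$.
Output: $C \in \mathbb{R}^{m \times c}$ containing the first $c$ columns of $A$.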
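The steps above can be sketched in NumPy as follows. This is a minimal illustration rather than the authors' implementation; the function name `deterministic_leverage_columns` is hypothetical, and the final clamp to at most n columns is an added safeguard against floating-point round-off, not part of the slide.

```python
import numpy as np

def deterministic_leverage_columns(A, k, theta):
    """Pick the columns with the largest rank-k leverage scores until their sum exceeds theta."""
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    scores = np.sum(Vt[:k] ** 2, axis=0)           # leverage score of each column
    order = np.argsort(-scores, kind="stable")     # column indices, largest score first
    cumulative = np.cumsum(scores[order])
    # Number of leading entries with cumulative sum <= theta, plus one:
    # the smallest c whose top-c scores sum to strictly more than theta.
    c = int(np.searchsorted(cumulative, theta, side="right")) + 1
    c = max(c, k)                                  # "If c < k, set c = k"
    c = min(c, A.shape[1])                         # safeguard (assumption): never exceed n
    idx = order[:c]
    return A[:, idx], idx

rng = np.random.default_rng(0)                     # arbitrary seed for a toy matrix
A = rng.standard_normal((30, 50))
C, idx = deterministic_leverage_columns(A, k=3, theta=2.5)
print(C.shape)
```

Sorting the indices instead of the matrix plays the role of the "without loss of generality" step: `order[:c]` are the first c columns in the sorted ordering.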

Main result

Theorem

Let $\theta = k - \epsilon$, for some $\epsilon \in (0, 1)$. Then, for $\xi \in \{2, F\}$, we have

$$\|A - CC^{\dagger}A\|_\xi^2 < (1 + \epsilon) \cdot \|A - A_k\|_\xi^2.$$

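Weak result if the leverage scores are almost uniform: when every score is close to $k/n$, the top-$c$ scores sum to about $ck/n$, so exceeding $\theta = k - \epsilon$ requires $c \approx n(1 - \epsilon/k)$, i.e. nearly all columns of $A$.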

Main result: leverage scores following a power law

Theorem

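Let the leverage scores follow a power-law decay with exponent $\alpha_k = 1 + \eta$, for $\eta > 0$:

$$\ell_i^{(k)} = \frac{\ell_1^{(k)}}{i^{\alpha_k}}$$

Let $\theta = k - \epsilon$. Then,

$$c = \left(\frac{2k}{\epsilon}\right)^{\frac{1}{1+\eta}}$$

and $\|A - CC^{\dagger}A\|_\xi^2 < (1 + \epsilon) \cdot \|A - A_k\|_\xi^2$.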
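A quick way to get a feel for this bound is to evaluate the column count for a few parameter settings. The helper below is a sketch of the formula as stated on the slide; the name `columns_for_power_law`, the rounding up to an integer, and the floor at k (borrowed from the "if c < k, set c = k" step of the algorithm) are assumptions of this example.

```python
import math

def columns_for_power_law(k, eps, eta):
    """Evaluate c = (2k / eps)^(1 / (1 + eta)), rounded up and floored at k."""
    c = (2.0 * k / eps) ** (1.0 / (1.0 + eta))
    return max(k, math.ceil(c))

# k = 10, eps = 0.1, eta = 0.5: (200)^(2/3) is about 34.2, so 35 columns.
print(columns_for_power_law(10, 0.1, 0.5))
```

The faster the decay (larger $\eta$), the smaller the exponent $1/(1+\eta)$, so the required number of columns grows more slowly in $k/\epsilon$.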

Is power law a realistic assumption?

Test leverage scores of large graphs.

Show that the leverage scores follow power-law decays.

Power law is a realistic assumption

[Figure: leverage scores on a logarithmic vertical axis against index 1 to 1000 on the horizontal axis, one panel per dataset, each annotated with its fitted exponent $\alpha_{10}$: amazon 1.45, citeseer 1.5, foursquare 1.7, github 1.13, gnutella 2, google 1.6, gowalla 0.9, livejournal 0.2, slashdot 0.9, nips 1.6, skitter 0.2, slice 0.12, cora 1.58, writers 4, youtube groups 1.75, youtube 0.5.]

k = 10.
Show decay of leverage scores on a logarithmic scale.
Plot a fitting power-law curve $\beta \cdot x^{-\alpha_k}$.
True leverage scores are plotted with a red × marker.
The fitted curves are denoted with a solid blue line.
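The slides do not say how the curve $\beta \cdot x^{-\alpha_k}$ was fitted. One common approach, shown here purely as an assumption, is ordinary least squares on the log-log scale, where a power law becomes a straight line. The function name `fit_power_law` is hypothetical.

```python
import numpy as np

def fit_power_law(scores):
    """Fit scores[i] ~ beta * (i + 1)^(-alpha) by least squares in log-log space.

    Assumes all scores are strictly positive. Returns (beta, alpha).
    """
    y = np.sort(np.asarray(scores, dtype=float))[::-1]   # sort in decreasing order
    x = np.arange(1, y.size + 1)                          # ranks 1, 2, ..., n
    # log y = log beta - alpha * log x, a degree-1 polynomial in log x.
    slope, intercept = np.polyfit(np.log(x), np.log(y), deg=1)
    return np.exp(intercept), -slope

ranks = np.arange(1, 1001)
beta, alpha = fit_power_law(0.3 * ranks ** -1.5)   # synthetic, exactly power-law data
print(beta, alpha)
```

On real leverage scores the fit is only approximate, and a log-log regression weights the many small scores in the tail as heavily as the few large ones, so other fitting choices can give different exponents.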

Power-law decaying leverage scores

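[Figure: two rows of four plots of the relative error ratio $\|A - CC^{\dagger}A\|_2^2 / \|A - A_k\|_2^2$ (vertical axis) against the number of sampled columns c (horizontal axis), one panel per k = 5, 10, 50, 100, with row labels $\alpha_k = 0.5$ and $\alpha_k = 1.5$. Annotated values: c = 10, 38, 97, 152 in the first row; c = 7, 11, 88, 129 in the second row.]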

m = 200, n = 1000.

k = 5, 10, 50, 100.

c = 1, 2, ..., 1000.

$\alpha_k = 0.5$ and $\alpha_k = 1.5$.

Blue curve is the relative error ratio $\|A - CC^{\dagger}A\|_2^2 / \|A - A_k\|_2^2$.

The vertical cyan line corresponds to the point where k = c.

The vertical magenta line indicates the point where the c sampled columns offer a better approximation compared to the best rank-k matrix $A_k$.

Nearly-uniform leverage scores

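[Figure: four plots of the relative error ratio $\|A - CC^{\dagger}A\|_2^2 / \|A - A_k\|_2^2$ (vertical axis) against the number of sampled columns c (horizontal axis), one panel per k. Annotated values: c = 473 for k = 5, c = 404 for k = 10, c = 629 for k = 50, c = 630 for k = 100.]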

m = 200, n = 1000.

k = 5, 10, 50, 100.

c = 1, 2, ..., 1000.

Blue curve is the relative error ratio $\|A - CC^{\dagger}A\|_2^2 / \|A - A_k\|_2^2$.

The leftmost vertical cyan line corresponds to the point where k = c.

The rightmost vertical magenta line indicates the point where the c sampled columns offer as good an approximation as that of the best rank-k matrix $A_k$.