ACCELERATED INEXACT SOFT-IMPUTE FOR FAST LARGE-SCALE MATRIX COMPLETION

QUANMING YAO AND JAMES T. KWOK
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

HONG KONG UNIVERSITY OF SCIENCE AND TECHNOLOGY

MATRIX COMPLETION

$$\min_X \; \tfrac{1}{2}\|P_\Omega(X - O)\|_F^2 + \lambda\|X\|_*,$$

where [P_Ω(A)]_ij = A_ij if Ω_ij = 1, and 0 otherwise; and ‖X‖_* is the nuclear norm of X. An important application is recommendation systems.

It assumes that the underlying fully observed matrix X is low-rank, i.e., X = UV. Missing elements are to be inferred from the known observations.
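To make the objective above concrete, here is a minimal numerical sketch (the name `mc_objective` and the toy data are ours, not from the poster): it evaluates ½‖P_Ω(X − O)‖_F² + λ‖X‖_* for a boolean observation mask.

```python
import numpy as np

def mc_objective(X, O, mask, lam):
    """Evaluate 0.5*||P_Omega(X - O)||_F^2 + lam*||X||_*.

    `mask` is a boolean array playing the role of Omega; the nuclear
    norm ||X||_* is the sum of the singular values of X.
    """
    resid = np.where(mask, X - O, 0.0)                  # P_Omega(X - O)
    nuclear = np.linalg.svd(X, compute_uv=False).sum()  # ||X||_*
    return 0.5 * np.sum(resid ** 2) + lam * nuclear

# toy instance: rank-1 ground truth, roughly half the entries observed
rng = np.random.default_rng(0)
O = np.outer(rng.standard_normal(5), rng.standard_normal(4))
mask = rng.random(O.shape) < 0.5
val = mc_objective(O, O, mask, lam=0.1)  # X = O: only the nuclear-norm term remains
```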

PROXIMAL GRADIENT DESCENT

min_x f(x) + λg(x), where (i) both f(·) and g(·) are convex; (ii) f(·) is smooth, while g(·) can be nonsmooth. Let z_t = x_t − ∇f(x_t); it generates x_{t+1} as

$$\arg\min_x \langle x - x_t, \nabla f(x_t)\rangle + \tfrac{1}{2}\|x - x_t\|^2 + \lambda g(x) = \arg\min_x \tfrac{1}{2}\|x - z_t\|^2 + \lambda g(x) \equiv \operatorname{prox}_{\lambda g}(z_t).$$

• this is called the proximal step, often with closed-form solutions.

• convergence rate is O(1/T), but can be accelerated to O(1/T²) by using

y_t = (1 + θ_t)x_t − θ_t x_{t−1},  z_t = y_t − ∇f(y_t).
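The accelerated scheme above can be sketched on a small ℓ₁-regularized least-squares instance, where the proximal step is the closed-form soft-thresholding operator; the function names and test problem are illustrative, not from the poster.

```python
import numpy as np

def soft_threshold(z, t):
    """Closed-form proximal step for t*||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def accel_prox_grad(A, b, lam, iters=200):
    """Accelerated proximal gradient for min_x 0.5*||Ax - b||^2 + lam*||x||_1,
    using the extrapolation y_t = x_t + theta_t (x_t - x_{t-1})."""
    L = np.linalg.norm(A, 2) ** 2                # Lipschitz constant of grad f
    x_prev = x = np.zeros(A.shape[1])
    c = 1.0
    for _ in range(iters):
        theta = (c - 1.0) / (c + 2.0)            # momentum weight
        y = x + theta * (x - x_prev)             # extrapolation
        z = y - A.T @ (A @ y - b) / L            # gradient step at y
        x_prev, x = x, soft_threshold(z, lam / L)  # proximal step
        c += 1.0
    return x

# sparse-recovery toy problem
rng = np.random.default_rng(1)
A = rng.standard_normal((30, 10))
x_true = np.zeros(10)
x_true[:3] = [2.0, -1.5, 1.0]
x_hat = accel_prox_grad(A, A @ x_true, lam=0.1)
```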

Lemma 1 (SVT, Cai et al. 2010) Let the SVD of a matrix Z be UΣV^⊤. The solution to min_X ½‖X − Z‖²_F + λ‖X‖_* is a low-rank matrix given by

$$\mathrm{SVT}_\lambda(Z) \equiv U(\Sigma - \lambda I)_+ V^\top,$$

where [(A)_+]_ij = max(A_ij, 0).

For g(X) = ‖X‖_*, it is known as Singular Value Thresholding (SVT).
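A minimal sketch of SVT following Lemma 1, computed via a full SVD (the poster's point later on is precisely that this full SVD is what one avoids):

```python
import numpy as np

def svt(Z, lam):
    """Singular value thresholding (Lemma 1): shrink every singular value
    of Z by lam and drop the ones that hit zero, giving a low-rank matrix."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    s = np.maximum(s - lam, 0.0)        # (Sigma - lam*I)_+
    keep = s > 0                        # columns that survive thresholding
    return U[:, keep] * s[keep] @ Vt[keep]

rng = np.random.default_rng(2)
Z = rng.standard_normal((8, 6))
X = svt(Z, lam=1.0)
```

The singular values of the output are exactly those of Z shrunk by λ, which is a quick correctness check.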

SOFT-IMPUTE [MAZUMDER ET AL., 2010]

At iteration t, it generates X_{t+1} = SVT_λ(Z_t), where

$$Z_t = P_\Omega(O) + P_\Omega^\perp(X_t) = P_\Omega(O - X_t) + X_t.$$

"Sparse + Low-Rank" structure in Zt:

• let SVD of Xt be UtΣtVt>.

• for any u ∈ Rn, Ztu (same for Z>t v) can becomputed as

$$Z_t u = \underbrace{P_\Omega(O - X_t)u}_{O(\|\Omega\|_1)} + \underbrace{U_t\Sigma_t(V_t^\top u)}_{O((m+n)k)}. \tag{1}$$

• a rank-k SVD on Z_t ∈ ℝ^{m×n} takes O(‖Ω‖₁k + (m + n)k²) instead of O(mnk) time.

• Z_t = X_t − ∇f(X_t) = P_Ω^⊥(X_t) + P_Ω(O), and so

Soft-Impute = Proximal Gradients

• acceleration is possible, but previous results suggested it is useless, as the sparse + low-rank structure in Z_t is lost.
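The "sparse + low-rank" matrix-vector product in (1) can be sketched as below; `zt_matvec` is an illustrative name, and `S` stands for the sparse part P_Ω(O − X_t).

```python
import numpy as np
import scipy.sparse as sp

def zt_matvec(S, U, s, V, u):
    """Z_t u for Z_t = S + U diag(s) V^T, without ever forming Z_t.
    The sparse term costs O(||Omega||_1); the low-rank term O((m + n) k)."""
    return S @ u + U @ (s * (V.T @ u))

rng = np.random.default_rng(3)
m, n, k = 50, 40, 3
U = rng.standard_normal((m, k))
s = rng.random(k)
V = rng.standard_normal((n, k))
S = sp.random(m, n, density=0.05, random_state=4, format="csr")  # sparse part
u = rng.standard_normal(n)
fast = zt_matvec(S, U, s, V, u)
dense = (S.toarray() + (U * s) @ V.T) @ u   # same product the O(mn) way
```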

Proposed AIS-Impute can be downloaded at https://github.com/quanmingyao/AIS-impute.

PROPOSED ALGORITHM (AIS-IMPUTE)

Contributions

• efficient acceleration is possible with sparse + low-rank structure;

• speeding up approximate SVT using the power method.

Fast convergence + Low iteration complexity

Sparse + Low-Rank

Z_t with acceleration:

$$Z_t = \underbrace{P_\Omega(O - Y_t)}_{\text{sparse}} + \underbrace{(1 + \theta_t)X_t - \theta_t X_{t-1}}_{\text{sum of two low-rank matrices}}.$$

Similar to (1), for any u ∈ ℝⁿ, Z_t u can be computed as

$$\underbrace{P_\Omega(O - Y_t)u}_{O(\|\Omega\|_1)} + \underbrace{(1 + \theta_t)U_t\Sigma_t V_t^\top u}_{O((m+n)k)} - \underbrace{\theta_t U_{t-1}\Sigma_{t-1} V_{t-1}^\top u}_{O((m+n)k)}.$$

• again, this takes O(‖Ω‖₁k + (m + n)k²) time, but the convergence rate is improved to O(1/T²).
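Such a fast matvec plugs directly into an iterative SVD solver. A hedged sketch using SciPy's `LinearOperator` with `svds` (all variable names illustrative): the rank-k SVD of the accelerated Z_t is computed without forming Z_t densely.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, svds

# Accelerated iterate Z_t = S + (1 + theta) X_t - theta X_{t-1}, where S is
# sparse and X_t = U1 diag(s1) V1^T, X_{t-1} = U0 diag(s0) V0^T are low-rank.
rng = np.random.default_rng(5)
m, n, k, theta = 60, 50, 4, 0.5
U1, V1, s1 = rng.standard_normal((m, k)), rng.standard_normal((n, k)), rng.random(k)
U0, V0, s0 = rng.standard_normal((m, k)), rng.standard_normal((n, k)), rng.random(k)
S = sp.random(m, n, density=0.05, random_state=6, format="csr")

def mv(u):    # Z_t u in O(||Omega||_1 + (m + n) k) time
    u = np.ravel(u)
    return (S @ u + (1 + theta) * (U1 @ (s1 * (V1.T @ u)))
            - theta * (U0 @ (s0 * (V0.T @ u))))

def rmv(v):   # Z_t^T v, same structure
    v = np.ravel(v)
    return (S.T @ v + (1 + theta) * (V1 @ (s1 * (U1.T @ v)))
            - theta * (V0 @ (s0 * (U0.T @ v))))

Zt = LinearOperator((m, n), matvec=mv, rmatvec=rmv)
U, sig, Vt = svds(Zt, k=3)   # rank-3 SVD, Z_t never materialized
```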

Approximate SVT

A1 PowerMethod(Z̃, R, ε̃).
Require: Z̃ ∈ ℝ^{m×n}, initial R ∈ ℝ^{n×k} for warm-start, tolerance ε̃;
1: Initialize Y_1 ← Z̃R;
2: for τ = 1, 2, . . . do
3: Q_{τ+1} = QR(Y_τ); Y_{τ+1} = Z̃(Z̃^⊤ Q_{τ+1});
4: if ‖Q_{τ+1}Q_{τ+1}^⊤ − Q_τ Q_τ^⊤‖_F ≤ ε̃, break;
5: end for
6: return Q_{τ+1}.

Since only singular values larger than λ are needed, we use the power method [Halko et al., 2011] to approximate SVT. The error is controlled by ε̃.
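A1 can be sketched as below, assuming R ∈ ℝ^{n×k}; for a well-separated spectrum, the projector QQ^⊤ converges to the projector onto the top-k left singular subspace.

```python
import numpy as np

def power_method(Z, R, tol=1e-9, max_iter=100):
    """Block power iteration (sketch of A1): approximate the span of the
    top-k left singular vectors of Z, warm-started from R of shape (n, k)."""
    Q = np.linalg.qr(Z @ R)[0]                 # Y_1 = Z R, then orthonormalize
    for _ in range(max_iter):
        Q_new = np.linalg.qr(Z @ (Z.T @ Q))[0]
        # stop once the projector Q Q^T stops changing
        if np.linalg.norm(Q_new @ Q_new.T - Q @ Q.T, "fro") <= tol:
            return Q_new
        Q = Q_new
    return Q

# test matrix with a clear gap after the third singular value
rng = np.random.default_rng(7)
U0 = np.linalg.qr(rng.standard_normal((30, 5)))[0]
V0 = np.linalg.qr(rng.standard_normal((20, 5)))[0]
Z = U0 @ np.diag([10.0, 9.0, 8.0, 1.0, 0.5]) @ V0.T
Q = power_method(Z, rng.standard_normal((20, 3)))
```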

A2 approx-SVT(Z̃_t, R, λ, ε̃).
Require: Z̃_t ∈ ℝ^{m×n}, R ∈ ℝ^{n×k}, λ and ε̃;
1: Q = PowerMethod(Z̃_t, R, ε̃);
2: [U, Σ, V] = SVD(Q^⊤ Z̃_t);
3: U = {u_i | σ_i > λ}; V = {v_i | σ_i > λ};
4: Σ = (Σ − λI)_+;
5: return QU, Σ and V.
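A compact sketch of A2, with the power method inlined for self-containedness and a fixed iteration count standing in for the ε̃-based stopping rule:

```python
import numpy as np

def approx_svt(Z, R, lam, iters=20):
    """Approximate SVT (sketch of A2): find the dominant subspace Q by block
    power iteration, then SVD the small matrix Q^T Z and threshold."""
    Q = np.linalg.qr(Z @ R)[0]                 # warm-start from R
    for _ in range(iters):                     # fixed-count power iterations
        Q = np.linalg.qr(Z @ (Z.T @ Q))[0]
    U, s, Vt = np.linalg.svd(Q.T @ Z, full_matrices=False)
    keep = s > lam                             # only sigma_i > lam survive
    return Q @ U[:, keep], s[keep] - lam, Vt[keep].T

# rank-4 test matrix with singular values 5, 4, 3, 0.2
rng = np.random.default_rng(8)
U0 = np.linalg.qr(rng.standard_normal((25, 4)))[0]
V0 = np.linalg.qr(rng.standard_normal((20, 4)))[0]
Z = U0 @ np.diag([5.0, 4.0, 3.0, 0.2]) @ V0.T
QU, sig, V = approx_svt(Z, rng.standard_normal((20, 4)), lam=1.0)
X = QU * sig @ V.T    # approximates SVT_1(Z)
```

With λ = 1, the surviving singular values of SVT₁(Z) should be 4, 3 and 2, while 0.2 is thresholded away.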

A3 AIS-Impute (proposed algorithm).
Require: partially observed matrix O, parameter λ, decay parameter ν ∈ (0, 1), threshold ε;
1: [U_0, λ_0, V_0] = rank-1 SVD(P_Ω(O));
2: c = 1, ε_0 = ‖P_Ω(O)‖_F, X_0 = X_1 = λ_0 U_0 V_0^⊤;
3: for t = 1, 2, . . . do
4: ε_t = ν^t ε_0; θ_t = (c − 1)/(c + 2);
5: λ_t = ν^t(λ_0 − λ) + λ;
6: Y_t = X_t + θ_t(X_t − X_{t−1});
7: Z̃_t = Y_t + P_Ω(O − Y_t);
8: V_{t−1} = V_{t−1} − V_t(V_t^⊤ V_{t−1}), remove zero columns;
9: R_t = QR([V_t, V_{t−1}]);
10: [U_{t+1}, Σ_{t+1}, V_{t+1}] = approx-SVT(Z̃_t, R_t, λ_t, ε_t);
11: if F(U_{t+1}Σ_{t+1}V_{t+1}^⊤) > F(U_tΣ_tV_t^⊤), c = 1; else c = c + 1;
12: end for
13: return X_{t+1} = U_{t+1}Σ_{t+1}V_{t+1}^⊤.

• core steps are 7–10 (approximate SVT); the power method is warm-started with the last two iterations (V_t and V_{t−1}).

• step 5: continuation strategy [Toh and Yun, 2010; Mazumder et al., 2010];

• step 11: restarts if F (X) starts to increase.
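A simplified end-to-end sketch of A3, using an exact SVT in place of approx-SVT so that only the acceleration (step 6), continuation (step 5), and restart (step 11) logic remain; all names are illustrative.

```python
import numpy as np

def ais_impute(O, mask, lam, nu=0.7, iters=100):
    """Simplified sketch of A3: Nesterov acceleration (step 6), continuation
    on lambda (step 5), and objective-based restart (step 11), but with an
    exact SVT standing in for approx-SVT."""
    def svt(Z, t):
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        s = np.maximum(s - t, 0.0)
        return U * s @ Vt

    def F(X):   # objective 0.5*||P_Omega(X - O)||_F^2 + lam*||X||_*
        r = np.where(mask, X - O, 0.0)
        return 0.5 * np.sum(r ** 2) + lam * np.linalg.svd(X, compute_uv=False).sum()

    lam0 = np.linalg.norm(np.where(mask, O, 0.0), 2)   # top singular value
    X_prev = X = np.zeros_like(O)
    c = 1.0
    for t in range(1, iters + 1):
        theta = (c - 1.0) / (c + 2.0)
        lam_t = nu ** t * (lam0 - lam) + lam           # continuation
        Y = X + theta * (X - X_prev)                   # acceleration
        Z = Y + np.where(mask, O - Y, 0.0)
        X_prev, X = X, svt(Z, lam_t)
        c = 1.0 if F(X) > F(X_prev) else c + 1.0       # restart
    return X

# rank-2 ground truth, about 60% of entries observed
rng = np.random.default_rng(9)
O_full = rng.standard_normal((40, 2)) @ rng.standard_normal((2, 30))
mask = rng.random(O_full.shape) < 0.6
X_hat = ais_impute(O_full, mask, lam=0.5)
```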

CONVERGENCE ANALYSIS

Approximation quality of approximate SVT:

Proposition 2 Let $h_{\lambda\|\cdot\|_*}(X; \tilde Z_t) \equiv \tfrac{1}{2}\|X - \tilde Z_t\|^2 + \lambda\|X\|_*$. Assume k̂ ≥ k and ε̃ ≥ α_t η_t^j √(1 + η_t²). Then,

$$h_{\lambda\|\cdot\|_*}(\hat X_t; \tilde Z_t) \le h_{\lambda\|\cdot\|_*}(\mathrm{SVT}_\lambda(\tilde Z_t); \tilde Z_t) + \frac{\eta_t}{1-\eta_t}\,\tilde\varepsilon\,\beta_t\gamma_t.$$

Theorem 3 (Convergence of AIS-Impute) Assume the conditions in Proposition 2; Algorithm 3 converges to the optimal solution with a rate of O(1/T²).

SYNTHETIC DATA

Methods compared: (i) accelerated proximal gradient algorithm ("APG") [Ji and Ye, 2009]; (ii) Soft-Impute [Mazumder et al., 2010]; (iii) the proposed accelerated inexact Soft-Impute algorithm ("AIS-Impute"). Results clearly indicate the "fast rate + low iteration complexity" of AIS-Impute.

(a) objective vs #iterations. (b) objective vs time.

MOVIELENS DATA

Besides proximal algorithms, we also compare with (i) active subspace selection ("active") [Hsieh and Olsen, 2014]; (ii) the Frank-Wolfe algorithm ("boost") [Zhang et al., 2012]; (iii) a recent variant of Soft-Impute [Hastie et al., 2014]; and (iv) a trust-region method ("TR") [Mishra et al., 2013].

(c) MovieLens-100K. (d) MovieLens-1M. (e) MovieLens-10M.

All algorithms are equally good at recovering the missing entries, but AIS-Impute is the fastest.