ACCELERATED INEXACT SOFT-IMPUTE FOR FAST LARGE-SCALE MATRIX COMPLETION
QUANMING YAO AND JAMES T. KWOK, DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING,
HONG KONG UNIVERSITY OF SCIENCE AND TECHNOLOGY
MATRIX COMPLETION

  min_X (1/2)‖P_Ω(X − O)‖_F² + λ‖X‖_*,

where [P_Ω(A)]_{ij} = A_{ij} if Ω_{ij} = 1, and 0 otherwise; ‖X‖_* is the nuclear norm of X. An important application is recommender systems.

It assumes that the underlying fully observed matrix X is low-rank, i.e., X = UV. Missing elements are to be inferred from the known observations.
PROXIMAL GRADIENT DESCENT

min_x f(x) + λg(x), where (i) f(·) is smooth and convex; (ii) g(·) is convex but can be nonsmooth. Let z_t = x_t − ∇f(x_t); it generates x_{t+1} as

  argmin_x ⟨x − x_t, ∇f(x_t)⟩ + (1/2)‖x − x_t‖² + λg(x)
  = argmin_x (1/2)‖x − z_t‖² + λg(x) ≡ prox_{λg}(z_t).

• this is called the proximal step, and often has a closed-form solution.
• the convergence rate is O(1/T), but it can be accelerated to O(1/T²) by using

  y_t = (1 + θ_t)x_t − θ_t x_{t−1},   z_t = y_t − ∇f(y_t).
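The updates above can be illustrated with a minimal NumPy sketch on a toy scalar problem, min_x (1/2)(x − 3)² + |x|, whose optimum is x = 2; the function names (`soft_threshold`, `prox_grad`) and the step size of 1 are illustrative choices, not from the poster.

```python
import numpy as np

def soft_threshold(z, lam):
    # proximal operator of lam*|x|: shrink z toward zero by lam
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def prox_grad(grad_f, prox_g, x0, n_iter=50, accelerate=True):
    # proximal gradient descent with optional Nesterov-style momentum
    # (step size fixed to 1 for this toy problem)
    x_prev = x = x0
    for t in range(1, n_iter + 1):
        theta = (t - 1) / (t + 2) if accelerate else 0.0
        y = x + theta * (x - x_prev)   # y_t = (1 + theta)x_t - theta*x_{t-1}
        z = y - grad_f(y)              # gradient step: z_t = y_t - grad f(y_t)
        x_prev, x = x, prox_g(z)       # proximal step
    return x

# minimize (1/2)(x - 3)^2 + |x|; the optimum is x = 2
x_star = prox_grad(lambda x: x - 3.0,
                   lambda z: soft_threshold(z, 1.0),
                   x0=np.array(0.0))
print(x_star)  # -> 2.0
```

Setting `accelerate=False` recovers the plain O(1/T) iteration; on this toy problem both converge quickly, but the momentum variant is what the O(1/T²) rate refers to.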
Lemma 1 (SVT, Cai et al. 2010). Let the SVD of a matrix Z be UΣV^⊤. The solution to min_X (1/2)‖X − Z‖_F² + λ‖X‖_* is a low-rank matrix given by

  SVT_λ(Z) ≡ U(Σ − λI)_+ V^⊤,

where [(A)_+]_{ij} = max(A_{ij}, 0).

For g(X) = ‖X‖_*, the proximal step is known as Singular Value Thresholding (SVT).
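Lemma 1 translates directly into a few lines of NumPy; the sketch below (the name `svt` is illustrative) computes SVT_λ(Z) by soft-thresholding the singular values:

```python
import numpy as np

def svt(Z, lam):
    # SVT_lam(Z): soft-threshold the singular values of Z (Lemma 1)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    s_shrunk = np.maximum(s - lam, 0.0)        # (Sigma - lam*I)_+
    k = int(np.sum(s_shrunk > 0))              # rank of the solution
    return (U[:, :k] * s_shrunk[:k]) @ Vt[:k]  # low-rank reconstruction

rng = np.random.default_rng(0)
X = svt(rng.standard_normal((20, 15)), lam=2.0)
print(np.linalg.matrix_rank(X))  # smaller than min(m, n) = 15
```

Thresholding zeroes out every singular value below λ, which is exactly why the solution is low-rank.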
SOFT-IMPUTE [MAZUMDER ET AL., 2010]
At iteration t, it generates X_{t+1} = SVT_λ(Z_t), where

  Z_t = P_Ω(O) + P_Ω^⊥(X_t) = P_Ω(O − X_t) + X_t.

"Sparse + Low-Rank" structure in Z_t:

• let the SVD of X_t be U_t Σ_t V_t^⊤.
• for any u ∈ R^n, Z_t u (and likewise Z_t^⊤ v) can be computed as

  Z_t u = P_Ω(O − X_t) u + U_t Σ_t (V_t^⊤ u),   (1)

  where the first (sparse) term costs O(‖Ω‖_1) and the second (low-rank) term costs O((m+n)k).
• a rank-k SVD on Z_t ∈ R^{m×n} thus takes O(‖Ω‖_1 k + (m+n)k²) instead of O(mnk) time.
• Z_t = X_t − ∇f(X_t) = P_Ω^⊥(X_t) + P_Ω(O), and so

  Soft-Impute = Proximal Gradient.

• acceleration is possible, but previous results suggested it is useless, as the sparse + low-rank structure in Z_t is lost.
The proposed AIS-Impute can be downloaded at https://github.com/quanmingyao/AIS-impute.
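The "sparse + low-rank" multiplication in (1) can be sketched with SciPy sparse matrices as follows; `sl_matvec` and the specific sizes are illustrative, assuming Z = S + U diag(s) V^⊤ with S sparse:

```python
import numpy as np
from scipy import sparse

def sl_matvec(S, U, s, Vt, u):
    # multiply (S + U diag(s) Vt) @ u without forming the dense matrix:
    # the sparse part costs O(||Omega||_1), the low-rank part O((m+n)k)
    return S @ u + U @ (s * (Vt @ u))

rng = np.random.default_rng(0)
m, n, k = 2000, 1500, 5
S = sparse.random(m, n, density=0.01, format="csr", random_state=0)
U = rng.standard_normal((m, k)); s = rng.random(k)
Vt = rng.standard_normal((k, n)); u = rng.standard_normal(n)
v = sl_matvec(S, U, s, Vt, u)   # never materializes the m x n matrix
```

This structured matvec is all an iterative SVD routine (such as the power method below, or Lanczos) needs, which is where the O(‖Ω‖_1 k + (m+n)k²) cost of the rank-k SVD comes from.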
PROPOSED ALGORITHM (AIS-IMPUTE)

Contributions:
• efficient acceleration is possible with the sparse + low-rank structure;
• speeding up approximate SVT using the power method.

Fast convergence + low iteration complexity.

Sparse + Low-Rank
Z_t with acceleration:

  Z_t = P_Ω(O − Y_t) + (1 + θ_t)X_t − θ_t X_{t−1},

where the first term is sparse and the remaining two are a sum of two low-rank matrices. Similar to (1), for any u ∈ R^n, Z_t u can be computed as

  P_Ω(O − Y_t) u + (1 + θ_t) U_t Σ_t V_t^⊤ u − θ_t U_{t−1} Σ_{t−1} V_{t−1}^⊤ u,

where the sparse term costs O(‖Ω‖_1) and each low-rank term costs O((m+n)k).

• again, this takes O(‖Ω‖_1 k + (m+n)k²) time, but the convergence rate is improved to O(1/T²).
Approximate SVT

Algorithm 1 PowerMethod(Z, R, ε).
Require: Z ∈ R^{m×n}, initial R ∈ R^{n×k} for warm-start, tolerance ε;
 1: Initialize Y_1 ← ZR;
 2: for τ = 1, 2, . . . do
 3:   Q_{τ+1} = QR(Y_τ); Y_{τ+1} = Z(Z^⊤ Q_{τ+1});
 4:   if ‖Q_{τ+1} Q_{τ+1}^⊤ − Q_τ Q_τ^⊤‖_F ≤ ε, break;
 5: end for
 6: return Q_{τ+1}.

Since only the singular values larger than λ are needed, we use the power method [Halko et al., 2011] to approximate SVT. The error is controlled by ε.
Algorithm 2 approx-SVT(Z_t, R, λ, ε).
Require: Z_t ∈ R^{m×n}, R ∈ R^{n×k}, λ and ε;
 1: Q = PowerMethod(Z_t, R, ε);
 2: [U, Σ, V] = SVD(Q^⊤ Z_t);
 3: U = {u_i | σ_i > λ}; V = {v_i | σ_i > λ};
 4: Σ = (Σ − λI)_+;
 5: return QU, Σ and V.
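Algorithms 1 and 2 can be sketched in NumPy as below; this is a simplified rendition (fixed iteration cap, illustrative function names), not the authors' released implementation:

```python
import numpy as np

def power_method(Z, R, eps=1e-6, max_iter=100):
    # Algorithm 1: block power iteration for an orthonormal basis Q that
    # approximately spans the top left singular subspace of Z;
    # R (n x k) warm-starts the iteration
    Q, _ = np.linalg.qr(Z @ R)
    for _ in range(max_iter):
        Q_new, _ = np.linalg.qr(Z @ (Z.T @ Q))
        if np.linalg.norm(Q_new @ Q_new.T - Q @ Q.T) <= eps:
            return Q_new
        Q = Q_new
    return Q

def approx_svt(Z, R, lam, eps=1e-6):
    # Algorithm 2: SVT restricted to the subspace found by the power
    # method; the SVD is of the small k x n matrix Q.T @ Z, not of Z
    Q = power_method(Z, R, eps)
    U, s, Vt = np.linalg.svd(Q.T @ Z, full_matrices=False)
    keep = s > lam                  # keep only singular values above lam
    return Q @ U[:, keep], s[keep] - lam, Vt[keep]
```

When Z is exactly low-rank and R has a few more columns than that rank, the result coincides with the exact SVT of Lemma 1 up to numerical error, while only small SVDs are ever computed.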
Algorithm 3 AIS-Impute (proposed algorithm).
Require: partially observed matrix O, parameter λ, decay parameter ν ∈ (0, 1), threshold ε;
 1: [U_0, λ_0, V_0] = rank-1 SVD(P_Ω(O));
 2: c = 1, ε_0 = ‖P_Ω(O)‖_F, X_0 = X_1 = λ_0 U_0 V_0^⊤;
 3: for t = 1, 2, . . . do
 4:   ε_t = ν^t ε_0; θ_t = (c − 1)/(c + 2);
 5:   λ_t = ν^t (λ_0 − λ) + λ;
 6:   Y_t = X_t + θ_t (X_t − X_{t−1});
 7:   Z_t = Y_t + P_Ω(O − Y_t);
 8:   V_{t−1} = V_{t−1} − V_t (V_t^⊤ V_{t−1}), remove zero columns;
 9:   R_t = QR([V_t, V_{t−1}]);
10:   [U_{t+1}, Σ_{t+1}, V_{t+1}] = approx-SVT(Z_t, R_t, λ_t, ε_t);
11:   if F(U_{t+1} Σ_{t+1} V_{t+1}^⊤) > F(U_t Σ_t V_t^⊤), c = 1; else c = c + 1;
12: end for
13: return X_{t+1} = U_{t+1} Σ_{t+1} V_{t+1}^⊤.
• core steps are 7–9 (approximate SVT); the power method is warm-started with the last two iterations (V_t and V_{t−1}).
• step 5: continuation strategy [Toh and Yun, 2010; Mazumder et al., 2010];
• step 11: restart if F(X) starts to increase.
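Putting the pieces together, here is a simplified sketch of the AIS-Impute loop (momentum, continuation, restart) in NumPy; for clarity it uses an exact SVT in place of the power-method-based approx-SVT of Algorithm 2, and the helper names and defaults are illustrative only:

```python
import numpy as np

def ais_impute_sketch(O, mask, lam, nu=0.5, n_iter=50):
    # simplified AIS-Impute: momentum (step 6), continuation (step 5)
    # and restart (step 11), with an exact SVT standing in for the
    # power-method-based approx-SVT of Algorithm 2
    def svt(Z, l):
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        return (U * np.maximum(s - l, 0.0)) @ Vt
    def F(X):   # nuclear-norm regularized completion objective
        return (0.5 * np.sum((mask * (X - O)) ** 2)
                + lam * np.linalg.svd(X, compute_uv=False).sum())
    lam0 = np.linalg.norm(mask * O, 2)   # largest singular value
    X_prev = X = np.zeros_like(O)
    c = 1
    for t in range(1, n_iter + 1):
        theta = (c - 1) / (c + 2)
        lam_t = nu ** t * (lam0 - lam) + lam   # continuation (step 5)
        Y = X + theta * (X - X_prev)           # momentum (step 6)
        Z = Y + mask * (O - Y)                 # sparse + low-rank Z_t
        X_new = svt(Z, lam_t)
        c = 1 if F(X_new) > F(X) else c + 1    # restart (step 11)
        X_prev, X = X, X_new
    return X
```

The real algorithm never forms Z densely: the approx-SVT of Algorithm 2 consumes Z only through the structured matvec, which is what keeps the per-iteration cost at O(‖Ω‖_1 k + (m+n)k²).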
CONVERGENCE ANALYSIS

Approximation quality of the approximate SVT:

Proposition 2. Let h_{λ‖·‖_*}(X; Z_t) ≡ (1/2)‖X − Z_t‖² + λ‖X‖_*. Assume k ≥ k̂ and ε ≥ α_t η_t^j √(1 + η_t²). Then,

  h_{λ‖·‖_*}(X_t; Z_t) ≤ h_{λ‖·‖_*}(SVT_λ(Z_t); Z_t) + (η_t/(1 − η_t)) ε β_t γ_t.

Theorem 3 (Convergence of AIS-Impute). Under the conditions in Proposition 2, Algorithm 3 converges to the optimal solution with a rate of O(1/T²).
SYNTHETIC DATA

Methods compared:
(i) the accelerated proximal gradient algorithm ("APG") [Ji and Ye, 2009];
(ii) Soft-Impute [Mazumder et al., 2010];
(iii) the proposed accelerated inexact Soft-Impute algorithm ("AIS-Impute").

Results clearly indicate the "fast rate + low iteration complexity" of AIS-Impute. [Figures: (a) objective vs. number of iterations; (b) objective vs. time.]
MOVIELENS DATA

Besides proximal algorithms, we also compare with (i) active subspace selection ("active") [Hsieh and Olsen, 2014]; (ii) the Frank-Wolfe algorithm ("boost") [Zhang et al., 2012]; (iii) a recent variant of Soft-Impute [Hastie et al., 2014]; and (iv) a trust-region method ("TR") [Mishra et al., 2013].

[Figures: (c) MovieLens-100K; (d) MovieLens-1M; (e) MovieLens-10M.]
All algorithms are equally good at recovering the missing entries, but AIS-Impute is the fastest.