Low rank aproximation in traditional and novel tensor formats€¦ · Icanonical rank r ]DOF for a...

Post on 27-Jul-2020

9 views 0 download

Transcript of Low rank aproximation in traditional and novel tensor formats€¦ · Icanonical rank r ]DOF for a...

Low rank aproximation in traditional and noveltensor formats

R. Schneider (TUB Matheon)

MPI Munich, 2012

Our motivationsEquations describing complex systems with multi-variatesolution spaces, e.g.

B stationary/instationary Schrodinger type equations

i~∂

∂tΨ(t , x) = (−1

2∆ + V )︸ ︷︷ ︸H

Ψ(t , x), HΨ(x) = EΨ(x)

describing quantum-mechanical many particle systemsB stochastic DEs and the Fokker-Planck equation,

∂p(t , x)∂t

=d∑

i=1

∂xi

(fi(t , x)p(t , x)

)+

12

d∑i,j=1

∂2

∂xi∂xj

(Bi,j(t , x)p(t , x)

)describing mechanical systems in stochastic environment,

B chemical master equations, parametric PDEs, machinelearning, . . .

Solutions depend on x = (x1, . . . , xd ), where usually, d >> 3!

Setting - Tensors of order dGoal: Generic perspective on methods for high-dimensionalproblems, i.e. problems posed on tensor spaces,

V :=⊗d

i=1 Vi , today: V =⊗d

i=1 Rn = R(nd )

Notation: (x1, . . . , xd ) 7→ U = U(x1, . . . , xd ) ∈ V

Main problem:

dimV = O (nd ) – Curse of dimensionality!

e.g. n = 100,d = 10 10010 basis functions, coefficient vectors of 800× 1018 Bytes = 800 Exabytes

Approach: Some higher order tensors can be constructed(data-) sparsely from lower order quantities.

As for matrices, incomplete SVD:

A(x1, x2) ≈r∑

k=1

σk(uk (x1)⊗ vk (x2)

)

Canonical format

H ' {x = (x1, . . . , xd ) 7→ U(x1, . . . , xd ) ∈ R, xi = 1, . . . ,ni} .

Single tensor product

(x1, . . . , xd ) 7→ U(x1, . . . , xd ) = Πdi=1ui,k (xi) = Πd

i=1ui(xi , k) ,

U =d⊗

i=1

uk ,i .

Canonical (CP) format, PARAFAC or r -term expansion,

U(x1, . . . , xd ) =

rC∑k=1

Uk (x) =

rC∑k=1

Πdi=1ui,k (xi).

or

U =

rC∑k=1

Uk =

rC∑k=1

d⊗i=1

ui,k

Canonical format - prosDefinition (Canonical format)

U(x1, . . . , xd ) =r∑

k=1

d⊗i=1

ui,k (xi) =r∑

i=1

d⊗ν=1

ui(xi , k).

r - canonical rank (?)Let n := max{ni : 1 ≤ i ≤ d}.I Number of terms r = rc (canonical rank (?)) is invariant

w.r.t. basis transformationsI canonical rank r ≤ ] DOF for a given tensor product basis

- best N-term approximation (super adaptivity)!I there is an additional cost storing the components ui,ki

I degrees of freedom (DOF) or better strorage complexity:O(drn)

I complexity scaling is O(drn) instead of O(nd ) for the fulltensor!

Counter example of Silva and LimLet A =

⊗di=1 ai , B =

⊗di=1 bi ∈ H, (possibly 〈ai ,bi〉 = 0 ∀i).

Let U(x1, . . . , xd )

:= U1(x1, . . . , xd ) + . . .+ Ud (x1, . . . , xd ) , ( Ui ⊥ Uj )

:= b1(x1)a2(x2) . . . ad (xd ) + · · ·+ a1(x1) . . . ad−1(xd−1)bd (xd )

=1ε

(a1(x1) + εb1(x1)

)· · ·(ad (xd ) + εbd (xd )

)−1ε

a1(x1) · · · ad (xd ) +O(ε)

=: Vε(x1, . . . , xd ) +O(ε) , product -rule for A′ .

Vε → U as ε→ 0, but rank rc(U) = d 6= rc(Vε) = 2 if d ≥ 3!!!

I K≤r := {U ∈ H : U =∑r

k=1 Uk} is not closed.(nor weakly closed)

I The notion of canonical rank is not well defined! Borderrank problem.

Remark: The above example shows the product rule for the directional derivative

Tensor formats

Tensor formatsB Tucker format (Q: MCTDH(F))

But complexity O(rd + ndr)

U(x1, .., xd ) =

r1∑k1=1

. . .

rd∑kd =1

B(k1, .., kd )d⊗

i=1

Ui(ki , xi)

{1,2,3,4,5}

1 2 3 54

Tensor formatsB Hierarchical Tucker format

(HT; Hackbusch/Kuhn, Grasedyck, Kressner, Q: Tree-tensor networks)

B Tucker format (Q: MCTDH(F))But complexity O(rd + ndr)

B Tensor Train (TT-)format(Oseledets/Tyrtyshnikov, ' MPS-format of quantum physics)

U(x) =

r1∑k1=1

. . .

rd−1∑kd−1=1

d∏i=1

Bi(ki−1, xi , ki) = B1(x1) · · ·Bd (xd )

{1,2,3,4,5}

{1} {2,3,4,5}

{2} {3,4,5}

{4,5}

{5}

{3}

{4}

U1 U2 U3 U4 U5

r1 r2 r3 r4

n1 n2 n3 n4 n5

Tensor formatsB Canonical decompositionB Subspace approach (Hackbusch/Kuhn, 2009)

(Example: d = 5,Ui ∈ Rn×ki ,Bt ∈ Rkt×kt1×kt2 )

Tensor formatsB Canonical decomposition not closed, no embedded

manifold!B Subspace approach (Hackbusch/Kuhn, 2009)

(Example: d = 5,Ui ∈ Rn×ki ,Bt ∈ Rkt×kt1×kt2 )

Tensor formatsB Canonical decomposition not closed, no embedded

manifold!B Subspace approach (Hackbusch/Kuhn, 2009)

(Example: d = 5,Ui ∈ Rn×ki ,Bt ∈ Rkt×kt1×kt2 )

Tensor formatsB Canonical decomposition not closed, no embedded

manifold!B Subspace approach (Hackbusch/Kuhn, 2009)

{1,2,3,4,5}B

{4,5}

U4 5 U

B

B

B

U

UU

3

2 1

{1,2,3}

{1,2}

(Example: d = 5,Ui ∈ Rn×ki ,Bt ∈ Rkt×kt1×kt2 )

Tensor formatsB Canonical decomposition not closed, no embedded

manifold!B Subspace approach (Hackbusch/Kuhn, 2009)

{1,2,3,4,5}B

{4,5}

U4 5 U

B

B

B

U

UU

3

2 1

{1,2,3}

{1,2}

U{1,2}

(Example: d = 5,Ui ∈ Rn×ki ,Bt ∈ Rkt×kt1×kt2 )

Tensor formatsB Canonical decomposition not closed, no embedded

manifold!B Subspace approach (Hackbusch/Kuhn, 2009)

{1,2,3,4,5}B

{4,5}

U4 5 U

B

B

B

U

UU

3

2 1

{1,2,3}

{1,2}

U{1,2}

U{1,2,3}

(Example: d = 5,Ui ∈ Rn×ki ,Bt ∈ Rkt×kt1×kt2 )

Tensor formatsB Canonical decomposition not closed, no embedded

manifold!B Subspace approach (Hackbusch/Kuhn, 2009)

{1,2,3,4,5}B

{4,5}

U4 5 U

B

B

B

U

UU

3

2 1

{1,2,3}

{1,2}

U{1,2}

U{1,2,3}

(Example: d = 5,Ui ∈ Rn×ki ,Bt ∈ Rkt×kt1×kt2 )

Optimization Problems

Problem (Generic optimization problem (OP))Given a cost functional J : H → R and an admissible setM⊂ H finding

argmin {J (W ) : W ∈M} .

Problem (Tensor product optimization problem (TOP))

U := argmin {J (W ) : W ∈M∩K≤r} (1)

Here we consider a modified optimization problem where theoriginal admissible set is confined to tensors of rank at most r .Most problems can cast into this form.

Example1. Approximation: for given U ∈ H minimize

F (W ) = ‖U −W‖2H == ‖U −W‖2 , W ∈ Kr

2. solving equations: where A, g : V → H,

AU = B or g(U) = 0

hereF (W ) := ‖AW − B‖2 resp. F (W ) := ‖g(W )‖2 .

3. or, if A : V → V ′ is symmetric and B ∈ V ′, V ⊂ H ⊂ V ′,

F (W ) :=12〈AW ,W 〉 − 〈B,W 〉

4. computing the lowest eigenvalue of a symmetric operator A : V → V ′,

U = argmin {F (W ) = 〈AW ,W 〉 : 〈W ,W 〉 = 1} .

In many casesM∩K≤r = K≤r . and most F are quadratic.

Approximation on low-rank manifoldM⊆ VB for optimisation tasks J (U)→ min:

Solve first order condition J ′(U) = 0 on tangent space,

〈J ′(U),V 〉 = 0 ∀V ∈ TU .

(Dirac-Frenkel variational principle, Absil et al., Q.Chem.: MCSCF, . . . )

J’(U) = X − U

X

U M

T UM

Approximation on low-rank manifoldM⊆ VB for differential equations X = f (X ),X (0) = X0:

Solve projected DE, U = PU f (U),U(0) = U0 ∈M,

〈U(t),V 〉 = 〈f (U(t)),V 〉 ∀V ∈ TU(t) .

(Dirac-Frenkel variational principle, Lubich et al., Q.Chem.: TDMCH . . . )

U M

X = F(X).

U = PUF(U).

F(U) TUM

Unique representation of the tangent space of Tr

Theorem (Holtz, R. Schneider, 2010)For all U ∈ T, and for

δU ∈ TUT '{γ′(t)|t=0 | γ ∈ C1(]− δ, δ[,T),

γ(t) = U1(x1, t) · . . . · Ud (xd , t), γ(0) = U(x)}

there is a unique vector (W1 . . . ,Wd ) of component functionsWi(·) : Ii → Rri−1×ri , such that

δU = δU1 + . . .+ δUd

withδUi := U1(x1) . . .Ui−1(xi−1)Wi(xi)Ui+1(xi+1) . . .Ud (xd ),

and s.t. Wi(·) fulfil the (left-orthogonality) gauge conditions

L(Ui)T L(Wi) = 0 ∈ Rri×ri for i = 1, ..,d − 1.

Sketch of proofI Existence:

δU ' U′1(x1,0)U2(x2) · .. · Ud (xd ) + U1(x1)U′2(x2,0) · .. · Ud (xd )

+ . . . + U1(x1) · .. · Ud−1(xd−1)U′d (xd ,0).

Left orthogonal decomposition: there ex. unique Λ1 s.t.

U′(x1,0) = U1(x1)Λ1 + W1(x1),

iterate.

I Uniqueness: (Idea from Lubich et al., Tucker format)I Testing δU with

Vi (x) := U1(x1) · .. · Ui−1(xi−1)Vi (xi )Ui+1(xi+1).. · Ud (xd ),

for i = 1, ..,d , gauge condition (in other inner product)gives upper block-∆-system with SPD matrices.

I L(Wi), i = d ,d − 1, ..,1 can uniquely be computedrecursively.

Parametrization of TUTI Ci spaces of component functions Ui , (i = 1, ..,d)

I Left-orthonormal spaces of Ui :

U`i := {Wi(xi) ∈ Ci , L(Ui)

T L(Wi) = 0}.

I Parameter space X := U`1 × . . .× U`

d−1 × Cd .

Corollary (Holtz, R., Schneider, 2010)The mapping τ : X → TUT,

τ(W1, . . . ,Wd ) =d∑

i=1

U1 · . . . · Ui−1WiUi+1 · . . . · Ud

is a linear bijection between X and TUT. In particular,

dim T =d∑

i=1

ri−1ni ri −d−1∑i=1

r2i .

Local parametrization for T

Theorem (Holtz, R., Schneider, 2010)Let U ∈ T, Ψ : X → H defined by

Ψ(W1, . . . ,Wd ) = (U1 + W1)(x1) · .. · (Ud + Wd )(xd ).

I There exists open Nδ(0) ⊆ Rdim X such that

Ψ|Nδ : Nδ(0) 7→ Ψ(Nδ(0))open

⊆ T

is an embedding (i.e. an immersion that is ahomeomorphism onto its image),that is, NU ∩ T is a regular submanifold of H.

I There exists Nδ ⊂ H open, a constraint functiong = gU : Nδ → Rc , c =

∑d−1i=1 r2

i , such that

Nδ ∩ T = {U ∈ Rnd: g(U) = 0} = ψ(Nδ(0))

Proof: Inv. mapping theorem for manifolds, tang. space par. τ

Manifolds and gauge conditionsLubich et al. (2009), Holtz/Rohwedder/S. (2011a), Uschmajew/Vandereycken (2012),Lubich/Rohwedder/S./Vandereycken (in prep.)

B The sets of above tree (HT, TT or Tucker) tensors of fixedrank r each provide embedded submanifoldsMr of R(nd ).

B Canonical tangent space parametrization via componentfunctions Wt ∈ Ct is redundant, but unique via gaugeconditions for nodes t 6= tr , e.g.

Gt ={

Wt ∈ Ct | 〈WTt ,Bt〉 resp. 〈WT

t ,Ut〉 = 0 ∈ Rkt×kt }

B Linear isomorphism

E : ×t∈T Gt → TUM, E =∑t∈T

Et

Et : “node-t embedding operators”, defined via currentiterate (Ut ,Bt ).

Projector onto TUM: P = EE+.

Manifolds and gauge conditionsLubich et al. (2009), Holtz/Rohwedder/S. (2011a), Uschmajew/Vandereycken (2012),Lubich/Rohwedder/S./Vandereycken (in prep.)

B The sets of above tree (HT, TT or Tucker) tensors of fixedrank r each provide embedded submanifoldsMr of R(nd ).

B Canonical tangent space parametrization via componentfunctions Wt ∈ Ct is redundant, but unique via gaugeconditions for nodes t 6= tr , e.g.

Gt ={

Wt ∈ Ct | 〈WTt ,Bt〉 resp. 〈WT

t ,Ut〉 = 0 ∈ Rkt×kt }

B Linear isomorphism

E : ×t∈T Gt → TUM, E =∑t∈T

Et

Et : “node-t embedding operators”, defined via currentiterate (Ut ,Bt ).

Projector onto TUM: P = EE+.

Manifolds and gauge conditionsLubich et al. (2009), Holtz/Rohwedder/S. (2011a), Uschmajew/Vandereycken (2012),Lubich/Rohwedder/S./Vandereycken (in prep.)

B The sets of above tree (HT, TT or Tucker) tensors of fixedrank r each provide embedded submanifoldsMr of R(nd ).

B Canonical tangent space parametrization via componentfunctions Wt ∈ Ct is redundant, but unique via gaugeconditions for nodes t 6= tr , e.g.

Gt ={

Wt ∈ Ct | 〈WTt ,Bt〉 resp. 〈WT

t ,Ut〉 = 0 ∈ Rkt×kt }

B Linear isomorphism

E : ×t∈T Gt → TUM, E =∑t∈T

Et

Et : “node-t embedding operators”, defined via currentiterate (Ut ,Bt ).

Projector onto TUM: P = EE+.

Manifolds and gauge conditionsLubich et al. (2009), Holtz/Rohwedder/S. (2011a), Uschmajew/Vandereycken (2012),Lubich/Rohwedder/S./Vandereycken (in prep.)

B The sets of above tree (HT, TT or Tucker) tensors of fixedrank r each provide embedded submanifoldsMr of R(nd ).

B Canonical tangent space parametrization via componentfunctions Wt ∈ Ct is redundant, but unique via gaugeconditions for nodes t 6= tr , e.g.

Gt ={

Wt ∈ Ct | 〈WTt ,Bt〉 resp. 〈WT

t ,Ut〉 = 0 ∈ Rkt×kt }

B Linear isomorphism

E : ×t∈T Gt → TUM, E =∑t∈T

Et

Et : “node-t embedding operators”, defined via currentiterate (Ut ,Bt ).

Projector onto TUM: P = EE+.

Manifolds and gauge conditionsLinear isomorphism

E = E(U) : ×t∈T Gt → TUM, E(U) =∑t∈T

Et (U)

E+ Moore Penrose inverse of E

Projector onto TUM: P(U) = EE+.

Theorem (Lubich/Rohwedder/S./Vandereycken (in prep.))

For tensor B,U,V; ‖U − V‖ ≤ cρ; there exists C dependingonly on n,d, such that there holds

‖(P(U)− P(V )

)B‖ ≤ Cρ−1‖U − V‖‖B‖

‖(I − P(U)

)(U − V )‖ ≤ Cρ−1‖U − V‖2 .

These are estimates for the curvature ofMr at U.

Optimization problems/differential flow

The problems

〈J ′(U),V 〉 = 0 resp. 〈U,V 〉 = 〈f (U),V 〉 ∀V ∈ TU

onM can now be re-cast into equations for components(Ut ,Bt ) representing low-rank tensor

U = τ(Ut ,Bt ) :

With P⊥t projector to Gt , embedding operator Et = EUt as

above, solve

P⊥t ETt J ′(U) = 0 resp. Ut = P⊥t E+

t f (U),

for t 6= tr , and

ETtr J′(U) = 0 resp. Ut = E+

t f (U).

for the “root” (e.g. by standard methods for nonlinear eqs.).

Differential equations for components under gradientflow

Let U(t) = U1(t) · · ·Ui(t) · · ·Ud (t) ∈ Tr be fixed.Then

U′d (t) = −ETd (t)

(f (U(t))

)∈ Rrd−1×nd .

provided that Ui(t), i = 1, . . . ,d − 1, are left orthogonal, sinceDd (t) = I.The other components U′i(t), , i = 1, . . . ,d − 1, are given by

U′i(t) =

((I− Pi(t))⊗ Di

−1(t))

ETi (t)

(f (U(t))

).

with the orthogonal projection Pi(t) onto the parameter space,

Pi(t)W(ki−1, xi , ki) =

=∑

k ′i1,x ′i ,k

′i

Ui(t , k ′i−1, x′i , k′i )W(k ′i−1, x

′i , ki)Ui(t , ki−1, xi , k ′i ) .

Stabilization and preconditioning

In time step t → t + ∆t compute the componentsVi(τ) ≈ (I⊗ Di(t))Ui(τ), , i = 1, . . . ,d − 1, t ≤ τ < t + ∆t by

V′i(τ) = (I⊗ Di(t))U′i(τ) =

((I− Pi(τ))⊗ I

)ET

i (τ)(f (U(τ))

).

and Ui(t + δ) = left-orth(Vi(t + ∆t)

)I. Oseledets et al. : Strang

splitting and alternating directions ALS, (compare TD DMRG)Generalization of HOSVD bases of Hackbusch.For non-leaf vertices α ∈ TD , α 6= D, we have

∑rα`=1(σ

(α)` )2 C(α,`)C(α,`)H = Σ2

α1,∑rα

`=1(σ(α)` )2 C(α,`)TC(α,`) = Σ2

α2,

where α1, α2 are the first and second son of α ∈ TD and Σαi the diagonal of the

singular values ofMαi (v).

Closedness

Let Ai with ranks rankAi ≤ r .If limi→∞ ‖Ai − A‖2 = 0 then rankA ≤ r :⇒ closedness of Tucker and HT tensor in T≤r (Falco &Hackbsuch).

T≤r =⋃s≤r

Ts ⊂ H is closed! due to Hackbusch & Falco

(Weak) closedness implies the existence of minimers of convexoptimization problems constraint to T≤r .Landsberg & Ye: If a tensor network has not a tree structure,the set of all tensor of this form need not to be closed!

SummaryFor Tucker and HT redundancy can be removed (see next talk)

Table: Comparison

canonical Tucker HTcomplexity O(ndr) O(rd + ndr) O(ndr + dr3)

TT- O(ndr2)++ – +

rank no defined definedrc ≥ rT rHT , rT ≤ rHT

closedness no yes yesessential redundancy yes no no

recovery ?? yes yesquasi best approx. no yes yes

best approx. no exist existbut NP hard but NP hard

Some current results and trends

Optimization:

B Alternating optimization of components for TT robustpractical algorithm (ALS/MALS, Holtz/Rohwedder/S., SISC 2012)

B DMRG = MALS sees boost of interest in quantumphysics/quantum chemistry community

I Gradient methods — gradient flow (see below)B (Quasi-) Newton methods onM (Rohwedder et al. , in prep.)

Time-dependent equations:

B Quasi-optimal error bounds(Lubich/Rohwedder/S./Vandereycken, in prep.)solution X (t) with approx. U(t) ∈Mr , X (0) = U(0),

‖U(t)− Ubest(t)‖ . t · maxs∈[0,t]

‖Ubest(s)− X (s)‖.

Some current results and trends

Optimization:

B Alternating optimization of components for TT robustpractical algorithm (ALS/MALS, Holtz/Rohwedder/S., SISC 2012)

B DMRG = MALS sees boost of interest in quantumphysics/quantum chemistry community

I Gradient methods — gradient flow (see below)B (Quasi-) Newton methods onM (Rohwedder et al. , in prep.)

Time-dependent equations:

B Quasi-optimal error bounds(Lubich/Rohwedder/S./Vandereycken, in prep.)solution X (t) with approx. U(t) ∈Mr , X (0) = U(0),

‖U(t)− Ubest(t)‖ . t · maxs∈[0,t]

‖Ubest(s)− X (s)‖.

TT approximations of Friedman data sets

f2(x1, x2, x3, x4) =

√(x2

1 + (x2x3 −1

x2x4)2,

f3(x1, x2, x3, x4) = tan−1(x2x3 − (x2x4)−1

x1

)

on 4− D grid, n points per dim. n4 tensor, n ∈ {3, . . . ,50}.

full to tt (Oseledets, successive SVDs)and MALS (with A = I) (Holtz & Rohwedder & S.)

Solution of −∆U = b using MALS/DMRG

I Dimension d = 4, . . . ,128 varyingI Gridsize n = 10I Right-hand-side b of rank 1I Solution U has rank 13

Example: Eigenvalue problem in QTTin courtesy of B. Khoromskij, I. Oseledets, QTT: Toward bridginghigh-dimensional quantum molecular dynamics and DMRG methods,

HΨ = (−12

∆ + V )Ψ = EΨ

with potential energy surface given by Henon-Heiles potential

V (q1, . . . , qf ) =12

f∑k=1

q2k + λ

f−1∑k=1

(q2

k qk+1 −13

q3k

).

Dimensions f = 4, . . . ,256; 1-D grid size n = 128 = 27 = 2d ;

QTT-tensors ∈⊗7f

i=1 R2 = R2× ..× 2︸ ︷︷ ︸7f =1792 .

QC-DMRG and TT resp, MPS approximations

In courtesy of O Legeza (Hess & Legeza & ..)LiF dissoziation, 1st + 2nd eigenvalue