Low rank aproximation in traditional and novel tensor formats€¦ · Icanonical rank r ]DOF for a...

Low rank aproximation in traditional and noveltensor formats

R. Schneider (TUB Matheon)

MPI Munich, 2012

Our motivationsEquations describing complex systems with multi-variatesolution spaces, e.g.

B stationary/instationary Schrodinger type equations

∂tΨ(t , x) = (−1

2∆ + V )︸︷︷︸H

Ψ(t , x), HΨ(x) = EΨ(x)

describing quantum-mechanical many particle systemsB stochastic DEs and the Fokker-Planck equation,

∂p(t , x)∂t

(fi(t , x)p(t , x)

d∑i,j=1

∂xi∂xj

(Bi,j(t , x)p(t , x)

)describing mechanical systems in stochastic environment,

B chemical master equations, parametric PDEs, machinelearning, . . .

Solutions depend on x = (x1, . . . , xd ), where usually, d >> 3!

Setting - Tensors of order dGoal: Generic perspective on methods for high-dimensionalproblems, i.e. problems posed on tensor spaces,

V :=⊗d

i=1 Vi , today: V =⊗d

i=1 Rn = R(nd )

Notation: (x1, . . . , xd ) 7→ U = U(x1, . . . , xd ) ∈ V

Main problem:

dimV = O (nd ) – Curse of dimensionality!

e.g. n = 100,d = 10 10010 basis functions, coefficient vectors of 800× 1018 Bytes = 800 Exabytes

Approach: Some higher order tensors can be constructed(data-) sparsely from lower order quantities.

As for matrices, incomplete SVD:

A(x1, x2) ≈r∑

σk(uk (x1)⊗ vk (x2)

Canonical format

H ' {x = (x1, . . . , xd ) 7→ U(x1, . . . , xd ) ∈ R, xi = 1, . . . ,ni} .

Single tensor product

(x1, . . . , xd ) 7→ U(x1, . . . , xd ) = Πdi=1ui,k (xi) = Πd

i=1ui(xi , k) ,

U =d⊗

uk ,i .

Canonical (CP) format, PARAFAC or r -term expansion,

U(x1, . . . , xd ) =

rC∑k=1

Uk (x) =

rC∑k=1

Πdi=1ui,k (xi).

rC∑k=1

d⊗i=1

Canonical format - prosDefinition (Canonical format)

U(x1, . . . , xd ) =r∑

d⊗i=1

ui,k (xi) =r∑

d⊗ν=1

ui(xi , k).

r - canonical rank (?)Let n := max{ni : 1 ≤ i ≤ d}.I Number of terms r = rc (canonical rank (?)) is invariant

w.r.t. basis transformationsI canonical rank r ≤ ] DOF for a given tensor product basis

- best N-term approximation (super adaptivity)!I there is an additional cost storing the components ui,ki

I degrees of freedom (DOF) or better strorage complexity:O(drn)

I complexity scaling is O(drn) instead of O(nd ) for the fulltensor!

Counter example of Silva and LimLet A =

⊗di=1 ai , B =

⊗di=1 bi ∈ H, (possibly 〈ai ,bi〉 = 0 ∀i).

Let U(x1, . . . , xd )

:= U1(x1, . . . , xd ) + . . .+ Ud (x1, . . . , xd ) , ( Ui ⊥ Uj )

:= b1(x1)a2(x2) . . . ad (xd ) + · · ·+ a1(x1) . . . ad−1(xd−1)bd (xd )

(a1(x1) + εb1(x1)

)· · ·(ad (xd ) + εbd (xd )

)−1ε

a1(x1) · · · ad (xd ) +O(ε)

=: Vε(x1, . . . , xd ) +O(ε) , product -rule for A′ .

Vε → U as ε→ 0, but rank rc(U) = d 6= rc(Vε) = 2 if d ≥ 3!!!

I K≤r := {U ∈ H : U =∑r

k=1 Uk} is not closed.(nor weakly closed)

I The notion of canonical rank is not well defined! Borderrank problem.

Remark: The above example shows the product rule for the directional derivative

Tensor formats

Tensor formatsB Tucker format (Q: MCTDH(F))

But complexity O(rd + ndr)

U(x1, .., xd ) =

r1∑k1=1

rd∑kd =1

B(k1, .., kd )d⊗

Ui(ki , xi)

{1,2,3,4,5}

1 2 3 54

Tensor formatsB Hierarchical Tucker format

(HT; Hackbusch/Kuhn, Grasedyck, Kressner, Q: Tree-tensor networks)

B Tucker format (Q: MCTDH(F))But complexity O(rd + ndr)

B Tensor Train (TT-)format(Oseledets/Tyrtyshnikov, ' MPS-format of quantum physics)

U(x) =

r1∑k1=1

rd−1∑kd−1=1

d∏i=1

Bi(ki−1, xi , ki) = B1(x1) · · ·Bd (xd )

{1,2,3,4,5}

{1} {2,3,4,5}

{2} {3,4,5}

U1 U2 U3 U4 U5

r1 r2 r3 r4

n1 n2 n3 n4 n5

Tensor formatsB Canonical decompositionB Subspace approach (Hackbusch/Kuhn, 2009)

(Example: d = 5,Ui ∈ Rn×ki ,Bt ∈ Rkt×kt1×kt2 )

Tensor formatsB Canonical decomposition not closed, no embedded

manifold!B Subspace approach (Hackbusch/Kuhn, 2009)

{1,2,3,4,5}B

U4 5 U

{1,2,3}

{1,2,3,4,5}B

U4 5 U

{1,2,3}

U{1,2}

{1,2,3,4,5}B

U4 5 U

{1,2,3}

U{1,2}

U{1,2,3}

{1,2,3,4,5}B

U4 5 U

{1,2,3}

U{1,2}

U{1,2,3}

Optimization Problems

Problem (Generic optimization problem (OP))Given a cost functional J : H → R and an admissible setM⊂ H finding

argmin {J (W ) : W ∈M} .

Problem (Tensor product optimization problem (TOP))

U := argmin {J (W ) : W ∈M∩K≤r} (1)

Here we consider a modified optimization problem where theoriginal admissible set is confined to tensors of rank at most r .Most problems can cast into this form.

Example1. Approximation: for given U ∈ H minimize

F (W ) = ‖U −W‖2H == ‖U −W‖2 , W ∈ Kr

2. solving equations: where A, g : V → H,

AU = B or g(U) = 0

hereF (W ) := ‖AW − B‖2 resp. F (W ) := ‖g(W )‖2 .

3. or, if A : V → V ′ is symmetric and B ∈ V ′, V ⊂ H ⊂ V ′,

F (W ) :=12〈AW ,W 〉 − 〈B,W 〉

4. computing the lowest eigenvalue of a symmetric operator A : V → V ′,

U = argmin {F (W ) = 〈AW ,W 〉 : 〈W ,W 〉 = 1} .

In many casesM∩K≤r = K≤r . and most F are quadratic.

Approximation on low-rank manifoldM⊆ VB for optimisation tasks J (U)→ min:

Solve first order condition J ′(U) = 0 on tangent space,

〈J ′(U),V 〉 = 0 ∀V ∈ TU .

(Dirac-Frenkel variational principle, Absil et al., Q.Chem.: MCSCF, . . . )

J’(U) = X − U

Approximation on low-rank manifoldM⊆ VB for differential equations X = f (X ),X (0) = X0:

Solve projected DE, U = PU f (U),U(0) = U0 ∈M,

〈U(t),V 〉 = 〈f (U(t)),V 〉 ∀V ∈ TU(t) .

(Dirac-Frenkel variational principle, Lubich et al., Q.Chem.: TDMCH . . . )

X = F(X).

U = PUF(U).

F(U) TUM

Unique representation of the tangent space of Tr

Theorem (Holtz, R. Schneider, 2010)For all U ∈ T, and for

δU ∈ TUT '{γ′(t)|t=0 | γ ∈ C1(]− δ, δ[,T),

γ(t) = U1(x1, t) · . . . · Ud (xd , t), γ(0) = U(x)}

there is a unique vector (W1 . . . ,Wd ) of component functionsWi(·) : Ii → Rri−1×ri , such that

δU = δU1 + . . .+ δUd

withδUi := U1(x1) . . .Ui−1(xi−1)Wi(xi)Ui+1(xi+1) . . .Ud (xd ),

and s.t. Wi(·) fulfil the (left-orthogonality) gauge conditions

L(Ui)T L(Wi) = 0 ∈ Rri×ri for i = 1, ..,d − 1.

Sketch of proofI Existence:

δU ' U′1(x1,0)U2(x2) · .. · Ud (xd ) + U1(x1)U′2(x2,0) · .. · Ud (xd )

+ . . . + U1(x1) · .. · Ud−1(xd−1)U′d (xd ,0).

Left orthogonal decomposition: there ex. unique Λ1 s.t.

U′(x1,0) = U1(x1)Λ1 + W1(x1),

iterate.

I Uniqueness: (Idea from Lubich et al., Tucker format)I Testing δU with

Vi (x) := U1(x1) · .. · Ui−1(xi−1)Vi (xi )Ui+1(xi+1).. · Ud (xd ),

for i = 1, ..,d , gauge condition (in other inner product)gives upper block-∆-system with SPD matrices.

I L(Wi), i = d ,d − 1, ..,1 can uniquely be computedrecursively.

Parametrization of TUTI Ci spaces of component functions Ui , (i = 1, ..,d)

I Left-orthonormal spaces of Ui :

U`i := {Wi(xi) ∈ Ci , L(Ui)

T L(Wi) = 0}.

I Parameter space X := U`1 × . . .× U`

d−1 × Cd .

Corollary (Holtz, R., Schneider, 2010)The mapping τ : X → TUT,

τ(W1, . . . ,Wd ) =d∑

U1 · . . . · Ui−1WiUi+1 · . . . · Ud

is a linear bijection between X and TUT. In particular,

dim T =d∑

ri−1ni ri −d−1∑i=1

Local parametrization for T

Theorem (Holtz, R., Schneider, 2010)Let U ∈ T, Ψ : X → H defined by

Ψ(W1, . . . ,Wd ) = (U1 + W1)(x1) · .. · (Ud + Wd )(xd ).

I There exists open Nδ(0) ⊆ Rdim X such that

Ψ|Nδ : Nδ(0) 7→ Ψ(Nδ(0))open

is an embedding (i.e. an immersion that is ahomeomorphism onto its image),that is, NU ∩ T is a regular submanifold of H.

I There exists Nδ ⊂ H open, a constraint functiong = gU : Nδ → Rc , c =

∑d−1i=1 r2

i , such that

Nδ ∩ T = {U ∈ Rnd: g(U) = 0} = ψ(Nδ(0))

Proof: Inv. mapping theorem for manifolds, tang. space par. τ

Manifolds and gauge conditionsLubich et al. (2009), Holtz/Rohwedder/S. (2011a), Uschmajew/Vandereycken (2012),Lubich/Rohwedder/S./Vandereycken (in prep.)

B The sets of above tree (HT, TT or Tucker) tensors of fixedrank r each provide embedded submanifoldsMr of R(nd ).

B Canonical tangent space parametrization via componentfunctions Wt ∈ Ct is redundant, but unique via gaugeconditions for nodes t 6= tr , e.g.

Wt ∈ Ct | 〈WTt ,Bt〉 resp. 〈WT

t ,Ut〉 = 0 ∈ Rkt×kt }

B Linear isomorphism

E : ×t∈T Gt → TUM, E =∑t∈T

Et : “node-t embedding operators”, defined via currentiterate (Ut ,Bt ).

Projector onto TUM: P = EE+.

t ,Ut〉 = 0 ∈ Rkt×kt }

Manifolds and gauge conditionsLinear isomorphism

E = E(U) : ×t∈T Gt → TUM, E(U) =∑t∈T

Et (U)

E+ Moore Penrose inverse of E

Projector onto TUM: P(U) = EE+.

Theorem (Lubich/Rohwedder/S./Vandereycken (in prep.))

For tensor B,U,V; ‖U − V‖ ≤ cρ; there exists C dependingonly on n,d, such that there holds

‖(P(U)− P(V )

)B‖ ≤ Cρ−1‖U − V‖‖B‖

‖(I − P(U)

)(U − V )‖ ≤ Cρ−1‖U − V‖2 .

These are estimates for the curvature ofMr at U.

Optimization problems/differential flow

The problems

〈J ′(U),V 〉 = 0 resp. 〈U,V 〉 = 〈f (U),V 〉 ∀V ∈ TU

onM can now be re-cast into equations for components(Ut ,Bt ) representing low-rank tensor

U = τ(Ut ,Bt ) :

With P⊥t projector to Gt , embedding operator Et = EUt as

above, solve

P⊥t ETt J ′(U) = 0 resp. Ut = P⊥t E+

t f (U),

for t 6= tr , and

ETtr J′(U) = 0 resp. Ut = E+

t f (U).

for the “root” (e.g. by standard methods for nonlinear eqs.).

Differential equations for components under gradientflow

Let U(t) = U1(t) · · ·Ui(t) · · ·Ud (t) ∈ Tr be fixed.Then

U′d (t) = −ETd (t)

(f (U(t))

)∈ Rrd−1×nd .

provided that Ui(t), i = 1, . . . ,d − 1, are left orthogonal, sinceDd (t) = I.The other components U′i(t), , i = 1, . . . ,d − 1, are given by

U′i(t) =

((I− Pi(t))⊗ Di

−1(t))

ETi (t)

(f (U(t))

with the orthogonal projection Pi(t) onto the parameter space,

Pi(t)W(ki−1, xi , ki) =

k ′i1,x ′i ,k

Ui(t , k ′i−1, x′i , k′i )W(k ′i−1, x

′i , ki)Ui(t , ki−1, xi , k ′i ) .

Stabilization and preconditioning

In time step t → t + ∆t compute the componentsVi(τ) ≈ (I⊗ Di(t))Ui(τ), , i = 1, . . . ,d − 1, t ≤ τ < t + ∆t by

V′i(τ) = (I⊗ Di(t))U′i(τ) =

((I− Pi(τ))⊗ I

i (τ)(f (U(τ))

and Ui(t + δ) = left-orth(Vi(t + ∆t)

)I. Oseledets et al. : Strang

splitting and alternating directions ALS, (compare TD DMRG)Generalization of HOSVD bases of Hackbusch.For non-leaf vertices α ∈ TD , α 6= D, we have

∑rα`=1(σ

(α)` )2 C(α,`)C(α,`)H = Σ2

α1,∑rα

`=1(σ(α)` )2 C(α,`)TC(α,`) = Σ2

where α1, α2 are the first and second son of α ∈ TD and Σαi the diagonal of the

singular values ofMαi (v).

Closedness

Let Ai with ranks rankAi ≤ r .If limi→∞ ‖Ai − A‖2 = 0 then rankA ≤ r :⇒ closedness of Tucker and HT tensor in T≤r (Falco &Hackbsuch).

T≤r =⋃s≤r

Ts ⊂ H is closed! due to Hackbusch & Falco

(Weak) closedness implies the existence of minimers of convexoptimization problems constraint to T≤r .Landsberg & Ye: If a tensor network has not a tree structure,the set of all tensor of this form need not to be closed!

SummaryFor Tucker and HT redundancy can be removed (see next talk)

Table: Comparison

canonical Tucker HTcomplexity O(ndr) O(rd + ndr) O(ndr + dr3)

TT- O(ndr2)++ – +

rank no defined definedrc ≥ rT rHT , rT ≤ rHT

closedness no yes yesessential redundancy yes no no

recovery ?? yes yesquasi best approx. no yes yes

best approx. no exist existbut NP hard but NP hard

Some current results and trends

Optimization:

B Alternating optimization of components for TT robustpractical algorithm (ALS/MALS, Holtz/Rohwedder/S., SISC 2012)

B DMRG = MALS sees boost of interest in quantumphysics/quantum chemistry community

I Gradient methods — gradient flow (see below)B (Quasi-) Newton methods onM (Rohwedder et al. , in prep.)

Time-dependent equations:

B Quasi-optimal error bounds(Lubich/Rohwedder/S./Vandereycken, in prep.)solution X (t) with approx. U(t) ∈Mr , X (0) = U(0),

‖U(t)− Ubest(t)‖ . t · maxs∈[0,t]

‖Ubest(s)− X (s)‖.

Some current results and trends

Optimization:

B Alternating optimization of components for TT robustpractical algorithm (ALS/MALS, Holtz/Rohwedder/S., SISC 2012)

B DMRG = MALS sees boost of interest in quantumphysics/quantum chemistry community

I Gradient methods — gradient flow (see below)B (Quasi-) Newton methods onM (Rohwedder et al. , in prep.)

Time-dependent equations:

B Quasi-optimal error bounds(Lubich/Rohwedder/S./Vandereycken, in prep.)solution X (t) with approx. U(t) ∈Mr , X (0) = U(0),

‖U(t)− Ubest(t)‖ . t · maxs∈[0,t]

‖Ubest(s)− X (s)‖.

TT approximations of Friedman data sets

f2(x1, x2, x3, x4) =

√(x2

1 + (x2x3 −1

x2x4)2,

f3(x1, x2, x3, x4) = tan−1(x2x3 − (x2x4)−1

on 4− D grid, n points per dim. n4 tensor, n ∈ {3, . . . ,50}.

full to tt (Oseledets, successive SVDs)and MALS (with A = I) (Holtz & Rohwedder & S.)

Solution of −∆U = b using MALS/DMRG

I Dimension d = 4, . . . ,128 varyingI Gridsize n = 10I Right-hand-side b of rank 1I Solution U has rank 13

Example: Eigenvalue problem in QTTin courtesy of B. Khoromskij, I. Oseledets, QTT: Toward bridginghigh-dimensional quantum molecular dynamics and DMRG methods,

HΨ = (−12

∆ + V )Ψ = EΨ

with potential energy surface given by Henon-Heiles potential

V (q1, . . . , qf ) =12

f∑k=1

q2k + λ

f−1∑k=1

k qk+1 −13

Dimensions f = 4, . . . ,256; 1-D grid size n = 128 = 27 = 2d ;

QTT-tensors ∈⊗7f

i=1 R2 = R2× ..× 2︸︷︷︸7f =1792 .

QC-DMRG and TT resp, MPS approximations

In courtesy of O Legeza (Hess & Legeza & ..)LiF dissoziation, 1st + 2nd eigenvalue

Low rank aproximation in traditional and novel tensor formats€¦ · Icanonical rank r ]DOF for a...

Documents

Transcript of Low rank aproximation in traditional and novel tensor formats€¦ · Icanonical rank r ]DOF for a...

Isospin dependent interactions in RMF & ρ NN tensor coupling

Optimal Succinct Rank Data Structure via Approximate …theory.stanford.edu/~yuhch123/files/succinct_rank.pdf · Optimal Succinct Rank Data Structure via Approximate Nonnegative Tensor

Lecture 3: Adaptivity II: General Operators and Extensions

TENSOR ALGEBRA - PRESENTACIOmmc.rmee.upc.edu/documents/Slides/Ch0-Algebra_v22.pdf · Concept of Tensor A TENSOR is an algebraic entity with various components which generalizes the

RANK FULL MODEL (VARIANCE ESTIMATION)

1.13 The Levi-Civita Tensor and Hodge Dualisationphys.columbia.edu/~cyr/notes/Electrodynamics/CPope-DiffForms-p56... · Levi-Civita tensor . Somefurtherremarksareinorderatthispoint.First,weshall

The Weyl Tensor - Universität zu Köln · I Algebraic equations for the traces of the Riemann Tensor ... Strong gravitational lensing for the photons coupled to Weyl tensor in ...

Classifying higher rank analytic Toeplitz algebrasnyjm.albany.edu/j/2007/13-14v.pdfClassifying higher rank analytic Toeplitz algebras Stephen C. Power Abstract. To a higher rank directed

Tensors, ranks, and varieties. - areeweb.polito.it off/Kick off... · with vi 2Vi is called elementary tensor. 2/15 E. Carlini Tensors, ranks, and varieties. Tensor rank Note that

Fast Matrix Rank Algorithms and Applications

Lecture 10 & 11 { Tensor Decompositionverma/classes/uml/lec/uml_lec10_tensors.pdf · Lecture 10 & 11 { Tensor Decomposition Instructor: Geelon So Scribes: Wenbo Gao, Xuefeng Hu 1

Tensor topology - homepages.inf.ed.ac.ukhomepages.inf.ed.ac.uk/cheunen/cvqt/slides/Enriquemoliner.pdf · Tensor topology Pau Enrique Moliner1 Chris Heunen 1 Sean Tull 2 1University

Tensor Methods for Feature Learning

Low-Rank Factorization Models for Matrix Completion and ...bicmr.pku.edu.cn/~wenzw/paper/FactMatTalk.pdf · Low-Rank Factorization Models for Matrix Completion and Matrix Separation

Tensor Analysis - Universität Wien · Tensor Analysis Author: Harald Höller ... the simple Euclidean vector space won't provide the ... The nabla operator applied on a tensor of

Finish Categ. Bib Name Categ. Nation. Finish …...Rank Gener. Rank Gend. Rank Categ. Bib Name Categ. Nation. Finish Finish Net Club 1 1 1 51039 ΑΠΟΣΤΟΛΙ∆ΗΣ ΠΑΥΛΟΣ

Rank 3 Inhabitation of Intersection Types Revisited · “Inhabitation of low-rank intersection types”. Typed Lambda Calculi and Applications. Springer Berlin Heidelberg, 2009.

Lecture 4 3 d stress tensor and equilibrium equations

Tensor dan Operasinya Skalar,Vektor dan Tensorrepository.unair.ac.id/25584/16/16. Lampiran.pdf · Lampiran 1 Tensor dan Operasinya Skalar,Vektor dan Tensor Misalkan terdapat N buah

The rank of Mazur's Eisenstein ideal - UCLAwake/EisensteinRank2.pdf · THE RANK OF MAZUR’S EISENSTEIN IDEAL PRESTON WAKE AND CARL WANG-ERICKSON Abstract. We use pseudodeformation