Randomized Sketching for Low-Memory Dynamic Optimization


Transcript of Randomized Sketching for Low-Memory Dynamic Optimization

Page 1: Randomized Sketching for Low-Memory Dynamic Optimization

Randomized Sketching for Low-Memory Dynamic Optimization

Drew P. Kouri, Ramchandran Muthukumar, Madeleine Udell

Center for Mathematics and Artificial Intelligence Colloquium, George Mason University, Fairfax, VA

October 2, 2020. SAND2020-10685 PE

Page 2: Randomized Sketching for Low-Memory Dynamic Optimization

2 Outline

1. Motivating Applications

2. General Problem Formulation

3. Randomized Sketching for State Compression

4. Fixed Rank Analysis

5. Adaptive Rank Algorithm

6. Numerical Results


Page 3: Randomized Sketching for Low-Memory Dynamic Optimization

3 Operational Applications Secondary Oil Recovery

\[
\max_{v,\,p,\,s,\,q}\ \mathrm{NPV}(v, p, s, q)
\quad\text{subject to}\quad
v = -K\lambda(s)(\nabla p - \rho G),\qquad
\nabla\cdot v = \frac{q}{\rho\phi},\qquad
\partial_t s + \nabla\cdot\big(f(s)\,v\big) = q
\]

1. Goal: Determine injection rates that maximize the net present value of the reservoir.

2. Problem Size: The reservoir can span 10+ km per side =⇒ 1000 km³. A regular grid with 5 m × 5 m × 5 m cells =⇒ 8 billion cells (see the worked count after this list)!

3. Similar Applications: Gas pipeline and energy network operations, financial portfolio optimization, optimal control of hypersonic vehicles, etc.
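For concreteness, the cell count quoted above follows from simple arithmetic (assuming a cubic domain, 10 km on each side, with uniform 5 m cells):

\[
(10\ \text{km})^3 = 1000\ \text{km}^3,
\qquad
\left(\frac{10{,}000\ \text{m}}{5\ \text{m}}\right)^{3} = 2000^{3} = 8\times10^{9}\ \text{cells}.
\]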


Page 4: Randomized Sketching for Low-Memory Dynamic Optimization

4 Optimal Design Convolute Design for the Z-Machine (Inertial Confinement Fusion)

Goal: Minimize power loss (the current design experiences 20% power loss).

Similar Applications: Designing elastic structures with topology optimization, shape optimization for airplane wing design, etc.


Page 5: Randomized Sketching for Low-Memory Dynamic Optimization

5 Inverse Problems Seismic Exploration

Courtesy of Schlumberger

\[
\min_{u,\,\theta}\ \tfrac12\|u - d\|^2 + \alpha\,\wp(\theta)
\quad\text{subject to}\quad
\partial_t\big(\varrho(\theta)\,\partial_t u\big) - \nabla\cdot\big(K(\theta):\varepsilon\big) = s,
\qquad
\varepsilon = \tfrac12\big(\nabla u + \nabla u^\top\big)
\]

1. Goal: Determine subsurface rock properties that minimize data misfit.

2. Problem Size: Typical marine surveys require many TBs of memory!

3. Similar Applications: Parameter calibration, data assimilation, supervised learning for neural networks, etc.


Page 6: Randomized Sketching for Low-Memory Dynamic Optimization

6 Discrete Dynamic Optimization Full-Space Formulation

\[
\min_{z_n\in\mathbb{R}^m,\ u_n\in\mathbb{R}^M}\ \sum_{n=1}^{N} f_n(u_{n-1}, u_n, z_n)
\quad\text{subject to}\quad
c_n(u_{n-1}, u_n, z_n) = 0,\quad n = 1,\dots,N,
\]

where f_n : R^M × R^M × R^m → R is the cost function at time t_n,

c_n : R^M × R^M × R^m → R^M is the dynamic constraint at time t_n,

u_n ∈ R^M is the state/trajectory at time t_n, and

z_n ∈ R^m is the control/design/parameters at time t_n.

E.g., continuous dynamic problem discretized in time (and space).

Our methods also work for time-independent control/design/parameters.


Page 7: Randomized Sketching for Low-Memory Dynamic Optimization

7 Discrete Dynamic Optimization Optimality Conditions

The KKT conditions for the full-space dynamic optimization problem are

\[
\begin{aligned}
c_n(u_{n-1}, u_n, z_n) &= 0, && n = 1,\dots,N,\\
(d_2 c_N(u_{N-1}, u_N, z_N))^\top \lambda_N &= -\,d_2 f_N(u_{N-1}, u_N, z_N),\\
(d_2 c_n(u_{n-1}, u_n, z_n))^\top \lambda_n + (d_1 c_{n+1}(u_n, u_{n+1}, z_{n+1}))^\top \lambda_{n+1}
 &= -\,d_2 f_n(u_{n-1}, u_n, z_n) - d_1 f_{n+1}(u_n, u_{n+1}, z_{n+1}), && n = 1,\dots,N-1,\\
d_3 f_n(u_{n-1}, u_n, z_n) + (d_3 c_n(u_{n-1}, u_n, z_n))^\top \lambda_n &= 0, && n = 1,\dots,N.
\end{aligned}
\]

The new variables λ1, . . . , λN are Lagrange multipliers for the dynamic constraint.

Most algorithms (e.g., SQP, AL, etc.) require storage of {u_n}_{n=1}^N, {λ_n}_{n=1}^N, and {z_n}_{n=1}^N.

Storage Requirement: O(N(2M + m)) floating point numbers =⇒ can be huge (e.g., TBs of memory)!


Page 8: Randomized Sketching for Low-Memory Dynamic Optimization

8 Discrete Dynamic Optimization Reduced-Space Formulation

If c_n(u_{n-1}, u_n, z_n) = 0 has a unique solution u_n = S_n(u_{n-1}, z_n) for each z ∈ Z_ad and fixed initial state u_0 ∈ R^M, n = 1, . . . , N, then we can equivalently solve

\[
\min_{Z = \{z_n\}_{n=1}^{N} \in \mathbb{R}^{Nm}}\ F(Z),
\]

where the reduced objective function F is given by

\[
F(Z) := f_1(u_0, S_1(u_0, z_1), z_1) + \sum_{n=2}^{N} f_n\big(S_{n-1}(u_{n-2}, z_{n-1}),\, S_n(u_{n-1}, z_n),\, z_n\big).
\]

Storage Requirement: O(Nm) floating point numbers; typically, m ≪ M.
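As a minimal illustration of why the reduced-space formulation needs only the controls in memory, the sketch below evaluates F(Z) by a forward sweep that overwrites the state at each step; `solve_step` and `stage_cost` are hypothetical stand-ins for the time-step solver S_n and the stage cost f_n.

```python
import numpy as np

def reduced_objective(z_traj, u0, solve_step, stage_cost):
    """Evaluate F(Z) by a forward sweep, keeping only the current state in memory.

    z_traj     : (N, m) array of controls z_1, ..., z_N
    u0         : (M,) initial state
    solve_step : callable (u_prev, z_n) -> u_n, i.e. solves c_n(u_prev, u_n, z_n) = 0
    stage_cost : callable (u_prev, u_n, z_n) -> float, i.e. f_n(u_prev, u_n, z_n)
    """
    u_prev, total = u0, 0.0
    for z_n in z_traj:
        u_n = solve_step(u_prev, z_n)         # one implicit time step
        total += stage_cost(u_prev, u_n, z_n)
        u_prev = u_n                          # overwrite: O(M) state storage, not O(NM)
    return total

# Hypothetical linear step u_n = A u_{n-1} + B z_n with a quadratic stage cost.
M, m, N = 50, 3, 200
rng = np.random.default_rng(0)
A, B = 0.95 * np.eye(M), rng.standard_normal((M, m))
step = lambda u, z: A @ u + B @ z
cost = lambda u_prev, u, z: 0.5 * (u @ u) + 5e-5 * (z @ z)
print(reduced_objective(rng.standard_normal((N, m)), np.zeros(M), step, cost))
```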


Page 9: Randomized Sketching for Low-Memory Dynamic Optimization

9 Discrete Dynamic Optimization Reduced-Space Formulation

Gradient Computation:

\[
(\nabla F(Z))_n = d_3 f_n(u_{n-1}, u_n, z_n) + (d_3 c_n(u_{n-1}, u_n, z_n))^\top \lambda_n,
\]

where λ_n ∈ R^M, n = 1, . . . , N, solves the backward-in-time adjoint equation

\[
\begin{aligned}
(d_2 c_N(u_{N-1}, u_N, z_N))^\top \lambda_N &= -\,d_2 f_N(u_{N-1}, u_N, z_N),\\
(d_2 c_n(u_{n-1}, u_n, z_n))^\top \lambda_n &= -\,d_2 f_n(u_{n-1}, u_n, z_n) - d_1 f_{n+1}(u_n, u_{n+1}, z_{n+1})
 - (d_1 c_{n+1}(u_n, u_{n+1}, z_{n+1}))^\top \lambda_{n+1}.
\end{aligned}
\]

The gradient computation requires the entire state trajectory to solve the adjoint equation!
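To make the adjoint recursion concrete, here is a small self-contained sketch for a hypothetical linear time-stepping constraint c_n(u_{n-1}, u_n, z_n) = u_n − A u_{n-1} − B z_n with tracking cost f_n = ½‖u_n − w‖² + (α/2)‖z_n‖²; note that the backward sweep consumes the stored states from the forward sweep, which is exactly the memory bottleneck.

```python
import numpy as np

# Hypothetical data: c_n(u_{n-1}, u_n, z_n) = u_n - A u_{n-1} - B z_n,
# f_n(u_{n-1}, u_n, z_n) = 0.5*||u_n - w||^2 + 0.5*alpha*||z_n||^2.
M, m, N, alpha = 40, 2, 100, 1e-2
rng = np.random.default_rng(1)
A = 0.9 * np.eye(M) + 0.01 * rng.standard_normal((M, M))
B, w = rng.standard_normal((M, m)), np.ones(M)
Z = rng.standard_normal((N, m))

def objective(Zmat):
    u, val = np.zeros(M), 0.0
    for n in range(N):
        u = A @ u + B @ Zmat[n]
        val += 0.5 * (u - w) @ (u - w) + 0.5 * alpha * Zmat[n] @ Zmat[n]
    return val

# Forward sweep: store the entire trajectory, since the adjoint sweep needs it.
U = np.zeros((N, M))
u_prev = np.zeros(M)
for n in range(N):
    U[n] = A @ u_prev + B @ Z[n]
    u_prev = U[n]

# Backward (adjoint) sweep. For this c_n: d2c_n = I, d1c_{n+1} = -A, d3c_n = -B,
# d2f_n = u_n - w, d1f_{n+1} = 0, d3f_n = alpha*z_n, so
#   lambda_N = -(u_N - w),  lambda_n = -(u_n - w) + A^T lambda_{n+1},
#   (grad F)_n = alpha*z_n - B^T lambda_n.
grad = np.zeros_like(Z)
lam = -(U[-1] - w)
grad[-1] = alpha * Z[-1] - B.T @ lam
for n in range(N - 2, -1, -1):
    lam = -(U[n] - w) + A.T @ lam            # uses the stored state u_n
    grad[n] = alpha * Z[n] - B.T @ lam

# Finite-difference check of one gradient entry.
eps, E = 1e-6, np.zeros_like(Z)
E[10, 1] = 1.0
print(grad[10, 1], (objective(Z + eps * E) - objective(Z - eps * E)) / (2 * eps))
```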


Page 10: Randomized Sketching for Low-Memory Dynamic Optimization

10 Discrete Dynamic Optimization Existing Methods

1. Checkpointing: Store a handful of state snapshots in memory or to hard disk. Recompute the state from snapshots during the adjoint computation.
Griewank and Walther, Algorithm 799: Revolve: An implementation of checkpointing for scientific computing and optimal control, 2000.

Significant increase in computational cost!

2. Model Reduction: Project the dynamic equation onto a low-order basis of the solution space (e.g., balanced truncation, POD, reduced basis, etc.).
Benner, Sachs and Volkwein, Model order reduction for PDE constrained optimization, 2014.

Difficult for nonlinear dynamics and requires significant code modification!

3. State Compression: Compress the dynamic trajectory using numerical compression techniques including nested meshes, PCA, Gram–Schmidt and DFT.
Cyr, Shadid and Wildey, Towards efficient backward-in-time adjoint computations using data compression techniques, 2015.

Can be physics-dependent and often compresses time/space independently!



Page 13: Randomized Sketching for Low-Memory Dynamic Optimization

11 Randomized Sketching State Compression

We apply randomized sketching to compress the M×N state trajectory matrix

\[
U = \big(u_1 \,\big|\, \cdots \,\big|\, u_N\big) \in \mathbb{R}^{M\times N}.
\]

Goal: Generate an accurate rank-r approximation of U using O(r(M + N)) storage.

Matrix Sketching: Given the four random linear dimension reduction maps

Υ ∈ R^{k×M}, Ω ∈ R^{k×N}, Φ ∈ R^{s×M}, and Ψ ∈ R^{s×N},

where k := 2r + 1 and s = 2k + 1, we define the sketch as the following three matrices:

X := ΥU ∈ R^{k×N}, the co-range sketch;

Y := UΩ^⊤ ∈ R^{M×k}, the range sketch;

Z := ΦUΨ^⊤ ∈ R^{s×s}, the core sketch.

Tropp, Yurtsever, Udell and Cevher, Streaming low-rank matrix approximation with an application to scientific simulation, 2019.
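A minimal NumPy rendering of these definitions, using standard normal test matrices (one of the choices listed on the next slide); the trajectory matrix U below is a hypothetical stand-in:

```python
import numpy as np

def make_sketch_maps(M, N, r, rng):
    """Draw the four Gaussian dimension-reduction maps with k = 2r + 1 and s = 2k + 1."""
    k = 2 * r + 1
    s = 2 * k + 1
    Ups = rng.standard_normal((k, M))   # co-range map  Upsilon in R^{k x M}
    Om  = rng.standard_normal((k, N))   # range map     Omega   in R^{k x N}
    Phi = rng.standard_normal((s, M))   # core map      Phi     in R^{s x M}
    Psi = rng.standard_normal((s, N))   # core map      Psi     in R^{s x N}
    return Ups, Om, Phi, Psi

rng = np.random.default_rng(0)
M, N, r = 2000, 500, 10
U = rng.standard_normal((M, r)) @ rng.standard_normal((r, N))  # hypothetical trajectory matrix

Ups, Om, Phi, Psi = make_sketch_maps(M, N, r, rng)
X = Ups @ U            # co-range sketch, k x N
Y = U @ Om.T           # range sketch,    M x k
Zc = Phi @ U @ Psi.T   # core sketch,     s x s

k, s = X.shape[0], Zc.shape[0]
print("sketch storage:", k * (M + N) + s * s, "floats; full trajectory:", M * N)
```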

Page 14: Randomized Sketching for Low-Memory Dynamic Optimization

12 Randomized Sketching Details and Intuition

Choice of Random Matrices: Standard normal, scrambled subsampled randomized Fourier transform, sparse sign, . . .

Intuition:

• The range sketch Y = UΩ^⊤ consists of columns Uω_i, which are independent random samples from range(U). Roughly speaking, Y captures the action of the top k left singular vectors.

• Similarly, the co-range sketch X = ΥU consists of rows υ_i^⊤U, which are independent random samples from range(U^⊤). Roughly speaking, X captures the action of the top k right singular vectors.

• Roughly speaking, the core sketch Z = ΦUΨ^⊤ captures the top k singular values.


Page 15: Randomized Sketching for Low-Memory Dynamic Optimization

13 Randomized Sketching Online State Compression and Recovery

Online State Sketching:
Require: X = 0, Y = 0, Z = 0
1: for n = 1, . . . , N do
2:   Given u_{n-1} and z_n, solve c_n(u_{n-1}, u_n, z_n) = 0 for u_n
3:   Update X ← X + Υu_n e_n^⊤
4:   Update Y ← Y + u_n (Ωe_n)^⊤
5:   Update Z ← Z + (Φu_n)(Ψe_n)^⊤
6: end for

Storage Requirement: Sketching requires k(M + N) + s² floating point numbers.

Recovery:
1: (Q, R_2) ← qr(Y, 0)              Q ∈ R^{M×k}, R_2 ∈ R^{k×k}
2: (P, R_1) ← qr(X^⊤, 0)            P ∈ R^{N×k}, R_1 ∈ R^{k×k}
3: C ← (ΦQ)^† Z ((ΨP)^†)^⊤          C ∈ R^{k×k}
4: W ← CP^⊤                         W ∈ R^{k×N}
5: u_n ≈ QWe_n

Storage Requirement: Recovery requires k(M + N) + k² floating point numbers.
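Combining the two listings, the sketch below streams hypothetical state columns one at a time, then recovers the approximation U ≈ Q[(ΦQ)^†Z((ΨP)^†)^⊤]P^⊤ and compares its error to the rank-r tail energy; all data and dimensions are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, r = 1000, 400, 8
k, s = 2 * r + 1, 2 * (2 * r + 1) + 1

# Hypothetical trajectory with rapidly decaying singular values.
U_true = (rng.standard_normal((M, 20)) * 2.0 ** -np.arange(20)) @ rng.standard_normal((20, N))

Ups, Om = rng.standard_normal((k, M)), rng.standard_normal((k, N))
Phi, Psi = rng.standard_normal((s, M)), rng.standard_normal((s, N))

# Online sketching: stream one state column u_n at a time, never storing U.
X, Y, Zc = np.zeros((k, N)), np.zeros((M, k)), np.zeros((s, s))
for n in range(N):
    u_n = U_true[:, n]                    # stand-in for solving c_n(u_{n-1}, u_n, z_n) = 0
    X[:, n] = Ups @ u_n                   # X <- X + (Upsilon u_n) e_n^T
    Y += np.outer(u_n, Om[:, n])          # Y <- Y + u_n (Omega e_n)^T
    Zc += np.outer(Phi @ u_n, Psi[:, n])  # Z <- Z + (Phi u_n)(Psi e_n)^T

# Recovery: U ~ Q C P^T with C = (Phi Q)^+ Z ((Psi P)^+)^T.
Q, _ = np.linalg.qr(Y)        # Q in R^{M x k}
P, _ = np.linalg.qr(X.T)      # P in R^{N x k}
C = np.linalg.pinv(Phi @ Q) @ Zc @ np.linalg.pinv(Psi @ P).T
U_hat = Q @ (C @ P.T)

err = np.linalg.norm(U_true - U_hat)
tail = np.sqrt((np.linalg.svd(U_true, compute_uv=False)[r:] ** 2).sum())
print(f"sketch error {err:.3e} vs rank-{r} tail energy {tail:.3e}")
```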



Page 17: Randomized Sketching for Low-Memory Dynamic Optimization

14 Randomized Sketching State Compression Error

Theorem: Sketching Error Bound
Let U_r denote the sketch of U associated with the rank parameter r. Then

\[
\mathbb{E}_{\Upsilon,\Omega,\Phi,\Psi}\,\|U - U_r\|_F \;\le\; \sqrt{6}\,\Big(\sum_{i\ge r+1}\sigma_i^2(U)\Big)^{1/2}.
\]

In expectation, the error is at most a modest constant factor (√6) larger than the best rank-r approximation error!

Similar results exist for the probability of large deviation.

Tropp, Yurtsever, Udell and Cevher, Streaming low-rank matrix approximation with an application to scientific simulation, 2019.


Page 18: Randomized Sketching for Low-Memory Dynamic Optimization

15 Randomized Sketching Gradient Approximation

Solve Adjoint Equation and Compute Gradient:
Require: Sketched state U_r.
1: Reconstruct u_{N-1} and u_N from U_r.
2: Compute λ_N that solves (d_2 c_N(u_{N-1}, u_N, z_N))^⊤ λ_N = −d_2 f_N(u_{N-1}, u_N, z_N).
3: Set g_N = d_3 f_N(u_{N-1}, u_N, z_N) + (d_3 c_N(u_{N-1}, u_N, z_N))^⊤ λ_N.
4: for n = N − 1, . . . , 1 do
5:   Reconstruct u_{n-1} from U_r.
6:   Compute λ_n that solves
       (d_2 c_n(u_{n-1}, u_n, z_n))^⊤ λ_n = −d_2 f_n(u_{n-1}, u_n, z_n) − d_1 f_{n+1}(u_n, u_{n+1}, z_{n+1}) − (d_1 c_{n+1}(u_n, u_{n+1}, z_{n+1}))^⊤ λ_{n+1}.
7:   Set g_n = d_3 f_n(u_{n-1}, u_n, z_n) + (d_3 c_n(u_{n-1}, u_n, z_n))^⊤ λ_n.
8: end for


Page 19: Randomized Sketching for Low-Memory Dynamic Optimization

16 Randomized Sketching Assumptions and Gradient Error

Define the combined objective function and constraint by

\[
f(U, Z) = \sum_{i=1}^{N} f_i(u_{i-1}, u_i, z_i)
\quad\text{and}\quad
c(U, Z) = \begin{pmatrix} c_1(u_0, u_1, z_1) \\ \vdots \\ c_N(u_{N-1}, u_N, z_N) \end{pmatrix}.
\]

Assumptions:

1. For any bounded set Z ⊂ R^{Nm}, the set U = {U | c(U, Z) = 0, Z ∈ Z} is bounded;

2. There exist 0 < σ_0 ≤ σ_1 < ∞ such that for all U ∈ U and Z ∈ Z,
   σ_0 ≤ σ_min(d_1 c(U, Z)) ≤ σ_max(d_1 c(U, Z)) ≤ σ_1;

3. The functions d_1 c(U, Z), d_2 c(U, Z), d_1 f(U, Z) and d_2 f(U, Z) are Lipschitz continuous on U × Z with respect to U, and the Lipschitz moduli are independent of Z ∈ Z.

Under these assumptions, we can show that

\[
\|g - \nabla F(Z)\| \le \kappa_1 \underbrace{\|c(U_r, Z)\|}_{\text{state residual}}
\quad\text{and}\quad
\mathbb{E}\,\|g - \nabla F(Z)\| \le \sqrt{6}\,\kappa\Big(\sum_{i\ge r+1}\sigma_i^2(U)\Big)^{1/2}.
\]



Page 24: Randomized Sketching for Low-Memory Dynamic Optimization

17 Fixed Rank Approach

• Idea: Use the approximate gradient within a derivative-based optimization algorithm, for example gradient descent, nonlinear CG, a Newton–Krylov method, etc.

• If the rank parameter r is sufficiently large, then the gradient error may be negligible.

• For Newton–Krylov, applying the Hessian to a vector requires a linearized forward-in-time solve and a linearized backward-in-time solve.

• Can approximately apply the Hessian to a vector by sketching the adjoint as well as the trajectories from the additional solves.



Page 26: Randomized Sketching for Low-Memory Dynamic Optimization

18 Fixed Rank Approach Basic Trust-Region Method

Require: Initial guess Z^{(0)} ∈ R^{Nm} and target rank r. Algorithmic parameters: ∆^{(0)} > 0 and 0 < η_0 ≤ η_1 < η_2 < 1.
1: for ℓ = 0, 1, 2, . . . do
2:   Solve the state equation and sketch the solution U^{(ℓ)}_r.
3:   Compute the gradient approximation g^{(ℓ)} ≈ ∇F(Z^{(ℓ)}) using the sketched state U^{(ℓ)}_r.
4:   Choose B^{(ℓ)} ∈ R^{Nm×Nm} to be an approximation of the Hessian ∇²F(Z^{(ℓ)}).
5:   Compute S^{(ℓ)} ∈ R^{Nm} that approximately solves the trust-region subproblem
       min_{S ∈ R^{Nm}}  (1/2)⟨S, B^{(ℓ)}S⟩ + ⟨S, g^{(ℓ)}⟩   subject to   ‖S‖ ≤ ∆^{(ℓ)}.
6:   Compute the ratio of actual and predicted reduction
       ρ^{(ℓ)} = [F(Z^{(ℓ)}) − F(Z^{(ℓ)} + S^{(ℓ)})] / [−(1/2)⟨S^{(ℓ)}, B^{(ℓ)}S^{(ℓ)}⟩ − ⟨S^{(ℓ)}, g^{(ℓ)}⟩].
7:   Set Z^{(ℓ+1)} = Z^{(ℓ)} if ρ^{(ℓ)} ≤ η_0, and Z^{(ℓ+1)} = Z^{(ℓ)} + S^{(ℓ)} otherwise.
8:   Choose ∆^{(ℓ+1)} ∈ (0, ∆^{(ℓ)}) if ρ^{(ℓ)} ≤ η_1, ∆^{(ℓ+1)} = ∆^{(ℓ)} if ρ^{(ℓ)} ∈ (η_1, η_2), and ∆^{(ℓ+1)} ∈ (∆^{(ℓ)}, ∞) otherwise.
9: end for

The algorithm may not converge because the gradient is inexact!
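To make the loop above concrete, here is a generic, self-contained trust-region iteration on the Rosenbrock function (an illustrative test problem, not the dynamic optimization setting): the subproblem in step 5 is solved approximately with a Newton-or-Cauchy-point step, and the accept/radius rules of steps 7–8 are applied with hypothetical parameters η_0 = 0.05, η_1 = 0.25, η_2 = 0.75.

```python
import numpy as np

def rosenbrock(x):
    return 100.0 * (x[1] - x[0] ** 2) ** 2 + (1.0 - x[0]) ** 2

def grad(x):
    return np.array([-400.0 * x[0] * (x[1] - x[0] ** 2) - 2.0 * (1.0 - x[0]),
                     200.0 * (x[1] - x[0] ** 2)])

def hess(x):
    return np.array([[1200.0 * x[0] ** 2 - 400.0 * x[1] + 2.0, -400.0 * x[0]],
                     [-400.0 * x[0], 200.0]])

def tr_step(g, B, delta):
    """Cheaply approximate argmin 0.5<S,BS> + <S,g> subject to ||S|| <= delta."""
    if np.all(np.linalg.eigvalsh(B) > 0.0):
        s = np.linalg.solve(B, -g)             # Newton step, if it fits in the region
        if np.linalg.norm(s) <= delta:
            return s
    gBg = g @ (B @ g)                          # otherwise fall back to the Cauchy point
    t = delta / np.linalg.norm(g)
    if gBg > 0.0:
        t = min(t, (g @ g) / gBg)
    return -t * g

Z, delta = np.array([-1.2, 1.0]), 1.0
eta0, eta1, eta2 = 0.05, 0.25, 0.75              # 0 < eta0 <= eta1 < eta2 < 1
for it in range(200):
    g, B = grad(Z), hess(Z)
    if np.linalg.norm(g) < 1e-8:
        break
    S = tr_step(g, B, delta)
    pred = -(0.5 * S @ (B @ S) + S @ g)          # predicted reduction (positive)
    rho = (rosenbrock(Z) - rosenbrock(Z + S)) / pred
    if rho > eta0:                               # accept the step
        Z = Z + S
    if rho <= eta1:                              # shrink the radius
        delta *= 0.5
    elif rho >= eta2:                            # expand the radius
        delta *= 2.0
print(it, Z, rosenbrock(Z))
```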

Page 27: Randomized Sketching for Low-Memory Dynamic Optimization

19 Adaptive Rank Algorithm Inexact Trust-Region Method

1. The fixed-rank algorithm may not converge unless the rank r is chosen sufficiently large. An a priori choice of r is often difficult for practical applications!

2. To ensure convergence of the TR method, gradient inexactness must be controlled:
\[
\|g^{(\ell)} - \nabla F(Z^{(\ell)})\| \le \kappa_0 \min\{\|g^{(\ell)}\|, \Delta^{(\ell)}\}.
\]
Heinkenschloss and Vicente, Analysis of inexact SQP algorithms, 2001.

3. Satisfy the gradient inexactness condition by monitoring the state residual:
\[
\|g^{(\ell)} - \nabla F(Z^{(\ell)})\| \le \kappa_1\,\|c(U^{(\ell)}, Z^{(\ell)})\|
\quad\Longrightarrow\quad
\|c(U^{(\ell)}, Z^{(\ell)})\| \le \kappa_0'\,\min\{\|g^{(\ell)}\|, \Delta^{(\ell)}\}.
\]

4. Evaluate the state residual while solving the adjoint equation. If the bound is violated, increase r, re-solve the state equation and re-sketch the state!

5. Hopefully a sufficiently large r is determined after a handful of updates. May run out of memory if the rank of the state is too large!
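The rank-adaptation logic in items 3–4 can be sketched as the following schematic loop; `sketch_and_grad`, `kappa0`, and `r_max` are hypothetical placeholders for the user's solver interface and algorithmic constants, and the doubling rule is only one possible rank-increase strategy (compare the numerical results slides).

```python
def adaptive_rank_gradient(Z, r, delta, sketch_and_grad, kappa0=1e-2, r_max=512):
    """Increase the sketch rank until the reconstructed-state residual satisfies
    ||c(U_r, Z)|| <= kappa0 * min(||g||, delta), so the gradient is usable.

    sketch_and_grad(Z, r) -> (g, gnorm, resnorm) is a user-supplied routine that
    solves and sketches the state at rank r, runs the adjoint sweep, and returns
    the approximate gradient, its norm, and the state residual norm.
    """
    while True:
        g, gnorm, resnorm = sketch_and_grad(Z, r)
        if resnorm <= kappa0 * min(gnorm, delta):
            return g, r                  # inexactness condition satisfied
        if 2 * r > r_max:
            raise MemoryError("state rank too large for the available memory")
        r = 2 * r                        # one possible rank-increase rule
```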



Page 32: Randomized Sketching for Low-Memory Dynamic Optimization

20 Adaptive Rank Algorithm Convergence Theory

Theorem: Convergence of the Adaptive-Rank Trust-Region Method
Suppose there exists a bounded, open, convex set Z ⊂ R^{Nm} such that Z^{(ℓ)} ∈ Z for all ℓ. Suppose F is bounded below and twice continuously differentiable on Z. Moreover, assume that ∇²F(·) is uniformly bounded on Z and that B^{(ℓ)} is bounded independently of ℓ. Then

\[
\liminf_{\ell\to\infty}\|g^{(\ell)}\| = \liminf_{\ell\to\infty}\|\nabla F(Z^{(\ell)})\| = 0.
\]

For detailed analysis of inexact trust-region methods see:

Carter, On the global convergence of trust region algorithms using inexact gradient information, 1991.

Conn, Gould and Toint, Trust Region Methods, 2000.

Heinkenschloss and Vicente, Analysis of inexact SQP algorithms, 2001.

Kouri and Ridzal, Inexact trust-region methods for PDE-constrained optimization, 2018.


Page 33: Randomized Sketching for Low-Memory Dynamic Optimization

21 Adaptive Rank Algorithm Constrained Optimization

“Simple” Control Constraints: Constraints with easy-to-compute projections such as

a_0 ≤ CZ ≤ a_1 and DZ = b.

Only need to modify trust-region subproblem to account for constraints in the set Zad, i.e.,

\[
\min_{S\in\mathbb{R}^{Nm}}\ \tfrac12\langle S, B^{(\ell)}S\rangle + \langle S, g^{(\ell)}\rangle
\quad\text{subject to}\quad
Z^{(\ell)} + S \in Z_{\mathrm{ad}},\ \ \|S\| \le \Delta^{(\ell)}.
\]

Garreis and Ulbrich, An inexact trust-region algorithm for constrained problems in Hilbert space and its application to the adaptive solution of optimal control problems with PDEs, 2019.

General Constraints: Use augmented Lagrangian, log barrier, etc. to penalize general nonlinear constraints such as

g(U,Z) = 0 and h(U,Z) ≤ 0.

Apply the adaptive rank algorithm to solve the penalized subproblems

\[
\min_{Z\in\mathbb{R}^{Nm}}\ F(Z) + \wp^{(\ell)}(Z).
\]



Page 35: Randomized Sketching for Low-Memory Dynamic Optimization

22 Example Application Linear Parabolic Control

Let L be a uniformly bounded and coercive linear operator, B a bounded linear operator, and β a continuous linear functional. Consider the optimal control problem

\[
\min_{u,\,z}\ \tfrac12\int_0^T \|u - w\|^2\,\mathrm{d}t + \frac{\alpha}{2}\int_0^T \|z\|^2\,\mathrm{d}t
\quad\text{subject to}\quad
\partial_t u + L(t)u = \beta(t) + Bz,\qquad u(0) = u_0.
\]

Discretization:
1. Continuous finite elements in space and implicit Euler in time.
2. Similar results hold for trapezoidal (Crank–Nicolson) and explicit Euler.

Stability Estimates: C_n^1 and C_n^2 are positive constants that depend on the time step δt_n:

\[
\|u_n - \bar u_n\| \le C_n^1 \sum_{i=1}^{n} \|c_i(\bar u_{i-1}, \bar u_i, z_i)\|
\quad\text{and}\quad
\|\lambda_n - \bar\lambda_n\| \le C_n^2 \sum_{i=1}^{n} \|c_i(\bar u_{i-1}, \bar u_i, z_i)\|,
\]

where \bar u_n and \bar\lambda_n denote the approximate state and adjoint computed from the sketched trajectory.

Using these estimates, we can prove convergence of the adaptive rank algorithm!

Page 36: Randomized Sketching for Low-Memory Dynamic Optimization

23 Numerical Results Linear Parabolic Control

Let D = (0, 0.5)× (0, 0.2) with boundary ∂D and consider the optimal control problem

\[
\min_{u,\,z}\ \tfrac12\int_0^T\!\!\int_D |u - 1|^2\,\mathrm{d}x\,\mathrm{d}t + \frac{\alpha}{2}\int_0^T\!\!\int_D |z|^2\,\mathrm{d}x\,\mathrm{d}t
\]

subject to

\[
\partial_t u - \gamma\Delta u + v\cdot\nabla u + u = \beta + z \ \ \text{in } (0,T)\times D,
\qquad
\gamma\nabla u\cdot n = 0 \ \ \text{on } (0,T)\times\partial D,
\qquad
u(0) = 0 \ \ \text{in } D,
\]

where α = 10^{-4}, γ = 0.1, v(x) = (7.5 − 2.5 x_1, 2.5 x_2)^⊤, and

\[
\beta(x) = \begin{cases} 1 & \text{if } \|x - (0.1, 0.1)^\top\| \le 0.07,\\ 0 & \text{otherwise.}\end{cases}
\]

Spatial Discretization: Q1 finite elements on a uniform mesh of 60×20 quadrilaterals.

Temporal Discretization: Implicit Euler with 500 equal time steps.


Page 37: Randomized Sketching for Low-Memory Dynamic Optimization

24 Numerical Results Linear Parabolic Control

[Figure: two log-scale plots of sketch error and tail energy versus rank, for the uncontrolled state (left) and the optimal state (right).]

Sketching error averaged over 20 realizations and tail energy for the uncontrolled state (left) and the optimal state (right). Recall that the rank of the sketch is k = 2r + 1.


Page 38: Randomized Sketching for Low-Memory Dynamic Optimization

25 Numerical Results Linear Parabolic Control

Termination Criterion: ‖g^{(ℓ)}‖ ≤ 10^{-7} or ℓ > 20.

Rank Increase Function: r ← max{r + 2, ⌈(b − log τ)/a⌉} with a = 2.6125, b = 2.4841.
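Read literally (and assuming the natural logarithm), the rank increase rule can be coded as the small helper below; the constants a and b are taken from this slide, and their fit to the tail-energy decay is specific to this example.

```python
import math

def increase_rank(r, tau, a=2.6125, b=2.4841):
    """r <- max{ r + 2, ceil((b - log tau)/a) } (assuming the natural logarithm)."""
    return max(r + 2, math.ceil((b - math.log(tau)) / a))

# A residual tolerance of tau = 1e-6 would suggest a rank of about 7:
print(increase_rank(1, 1e-6))
```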

rank      objective     iter  nstate  nadjoint  iterCG  compression
∗1        5.544040e-4   20    21      11        196     118.79
2         5.528490e-4    5     6       6        151      70.96
3         5.528490e-4    4     5       5         78      50.46
4         5.528490e-4    4     5       5         67      38.08
5         5.528490e-4    4     5       5         59      31.83
Adaptive  5.528490e-4    4     8       5         65      23.14
Full      5.528490e-4    4     5       5         53       1.00

∗The rank 1 experiment exceeded the maximum number of iterations.


Page 39: Randomized Sketching for Low-Memory Dynamic Optimization

26 Numerical Results Linear Parabolic Control

Adaptive:
iter  value     gnorm     snorm     delta     iterCG  rank  rnorm
0     5.446e-2  5.990e-3  ---       1.000e+1  ---     1     2.707e-4
1     1.375e-2  2.205e-3  1.000e+1  2.500e+1  1       1     1.990e-4
2     1.475e-3  1.408e-4  2.499e+1  6.250e+1  5       5     6.700e-7
3     5.531e-4  6.431e-7  4.077e+1  1.563e+1  27      7     3.893e-9
4     5.528e-4  5.039e-9  1.059e+0  3.906e+2  32      7     1.508e-9

Full:
iter  value     gnorm     snorm     delta     iterCG  rank  rnorm
0     5.446e-2  5.989e-3  ---       1.000e+1  ---     ---   ---
1     1.375e-2  2.201e-3  1.000e+1  2.500e+1  1       ---   ---
2     1.472e-3  1.401e-4  2.500e+1  6.250e+1  5       ---   ---
3     5.538e-4  1.361e-6  4.051e+1  1.563e+1  19      ---   ---
4     5.528e-4  8.416e-9  2.178e+0  3.906e+2  28      ---   ---

The adaptive-rank algorithm reduced the memory requirement 23.14-fold!


Page 40: Randomized Sketching for Low-Memory Dynamic Optimization

27 Numerical Results Flow Control

We consider the optimal flow control problem

\[
\min_{z}\ \int_0^T\left[\int_{\partial C}\left(\frac{1}{Re}\frac{\partial v}{\partial n} - p\,n\right)\cdot\big(z\tau - v_\infty\big)\,\mathrm{d}x + \frac{\alpha}{2}\,z(t)^2\right]\mathrm{d}t,
\]

where the velocity/pressure (v, p) : [0, T] × D → R² × R solves the Navier–Stokes equations

\[
\begin{aligned}
\partial_t v - \tfrac{1}{Re}\Delta v + (v\cdot\nabla)v + \nabla p &= 0 && \text{in } (0,T)\times D\setminus C,\\
\nabla\cdot v &= 0 && \text{in } (0,T)\times D\setminus C,\\
\tfrac{1}{Re}\tfrac{\partial v}{\partial n} - p\,n &= 0 && \text{on } (0,T)\times \Gamma_{\mathrm{out}},\\
v &= v_\infty && \text{on } (0,T)\times \partial D\setminus\Gamma_{\mathrm{out}},\\
v &= z\tau && \text{on } (0,T)\times \partial C.
\end{aligned}
\]

Goal: Minimize the power required to overcome the drag on C by rotating C.


Page 41: Randomized Sketching for Low-Memory Dynamic Optimization

28 Numerical Results Flow Control

[Figure: schematic of the computational domain with boundary condition labels v = (0, 1)^⊤, v_2 = 0, and the outflow boundary Γ_out.]

Problem Data: D = (−15, 45) × (−15, 15), C = B_{1/2}, Re = 200, T = 20, and tangent vector

\[
\tau(x) = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}(x - x_0), \qquad x \in \partial C.
\]

Spatial Discretization: Q2–Q1 finite elements.

Temporal Discretization: Implicit Euler with 800 equal time steps.

Page 42: Randomized Sketching for Low-Memory Dynamic Optimization

29 Numerical Results Flow Control

[Figures: initial vorticity (t = 0), controlled vorticity (t = 20), initial velocity (t = 0), and controlled velocity (t = 20).]


Page 43: Randomized Sketching for Low-Memory Dynamic Optimization

30 Numerical Results Flow Control

[Figure: two log-scale plots of sketch error and tail energy versus rank, for the uncontrolled state (left) and the optimal state (right).]

Sketching error averaged over 20 realizations and tail energy for the uncontrolled state (left) and the optimal state (right). Recall that the rank of the sketch is k = 2r + 1.


Page 44: Randomized Sketching for Low-Memory Dynamic Optimization

31 Numerical Results Flow Control

Termination Criterion: ‖g^{(ℓ)}‖ ≤ 10^{-5} or ℓ > 40.

Rank Increase Function: r ← 2r.

rank      objective  iter  nstate  nadjoint  iterCG  compression
∗8        18.35919   40    41      15        136     45.44
∗16       18.20003   40    41      33        897     23.35
∗32       18.19779   40    41      31        236     11.80
64        18.19779   29    41      34        110      5.88
Adaptive  18.19779   23    27      24        121      5.88
Full      18.19779   29    30      24        107      ---

∗The rank 8, 16, and 32 experiments exceeded the maximum number of iterations.


Page 45: Randomized Sketching for Low-Memory Dynamic Optimization

32 Numerical Results Flow Control

[Figure: sketch rank per iteration (left) and gradient inexactness tolerance versus computed residual norm per iteration, log scale (right).]

Left: Sketch rank as a function of iteration of the adaptive-rank algorithm. Right: Required gradient inexactness tolerance and computed residual norm as functions of iteration.

The adaptive-rank algorithm reduced the memory requirement 5.88-fold!


Page 46: Randomized Sketching for Low-Memory Dynamic Optimization

33 Conclusions

• Numerical solution of dynamic optimization problems is expensive and requires careful memory management.

• We compress the dynamic state using randomized matrix sketching.

• State compression using sketching is lossy and results in inexact gradients.

• We introduced a provably convergent algorithm to handle gradient inexactness. The algorithm adaptively learns the rank of the optimal state, but may run out of memory for non-low-rank states!

• Numerical examples suggest a roughly 5- to 70-fold reduction in memory requirement.

R. Muthukumar, D. P. Kouri and M. Udell, Randomized sketching algorithms for low-memory dynamic optimization, 2019. http://www.optimization-online.org/DB_FILE/2019/11/7475.pdf
