
Subgradient Methods

• subgradient method and stepsize rules

• convergence results and proof

• projected subgradient method

• projected subgradient method for dual

• optimal network flow

Prof. S. Boyd, EE392o, Stanford University

Page 2: Subgradient Methods - Stanford University · Subgradient Methods †subgradientmethodandstepsizerules †convergenceresultsandproof †projectedsubgradientmethod †projectedsubgradientmethodfordual

Subgradient method

the subgradient method is a simple algorithm to minimize a nondifferentiable convex function f

x(k+1) = x(k) − αkg(k)

• x(k) is the kth iterate

• g(k) is any subgradient of f at x(k)

• αk > 0 is the kth step size
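
as a concrete illustration (not code from the slides), the iteration above can be written as a short Python sketch; f and subgrad are assumed user-supplied oracles, and step(k, g) returns the kth step size αk

```python
import numpy as np

def subgradient_method(f, subgrad, x0, step, iters=500):
    """Minimal sketch of the subgradient method: x := x - alpha_k * g."""
    x = np.asarray(x0, dtype=float)
    f_best, x_best = f(x), x.copy()
    for k in range(1, iters + 1):
        g = subgrad(x)                  # any subgradient of f at x
        x = x - step(k, g) * g          # x^(k+1) = x^(k) - alpha_k g^(k)
        if f(x) < f_best:               # f can increase, so track the best point
            f_best, x_best = f(x), x.copy()
    return x_best, f_best
```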


Step size rules

step sizes are fixed ahead of time

• constant step size: αk = h (constant)

• constant step length: αk = h/‖g(k)‖2 (so ‖x(k+1) − x(k)‖2 = h)

• square summable but not summable: step sizes satisfy

∑_{k=1}^∞ αk² < ∞,   ∑_{k=1}^∞ αk = ∞

• nonsummable diminishing: step sizes satisfy

lim_{k→∞} αk = 0,   ∑_{k=1}^∞ αk = ∞
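
the four rules can be expressed as small Python callables compatible with the subgradient_method sketch shown earlier (the constants h and a are illustrative choices, not values from the slides)

```python
import numpy as np

h, a = 0.05, 0.1                                       # illustrative constants

constant_size   = lambda k, g: h                       # alpha_k = h
constant_length = lambda k, g: h / np.linalg.norm(g)   # so ||x^(k+1) - x^(k)||_2 = h
square_summable = lambda k, g: a / k                   # sum alpha_k^2 < inf, sum alpha_k = inf
diminishing     = lambda k, g: a / np.sqrt(k)          # alpha_k -> 0, sum alpha_k = inf
```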


Convergence results

• we assume ‖g‖2 ≤ G for all g ∈ ∂f(x) and all x (equivalent to a Lipschitz condition on f), and p⋆ = inf_x f(x) > −∞

• define f_best^(k) = min_{i=0,...,k} f(x(i)), the best value found in k iterations, and f̄ = lim_{k→∞} f_best^(k)

• constant step size: f̄ − p⋆ ≤ G²h/2, i.e., converges to G²h/2-suboptimal (converges to p⋆ if f is differentiable and h is small enough)

• constant step length: f̄ − p⋆ ≤ Gh/2, i.e., converges to Gh/2-suboptimal

• diminishing step size rule: f̄ = p⋆, i.e., converges


Convergence proof

key quantity: Euclidean distance to the optimal set, not the function value

let x⋆ be any minimizer of f

‖x(k+1) − x⋆‖2² = ‖x(k) − αk g(k) − x⋆‖2²
              = ‖x(k) − x⋆‖2² − 2αk g(k)ᵀ(x(k) − x⋆) + αk² ‖g(k)‖2²
              ≤ ‖x(k) − x⋆‖2² − 2αk (f(x(k)) − p⋆) + αk² ‖g(k)‖2²

using p⋆ = f(x⋆) ≥ f(x(k)) + g(k)ᵀ(x⋆ − x(k))


applying recursively, and using ‖g(i)‖2 ≤ G, we get

‖x(k+1) − x⋆‖2² ≤ ‖x(1) − x⋆‖2² − 2 ∑_{i=1}^k αi (f(x(i)) − p⋆) + G² ∑_{i=1}^k αi²

now we use

∑_{i=1}^k αi (f(x(i)) − p⋆) ≥ (f_best^(k) − p⋆) (∑_{i=1}^k αi)

to get

f_best^(k) − p⋆ ≤ (‖x(1) − x⋆‖2² + G² ∑_{i=1}^k αi²) / (2 ∑_{i=1}^k αi)


constant step size: for αk = h we get

f_best^(k) − p⋆ ≤ (‖x(1) − x⋆‖2² + G² k h²) / (2kh)

the right-hand side converges to G²h/2 as k → ∞


square summable but not summable step sizes:

suppose step sizes satisfy

∑_{k=1}^∞ αk² < ∞,   ∑_{k=1}^∞ αk = ∞

then

f_best^(k) − p⋆ ≤ (‖x(1) − x⋆‖2² + G² ∑_{i=1}^k αi²) / (2 ∑_{i=1}^k αi)

as k → ∞, the numerator converges to a finite number and the denominator converges to ∞, so f_best^(k) → p⋆


Example: Piecewise linear minimization

minimize f(x) = max_{i=1,...,m} (aiᵀx + bi)

to find a subgradient of f, find an index j for which

ajᵀx + bj = max_{i=1,...,m} (aiᵀx + bi)

and take g = aj

subgradient method: x(k+1) = x(k) − αkaj
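
a minimal runnable version of this example (with randomly generated data, not the slides' problem instance) might look as follows

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 10, 100                            # same sizes as the slides' instance
A = rng.standard_normal((m, n))           # rows are a_i^T
b = rng.standard_normal(m)

def f(x):
    return np.max(A @ x + b)              # f(x) = max_i (a_i^T x + b_i)

def subgrad(x):
    j = np.argmax(A @ x + b)              # index achieving the maximum
    return A[j]                           # g = a_j is a subgradient at x

x, h = np.zeros(n), 0.02                  # constant step length h
f_best = f(x)
for k in range(1, 501):
    g = subgrad(x)
    x = x - (h / np.linalg.norm(g)) * g   # x^(k+1) = x^(k) - alpha_k a_j
    f_best = min(f_best, f(x))
```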


problem instance with n = 10 variables, m = 100 terms

constant step length, h = 0.05, 0.02, 0.005

[Figure: f(x(k)) − p⋆ versus iteration k, for h = 0.05, 0.02, 0.005]


f_best^(k) − p⋆ and upper bound, constant step length h = 0.02

[Figure: bound and f_best^(k) − p⋆ versus iteration k]


constant step size h = 0.05, 0.02, 0.005

[Figure: f(x(k)) − p⋆ versus iteration k, for h = 0.05, 0.02, 0.005]


diminishing step size rule αk = 0.1/√k and square summable step size rule αk = 0.1/k

[Figure: f(x(k)) − p⋆ versus iteration k for the two rules]


f_best^(k) − p⋆ and upper bound, diminishing step size rule αk = 0.1/√k

[Figure: bound and f_best^(k) − p⋆ versus iteration k]


constant step length h = 0.02, diminishing step size rule αk = 0.1/√k, and square summable step size rule αk = 0.1/k

[Figure: f_best^(k) − p⋆ versus iteration k for the three rules]


Projected subgradient method

solves constrained optimization problem

minimize f(x)
subject to x ∈ C,

where f : Rn → R, C ⊆ Rn are convex

projected subgradient method is given by

x(k+1) = P (x(k) − αkg(k)),

P is (Euclidean) projection on C, and g(k) ∈ ∂f(x(k))
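
a minimal sketch, using the Euclidean ball as an example constraint set C (any convex set with a computable projection would do)

```python
import numpy as np

def project_ball(u, radius=1.0):
    """Euclidean projection onto C = {x : ||x||_2 <= radius} (example set)."""
    nrm = np.linalg.norm(u)
    return u if nrm <= radius else (radius / nrm) * u

def projected_subgradient(f, subgrad, project, x0, step, iters=500):
    x = np.asarray(x0, dtype=float)
    f_best, x_best = f(x), x.copy()
    for k in range(1, iters + 1):
        g = subgrad(x)
        x = project(x - step(k, g) * g)   # x^(k+1) = P(x^(k) - alpha_k g^(k))
        if f(x) < f_best:
            f_best, x_best = f(x), x.copy()
    return x_best, f_best
```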


same convergence results:

• for constant step size, converges to a neighborhood of the optimum (for f differentiable and h small enough, converges)

• for diminishing nonsummable step sizes, converges

key idea: projection does not increase the distance to x⋆

approximate projected subgradient: P only needs to satisfy

P (u) ∈ C, ‖P (u)− z‖2 ≤ ‖u− z‖2 for any z ∈ C


Projected subgradient for dual problem

(convex) primal:

minimize f0(x)
subject to fi(x) ≤ 0, i = 1, . . . , m

solve the dual problem

maximize g(λ)
subject to λ ⪰ 0

via projected subgradient method:

λ(k+1) = (λ(k) − αk h)+,   h ∈ ∂(−g)(λ(k))


Subgradient of negative dual function

assume f0 is strictly convex, and denote, for λ ⪰ 0,

x*(λ) = argmin_z (f0(z) + λ1 f1(z) + · · · + λm fm(z))

so g(λ) = f0(x*(λ)) + λ1 f1(x*(λ)) + · · · + λm fm(x*(λ))

a subgradient of −g at λ is given by hi = −fi(x∗(λ))

projected subgradient method for dual:

x(k) = x*(λ(k)),   λi(k+1) = (λi(k) + αk fi(x(k)))+
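
for a concrete (assumed, not from the slides) instance, take f0(x) = (1/2)‖x‖2² and fi(x) = aiᵀx − bi, so that x*(λ) = −Aᵀλ has a closed form and the update is easy to write out

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 20, 10
A = rng.standard_normal((m, n))      # constraint rows a_i^T
b = rng.standard_normal(m) + 1.0

lam, alpha = np.zeros(m), 0.01       # dual variable, constant step size
for k in range(500):
    # x^(k) = x*(lambda^(k)) = argmin_z (||z||^2/2 + lam^T (A z - b))
    x = -A.T @ lam
    # lambda_i := (lambda_i + alpha * f_i(x))_+
    lam = np.maximum(lam + alpha * (A @ x - b), 0.0)
```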


note:

• primal iterates x(k) are not feasible, but become feasible in the limit

• subgradient of −g directly gives the violation of the primal constraints

• dual function values g(λ(k)) converge to p?


Example: Optimal network flow

• connected directed graph with n links, p nodes

• variable xj denotes the flow or traffic on arc j (can be < 0)

• given external source (or sink) flow si at node i, 1ᵀs = 0

• flow conservation: Ax = s, where A ∈ Rp×n is node incidence matrix

• φj : R→ R convex flow cost function for link j

optimal (single commodity) network flow problem:

minimize ∑_{j=1}^n φj(xj)
subject to Ax = s


Dual network flow problem

the Lagrangian is

L(x, ν) = ∑_{j=1}^n φj(xj) + νᵀ(s − Ax)
        = ∑_{j=1}^n (φj(xj) − ∆νj xj) + νᵀs

• we interpret νi as potential at node i

• ∆νj denotes potential difference across link j


dual function is

q(ν) = inf_x L(x, ν) = ∑_{j=1}^n inf_{xj} (φj(xj) − ∆νj xj) + νᵀs
     = −∑_{j=1}^n φj*(∆νj) + νᵀs

(φj* is the conjugate function of φj)

dual network flow problem:

maximize q(ν) = −∑_{j=1}^n φj*(∆νj) + νᵀs


Optimal network flow via dual

assume each φj is strictly convex, and denote

xj*(∆νj) = argmin_{xj} (φj(xj) − ∆νj xj)

if ν⋆ is an optimal solution of the dual network flow problem, then

xj⋆ = xj*(∆νj⋆)

is the optimal flow


Electrical network analogy

• electrical network with node incidence matrix A, nonlinear resistors in branches

• variable xj is the current flow in branch j

• source si is external current injected at node i (must sum to zero)

• flow conservation equation Ax = s is Kirchhoff's Current Law (KCL)

• dual variables are node potentials; ∆νj is jth branch voltage

• branch current-voltage characteristic is xj = x∗j(∆νj)

then the currents and potentials in the circuit are the optimal flows and dual variables


Subgradient of negative dual function

a subgradient of the negative dual function −q at ν is

g = Ax∗(∆ν)− s

the ith component is gi = aiᵀx*(∆ν) − si, which is the flow excess at node i


Subgradient method for dual

subgradient method applied to dual can be expressed as:

xj := xj*(∆νj)

gi := aiᵀx − si

νi := νi − α gi

interpretation:

• optimize each flow, given the potential difference, without regard for flow conservation

• evaluate flow excess

• update potentials to correct flow excesses
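
a minimal sketch of this algorithm, assuming (for illustration only) quadratic flow costs φj(xj) = xj²/2 so that xj*(∆νj) = ∆νj in closed form, and a small hypothetical 3-node network

```python
import numpy as np

A = np.array([[ 1.,  1.,  0.],       # node-incidence matrix: 3 nodes, 3 links
              [-1.,  0.,  1.],
              [ 0., -1., -1.]])
s = np.array([1., 0., -1.])          # external source/sink flows, 1^T s = 0

nu, alpha = np.zeros(3), 0.5         # node potentials, constant step size
for k in range(200):
    dnu = A.T @ nu                   # potential difference across each link
    x = dnu                          # x_j := x_j*(dnu_j)  (quadratic-cost case)
    g = A @ x - s                    # flow excess at each node
    nu = nu - alpha * g              # update potentials to correct the excess
```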


Example: Minimum queueing delay

flow cost function

φj(xj) = |xj| / (cj − |xj|),   dom φj = (−cj, cj)

where cj > 0 are given link capacities

(φj(xj) gives the expected waiting time in a queue with exponential arrivals at rate xj and exponential service at rate cj)

the conjugate is

φj*(y) = 0 for |y| ≤ 1/cj,   φj*(y) = (√(cj|y|) − 1)² for |y| > 1/cj
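
the conjugate can be checked by a short computation (this derivation is mine, not spelled out on the slides): for y > 1/cj the supremum defining φj*(y) is attained at xj = cj − √(cj/y), and substituting gives

\[
\phi_j^*(y) \;=\; \sup_{x}\bigl(xy - \phi_j(x)\bigr)
\;=\; \Bigl(c_j - \sqrt{c_j/y}\Bigr)y \;-\; \frac{c_j - \sqrt{c_j/y}}{\sqrt{c_j/y}}
\;=\; c_j y - 2\sqrt{c_j y} + 1 \;=\; \bigl(\sqrt{c_j y} - 1\bigr)^2 .
\]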


cost function φ(x) (left) and its conjugate φ∗(y) (right), c = 1

[Figure: φ(x) versus x (left) and φ*(y) versus y (right)]

(note conjugate is differentiable)


x∗j(∆νj), for cj = 1

[Figure: xj versus ∆νj]

gives flow as function of potential difference across link
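
for this particular cost, the argmin defining xj*(∆νj) can be worked out in closed form (a derivation not given on the slides, so treat it as an assumption): for ∆νj > 1/cj, setting φj′(x) = cj/(cj − x)² equal to ∆νj gives x = cj − √(cj/∆νj); for |∆νj| ≤ 1/cj the flow is 0, and negative ∆νj is symmetric

```python
import numpy as np

def x_star(dnu, c=1.0):
    """Link flow as a function of potential difference, for phi(x) = |x|/(c - |x|)."""
    y = np.abs(dnu)
    # zero flow for |dnu| <= 1/c, otherwise c - sqrt(c/|dnu|), with the sign of dnu
    mag = np.where(y <= 1.0 / c, 0.0, c - np.sqrt(c / np.maximum(y, 1.0 / c)))
    return np.sign(dnu) * mag        # odd function of dnu, values in (-c, c)
```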


A specific example

network with 5 nodes, 7 links, capacities cj = 1

[Figure: network topology with nodes 1–5 and links 1–7]


Optimal flow

optimal flows shown as width of arrows; optimal dual variables shown in nodes; potential differences shown on links

[Figure: optimal flows, node potentials, and link potential differences for the example network]


Convergence of dual function

constant stepsize rule, α = 0.1, 1, 2, 3

[Figure: q(ν(k)) versus iteration k, for α = 0.1, 1, 2, 3]

for α = 1, 2, converges to p⋆ = 2.48 in about 40 iterations


Convergence of primal residual

[Figure: ‖Ax(k) − s‖ versus iteration k, for α = 0.1, 1, 2, 3]


convergence of dual function, nonsummable diminishing stepsize rules

[Figure: q(ν(k)) versus iteration k, for α = 1/k, 5/k, 1/√k, 5/√k]


convergence of primal residual, nonsummable diminishing stepsize rules

[Figure: ‖Ax(k) − s‖ versus iteration k, for α = 1/k, 5/k, 1/√k, 5/√k]


Convergence of dual variables

ν(k) versus iteration number k, constant stepsize rule α = 2

[Figure: ν1, ν2, ν3, ν4 versus iteration k]

(ν5 is fixed at zero)
