Lecture 5

Two-stage stochastic convex programs

February 4, 2015

Convex two-stage problems

• Consider a convex two-stage stochastic program of the form

(SP)   minimize_x   f(x) := E[f(x, ω)]   subject to x ∈ X,

where f(x, ω) is the optimal value of the second-stage problem

SecStage(ω)   minimize_{y ∈ Y}   q(y, ω)
              subject to g_i(y, ω) + χ_i ≤ 0,   i = 1, …, m,

where χ_i = t_i(x, ω).

• We assume throughout this section that q(y, ω), g_i(y, ω), and t_i(x, ω)


are real-valued convex functions∗ for a.e. ω, and both X and Y are

convex sets.

• The second-stage constraints can be absorbed into the objective function by defining a suitable indicator function. The resulting second-stage problem is given by the following:

SecStage(ω)   minimize_{y ∈ R^m}   q̄(y, χ, ω),

where q̄(y, χ, ω) := q(y, χ, ω) + 1l_Y(y).

• We denote the optimal value of the second-stage problem by θ(χ, ω).

∗Recall that real-valued convex functions are continuous (in fact they are locally Lipschitz continuous).


Conjugate duality: An introduction

• We now provide a brief aside regarding conjugate duality (Fenchel, 1951)

based on Veinott (1989).

• Consider the primal problem that requires choosing an x ∈ R^n that minimizes c(x, p), where c is an extended-real-valued convex function on R^{m+n} and p ∈ R^m. Let C(q) be defined as follows for q ∈ R^m:

C(q) := inf_x c(x, q).   (Primal)

It is well known that C is convex.

• The dual program requires choosing a (π, µ) ∈ R^{m+1} such that the linear function π^T p − µ is maximized subject to the inequalities

π^T q − µ ≤ C(q),   ∀ q ∈ R^m.


• More specifically, for a given p, we have the following problem:

max_{π, µ}   π^T p − µ

subject to   π^T q − µ ≤ C(q),   ∀ q ∈ R^m.     (1)

• Geometric insight: from among all affine functions π^T q − µ that minorize C(•), we choose one whose value at p is maximal.

• In fact, C is closed at p if and only if the primal infimum is equal to the

dual supremum (the main duality relationship).

• Now consider Rockafellar’s conjugate duality relationship:

• For each fixed π, one can maximize π^T p − µ subject to π^T q − µ ≤ C(q) by putting µ = C∗(π), where

C∗(π) = sup_q [q^T π − C(q)].


• The dual program then reduces to the following:

C∗∗(p) = sup_{π ∈ R^m} [p^T π − C∗(π)].

In effect, the supremum is given by the conjugate of C∗. Furthermore, C(p) = C∗∗(p) if and only if C is closed at p.
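To make the conjugacy relations concrete, here is a small numerical sketch (our own illustration, not from the lecture): for the closed convex function C(q) = q²/2 on R, the conjugate is C∗(π) = π²/2 and the biconjugate recovers C, which a crude grid computation confirms.

```python
import numpy as np

# Grid-based sketch of C*(pi) = sup_q [pi*q - C(q)] and of C** for C(q) = q^2/2.
q = np.linspace(-5.0, 5.0, 2001)
pi = np.linspace(-3.0, 3.0, 1201)
C = 0.5 * q**2

def conjugate(vals, grid, duals):
    # f*(s) = sup over the grid of s*t - f(t)
    return np.array([np.max(s * grid - vals) for s in duals])

C_star = conjugate(C, q, pi)             # approximately pi^2 / 2
C_bistar = conjugate(C_star, pi, q)      # approximately q^2 / 2

mask = np.abs(q) <= 2.5                  # interior points the pi-grid resolves
print(np.max(np.abs(C_star - 0.5 * pi**2)))    # small grid error
print(np.max(np.abs((C_bistar - C)[mask])))    # C** = C since C is closed
```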

• Let us now return to our original problem given by (Primal).

• Following Rockafellar (1968, 1970), let c∗ denote the conjugate of c. Then C∗(π) = c∗(0, π), which follows by noting that

C∗(π) = sup_{q ∈ R^m} [π^T q − C(q)] = sup_{x ∈ R^n, q ∈ R^m} [0^T x + π^T q − c(x, q)] = c∗(0, π).


• Setting p = 0 in C(p) = C∗∗(p), we have that

inf_x c(x, 0) = C(0) = C∗∗(0) = sup_π [0^T π − C∗(π)] = sup_π [−c∗(0, π)].


Excerpt from Veinott (1989), p. 664:

PRIMAL PROGRAM

Consider the primal program of choosing x ∈ R^n that minimizes c(x, p), where c is a +∞- or real-valued convex function on R^{n+m} and p ∈ R^m. Let C be the projection of c, i.e.,

C(q) = inf_x c(x, q)   for q ∈ R^m.

It is well known that C is convex.

DUAL PROGRAM

The dual program is that of choosing (π, µ) ∈ R^{m+1} to maximize the linear function

⟨π, p⟩ − µ     (1)

subject to the linear inequalities

⟨π, q⟩ − µ ≤ C(q)   for all q ∈ R^m,     (2)

where ⟨•, •⟩ is the usual inner product on R^m. The geometric interpretation of the problem, illustrated in Figure 1, is that of choosing, from among all affine functions ⟨π, •⟩ − µ minorizing C(•), one whose value at p is maximum.

[Figure 1: a schematic of an affine function minorizing C and touching it.]


Application of conjugate duality to linear programming

Consider the standard form LP:

min_x   c^T x
subject to   Ax = b,
             x ≥ 0.     (LP)

Consider the parametrized form of this LP:

min_x   c^T x
subject to   Ax = b + q,
             x ≥ 0.     (LP(q))

Note that when q ≡ 0, we recover the original form.


We define the function c(x, q) of our primal problem as follows:

c(x, q) :=  c^T x   if Ax = b + q, x ≥ 0;
            +∞      otherwise.

Then the original LP corresponds to the primal problem

inf_x c(x, 0).


By the conjugate duality framework, the dual function is C∗(π) = c∗(0, π), where

C∗(π) = sup_{x, q} [π^T q − c(x, q)]
      = sup_{x ≥ 0, q : Ax = b + q} [π^T q − c^T x]
      = sup_{x ≥ 0} [π^T (Ax − b) − c^T x]
      = sup_{x ≥ 0} [(A^T π − c)^T x] − π^T b.

Next we note that

sup_{x ≥ 0} [(A^T π − c)^T x] − π^T b =  −π^T b   if A^T π ≤ c;
                                         +∞       otherwise.


But this is the dual problem

min_π   −b^T π
subject to   A^T π ≤ c.     (2)

This problem is equivalent to the following:

max_π   b^T π
subject to   A^T π ≤ c.     (Dual)
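As a sanity check on this derivation, the following sketch (with made-up random data; both problems are feasible by construction, so strong LP duality applies) verifies that (LP) and (Dual) attain the same value:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
m, n = 3, 6
A = rng.standard_normal((m, n))
b = A @ rng.random(n)              # Ax = b has a nonnegative solution
y = rng.standard_normal(m)
c = A.T @ y + rng.random(n)        # A^T y <= c, so the dual is feasible

# (LP):   min c^T x   subject to Ax = b, x >= 0
primal = linprog(c, A_eq=A, b_eq=b, bounds=[(0, None)] * n)

# (Dual): max b^T pi  subject to A^T pi <= c, pi free (solved as min -b^T pi)
dual = linprog(-b, A_ub=A.T, b_ub=c, bounds=[(None, None)] * m)

print(primal.fun, -dual.fun)       # equal up to solver tolerance
```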


Return to convex two-stage problems

• Consider a convex two-stage stochastic program of the form

(SP)   minimize_x   f(x) := E[f(x, ω)]   subject to x ∈ X,

where f(x, ω) is the optimal value of the second-stage problem

SecStage(ω)   minimize_{y ∈ Y}   q(y, ω)
              subject to g_i(y, ω) + χ_i(ω) ≤ 0,   i = 1, …, m,

where χ_i(ω) = t_i(x, ω).

• We assume throughout this section that q(y, ω), g_i(y, ω), and t_i(x, ω)


are real-valued convex functions† for a.e. ω, and X and Y are convex sets.

• Suppose ψ(y, χ, ω) is defined as

ψ(y, χ, ω) := q̄(y, ω) + 1l_{R^m_−}(G(y, ω) + χ),

where

q̄(y, ω) := q(y, ω) + 1l_Y(y),
G(y, ω) := (g_1(y, ω); …; g_m(y, ω)),

and 1l_{R^m_−}(•) is the indicator function of the nonpositive orthant, i.e.,

1l_{R^m_−}(z) =  0    if z ≤ 0;
                 +∞   otherwise.

†Recall that real-valued convex functions are continuous (in fact they are locally Lipschitz continuous).


• In the remainder of this discussion, we suppress ω and refer to ψ(y, χ, ω) by ψ(y, χ); we similarly write q̄(y) and G(y).

• We may now compute the conjugate function of ψ(y, χ):

ψ∗(y∗, χ∗) = sup_{(y, χ) ∈ R^m × R^m} [(y∗)^T y + (χ∗)^T χ − q̄(y) − 1l_{R^m_−}(G(y) + χ)]

= sup_{(y, χ) ∈ R^m × R^m} [(y∗)^T y − q̄(y) − (χ∗)^T G(y) + (χ∗)^T (G(y) + χ) − 1l_{R^m_−}(G(y) + χ)]

= sup_{y ∈ R^m} [(y∗)^T y − q̄(y) − (χ∗)^T G(y)] + sup_{χ ∈ R^m} [(χ∗)^T (G(y) + χ) − 1l_{R^m_−}(G(y) + χ)],

where the second equality adds and subtracts (χ∗)^T G(y).


• Substituting z = G(y) + χ, we have that

sup_{χ ∈ R^m} [(χ∗)^T (G(y) + χ) − 1l_{R^m_−}(G(y) + χ)] = sup_{z ∈ R^m} [(χ∗)^T z − 1l_{R^m_−}(z)] = sup_{z ∈ R^m_−} (χ∗)^T z

=  0    if χ∗ ≥ 0;
   +∞   otherwise

= 1l_{R^m_+}(χ∗),   where 1l_{R^m_+}(u) = 0 if u ≥ 0, and +∞ otherwise.


• Consequently, we obtain

ψ∗(y∗, χ∗) = sup_{y ∈ R^m} [(y∗)^T y − L(y, χ∗)] + 1l_{R^m_+}(χ∗),

where L(y, χ∗) := q̄(y) + Σ_{i=1}^m χ∗_i g_i(y, ω).

• Since θ(χ) = inf_y ψ(y, χ), the projection argument used for (Primal) yields

θ∗(χ∗) = ψ∗(0, χ∗) = sup_{y ∈ R^m} [−L(y, χ∗)] + 1l_{R^m_+}(χ∗) = −inf_{y ∈ R^m} L(y, χ∗) + 1l_{R^m_+}(χ∗).


As a result, the dual of the second-stage problem is given by

θ∗∗(χ) = max_{λ ∈ R^m} [λ^T χ − θ∗(λ)]

       = max_{λ ∈ R^m} [λ^T χ + inf_{y ∈ R^m} L(y, λ) − 1l_{R^m_+}(λ)]

       = max_{λ ∈ R^m_+} [λ^T χ + inf_{y ∈ R^m} L(y, λ)].
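A hedged numeric illustration (our toy instance, not the lecture's): take a single-constraint second stage min_y y² subject to −y + χ ≤ 0, so that q̄(y) = y², g(y) = −y, θ(χ) = max(χ, 0)², and inf_y L(y, λ) = λχ − λ²/4. Maximizing over λ ≥ 0 recovers θ:

```python
import numpy as np

def theta(chi):
    # primal second-stage value: min_y y^2 s.t. y >= chi
    return max(chi, 0.0) ** 2

def theta_bistar(chi):
    # dual value: max_{lam >= 0} [lam*chi + inf_y (y^2 - lam*y)]
    #           = max_{lam >= 0} [lam*chi - lam^2/4], on a crude grid
    lam = np.linspace(0.0, 50.0, 5001)
    return np.max(lam * chi - lam**2 / 4.0)

for chi in [-1.5, 0.0, 0.7, 3.0]:
    print(chi, theta(chi), theta_bistar(chi))   # values agree: no duality gap
```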

• By the Fenchel-Moreau theorem, we have that either θ∗∗(•) ≡ −∞ or

θ∗∗(y) = lsc(conv θ)(y),   ∀ y ∈ R^m.

• Recall that a function f is lower semicontinuous (lsc) at x_0 if

lim inf_{x → x_0} f(x) ≥ f(x_0).


A function f is said to be lsc if it is lsc at every x_0 ∈ R^n, and lsc f is the largest lower semicontinuous function that is less than or equal to f.
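A quick example (ours, not the slides'): a function can fail to be lsc at a single point, and the lsc hull repairs exactly that point:

```latex
f(x) = \begin{cases} 1, & x = 0,\\ 0, & x \neq 0, \end{cases}
\qquad
\liminf_{x \to 0} f(x) = 0 < 1 = f(0),
```

so f is not lsc at 0, while (lsc f)(x) ≡ 0 is the largest lsc function that is less than or equal to f.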

• Consequently, θ∗∗(y) ≤ θ(y) for every y ∈ R^m; when θ∗∗(y) = θ(y), there is said to be no duality gap.

• Consider a setting where ψ(x, y) is convex in (x, y), and let θ(y) := inf_x ψ(x, y). It is relatively straightforward to ascertain that θ(y) is convex and conv θ(•) = θ(•).

• Furthermore, (**) is said to be subconsistent if, for the given value of y, lsc θ(y) < +∞ (here (**) denotes the parametric primal problem min_x ψ(x, y), and (***) below denotes its conjugate dual).

• Note that if (**) is feasible, i.e., dom ψ(•, y) ≠ ∅, then θ(y) < +∞ and (**) is subconsistent.

Proposition 1 Suppose that ψ(•, •) is convex. Then the following

holds:


1. The optimal value function θ(•) is convex;

2. If (**) is subconsistent, then θ∗∗(y) = θ(y) if and only if θ(•) is

lsc at y;

3. If θ∗∗(y) is finite, then the set of optimal solutions of the dual

problem (***) coincides with ∂θ∗∗(y);

4. The set of optimal solutions of (***) is nonempty and bounded if

and only if θ(y) is finite and θ(•) is continuous at y.

Remark: Some quick observations:

(2.) follows from the Fenchel-Moreau theorem;

(3.) follows from

∂f∗∗(x) = arg max_{z ∈ R^n} [z^T x − f∗(z)].


For (4.): if θ(•) is continuous at y, then it is lsc at y and θ∗∗(y) = θ(y). Moreover, it follows that ∂θ∗∗(y) = ∂θ(y), which is nonempty and bounded when θ(y) is finite. Since the hypothesis of (3.) then holds (θ∗∗(y) is finite), the set of optimal solutions of (***) is bounded and nonempty.

Proposition 2 Let χ and ω ∈ Ω be specified, and suppose that the second-stage problem is convex. Then the following hold:

• The functions θ(•, ω) and f(•, ω) are convex.

• There is no duality gap between the primal and dual problems, and the dual problem has a nonempty set of optimal solutions, if and only if the optimal value function θ(•, ω) is subdifferentiable at χ.

• Suppose that the optimal value of (SLP) is finite. Then there is no duality gap between the primal and dual problems, and the dual problem has a nonempty and bounded solution set, if and only if χ ∈ int(dom θ(•, ω)).


Nonanticipativity

Consider a two-stage stochastic program in which Ω = {ω_1, …, ω_K} with probabilities p_1, …, p_K:

min_x   Σ_{k=1}^K p_k F(x, ω_k)   subject to x ∈ X.

Consider a relaxation of the first-stage problem in which x is replaced by K vectors x_1, …, x_K, one for each scenario. The resulting problem is given by

min_{x_1, …, x_K}   Σ_{k=1}^K p_k F(x_k, ω_k)   subject to x_k ∈ X,  k = 1, …, K.


This is a set of K separable problems, with the kth problem given by

min_{x_k}   F(x_k, ω_k)   subject to x_k ∈ X.

More specifically, in the context of two-stage linear programming, this problem is given by

min_{x_k ≥ 0, y_k ≥ 0}   c^T x_k + q_k^T y_k
subject to   A x_k = b,
             T_k x_k + W_k y_k = h_k.     (3)

However, this formulation is not suitable for modeling a two-stage process, because it allows the first-stage variable x_k to depend on the realization ω_k, whereas the first-stage decision must be made before the realization is observed.
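A toy numeric illustration of why the relaxation is too optimistic (our example, with F(x, ω) = (x − ω)² and X = R): letting x_k anticipate ω_k drives the cost to zero, whereas a genuine first stage must commit to one x:

```python
import numpy as np

p = np.array([0.2, 0.5, 0.3])        # scenario probabilities
w = np.array([1.0, 2.0, 4.0])        # scenario realizations

relaxed = 0.0                        # x_k = w_k is optimal in each scenario
x_bar = p @ w                        # the best single decision is E[w]
true_value = p @ (x_bar - w) ** 2    # equals Var(w) = 1.24

print(relaxed, "<=", true_value)     # the relaxation is only a lower bound
```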


This can be resolved by adding an additional constraint:

(x_1, …, x_K) ∈ L,

where

L := {x = (x_1, …, x_K) : x_1 = x_2 = … = x_K}.

This is a linear subspace of the nK-dimensional space X = R^n × … × R^n. Decisions lying in this set are NOT dependent on the realization of the random data.

This constraint is referred to as the nonanticipativity constraint, and together with it the two-stage problem can be posed as follows:

min_{x_1, …, x_K}   Σ_{k=1}^K p_k F(x_k, ω_k)   subject to x_1 = … = x_K,  x_k ∈ X,  k = 1, …, K.


Another approach to specifying the nonanticipativity constraints is as follows:

x_1 = x_2,
x_2 = x_3,
⋮
x_{K−1} = x_K.

Alternatively, suppose the nonanticipativity constraints are represented as

x_k = Σ_{i=1}^K p_i x_i,   k = 1, …, K.

Such a representation has particular relevance when contending with general, rather than finite, distributions.


Consider the space X equipped with the scalar product

⟨x, y⟩ := Σ_{i=1}^K p_i x_i^T y_i.

Furthermore, suppose the linear operator P is defined as

Px := (Σ_{i=1}^K p_i x_i; …; Σ_{i=1}^K p_i x_i).

Consequently, the constraints

x_k = Σ_{i=1}^K p_i x_i,   k = 1, …, K,


can be compactly represented as x = Px. In fact, P is an orthogonal projection operator on X, in that

P(Px) = Px.

Furthermore, we have that

⟨Px, y⟩ = (Σ_{i=1}^K p_i x_i)^T (Σ_{i=1}^K p_i y_i) = ⟨x, Py⟩,

i.e., P is self-adjoint with respect to ⟨•, •⟩.
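A quick numeric sketch (ours) confirming both properties of P with respect to the weighted inner product:

```python
import numpy as np

rng = np.random.default_rng(1)
K, n = 4, 3
p = rng.random(K); p /= p.sum()          # scenario probabilities
x = rng.standard_normal((K, n))          # x = (x_1, ..., x_K)
y = rng.standard_normal((K, n))

def P(u):
    u_bar = p @ u                        # sum_i p_i u_i
    return np.tile(u_bar, (K, 1))        # (u_bar; ...; u_bar)

def inner(u, v):                         # <u, v> = sum_i p_i u_i^T v_i
    return sum(p[i] * u[i] @ v[i] for i in range(K))

print(np.allclose(P(P(x)), P(x)))                  # idempotent: P(Px) = Px
print(np.isclose(inner(P(x), y), inner(x, P(y))))  # self-adjoint
```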


Dualization of nonanticipativity constraints

Suppose we assign Lagrange multipliers λ_1, …, λ_K to the nonanticipativity constraints

x_k = Σ_{i=1}^K p_i x_i,   k = 1, …, K.

We may then define the Lagrangian function L(x, λ) as follows:

L(x, λ) := Σ_{k=1}^K p_k F(x_k, ω_k) + Σ_{k=1}^K p_k λ_k^T (x_k − Σ_{i=1}^K p_i x_i).

Since P is an orthogonal projection, it follows that I − P is also an orthogonal projection.


This is a consequence of noting that

(I − P)(I − P)x = (I − P)x − Px + P(Px)
               = (I − P)x − Px + Px
               = (I − P)x.

It follows that

Σ_{k=1}^K p_k λ_k^T (x_k − Σ_{i=1}^K p_i x_i) = ⟨λ, (I − P)x⟩ = ⟨(I − P)λ, x⟩.

As a consequence, the Lagrangian function can be rewritten as follows:

L(x, λ) := Σ_{k=1}^K p_k F(x_k, ω_k) + Σ_{k=1}^K p_k (λ_k − Σ_{i=1}^K p_i λ_i)^T x_k.
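The rewriting can be checked numerically; the sketch below (random made-up data) confirms that the two forms of the nonanticipativity term coincide:

```python
import numpy as np

rng = np.random.default_rng(2)
K, n = 5, 4
p = rng.random(K); p /= p.sum()
x = rng.standard_normal((K, n))
lam = rng.standard_normal((K, n))

x_bar = p @ x                # sum_i p_i x_i
lam_bar = p @ lam            # sum_i p_i lam_i

lhs = sum(p[k] * lam[k] @ (x[k] - x_bar) for k in range(K))
rhs = sum(p[k] * (lam[k] - lam_bar) @ x[k] for k in range(K))
print(np.isclose(lhs, rhs))  # True: <lam, (I-P)x> = <(I-P)lam, x>
```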


Duality

Consider the optimization problem

min_{x_1, …, x_K, z}   Σ_{k=1}^K p_k F(x_k, ω_k)   subject to x_k = z,  x_k ∈ X,  k = 1, …, K.

By absorbing the constraints x_k ∈ X into the objective with an indicator function, we can write this problem as follows:

min_{x_1, …, x_K, z}   Σ_{k=1}^K p_k F̄(x_k, ω_k)   subject to x_k = z,  k = 1, …, K,

where F̄(x, ω) := F(x, ω) + 1l_X(x).


Then the Lagrangian function is given by

L(x_1, …, x_K, z, λ_1, …, λ_K) := Σ_{k=1}^K p_k F̄(x_k, ω_k) + Σ_{k=1}^K p_k λ_k^T (x_k − z).

The resulting min-max problem is given by

min_{x_1, …, x_K, z}  sup_{λ_1, …, λ_K}  L(x_1, …, x_K, z, λ_1, …, λ_K)

= min_{x_1, …, x_K, z}  sup_{λ_1, …, λ_K}  [Σ_{k=1}^K p_k F̄(x_k, ω_k) + Σ_{k=1}^K p_k λ_k^T x_k − (Σ_{k=1}^K p_k λ_k)^T z],

and interchanging the min and the sup yields the dual problem

sup_{λ_1, …, λ_K}  min_{x_1, …, x_K, z}  [Σ_{k=1}^K p_k F̄(x_k, ω_k) + Σ_{k=1}^K p_k λ_k^T x_k − (Σ_{k=1}^K p_k λ_k)^T z].


However, the infimum of the Lagrangian over z is −∞ unless Σ_{k=1}^K p_k λ_k = 0. It follows that the dual problem can then be stated as follows:

max_{λ_1, …, λ_K}   D(λ)   subject to Σ_{k=1}^K p_k λ_k = 0,     (4)

where D(λ) := min_{x_1, …, x_K} Σ_{k=1}^K p_k [F̄(x_k, ω_k) + λ_k^T x_k]; the variable z drops out once Σ_{k=1}^K p_k λ_k = 0.

From the separable structure of the problem, we have that

L(x, λ) = Σ_{k=1}^K p_k L_k(x_k, λ_k),   where L_k(x_k, λ_k) := F̄(x_k, ω_k) + λ_k^T x_k.


Furthermore, we have that

D(λ) = Σ_{k=1}^K p_k D_k(λ_k),   where D_k(λ_k) := inf_{x_k} L_k(x_k, λ_k).
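Continuing the earlier toy example (ours): with F(x, ω) = (x − ω)² and X = R, each scenario dual has the closed form D_k(λ) = min_x [(x − ω_k)² + λx] = λω_k − λ²/4, and the multipliers λ_k = 2(ω_k − E[ω]) are dual-feasible and close the gap:

```python
import numpy as np

p = np.array([0.2, 0.5, 0.3])
w = np.array([1.0, 2.0, 4.0])
w_bar = p @ w

lam = 2.0 * (w - w_bar)          # candidate multipliers
assert abs(p @ lam) < 1e-12      # feasible for the dual constraint in (4)

D_k = lam * w - lam**2 / 4.0     # closed-form scenario duals D_k(lam_k)
D = p @ D_k                      # D(lam) = sum_k p_k D_k(lam_k)

primal = p @ (w_bar - w) ** 2    # optimal nonanticipative value, Var(w)
print(D, primal)                 # both 1.24: no duality gap here
```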

• Suppose the problem is linear and the primal and dual problems are

feasible. Then there is no duality gap.


Duality for general distributions

• Consider the optimization problem given by the following:

min_{x ∈ R^n}   E[F̄(x, ω)],

where

F̄(x, ω) := F(x, ω) + 1l_X(x).

• Let 𝒳 be a linear space of measurable mappings from Ω to R^n, defined as 𝒳 := L_p(Ω, F, P; R^n) for some p ∈ [1, +∞]. Consequently, for every x ∈ 𝒳, the expectation E[F̄(x(ω), ω)] is well-defined.

• We may thus articulate the expected-value problem as follows:

min_{x ∈ L_x}   E[F̄(x(ω), ω)],


where

L_x := {x ∈ 𝒳 : x(ω) ≡ x for some x ∈ R^n},

and x(ω) ≡ x means that x(ω) = x for a.e. ω ∈ Ω.

• Consider the dual space of 𝒳, denoted by 𝒳∗ and defined as 𝒳∗ := L_q(Ω, F, P; R^n), where 1/p + 1/q = 1. Note that, by convention, q = 1 if p = ∞, and p = 1 if q = ∞.

• We may now define the scalar (bilinear) product given by

⟨λ, x⟩ := E[λ^T x] = ∫_Ω λ(ω)^T x(ω) dP(ω),   λ ∈ 𝒳∗,  x ∈ 𝒳.

• Further, consider the projection operator P : 𝒳 → L_x defined as

[Px](ω) ≡ E[x].

By the definition of L_x, we have L_x = {x ∈ 𝒳 : Px = x}.


• Recall that the inner product is defined as ⟨λ, Px⟩ = E[λ^T Px]. But Px ≡ E[x]. Consequently, we have that

E[λ^T Px] = E[λ]^T E[x] = ⟨λ, Px⟩ = ⟨P∗λ, x⟩,

where P∗ is the projection operator from 𝒳∗ onto the subspace formed by a.e.-constant maps, i.e., P∗λ ≡ E[λ].‡

• It follows that

L(x, λ) := E[F̄(x(ω), ω)] + E[λ^T (x − E[x])].

‡Note that if p = 2, then 𝒳∗ = 𝒳 and P∗ = P.


• It can be observed that the second term can be rewritten as follows:

E[λ^T (x − E[x])] = ⟨λ, x − Px⟩ = ⟨λ, x⟩ − ⟨λ, Px⟩ = ⟨λ, x⟩ − ⟨P∗λ, x⟩ = ⟨λ − P∗λ, x⟩.

• Important observation:

λ + u − P∗(λ + u) = λ + u − (P∗λ + u) = λ − P∗λ,

where u is a constant map. Note that P∗u = u, since P∗ is an operator that projects onto the space of (a.e.) constant maps.

• Consequently, λ − P∗λ does not change when a constant map is added to λ(•). It follows that we can subtract the constant P∗λ from λ.


• It follows that we can set P∗λ = 0, i.e., E[λ] = 0. As a result, the Lagrangian function is defined as follows:

L(x, λ) := E[F̄(x(ω), ω) + λ(ω)^T x(ω)],   for E[λ] = 0.

• We may now articulate the dual problem:

max_{λ ∈ 𝒳∗}   D(λ) := inf_{x ∈ 𝒳} L(x, λ)   subject to E[λ] = 0.

• By the interchangeability principle§, we have that the following holds:

inf_{x ∈ 𝒳} E[F̄(x(ω), ω) + λ(ω)^T x(ω)] = E[inf_{x ∈ R^n} (F̄(x, ω) + λ(ω)^T x)].

§See Theorem 7.80: basically, this provides conditions under which E[inf_x f(x, ω)] = inf_{χ ∈ M} E[F_χ], where F_χ(ω) := f(χ(ω), ω).
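For a finite Ω, the interchangeability principle can be checked directly; here is a brute-force sketch on a discretized decision set (our own illustration):

```python
import numpy as np
from itertools import product

p = np.array([0.3, 0.7])                 # probabilities of the two scenarios
W = np.array([-1.0, 1.0])                # realizations of omega
X = np.linspace(-2.0, 2.0, 41)           # discretized decision set
f = lambda x, w: (x - w) ** 2

# E[min_x f(x, omega)]: minimize inside the expectation, scenario by scenario
lhs = sum(pk * min(f(x, wk) for x in X) for pk, wk in zip(p, W))

# min over mappings omega -> x(omega) of E[f(x(omega), omega)]
rhs = min(sum(pk * f(xk, wk) for pk, xk, wk in zip(p, xs, W))
          for xs in product(X, repeat=len(W)))

print(lhs, rhs)                          # equal (both 0.0 here)
```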


• Consequently, D(λ) can be expressed as

D(λ) = E[D_ω(λ(ω))],   where D_ω : R^n → R

is defined (writing F̄_ω(•) := F̄(•, ω)) as

D_ω(λ) := inf_{x ∈ R^n} (λ^T x + F̄_ω(x)) = −sup_{x ∈ R^n} (−λ^T x − F̄_ω(x)) = −F̄∗_ω(−λ).
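As a concrete instance (our running toy example, not the lecture's): for F̄_ω(x) = (x − ω)² on R, the conjugate is F̄∗_ω(s) = sω + s²/4, so

```latex
D_\omega(\lambda) = -\bar{F}^{\,*}_\omega(-\lambda)
                  = -\Bigl(-\lambda\,\omega + \tfrac{\lambda^2}{4}\Bigr)
                  = \lambda\,\omega - \tfrac{\lambda^2}{4},
```

which matches the scenario dual D_k used in the finite-scenario check earlier.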

• As a result, the dual function can be computed by solving the problem defining D_ω(λ(ω)) for every ω and taking the expectation of the optimal values.

• From general theory, we have that the dual optimal value is less than or equal to that of the primal problem. Furthermore, there is no duality gap between these problems, and both the primal and dual have optimal


solutions x̄ and λ̄, if and only if (x̄, λ̄) is a saddle point of the Lagrangian function, i.e.,

x̄ ∈ arg min_{x ∈ L_x} L(x, λ̄)   and   λ̄ ∈ arg max_{λ : E[λ] = 0} L(x̄, λ).

• By the interchangeability principle, we have that

x̄(ω) ≡ x̄,   with x̄ ∈ arg min_{x ∈ R^n} [F̄(x, ω) + λ̄(ω)^T x]   for a.e. ω ∈ Ω.

Since x̄(ω) = x̄ a.e., it follows that E[λ̄] = 0, a consequence of the earlier result.

• Suppose we now impose a convexity requirement (and closedness requirement) on X, as well as a convexity assumption on F(•, ω) for a.e.


ω ∈ Ω. Consequently, F̄_ω is also convex for a.e. ω ∈ Ω. Then, by the interchangeability principle, we have that

λ̄ ∈ arg max_λ L(x̄, λ)   if and only if   λ̄(ω) ∈ −∂F̄_ω(x̄) for a.e. ω ∈ Ω.

To ensure feasibility with respect to E[λ] = 0, taking expectations on both sides, we require

0 ∈ E[−∂F̄_ω(x̄)].

However, under suitable regularity conditions, we can interchange E[•] and ∂[•]; furthermore, 0 ∈ K if and only if 0 ∈ −K for any set K. It follows that

0 ∈ ∂(E[F̄_ω])(x̄).

Theorem 3 Suppose that the function F(x, ω) is random lower semi-


continuous, the set X is convex and closed, and for a.e. ω ∈ Ω, the

function F(•, ω) is convex. Suppose (P) and (D) are given by the

following:

min_{x ∈ R^n}   E[F̄(x, ω)]     (P)

max_{λ ∈ 𝒳∗}   D(λ) := inf_{x ∈ 𝒳} L(x, λ)   subject to E[λ] = 0.     (D)

Then there is no duality gap between (P) and (D), and both problems have an optimal solution, if and only if there exists an x̄ ∈ R^n satisfying

0 ∈ ∂(E[F̄_ω])(x̄).

In such a case, x̄ is a solution of (P), and any measurable selection λ̄(ω) ∈ −∂F̄_ω(x̄) such that E[λ̄] = 0 is an optimal solution of (D).
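To make the optimality condition concrete, here is a hedged worked instance with our running toy example, F(x, ω) = (x − ω)² and X = R (so that F̄ = F):

```latex
\partial \bar F_\omega(\bar x) = \{\,2(\bar x - \omega)\,\},\qquad
0 \in \partial\bigl(\mathbb{E}[\bar F_\omega]\bigr)(\bar x)
      = \{\,2(\bar x - \mathbb{E}[\omega])\,\}
\;\Longrightarrow\; \bar x = \mathbb{E}[\omega],
```

and the measurable selection λ̄(ω) = −2(x̄ − ω) = 2(ω − E[ω]) satisfies E[λ̄] = 0, recovering the multipliers λ_k = 2(ω_k − ω̄) from the finite-scenario example.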
