Lecture 5 - Pennsylvania State University · Uday V. Shanbhag
Lecture 5
Two-stage stochastic convex programs
February 4, 2015
Uday V. Shanbhag Lecture 5
Convex two-stage problems
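• Consider a convex two-stage stochastic program of the form
$$\text{(SP)} \qquad \min_{x} \; f(x) \triangleq \mathbb{E}[f(x,\omega)] \quad \text{subject to } x \in X,$$
where $f(x,\omega)$ is the optimal value of the second-stage problem
$$\text{(SecStage}(\omega)\text{)} \qquad \min_{y \in Y} \; q(y,\omega) \quad \text{subject to } g_i(y,\omega) + \chi_i \le 0, \; i = 1,\dots,m,$$
and $\chi_i = t_i(x,\omega)$.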
• We assume throughout this section that $q(y,\omega)$, $g_i(y,\omega)$, and $t_i(x,\omega)$ are real-valued convex functions∗ for a.e. $\omega$, and both $X$ and $Y$ are convex sets.
• The second-stage constraints can be absorbed into the objective function by defining a suitable indicator function. The resulting second-stage problem is given by the following:
$$\text{(SecStage}(\omega)\text{)} \qquad \min_{y \in \mathbb{R}^m} \; \bar q(y,\chi,\omega),$$
where $\bar q(y,\chi,\omega) = q(y,\chi,\omega) + \mathbb{1}_Y(y)$.
• We refer to the optimal value of the second-stage problem by $\theta(\chi,\omega)$.
∗Recall that real-valued convex functions are continuous (in fact they are locally Lipschitz continuous).
Conjugate duality: An introduction
• We now provide a brief aside regarding conjugate duality (Fenchel, 1951) based on Veinott (1989).
• Consider the primal problem that requires choosing an $x \in \mathbb{R}^n$ that minimizes $c(x,p)$, where $c$ is an extended real-valued convex function on $\mathbb{R}^{m+n}$ and $p \in \mathbb{R}^m$. Let $C(q)$ be defined as follows for $q \in \mathbb{R}^m$:
$$C(q) \triangleq \inf_{x} c(x,q). \tag{Primal}$$
It is well known that $C$ is convex.
• The dual program requires choosing a $(\pi,\mu) \in \mathbb{R}^{m+1}$ such that the linear function $\pi^T p - \mu$ is maximized subject to the inequalities
$$\pi^T q - \mu \le C(q), \quad \forall q \in \mathbb{R}^m.$$
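• More specifically, for a given $p$, we have the following problem:
$$\max_{\pi,\mu} \; \pi^T p - \mu \quad \text{subject to } \pi^T q - \mu \le C(q), \quad \forall q \in \mathbb{R}^m. \tag{1}$$
• Geometric insight: among all affine functions $q \mapsto \pi^T q - \mu$ that minorize $C$, choose one whose value at $p$, namely $\pi^T p - \mu$, is largest.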
• In fact, C is closed at p if and only if the primal infimum is equal to the
dual supremum (the main duality relationship).
• Now consider Rockafellar's conjugate duality relationship:
• For each fixed $\pi$, one can maximize $\pi^T p - \mu$ subject to $\pi^T q - \mu \le C(q)$ by putting $\mu = C^*(\pi)$, where
$$C^*(\pi) = \sup_{q} \left[ q^T \pi - C(q) \right].$$
• The dual program then reduces to the following:
$$C^{**}(p) = \sup_{\pi \in \mathbb{R}^m} \left[ p^T \pi - C^*(\pi) \right].$$
In effect, the supremum is given by the conjugate of $C^*$. Furthermore, $C(p) = C^{**}(p)$ if and only if $C$ is closed at $p$.
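The relationship $C = C^{**}$ for a closed convex function can be checked numerically. The following is a minimal sketch, assuming the particular choice $C(q) = q^2/2$ (which is its own conjugate) and a finite grid in place of $\mathbb{R}$; the function name `conjugate` and the grid are illustrative choices, not part of the lecture.

```python
def conjugate(values, grid):
    """Discrete conjugate on a grid: f*(s) = max over q in grid of [s*q - f(q)]."""
    return [max(s * q - fq for q, fq in zip(grid, values)) for s in grid]

# Symmetric grid on [-2, 2] with step 0.01 (an arbitrary choice for this sketch).
grid = [i / 100.0 for i in range(-200, 201)]

# C(q) = q^2 / 2 is closed and convex, and it coincides with its conjugate.
C = [0.5 * q * q for q in grid]
C_star = conjugate(C, grid)
C_star_star = conjugate(C_star, grid)

# Largest deviation between C and its biconjugate on the inner region [-1, 1].
inner = [i for i, q in enumerate(grid) if abs(q) <= 1.0]
max_gap = max(abs(C[i] - C_star_star[i]) for i in inner)
print(max_gap)
```

On this grid the maximizer of $s q - q^2/2$ is $q = s$, which is itself a grid point, so the discrete conjugate reproduces $s^2/2$ up to rounding.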
• Let us now return to our original problem given by (Primal).
• Following Rockafellar (1968, 1970), let $c^*$ denote the conjugate of $c$. Then $C^*(\pi) = c^*(0,\pi)$, since
$$C^*(\pi) = \sup_{q \in \mathbb{R}^m} \left[ \pi^T q - C(q) \right] = \sup_{x \in \mathbb{R}^n,\, q \in \mathbb{R}^m} \left[ 0^T x + \pi^T q - c(x,q) \right] = c^*(0,\pi).$$
• It follows from $C(p) = C^{**}(p)$, by putting $p = 0$, that
$$\inf_{x} c(x,0) = C(0) = C^{**}(0) = \sup_{\pi} \left[ 0^T \pi - C^*(\pi) \right] = \sup_{\pi} \, -c^*(0,\pi).$$
Excerpt from Arthur F. Veinott, Jr., p. 664:
PRIMAL PROGRAM
Consider the primal program of choosing $x \in \mathbb{R}^n$ that minimizes $c(x,p)$, where $c$ is a $+\infty$ or real-valued convex function on $\mathbb{R}^{n+m}$ and $p \in \mathbb{R}^m$. Let $C$ be the projection of $c$, i.e.,
$$C(q) = \inf_{x} c(x,q) \quad \text{for } q \in \mathbb{R}^m.$$
It is well known that $C$ is convex.
DUAL PROGRAM
The dual program is that of choosing $(\pi,\mu) \in \mathbb{R}^{m+1}$ to maximize the linear function
$$\langle \pi, p \rangle - \mu \tag{1}$$
subject to the linear inequalities
$$\langle \pi, q \rangle - \mu \le C(q) \quad \text{for all } q \in \mathbb{R}^m, \tag{2}$$
where $\langle \cdot, \cdot \rangle$ is the usual inner product on $\mathbb{R}^m$. The geometric interpretation of the problem, illustrated in Figure 1, is that of choosing, from among all affine functions $\langle \pi, \cdot \rangle - \mu$ minorizing $C(\cdot)$, one whose value at $p$ is maximum.
Figure 1: A schematic
Application of conjugate duality to linear programming
Consider the standard form LP:
$$\min_{x} \; c^T x \quad \text{subject to } Ax = b, \; x \ge 0. \tag{LP}$$
Consider the parametrized form of this LP:
$$\min_{x} \; c^T x \quad \text{subject to } Ax = b + q, \; x \ge 0.$$
Note that when $q \equiv 0$, we recover the original form.
We define $c(x,q)$ of our primal problem as follows:
$$c(x,q) \triangleq \begin{cases} c^T x, & \text{if } Ax = b + q, \; x \ge 0, \\ +\infty, & \text{otherwise.} \end{cases}$$
Then the original LP corresponds to the primal problem:
$$\inf_{x} c(x,0).$$
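By the conjugate duality framework, the dual problem is expressed through the conjugate
$$\begin{aligned}
c^*(0,\pi) &= \sup_{x,\,q} \left[ \pi^T q - c(x,q) \right] \\
&= \sup_{x \ge 0,\; q = Ax - b} \left[ \pi^T q - c^T x \right] \\
&= \sup_{x \ge 0} \left[ \pi^T (Ax - b) - c^T x \right] \\
&= \sup_{x \ge 0} \left[ (A^T \pi - c)^T x \right] - \pi^T b.
\end{aligned}$$
Next we note that
$$\sup_{x \ge 0} \left[ (A^T \pi - c)^T x \right] - \pi^T b = \begin{cases} -\pi^T b, & A^T \pi \le c, \\ +\infty, & \text{otherwise.} \end{cases}$$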
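Since the dual program is $\sup_{\pi} -c^*(0,\pi)$, it amounts to minimizing $c^*(0,\pi)$, which is the problem
$$\min_{\pi} \; -b^T \pi \quad \text{subject to } A^T \pi \le c. \tag{2}$$
Changing the sign of the objective, this problem is equivalent to the following:
$$\max_{\pi} \; b^T \pi \quad \text{subject to } A^T \pi \le c. \tag{Dual}$$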
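The primal-dual pair can be illustrated on a tiny instance. The sketch below assumes a hypothetical LP with a single equality constraint $a^T x = b$ with $a > 0$ and $b > 0$, chosen so that both problems can be solved by inspection without a solver; the data are made up for illustration.

```python
# Hypothetical data for a one-constraint standard-form LP:
#   min c^T x  subject to  a^T x = b, x >= 0,   with a > 0 and b > 0.
c = [3.0, 2.0, 4.0]
a = [1.0, 2.0, 1.0]
b = 4.0

# Primal: with a single equality constraint the vertices are x = (b / a_i) e_i,
# so the optimal value is the smallest vertex cost.
vertex_costs = [c_i * b / a_i for c_i, a_i in zip(c, a)]
primal_value = min(vertex_costs)

# Dual: max b*pi subject to a_i * pi <= c_i for all i. Since b > 0 and a > 0,
# the best pi is the largest feasible one, pi = min_i c_i / a_i.
pi_star = min(c_i / a_i for c_i, a_i in zip(c, a))
dual_value = b * pi_star

# Weak duality: every dual-feasible pi gives a lower bound on every vertex cost.
feasible_pis = [pi_star - 0.5 * k for k in range(5)]
weak_ok = all(b * pi <= cost + 1e-12 for pi in feasible_pis for cost in vertex_costs)

print(primal_value, dual_value, weak_ok)  # 4.0 4.0 True
```

Here the primal and dual optimal values coincide, as expected when both problems are feasible.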
Return to convex two-stage problems
• Consider a convex two-stage stochastic program of the form
$$\text{(SP)} \qquad \min_{x} \; f(x) \triangleq \mathbb{E}[f(x,\omega)] \quad \text{subject to } x \in X,$$
where $f(x,\omega)$ is the optimal value of the second-stage problem
$$\text{(SecStage}(\omega)\text{)} \qquad \min_{y \in Y} \; q(y,\omega) \quad \text{subject to } g_i(y,\omega) + \chi_i(\omega) \le 0, \; i = 1,\dots,m,$$
where $\chi_i(\omega) = t_i(x,\omega)$.
• We assume throughout this section that $q(y,\omega)$, $g_i(y,\omega)$, and $t_i(x,\omega)$ are real-valued convex functions† for a.e. $\omega$, and that $X$ and $Y$ are convex sets.
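• Suppose $\psi(y,\chi,\omega)$ is defined as
$$\psi(y,\chi,\omega) \triangleq \bar q(y,\omega) + \mathbb{1}_{\mathbb{R}^n_-}\big(G(y,\omega) + \chi(x,\omega)\big),$$
where
$$\bar q(y,\omega) \triangleq q(y,\omega) + \mathbb{1}_Y(y),$$
$G(y,\omega) \triangleq (g_1(y,\omega); \dots; g_m(y,\omega))$, and $\mathbb{1}_{\mathbb{R}^n_-}(\cdot)$ is the indicator function of the negative orthant, or
$$\mathbb{1}_{\mathbb{R}^n_-}(z) = \begin{cases} 0, & z \le 0, \\ +\infty, & \text{otherwise.} \end{cases}$$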
†Recall that real-valued convex functions are continuous (in fact they are locally Lipschitz continuous).
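• In the remainder of this discussion, we suppress $\omega$ and refer to $\psi(y,\chi,\omega)$ by $\psi(y,\chi)$.
• We may now compute the conjugate function of $\psi(y,\chi)$:
$$\begin{aligned}
\psi^*(y^*,\chi^*) &= \sup_{(y,\chi) \in \mathbb{R}^m \times \mathbb{R}^n} \Big\{ (\chi^*)^T \chi + (y^*)^T y - \bar q(y,\omega) - \mathbb{1}_{\mathbb{R}^n_-}(G(y,\omega) + \chi) \Big\} \\
&= \sup_{(y,\chi) \in \mathbb{R}^m \times \mathbb{R}^n} \Big\{ (\chi^*)^T (G(y) + \chi) - (\chi^*)^T G(y) + (y^*)^T y - \bar q(y,\omega) - \mathbb{1}_{\mathbb{R}^n_-}(G(y,\omega) + \chi) \Big\} \\
&= \sup_{y \in \mathbb{R}^m} \Big\{ (y^*)^T y - \bar q(y,\omega) - (\chi^*)^T G(y) + \sup_{\chi \in \mathbb{R}^n} \big[ (\chi^*)^T (G(y) + \chi) - \mathbb{1}_{\mathbb{R}^n_-}(G(y,\omega) + \chi) \big] \Big\}.
\end{aligned}$$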
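• Setting $z = G(y) + \chi$, we have that
$$\begin{aligned}
\sup_{\chi \in \mathbb{R}^n} \big[ (\chi^*)^T (G(y) + \chi) - \mathbb{1}_{\mathbb{R}^n_-}(G(y,\omega) + \chi) \big]
&= \sup_{z \in \mathbb{R}^n} \big[ (\chi^*)^T z - \mathbb{1}_{\mathbb{R}^n_-}(z) \big] \\
&= \sup_{z \in \mathbb{R}^n_-} \big[ (\chi^*)^T z \big] \\
&= \begin{cases} 0, & \chi^* \ge 0, \\ +\infty, & \text{otherwise} \end{cases} \\
&= \mathbb{1}_{\mathbb{R}^n_+}(\chi^*),
\end{aligned}$$
where
$$\mathbb{1}_{\mathbb{R}^m_+}(u) = \begin{cases} 0, & u \ge 0, \\ +\infty, & \text{otherwise.} \end{cases}$$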
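• Consequently, we obtain
$$\psi^*(y^*,\chi^*,\omega) = \sup_{y \in \mathbb{R}^m} \big\{ (y^*)^T y - L(y,\chi^*) \big\} + \mathbb{1}_{\mathbb{R}^m_+}(\chi^*),$$
where $L(y,\chi^*) \triangleq \bar q(y) + \sum_{i=1}^m \chi^*_i g_i(y,\omega)$.
• Recall that
$$\theta^*(\chi^*) = \psi^*(0,\chi^*) = \sup_{y \in \mathbb{R}^m} \big\{ -L(y,\chi^*) \big\} + \mathbb{1}_{\mathbb{R}^m_+}(\chi^*) = -\inf_{y \in \mathbb{R}^m} L(y,\chi^*) + \mathbb{1}_{\mathbb{R}^m_+}(\chi^*).$$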
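As a result, the dual of the second-stage problem is given by
$$\begin{aligned}
\theta^{**}(\chi) &= \max_{\lambda \in \mathbb{R}^m} \big\{ \lambda^T \chi - \theta^*(\lambda) \big\} \\
&= \max_{\lambda \in \mathbb{R}^m} \Big\{ \lambda^T \chi + \inf_{y \in \mathbb{R}^m} L(y,\lambda) - \mathbb{1}_{\mathbb{R}^m_+}(\lambda) \Big\} \\
&= \max_{\lambda \in \mathbb{R}^m_+} \Big\{ \lambda^T \chi + \inf_{y \in \mathbb{R}^m} L(y,\lambda) \Big\}.
\end{aligned}$$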
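The dual formula above can be checked on a one-dimensional instance. The sketch below assumes a hypothetical second-stage problem with $q(y) = y^2$, $Y = \mathbb{R}$, and a single constraint $g(y) = -y$, so that the constraint $g(y) + \chi \le 0$ reads $y \ge \chi$; the grid over multipliers is likewise an illustrative choice.

```python
def theta(chi):
    """Primal value: min y^2 subject to -y + chi <= 0, i.e. y >= chi."""
    y = max(chi, 0.0)
    return y * y

def dual_objective(lam, chi):
    """lam*chi + inf_y L(y, lam), where L(y, lam) = y^2 - lam*y has infimum -lam^2/4."""
    return lam * chi - 0.25 * lam * lam

def theta_biconjugate(chi, lam_grid):
    """Maximize the dual objective over a grid of multipliers lam >= 0."""
    return max(dual_objective(lam, chi) for lam in lam_grid)

lam_grid = [k / 1000.0 for k in range(0, 10001)]  # lam in [0, 10], step 0.001
for chi in (-1.0, 0.0, 0.5, 2.0):
    print(chi, theta(chi), theta_biconjugate(chi, lam_grid))
```

In this instance the maximizing multiplier is $\lambda = \max(2\chi, 0)$, and the primal and dual values agree for every $\chi$ tested, consistent with $\theta$ being convex and continuous.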
• By the Fenchel-Moreau theorem, we have that either $\theta^{**}(\cdot) \equiv -\infty$ or
$$\theta^{**}(y) = \mathrm{lsc}(\mathrm{conv}\,\theta)(y), \quad \forall y \in \mathbb{R}^m.$$
• Recall that a function $f$ is lower semicontinuous (lsc) at $x_0$ if
$$\liminf_{x \to x_0} f(x) \ge f(x_0).$$
A function $f$ is said to be lsc if it is lsc at every $x_0 \in \mathbb{R}^n$, and $\mathrm{lsc}\, f$ is the largest lower semicontinuous function that is less than or equal to $f$.
• Consequently, $\theta^{**}(y) \le \theta(y)$ for any $y \in \mathbb{R}^m$, and there is said to be no duality gap if $\theta^{**}(y) = \theta(y)$.
• Consider a setting where $\psi(x,y)$ is convex over $(x,y) \in \mathbb{R}^n \times \mathbb{R}^n$. It is relatively straightforward to ascertain that $\theta(y)$ is convex and $\mathrm{conv}\,\theta(\cdot) = \theta(\cdot)$.
• Furthermore, (**) is said to be subconsistent if, for a given value of $y$, $\mathrm{lsc}\,\theta(y) < +\infty$.
• Note that if (**) is feasible, i.e. $\mathrm{dom}\,\psi(\cdot,y) \neq \emptyset$, then $\theta(y) < +\infty$ and (**) is subconsistent.
Proposition 1 Suppose that $\psi(\cdot,\cdot)$ is convex. Then the following holds:
1. The optimal value function $\theta(\cdot)$ is convex;
2. If (**) is subconsistent, then $\theta^{**}(y) = \theta(y)$ if and only if $\theta(\cdot)$ is lsc at $y$;
3. If $\theta^{**}(y)$ is finite, then the set of optimal solutions of the dual problem (***) coincides with $\partial\theta^{**}(y)$;
4. The set of optimal solutions of (***) is nonempty and bounded if and only if $\theta(y)$ is finite and $\theta(\cdot)$ is continuous at $y$.
Remark: Some quick observations:
(2.) follows from the Fenchel-Moreau theorem;
(3.) follows from
$$\partial f^{**}(x) = \arg\max_{z \in \mathbb{R}^n} \big\{ z^T x - f^*(z) \big\}.$$
For (4.), if $\theta(\cdot)$ is continuous at $y$ then it is lsc at $y$ and $\theta^{**}(y) = \theta(y)$. Moreover, it follows that $\partial\theta^{**}(y) = \partial\theta(y)$, and this set is nonempty and bounded if $\theta(y)$ is finite. Since the hypothesis of (3.) ($\theta^{**}(y)$ is finite) then holds, the set of optimal solutions of (***) is bounded and nonempty.
Proposition 2 Let $\chi$ and $\omega \in \Omega$ be specified. Suppose that the second-stage problem is convex. Then the following holds:
• The functions $\theta(\cdot,\omega)$ and $f(\cdot,\omega)$ are convex.
• There is no duality gap between the primal and the dual problems and the dual problem has a nonempty set of optimal solutions if and only if the optimal value function $\theta(\cdot,\omega)$ is subdifferentiable at $\chi$.
• Suppose that the optimal value of (SLP) is finite. Then there is no duality gap between the primal and dual problems and the dual problem has a nonempty and bounded solution set if and only if
$$\chi(\cdot,\omega) \in \mathrm{int}\,(\mathrm{dom}\,\theta(\cdot,\omega)).$$
Nonanticipativity
Consider a two-stage stochastic program in which $\Omega = \{\omega_1,\dots,\omega_K\}$ with probabilities $p_1,\dots,p_K$:
$$\min_{x} \; \sum_{k=1}^K p_k F(x,\omega_k) \quad \text{subject to } x \in X.$$
Consider a relaxation of the first-stage problem in which $x$ is replaced by $K$ vectors $x_1,\dots,x_K$, one for each scenario. The resulting problem is given by
$$\min_{x_1,\dots,x_K} \; \sum_{k=1}^K p_k F(x_k,\omega_k) \quad \text{subject to } x_k \in X, \; k = 1,\dots,K.$$
This is a set of $K$ separable problems, with the $k$th problem given by
$$\min_{x_k} \; F(x_k,\omega_k) \quad \text{subject to } x_k \in X.$$
More specifically, in the context of two-stage linear programming, this problem is given by
$$\min_{x_k \ge 0,\, y_k \ge 0} \; c^T x_k + q_k^T y_k \quad \text{subject to } A x_k = b, \quad T_k x_k + W_k y_k = h_k. \tag{3}$$
However, this formulation is not suitable for modeling a two-stage process because the first-stage variable $x_k$ is allowed to depend on the realization $\omega_k$, which is not yet known when the first-stage decision is made.
This can be resolved by adding an additional constraint:
$$(x_1,\dots,x_K) \in \mathcal{L},$$
where
$$\mathcal{L} \triangleq \{ \mathbf{x} = (x_1,\dots,x_K) : x_1 = x_2 = \dots = x_K \}.$$
This is a linear subspace of the $nK$-dimensional space $\mathcal{X} = \mathbb{R}^n \times \dots \times \mathbb{R}^n$. Decisions lying in this set do NOT depend on the realization of the random data.
This constraint is referred to as the nonanticipativity constraint. Together with this constraint, the two-stage problem can be posed as follows:
$$\min_{x_1,\dots,x_K} \; \sum_{k=1}^K p_k F(x_k,\omega_k) \quad \text{subject to } x_1 = \dots = x_K, \; x_k \in X, \; k = 1,\dots,K.$$
Another approach for specifying the nonanticipativity constraints is as follows:
$$x_1 = x_2, \quad x_2 = x_3, \quad \dots, \quad x_{K-1} = x_K.$$
Suppose the nonanticipativity constraints are represented as
$$x_k = \sum_{i=1}^K p_i x_i, \quad k = 1,\dots,K.$$
Such a representation has particular relevance when contending with general, rather than finite, distributions.
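Consider the space $\mathcal{X}$ equipped with the scalar product
$$\langle \mathbf{x}, \mathbf{y} \rangle := \sum_{i=1}^K p_i x_i^T y_i.$$
Furthermore, suppose the linear operator $\mathbf{P}$ is defined as
$$\mathbf{P}\mathbf{x} := \begin{pmatrix} \sum_{i=1}^K p_i x_i \\ \vdots \\ \sum_{i=1}^K p_i x_i \end{pmatrix}.$$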
Consequently, we have that
$$x_k = \sum_{i=1}^K p_i x_i, \quad k = 1,\dots,K,$$
can be compactly represented as $\mathbf{x} = \mathbf{P}\mathbf{x}$. In fact, $\mathbf{P}$ is an orthogonal projection operator on $\mathcal{X}$ in that
$$\mathbf{P}(\mathbf{P}\mathbf{x}) = \mathbf{P}\mathbf{x}.$$
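Furthermore, we have that
$$\langle \mathbf{P}\mathbf{x}, \mathbf{y} \rangle = \Big( \sum_{i=1}^K p_i x_i \Big)^T \Big( \sum_{i=1}^K p_i y_i \Big) = \langle \mathbf{x}, \mathbf{P}\mathbf{y} \rangle.$$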
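The two properties of the averaging operator (idempotence and self-adjointness under the probability-weighted scalar product) can be verified numerically. The following is a small sketch with made-up data for $K = 3$ scenarios in $\mathbb{R}^2$; the helper names `inner` and `project` are illustrative.

```python
def inner(p, x, y):
    """Weighted scalar product <x, y> = sum_i p_i * x_i^T y_i."""
    return sum(p_i * sum(a * b for a, b in zip(x_i, y_i))
               for p_i, x_i, y_i in zip(p, x, y))

def project(p, x):
    """P x: replace every scenario copy x_k by the probability-weighted average."""
    n = len(x[0])
    mean = [sum(p_i * x_i[j] for p_i, x_i in zip(p, x)) for j in range(n)]
    return [list(mean) for _ in x]

p = [0.5, 0.25, 0.25]                      # scenario probabilities (hypothetical)
x = [[1.0, 2.0], [3.0, -1.0], [0.0, 4.0]]  # K = 3 copies of a vector in R^2
y = [[2.0, 0.0], [-1.0, 1.0], [5.0, 3.0]]

Px = project(p, x)
PPx = project(p, Px)

# Idempotence: P(Px) = Px.
idempotent_gap = max(abs(a - b) for u, v in zip(PPx, Px) for a, b in zip(u, v))
# Self-adjointness: <Px, y> = <x, Py>.
self_adjoint_gap = abs(inner(p, Px, y) - inner(p, x, project(p, y)))

print(Px[0], idempotent_gap, self_adjoint_gap)
```

Both gaps are zero up to rounding, which is what makes $\mathbf{P}$ an orthogonal projection with respect to this particular scalar product (it would not be self-adjoint under the unweighted one unless all $p_i$ are equal).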
Dualization of nonanticipativity constraints
Suppose we assign Lagrange multipliers $\lambda_1,\dots,\lambda_K$ to the nonanticipativity constraints:
$$x_k = \sum_{i=1}^K p_i x_i, \quad k = 1,\dots,K.$$
We may then define the Lagrangian function $L(\mathbf{x},\lambda)$ as follows:
$$L(\mathbf{x},\lambda) := \sum_{k=1}^K p_k F(x_k,\omega_k) + \sum_{k=1}^K p_k \lambda_k^T \Big( x_k - \sum_{i=1}^K p_i x_i \Big).$$
Since $\mathbf{P}$ is an orthogonal projection, it follows that $\mathbf{I} - \mathbf{P}$ is also an orthogonal projection.
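This is a consequence of noting that
$$\begin{aligned}
(\mathbf{I} - \mathbf{P})(\mathbf{I} - \mathbf{P})\mathbf{x} &= (\mathbf{I} - \mathbf{P})\mathbf{x} - \mathbf{P}\mathbf{x} + \mathbf{P}(\mathbf{P}\mathbf{x}) \\
&= (\mathbf{I} - \mathbf{P})\mathbf{x} - \mathbf{P}\mathbf{x} + \mathbf{P}\mathbf{x} \\
&= (\mathbf{I} - \mathbf{P})\mathbf{x}.
\end{aligned}$$
It follows that
$$\sum_{k=1}^K p_k \lambda_k^T \Big( x_k - \sum_{i=1}^K p_i x_i \Big) = \langle \lambda, (\mathbf{I} - \mathbf{P})\mathbf{x} \rangle = \langle (\mathbf{I} - \mathbf{P})\lambda, \mathbf{x} \rangle.$$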
As a consequence, the Lagrangian function can be rewritten as follows:
$$L(\mathbf{x},\lambda) := \sum_{k=1}^K p_k F(x_k,\omega_k) + \sum_{k=1}^K p_k \Big( \lambda_k - \sum_{i=1}^K p_i \lambda_i \Big)^T x_k.$$
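The rewriting of the multiplier term can also be seen termwise, without appealing to the projection argument. In the short worked step below, $\bar x = \sum_i p_i x_i$ and $\bar\lambda = \sum_i p_i \lambda_i$ are shorthand introduced here only for illustration.

```latex
\begin{align*}
\sum_{k=1}^K p_k \lambda_k^T (x_k - \bar x)
  &= \sum_{k=1}^K p_k \lambda_k^T x_k - \Big(\sum_{k=1}^K p_k \lambda_k\Big)^T \bar x \\
  &= \sum_{k=1}^K p_k \lambda_k^T x_k - \bar\lambda^T \sum_{k=1}^K p_k x_k \\
  &= \sum_{k=1}^K p_k (\lambda_k - \bar\lambda)^T x_k .
\end{align*}
```

In particular, only the centered multipliers $\lambda_k - \bar\lambda$ enter the Lagrangian.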
Duality
Consider the optimization problem
$$\min_{x_1,\dots,x_K,\,z} \; \sum_{k=1}^K p_k F(x_k,\omega_k) \quad \text{subject to } x_k = z, \; x_k \in X, \; k = 1,\dots,K.$$
By using an indicator function, we can write this problem as follows:
$$\min_{x_1,\dots,x_K,\,z} \; \sum_{k=1}^K p_k F_k(x_k,\omega_k) \quad \text{subject to } x_k = z, \; k = 1,\dots,K,$$
where $F_k(x_k,\omega_k) \triangleq F(x_k,\omega_k) + \mathbb{1}_X(x_k)$ absorbs the constraint $x_k \in X$.
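Then the Lagrangian function is given by
$$L(x_1,\dots,x_K,z,\lambda_1,\dots,\lambda_K) := \sum_{k=1}^K p_k F_k(x_k,\omega_k) + \sum_{k=1}^K p_k \lambda_k^T (x_k - z).$$
The resulting min-max problem is given by
$$\begin{aligned}
& \min_{x_1,\dots,x_K,\,z} \; \sup_{\lambda_1,\dots,\lambda_K} \; L(x_1,\dots,x_K,z,\lambda_1,\dots,\lambda_K) \\
&= \min_{x_1,\dots,x_K,\,z} \; \sup_{\lambda_1,\dots,\lambda_K} \Big\{ \sum_{k=1}^K p_k F_k(x_k,\omega_k) + \sum_{k=1}^K p_k \lambda_k^T x_k - \Big( \sum_{k=1}^K p_k \lambda_k \Big)^T z \Big\} \\
&= \sup_{\lambda_1,\dots,\lambda_K} \; \min_{x_1,\dots,x_K,\,z} \Big\{ \sum_{k=1}^K p_k F_k(x_k,\omega_k) + \sum_{k=1}^K p_k \lambda_k^T x_k - \Big( \sum_{k=1}^K p_k \lambda_k \Big)^T z \Big\}.
\end{aligned}$$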
However, the infimum of the Lagrangian over $z$ is $-\infty$ unless $\sum_{k=1}^K p_k \lambda_k = 0$. It follows that this problem can then be stated as follows:
$$\max_{\lambda_1,\dots,\lambda_K} \; D(\lambda) \quad \text{subject to } \sum_{k=1}^K p_k \lambda_k = 0, \tag{4}$$
where
$$D(\lambda) \triangleq \min_{x_1,\dots,x_K,\,z} \; \sum_{k=1}^K p_k F_k(x_k,\omega_k) + \sum_{k=1}^K p_k \lambda_k^T x_k.$$
From the separable structure of the problem, we have that
$$L(\mathbf{x},\lambda) = \sum_{k=1}^K p_k L_k(x_k,\lambda_k), \quad \text{where } L_k(x_k,\lambda_k) = F_k(x_k,\omega_k) + \lambda_k^T x_k.$$
Furthermore, we have that $D(\lambda) = \sum_{k=1}^K p_k D_k(\lambda_k)$, where $D_k(\lambda_k) = \inf_{x_k} L_k(x_k,\lambda_k)$.
• Suppose the problem is linear and the primal and dual problems are feasible. Then there is no duality gap.
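The scenario decomposition $D(\lambda) = \sum_k p_k D_k(\lambda_k)$ can be illustrated on a toy instance. The sketch below is not the linear case mentioned above: it assumes a hypothetical quadratic scenario cost $F(x,\omega) = \tfrac12 (x - \omega)^2$ with $X = \mathbb{R}$, chosen because each scenario subproblem then has a closed-form minimizer. The data are made up.

```python
p = [0.25, 0.25, 0.5]          # scenario probabilities (hypothetical)
omega = [1.0, 3.0, 6.0]        # scenario data (hypothetical)

def F(x, w):
    """Assumed scenario cost F(x, w) = (x - w)^2 / 2 with X = R."""
    return 0.5 * (x - w) ** 2

def D_k(lam, w):
    """Scenario dual function inf_x [F(x, w) + lam * x], attained at x = w - lam."""
    x = w - lam
    return F(x, w) + lam * x

# Primal: the expected cost is minimized at the mean of omega.
x_star = sum(pk * wk for pk, wk in zip(p, omega))
primal_value = sum(pk * F(x_star, wk) for pk, wk in zip(p, omega))

# Dual candidate: lam_k = omega_k - x_star satisfies sum_k p_k lam_k = 0.
lam = [wk - x_star for wk in omega]
constraint = sum(pk * lk for pk, lk in zip(p, lam))
dual_value = sum(pk * D_k(lk, wk) for pk, lk, wk in zip(p, lam, omega))

# Each scenario subproblem then returns the same minimizer x_star.
scenario_minimizers = [wk - lk for wk, lk in zip(omega, lam)]
print(primal_value, dual_value, constraint, scenario_minimizers)
```

Because the dual value at this feasible $\lambda$ equals the primal optimal value, weak duality implies that $\lambda$ is dual optimal, and the scenario minimizers agree, so nonanticipativity is recovered from the decomposed problems.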
Duality for general distributions
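• Consider the optimization problem given by the following:
$$\min_{x \in \mathbb{R}^n} \; \mathbb{E}[\bar F(x,\omega)],$$
where
$$\bar F(x,\omega) = F(x,\omega) + \mathbb{1}_X(x).$$
• Let $\mathcal{X}$ be a linear space of measurable mappings from $\Omega$ to $\mathbb{R}^n$, defined as $L_p(\Omega,\mathcal{F},P;\mathbb{R}^n)$ for $p \in [1,+\infty]$. Consequently, for every $\mathbf{x} \in \mathcal{X}$, the expectation $\mathbb{E}[\bar F(\mathbf{x}(\omega),\omega)]$ is well-defined.
• Consequently, we may articulate the expected value problem as follows:
$$\min_{\mathbf{x} \in \mathcal{L}_x} \; \mathbb{E}[\bar F(\mathbf{x}(\omega),\omega)],$$
where
$$\mathcal{L}_x \triangleq \{ \mathbf{x} \in \mathcal{X} : \mathbf{x}(\omega) \equiv x \text{ for some } x \in \mathbb{R}^n \}$$
and $\mathbf{x}(\omega) \equiv x$ means that $\mathbf{x}(\omega) = x$ for a.e. $\omega \in \Omega$.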
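• Consider the dual space of $\mathcal{X}$, denoted by $\mathcal{X}^*$, and defined as $\mathcal{X}^* := L_q(\Omega,\mathcal{F},P;\mathbb{R}^n)$, where $1/p + 1/q = 1$. Note that by convention, if $p = \infty$ then $q = 1$, and if $q = \infty$ then $p = 1$.
• We may now define the scalar or bilinear product given by
$$\langle \lambda, \mathbf{x} \rangle = \mathbb{E}[\lambda^T \mathbf{x}] = \int_\Omega \lambda(\omega)^T \mathbf{x}(\omega)\, dP(\omega), \quad \lambda \in \mathcal{X}^*, \; \mathbf{x} \in \mathcal{X}.$$
• Further, consider the projection operator $\mathbf{P} : \mathcal{X} \to \mathcal{L}_x$ defined as $[\mathbf{P}\mathbf{x}](\omega) \equiv \mathbb{E}[\mathbf{x}]$. By the definition of $\mathcal{L}_x$, we have $\mathcal{L}_x = \{ \mathbf{x} \in \mathcal{X} : \mathbf{P}\mathbf{x} = \mathbf{x} \}$.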
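• Recall that the scalar product is defined as
$$\langle \lambda, \mathbf{P}\mathbf{x} \rangle = \mathbb{E}[\lambda^T \mathbf{P}\mathbf{x}].$$
But $\mathbf{P}\mathbf{x} = \mathbb{E}[\mathbf{x}]$. Consequently, we have that
$$\mathbb{E}[\lambda^T \mathbf{P}\mathbf{x}] = \mathbb{E}[\lambda]^T \mathbb{E}[\mathbf{x}] = \langle \lambda, \mathbf{P}\mathbf{x} \rangle = \langle \mathbf{P}^*\lambda, \mathbf{x} \rangle,$$
where $\mathbf{P}^*$ is a projection operator from $\mathcal{X}^*$ onto the subspace formed by a.e. constant maps.‡
• It follows that
$$L(\mathbf{x},\lambda) := \mathbb{E}[\bar F(\mathbf{x}(\omega),\omega)] + \mathbb{E}[\lambda^T (\mathbf{x} - \mathbb{E}[\mathbf{x}])].$$
‡Note that if $p = 2$, then $\mathcal{X}^* = \mathcal{X}$ and $\mathbf{P}^* = \mathbf{P}$.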
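• It can be observed that the second term can be rewritten as follows:
$$\mathbb{E}[\lambda^T (\mathbf{x} - \mathbb{E}[\mathbf{x}])] = \langle \lambda, \mathbf{x} - \mathbf{P}\mathbf{x} \rangle = \langle \lambda, \mathbf{x} \rangle - \langle \lambda, \mathbf{P}\mathbf{x} \rangle = \langle \lambda, \mathbf{x} \rangle - \langle \mathbf{P}^*\lambda, \mathbf{x} \rangle = \langle \lambda - \mathbf{P}^*\lambda, \mathbf{x} \rangle.$$
• Important observation:
$$\lambda + u - \mathbf{P}^*(\lambda + u) = \lambda + u - (\mathbf{P}^*\lambda + u) = \lambda - \mathbf{P}^*\lambda,$$
where $u$ is a constant map. Note that $\mathbf{P}^* u = u$ since $\mathbf{P}^*$ is an operator that projects onto the space of constant maps (a.e.).
• Consequently, $\lambda - \mathbf{P}^*\lambda$ does not change when a constant is added to $\lambda(\cdot)$. It follows that we can subtract the constant $\mathbf{P}^*\lambda$ from $\lambda$.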
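• It follows that we can set $\mathbf{P}^*\lambda = 0$, i.e. $\mathbb{E}[\lambda] = 0$. As a result, the Lagrangian function is defined as follows:
$$L(\mathbf{x},\lambda) := \mathbb{E}[\bar F(\mathbf{x}(\omega),\omega) + \lambda(\omega)^T \mathbf{x}(\omega)], \quad \text{for } \mathbb{E}[\lambda] = 0.$$
• We may now articulate the dual problem:
$$\max_{\lambda \in \mathcal{X}^*} \; D(\lambda) := \inf_{\mathbf{x} \in \mathcal{X}} L(\mathbf{x},\lambda) \quad \text{subject to } \mathbb{E}[\lambda] = 0.$$
• By the interchangeability principle§, we have that the following holds:
$$\inf_{\mathbf{x} \in \mathcal{X}} \mathbb{E}[\bar F(\mathbf{x}(\omega),\omega) + \lambda(\omega)^T \mathbf{x}(\omega)] = \mathbb{E}\Big[ \inf_{x \in \mathbb{R}^n} \big( \bar F(x,\omega) + \lambda(\omega)^T x \big) \Big].$$
§See Theorem 7.80: basically this provides conditions under which
$$\mathbb{E}\Big[ \inf_{x} f(x,\omega) \Big] = \inf_{\chi \in \mathcal{M}} \mathbb{E}[F_\chi], \quad \text{where } F_\chi(\omega) = f(\chi(\omega),\omega).$$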
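• Consequently, $D(\lambda)$ can be expressed as
$$D(\lambda) = \mathbb{E}[D_\omega(\lambda(\omega))],$$
where $D_\omega : \mathbb{R}^n \to \mathbb{R}$ is defined as
$$D_\omega(\lambda) := \inf_{x \in \mathbb{R}^n} \big( \lambda^T x + \bar F_\omega(x) \big) = -\sup_{x \in \mathbb{R}^n} \big( -\lambda^T x - \bar F_\omega(x) \big) = -\bar F_\omega^*(-\lambda).$$
• As a result, the dual function can be computed by solving $D_\omega$ for every $\omega$ and taking the expectation over the optimal values.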
• From general theory, we have that the dual optimal value is less than or equal to that of the primal problem. Furthermore, there is no duality gap between these problems and both the primal and dual have optimal solutions $\bar{\mathbf{x}}$ and $\bar\lambda$ if and only if $(\bar{\mathbf{x}},\bar\lambda)$ is a saddle point of the Lagrangian function, or
$$\bar{\mathbf{x}} \in \arg\min_{\mathbf{x} \in \mathcal{L}_x} L(\mathbf{x},\bar\lambda) \quad \text{and} \quad \bar\lambda \in \arg\max_{\lambda : \mathbb{E}[\lambda] = 0} L(\bar{\mathbf{x}},\lambda).$$
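• By the interchangeability principle, we have that
$$\bar{\mathbf{x}}(\omega) \equiv \bar x \quad \text{and} \quad \bar x \in \arg\min_{x \in \mathbb{R}^n} \big\{ \bar F(x,\omega) + \bar\lambda(\omega)^T x \big\}, \quad \text{a.e. } \omega \in \Omega.$$
Since $\bar{\mathbf{x}}(\omega) = \bar x$ a.e., the requirement $\mathbb{E}[\bar\lambda] = 0$ is a consequence of the earlier result.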
• Suppose we now impose a convexity requirement (and closedness requirement) on $X$ as well as a convexity assumption on $F_\omega(\cdot)$ for a.e. $\omega \in \Omega$. Consequently, $\bar F_\omega$ is also convex for a.e. $\omega \in \Omega$. Then the minimization condition above holds if and only if
$$\bar\lambda(\omega) \in -\partial \bar F_\omega(\bar x) \quad \text{for a.e. } \omega \in \Omega.$$
To ensure feasibility with respect to $\mathbb{E}[\lambda] = 0$, taking expectations on both sides we have the following:
$$0 \in \mathbb{E}[-\partial \bar F_\omega(\bar x)].$$
Since $0 \in K$ if and only if $0 \in -K$, this is the same as $0 \in \mathbb{E}[\partial \bar F_\omega(\bar x)]$. Furthermore, under suitable regularity conditions, we can interchange $\mathbb{E}[\cdot]$ and $\partial[\cdot]$. It follows that
$$0 \in \partial\, \mathbb{E}[\bar F_\omega(\bar x)].$$
Theorem 3 Suppose that the function $\bar F(x,\omega)$ is random lower semicontinuous, the set $X$ is convex and closed, and for a.e. $\omega \in \Omega$, the function $F(\cdot,\omega)$ is convex. Suppose (P) and (D) are given by the following:
$$\min_{x \in \mathbb{R}^n} \; \mathbb{E}[\bar F(x,\omega)] \tag{P}$$
$$\max_{\lambda \in \mathcal{X}^*} \; D(\lambda) := \inf_{\mathbf{x} \in \mathcal{X}} L(\mathbf{x},\lambda) \quad \text{subject to } \mathbb{E}[\lambda] = 0. \tag{D}$$
Then there is no duality gap between (P) and (D) and both problems have an optimal solution if and only if there exists an $\bar x \in \mathbb{R}^n$ satisfying
$$0 \in \mathbb{E}[\partial \bar F_\omega(\bar x)].$$
In such a case, $\bar x$ is a solution of (P), and a measurable selection
$$\bar\lambda(\omega) \in -\partial \bar F_\omega(\bar x)$$
such that $\mathbb{E}[\bar\lambda] = 0$ is an optimal solution of (D).
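As a sanity check of the optimality condition, consider a simple instance that is an assumption made here for illustration and not taken from the lecture: $n = 1$, $X = \mathbb{R}$, and $F(x,\omega) = \tfrac12 (x - \omega)^2$, with $\omega$ having finite variance. The steps can be worked out as follows.

```latex
\begin{align*}
\partial \bar F_\omega(x) &= \{\, x - \omega \,\}, \\
0 \in \mathbb{E}[\partial \bar F_\omega(\bar x)] &\iff \bar x = \mathbb{E}[\omega], \\
\bar\lambda(\omega) &= -(\bar x - \omega) = \omega - \mathbb{E}[\omega],
  \qquad \mathbb{E}[\bar\lambda] = 0, \\
\mathbb{E}[\bar F(\bar x,\omega)] &= \tfrac12 \operatorname{Var}(\omega), \\
D(\bar\lambda) = \mathbb{E}\Big[\inf_{x}\big\{\tfrac12 (x-\omega)^2 + \bar\lambda(\omega)\,x\big\}\Big]
  &= \mathbb{E}\big[\bar\lambda(\omega)\,\omega - \tfrac12 \bar\lambda(\omega)^2\big]
   = \tfrac12 \operatorname{Var}(\omega).
\end{align*}
```

The primal and dual values coincide, and the multiplier $\bar\lambda(\omega)$ is the mean-zero selection from $-\partial \bar F_\omega(\bar x)$ described in the theorem.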