Page 1

Optimization with multivariate stochastic dominance constraints

Darinka Dentcheva & Eli Wolfhagen

Stevens Institute of Technology, Hoboken, New Jersey, USA

This research is supported by NSF

ICSP 2013, Universita degli studi di Bergamo, July 8, 2013

Page 2

Stochastic Orders - Introduction

Let (Ω,F,P) denote an abstract probability space.

Usual stochastic order

X ∈ L1(Ω,F,P) dominates Y ∈ L1(Ω,F,P) in the first order (or in the usual stochastic order), denoted X ≽(1) Y, if

P[X ≤ η] ≤ P[Y ≤ η], ∀η ∈ R.

For any X ∈ L1(Ω,F,P), define the expected shortfall function

F(2)(X; η) = ∫_{−∞}^{η} P[X ≤ t] dt = E[(η − X)+].

Stochastic dominance of the second order

X ∈ L1(Ω,F,P) dominates Y ∈ L1(Ω,F,P) in the second order, denoted X ≽(2) Y, if

F(2)(X; η) ≤ F(2)(Y; η), ∀η ∈ R.

Second-order stochastic dominance is particularly popular in industry since it models risk-averse preferences.
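For discrete distributions both quantities above reduce to finite averages, and it is a standard fact that the second-order comparison only needs to be checked at the realizations of the benchmark. A minimal numpy sketch (the function names are ours, not from the talk):

```python
import numpy as np

def shortfall(eta, z, p):
    """F_(2)(Z; eta) = E[(eta - Z)_+] for realizations z with probabilities p."""
    return p @ np.maximum(eta - z, 0.0)

def dominates_ssd(x, px, y, py, tol=1e-12):
    """Check X >=_(2) Y for discrete X, Y; for a discrete benchmark Y it
    suffices to compare the shortfall functions at the realizations of Y."""
    return all(shortfall(eta, x, px) <= shortfall(eta, y, py) + tol for eta in y)
```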

Page 3

Multivariate Stochastic Dominance - Motivation

In many applications, it is necessary to keep track of multiple random outcomes. Several examples in the existing literature include applications to:

- Budget Allocation - a budget needs to be formulated in such a way that multiple performance measures dominate some benchmark outcomes.
- Cancer Treatment - various tissue areas (e.g., diseased and healthy tissues) have separate treatment requirements.
- Finance - an investor desires a portfolio where the return rate and various additional measures of risk dominate a given benchmark portfolio.

Extensions of scalar stochastic orders to multivariate random variables are needed to handle these multi-objective problems.

Page 4

Multivariate Stochastic Dominance - Definition

There are several competing ideas about how best to extend stochastic dominance to multivariate random vectors X and Y. We shall focus on the idea of introducing a scalarization family S and using univariate orders to compare the scalarizations cᵀX and cᵀY for c ∈ S.

Linear multivariate stochastic dominance

X ∈ L^m_1(Ω,F,P) dominates Y ∈ L^m_1(Ω,F,P) linearly in the second order, denoted X ≽lin(2) Y, if

cᵀX ≽(2) cᵀY for all c ≥ 0 with ∑_{i=1}^m ci = 1.

We denote the set of all scalarizations as S = { c ≥ 0 : ∑_{i=1}^m ci = 1 }.

These scalarizations serve as a ranking of the importance of each of the random outcomes. Thus, if X ≽lin(2) Y, then under any ranking of the importance of the outcomes, any risk-averse decision maker will prefer X over Y.
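Verifying X ≽lin(2) Y exactly requires ranging over all of S, which is the hard subproblem addressed later in the talk. As a quick plausibility check one can sample scalarizations from S; a sketch reusing shortfall and dominates_ssd from the earlier snippet (a pass is only evidence, while a fail is a certificate of non-dominance):

```python
import numpy as np

def dominates_lin2_sampled(X, px, Y, py, n_dirs=1000, seed=0):
    """Heuristic check of X >=lin_(2) Y for discrete samples X: (S,m), Y: (M,m),
    testing c^T X >=_(2) c^T Y on n_dirs scalarizations drawn from the simplex S."""
    rng = np.random.default_rng(seed)
    for c in rng.dirichlet(np.ones(X.shape[1]), size=n_dirs):
        if not dominates_ssd(X @ c, px, Y @ c, py):
            return False
    return True
```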

Page 5

Optimization Assumptions

We want to include the concept of linear multivariate stochastic dominance as a constraint for accepting decisions. For this problem we assume that:

- There are n decisions x = (x1, …, xn) ∈ R^n.
- Our objective function f : R^n → R is convex.
- The mapping G : R^n → L^m_1(Ω,F,P) comprises random outcomes Gi(x), i = 1, …, m, which are the performance measures of our decisions.
- The mapping x ↦ [G(x)](ω) is concave for P-almost all ω ∈ Ω.
- The set X ⊆ R^n is closed and convex, representing deterministic constraints on our decisions.
- Y ∈ L^m_1(Ω,F,P) is an m-dimensional random vector serving as our benchmark.

Page 6

Optimization with Multivariate Stochastic Dominance Constraints

Specifically, we analyze the following optimization problem

minimize f(x)    (1)

subject to E[(η − cᵀG(x))+] ≤ E[(η − cᵀY)+] for all c ∈ S and η ∈ [a,b],    (2)

x ∈ X.    (3)

In (2), we assume that a and b are chosen so that [a,b] contains the support of all scalarizations cᵀY and cᵀG(x). In this way, (2) is equivalent to the condition that G(x) ≽lin(2) Y.
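On a finite probability space with scenarios ωs of probability ps, each constraint in (2) with (η, c) fixed can be written via auxiliary shortfall variables, which is what makes the master problems below tractable convex programs. A standard reformulation (our notation, not from the slides):

$$\mathbb{E}\big[(\eta - c^\top G(x))_+\big] \le \mathbb{E}\big[(\eta - c^\top Y)_+\big]
\iff
\exists\, w:\;
\sum_s p_s w_s \le \mathbb{E}\big[(\eta - c^\top Y)_+\big],\quad
w_s \ge \eta - c^\top [G(x)](\omega_s),\quad w_s \ge 0.$$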

Page 7

Shortfall-Events Approximation Method (SEAM)

Step 0: Set k = 1. Choose some J1 ⊂ [a,b] × S.

Step 1: Solve the approximate problem:

min f(x)    (4)
s.t. E[(ηj − (cj)ᵀG(x))+] ≤ E[(ηj − (cj)ᵀY)+] for each (ηj, cj) ∈ Jk,
x ∈ X.

Let xk denote the solution of problem (4); set Xk = G(xk).

Step 2: Calculate the quantity

δk = inf { E[(η − cᵀY)+] − E[(η − cᵀXk)+] : c ∈ S, η ∈ [a,b] }.    (5)

If δk ≥ 0, stop; otherwise, continue.

Step 3: Determine (ηk, ck) such that

E[(ηk − (ck)ᵀY)+] − E[(ηk − (ck)ᵀXk)+] ≤ δk/2.    (6)

Step 4: Set Jk+1 = Jk ∪ {(ηk, ck)}. Increase k by one, and go to Step 1.
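A compact sketch of SEAM for a finite probability space, under simplifying assumptions that are ours, not the authors': f(x) = dᵀx, scenario-wise linear outcomes [G(x)](ωs) = R[s]x, X = R^n_+, and a coarse grid search over (η, c) standing in for the exact global subproblem (5):

```python
import numpy as np
from scipy.optimize import linprog

def shortfall(eta, z, p):
    """E[(eta - Z)_+] for a discrete variable with values z and probabilities p."""
    return p @ np.maximum(eta - z, 0.0)

def seam(d, R, Y, p, q, etas, cs, tol=1e-8, max_iter=200):
    """SEAM sketch. R: (S, m, n) scenario matrices with probabilities p;
    Y: (M, m) benchmark realizations with probabilities q. The grids etas, cs
    stand in for the exact (non-convex) subproblem (5)."""
    S, m, n = R.shape
    J = [(etas[0], np.ones(m) / m)]                      # initial event set J_1
    for _ in range(max_iter):
        # Step 1: master problem (4) as an LP in (x, w), using shortfall
        # variables w[j, s] >= eta_j - c_j^T R_s x, w >= 0.
        nJ, nv = len(J), n + len(J) * S
        obj = np.concatenate([d, np.zeros(nJ * S)])
        A, b = [], []
        for j, (eta, c) in enumerate(J):
            for s in range(S):
                row = np.zeros(nv)
                row[:n] = -(c @ R[s])
                row[n + j * S + s] = -1.0
                A.append(row); b.append(-eta)
            row = np.zeros(nv)                           # expected-shortfall bound
            row[n + j * S:n + (j + 1) * S] = p
            A.append(row); b.append(shortfall(eta, Y @ c, q))
        res = linprog(obj, A_ub=np.array(A), b_ub=np.array(b))  # x, w >= 0 default
        assert res.success
        x = res.x[:n]
        Xk = np.einsum('smn,n->sm', R, x)                # realizations of G(x_k)
        # Steps 2-3: grid search in place of the global subproblem (5).
        best, arg = np.inf, None
        for c in cs:
            zY, zX = Y @ c, Xk @ c
            for eta in etas:
                delta = shortfall(eta, zY, q) - shortfall(eta, zX, p)
                if delta < best:
                    best, arg = delta, (eta, c)
        if best >= -tol:
            return x                                     # dominance holds on the grid
        J.append(arg)                                    # Step 4: add violated event
    return x
```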

Page 8

Numerical Method - General Discussion

Assume that the probability space (Ω,F,P) is finite.

Theorem
Assuming that subproblem (5) can be solved, the Shortfall-Events Approximation Method will converge to a solution of problem (1)–(3) in finitely many iterations.

The master problem (4) in this algorithm is a convex optimization problem and can be solved using known methods. More difficult to handle is subproblem (5), which is a non-convex global optimization problem.

Page 9

Dealing with Subproblem (5)

The objective function of subproblem (5),

φ(η, c) = E[(η − cᵀY)+] − E[(η − cᵀXk)+],

is a difference of two convex functions, which motivated us to apply DC-optimization tools. To this end, we subdifferentiate the function

hk(η, c) = E[max(0, η − cᵀXk)]

at each point (ηi, ci) ∈ Jk. For each point (ηi, ci), we define the event

A_k^i = {ω ∈ Ω : (ci)ᵀXk(ω) ≤ ηi},

in terms of which the following vector g_k^i is a subgradient of hk(η, c) at (ηi, ci):

g_k^i = ( P(A_k^i), −E[Xk 1_{A_k^i}] ) ∈ ∂hk(ηi, ci).

Here 1_{A_k^i} stands for the indicator function of the event A_k^i.
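On a finite sample, P(A_k^i) and E[Xk 1_{A_k^i}] are plain averages, and the subgradient inequality for hk can be checked directly. A small sketch with hypothetical data (all names are ours):

```python
import numpy as np

def h(eta, c, Xk, p):
    """h_k(eta, c) = E[(eta - c^T Xk)_+] on realizations Xk: (S, m), probs p."""
    return p @ np.maximum(eta - Xk @ c, 0.0)

def subgradient(eta_i, c_i, Xk, p):
    """g = (P(A), -E[Xk 1_A]) for the event A = {c_i^T Xk <= eta_i}."""
    A = Xk @ c_i <= eta_i
    return np.concatenate([[p[A].sum()], -(p[:, None] * Xk)[A].sum(axis=0)])

# Check the subgradient inequality h(u) >= h(u_i) + g^T (u - u_i), u = (eta, c):
rng = np.random.default_rng(0)
Xk, p = rng.normal(size=(50, 3)), np.full(50, 1 / 50)
eta_i, c_i = 0.2, np.array([0.5, 0.3, 0.2])
g = subgradient(eta_i, c_i, Xk, p)
for _ in range(5):
    eta, c = rng.normal(), rng.dirichlet(np.ones(3))
    step = np.concatenate([[eta - eta_i], c - c_i])
    assert h(eta, c, Xk, p) >= h(eta_i, c_i, Xk, p) + g @ step - 1e-12
```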

Page 10

Dealing with Subproblem (5) (cont.)

We use these subgradients to construct a convex upper bound of φ(·, ·):

φ̄_k^i(η, c) = E[(η − cᵀY)+] − hk(ηi, ci) − (g_k^i)ᵀ((η, c) − (ηi, ci))
            = E[(η − cᵀY)+] − η P(A_k^i) + cᵀE[Xk 1_{A_k^i}].

We can, in fact, tighten the approximation by taking a polyhedral convex minorant of hk(η, c), given by

h̲k(η, c) = max_{i∈Jk} ( η P(A_k^i) − cᵀE[Xk 1_{A_k^i}] ),

and defining

φ̄k(η, c) = E[(η − cᵀY)+] − h̲k(η, c)
          = min_{i∈Jk} { E[(η − cᵀY)+] − η P(A_k^i) + cᵀE[Xk 1_{A_k^i}] }.

We have, for any i ∈ Jk,

φ̄_k^i(η, c) ≥ φ̄k(η, c) ≥ φ(η, c) for all (η, c) ∈ [a,b] × S.
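A sketch of these bounds on a discrete sample, checking φ̄k ≥ φ numerically (all names and data are ours):

```python
import numpy as np

def plane(eta_i, c_i, Xk, p):
    """Coefficients (P(A_k^i), E[Xk 1_{A_k^i}]) of one affine piece of the minorant."""
    A = Xk @ c_i <= eta_i
    return p[A].sum(), (p[:, None] * Xk)[A].sum(axis=0)

def h_low(eta, c, planes):                   # polyhedral minorant of h_k
    return max(pa * eta - c @ ex for pa, ex in planes)

def phi(eta, c, Xk, p, Y, q):                # objective of subproblem (5)
    sf = lambda z, w: w @ np.maximum(eta - z, 0.0)
    return sf(Y @ c, q) - sf(Xk @ c, p)

rng = np.random.default_rng(1)
Xk, Y = rng.normal(size=(40, 3)), rng.normal(size=(60, 3))
p, q = np.full(40, 1 / 40), np.full(60, 1 / 60)
planes = [plane(0.0, np.array([1.0, 0.0, 0.0]), Xk, p),
          plane(0.5, np.ones(3) / 3, Xk, p)]
eta, c = 0.3, rng.dirichlet(np.ones(3))
phi_bar = q @ np.maximum(eta - Y @ c, 0.0) - h_low(eta, c, planes)
assert phi_bar >= phi(eta, c, Xk, p, Y, q) - 1e-12   # upper bound verified
```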

Page 11

Dominance Verification Method for Subproblem (5)

We propose the following algorithm for solving subproblem (5).

Dominance Verification Method (DVM)

Step 2a: Set j = 1 and calculate (ξ1, v1) as the solution of the following optimization problem:

min { φ̄k(η, c) : (η, c) ∈ [a,b] × S }.

Step 2b: Define the set Bj = {ω ∈ Ω : (vj)ᵀXk(ω) ≤ ξj}. Let

h̲j(η, c) = max{ h̲k(η, c), max_{1≤i≤j} ( η P(Bi) − cᵀE[Xk 1_{Bi}] ) }.

Step 2c: Calculate (ξj+1, vj+1) as the solution of the following optimization problem:

min { E[(η − cᵀY)+] − h̲j(η, c) : (η, c) ∈ [a,b] × S }.

If h̲j(ξj+1, vj+1) = hk(ξj+1, vj+1), then set (ηk+1, ck+1) = (ξj+1, vj+1) and stop. Otherwise, increase j by one and go to Step 2b.

Page 12

Dual Methods

Recall the master problem in SEAM:

min f(x)
s.t. E[(ηj − (cj)ᵀG(x))+] ≤ E[(ηj − (cj)ᵀY)+] for each (ηj, cj) ∈ Jk,    (7)
x ∈ X.

If we associate with each inequality constraint in (7) a Lagrange multiplier λj ≥ 0, then we can formulate the approximate Lagrangian at iteration k:

Lk(x, λ) = f(x) + ∑_{j∈Jk} λj ( E[(ηj − (cj)ᵀG(x))+] − E[(ηj − (cj)ᵀY)+] ).

Minimizing this, we arrive at the approximate dual function

ϕ(λ) = min { Lk(x, λ) : x ∈ X }.    (8)
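For a finite probability space and, as an illustration only, linear f and G with X = R^n_+, evaluating ϕ(λ) is itself a single LP, since λ ≥ 0 lets each shortfall term be modeled with epigraph variables (a sketch under those assumptions; the names are ours):

```python
import numpy as np
from scipy.optimize import linprog

def dual_value(lam, d, R, Y, p, q, J):
    """phi(lam) = min_x L_k(x, lam), sketched for f(x) = d^T x,
    [G(x)](omega_s) = R[s] @ x and X = R^n_+. Because lam >= 0, each term
    E[(eta_j - c_j^T G(x))_+] can be modeled by epigraph variables w_{js}."""
    S, m, n = R.shape
    nJ = len(J)
    obj = np.concatenate([d] + [lam[j] * p for j in range(nJ)])
    A, b, const = [], [], 0.0
    for j, (eta, c) in enumerate(J):
        const -= lam[j] * (q @ np.maximum(eta - Y @ c, 0.0))  # -lam_j E[(eta - c^T Y)_+]
        for s in range(S):
            row = np.zeros(n + nJ * S)
            row[:n] = -(c @ R[s])
            row[n + j * S + s] = -1.0                         # w_js >= eta - c^T R_s x
            A.append(row); b.append(-eta)
    res = linprog(obj, A_ub=np.array(A), b_ub=np.array(b))    # x, w >= 0 by default
    return res.fun + const, res.x[:n]                         # (phi(lam), minimizer x_k)
```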

Page 13

Dual Approximation - Bundle Method (DABM)

Step 0: Set k = 1. Choose some λ1 ≥ 0 and J1 ⊂ [a,b] × S.

Step 1: Solve the minimization problem (8) to obtain ϕ(λk) and a minimizer xk. Set Xk = G(xk).

Step 2: Calculate the quantity

δk = inf { E[(η − cᵀY)+] − E[(η − cᵀXk)+] : c ∈ S, η ∈ [a,b] }.    (9)

If δk ≥ 0, go to Step 3; otherwise, add some appropriate new cut (ηk, ck) to Jk and go to Step 3.

Step 3: For each j ∈ Jk, calculate

Δk_j = E[(ηj − (cj)ᵀXk)+] − E[(ηj − (cj)ᵀY)+],

and form the cutting-plane model

ϕk(λ) = min_{1≤ℓ≤k} { ϕ(λℓ) + (Δℓ)ᵀ(λ − λℓ) }.

Step 4: If k = 1 or ϕ(λk) ≥ (1 − γ)ϕ(wk−1) + γϕk−1(λk), then set wk := λk; otherwise set wk := wk−1.

Step 5: Calculate a solution (θk+1, λk+1) to the problem:

max { θ − (ϱ/2)‖λ − wk‖² : λ ≥ 0, ϕ(λℓ) + (Δℓ)ᵀ(λ − λℓ) ≥ θ, ℓ = 1, …, k }.

Step 6: If ϕ(λk) = ϕk(λk+1), then stop. Otherwise, set k ← k + 1 and go to Step 1.

Page 14

Dual Approximation - Trust Region Method (DATRM)

Step 0: Set k = 1. Choose some λ1 ≥ 0 and J1 ⊂ [a,b] × S.

Step 1: Solve the minimization problem (8) to obtain ϕ(λk) and a minimizer xk. Set Xk = G(xk).

Step 2: Calculate the quantity

δk = inf { E[(η − cᵀY)+] − E[(η − cᵀXk)+] : c ∈ S, η ∈ [a,b] }.    (10)

If δk ≥ 0, go to Step 3; otherwise, add some appropriate new cut (ηk, ck) to Jk and go to Step 3.

Step 3: For each j ∈ Jk, calculate

Δk_j = E[(ηj − (cj)ᵀXk)+] − E[(ηj − (cj)ᵀY)+],

and form the cutting-plane model

ϕk(λ) = min_{1≤ℓ≤k} { ϕ(λℓ) + (Δℓ)ᵀ(λ − λℓ) }.

Step 4: If k = 1 or ϕ(λk) ≥ (1 − γ)ϕ(wk−1) + γϕk−1(λk), then set wk := λk; otherwise set wk := wk−1.

Step 5: Calculate a solution (θk+1, λk+1) to the problem:

max { θ : ‖λ − wk‖∞ ≤ ε, λ ≥ 0, ϕ(λℓ) + (Δℓ)ᵀ(λ − λℓ) ≥ θ, ℓ = 1, …, k }.

Step 6: If ϕ(λk) = ϕk(λk+1), then stop. Otherwise, set k ← k + 1 and go to Step 1.
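The master problem in Step 5 of DATRM is a small LP in (θ, λ): the ∞-norm ball becomes box bounds and each cut a linear inequality. A sketch (names ours; assumes at least one cut is stored):

```python
import numpy as np
from scipy.optimize import linprog

def trust_region_step(w, eps, cuts):
    """Step 5 of DATRM as an LP: max theta s.t. ||lam - w||_inf <= eps, lam >= 0,
    phi(lam_l) + Delta_l^T (lam - lam_l) >= theta for every stored cut.
    cuts: list of (lam_l, phi_l, Delta_l)."""
    J = len(w)
    obj = np.concatenate([[-1.0], np.zeros(J)])          # variables (theta, lam)
    A_ub = np.array([np.concatenate([[1.0], -D]) for _, _, D in cuts])
    b_ub = np.array([ph - D @ la for la, ph, D in cuts]) # theta - D@lam <= ph - D@la
    bounds = [(None, None)] + [(max(0.0, wi - eps), wi + eps) for wi in w]
    res = linprog(obj, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    return res.x[0], res.x[1:]                           # (theta_{k+1}, lam_{k+1})
```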

Page 15

Dual Methods - Discussion

Uniform Dominance Condition
There is some xS ∈ X such that

inf { E[(η − cᵀY)+] − E[(η − cᵀG(xS))+] : (η, c) ∈ [a,b] × S } > 0.

Theorem
Assuming that subproblem (9) (or (10)) can be solved, the sequence of points wk generated by the Dual Approximation Bundle Method (or the Dual Approximation Trust Region Method, respectively) converges to a solution of the dual problem. If the uniform dominance condition holds, then the sequences θk and ϕk(λk+1) converge to the optimal value of problem (1)–(3), and we can recover its optimal solution.

Again, the difficulty lies in verifying the dominance constraint. We implemented DVM together with these regularized dual methods in the numerical experiments that follow.

Page 16

Numerical Experience

Homem-de-Mello & Mehrotra (2009) showed that problem (1)–(3) can be reduced to enforcing univariate dominance at a particular finite subset {c1, …, cJ} ⊂ S. Noyan & Rudolf (2012) showed that this finite subset can be represented as the vertices of the polyhedron

P(Y) = { (c, η, w) ∈ S × R × R^M_+ : wj ≥ η − cᵀYj, j = 1, …, M }.

An alternative method thus involves calculating the vertices of this polyhedron and enforcing univariate dominance at these points.

Consider the problem

max 3x1 + 2x2 subject to    (11)

[ 4 ± α    2   ] [ x1 ]              [ 200 ± 10β ]
[   2    2 ± α ] [ x2 ]  ≽lin(2)  −  [    160    ]
[   1      0   ]                     [  40 ± 5β  ]

where µ ± ν denotes a random variable taking the values µ + ν and µ − ν, each with probability 1/2. Here α and β are independent random variables, uniformly distributed between 0 and 2.

Page 17

Numerical Results for Test 1

  M   V(Y)    tEnum   tSEAM   tDABM   tDATRM
  4      4     0.11    0.21    0.83    0.20
 20     24     1.48    0.54    2.41    0.32
 40     87     8.94    0.73    3.23    0.45
100    381   114.18    2.14    3.67    1.10
200    756   540.56    4.36    5.88    1.50
300    913  1879.47    8.12   11.39    2.45
400    975  4060.78   13.39   11.81    5.78

In the above table, there were S = 40 realizations for the mapping G and M realizations for the benchmark Y. Values presented are averages over 20 independent runs.

Page 18

Additional Numerical Experience

Consider the problem

min { dᵀx : Rx ≽lin(2) Y, x ∈ R^100_+ },    (12)

where R is an 11 × 100 matrix with entries independently and identically uniformly distributed on the interval (0,1), and Y ∈ R^11 is a random vector defined as follows:

Y = ( µ1 + Z1, µ2 + Z2, µ3 + Z1, …, µ10 + Z2, µ11 + θZ1 + (1 − θ)Z2 )ᵀ,

where Z1, Z2 are i.i.d. N(0,1) and θ ∈ (0,1) is a known parameter.

We also consider for comparison the approximate problem

min { dᵀx : (Rx)i ≽(2) Yi, i = 1, …, 11, x ∈ R^100_+ }.    (13)

Page 19

Numerical Results for Test 2

 N     tEnum     V(Y)   tSEAM   tDABM   tDATRM   tCoord   Viol.
10      17.1     1261     2.5    15.6     11.9      0.4      35
20     499.3     5186     5.5    21.1     45.6      1.7      79
30    2178.7    10581     5.4    24.9     25.3      2.9      89
40   14463.0    18121    16.3    20.2     23.2      7.5     158

Here N = M = S and each scenario was equally likely (with probability 1/N). Values presented are averages over 5 independent runs.

Viol. is the number of vertices c ∈ V(Y) at which the optimal solution of the approximate coordinate-order problem (13), xCoord, violates univariate dominance: cᵀRxCoord ⋡(2) cᵀY.

Page 20

Recommendations

In theory, J1 can be any subset of [a,b] × S, but we found it helpful to initialize with the set J1 = {(η1, e1), …, (ηm, em)}, where ei is the i-th coordinate vector and ηi = min_{1≤j≤M} Y_i^j.
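This initialization is immediate to compute from the benchmark realizations; a sketch (our function name) for Y stored as an (M, m) array:

```python
import numpy as np

def initial_events(Y):
    """J_1 = {(eta_1, e_1), ..., (eta_m, e_m)} with eta_i = min_j Y_i^j,
    for benchmark realizations stored row-wise in Y: (M, m)."""
    m = Y.shape[1]
    return [(Y[:, i].min(), np.eye(m)[i]) for i in range(m)]
```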

It is also possible to check whether the univariate second-order dominance relation is satisfied for each scalarization cj ∈ S such that (ηj, cj) ∈ Jk. If this is not the case, we generate additional points (η, cj) for Jk in order to enforce univariate second-order dominance for each such scalarization cj.

Page 21

Discussion

The DVM proves to be inexpensive compared to vertex enumeration, but the linearization techniques can only guarantee convergence to a local solution of problem (5). Other considerations (including enforcing univariate second-order dominance, as discussed above) were used to help avoid getting stuck at such a local solution.

The regularized dual methods (DABM and DATRM) perform comparably and operate at speeds similar to the primal SEAM method.

Test 2 shows that the method is much more reliable than solving the cheaper relaxation with coordinate-wise dominance constraints.

The definitions we used prefer larger values, but the method carries over when smaller values are preferred (the increasing convex order).
