Post on 10-Jul-2020

Robust Optimization: The Need, The Challenge, The Achievements

May 26, 27, 2018

Dept. of Statistical Inference and Mathematics,

The Institute of Statistical Mathematics, Tokyo

Aharon Ben-Tal

Technion—Israel Institute of Technology

DATA UNCERTAINTY IN OPTIMIZATION

♣ Consider a generic optimization problem of the form

\[ \min_x \{ f(x;\zeta) : F(x;\zeta) \in K \} \]

• \(x \in \mathbb{R}^n\): decision vector
• \(\zeta \in \mathbb{R}^M\): data
• \(K \subset \mathbb{R}^m\): closed convex set

♠ More often than not the data ζ is uncertain – not known exactly when the problem is solved.

Sources of data uncertainty:
• part of the data is measured/estimated ⇒ estimation errors
• part of the data (e.g., future demands/prices) does not exist when the problem is solved ⇒ prediction errors
• some components of a solution cannot be implemented exactly as computed ⇒ implementation errors, which in many models can be mimicked by appropriate data uncertainty

Example

Effect of implementation errors

Worst-case chance constraints

Antenna array (Ben-Tal and Nemirovski (2002))

1 We consider an optimization problem with 40 circular antennas.

2 Each antenna has its diagram \(D_i(\phi)\), a plot of the intensity of the signal sent in different directions.

3 The diagram of the set of 40 antennas is the sum of their diagrams:

\[ D(\phi) = \sum_{i=1}^{n} x_i D_i(\phi) \]

4 To the i-th antenna we can send a different amount of power \(x_i\).

5 Objective: set the \(x_i\)'s in such a way that the diagram has the desired shape.

Postek et al. (2015) 27 / 34

Application - antenna array optimization

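Consider a circular antenna:

[Figure: a circular antenna of radius r_i in X, Y, Z coordinates, with the angle φ marked.]

Energy sent in angle φ is characterized by the diagram.

Diagram of a single antenna:

\[ D_i(\phi) = \frac{1}{2} \int_0^{2\pi} \cos\Big( \frac{2\pi i}{40} \cos(\phi)\cos(\theta) \Big)\, d\theta \]

Diagram of n antennas:

\[ D(\phi) = \sum_{i=1}^{n} x_i D_i(\phi) \]

\(x_i\) - power assigned to antenna i

Objective: construct D(φ) as close as possible to the desired D∗(φ) using the antennas available.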
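The single-antenna diagram above is a one-dimensional integral, so it can be evaluated numerically. The following is a minimal sketch, not the authors' code: the function names, the midpoint rule and the number of quadrature steps are illustrative assumptions, and the angle φ is taken in radians (the slides quote angles in degrees).

```python
import math

def antenna_diagram(i, phi, n_ring=40, steps=2000):
    """Approximate D_i(phi) = 1/2 * int_0^{2 pi} cos((2 pi i / n_ring) cos(phi) cos(theta)) d theta.

    Uses the midpoint rule on a uniform grid; phi is in radians.
    """
    h = 2.0 * math.pi / steps
    scale = 2.0 * math.pi * i / n_ring * math.cos(phi)
    total = sum(math.cos(scale * math.cos((k + 0.5) * h)) for k in range(steps))
    return 0.5 * h * total

def array_diagram(x, phi):
    """Diagram of the whole array: sum_i x_i * D_i(phi), with antennas numbered from 1."""
    return sum(xi * antenna_diagram(i, phi) for i, xi in enumerate(x, start=1))
```

At φ = 90 degrees the inner cosine argument vanishes, so every D_i equals π there; this gives a quick sanity check of the quadrature.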

Desired diagram graphically

[Figure: the desired diagram D∗(φ), shown as a Cartesian plot of diagram value against angle and as a polar plot.]

Worst-case chance constraints

Antenna array (Ben-Tal and Nemirovski (2002))

Problem conditions:

for 77 < φ ≤ 90 the diagram is nearly uniform:

\[ 0.9 \le \sum_{i=1}^{n} x_i D_i(\phi) \le 1, \qquad 77 < \phi \le 90 \]

for 70 < φ ≤ 77 the diagram is bounded:

\[ -1 \le \sum_{i=1}^{n} x_i D_i(\phi) \le 1, \qquad 70 < \phi \le 77 \]

we minimize the maximum absolute diagram value over 0 < φ ≤ 70:

\[ \min \max_{0 < \phi \le 70} \Big| \sum_{i=1}^{n} x_i D_i(\phi) \Big| \]

Optimization problem to be solved

\[ \begin{aligned}
\min\ & \tau \\
\text{s.t. } & -\tau \le \sum_{i=1}^{n} x_i D_i(\phi) \le \tau, && 0 \le \phi \le 70 \\
& -1 \le \sum_{i=1}^{n} x_i D_i(\phi) \le 1, && 70 \le \phi \le 77 \\
& 0.9 \le \sum_{i=1}^{n} x_i D_i(\phi) \le 1, && 77 \le \phi \le 90
\end{aligned} \]

Typically, decisions \(x_i\) suffer from implementation error \(z_i\):

\[ x_i \mapsto \tilde{x}_i = (1 + z_i) x_i \]

We want each constraint to hold with probability at least 1 − ε!

[Figures: three diagram plots, each shown over angles 0 to 90 and as a polar plot, with annotated peak values 1.81e−02, 3.89e+02 and 5.18e+03. The captions are not recoverable from the extraction.]

Example

Effect of data inaccuracy

Example

Effect of uncertain predictions

Truss Topology Design

[Figure: sketch of a truss.]

The simplest TTD problem is

\[ \text{Compliance} = \tfrac{1}{2} f^T x \to \min \quad \text{s.t.} \quad \underbrace{\Big[ \sum_{i=1}^{m} t_i b_i b_i^T \Big]}_{A(t) \succeq 0} x = f, \quad \sum_{i=1}^{m} t_i \le w, \quad t \ge 0 \]

• Data:
– \(b_i \in \mathbb{R}^n\), n – # of nodal degrees of freedom (for a 10 × 10 × 10 ground structure, n ≈ 3,000)
– m – # of tentative bars (for a 10 × 10 × 10 ground structure, m ≈ 500,000)

• Design variables: \(t \in \mathbb{R}^m\), \(x \in \mathbb{R}^n\)

Can we trust the truss?

Example: Assume we are designing a planar truss, a cantilever; the 9 × 9 nodal structure and the only load of interest f are as shown in the picture:

[Figure: 9 × 9 ground structure and the load of interest.]

The optimal single-load design yields a nice truss as follows:

[Figure: optimal cantilever (single-load design); the compliance is 1.000.]

“NON-ADJUSTABLE” ROBUST OPTIMIZATION: Robust Counterpart of Uncertain Problem

\[ \min_x \{ f(x,\zeta) : F(x,\zeta) \in K \} \qquad (U) \]

♣ The initial (“Non-Adjustable”) Robust Optimization paradigm (Soyster ’73, B-T&N ’97–, El Ghaoui et al. ’97–, Bertsimas&Sim ’03–, ...) is based on the following tacitly accepted assumptions:

A.1. All decision variables in (U) represent “here and now” decisions which should get specific numerical values as a result of solving the problem and before the actual data “reveals itself”.

A.2. The uncertain data are “unknown but bounded”: one can specify an appropriate (typically, bounded) uncertainty set \(\mathcal{U} \subset \mathbb{R}^M\) of possible values of the data. The decision maker is fully responsible for consequences of the decisions to be made when, and only when, the actual data is within this set.

A.3. The constraints in (U) are “hard” – we cannot tolerate violations of constraints, even small ones, when the data is in \(\mathcal{U}\).

\[ \Big\{ \min_x \{ f(x,\zeta) : F(x,\zeta) \in K \} : \zeta \in \mathcal{U} \Big\} \qquad (U) \]

♠ Conclusions:
• The only meaningful candidate solutions are the robust ones – those which remain feasible whatever be a realization of the data from the uncertainty set:

\[ x \text{ robust feasible} \iff F(x,\zeta) \in K \ \ \forall \zeta \in \mathcal{U} \]

• The “robust optimal” solution to be used is a robust solution with the smallest possible guaranteed value of the objective, that is, the optimal solution of the optimization problem

\[ \min_{x,t} \{ t : f(x,\zeta) \le t,\ F(x,\zeta) \in K \ \ \forall \zeta \in \mathcal{U} \} \qquad (RC) \]

called the Robust Counterpart of (U).

♠ With traditional modelling methodology,
• “large” data uncertainty is modelled in a stochastic fashion and then processed via Stochastic Programming techniques

Fact: In many cases, it is difficult to specify reliably the distribution of uncertain data and/or to process the resulting Stochastic Programming program.

♠ The ultimate goal of Robust Optimization is to take into account data uncertainty already at the modelling stage in order to “immunize” solutions against uncertainty.
• In contrast to Stochastic Programming, Robust Optimization does not assume stochastic nature of the uncertain data (although it can utilize, to some extent, this nature, if any).

Semi-Infinite Conic Programs

♣ Conic Program:

\[ \min_x \{ c^T x : Ax - b \in K \} \qquad (C) \]

• (c, A, b) – problem’s data
• closed pointed convex cone K, int K ≠ ∅, in a Euclidean space – problem’s structure

Examples:
• Linear Programming: \(K = \mathbb{R}^n_+\)
• Conic Quadratic Programming: K is a direct product of Lorentz cones

\[ L^k = \Big\{ y \in \mathbb{R}^k : y_k \ge \sqrt{y_1^2 + \dots + y_{k-1}^2} \Big\} \]

• Semidefinite Programming: \(K = S^n_+\) is the cone of positive semidefinite matrices in the space \(S^n\) of n × n symmetric matrices

♣ Semi-Infinite Conic Program:

\[ \min_x \{ c^T x : Ax - b \in K \ \ \forall [A, b] \in \mathcal{U} \} \]

where \(\mathcal{U}\) is a given “uncertainty set” (assumed to be convex and compact).

\[ \min_x \{ c^T x : Ax - b \in K \ \ \forall [A, b] \in \mathcal{U} \} \qquad (S) \]

♣ The main mathematical question associated with the semi-infinite problem (S) is:

(?) When and how can (S) be reformulated as a “computationally tractable” optimization problem?

The answer to (?) clearly depends on the interplay between the geometries of the cone K and of the uncertainty set \(\mathcal{U}\).

Intractability

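Consider a (nearly linear) constraint:

\[ \|Px - p\|_1 \le 1 \quad \forall p \in \mathcal{U} \qquad (1) \]

where

\[ \mathcal{U} = \{ p = B\zeta : \|\zeta\|_\infty \le 1 \} \]

(a polyhedral set), with \(B \succeq 0\) a given matrix.

Check whether x = 0 is robust feasible, i.e., the validity of the inequality

\[ \|B\zeta\|_1 \le 1 \quad \forall \zeta : \|\zeta\|_\infty \le 1. \qquad (2) \]

Since \(\|u\|_1 = \max \{ y^T u \mid \|y\|_\infty \le 1 \}\), (2) is equivalent to

\[ \max_{y,\zeta} \{ y^T B \zeta \mid \|y\|_\infty \le 1,\ \|\zeta\|_\infty \le 1 \} \le 1. \qquad (3) \]

The maximum is achieved at y = ζ (see ∗ below), so (3) is equivalent to

\[ \max \{ y^T B y \mid \|y\|_\infty \le 1 \} \le 1. \]

The problem on the lhs (maximizing a nonnegative quadratic form over the unit box) is known to be NP-hard. In fact, it is NP-hard to compute this maximum with an accuracy better than 4%.

∗ Suppose (y, ζ), y ≠ ζ, is optimal. Then \(\big( \tfrac{y+\zeta}{2}, \tfrac{y+\zeta}{2} \big)\) is feasible and

\[ \Big( \frac{y+\zeta}{2} \Big)^T B \Big( \frac{y+\zeta}{2} \Big) > y^T B \zeta, \]

which is a contradiction to the optimality of (y, ζ).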
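To make the source of the difficulty concrete, here is a brute-force sketch of the robust feasibility check for x = 0. It relies on the fact that a convex quadratic form attains its maximum over the box at a vertex, so it enumerates all 2^n sign vectors; the function names are illustrative assumptions, and the exponential running time is exactly what the NP-hardness statement says cannot be avoided in general.

```python
from itertools import product

def max_quadratic_over_box(B):
    """Maximize y^T B y over the unit box by enumerating its 2^n vertices.

    Valid for positive semidefinite B (given as a list of rows), where the
    convex form attains its maximum at a vertex of the box.
    """
    n = len(B)
    best = float("-inf")
    for y in product((-1.0, 1.0), repeat=n):
        value = sum(y[i] * B[i][j] * y[j] for i in range(n) for j in range(n))
        best = max(best, value)
    return best

def zero_is_robust_feasible(B):
    """Check the final inequality of the derivation: max of y^T B y over the box is at most 1."""
    return max_quadratic_over_box(B) <= 1.0
```

For example, `zero_is_robust_feasible([[0.25, 0.0], [0.0, 0.25]])` holds, while the identity matrix fails the test because the maximum over the box is 2.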

1

A Short Introduction to Conic Optimization

Conic optimization program

• Let \(K \subset \mathbb{R}^m\) be a cone defining a good vector inequality \(\ge_K\) (i.e., K is a closed pointed cone with a nonempty interior). A generic conic problem associated with K is an optimization program of the form

\[ \min_x \{ c^T x : Ax - b \ge_K 0 \}. \qquad (CP) \]

Conic Duality Theorem

• A conic problem

\[ \min_s \{ e^T s : s \in [L + f] \cap K \} \]

is called strictly feasible, if its feasible plane intersects the interior of the cone K.

\[ (P):\ \min_x \{ c^T x : Ax - b \ge_K 0 \} \]

\[ (D):\ \max_y \{ b^T y : A^T y = c,\ y \ge_{K_*} 0 \} \]

• Conic Duality Theorem. Consider a conic problem (P) along with its dual (D).

1. Symmetry: The duality is symmetric: the problem dual to dual is (equivalent to) the primal;

2. Weak duality: The value of the dual objective at any dual feasible solution is ≤ the value of the primal objective at any primal feasible solution;

3. Strong duality in strictly feasible case: If one of the problems (P), (D) is strictly feasible and bounded, then the other problem is solvable, and the optimal values in (P) and (D) are equal to each other.

If both (P), (D) are strictly feasible, then both problems are solvable with equal optimal values.
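Weak duality can be checked by hand in the simplest case, where K is the nonnegative orthant and (P), (D) are a linear programming pair. The sketch below uses a small made-up instance (the data A, b, c and the candidate points are illustrative assumptions, not from the slides) and verifies that the gap c^T x − b^T y is nonnegative for feasible pairs and zero at an optimal pair.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def primal_feasible(A, b, x, tol=1e-9):
    """Check Ax - b >= 0 componentwise."""
    return all(dot(row, x) - bi >= -tol for row, bi in zip(A, b))

def dual_feasible(A, c, y, tol=1e-9):
    """Check y >= 0 and A^T y = c."""
    if any(yi < -tol for yi in y):
        return False
    columns = list(zip(*A))
    return all(abs(dot(col, y) - cj) <= tol for col, cj in zip(columns, c))

def duality_gap(c, b, x, y):
    """Primal objective minus dual objective; nonnegative for feasible pairs."""
    return dot(c, x) - dot(b, y)

# Hypothetical toy data: min 2*x1 + 3*x2  s.t.  x1 >= 1, x2 >= 1, x1 + x2 >= 3.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [1.0, 1.0, 3.0]
c = [2.0, 3.0]
x_feasible, x_optimal = [3.0, 3.0], [2.0, 1.0]
y_feasible, y_optimal = [2.0, 3.0, 0.0], [0.0, 1.0, 2.0]
```

Here the gap for the pair (`x_feasible`, `y_feasible`) is positive, while the pair (`x_optimal`, `y_optimal`) closes it, which certifies optimality of both.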

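\[ \min_x \{ c^T x : Ax - b \ge_K 0 \}. \qquad (CP) \]

Examples:

• Linear Programming

\[ \min_x \{ c^T x : Ax - b \ge 0 \} \qquad (LP) \]

(K is a nonnegative orthant)

• Conic Quadratic Programming:

\[ \min_x \{ c^T x : \|D_\ell x + d_\ell\|_2 \le e_\ell^T x + f_\ell,\ \ell = 1, \dots, k \} \]

⇕

\[ \min_x \left\{ c^T x : Ax - b \equiv \begin{pmatrix} D_1 x + d_1 \\ e_1^T x + f_1 \\ \vdots \\ D_k x + d_k \\ e_k^T x + f_k \end{pmatrix} \ge_K 0 \right\}, \qquad (CQP) \]

where \(K = L^{m_1} \times \dots \times L^{m_k}\) is a direct product of Lorentz cones

• Semidefinite Programming:

\[ \min_x \{ c^T x : \mathcal{A}x - B \equiv x_1 A_1 + \dots + x_n A_n - B \succeq 0 \} \qquad (SDP) \]

\[ [\, P \succeq Q \iff P \ge_{S^m_+} Q \,] \]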

Conic Quadratic Problem

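Primal

\[ \begin{aligned}
\min_x\ & c^T x \\
& \|D_i x - d_i\|_2 \le p_i^T x - q_i, \quad i = 1, \dots, k \\
& Rx = r
\end{aligned} \]

Dual

\[ \begin{aligned}
\max_{v, y; u}\ & r^T v + \sum_{i=1}^{k} (d_i^T y_i + q_i u_i) \\
& R^T v + \sum_{i=1}^{k} (D_i^T y_i + p_i u_i) = c \\
& \|y_i\| \le u_i, \quad i = 1, \dots, k
\end{aligned} \]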

Program dual to an SDP program

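\[ \min_x \Big\{ c^T x \ \Big|\ \mathcal{A}x - B \equiv \sum_{j=1}^{n} x_j A_j - B \succeq 0 \Big\} \qquad (SDPr) \]

According to our general scheme, the problem dual to (SDPr) is

\[ \max_Y \{ \langle B, Y \rangle \mid \mathcal{A}^* Y = c,\ Y \succeq 0 \} \qquad (SDDl) \]

(recall that \(S^m_+\) is self-dual!).

It is easily seen that the operator \(\mathcal{A}^*\) conjugate to \(\mathcal{A}\) is given by

\[ \mathcal{A}^* Y = (\mathrm{Tr}(Y A_1), \dots, \mathrm{Tr}(Y A_n))^T : S^m \to \mathbb{R}^n. \]

Consequently, the dual problem is

\[ \max_Y \{ \mathrm{Tr}(BY) \mid \mathrm{Tr}(Y A_i) = c_i,\ i = 1, \dots, n,\ Y \succeq 0 \} \qquad (SDDl) \]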

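• Example: The problem with nonlinear objective and constraints

\[ \begin{aligned}
\text{minimize}\quad & \sum_{\ell=1}^{n} x_\ell^2 \\
(a)\quad & x \ge 0; \\
(b)\quad & a_\ell^T x \le b_\ell, \quad \ell = 1, \dots, n; \\
(c)\quad & \|Px - p\|_2 \le c^T x + d; \\
(d)\quad & x_\ell^{\frac{\ell+1}{\ell}} \le e_\ell^T x + f_\ell, \quad \ell = 1, \dots, n; \\
(e)\quad & x_\ell^{\frac{\ell}{\ell+3}} \, x_{\ell+1}^{\frac{1}{\ell+3}} \ge g_\ell^T x + h_\ell, \quad \ell = 1, \dots, n-1; \\
(f)\quad & \mathrm{Det} \begin{pmatrix}
x_1 & x_2 & x_3 & \cdots & x_n \\
x_2 & x_1 & x_2 & \cdots & x_{n-1} \\
x_3 & x_2 & x_1 & \cdots & x_{n-2} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
x_n & x_{n-1} & x_{n-2} & \cdots & x_1
\end{pmatrix} \ge 1; \\
(g)\quad & 1 \le \sum_{\ell=1}^{n} x_\ell \cos(\ell\omega) \le 1 + \sin^2(5\omega) \quad \forall \omega \in \Big[ -\frac{\pi}{7},\ 1.3 \Big]
\end{aligned} \]

can be converted, in a systematic way, into an equivalent problem

\[ \min_x \{ c^T x : Ax - b \succeq 0 \}, \]

“⪰” being one of our 3 standard vector inequalities, so that seemingly highly diverse constraints of the original problem allow for unified treatment.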

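• Lemma on Schur Complement. A symmetric block matrix

\[ A = \begin{pmatrix} P & Q^T \\ Q & R \end{pmatrix} \]

with positive definite R is positive (semi)definite if and only if the matrix

\[ P - Q^T R^{-1} Q \]

is positive (semi)definite.

Proof. A is ⪰ 0 if and only if

\[ \inf_v \begin{pmatrix} u \\ v \end{pmatrix}^T \begin{pmatrix} P & Q^T \\ Q & R \end{pmatrix} \begin{pmatrix} u \\ v \end{pmatrix} \ge 0 \quad \forall u. \qquad (*) \]

When \(R \succ 0\), the infimum on the left hand side is attained at \(v = -R^{-1} Q u\) and turns out to be

\[ u^T (P - Q^T R^{-1} Q) u. \]

Thus, (∗) is valid if and only if

\[ u^T (P - Q^T R^{-1} Q) u \ge 0 \quad \forall u, \]

i.e., if and only if

\[ P - Q^T R^{-1} Q \succeq 0. \]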
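The lemma can be sanity-checked in the smallest case, where P, Q and R are scalars p, q, r with r > 0. The sketch below (function names are illustrative assumptions) compares the Schur complement test p − q²/r ≥ 0 with a direct test on the smallest eigenvalue of the 2 × 2 matrix.

```python
import math

def psd_by_eigenvalue(p, q, r, tol=1e-12):
    """Positive semidefiniteness of [[p, q], [q, r]] via its smallest eigenvalue."""
    smallest = 0.5 * (p + r) - math.hypot(0.5 * (p - r), q)
    return smallest >= -tol

def psd_by_schur(p, q, r, tol=1e-12):
    """Same test via the Schur complement p - q^2 / r; requires r > 0."""
    if r <= 0:
        raise ValueError("the lemma assumes a positive definite block R")
    return p - q * q / r >= -tol
```

On a small grid of integer values the two tests agree, including the boundary cases where the Schur complement is exactly zero.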

Lecture 2

Robust Solutions of Uncertain Linear Optimization Problems

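When treating an uncertain linear inequality \(a^T x \le b\) we’ll use

\[ \mathcal{U} = \left\{ \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} a^0 \\ b^0 \end{pmatrix} + \sum_{\ell} \zeta_\ell \begin{pmatrix} a^\ell \\ b^\ell \end{pmatrix} \ \middle|\ \zeta \in \mathcal{Z} \right\} \]

where \(\mathcal{Z}\) is represented by linear conic inequalities:

\[ \mathcal{Z} = \{ \zeta \mid \exists u : P\zeta + Qu + p \in \mathbf{K} \} \]

where \(\mathbf{K}\) is a closed convex pointed cone.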

Efficient Representation of Sets

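\(\mathcal{X} \subset \mathbb{R}^n \times \mathbb{R}^k\) represents \(X \subset \mathbb{R}^n\), if

\[ X = \{ x \in \mathbb{R}^n \mid \exists u \in \mathbb{R}^k : (x, u) \in \mathcal{X} \} \]

(the projection of \(\mathcal{X}\) onto the space of the x-variables is exactly X)

Example

\[ X = \Big\{ x \in \mathbb{R}^n : \sum_{j=1}^{n} |x_j| \le 1 \Big\}. \]

Straightforward representation of X requires the \(2^n\) linear inequalities

\[ \pm x_1 \pm x_2 \pm \dots \pm x_n \le 1. \]

Alternatively, X can be represented by

\[ \mathcal{X} = \Big\{ (x, u) \in \mathbb{R}^n \times \mathbb{R}^n : \sum_{j} u_j \le 1,\ -u_j \le x_j \le u_j \ \ \forall j \Big\} \]

requiring only 2n + 1 linear inequalities.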
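The saving from the lifted representation can be seen directly in code. This is a small sketch under stated assumptions: the function names are hypothetical, and the lifted test uses the smallest admissible auxiliary vector u_j = |x_j|, so a suitable u exists exactly when the sum of the |x_j| is at most 1. Counting one sum constraint plus two bounds per coordinate gives 2n + 1 inequalities for the lifted form.

```python
from itertools import product

def in_l1_ball_direct(x, tol=1e-12):
    """Check all 2^n sign-pattern inequalities  +-x_1 +- ... +- x_n <= 1."""
    return all(
        sum(s * xi for s, xi in zip(signs, x)) <= 1.0 + tol
        for signs in product((-1.0, 1.0), repeat=len(x))
    )

def in_l1_ball_lifted(x, tol=1e-12):
    """Check the lifted system with the tightest choice u_j = |x_j|."""
    u = [abs(xi) for xi in x]
    return sum(u) <= 1.0 + tol

def inequality_counts(n):
    """Number of inequalities: (direct description, lifted description)."""
    return 2 ** n, 2 * n + 1
```

For n = 10 the direct description already needs 1024 inequalities against 21 for the lifted one.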

The Constraint-wise Nature of the RC

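\[ \left. \begin{array}{l} x_1 \ge \zeta_1 \\ x_2 \ge \zeta_2 \end{array} \right\} \quad \begin{pmatrix} \zeta_1 \\ \zeta_2 \end{pmatrix} \in \mathcal{U} = \left\{ \begin{pmatrix} \zeta_1 \\ \zeta_2 \end{pmatrix} \ \middle|\ \begin{array}{l} \zeta_1, \zeta_2 \ge 0 \\ \zeta_1 + \zeta_2 \le 1 \end{array} \right\} \]

⇔

\[ x_1 \ge \max_{\zeta \in \mathcal{U}} \zeta_1 = 1, \qquad x_2 \ge \max_{\zeta \in \mathcal{U}} \zeta_2 = 1 \]

Same RC under uncertainty

\[ \widehat{\mathcal{U}} = \mathcal{U}_1 \times \mathcal{U}_2, \qquad \mathcal{U}_i = \{ \zeta_i \mid 0 \le \zeta_i \le 1 \} \]

\(\mathcal{U}_i\) = proj. of \(\mathcal{U}\) on \(\zeta_i\)-space.

Conclusion: The RC of uncertain linear inequalities under uncertainty set \(\mathcal{U}\) is the same when \(\mathcal{U}\) is extended to the direct product

\[ \widehat{\mathcal{U}} = \mathcal{U}_1 \times \mathcal{U}_2 \times \dots \times \mathcal{U}_m \]

of its projections onto the spaces of the data of respective constraints.

Moreover, each \(\mathcal{U}_i\) can be replaced by its closed convex hull.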
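The two-constraint example can be replayed numerically. Since a linear function attains its maximum over a polytope at a vertex, it is enough to look at the vertices of the triangle and of the unit square (the product of the projections); the helper name below is an illustrative assumption.

```python
def robust_lower_bounds(vertices):
    """For constraints x_i >= zeta_i, return the worst-case bound on each x_i.

    The maximum of each coordinate over a polytope is attained at a vertex.
    """
    dimension = len(vertices[0])
    return tuple(max(v[i] for v in vertices) for i in range(dimension))

# Vertices of the triangle {zeta >= 0, zeta_1 + zeta_2 <= 1} and of its box extension.
triangle = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
unit_square = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
```

Both sets give the same bounds x_1 ≥ 1 and x_2 ≥ 1, even though the square contains the point (1, 1) that the triangle excludes.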

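Focus on a single uncertainty-affected linear inequality, a family

\[ \{ a^T x \le b \}_{[a;b] \in \mathcal{U}} \]

of linear inequalities with the data varying in the uncertainty set

\[ \mathcal{U} = \Big\{ [a; b] = [a^0; b^0] + \sum_{\ell=1}^{L} \zeta_\ell [a^\ell; b^\ell] : \zeta \in \mathcal{Z} \Big\} \]

and on “tractable representation” of the RC

\[ a^T x \le b \quad \forall \Big( [a; b] = [a^0; b^0] + \sum_{\ell=1}^{L} \zeta_\ell [a^\ell; b^\ell] : \zeta \in \mathcal{Z} \Big) \qquad (2) \]

of this uncertain inequality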

Tractable Representation of (2): Simple Cases

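Example

\[ \mathcal{Z} = \mathrm{Box}_1 \equiv \{ \zeta \in \mathbb{R}^L : \|\zeta\|_\infty \le 1 \}. \]

In this case, (2) reads

\[ \begin{aligned}
& [a^0]^T x + \sum_{\ell=1}^{L} \zeta_\ell [a^\ell]^T x \le b^0 + \sum_{\ell=1}^{L} \zeta_\ell b^\ell \quad \forall (\zeta : \|\zeta\|_\infty \le 1) \\
\iff\ & \sum_{\ell=1}^{L} \zeta_\ell \big[ [a^\ell]^T x - b^\ell \big] \le b^0 - [a^0]^T x \quad \forall (\zeta : |\zeta_\ell| \le 1,\ \ell = 1, \dots, L) \\
\iff\ & \max_{-1 \le \zeta_\ell \le 1} \Big[ \sum_{\ell=1}^{L} \zeta_\ell \big[ [a^\ell]^T x - b^\ell \big] \Big] \le b^0 - [a^0]^T x
\end{aligned} \]

The concluding maximum in the chain is clearly \(\sum_{\ell=1}^{L} |[a^\ell]^T x - b^\ell|\), so (2) becomes

\[ [a^0]^T x + \sum_{\ell=1}^{L} |[a^\ell]^T x - b^\ell| \le b^0, \]

which in turn admits a representation by a system of linear inequalities:

\[ -u_\ell \le [a^\ell]^T x - b^\ell \le u_\ell, \quad \ell = 1, \dots, L, \qquad [a^0]^T x + \sum_{\ell=1}^{L} u_\ell \le b^0. \]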
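The closed form for the box case can be checked against a brute-force evaluation over the vertices of the box, where the worst case of a function linear in ζ is attained. The sketch below uses hypothetical function names and a made-up two-dimensional instance in the usage note.

```python
from itertools import product

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def box_rc_holds(a0, b0, a_list, b_list, x, tol=1e-12):
    """Closed form:  a0^T x + sum_l |a_l^T x - b_l| <= b0."""
    penalty = sum(abs(dot(a, x) - b) for a, b in zip(a_list, b_list))
    return dot(a0, x) + penalty <= b0 + tol

def box_rc_brute_force(a0, b0, a_list, b_list, x, tol=1e-12):
    """Check the perturbed inequality at every vertex zeta in {-1, 1}^L."""
    for zeta in product((-1.0, 1.0), repeat=len(a_list)):
        lhs = dot(a0, x) + sum(z * dot(a, x) for z, a in zip(zeta, a_list))
        rhs = b0 + sum(z * b for z, b in zip(zeta, b_list))
        if lhs > rhs + tol:
            return False
    return True
```

With a0 = [1, 1], b0 = 3, perturbation directions [0.5, 0] and [0, 0.5] and zero shifts in b, the point x = [1, 1] is robust feasible and x = [1.5, 1] is not, and both functions agree on this.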

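Example

\[ \mathcal{Z} = \mathrm{Ball}_\Omega = \{ \zeta \in \mathbb{R}^L : \|\zeta\|_2 \le \Omega \}. \]

In this case, (2) reads

\[ \begin{aligned}
& [a^0]^T x + \sum_{\ell=1}^{L} \zeta_\ell [a^\ell]^T x \le b^0 + \sum_{\ell=1}^{L} \zeta_\ell b^\ell \quad \forall (\zeta : \|\zeta\|_2 \le \Omega) \\
\iff\ & \max_{\|\zeta\|_2 \le \Omega} \Big[ \sum_{\ell=1}^{L} \zeta_\ell \big[ [a^\ell]^T x - b^\ell \big] \Big] \le b^0 - [a^0]^T x \\
\iff\ & \Omega \sqrt{ \sum_{\ell=1}^{L} \big( [a^\ell]^T x - b^\ell \big)^2 } \le b^0 - [a^0]^T x,
\end{aligned} \]

a conic quadratic constraint.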
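The ball case rests on the Cauchy-Schwarz inequality: the worst perturbation is aligned with the vector c of terms c_ℓ = [a^ℓ]^T x − b^ℓ and has norm Ω. A short sketch with hypothetical names:

```python
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def ball_rc_holds(a0, b0, a_list, b_list, x, omega, tol=1e-12):
    """Conic quadratic form:  a0^T x + omega * ||c||_2 <= b0,  c_l = a_l^T x - b_l."""
    c = [dot(a, x) - b for a, b in zip(a_list, b_list)]
    return dot(a0, x) + omega * math.sqrt(dot(c, c)) <= b0 + tol

def worst_case_perturbation(c, omega):
    """Maximizer of c^T zeta over ||zeta||_2 <= omega (any feasible point if c = 0)."""
    norm = math.sqrt(dot(c, c))
    if norm == 0.0:
        return [0.0] * len(c)
    return [omega * ci / norm for ci in c]
```

For c = [3, 4] and Ω = 2 the worst perturbation is [1.2, 1.6], and it achieves c^T ζ = Ω‖c‖₂ = 10.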

The RC of a Linear Inequality-General Case

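\[ \mathcal{Z} = \{ \zeta \in \mathbb{R}^L : \exists u \in \mathbb{R}^K : P\zeta + Qu + p \in \mathbf{K} \} \]

where \(\mathbf{K}\) is a closed convex pointed cone in \(\mathbb{R}^N\), and

\[ \mathcal{U} = \left\{ \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} a^0 \\ b^0 \end{pmatrix} + \sum_{\ell} \zeta_\ell \begin{pmatrix} a^\ell \\ b^\ell \end{pmatrix} \ \middle|\ \zeta \in \mathcal{Z} \right\} \]

Uncertain linear constraint

\[ a^T x \le b \quad \forall \begin{pmatrix} a \\ b \end{pmatrix} \in \mathcal{U} \]

becomes

\[ \begin{aligned}
& \underbrace{(a^0)^T x - b^0}_{d[x]} + \sum_{\ell=1}^{L} \zeta_\ell \underbrace{[(a^\ell)^T x - b^\ell]}_{c_\ell[x]} \le 0 \quad \forall \zeta \in \mathcal{Z} \\
\iff\ & \sup_{\zeta \in \mathcal{Z}} c^T[x] \zeta + d[x] \le 0 \\
\iff\ & \sup_{\zeta, u} \{ c^T[x] \zeta \mid P\zeta + Qu + p \in \mathbf{K} \} \le -d[x]. \qquad (*)
\end{aligned} \]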

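Theorem. Let the perturbation set \(\mathcal{Z}\) be given by

\[ \mathcal{Z} = \{ \zeta \in \mathbb{R}^L \mid \exists u \in \mathbb{R}^K : P\zeta + Qu + p \in \mathbf{K} \} \]

where \(\mathbf{K}\) is a closed convex pointed cone in \(\mathbb{R}^N\) which is either polyhedral, or is such that

\[ \exists\, \bar{\zeta}, \bar{u} : P\bar{\zeta} + Q\bar{u} + p \in \mathrm{int}\, \mathbf{K}. \]

Consider the robust counterpart of a linear inequality:

\[ a^T x \le b \quad \forall \begin{pmatrix} a \\ b \end{pmatrix} \in \mathcal{U} \]

where

\[ \mathcal{U} = \left\{ \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} a^0 \\ b^0 \end{pmatrix} + \sum_{\ell=1}^{L} \zeta_\ell \begin{pmatrix} a^\ell \\ b^\ell \end{pmatrix} \ \middle|\ \zeta \in \mathcal{Z} \right\}. \]

Then a vector \(x \in \mathbb{R}^n\) is robust feasible if and only if there exists \(y \in \mathbb{R}^N\) which together with x satisfies the following linear/conic inequalities:

\[ \begin{aligned}
& p^T y + (a^0)^T x \le b^0 \\
& Q^T y = 0 \\
& (P^T y)_\ell + (a^\ell)^T x = b^\ell, \quad \ell = 1, 2, \dots, L \\
& y \in \mathbf{K}_*
\end{aligned} \]

where \(\mathbf{K}_* = \{ y \mid y^T v \ge 0 \ \ \forall v \in \mathbf{K} \}\) is the dual cone of \(\mathbf{K}\).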

Illustration: Single-Period Portfolio Selection

There are 200 assets. Asset #200 (“money in the bank”) has yearly return \(r_{200} = 1.05\) and zero variability. The yearly returns \(r_\ell\), ℓ = 1, ..., 199, of the remaining assets are independent random variables taking values in the segments \([\mu_\ell - \sigma_\ell, \mu_\ell + \sigma_\ell]\) with expected values \(\mu_\ell\); here

\[ \mu_\ell = 1.05 + 0.3\,\frac{200 - \ell}{199}, \qquad \sigma_\ell = 0.05 + 0.6\,\frac{200 - \ell}{199}, \qquad \ell = 1, \dots, 199. \]

The goal is to distribute $1 between the assets in order to maximize the return of the resulting portfolio, the required risk level being ε = 0.5%.

We want to solve the uncertain LO problem

\[ \max_{y,t} \Big\{ t : \sum_{\ell=1}^{199} r_\ell y_\ell + r_{200} y_{200} - t \ge 0,\ \sum_{\ell=1}^{200} y_\ell = 1,\ y_\ell \ge 0 \ \ \forall \ell \Big\}, \]

where \(y_\ell\) is the capital to be invested into asset #ℓ.

The uncertain data are the returns \(r_\ell\), ℓ = 1, ..., 199; their natural parameterization is

\[ r_\ell = \mu_\ell + \sigma_\ell \zeta_\ell, \]

where \(\zeta_\ell\), ℓ = 1, ..., 199, are independent random perturbations with zero mean varying in the segments [−1, 1]. Setting \(x = [y; -t] \in \mathbb{R}^{201}\), the problem becomes

\[ \begin{aligned}
\text{minimize}\quad & x_{201} \\
\text{subject to}\quad
& (a)\ \ \Big[ a^0 + \sum_{\ell=1}^{199} \zeta_\ell a^\ell \Big]^T x - \Big[ b^0 + \sum_{\ell=1}^{199} \zeta_\ell b^\ell \Big] \le 0 \\
& (b)\ \ \sum_{\ell=1}^{200} x_\ell = 1 \\
& (c)\ \ x_\ell \ge 0, \quad \ell = 1, \dots, 200
\end{aligned} \qquad (4) \]

where

\[ a^0 = [-\mu_1; -\mu_2; \dots; -\mu_{199}; -r_{200}; -1]; \quad a^\ell = \sigma_\ell \cdot [0_{\ell-1,1}; 1; 0_{201-\ell,1}],\ \ell = 1, \dots, 199; \quad b^\ell = 0,\ \ell = 0, 1, \dots, 199. \]

The only uncertain constraint in the problem is the linear inequality (a). We consider 3 perturbation sets along with the associated robust counterparts of problem (4).

1. Box RC, which ignores the information on the stochastic nature of the perturbations affecting the uncertain inequality and uses only the fact that these perturbations vary in [−1, 1]. The underlying perturbation set \(\mathcal{Z}\) for (a) is \(\{ \zeta : \|\zeta\|_\infty \le 1 \}\);

2. Ball-Box with the safety parameter \(\Omega = \sqrt{2 \ln(1/\epsilon)} = 3.255\), which ensures that the optimal solution of the associated RC (a CQ prob.) satisfies (a) with probability at least 1 − ε = 0.995. The underlying perturbation set \(\mathcal{Z}\) for (a) is \(\{ \zeta : \|\zeta\|_\infty \le 1,\ \|\zeta\|_2 \le 3.255 \}\);

3. Budgeted uncertainties with the uncertainty budget \(\gamma = \sqrt{2 \ln(1/\epsilon)} \sqrt{199} = 45.921\), which results in the same probabilistic guarantees as for the Ball-Box RC. The underlying perturbation set \(\mathcal{Z}\) for (a) is \(\{ \zeta : \|\zeta\|_\infty \le 1,\ \|\zeta\|_1 \le 45.921 \}\);

Results

Box RC. The associated RC is the LP

\[ \max_{y,t} \Big\{ t : \sum_{\ell=1}^{199} (\mu_\ell - \sigma_\ell) y_\ell + 1.05\, y_{200} \ge t,\ \sum_{\ell=1}^{200} y_\ell = 1,\ y \ge 0 \Big\}; \]

as it should be expected, this is nothing but the instance of our uncertain problem corresponding to the worst possible values \(r_\ell = \mu_\ell - \sigma_\ell\), ℓ = 1, ..., 199, of the uncertain returns. Since these values are less than the guaranteed return for money, the robust optimal solution prescribes to keep our initial capital in the bank with guaranteed yearly return 1.05.
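The numbers quoted for this illustration are easy to reproduce. The sketch below recomputes the safety parameter Ω, the budget γ and the worst-case returns from the formulas for μ_ℓ and σ_ℓ; the variable names are illustrative assumptions. Because the Box RC is a linear program over the simplex, its optimal value is simply the largest worst-case return among the 200 assets.

```python
import math

EPS = 0.005        # required risk level, 0.5%
BANK_RETURN = 1.05

# Expected returns and half-widths for assets 1..199, as given in the text.
mu = [1.05 + 0.3 * (200 - l) / 199 for l in range(1, 200)]
sigma = [0.05 + 0.6 * (200 - l) / 199 for l in range(1, 200)]

omega = math.sqrt(2.0 * math.log(1.0 / EPS))   # Ball-Box safety parameter
gamma = omega * math.sqrt(199)                 # uncertainty budget

# Worst-case return of each risky asset under box uncertainty.
worst_returns = [m - s for m, s in zip(mu, sigma)]

# An LP over the simplex puts all capital on the best coefficient.
box_rc_value = max(max(worst_returns), BANK_RETURN)
```

Every worst-case risky return equals 1.00 − 0.3(200 − ℓ)/199, which is below 1.05, so the Box RC keeps all capital in the bank, as stated above.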

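Ball-Box RC. The associated RC is the conic quadratic problem

\[ \max_{y,z,w,t} \left\{ t : \begin{array}{l}
\sum_{\ell=1}^{199} \mu_\ell y_\ell + 1.05\, y_{200} - \sum_{\ell=1}^{199} |z_\ell| - 3.255 \sqrt{ \sum_{\ell=1}^{199} w_\ell^2 } \ge t \\
z_\ell + w_\ell = y_\ell,\ \ell = 1, \dots, 199, \\
\sum_{\ell=1}^{200} y_\ell = 1,\ y \ge 0
\end{array} \right\}. \]

The robust optimal value is 1.1200, meaning 12.0% profit with risk as low as ε = 0.5%.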

Example 1: Synthesis of Antennae array

♣ The diagram of an antenna. Consider a (monochromatic) antenna placed at the origin. The electric field generated by the antenna at a remote point rδ (δ is a unit direction) is

\[ E = a(\delta)\, r^{-1} \cos( \phi(\delta) + t\omega - 2\pi r / \lambda ) + o(r^{-1}) \]

• t: time • ω: frequency • λ: wavelength

• It is convenient to aggregate a(δ) and φ(δ) into a single complex-valued function, the diagram of the antenna

\[ D(\delta) = a(\delta) ( \cos(\phi(\delta)) + i \sin(\phi(\delta)) ). \]

• The directional density of the energy sent by the antenna is proportional to \(|D(\cdot)|^2\)

• The diagram D(·) of a complex antenna comprised of several antenna elements is the sum of the diagrams \(D_i(\cdot)\) of the elements:

\[ D(\delta) = D_1(\delta) + \dots + D_N(\delta) \]

[Slides not recoverable from the extraction: a sketch of the antenna geometry (axes X, Y, Z, angle θ), a plot with accompanying text set in a symbol font, a repeat of the desired-diagram figure, and a further slide of symbol-font text.]

How it works? – Antenna Example

\[ \min_{x,\tau} \Big\{ \tau : -\tau \le D_*(\theta_\ell) - \sum_{j=1}^{10} x_j D_j(\theta_\ell) \le \tau,\ \ell = 1, \dots, L \Big\} \]

⇕

\[ \min_{x,\tau} \{ \tau : Ax + \tau a + b \ge 0 \} \qquad (LP) \]

• The influence of “implementation errors”

\[ x_j \mapsto (1 + \epsilon_j) x_j \]

is as if there were no implementation errors, but the part A of the constraint matrix was uncertain and known “up to multiplication by a diagonal matrix with diagonal entries from [0.999, 1.001]”:

\[ \mathcal{U}_{\mathrm{ini}} = \{ A = A^{\mathrm{nom}}\, \mathrm{Diag}(1 + \epsilon_1, \dots, 1 + \epsilon_{10}) : |\epsilon_j| \le 0.001 \} \qquad (U) \]

Note that as far as a particular constraint is concerned, the uncertainty is an interval one with \(\delta A_{ij} = 0.001 |A_{ij}|\). The remaining coefficients (and the objective) are certain.

♣ To improve reliability of our design, we could replace the uncertain LP program (LP), (U) with its robust counterpart, which is nothing but an explicit LP program.

However, to work directly with \(\mathcal{U}_{\mathrm{ini}}\) would be “too conservative” – we would ignore the fact that the implementation errors are random and independent, so that the probability for all of them to take simultaneously the “most unfavourable” values is negligibly small.

Let us try to define the uncertainty set in a smarter way.

Worst-case chance constraints

Nominal solution - dream and reality

[Figure, four panels: “Nominal solution − no implementation error” (diagram against angle), “No implementation error − polar plot”, “Nominal solution − implementation error ρ=0.001” (diagram against angle), “Implementation error − polar plot”.]

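♣ Applying the outlined methodology to our Antenna example:

\[ \min_{x,\tau} \Big\{ \tau : -\tau \le D_*(\theta_\ell) - \sum_{j=1}^{10} x_j D_j(\theta_\ell) \le \tau,\ 1 \le \ell \le 120 \Big\} \qquad (LP) \]

⇓

\[ \min_{x,\tau} \left\{ \tau : \begin{array}{l}
D_*(\theta_\ell) - \sum_{j=1}^{10} x_j D_j(\theta_\ell) + \kappa\sigma \sqrt{ \sum_{j=1}^{10} x_j^2 D_j^2(\theta_\ell) } \le \tau \\
D_*(\theta_\ell) - \sum_{j=1}^{10} x_j D_j(\theta_\ell) - \kappa\sigma \sqrt{ \sum_{j=1}^{10} x_j^2 D_j^2(\theta_\ell) } \ge -\tau \\
1 \le \ell \le 120
\end{array} \right\} \qquad (RC) \]

[σ = 0.001]

we get a robust design.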

[Figures: three rows of three polar plots each. The radial scales reach about 1 in every plot of the first row; about 389, 0.99 and 0.98 in the second row; and about 5178, 1.0 and 0.97 in the third row. The captions are not recoverable from the extraction.]

Example 2: NETLIB Case Study: Diagnosis

♣ NETLIB is a collection of about 100 not very large LPs, mostly of real-world origin. To motivate the methodology of our “case study”, here is constraint # 372 of the NETLIB problem PILOT4:

\[ \begin{aligned}
a^T x \equiv{} & -15.79081 x_{826} - 8.598819 x_{827} - 1.88789 x_{828} - 1.362417 x_{829} - 1.526049 x_{830} \\
& -0.031883 x_{849} - 28.725555 x_{850} - 10.792065 x_{851} - 0.19004 x_{852} - 2.757176 x_{853} \\
& -12.290832 x_{854} + 717.562256 x_{855} - 0.057865 x_{856} - 3.785417 x_{857} - 78.30661 x_{858} \\
& -122.163055 x_{859} - 6.46609 x_{860} - 0.48371 x_{861} - 0.615264 x_{862} - 1.353783 x_{863} \\
& -84.644257 x_{864} - 122.459045 x_{865} - 43.15593 x_{866} - 1.712592 x_{870} - 0.401597 x_{871} \\
& + x_{880} - 0.946049 x_{898} - 0.946049 x_{916} \\
\ge{} & b \equiv 23.387405
\end{aligned} \qquad (C) \]

The related nonzero coordinates in the optimal solution x∗ of the problem, as reported by CPLEX, are:

\[ \begin{array}{lll}
x^*_{826} = 255.6112787181108 & x^*_{827} = 6240.488912232100 & x^*_{828} = 3624.613324098961 \\
x^*_{829} = 18.20205065283259 & x^*_{849} = 174397.0389573037 & x^*_{870} = 14250.00176680900 \\
x^*_{871} = 25910.00731692178 & x^*_{880} = 104958.3199274139 &
\end{array} \]

This solution makes (C) an equality within machine precision.

♣ Most of the coefficients in (C) are “ugly reals” like -15.79081 or -84.644257. We definitely may believe that these coefficients characterize technological devices/processes, and as such hardly are known to high accuracy. Thus, “ugly coefficients” may be assumed to be uncertain and to coincide with the “true” data within accuracy of 3-4 digits. The only exception is the coefficient 1 of \(x_{880}\), which perhaps reflects the structure of the problem and is exact.

♣ Assume that the uncertain entries of a are 0.1%-accurate approximations of unknown entries in the “true” data \(\tilde{a}\); how would this uncertainty affect the validity of the constraint evaluated at the nominal solution x∗?

• The worst case, over all 0.1%-perturbations of uncertain data, violation of the constraint is as large as 450% of the right hand side!

• In the case of random and independent 0.1% perturbations of the uncertain coefficients, the statistics of the “relative constraint violation”

\[ V = \frac{\max[\, b - \tilde{a}^T x^*,\ 0 \,]}{b} \times 100\% \]

also is disastrous:

Prob{V > 0}    Prob{V > 150%}    Mean(V)
0.50           0.18              125%

Relative violation of constraint # 372 in PILOT4 (1,000-element sample of 0.1% perturbations of the uncertain data)

♣ We see that quite small (just 0.1%) perturbations of “obviously uncertain” data coefficients can make the “nominal” optimal solution x∗ heavily infeasible and thus – practically meaningless.
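The worst-case figure of roughly 450% can be reproduced from the data printed for constraint # 372. In the sketch below, only the coefficients that multiply nonzero coordinates of x∗ are needed; each uncertain coefficient is moved by 0.1% in the unfavourable direction, and the coefficient 1 of x_880 is kept exact. The names are illustrative assumptions, and since the printed coefficients are rounded, the nominal left hand side matches b only approximately.

```python
B_RHS = 23.387405

# Coefficients of constraint #372 on the nonzero coordinates of x*, as printed.
COEFFS = {
    826: -15.79081, 827: -8.598819, 828: -1.88789, 829: -1.362417,
    849: -0.031883, 870: -1.712592, 871: -0.401597, 880: 1.0,
}
X_STAR = {
    826: 255.6112787181108, 827: 6240.488912232100, 828: 3624.613324098961,
    829: 18.20205065283259, 849: 174397.0389573037, 870: 14250.00176680900,
    871: 25910.00731692178, 880: 104958.3199274139,
}
CERTAIN = {880}  # the coefficient 1 of x_880 is treated as exact

def worst_case_relative_violation(eps):
    """Largest value of max[b - a~^T x*, 0] / b over relative perturbations of size eps."""
    nominal = sum(COEFFS[j] * X_STAR[j] for j in X_STAR)
    # The left hand side of a ">=" constraint can drop by eps * sum |a_j x_j*|.
    drop = eps * sum(abs(COEFFS[j] * X_STAR[j]) for j in X_STAR if j not in CERTAIN)
    return max(B_RHS - (nominal - drop), 0.0) / B_RHS
```

Calling `worst_case_relative_violation(0.001)` gives a ratio of about 4.5, i.e. a violation of about 450% of the right hand side.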

♣ In our Case Study, we choose a “perturbation level” ε (taking values 1%, 0.1%, 0.01%), and, for every one of the NETLIB problems, measure the “reliability index” of the nominal solution at this perturbation level, specifically, as follows.

1. We compute the optimal solution x∗ of the program by CPLEX.

2. For every one of the inequality constraints

\[ a^T x \le b \qquad (*) \]

• we split the left hand side coefficients \(a_j\) into “certain” (rational fractions p/q with |q| ≤ 100) and “uncertain” (all the rest). Let J be the set of all uncertain coefficients of (∗).

• we define the reliability index of (∗) as

\[ \frac{ a^T x^* + \epsilon \sqrt{ \sum_{j \in J} a_j^2 (x_j^*)^2 } - b }{ \max[1, |b|] } \times 100\% \qquad (I) \]

Note that the reliability index is of order of typical violation (measured in percents of the right hand side) of the constraint, as evaluated at x∗, under independent random perturbations, of relative magnitude ε, of the uncertain coefficients.

3. We treat the nominal solution as unreliable, and the problem as bad, the level of perturbations being ε, if the worst, over the inequality constraints, reliability index is worse than 5%.
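The reliability index (I) and the 5% rule translate directly into code. This is a minimal sketch with hypothetical names; it takes one constraint a^T x ≤ b, the nominal solution, the perturbation level ε as a fraction (0.01 for 1%), and the index set J of uncertain coefficients.

```python
import math

def reliability_index(a, x_star, b, eps, uncertain):
    """Reliability index (I) of a^T x <= b at x_star, in percent.

    `uncertain` is the set J of positions of the uncertain coefficients.
    """
    nominal = sum(aj * xj for aj, xj in zip(a, x_star))
    spread = eps * math.sqrt(sum((a[j] * x_star[j]) ** 2 for j in uncertain))
    return (nominal + spread - b) / max(1.0, abs(b)) * 100.0

def is_bad(indices, threshold=5.0):
    """A problem is 'bad' if its worst constraint index exceeds the threshold (in %)."""
    return max(indices) > threshold
```

For instance, with a = [3, 4], x∗ = [1, 1], b = 7, ε = 1% and both coefficients uncertain, the index is 0.05/7 × 100%, about 0.71%, well under the 5% threshold.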

♣ The results of the Diagnosis phase of our Case Study are as follows.

From the total of 90 NETLIB problems we have processed,

• in 27 problems the nominal solution turned out to be unreliable at the largest (ε = 1%) level of uncertainty;

• 19 of these 27 problems are already bad at the 0.01%-level of uncertainty, and in 13 of these 19 problems, 0.01% perturbations of the uncertain data can make the nominal solution more than 50%-infeasible for some of the constraints.

Problem    Size a)        ε = 0.01%             ε = 0.1%              ε = 1%
                          #bad b)  Index c)     #bad   Index          #bad   Index
80BAU3B    2263 × 9799    37       84           177    842            364    8,420
25FV47     822 × 1571     14       16           28     162            35     1,620
ADLITTLE   57 × 97        -        -            2      6              7      58
AFIRO      28 × 32        -        -            1      5              2      50
BNL2       2325 × 3489    -        -            -      -              24     34
BRANDY     221 × 249      -        -            -      -              1      5
CAPRI      272 × 353      -        -            10     39             14     390
CYCLE      1904 × 2857    2        110          5      1,100          6      11,000
D2Q06C     2172 × 5167    107      1,150        134    11,500         168    115,000
E226       224 × 282      -        -            -      -              2      15
FFFFF800   525 × 854      -        -            -      -              6      8
FINNIS     498 × 614      12       10           63     104            97     1,040
GREENBEA   2393 × 5405    13       116          30     1,160          37     11,600
KB2        44 × 41        5        27           6      268            10     2,680
MAROS      847 × 1443     3        6            38     57             73     566
NESM       751 × 2923     -        -            -      -              37     20
PEROLD     626 × 1376     6        34           26     339            58     3,390
PILOT      1442 × 3652    16       50           185    498            379    4,980
PILOT4     411 × 1000     42       210,000      63     2,100,000      75     21,000,000
PILOT87    2031 × 4883    86       130          433    1,300          990    13,000
PILOTJA    941 × 1988     4        46           20     463            59     4,630
PILOTNOV   976 × 2172     4        69           13     694            47     6,940
PILOTWE    723 × 2789     61       12,200       69     122,000        69     1,220,000
SCFXM1     331 × 457      1        95           3      946            11     9,460
SCFXM2     661 × 914      2        95           6      946            21     9,460
SCFXM3     991 × 1371     3        95           9      946            32     9,460
SHARE1B    118 × 225      1        257          1      2,570          1      25,700

a) # of linear constraints (excluding the box ones) plus 1 and # of variables
b) # of constraints with index > 5%
c) The worst, over the constraints, reliability index, in %

How it works? NETLIB Case Study

♣ We solved the Robust Counterparts of the bad NETLIB problems, assuming interval uncertainty in “ugly coefficients” of inequality constraints and no uncertainty in equations. It turns out that

• Reliable solutions do exist, except for 4 cases corresponding to the highest (ε = 1%) perturbation level.

• The “price of immunization” in terms of the objective value is surprisingly low: when ε ≤ 0.1%, it never exceeds 1% and it is less than 0.1% in 13 of 23 cases. Thus, passing to the robust solutions, we gain a lot in the ability of the solution to withstand data uncertainty, while losing nearly nothing in optimality.

Lecture 3

Robust Conic Quadratic Problems

Theorem [S-Lemma] Let A, B be symmetric n × n matrices, and assume that the quadratic inequality

\[ x^T A x \ge 0 \qquad (1) \]

is strictly feasible: there exists \(\bar{x}\) such that \(\bar{x}^T A \bar{x} > 0\). Then the quadratic inequality

\[ x^T B x \ge 0 \qquad (2) \]

is a consequence of (1) if and only if it is a linear consequence of (1), i.e., if and only if there exists a nonnegative λ such that

\[ B \succeq \lambda A. \]

The RC of a Quadratically Constrained Problem

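\[ \min_{x \in \mathbb{R}^n} \{ \gamma^T x : x^T A^T A x \le 2 b^T x + c \ \ \forall (A, b, c) \in \mathcal{U} \}, \qquad (UQ) \]

with single ellipsoid uncertainty, namely

\[ \mathcal{U} = \Big\{ (A, b, c) = (A^0, b^0, c^0) + \sum_{\ell=1}^{L} u_\ell (A^\ell, b^\ell, c^\ell) : \|u\|_2 \le 1 \Big\}. \qquad (1) \]

Theorem 1 A robust counterpart of (UQ) with the uncertainty set \(\mathcal{U}\) given by (1) is equivalent to the SDP problem

\[ \begin{aligned}
\min_{(x,\lambda) \in \mathbb{R}^n \times \mathbb{R}}\ & \gamma^T x \\
\text{s.t.}\ & \begin{pmatrix}
c^0 + 2x^T b^0 - \lambda & \tfrac{1}{2} c^1 + x^T b^1 & \cdots & \tfrac{1}{2} c^L + x^T b^L & (A^0 x)^T \\
\tfrac{1}{2} c^1 + x^T b^1 & \lambda & & & (A^1 x)^T \\
\vdots & & \ddots & & \vdots \\
\tfrac{1}{2} c^L + x^T b^L & & & \lambda & (A^L x)^T \\
A^0 x & A^1 x & \cdots & A^L x & I_m
\end{pmatrix} \succeq 0, \qquad (RUQ)
\end{aligned} \]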

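Theorem 4 The set \(R_\rho\) of (x, λ) satisfying λ ≥ 0 and

\[ SDP(\rho):\quad \begin{pmatrix}
c[x] - \sum_{k=1}^{K} \lambda_k & (-b_\rho[x] - d_\rho)^T & a[x]^T \\
-b_\rho[x] - d_\rho & \sum_{k=1}^{K} \lambda_k Q_k & -A_\rho[x]^T \\
a[x] & -A_\rho[x] & I_M
\end{pmatrix} \succeq 0 \]

is an approximate robust counterpart of the set \(X_\rho\) of robust feasible solutions of (UQC), i.e. (12) holds:

\[ u^T Q_k u \le 1,\ k = 1, \dots, K \ \Rightarrow\ u^T A_\rho[x]^T A_\rho[x] u + 2u^T ( A_\rho[x]^T a[x] - b_\rho[x] - d_\rho ) \le c[x] - a[x]^T a[x]. \qquad (12) \]

The level of conservativeness of \(X_\rho\) is

\[ \Omega \le 9.19 \sqrt{\log K} \quad \text{(K is the number of ellipsoids)} \]

i.e.

\[ \mathrm{opt}(\rho) \le SDP(\rho) \le \mathrm{opt}(\Omega\rho). \]

For the box uncertainty set \(\|u\|_\infty \le 1\),

\[ \Omega \le \frac{\pi}{2}. \]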

Robust Semidefinite Optimization

Uncertain Semidefinite Programming

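• For an Uncertain Semidefinite problem

$$\text{(USD)}: \quad \left\{ \min_x \left\{ c^T x : \; A[x] \equiv A_0 + \sum_{j=1}^{n} x_j A_j \succeq 0 \right\} : \; [A_0, \dots, A_n] \in \mathcal{U} \right\}$$

the RC can be NP-hard already in the simplest cases when $\mathcal{U}$ is a box or an ellipsoid.

• The strongest generic result on tight computationally tractable approximations of Uncertain Semidefinite constraints deals with the case of structured norm-bounded perturbations

$$\mathcal{U} = \left\{ A[x] = A^n[x] + \sum_{\ell=1}^{L} \left[ L_\ell^T \Delta_\ell R_\ell[x] + R_\ell^T[x] \Delta_\ell^T L_\ell \right] : \; \Delta_\ell \in \mathbb{R}^{d_\ell \times d_\ell}, \; \|\Delta_\ell\| \le \rho, \; \Delta_\ell = \delta_\ell I_{d_\ell}, \; \ell \in I_S \right\}$$

– $A^n[x]$: symmetric $m \times m$ matrix affinely depending on $x$

– $R_\ell[x]$: $d_\ell \times m$ matrix affinely depending on $x$

$$\text{(RC)}: \quad A^n[x] + \sum_{\ell=1}^{L} \left[ L_\ell^T \Delta_\ell R_\ell[x] + R_\ell^T[x] \Delta_\ell^T L_\ell \right] \succeq 0 \quad \forall \{\Delta_\ell\}_{\ell=1}^{L} : \; \Delta_\ell \in \mathbb{R}^{d_\ell \times d_\ell}, \; \|\Delta_\ell\| \le \rho, \; \Delta_\ell = \delta_\ell I_{d_\ell}, \; \ell \in I_S$$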

Theorem 1 (Ben-Tal, Nemirovski, Roos ’02) The Robust Counterpart (RC) of an Uncertain LMI with structured norm-bounded perturbations admits a ϑ-tight approximation which is an explicit semidefinite program. The tightness factor ϑ depends solely on the maximum of sizes $d_\ell$ of scalar perturbation blocks

$$\mu = \max \{ d_\ell : \ell \in I_S \};$$

specifically,

$$\vartheta \le \frac{\pi \sqrt{\mu}}{2} .$$

If there are no scalar perturbation blocks ($I_S = \emptyset$), or all scalar perturbation blocks are of size 1 ($d_\ell = 1$), then

$$\vartheta = \frac{\pi}{2} .$$

In the case of a single perturbation block ($L = 1$), ϑ = 1, i.e. (RC) is equivalent to an explicit single LMI.

Unstructured Norm-BoundedPerturbations

Definition 1 We say that the uncertain LMI

$$\mathcal{A}_\zeta(y) \equiv \mathcal{A}^n(y) + \sum_{\ell=1}^{L} \zeta_\ell \mathcal{A}^\ell(y) \succeq 0 \qquad (1)$$

is with unstructured norm-bounded perturbations, if

1. The perturbation set $\mathcal{Z}$ is the set of all $p \times q$ matrices ζ with the usual matrix norm $\|\cdot\|_{2,2}$ not exceeding 1;

2. $\mathcal{A}_\zeta(y)$ can be represented as

$$\mathcal{A}_\zeta(y) \equiv \mathcal{A}^n(y) + \left[ L^T(y) \zeta R(y) + R^T(y) \zeta^T L(y) \right], \qquad (2)$$

where both $L(\cdot)$, $R(\cdot)$ are affine and at least one of these matrix-valued functions is independent of $y$.

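We have proved the following statement:

Theorem 1 The RC

$$\mathcal{A}^n(y) + L^T(y) \zeta R + R^T \zeta^T L(y) \succeq 0 \quad \forall \left( \zeta \in \mathbb{R}^{p \times q} : \|\zeta\|_{2,2} \le \rho \right) \qquad (3)$$

of uncertain LMI (1) with unstructured norm-bounded uncertainty (2) (where, w.l.o.g., we assume that $R \ne 0$ and $R$ is independent of $y$) and uncertainty level ρ can be represented equivalently by the LMI

$$\begin{pmatrix} \lambda I_p & \rho L(y) \\ \rho L^T(y) & \mathcal{A}^n(y) - \lambda R^T R \end{pmatrix} \succeq 0 \qquad (4)$$

in variables $y$, $\lambda$.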

Example: Truss Topology Design

A truss is a mechanical construction, like a railroad bridge, an electric mast or the Eiffel Tower, comprised of thin elastic bars linked with each other at nodes. Some of the nodes are partially or completely fixed. An external load is a collection of external forces acting at the nodes; under such a load, the nodes slightly move, thus causing elongations and compressions in bars, until the construction achieves an equilibrium, where the tensions caused in the bars as a result of their deformations compensate the external forces. The compliance is the potential energy capacitated in the truss at the equilibrium as a result of deformations of the bars.

A mathematical model of the outlined situation is as follows.

$$\text{(TTD)}: \quad \min_{x \in \mathbb{R}^M, \; t \in \mathbb{R}^N} \left\{ f^T x : \; A(t) x = f, \; t \in T \right\}$$

where

$$A(t) = \sum_{i=1}^{N} t_i b_i b_i^T$$

is the stiffness matrix of the truss; $b_i$ is a vector given in terms of the material property of bar $i$ (Young modulus) and the “nominal” (i.e. in the unloaded truss) position of the nodes.

$$T = \left\{ t \in \mathbb{R}^N : \; t \ge 0, \; \sum_j t_j \le w \right\}$$

$t_i$ represents the volume of bar $i$.

$f^T x$ is the compliance, the potential energy capacitated in the truss at the equilibrium due to the external force $f \in \mathbb{R}^M$ acting on the nodes.

An equivalent formulation of a TTD problem is as follows:

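$$\mathrm{Compl}_f(A) = \min_x \left\{ \tfrac12 f^T x \mid A(t) x = f \right\} = \max_x \left\{ f^T x - \tfrac12 x^T A(t) x \right\}$$

(TTD) is equivalent to

$$\min_{t \in T} \mathrm{Compl}_f(A) = \min_{t \in T} \max_x \left\{ f^T x - \tfrac12 x^T A(t) x \right\}$$

$$\Leftrightarrow \; \min_{t \in T, \, \tau \in \mathbb{R}} \left\{ \tau \mid f^T x - \tfrac12 x^T A(t) x \le \tau \;\; \forall x \right\}$$

$$\Leftrightarrow \; \min_{t \in T, \, \tau} \left\{ \tau \mid 2\tau s^2 - 2s f^T (sx) + (sx)^T A(t) (sx) \ge 0 \;\; \forall x, s \right\}$$

$$\Leftrightarrow \; \min_{t \in T, \, \tau} \left\{ \tau \mid 2\tau s^2 - 2s f^T v + v^T A(t) v \ge 0 \;\; \forall v, s \right\}$$

$$\Leftrightarrow \; \min_{t \in T, \, \tau} \left\{ \tau \mid \begin{pmatrix} 2\tau & -f^T \\ -f & A(t) \end{pmatrix} \succeq 0 \right\}$$

$$\Leftrightarrow \; \min_{t \in T, \, \tau} \left\{ \tau \mid \begin{pmatrix} 2\tau & f^T \\ f & A(t) \end{pmatrix} \succeq 0 \right\} \qquad (SDP)$$

Here the homogenized form is obtained by multiplying the inequality $\tau \ge f^T x - \tfrac12 x^T A(t) x$ by $2s^2$ and setting $v = sx$.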

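By Theorem 1, (5) is equivalent to the single LMI

$$\begin{pmatrix} \lambda I & \rho L \\ \rho L^T & A_N - \lambda R^T R \end{pmatrix} \succeq 0 \qquad (6)$$

which here becomes

$$\begin{pmatrix} \lambda I & 0 & \rho B^T \\ 0 & 2\tau - \lambda & f_N^T \\ \rho B & f_N & A(t) \end{pmatrix} \succeq 0 \qquad (7)$$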

Example: Assume we are designing a planar truss, a cantilever; the 9 × 9 nodal structure and the only load of interest $f$ are as shown in the picture:

[Figure: 9 × 9 ground structure and the load of interest]

The optimal single-load design yields a nice truss as follows:

[Figure: Optimal cantilever (single-load design); the compliance is 1.000]

A Guide to Deriving Robust Counterparts

Consider the uncertain constraint

$$f(a, x) \le 0, \qquad (5)$$

where $x \in \mathbb{R}^n$ is the optimization variable, $f(\cdot, x)$ is concave for all $x \in \mathbb{R}^n$, and $a \in \mathbb{R}^m$ is an uncertain vector, which is only known to reside in a set $\mathcal{U}$. The robust counterpart of (5) is then

$$\text{(RC)} \quad f(a, x) \le 0 \quad \forall a \in \mathcal{U}, \qquad (6)$$

where the uncertainty set $\mathcal{U}$ is modeled as follows:

$$\mathcal{U} = \left\{ a = a^0 + A\zeta \mid \zeta \in Z \subset \mathbb{R}^L \right\}.$$

Here $a^0 \in \mathbb{R}^m$ is the so-called “nominal value”, the matrix $A$ is given column-wise: $A = (a^1, a^2, \dots, a^L) \in \mathbb{R}^{m \times L}$, ζ is called the vector of “primitive uncertainties”, and $Z$ is a given nonempty, convex and compact set, with $0 \in \mathrm{ri}(Z)$. With this formulation, it is required to determine the value of $x$ before the actual realization of $a$ is available (“here and now” decisions).

Definition 1 The nominal vector $a^0$ is called regular if $a^0 \in \mathrm{ri}(\mathrm{dom}\, f(\cdot, x))$ for all $x$.

Note that when $a^0$ is regular, since $0 \in \mathrm{ri}(Z)$, the following holds:

$$\mathrm{ri}(\mathcal{U}) \cap \mathrm{ri}(\mathrm{dom}\, f(\cdot, x)) \ne \emptyset \quad \forall x . \qquad (7)$$

The robust inequality (RC) can be rewritten as

$$\max_{a \in \mathcal{U}} f(a, x) \le 0 . \qquad (8)$$

A General Principle

The dual of the optimization problem in (8) has the general form

$$\min \{ g(b, x) \mid b \in Z(x) \} . \qquad (9)$$

Under suitable convexity and regularity conditions on $f(\cdot, x)$ and $\mathcal{U}$ (such as (7)), strong duality holds between the maximization problem in (8) and (9); hence, $x$ is robust feasible if and only if

$$\min \{ g(b, x) \mid b \in Z(x) \} \le 0 . \qquad (10)$$

So finally, $x$ is robust feasible for (6) if and only if $x$ and some $b$ solve the system

$$g(b, x) \le 0, \qquad b \in Z(x) . \qquad (11)$$

In case strong duality does not hold, we still have (by weak duality) that (10) implies (8). Hence, whenever $x$ and some $b$ solve (11), then $x$ satisfies (6), i.e., it is robust feasible.

Fenchel Duality

The primal problem is given as follows:

$$(P) \quad \inf \{ f(x) - g(x) \mid x \in \mathrm{dom}(f) \cap \mathrm{dom}(g) \} .$$

The Fenchel dual of (P) is given by:

$$(D) \quad \sup \{ g_*(y) - f^*(y) \mid y \in \mathrm{dom}(g_*) \cap \mathrm{dom}(f^*) \} .$$

The Fenchel duality theorem is stated next.

Theorem 5 If $\mathrm{ri}(\mathrm{dom}(f)) \cap \mathrm{ri}(\mathrm{dom}(g)) \ne \emptyset$ then the optimal values of (P) and (D) are equal and the maximal value of (D) is attained.

If $\mathrm{ri}(\mathrm{dom}(g_*)) \cap \mathrm{ri}(\mathrm{dom}(f^*)) \ne \emptyset$ then the optimal values of (P) and (D) are equal and the minimum value of (P) is attained. □

Note that since $f^{**} = f$ and $g_{**} = g$, we have that the dual of (D) is (P).

The next basic result gives an equivalent reformulation for (RC) which can be used extensively to derive tractable RCs.

Theorem 2 Let $a^0$ be regular. Then the vector $x \in \mathbb{R}^n$ satisfies (RC) if and only if $x \in \mathbb{R}^n$, $v \in \mathbb{R}^m$ satisfy the single inequality

$$\text{(FRC)} \quad (a^0)^T v + \delta^*(A^T v \mid Z) - f_*(v, x) \le 0, \qquad (12)$$

in which the support function $\delta^*$ and the partial concave conjugate function $f_*$ are defined in (3) and (1), respectively.

Proof Using the definition of indicator functions (2) and using Fenchel duality, we have

$$F(x) := \max_{a \in \mathcal{U}} f(a, x) \qquad (13)$$

$$= \max_{a \in \mathbb{R}^m} \{ f(a, x) - \delta(a \mid \mathcal{U}) \} \qquad (14)$$

$$= \min_{v \in \mathbb{R}^m} \{ \delta^*(v \mid \mathcal{U}) - f_*(v, x) \} \qquad (15)$$

(strong duality holds by regularity), where

$$f_*(v, x) = \inf_{a \in \mathbb{R}^m} \{ a^T v - f(a, x) \},$$

and

$$\delta^*(v \mid \mathcal{U}) = \sup_{\zeta \in Z} \{ a^T v \mid a = a^0 + A\zeta \} \qquad (16)$$

$$= (a^0)^T v + \sup_{\zeta \in Z} v^T A \zeta \qquad (17)$$

$$= (a^0)^T v + \delta^*(A^T v \mid Z) . \qquad (18)$$

By (18) and (15), $F(x) \le 0$ is exactly condition (12).

Remarks

1. In (FRC), the computations involving $f$ are completely independent from those involving $Z$.

2. To derive (FRC) we did not assume $f(a, x)$ to be convex in $x$. However, if $f(a, x)$ is convex in $x$ for all $a \in \mathcal{U}$, then $f_*(v, x)$ is concave in $(v, x)$, so then (FRC) is a convex inequality in $(v, x)$.

3. It is interesting to observe that robustifying a nonlinear constraint may have a “convexification effect”. This is illustrated by the next nominal constraint, which is nonconvex but whose robust counterpart is convex. We consider the following robust counterpart:

$$f(a, x) := \sum_{i=1}^{m} a_i f_i(x) \le b \quad \forall a \in \mathcal{U},$$

where $\mathcal{U} = \{ a \in \mathbb{R}^m \mid \|a - a^0\|_\infty \le \rho \}$, and let $\rho + a^0_i \ge 0$, $i = 1, \dots, m$. We assume that $f_i(x)$, $i = 1, \dots, m$, are convex and $f_i(x) \ge 0$ for all $x$. Suppose (some of) the nominal values $a^0_i$ are negative, which means that the nominal inequality

$$\sum_{i=1}^{m} a^0_i f_i(x) \le b$$

may not be convex. However, in this case the (FRC)

$$\sum_{i=1}^{m} (a^0_i + \rho) f_i(x) \le b$$

is indeed a convex inequality.
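The closed form in Remark 3 can be sanity-checked numerically. The sketch below is only an illustration: the three functions $f_i$, the nominal vector and ρ are made-up values (not from the slides), chosen so that $a^0_i + \rho \ge 0$. It compares a brute-force maximum over the vertices of the box with $\sum_i (a^0_i + \rho) f_i(x)$.

```python
import itertools
import numpy as np

def worst_case_over_box(a0, rho, fvals):
    """Brute-force max of sum_i a_i * fvals[i] over the vertices of |a_i - a0_i| <= rho."""
    best = -np.inf
    for signs in itertools.product((-1.0, 1.0), repeat=len(a0)):
        a = a0 + rho * np.array(signs)
        best = max(best, float(a @ fvals))
    return best

# Hypothetical data: some nominal coefficients are negative, but a0_i + rho >= 0.
a0 = np.array([-0.5, 1.0, -0.2])
rho = 0.5
x = 1.3
fvals = np.array([x ** 2, abs(x), np.exp(x)])   # convex, nonnegative f_i evaluated at x

brute = worst_case_over_box(a0, rho, fvals)     # max over the box (linear in a, so a vertex)
closed_form = float((a0 + rho) @ fvals)         # left-hand side of the (FRC) in Remark 3
```

Because the expression is linear in $a$ and every $f_i(x)$ is nonnegative, the maximum is attained at the vertex $a^0 + \rho \mathbf{1}$, so the two numbers should agree.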

Conjugate Functions, Support Functions and Fenchel Duality

We outline some basic results on conjugate functions, support functions and

Fenchel duality.

We start with some well-known results on conjugate functions. First, note that $f^*$ is closed convex, and $g_*$ is closed concave; moreover, $f^{**} = f$ and $g_{**} = g$. It is well-known that for $a > 0$

$$(af)^*(y) = a f^*\!\left( \frac{y}{a} \right),$$

for $\tilde f(x) = f(ax)$, $a > 0$, we have

$$\tilde f^*(y) = f^*\!\left( \frac{y}{a} \right),$$

and for $\tilde f(x) = f(x - a)$ we have

$$\tilde f^*(y) = f^*(y) + ay .$$

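We frequently use the following sum-rules for conjugate functions.

Lemma 1 Assume that $f_i$, $i = 1, \dots, m$, are convex, and the intersection of the relative interiors of the domains of $f_i$, $i = 1, \dots, m$, is nonempty, i.e., $\cap_{i=1}^{m} \mathrm{ri}(\mathrm{dom}\, f_i) \ne \emptyset$. Then

$$\left( \sum_{i=1}^{m} f_i \right)^{\!*}(s) = \inf_{\{v^i\}_{i=1}^{m}} \left\{ \sum_{i=1}^{m} f_i^*(v^i) \;\middle|\; \sum_{i=1}^{m} v^i = s \right\},$$

and the inf is attained for some $v^i$, $i = 1, \dots, m$.

In particular, let $S_1, \dots, S_k$ be closed convex sets, such that $\cap_i \mathrm{ri}(S_i) \ne \emptyset$, and let $S = \cap_{i=1}^{k} S_i$. Then

$$\delta^*(y \mid S) = \min \left\{ \sum_{i=1}^{k} \delta^*(v^i \mid S_i) \;\middle|\; \sum_{i=1}^{k} v^i = y \right\}. \qquad \Box$$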

We now state three results which are used to derive tractable robust counterparts. The first lemma relates the conjugate of the adjoint function ($f^\diamond(x) = x f(1/x)$, $x > 0$) to the conjugate of the original function. Note that $f^\diamond(x)$ is convex if $f(x)$ is convex. The next proposition can be used in cases where $f^*$ is not available in closed form, but $(f^\diamond)^*$ is available as such.

Lemma 2 For the conjugate of a function $f : \mathbb{R}_+ \to \mathbb{R}$ and the conjugate of its adjoint $f^\diamond$, we have $f^*(s) = \inf \{ y \in \mathbb{R} : (f^\diamond)^*(-y) \le -s \}$. □

The next proposition can be used in cases where $f^{-1}$ is not available in closed form, but $(f^{-1})^*$ is available as such.

Lemma 3 Let $f : \mathbb{R} \to \mathbb{R}$ be strictly increasing and concave. Then, for all $y > 0$

$$(f^{-1})^*(y) = -y f_*\!\left( \frac{1}{y} \right) = -(f_*)^\diamond(y) . \qquad \Box$$

The next proposition gives a useful result related to the conjugate of a function after linear transformations.

Lemma 4 Let $A$ be a linear transformation from $\mathbb{R}^n$ to $\mathbb{R}^m$. Assume there exists an $x$ such that $Ax \in \mathrm{ri}(\mathrm{dom}\, g)$. Then, for each convex function $g$ on $\mathbb{R}^m$, one has

$$(gA)^*(z) = \inf_y \{ g^*(y) \mid A^T y = z \},$$

where for each $z$ the infimum is attained, and where the function $gA$ is defined by

$$(gA)(x) = g(Ax) . \qquad \Box$$

f(t)                              f*(s) (domain)
t                                 0 (s = 1)
t^2                               s^2/4 (s ∈ IR)
|t|^p/p (p > 1)                   |s|^q/q (s ∈ IR)
−t^p/p (t ≥ 0, 0 < p < 1)         −(−s)^q/q (s ≤ 0)
− log t (t > 0)                   −1 − log(−s) (s < 0)
t log t (t > 0)                   e^(s−1) (s ∈ IR)
e^t                               s log s − s (s > 0); 0 (s = 0)
log(1 + e^t)                      s log s + (1 − s) log(1 − s) (0 < s < 1); 0 (s = 0, 1)
√(1 + t^2)                        −√(1 − s^2) (−1 ≤ s ≤ 1)

Some examples for f, with conjugate f*. The parameters p and q are related as follows: 1/p + 1/q = 1.
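A few rows of the table can be spot-checked by evaluating $f^*(s) = \sup_t \{ st - f(t) \}$ on a fine grid. The sketch below is only a numerical illustration; the grid ranges and the test point $s = 0.7$ are arbitrary choices, not part of the slides.

```python
import numpy as np

def conjugate_on_grid(f, s, t_grid):
    """Approximate f*(s) = sup_t { s*t - f(t) } by maximizing over a finite grid."""
    return float(np.max(s * t_grid - f(t_grid)))

t_all = np.linspace(-20.0, 20.0, 400001)     # grid for functions defined on all of R
t_pos = np.linspace(1e-6, 20.0, 400001)      # grid for functions defined on t > 0

s = 0.7
quad = conjugate_on_grid(lambda t: t ** 2, s, t_all)            # table: s^2 / 4
entr = conjugate_on_grid(lambda t: t * np.log(t), s, t_pos)     # table: e^(s-1)
expo = conjugate_on_grid(np.exp, s, t_all)                      # table: s log s - s
```

The grid maximum slightly underestimates the true supremum, so agreement is only expected up to a small discretization error.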

Example (Transformed uncertainty region). Suppose $f(a, x) = a^T x - \beta$ and the uncertainty region $\mathcal{U}$ is defined as follows:

$$\mathcal{U} = \left\{ a \mid h(a) = (h_1(a_1), \dots, h_m(a_m))^T \in \tilde{\mathcal{U}} \right\},$$

where $h_i(\cdot)$ is convex for each $i$, and moreover we assume that $h_i^{-1}$ exists for all $i$. By substituting $a = h^{-1}(\tilde a)$, we obtain for (RC)

$$h^{-1}(\tilde a)^T x \le \beta \quad \forall \tilde a \in \tilde{\mathcal{U}} .$$

The corresponding (FRC) becomes:

$$(a^0)^T v + \delta^*(A^T v \mid Z) + \sum_{i=1}^{n} x_i \left( h_i^{-1} \right)^{*}\!(v_i / x_i) \le \beta .$$

Using the result on the conjugate of $h^{-1}$, we finally get:

$$(a^0)^T v + \delta^*(A^T v \mid Z) + \sum_{i=1}^{n} v_i (h_i)^{*}(x_i / v_i) \le \beta .$$

The result shows that even if we cannot compute a closed form for $h_i^{-1}$, we can still construct the robust counterpart. As an example, take $h_i(t) = -(t + \log t)$. There is no closed form for $h_i^{-1}$, but there is an explicit formula for the conjugate of $h_i$:

$$h_i^*(s) = -1 - \log(-(s + 1)) . \qquad \Box$$

Finally, we give some examples in which f(a, x) cannot be written as f(a)Tg(x),

but still f∗(v, x) can be computed.

Nonconcave uncertainty

So far, it was assumed that $f(a, x)$ is concave in $a$ for each $x$. If this assumption does not hold, one may try to reformulate the (RC) problem to regain convexity. We describe different ways for such a reformulation:

Reparametrizing and computing convex hull. Suppose $f(a, x)$ can be written as $f(a)^T g(x)$, where $f(a)$ is not necessarily concave and/or $g(x)$ may attain positive and negative values. Let us, for ease of notation, also assume that $m = L$, $A = I$, and $a^0 = 0$. By the parametrization $b = f(a)$, (RC) becomes

$$b^T g(x) \le 0 \quad \forall (a, b) \in \tilde{\mathcal{U}},$$

where $\tilde{\mathcal{U}} = \{ (a, b) \mid a \in \mathcal{U}, \; b = f(a) \}$. Since the left-hand side of this constraint is linear in the uncertain parameter $b$, we may replace $\tilde{\mathcal{U}}$ by $\mathrm{conv}(\tilde{\mathcal{U}})$. Hence, if we can compute $\mathrm{conv}(\tilde{\mathcal{U}})$, we can apply Theorem 2.

A well-known example is quadratic uncertainty and an ellipsoidal uncertainty region. For this case, it has been proved that $\mathrm{conv}(\tilde{\mathcal{U}})$ can be formulated as an LMI. The resulting robust counterpart is therefore a system of LMIs.

Globalized Robust Optimization

Motivation

Idea Globalized Robust Optimization

[Figure: inner region and outer region; labels: small constraint violation allowed, large constraint violation allowed, constrained violation, unbounded]

• Inner uncertainty region: full feasibility as in classical RO
• Allow restricted constraint violations for parameters in the outer uncertainty region
• Allowed violation depends on distance to the inner region

Ruud Brekelmans (Tilburg University), Globalized Robust Optimization, SIAM OP 2014

Globalized Robust Counterparts

Globalized Robust Counterpart

$$f(a, x) \le \min_{a' \in \mathcal{U}_1} \phi(a, a') \quad \forall a \in \mathcal{U}_2 \qquad (GRC)$$

Uncertainty regions

• $\mathcal{U}_1$: inner uncertainty region (convex and compact)
• $\mathcal{U}_2$: outer uncertainty region (convex)
• $Z_1$ and $Z_2$: corresponding primitive uncertainty regions

Violation measure

• $\phi(a, a')$: “distance” between $a$ and $a'$
• nonnegative and jointly convex in both arguments
• $\phi(a, a) = 0$ for all $a \in \mathbb{R}^m$

Examples:

• Norm based: $\phi(a, a') = \theta \cdot \|a - a'\|$ or $\phi(a, a') = \theta \cdot \|a - a'\|^2$
• Phi-divergence distances

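Example: Simple Case

Linear constraint, violation linear in distance

$$a^T x - b \le \min_{a' \in \mathcal{U}_1} \theta \cdot \|a - a'\|_2 \quad \forall a \in \mathcal{U}_2$$

with

$$\mathcal{U}_i = \left\{ a = a^0 + A\zeta \mid \zeta \in Z_i \right\}, \quad i = 1, 2,$$

$$Z_1 = \left\{ \zeta \in \mathbb{R}^L \mid \|\zeta\|_1 \le \rho_1 \right\}, \qquad Z_2 = \left\{ \zeta \in \mathbb{R}^L \mid \|\zeta\|_\infty \le \rho_2 \right\},$$

where $0 < \rho_1 < \rho_2$.

[Figure: the regions $\mathcal{U}_1 \subset \mathcal{U}_2$ around the nominal point $a^0$]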

Example: Simple Case (continued)

Original GRC

$$a^T x - b \le \min_{a' \in \mathcal{U}_1} \theta \cdot \|a - a'\|_2 \quad \forall a \in \mathcal{U}_2$$

GRC theorem

$$a_0^T (v + w) + \delta^*(A^T v \mid Z_1) + \delta^*(A^T w \mid Z_2) - f_*(v + w, x) + \phi^*(v; -v) \le 0$$

Tractable GRC

$$a_0^T (v + w) + \rho_1 \|A^T v\|_\infty + \rho_2 \|A^T w\|_1 - b + 0 \le 0, \qquad v + w = x, \qquad \|v\|_2 \le \theta$$

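Theorem 1 Let $f(\cdot, x)$ be a concave function in $\mathbb{R}^m$ for all $x \in \mathbb{R}^n$, and $\phi : \mathbb{R}^m \times \mathbb{R}^m \to \mathbb{R}$ a convex and nonnegative function for which $\phi(a, a) = 0$ for all $a \in \mathbb{R}^m$. Let the set $Z_1 \subset \mathbb{R}^L$ be nonempty, convex, and compact with $0 \in \mathrm{ri}(Z_1)$, let $Z_2$ be a convex set such that $Z_1 \subset Z_2$, and let the sets $\mathcal{U}_1$ and $\mathcal{U}_2$ be defined by

$$\mathcal{U}_i = \{ a = a_0 + A\zeta \mid \zeta \in Z_i \}, \quad i = 1, 2,$$

where $a_0 \in \mathbb{R}^m$ and $A \in \mathbb{R}^{m \times L}$.

Then the vector $x \in \mathbb{R}^n$ satisfies the GRC

$$f(a, x) \le \min_{a' \in \mathcal{U}_1} \phi(a, a') \quad \forall a \in \mathcal{U}_2, \qquad (1)$$

if and only if there exist $v, w \in \mathbb{R}^m$ that satisfy the single inequality

$$a_0^T (v + w) + \delta^*(A^T v \mid Z_1) + \delta^*(A^T w \mid Z_2) - f_*(v + w, x) + \phi^*(v; -v) \le 0 . \qquad (2)$$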

Simplification of the Robust Counterparts

for Conic Quadratic Optimization

F A C U L T Y O F E C O N O M I C S A N D B U S I N E S S A D M I N I S T R A T I O N

Inhomogeneous S-lemma

Lemma 1 Let $A$, $D$ be symmetric matrices of the same size, and let the quadratic form $z^T A z + 2b^T z + c$ be strictly positive at some point. Then the implication

$$z^T A z + 2b^T z + c \ge 0 \;\Rightarrow\; z^T D z + 2e^T z + f \ge 0$$

holds true if and only if

$$\exists \lambda \ge 0 : \quad \begin{pmatrix} D - \lambda A & e - \lambda b \\ e^T - \lambda b^T & f - \lambda c \end{pmatrix} \succeq 0 . \qquad (3)$$


However, although theoretically speaking (6) is tractable, computationally speaking it

is not!

Aim: to show that the RC is a conic quadratic programming problem.

Core problem: find a tractable robust counterpart for:

aT Da + 2aT d ≤ γ ∀ a : aT Aa ≤ ρ2.

Simultaneous diagonalizability

Definition 2 Real symmetric matrices $A$ and $B$ are called simultaneously diagonalizable if there exists a nonsingular matrix $S$ such that both $S^T A S$ and $S^T B S$ are diagonal.

The following theorem, proved by Uhlig (1972), gives a sufficient condition for simultaneous diagonalizability.

Theorem 3 Real symmetric matrices $A$ and $B$ can be simultaneously diagonalized if

$$\{ x \mid x^T A x = 0 \} \cap \{ x \mid x^T B x = 0 \} = \{ 0 \} .$$

Note that if one of the matrices $A$ and $B$ is positive definite, then these two matrices can be simultaneously diagonalized.

How to calculate S?

Suppose $B$ is positive definite, then:

1. compute the Cholesky factorization $B = GG^T$
2. compute $C = G^{-1} A G^{-T}$
3. use the symmetric QR-algorithm to compute the Schur decomposition $Q^T C Q = \mathrm{diag}(\alpha_1, \dots, \alpha_n)$
4. set $S = G^{-T} Q$.

We now have $S^T B S = I$ and $S^T A S = \mathrm{diag}(\alpha_1, \dots, \alpha_n)$, i.e., matrices $A$ and $B$ can be diagonalized by $S$.
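The four steps can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: it uses `numpy.linalg.eigh` (a LAPACK symmetric eigensolver) in place of an explicit symmetric QR iteration, and the test matrices are random, made-up data.

```python
import numpy as np

def simultaneous_diagonalize(A, B):
    """Return (S, alpha) with S.T @ B @ S = I and S.T @ A @ S = diag(alpha); B must be positive definite."""
    G = np.linalg.cholesky(B)            # step 1: B = G G^T, G lower triangular
    Ginv = np.linalg.inv(G)
    C = Ginv @ A @ Ginv.T                # step 2: C = G^{-1} A G^{-T}
    C = 0.5 * (C + C.T)                  # symmetrize to remove rounding asymmetry
    alpha, Q = np.linalg.eigh(C)         # step 3: Q^T C Q = diag(alpha)
    S = Ginv.T @ Q                       # step 4: S = G^{-T} Q
    return S, alpha

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = M + M.T                              # symmetric, possibly indefinite
N = rng.standard_normal((4, 4))
B = N @ N.T + 4.0 * np.eye(4)            # positive definite

S, alpha = simultaneous_diagonalize(A, B)
```

For badly conditioned $B$, solving triangular systems instead of forming `Ginv` explicitly would likely be more accurate; the explicit inverse is kept here only to mirror the steps as written.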

Nonconvex quadratic problems (with one constraint)

We consider the following optimization problem:

$$(P) \quad \min \; \tfrac12 z^T D z + e^T z \quad \text{s.t.} \quad \tfrac12 z^T A z + b^T z + c \le 0,$$

where $D, A \in \mathbb{R}^{n \times n}$ are symmetric, $z, b, e \in \mathbb{R}^n$, and $c \in \mathbb{R}$. We assume that $A$ and $D$ can be simultaneously diagonalized: there exists a nonsingular $S$ such that $S^T A S = \mathrm{diag}(\alpha_1, \dots, \alpha_n)$ and $S^T D S = \mathrm{diag}(\delta_1, \dots, \delta_n)$.

Using the change of variables $z = Sx$ and change of parameters $\beta = S^T b$, $\varepsilon = S^T e$, and by setting $y_i = \tfrac12 x_i^2$, we can rewrite problem (P) as follows:

$$(P1) \quad \min \; \delta^T y + \varepsilon^T x \quad \text{s.t.} \quad \alpha^T y + \beta^T x + c \le 0, \quad \tfrac12 x_i^2 - y_i = 0 \;\; \forall i .$$

We consider the following convex relaxation of (P1):

$$(P2) \quad \min \; \delta^T y + \varepsilon^T x \quad \text{s.t.} \quad \alpha^T y + \beta^T x + c \le 0, \quad \tfrac12 x_i^2 - y_i \le 0 \;\; \forall i .$$

The following theorem shows the equivalence of (P1) and (P2).

Theorem 4 Assume that there exists a strictly feasible solution to (P). Then, if there exists an optimal solution to (P2) then there exists an optimal solution to (P1).

Hence, the (probably nonconvex) quadratic problem (P) can be solved by solving a convex quadratic optimization problem (P2).

Example of nonconvex quadratic problems

$$\min \; -\tfrac12 z_1^2 - \tfrac12 z_2^2 - z_2 \quad \text{s.t.} \quad z_1^2 + \tfrac12 z_2^2 + z_2 \le 1 .$$

The corresponding problem (P2) is:

$$\min \; -y_1 - y_2 - x_2 \quad \text{s.t.} \quad 2y_1 + y_2 + x_2 \le 1, \quad \tfrac12 x_1^2 - y_1 \le 0, \quad \tfrac12 x_2^2 - y_2 \le 0 .$$

The optimal objective value of this problem is −1, with optimal solution $x_1^* = 0$, $x_2^* = 0$, $y_1^* = 0$, $y_2^* = 1$, and KKT multipliers $u = 1$, $\mu_1 = 1$, $\mu_2 = 0$. This solution clearly does not satisfy $y_i^* = \tfrac12 (x_i^*)^2$.

However, such a solution is given by $x_1 = 0$, $x_2 = -1 \pm \sqrt{3}$, $y_1 = 0$, $y_2 = 2 \mp \sqrt{3}$, with objective value −1.
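The numbers in this example are easy to verify by direct substitution. The following sketch only re-checks the arithmetic: feasibility for (P2), the objective values, the relation $y_2 = \tfrac12 x_2^2$ for the second solution, and stationarity in $y$ for the stated multipliers. The helper names are illustrative.

```python
import math

def objective(x, y):
    """Objective of (P2) for the example: -y1 - y2 - x2."""
    return -y[0] - y[1] - x[1]

def feasible_p2(x, y, tol=1e-9):
    """Check the three constraints of (P2) for the example."""
    return (2 * y[0] + y[1] + x[1] <= 1 + tol
            and 0.5 * x[0] ** 2 - y[0] <= tol
            and 0.5 * x[1] ** 2 - y[1] <= tol)

relaxed = ((0.0, 0.0), (0.0, 1.0))                  # (x*, y*) reported for (P2)
x2 = -1.0 + math.sqrt(3.0)
tight = ((0.0, x2), (0.0, 2.0 - math.sqrt(3.0)))    # solution with y_i = x_i^2 / 2

# Stationarity of the Lagrangian in (y1, y2) with u = 1, mu1 = 1, mu2 = 0
u, mu1, mu2 = 1.0, 1.0, 0.0
grad_y = (-1 + 2 * u - mu1, -1 + u - mu2)
```

The lower bound −1 also follows by hand: since $y_1 \ge 0$, the objective is at least $-(2y_1 + y_2 + x_2) \ge -1$.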

Dual problem of (P2)

One can even simplify problem (P2) by taking the dual.

Theorem 6 Suppose there exists a strictly feasible solution to (P2). Then the objective values of (P2) and the following dual problem are equal:

$$(D2) \quad \max_{v \in \mathbb{R}} \; -\sum_i \frac{(v\beta_i + \varepsilon_i)^2}{2(\delta_i + v\alpha_i)} + cv \quad \text{s.t.} \quad \delta_i + v\alpha_i \ge 0 \;\; \forall i, \quad v \ge 0 .$$

Note: the objective of (D2) is concave in v.
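As a quick check, (D2) can be evaluated on the data of the nonconvex example above, where $D = \mathrm{diag}(-1, -1)$, $e = (0, -1)$, $A = \mathrm{diag}(2, 1)$, $b = (0, 1)$, $c = -1$, so that $S = I$. The sketch below is illustrative only; in particular, treating a term with zero numerator and zero denominator as 0 is an assumption made here, not something stated on the slide.

```python
def dual_objective(v, alpha, beta, delta, eps, c):
    """Objective of (D2); a 0/0 term is taken as 0 (assumed convention)."""
    total = c * v
    for a, b, d, e in zip(alpha, beta, delta, eps):
        num = (v * b + e) ** 2
        den = 2.0 * (d + v * a)
        if den < 0:
            return float("-inf")      # violates delta_i + v * alpha_i >= 0
        if den == 0:
            if num > 0:
                return float("-inf")
            continue                  # 0/0 term contributes nothing
        total -= num / den
    return total

# Data of the example: alpha, beta, delta, eps are the diagonals/vectors above.
alpha, beta, delta, eps, c = (2.0, 1.0), (0.0, 1.0), (-1.0, -1.0), (0.0, -1.0), -1.0
values = {v: dual_objective(v, alpha, beta, delta, eps, c) for v in (1.0, 1.5, 2.0)}
```

Feasibility requires $v \ge 1$, and on that range the objective equals $-(v - 1)/2 - v$, which is decreasing; the maximum is therefore attained at $v = 1$ with value −1, matching the optimal value of (P2).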

Sharpening the S-lemma in the case of SDG

We start with the inhomogeneous version of the fundamental S-lemma.

Lemma 7 Let $A$, $D$ be symmetric matrices of the same size, and let the quadratic form $z^T A z + 2b^T z + c$ be strictly positive at some point. Then the implication

$$z^T A z + 2b^T z + c \ge 0 \;\Rightarrow\; z^T D z + 2e^T z + f \ge 0$$

holds true if and only if

$$\exists \lambda \ge 0 : \quad \begin{pmatrix} D - \lambda A & e - \lambda b \\ e^T - \lambda b^T & f - \lambda c \end{pmatrix} \succeq 0 . \qquad (10)$$

In case the matrices $A$ and $D$ are simultaneously diagonalizable we can sharpen the S-lemma, i.e., the LMI can be replaced by a simple convex constraint.

Lemma 8 Let $A$, $D$ be symmetric matrices of the same size and simultaneously diagonalizable by $S$ into $\mathrm{diag}(\alpha_1, \dots, \alpha_n)$ and $\mathrm{diag}(\delta_1, \dots, \delta_n)$, respectively. Let the quadratic form $z^T A z + 2b^T z + c$ be strictly positive at some point. Then the implication

$$z^T A z + 2b^T z + c \ge 0 \;\Rightarrow\; z^T D z + 2e^T z + f \ge 0$$

holds true if and only if there exists $v \in \mathbb{R}$ such that

$$-\sum_i \frac{(v\beta_i - \varepsilon_i)^2}{\delta_i - v\alpha_i} - cv + f \ge 0, \qquad \delta_i - v\alpha_i \ge 0 \;\; \forall i, \qquad v \ge 0,$$

in which $\beta = S^T b$ and $\varepsilon = S^T e$.

Applications - General

There are also many examples of optimization problems that are not conic

quadratic, but can be reformulated as such.

See the excellent paper by Lobo et al. (2009).

Examples: max of norms problems, logarithmic Chebychev approximation,

quadratic/linear fractional problems.

The robust counterpart of the original problem and the reformulated conic

quadratic problem are not always equivalent.

E.g., for the sum of norms problem mentioned the reformulation does not allow

you to use implementation error.

Conclusions

In case of SDG, the S-lemma can be sharpened, and be extended to the case

of three quadratic forms.

In case of SDG, a convex quadratic constraint with ellipsoidal uncertainty can be

transformed into a conic quadratic constraint.

In case of SDG, a conic quadratic constraint with ellipsoidal uncertainty can be

transformed into a ’nearly’ conic quadratic constraint, with convex level sets.

This has many applications!

Robust Optimization and Chance Constraints

Chance Constraints

$$p(w) \equiv \mathrm{Prob}\left\{ w_0 + \sum_{\ell=1}^{d} z_\ell w_\ell \ge 0 \right\} \ge 1 - \epsilon \qquad (C)$$

• In general, (C) can be difficult to process:

– The feasible set $X$ of (C) can be nonconvex, which makes it problematic to optimize under the constraint.

– Even when convex, $X$ can be “computationally intractable”: Let $z \sim \mathrm{Uniform}([0, 1]^d)$. In this case, $X$ is convex (Lagoa et al., 2005); however, unless P = NP, there is no algorithm capable of computing $p(w)$ within accuracy δ in time polynomial in the size of the (rational) data $w$ and in $\ln(1/\delta)$ (L. Khachiyan, 1989).

• When (C) is difficult to process “as it is”, one can look for a safe tractable approximation of (C): a computationally tractable convex set $U_\epsilon$ such that $U_\epsilon \subset X \equiv \{ w : p(w) \ge 1 - \epsilon \}$.

Probabilistic Guarantees via RO

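$$f_0(x) + \sum_{l=1}^{d} z_l f_l(x) \le 0 . \qquad (1)$$

Assumption

$z_1, z_2, \dots, z_d$ are independent random variables, $z_l \sim P_l \in \mathcal{P}_l$ ($\mathcal{P}_l$ compact; all probability distributions in $\mathcal{P}_l$ have common support $[-1, 1]$).

Definition A vector $x$ satisfying, for a given $0 < \epsilon < 1$:

$$\Pr\left\{ f_0(x) + \sum_l z_l f_l(x) \le 0 \right\} \ge 1 - \epsilon \quad \text{(chance constraint)} \qquad (2)$$

provides a safe approximation of (1).

Challenge Find an uncertainty set $U_\epsilon$ for $z$ such that the Robust Counterpart of (1):

$$f_0(x) + \sum_l z_l f_l(x) \le 0 \quad \forall z \in U_\epsilon \qquad (3)$$

is a safe approximation of (1), i.e., every $x$ satisfying (3) satisfies the CC (2).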

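Theorem

$$U_\epsilon = B \cap (M + E_\epsilon) \qquad (4)$$

where

$$B = \{ u \in \mathbb{R}^d \mid \|u\|_\infty \le 1 \},$$

$$M = \{ u \mid \mu_l^- \le u_l \le \mu_l^+, \; l = 1, \dots, d \},$$

$$E_\epsilon = \left\{ u \;\middle|\; \sum_l u_l^2 / \sigma_l^2 \le 2 \log(1/\epsilon) \right\},$$

and $\mu_l^-$, $\mu_l^+$ and $\sigma_l$ are such that

$$A_l(y) \le \max(\mu_l^- y, \mu_l^+ y) + \frac{\sigma_l^2}{2} y^2 \quad \forall l = 1, \dots, d,$$

where

$$A_l(y) = \max_{P_l \in \mathcal{P}_l} \log\left( \int \exp(ys) \, dP_l(s) \right) .$$

Values of $\mu_l^\pm$, $\sigma_l$ are explicitly known for various families $\mathcal{P}_l$, e.g.

[Table: values of $\mu_l^\pm$ and $\sigma_l$ for two families: $P$ unimodal with $\mathrm{supp}\, P \subset [-1, 1]$; $P$ symmetric and unimodal with $\mathrm{supp}\, P \subset [-1, 1]$]

Moreover, for the LP case ($f_l$ affine) the RC of (3), with $U$ as in the Theorem, is conic quadratic or LP.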

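$$a(\zeta)^T x \le b(\zeta) \qquad (1)$$

$$a(\zeta) = a^0 + \sum_{\ell=1}^{L} \zeta_\ell a^\ell, \qquad b(\zeta) = b^0 + \sum_{\ell=1}^{L} \zeta_\ell b^\ell$$

$\zeta_1, \dots, \zeta_L$ i.i.d., $E(\zeta_\ell) = 0$, $|\zeta_\ell| \le 1$

$$(CC)_\epsilon \quad \mathrm{Prob}_\zeta \left( a(\zeta)^T x \le b(\zeta) \right) \ge 1 - \epsilon$$

Let $U_\Omega = \{ \zeta \in \mathbb{R}^L \mid \|\zeta\|_2 \le \Omega \}$.

Consider the RC of (1) w.r.t. $U_\Omega$:

$$a(\zeta)^T x \le b(\zeta) \quad \forall \zeta \in U_\Omega$$

which we already know is equivalent to

$$(RC)_\Omega \quad (a^0)^T x + \Omega \sqrt{\sum_{\ell=1}^{L} \left( (a^\ell)^T x - b^\ell \right)^2} \le b^0$$

Theorem 1 If $x$ solves $(RC)_\Omega$ with $\Omega \ge \sqrt{2 \log(1/\epsilon)}$, then $x$ solves $(CC)_\epsilon$.

OR: if $x$ solves $(RC)_\Omega$, then $x$ solves $(CC)_\epsilon$ with $\epsilon < e^{-\Omega^2/2}$,

e.g., $\Omega = 7.44$, $1 - \epsilon = 1 - 10^{-12}$.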
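The relation between the safety parameter Ω and the risk level ε is a one-line computation. The sketch below evaluates it in both directions; the function names are illustrative, and ε = 0.5% is used only as a sample input.

```python
import math

def omega_for_risk(eps):
    """Smallest Omega with exp(-Omega^2 / 2) <= eps, i.e. sqrt(2 ln(1/eps))."""
    return math.sqrt(2.0 * math.log(1.0 / eps))

def risk_bound(omega):
    """Bound exp(-Omega^2 / 2) on the probability of violating the constraint."""
    return math.exp(-omega ** 2 / 2.0)

omega_half_percent = omega_for_risk(0.005)   # roughly 3.255
bound_744 = risk_bound(7.44)                 # slightly below 1e-12
```

The bound decays very quickly in Ω, which is why a moderate value such as 7.44 already gives a violation probability of order $10^{-12}$.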

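Discussion What if we ignore the stochastic information and just use $|\zeta_\ell| \le 1$? In this case, the CC coincides with the RC of the linear inequality with uncertainty set

$$U_{Box} = \{ \zeta \mid |\zeta_\ell| \le 1, \; \ell = 1, \dots, L \}$$

which is here

$$\sum_\ell \left| (a^\ell)^T x - b^\ell \right| \le b^0 - (a^0)^T x \qquad (3)$$

Hence

$$x \text{ feasible for (3)} \;\Rightarrow\; \text{CC is feasible with prob. } 1 . \qquad (4)$$

When using the stochastic information, the RC is

$$\Omega \sqrt{\sum_\ell \left[ (a^\ell)^T x - b^\ell \right]^2} \le b^0 - (a^0)^T x \qquad (5)$$

which corresponds to $U_{Ball}$, and we have

$$x \text{ feasible for (5)} \;\Rightarrow\; \text{CC is feasible with prob. } 1 - \exp(-\Omega^2/2) \qquad (6)$$

(For $\Omega = 7.44$, $1 - \exp(-\Omega^2/2) = 1 - 10^{-12}$, so (4) and (6) are indistinguishable!)

But, for large $L$,

$$\frac{\mathrm{Vol}(U_{Box})}{\mathrm{Vol}(U_{Ball})} \to \infty \quad \text{superexponentially}$$

(ratio is > 1 starting with $L = 237$).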

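For small $L$, we could use

$$U = U_{Box} \cap U_{Ball} = \{ \zeta \in \mathbb{R}^L \mid \|\zeta\|_\infty \le 1, \; \|\zeta\|_2 \le \Omega \}$$

Here the RC is

$$\sum_\ell |z_\ell| + \Omega \sqrt{\sum_\ell w_\ell^2} \le b^0 - (a^0)^T x, \qquad z_\ell + w_\ell = b^\ell - (a^\ell)^T x \qquad (7)$$

and we have

$$x \text{ is a component of a feasible solution } (x, z, w) \text{ of (7)} \;\Rightarrow\; \text{CC is feasible with prob. } 1 - \exp(-\Omega^2/2) \qquad (8)$$

Discussion With $U = U_{Box} \cap U_{Ball}$, we always (for all $L$) get a less conservative RC (incomparably so for $\Omega > 7$, compared to $U_{Box}$) and yet guarantee the CC with probability essentially 1.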

Striking phenomenon Consider the special case of P in Case I:

$$\Pr(\zeta_i = \pm 1) = 1/2 .$$

Note that here

$$\|\zeta\|_2^2 = L ;$$

hence, if $L > \Omega^2$, the set $U$ does not contain even a single realization of ζ, and yet the CC holds with high probability.

Conclusion The “immunization power” of the RC (7) cannot be explained by the fact that the underlying perturbation set $U$ contains “nearly all” realizations of the random perturbation vector.
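A small Monte Carlo experiment makes the phenomenon concrete. The sketch below is an illustration under made-up parameters ($L = 100$, $\Omega = 3.255$, 20000 samples, all coefficients equal to 1): it counts how many ±1 realizations fall inside the ball of radius Ω and estimates how often a constraint that is tight for the ball RC is violated.

```python
import numpy as np

rng = np.random.default_rng(1)
L, n_samples = 100, 20000                   # hypothetical sizes for the experiment
omega = 3.255                               # exp(-omega^2 / 2) is about 0.005

zeta = rng.choice([-1.0, 1.0], size=(n_samples, L))   # Pr(zeta_l = +1) = Pr(zeta_l = -1) = 1/2
norms = np.linalg.norm(zeta, axis=1)        # every realization has norm sqrt(L) = 10
inside_ball = int(np.sum(norms <= omega))   # realizations that lie in the ball

c = np.ones(L)                              # plays the role of (a^l)^T x - b^l
margin = omega * np.linalg.norm(c)          # slack when the ball RC holds with equality
violations = float(np.mean(zeta @ c > margin))   # empirical violation frequency
bound = float(np.exp(-omega ** 2 / 2))
```

No sampled realization lies in the ball, since each has norm $\sqrt{L} = 10 > \Omega$, yet the observed violation frequency should stay below the bound $\exp(-\Omega^2/2)$.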

Illustration: Single-Period Portfolio Selection

There are 200 assets. Asset #200 (“money in the bank”) has yearly return $r_{200} = 1.05$ and zero variability. The yearly returns $r_\ell$, $\ell = 1, \dots, 199$, of the remaining assets are independent random variables taking values in the segments $[\mu_\ell - \sigma_\ell, \mu_\ell + \sigma_\ell]$ with expected values $\mu_\ell$; here

$$\mu_\ell = 1.05 + 0.3 \, \frac{200 - \ell}{199}, \qquad \sigma_\ell = 0.05 + 0.6 \, \frac{200 - \ell}{199}, \qquad \ell = 1, \dots, 199 .$$

The goal is to distribute $1 between the assets in order to maximize the return of the resulting portfolio, the required risk level being ε = 0.5%. We want to solve the uncertain LO problem

$$\max_{y, t} \left\{ t : \; \sum_{\ell=1}^{199} r_\ell y_\ell + r_{200} y_{200} - t \ge 0, \;\; \sum_{\ell=1}^{200} y_\ell = 1, \;\; y_\ell \ge 0 \; \forall \ell \right\},$$

where $y_\ell$ is the capital to be invested into asset #ℓ.

The uncertain data are the returns $r_\ell$, $\ell = 1, \dots, 199$; their natural parameterization is

$$r_\ell = \mu_\ell + \sigma_\ell \zeta_\ell,$$

where $\zeta_\ell$, $\ell = 1, \dots, 199$, are independent random perturbations with zero mean varying in the segments $[-1, 1]$. Setting $x = [y; -t] \in \mathbb{R}^{201}$, the problem becomes

$$\begin{array}{ll}
\text{minimize} & x_{201} \\
\text{subject to} & (a) \;\; \left[ a^0 + \sum_{\ell=1}^{199} \zeta_\ell a^\ell \right]^T x - \left[ b^0 + \sum_{\ell=1}^{199} \zeta_\ell b^\ell \right] \le 0 \\
& (b) \;\; \sum_{\ell=1}^{200} x_\ell = 1 \\
& (c) \;\; x_\ell \ge 0, \;\; \ell = 1, \dots, 200
\end{array} \qquad (4)$$

where

$$a^0 = [-\mu_1; -\mu_2; \dots; -\mu_{199}; -r_{200}; -1]; \qquad a^\ell = \sigma_\ell \cdot [0_{\ell-1,1}; 1; 0_{201-\ell,1}], \; \ell = 1, \dots, 199; \qquad b^\ell = 0, \; \ell = 0, 1, \dots, 199 .$$

The only uncertain constraint in the problem is the linear inequality (a). We consider 3 perturbation sets along with the associated robust counterparts of problem (4).

1. Box RC, which ignores the information on the stochastic nature of the perturbations affecting the uncertain inequality and uses only the fact that these perturbations vary in $[-1, 1]$. The underlying perturbation set $Z$ for (a) is $\{ \zeta : \|\zeta\|_\infty \le 1 \}$;

2. Ball-Box with the safety parameter $\Omega = \sqrt{2 \ln(1/\epsilon)} = 3.255$, which ensures that the optimal solution of the associated RC (a CQ problem) satisfies (a) with probability at least $1 - \epsilon = 0.995$. The underlying perturbation set $Z$ for (a) is $\{ \zeta : \|\zeta\|_\infty \le 1, \; \|\zeta\|_2 \le 3.255 \}$;

Results

Box RC. The associated RC is the LP

$$\max_{y, t} \left\{ t : \; \sum_{\ell=1}^{199} (\mu_\ell - \sigma_\ell) y_\ell + 1.05 \, y_{200} \ge t, \;\; \sum_{\ell=1}^{200} y_\ell = 1, \;\; y \ge 0 \right\};$$

as should be expected, this is nothing but the instance of our uncertain problem corresponding to the worst possible values $r_\ell = \mu_\ell - \sigma_\ell$, $\ell = 1, \dots, 199$, of the uncertain returns. Since these values are less than the guaranteed return for money, the robust optimal solution prescribes to keep our initial capital in the bank with guaranteed yearly return 1.05.
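The claim that every worst-case return is below the bank rate can be checked directly from the formulas for $\mu_\ell$ and $\sigma_\ell$. The following is a short sketch of that check; the variable names are illustrative.

```python
import numpy as np

ell = np.arange(1, 200)                       # risky assets 1..199
mu = 1.05 + 0.3 * (200 - ell) / 199           # expected returns
sigma = 0.05 + 0.6 * (200 - ell) / 199        # half-widths of the return segments
worst = mu - sigma                            # worst-case returns used by the Box RC
best_worst_case = float(worst.max())          # largest worst-case return among risky assets
```

Since $\mu_\ell - \sigma_\ell = 1 - 0.3 (200 - \ell)/199$, the largest worst-case return is attained at $\ell = 199$ and is slightly below 1, hence below 1.05.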

Ball-Box RC. The associated RC is the conic quadratic problem

$$\max_{y, z, w, t} \left\{ t : \; \sum_{\ell=1}^{199} \mu_\ell y_\ell + 1.05 \, y_{200} - \sum_{\ell=1}^{199} |z_\ell| - 3.255 \sqrt{\sum_{\ell=1}^{199} w_\ell^2} \ge t, \;\; z_\ell + w_\ell = y_\ell, \; \ell = 1, \dots, 199, \;\; \sum_{\ell=1}^{200} y_\ell = 1, \;\; y \ge 0 \right\}.$$

The robust optimal value is 1.1200, meaning 12.0% profit with risk as low as ε = 0.5%.

Robust Solution of Uncertain Optimization

Problems under Ambiguous Stochastic Data

Aharon Ben-Tal

Technion—Israel Institute of Technology

Joint work with K. Postek, D. den Hertog and B. Melenberg

Deterministic optimization problem ($z$ known parameters)

$$\min_{x \in X} f(x, z) \quad \text{s.t.} \quad g(x, z) \le 0 .$$

What if $z$ is uncertain?

Case I $z$ is only known to reside in a bounded set $U$ (called uncertainty set).

Approach: Robust optimization

$$\min_{x \in X} \max_{z \in U} f(x, z) \quad \text{s.t.} \quad \max_{z \in U} g(x, z) \le 0$$

For tractability it is needed that $f$ and $g$ are convex in $x$ and concave in $z$.

Case II $z$ is a random vector with known distribution function $P$.

Approach

$$\min_{x \in X} E_P f(x, z) \quad \text{s.t.} \quad \mathrm{Prob}_P \{ g(x, z) \le 0 \} \ge 1 - \epsilon$$

Difficulties: The objective function involves multiple integration. The constraint is usually NP-hard (nonconvex).

Case III $z$ is stochastic but its distribution $P$ is not known exactly. It is only known to belong to a family $\mathcal{P}$ of distributions.

Approach

$$\min_{x \in X} \max_{P \in \mathcal{P}} E_P f(x, z) \quad \text{s.t.} \quad \min_{P \in \mathcal{P}} \mathrm{Prob}_P \{ g(x, z) \le 0 \} \ge 1 - \epsilon$$

This is Robust Optimization under Stochastic Ambiguity.

Basic theory

Ambiguous constraints

Consider a constraint

$$f(x, z) \le 0,$$

where

• $x \in \mathbb{R}^{n_x}$ is the decision vector
• $z \in \mathbb{R}^{n_z}$ is an uncertain parameter vector
• $f(\cdot, z)$ is convex for all $z$.

Assume that $z$ has a stochastic nature:

• $z$ follows a probability distribution $P$
• $P$ belongs to an ambiguity set $\mathcal{P}$, based on some partial information

Postek et al. (2015)

Distributionally Robust Optimization

Two types of constraints:

• worst-case expected feasibility constraints:

$$\sup_{P \in \mathcal{P}} E_P f(x, z) \le 0, \qquad \text{(WC-EF)}$$

• worst-case chance constraints:

$$\sup_{P \in \mathcal{P}} P(f(x, z) > 0) \le \epsilon . \qquad \text{(WC-CC)}$$

(WC-EF) is used to construct safe approximations of (WC-CC).


Ambiguity set $\mathcal{P}$

The ambiguity set $\mathcal{P}$ should be such that it is possible to obtain good, computationally tractable upper bounds on

$$\sup_{P \in \mathcal{P}} E_P f(x, z)$$

Most frequently, $\mathcal{P}$ consists of $P$ with known:

• mean
• (co)variance matrix
• possibly, higher order moment information

Major works: Scarf (1958), Dupacova (1977), Birge and Wets (1987), Birge and Dula (1991), Gallego (1992), Gallego, Ryan & Simchi-Levi (2001), Delage and Ye (2010), Wiesemann et al. (2014) and many others...

Basic theory

Scarf’s (1958) newsvendor problem

Scarf considered a newsvendor problem with a single product such that

c is the purchase cost per unit (the selling price is normalized to 1)

z ≥ 0 is the uncertain demand with known mean µ and variance σ²

x is the number of items chosen by the newsvendor to purchase

The objective is to maximize the worst-case expected profit:

\[
\max_{x} \; \inf_{P \in \mathcal{P}} \; \mathbb{E}_P \left[ \min\{x, z\} - c x \right]
\]

where

\[
\mathcal{P} = \left\{ P : \mathbb{E}_P z = \mu, \; \mathbb{E}_P (z - \mu)^2 = \sigma^2 \right\}.
\]


Scarf’s result

In our framework the problem is given by

\[
\max_{x, t} \; t
\quad \text{s.t.} \quad
\sup_{P \in \mathcal{P}} \mathbb{E}_P f(x, z) \le -t, \quad x \ge 0,
\]

where f(x, z) = cx − min{x, z}. The optimal value is equal to:

\[
t^* = \mu \left( (1 - c) - \sqrt{\sigma^2 c (1 - c) / \mu^2} \right)_+ .
\]

However, there is no closed-form tight bound for the worst-case expectation of a general convex f(x, ·) under mean-variance information!
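As a numerical sanity check of Scarf's value, the sketch below maximizes the worst-case expected profit over a grid of order quantities and compares it with the closed form µ(1 − c) − σ√(c(1 − c)). It assumes Scarf's classical bound sup_P E_P (z − x)_+ = (√(σ² + (x − µ)²) − (x − µ))/2 for mean-variance ambiguity, which is not stated on the slide, and it ignores the support restriction z ≥ 0 (harmless for the parameters chosen here). All function names and numbers are illustrative.

```python
import math

def worst_case_profit(x, mu, sigma, c):
    """Worst-case expected profit E[min(x, z)] - c*x over all distributions
    with mean mu and variance sigma^2 (support restriction z >= 0 ignored)."""
    # Assumed Scarf bound: sup E (z - x)_+ = (sqrt(sigma^2 + (x - mu)^2) - (x - mu)) / 2
    worst_shortfall = 0.5 * (math.sqrt(sigma**2 + (x - mu)**2) - (x - mu))
    # E min(x, z) = E z - E (z - x)_+
    return (mu - worst_shortfall) - c * x

def scarf_closed_form(mu, sigma, c):
    """Closed-form optimal value mu*(1 - c) - sigma*sqrt(c*(1 - c)), truncated at 0."""
    return max(mu * (1 - c) - sigma * math.sqrt(c * (1 - c)), 0.0)

mu, sigma, c = 10.0, 2.0, 0.3          # illustrative data
grid = [i / 1000 for i in range(30001)]  # order quantities 0, 0.001, ..., 30
best_x = max(grid, key=lambda x: worst_case_profit(x, mu, sigma, c))
best_value = worst_case_profit(best_x, mu, sigma, c)

print(best_x, best_value, scarf_closed_form(mu, sigma, c))
```

The grid maximum and the closed form agree up to the grid resolution, which is the behavior one would expect if the minus sign in front of the square root is the correct one.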


Forgotten result of Ben-Tal and Hochman (1972)

An exact upper bound when the dispersion measure is the mean absolute deviation (MAD).

Theorem. Assume that a one-dimensional random variable z has support included in [a, b] and its mean and mean absolute deviation are µ and d:

\[
\mathcal{P} = \left\{ P : \mathrm{supp}(z) \subseteq [a, b], \; \mathbb{E}_P z = \mu, \; \mathbb{E}_P |z - \mu| = d \right\}.
\]

Then, for any convex function g : R → R it holds that

\[
\sup_{P \in \mathcal{P}} \mathbb{E}_P g(z) = p_1 g(a) + p_2 g(\mu) + p_3 g(b),
\]

where

\[
p_1 = \frac{d}{2(\mu - a)}, \qquad p_3 = \frac{d}{2(b - \mu)}, \qquad p_2 = 1 - p_1 - p_3 .
\]
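The three-point formula is easy to evaluate directly. The following is a minimal sketch, with a hypothetical function name, that computes the bound for a given convex g and checks that the implied worst-case distribution reproduces the prescribed mean and MAD.

```python
import math

def ben_tal_hochman_upper(g, a, b, mu, d):
    """sup of E g(z) over distributions on [a, b] with mean mu and MAD d (g convex).
    Requires a < mu < b and d small enough for the weights to be probabilities."""
    p1 = d / (2 * (mu - a))      # weight on the left endpoint a
    p3 = d / (2 * (b - mu))      # weight on the right endpoint b
    p2 = 1 - p1 - p3             # weight on the mean mu
    if min(p1, p2, p3) < 0:
        raise ValueError("d is too large for this support and mean")
    return p1 * g(a) + p2 * g(mu) + p3 * g(b)

# Asymmetric example: support [0, 4], mean 1, MAD 0.6
mean_check = ben_tal_hochman_upper(lambda z: z, 0.0, 4.0, 1.0, 0.6)
mad_check = ben_tal_hochman_upper(lambda z: abs(z - 1.0), 0.0, 4.0, 1.0, 0.6)

# Symmetric example on [-1, 1] with g(z) = exp(w z)
w, d = 0.7, 0.5
mgf_bound = ben_tal_hochman_upper(lambda z: math.exp(w * z), -1.0, 1.0, 0.0, d)
print(mean_check, mad_check, mgf_bound)
```

For the symmetric case the bound collapses to d cosh(w) + 1 − d, the expression used later for the moment generating function.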


Generalization to multiple dimensions

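The result of Ben-Tal and Hochman (1972) generalizes to multidimensional z with independent components:

\[
\mathcal{P} = \left\{ P : \mathrm{supp}(z_i) \subseteq [a_i, b_i], \; \mathbb{E}_P z_i = \mu_i, \; \mathbb{E}_P |z_i - \mu_i| = d_i, \; z_i \perp z_j \right\}.
\]

Independence implies that the worst-case distribution is a product of the per-component worst-case distributions.

For each convex g(·) it holds that

\[
\sup_{P \in \mathcal{P}} \mathbb{E}_P g(z) = \sum_{\alpha \in \{1,2,3\}^{n_z}} \left( \prod_{i=1}^{n_z} p^i_{\alpha_i} \right) g\left( \tau^1_{\alpha_1}, \dots, \tau^{n_z}_{\alpha_{n_z}} \right)
\]

where $p^i_{\alpha_i}$ and $\tau^i_{\alpha_i}$ depend only on $a_i$, $b_i$, $\mu_i$, and $d_i$ (not on g(·)).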
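The multidimensional bound can be evaluated by brute-force enumeration of the 3^{n_z} support points. The sketch below does exactly that; the helper names are hypothetical, and the per-component weights are the ones from the one-dimensional theorem.

```python
from itertools import product

def three_point(a, b, mu, d):
    """Per-component worst-case distribution as a list of (probability, point)."""
    p1 = d / (2 * (mu - a))
    p3 = d / (2 * (b - mu))
    return [(p1, a), (1 - p1 - p3, mu), (p3, b)]

def worst_case_expectation(g, a, b, mu, d):
    """Sum over all 3**n_z combinations of per-component support points."""
    marginals = [three_point(*args) for args in zip(a, b, mu, d)]
    total = 0.0
    for combo in product(*marginals):
        prob = 1.0
        for p, _ in combo:
            prob *= p                       # independence: probabilities multiply
        total += prob * g([tau for _, tau in combo])
    return total

a, b, mu, d = [-1.0, 0.0], [1.0, 4.0], [0.0, 1.0], [0.5, 0.6]
linear_value = worst_case_expectation(lambda z: sum(z), a, b, mu, d)  # equals sum(mu)
convex_value = worst_case_expectation(lambda z: max(z), a, b, mu, d)
print(linear_value, convex_value)
```

The enumeration is exponential in n_z, so it is only practical for a handful of uncertain parameters.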


Lower bound result

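Ben-Tal and Hochman (1972) also provide an exact formula for the lower bound on the expectation if, additionally, it is known that P(z ≥ µ) = β:

\[
\mathcal{P}_\beta = \left\{ P : P \in \mathcal{P}, \; P(z \ge \mu) = \beta \right\}.
\]

Then, for any convex function g : R → R it holds that

\[
\inf_{P \in \mathcal{P}_\beta} \mathbb{E}_P g(z) = \beta \, g\left( \mu + \frac{d}{2\beta} \right) + (1 - \beta) \, g\left( \mu - \frac{d}{2(1 - \beta)} \right).
\]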
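A short sketch of the lower bound, assuming the two support points are µ + d/(2β) and µ − d/(2(1 − β)), which is what the mean and MAD constraints force for a two-point distribution with P(z ≥ µ) = β. The function name and the numbers are illustrative.

```python
def ben_tal_hochman_lower(g, mu, d, beta):
    """inf of E g(z) for convex g, given mean mu, MAD d and P(z >= mu) = beta."""
    upper_point = mu + d / (2 * beta)          # mass beta sits here
    lower_point = mu - d / (2 * (1 - beta))    # mass 1 - beta sits here
    return beta * g(upper_point) + (1 - beta) * g(lower_point)

mu, d, beta = 0.0, 0.5, 0.25
mean_check = ben_tal_hochman_lower(lambda z: z, mu, d, beta)
mad_check = ben_tal_hochman_lower(lambda z: abs(z - mu), mu, d, beta)
lower_sq = ben_tal_hochman_lower(lambda z: z * z, mu, d, beta)

# Upper bound for g(z) = z^2 on support [-1, 1]: p1 = p3 = d/2, points -1 and 1
upper_sq = (d / 2) * 1.0 + (d / 2) * 1.0
print(mean_check, mad_check, lower_sq, upper_sq)
```

For g(z) = z² this gives the interval [1/3, 1/2] for the second moment, so the pair of bounds brackets the expectation.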


Worst-case expected feasibility constraints

Application to (WC-EF)


Setting

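Consider (WC-EF):

\[
\sup_{P \in \mathcal{P}} \mathbb{E}_P f(x, z) \le 0,
\]

where f(x, ·) is convex. Then, (WC-EF) is equivalent to:

\[
g_U(x) \le 0
\]

where

\[
g_U(x) = \sum_{\alpha \in \{1,2,3\}^{n_z}} \left( \prod_{i=1}^{n_z} p^i_{\alpha_i} \right) f\left( x, \tau^1_{\alpha_1}, \dots, \tau^{n_z}_{\alpha_{n_z}} \right)
\]

with fixed $\tau^i_{\alpha_i} \in \{a_i, \mu_i, b_i\}$ and $p^i_{\alpha_i}$. Note that $g_U(x)$ is convex.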


What can be gained by the lower bound result?

If for each z_i we also know β_i = P(z_i ≥ µ_i), then we know not only

\[
\sup_{P \in \mathcal{P}} \mathbb{E}_P f(x, z)
\]

but also

\[
\inf_{P \in \mathcal{P}} \mathbb{E}_P f(x, z)
\]

by a generalization of the lower-bound result to multiple dimensions.

Hence, an entire, tight interval for E_P f(x, z) is provided!

If β_i is unknown, assume several values and evaluate the lower bound for each.


Specific applications - convex functions

Constraints that are convex in the uncertain parameter z are generally intractable in the classical (worst-case oriented) RO.

This can occur, e.g., when:

implementation error x ↦ x + z is present: f(x, z) = g(x + z)

linear decision rules x = v + Vz are applied: f(x, z) = g(v + Vz)

the objective has a sum-of-max form: $f(x, z) = \sum_i \max\{ l_1(x, z), \dots, l_{I_i}(x, z) \}$, where the $l_j(\cdot, \cdot)$ are bilinear


Inventory management (Ben-Tal et al. (2004))

Problem characteristics:

6 periods to manage the inventory

involves purchase, holding and shortage costs

the uncertain parameter is the deviation of product demand zt from its forecast value µt

decision variables are ordering decisions xt(zt) (linear decisions of observed past demand)

We assume independence of the deviations zt with mean 0 and known MAD.


Inventory management (Ben-Tal et al. (2004))

Research question: What is the difference in average-case performance between

RO solutions - solutions that minimize the worst-case outcome

WCE solutions - solutions that minimize the worst-case expectation of the outcome?

Objective type      β      Total costs (RO)   Total costs (WCE)
Worst-case value    -      1950               2384
Expectation range   0.25   [1255, 1280]       [1004, 1049]
Expectation range   0.5    [1223, 1280]       [970, 1049]
Expectation range   0.75   [1230, 1280]       [994, 1049]

We assume three possible values of β (skewness of the product demand).


WCE solution performs worse on the worst-case basis (as expected)

WCE solution yields strictly better average-case performance than the robust solution - for β = 0.5 the entire WCE interval lies below the RO interval

even more so, the WCE solutions are better by one standard deviation than the RO solutions


Average-case enhancement of RO solutions

Suppose that f(x, z) is RO tractable (concave or special-case convex in z) and we minimize $\sup_{z \in Z} f(x, z)$.

Often, there are multiple worst-case optimal x (Iancu and Trichakis (2013)). Answer? Choose the one with the best average-case performance.

1 Solve the RO problem minimizing the worst-case objective:

\[
\min_{x} \; \sup_{z \in Z} f(x, z)
\]

Denote optimal value by t.

2 Solve the second-stage enhancing problem:

\[
\min_{x} \; \sup_{P \in \mathcal{P}} \mathbb{E}_P f(x, z)
\quad \text{s.t.} \quad
\sup_{z \in Z} f(x, z) \le t
\]
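The two-step procedure can be sketched on a toy problem with a finite grid of candidate decisions. Everything in the example is an assumption made for illustration: the cost function f, the support [−1, 1] with mean 0 and MAD 0.5, and the grid. The worst case of a function convex in z over an interval is taken at the endpoints, and the worst-case expectation uses the three-point (Ben-Tal and Hochman) distribution.

```python
def f(x, z):
    """Illustrative cost, convex in z: shortage plus an ordering cost with a floor."""
    return max(z - x, 0.0) + max(x, 0.5)

def worst_case(x):
    # f(x, .) is convex, so the maximum over z in [-1, 1] is at an endpoint
    return max(f(x, -1.0), f(x, 1.0))

def worst_case_expectation(x, d=0.5):
    # Three-point worst-case distribution on {-1, 0, 1} for mean 0 and MAD d
    return (d / 2) * f(x, -1.0) + (1 - d) * f(x, 0.0) + (d / 2) * f(x, 1.0)

def enhance(candidates, tol=1e-9):
    t = min(worst_case(x) for x in candidates)                     # step 1
    ro_optimal = [x for x in candidates if worst_case(x) <= t + tol]
    best = min(ro_optimal, key=worst_case_expectation)             # step 2
    return t, ro_optimal, best

candidates = [i / 10 for i in range(11)]       # x in {0, 0.1, ..., 1}
t, ro_optimal, best = enhance(candidates)
print(t, ro_optimal, best, worst_case_expectation(best))
```

Here every x in [0.5, 1] attains the same worst-case value, but the worst-case expectations differ, so the second step selects x = 0.5.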


Inventory management (Ben-Tal et al. (2004))

Research question: What is the difference in simulation performance between

non-enhanced RO solutions

enhanced RO solutions?

Demand in the simulation is sampled uniformly from the uncertainty set.

We study the distribution of the total costs obtained in the simulation study by each of the solutions.


[Figure: density function of the objective value (total costs, axis from 0 to 3500) for the non-enhanced RO solution and the (µ,d)-enhanced RO solution.]

The enhanced solution is better because:

both solutions yield the same worst-case value (3149)

the worst-case mean objective value brought by the enhanced solution is 1284 vs 1502 of the non-enhanced solution (down by 15%)

the histogram of total costs obtained by the enhanced solution is visibly more skewed to the left (smaller total costs more probable)


Worst-case chance constraints

Application to (WC-CC)


Setting

We assume w.l.o.g. that supp(z_i) ⊆ [−1, 1], E_P z_i = 0 and E_P|z_i − 0| = d.

Consider the (WC-CC):

\[
\sup_{P \in \mathcal{P}} P\left( a(z)^T x > b(z) \right) \le \varepsilon,
\]

where

\[
[a(z); b(z)] = [a^0; b^0] + \sum_{i=1}^{n_z} z_i [a^i; b^i].
\]


Safe approximations

As such, (WC-CC) is intractable and we need a safe approximation - a computationally tractable set S of deterministic constraints such that

x feasible for S ⇒ x feasible for (WC-CC)

How to construct safe approximations? The crucial step is the construction of an upper bound on the moment generating function (MGF) of z (Ben-Tal et al. (2009)):

\[
\sup_{P \in \mathcal{P}} \mathbb{E}_P \exp(w^T z).
\]

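Recall: For each convex g(·) it holds that

\[
\sup_{P \in \mathcal{P}} \mathbb{E}_P g(z) = \sum_{\alpha \in \{1,2,3\}^{n}} \left( \prod_{i=1}^{n} p^i_{\alpha_i} \right) g\left( \tau^1_{\alpha_1}, \dots, \tau^{n}_{\alpha_{n}} \right)
\]

where $p^i_{\alpha_i}$ and $\tau^i_{\alpha_i}$ depend only on $a_i$, $b_i$, $\mu_i$, and $d_i$ (not on g(·)).

This formula has 3^n terms!

However:

\[
\begin{aligned}
\sup_{P \in \mathcal{P}} \log\left( \mathbb{E}_P \exp(w^T z) \right)
&= \sup_{P \in \mathcal{P}} \log\left( \mathbb{E}_P \, e^{w_1 z_1 + \dots + w_n z_n} \right)
 = \sup_{P \in \mathcal{P}} \log\left( \mathbb{E}_P \prod_{i=1}^{n} e^{w_i z_i} \right) \\
&= \sup_{P \in \mathcal{P}} \log\left( \prod_{i=1}^{n} \mathbb{E} \, e^{w_i z_i} \right)
 \quad \text{(due to the } z_i \text{'s being independent)} \\
&= \sup_{P \in \mathcal{P}} \sum_{i=1}^{n} \log \mathbb{E} \, e^{w_i z_i} .
\end{aligned}
\]

So here we need to apply the (B-H) upper (lower) bound separately to each of the n one-variable convex functions $e^{w_i z_i}$!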

MGF with our distributional assumptions

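We know exactly the worst-case value of the MGF (not just an upper bound):

\[
\begin{aligned}
\sup_{P \in \mathcal{P}} \mathbb{E}_P \exp(w^T z)
&= \prod_{i=1}^{n_z} \sup_{P \in \mathcal{P}} \mathbb{E}_P \exp(w_i z_i) \\
&= \prod_{i=1}^{n_z} \left( \frac{d}{2} \exp(-w_i) + 1 - d + \frac{d}{2} \exp(w_i) \right) \\
&= \prod_{i=1}^{n_z} \left( d \cosh(w_i) + 1 - d \right)
\end{aligned}
\]

Using this fact, we are able to construct three safe approximations of increasing tightness and increasing complexity.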
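The closed form makes the worst-case MGF cheap to compute: n_z factors instead of 3^{n_z} terms. A minimal sketch with hypothetical function names, for z_i supported on [−1, 1] with mean 0 and a common MAD d:

```python
import math

def worst_case_mgf(w, d):
    """Product of (d*cosh(w_i) + 1 - d) over the components of w."""
    return math.prod(d * math.cosh(wi) + 1 - d for wi in w)

def worst_case_mgf_three_point(w, d):
    """Same quantity written as the three-point expectation of exp(w_i z_i)."""
    return math.prod(
        0.5 * d * math.exp(-wi) + (1 - d) + 0.5 * d * math.exp(wi) for wi in w
    )

def worst_case_log_mgf(w, d):
    """Log of the worst-case MGF as a sum of n one-dimensional terms."""
    return sum(math.log(d * math.cosh(wi) + 1 - d) for wi in w)

w, d = [0.3, -1.2, 2.0], 0.4
print(worst_case_mgf(w, d), worst_case_mgf_three_point(w, d), worst_case_log_mgf(w, d))
```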


An example of a safe approximation

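Theorem. Let

\[
[a(z); b(z)] = [a^0; b^0] + \sum_{i=1}^{n_z} z_i [a^i; b^i].
\]

If there exists α > 0 such that (x, α) satisfies the constraint

\[
(a^0)^T x - b^0 + \alpha \sum_{i=1}^{n_z} \log\left( d_i \cosh\left( \frac{(a^i)^T x - b^i}{\alpha} \right) + 1 - d_i \right) + \alpha \log(1/\varepsilon) \le 0,
\]

then x satisfies the (WC-CC):

\[
\sup_{P \in \mathcal{P}} P\left( a(z)^T x > b(z) \right) \le \varepsilon.
\]

The approximating constraint is convex in (x, α)!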
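To make the safe approximation concrete, the sketch below evaluates its left-hand side for given (x, α). It is written with a sum of logarithms, i.e. α times the logarithm of the product bound on the worst-case MGF. The function name, argument layout and data are assumptions for illustration only.

```python
import math

def safe_approx_lhs(x, alpha, a0, b0, a, b, d, eps):
    """Left-hand side of the safe approximation; x is feasible if the value is <= 0.
    a0, b0: nominal data; a[i], b[i]: perturbation directions; d[i]: MAD of z_i."""
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
    lhs = dot(a0, x) - b0 + alpha * math.log(1 / eps)
    for ai, bi, di in zip(a, b, d):
        s = (dot(ai, x) - bi) / alpha
        lhs += alpha * math.log(di * math.cosh(s) + 1 - di)
    return lhs

# One-dimensional toy constraint (1 + 0.1*z) * x <= 2 at x = 1
x, alpha, eps = [1.0], 0.1, 0.05
value = safe_approx_lhs(x, alpha, a0=[1.0], b0=2.0, a=[[0.1]], b=[0.0], d=[0.5], eps=eps)

# With no perturbation the log terms vanish
no_noise = safe_approx_lhs(x, alpha, a0=[1.0], b0=2.0, a=[[0.0]], b=[0.0], d=[0.5], eps=eps)
print(value, no_noise)
```

In practice α is a decision variable optimized jointly with x; here it is fixed only to show how the expression is assembled.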


Antenna array (Ben-Tal and Nemirovski (2002))

1 We consider an optimization problem with 40 circular antennas.

2 Each antenna has its diagram D_i(φ) - a plot of the intensity of the signal sent in different directions.

3 The diagram of the set of 40 antennas is the sum of their diagrams:

\[
D(\varphi) = \sum_{i=1}^{n} x_i D_i(\varphi)
\]

4 To the i-th antenna we can send a different amount of power x_i.

5 Objective: Set the x_i's in such a way that the diagram has the desired shape.


Application - antenna array optimization

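Consider a circular antenna:

[Figure: circular antenna of radius r_i drawn in X, Y, Z axes, with direction angle φ.]

Energy sent in angle φ is characterized by the diagram.

Diagram of a single antenna:

\[
D_i(\varphi) = \frac{1}{2} \int_0^{2\pi} \cos\left( \frac{2\pi i}{40} \cos(\varphi) \cos(\theta) \right) d\theta
\]

Diagram of n antennas:

\[
D(\varphi) = \sum_{i=1}^{n} x_i D_i(\varphi)
\]

x_i - power assigned to antenna i

Objective: construct D(φ) as close as possible to the desired D∗(φ) using the antennas available.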


Antenna array (Ben-Tal and Nemirovski (2002))

Problem conditions:

for 77 < φ ≤ 90 the diagram is nearly uniform:

\[
0.9 \le \sum_{i=1}^{n} x_i D_i(\varphi) \le 1, \quad 77 < \varphi \le 90
\]

for 70 < φ ≤ 77 the diagram is bounded:

\[
-1 \le \sum_{i=1}^{n} x_i D_i(\varphi) \le 1, \quad 70 < \varphi \le 77
\]

we minimize the maximum absolute diagram value over 0 < φ ≤ 70:

\[
\min \; \max_{0 < \varphi \le 70} \left| \sum_{i=1}^{n} x_i D_i(\varphi) \right|
\]

Desired diagram graphically

[Figure: desired diagram plotted against the angle (0 to 90 degrees), together with the corresponding polar plot.]

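Optimization problem to be solved

\[
\begin{array}{lll}
\min & \tau & \\
\text{s.t.} & -\tau \le \sum_{i=1}^{n} x_i D_i(\varphi) \le \tau, & 0 \le \varphi \le 70 \\
& -1 \le \sum_{i=1}^{n} x_i D_i(\varphi) \le 1, & 70 \le \varphi \le 77 \\
& 0.9 \le \sum_{i=1}^{n} x_i D_i(\varphi) \le 1, & 77 \le \varphi \le 90
\end{array}
\]

Typically, decisions x_i suffer from implementation error z_i:

\[
x_i \mapsto \tilde{x}_i = (1 + z_i) x_i
\]

We want each constraint to hold with probability at least 1 − ε!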

Implementation error

Typically, decisions x_i suffer from implementation error z_i:

\[
x_i \mapsto \tilde{x}_i = (1 + \rho z_i) x_i
\]

We want each constraint to hold with probability at least 1 − ε for all P ∈ 𝒫, for example:

\[
P\left( \sum_{i=1}^{n} x_i (1 + \rho z_i) D_i(\varphi) \le 1 \right) \ge 1 - \varepsilon, \quad 77 < \varphi \le 90, \quad \forall P \in \mathcal{P}
\]

Two solutions:

nominal: no implementation error

robust: ρ = 0.001 and ε = 0.001.

Implementation error and chance constraints

Assumptions on implementation errors z_1, . . . , z_n:

independence: z_i independent from z_j for i ≠ j

support: −1 ≤ z_i ≤ 1 for all i

zero mean: E_P z_i = 0 for all i

mean absolute deviation: E_P|z_i − µ_i| = d_i for all i

Hence the ambiguity set:

\[
\mathcal{P}_{(\mu,d)} = \left\{ P : z_i \in [-1, 1], \; \mathbb{E}_P z_i = 0, \; \mathbb{E}_P |z_i - \mu_i| = d_i, \; z_i \perp z_j \right\}
\]

Safe approximations of chance constraints are typically constructed from upper bounds on the moment-generating function M(w) = E_P exp(wᵀz):

\[
\sup_{P \in \mathcal{P}} M(w) = \sup_{P \in \mathcal{P}} \mathbb{E}_P \exp\left( w^T z \right)
\]

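Ben-Tal and Hochman (1972)

Assume f : R → R is convex and z follows a distribution P belonging to an ambiguity set 𝒫 such that

\[
\mathcal{P} = \left\{ P : \mathrm{supp}(z) = [a^-, a^+], \; \mathbb{E}_P z = \mu, \; \mathbb{E}_P |z - \mu| = d \right\}
\]

Then it holds that:

\[
\sup_{P \in \mathcal{P}} \mathbb{E}_P f(z) = p_1 f(a^-) + p_2 f(\mu) + p_3 f(a^+)
\]

where

\[
p_1 = \frac{d}{2(\mu - a^-)}, \qquad p_3 = \frac{d}{2(a^+ - \mu)}, \qquad p_2 = 1 - p_1 - p_3 .
\]

Applying this to the MGF problem with our assumptions on z_1, . . . , z_n we have

\[
\sup_{P \in \mathcal{P}_{(\mu,d)}} \mathbb{E}_P \exp\left( w^T z \right) = \prod_{i=1}^{n} \left( d_i \cosh(w_i) + 1 - d_i \right)
\]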

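How does it apply to antenna implementation error

We need $\mu_i^-$, $\mu_i^+$, $\sigma_i$ such that:

\[
d_i \cosh(t) + 1 - d_i \le \exp\left( \max\{ \mu_i^+ t, \; \mu_i^- t \} + \frac{1}{2} \sigma_i^2 t^2 \right) \quad \text{for all } t \in \mathbb{R}.
\]

We easily find the right values $\mu_i^- = \mu_i^+ = 0$ and

\[
\sigma_i = \sup_{t \in \mathbb{R}} \sqrt{ \frac{2 \log\left( d_i \cosh(t) + 1 - d_i \right)}{t^2} }
\]

to satisfy this requirement.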

Nominal solution - dream and reality

[Figure: four panels. "Nominal solution − no implementation error" (diagram vs. angle) with its polar plot; "Nominal solution − implementation error ρ=0.001" (diagram vs. angle) with its polar plot.]

Robust solution - dream and reality

[Figure: four panels. "Robust solution − no implementation error" (diagram vs. angle) with its polar plot; "Robust solution − implementation error ρ=0.001" (diagram vs. angle) with its polar plot.]

Paper

Postek, K., Ben-Tal, A., Den Hertog, D., & Melenberg, B. (2015). Exact robust counterparts of ambiguous stochastic constraints under mean and dispersion information. Operations Research, 2018.

Recovery of signals from noisy outputs

The Estimation Problem

\[
y = Hx + w
\]

Given y, find an estimator $\hat{x}$ which is as “close” as possible to x.

w random vector

E(w) = 0, cov(w) = C positive definite

CLASSICAL METHODS are based on minimizing the data error ‖y − Hx‖.

CLASSICAL APPROACH (Gauss, . . .)

Closeness measured by (standardized) data error $\| C^{-1/2} (y - Hx) \|^2$

Least Squares Estimator (convex optimization)

\[
\hat{x}_{LS} = \arg\min_x \| C^{-1/2} (y - Hx) \|^2
\]

SOLUTION (H full column rank)

\[
\hat{x}_{LS} = (H^T C^{-1} H)^{-1} H^T C^{-1} y
\]

a linear estimator $\hat{x} = Gy$

CLASSICAL MODIFICATION (Tikhonov, . . .) (still convex optimization)

\[
\hat{x}_T = \arg\min_x \| C^{-1/2} (y - Hx) \|^2 + \lambda \| x \|^2
\]

SOLUTION

\[
\hat{x}_T = (H^T C^{-1} H + \lambda I)^{-1} H^T C^{-1} y
\]

also a linear estimator.

[Figure: panels showing the true signal, the observations, and the LS estimate.]

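MSE estimator

\[
\min_{\hat{x}} \; \mathbb{E} \| \hat{x} - x \|^2
\]

With a linear estimator $\hat{x} = Gy$ the problem becomes

\[
\min_G \; \underbrace{x^T (I - GH)^T (I - GH) x}_{\text{bias}} + \underbrace{\mathrm{Tr}(G C G^T)}_{\text{variance}}
\]

but x unknown!

“Solution”: minimal variance unbiased estimator, GH = I

Solution: Same as $\hat{x}_{LS}$ · · ·

Our approach: minmax MSE linear estimator $\hat{x} = Gy$, where G solves

\[
\min_G \; \max_{\| x \|_T \le L} \; x^T (I - GH)^T (I - GH) x + \mathrm{Tr}(G C G^T)
\]

that is,

\[
\min_G \; L^2 \lambda_{\max}\left( T^{-1/2} (I - GH)^T (I - GH) T^{-1/2} \right) + \mathrm{Tr}(G C G^T) \tag{7}
\]

⇕

\[
\begin{array}{ll}
\min & L^2 \lambda + t \\
\text{s.t.} & T^{-1/2} (I - GH)^T (I - GH) T^{-1/2} \preceq \lambda I \\
& \mathrm{Tr}(G C G^T) \le t
\end{array} \tag{8}
\]

Not an SDP . . . yet.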

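Schur’s complement:

\[
T^{-1/2} (I - GH)^T (I - GH) T^{-1/2} \preceq \lambda I
\;\Leftrightarrow\;
\underbrace{\begin{pmatrix} \lambda I & T^{-1/2} (I - GH)^T \\ (I - GH) T^{-1/2} & I \end{pmatrix} \succeq 0}_{\text{LMI}}
\]

\[
\mathrm{Tr}(G C G^T) \le t
\;\Leftrightarrow\;
\underbrace{\begin{pmatrix} t & g^T \\ g & I \end{pmatrix} \succeq 0}_{\text{LMI}}
\]

where $g = \mathrm{vec}(G C^{1/2})$.

Theorem I: Original MinMax MSE problem (1) is equivalent to the SDP problem:

\[
\begin{array}{ll}
\min & L^2 \lambda + t \\
\text{s.t.} & \begin{pmatrix} \lambda I & T^{-1/2} (I - GH)^T \\ (I - GH) T^{-1/2} & I \end{pmatrix} \succeq 0 \\
& \begin{pmatrix} t & g^T \\ g & I \end{pmatrix} \succeq 0
\end{array}
\]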

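Theorem II: For the special case T = I, the SDP can be solved explicitly. The optimal MMX MSE estimator is

\[
\hat{x}_{mmx} = \alpha \underbrace{(H^T C^{-1} H)^{-1} H^T C^{-1} y}_{\hat{x}_{LS}}
\qquad \text{where} \qquad
\alpha = \frac{L^2}{L^2 + \mathrm{Tr}\left( (H^T C^{-1} H)^{-1} \right)}
\]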

Proof Structure

(I) Establish the structure of the optimal solution

\[
G = V D V^T (H^T C^{-1} H)^{-1} H^T C^{-1}
\]

where V is the orthogonal matrix diagonalizing $H^T C^{-1} H$, i.e.,

\[
H^T C^{-1} H = V \Sigma V^*, \qquad \Sigma = \mathrm{diag}(\sigma_1, \dots, \sigma_n)
\]

This is obtained by the optimality condition. Using this, we end up with an equivalent problem in the variable (matrix) D (Problem B below).

(II) Show that ∃ an optimal matrix D which is diagonal.

(III) Find the diagonal elements of D.

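The First Part of the Proof

The optimization problem

\[
\text{(A)} \qquad
\begin{array}{ll}
\min & L^2 \lambda + \mathrm{Tr}(G C G^T) \\
\text{s.t.} & \begin{pmatrix} \lambda I & (I - GH)^T \\ I - GH & I \end{pmatrix} \succeq 0
\end{array}
\]

Form the Lagrangian:

\[
\begin{aligned}
\mathcal{L}(G, \lambda, U)
&= L^2 \lambda + \mathrm{Tr}(G C G^T) - \mathrm{Tr}\left[ \begin{pmatrix} U_1 & U_2^T \\ U_2 & U_3 \end{pmatrix} \begin{pmatrix} \lambda I & (I - GH)^T \\ I - GH & I \end{pmatrix} \right] \\
&= L^2 \lambda + \mathrm{Tr}(G C G^T) - \lambda \mathrm{Tr}(U_1) - 2 \mathrm{Tr}(U_2 (I - GH)) - \mathrm{Tr}(U_3),
\end{aligned}
\]

\[
U := \begin{pmatrix} U_1 & U_2^T \\ U_2 & U_3 \end{pmatrix} \succeq 0
\]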

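Differentiating the Lagrangian with respect to G:

\[
\frac{\partial \mathcal{L}}{\partial G} = 0
\;\Leftrightarrow\; G = U_2 H^T C^{-1}
\;\Leftrightarrow\; GH = U_2 (H^T C^{-1} H)
\;\Rightarrow\; U_2 = (GH)(H^T C^{-1} H)^{-1}
\]

change of variables: $D = V^T (GH) V$ (with $V^T V = I$)

\[
\Leftrightarrow\; V D V^T = GH
\;\Rightarrow\; U_2 = V D V^T (H^T C^{-1} H)^{-1}, \qquad
G = V D V^T (H^T C^{-1} H)^{-1} H^T C^{-1}
\]

In particular, if the orthogonal matrix V is chosen as the matrix which diagonalizes $H^T C^{-1} H$, i.e.

\[
H^T C^{-1} H = V \underbrace{\mathrm{diag}(\sigma_1, \dots, \sigma_n)}_{\Sigma} V^T
\]

then our problem (A), after substituting G, becomes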

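\[
\text{(B)} \qquad
\begin{array}{ll}
\min_{D, \lambda} & L^2 \lambda + \mathrm{Tr}(D^T D \Sigma^{-1}) \\
\text{s.t.} & (I - D)^T (I - D) \preceq \lambda I
\end{array}
\]

Second part of the proof (“optimal D can be chosen diagonal”).

Let $J_n$ be the set of $2^n$ matrices which are n × n, diagonal, with the entries in the diagonal being +1 or −1.

Claim: If D∗ is an optimal solution of (B), then so is

\[
J D^* J, \quad \forall J \in J_n
\]

Proof:

\[
\mathrm{Tr}\left[ (JDJ)^T (JDJ) \Sigma^{-1} \right] = \mathrm{Tr}(D^T D \Sigma^{-1})
\]

\[
(I - JDJ)^T (I - JDJ) \preceq \lambda I \;\Leftrightarrow\; (I - D)^T (I - D) \preceq \lambda I
\]

Conclusion: Since (B) is a convex problem, its optimal solution set is convex, so if D∗ is an optimal solution, so is

\[
\frac{1}{2^n} \sum_{J \in J_n} J D^* J
\]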

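Example (n = 2):

\[
D = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \qquad
J_2 = \left\{
\underbrace{\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}}_{J_1},
\underbrace{\begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix}}_{J_2},
\underbrace{\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}}_{J_3},
\underbrace{\begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix}}_{J_4}
\right\}
\]

\[
J_1 D J_1 = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \quad
J_2 D J_2 = \begin{pmatrix} a & -b \\ -c & d \end{pmatrix}, \quad
J_3 D J_3 = \begin{pmatrix} a & -b \\ -c & d \end{pmatrix}, \quad
J_4 D J_4 = \begin{pmatrix} a & b \\ c & d \end{pmatrix}
\]

\[
\frac{1}{4} \sum_{i=1}^{4} J_i D J_i = \frac{1}{4} \begin{pmatrix} 4a & 0 \\ 0 & 4d \end{pmatrix} = \begin{pmatrix} a & 0 \\ 0 & d \end{pmatrix}
\]

General result:

\[
\frac{1}{2^n} \sum_{J \in J_n} J D J = \mathrm{diag}\, D
\]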
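The general averaging identity can be checked numerically for any small matrix. The sketch below, with a hypothetical function name, enumerates all 2^n sign matrices J = diag(±1) and averages J D J entrywise.

```python
from itertools import product

def sign_flip_average(D):
    """Average of J D J over all 2**n diagonal sign matrices J."""
    n = len(D)
    avg = [[0.0] * n for _ in range(n)]
    for signs in product((1, -1), repeat=n):      # diagonal of J
        for i in range(n):
            for j in range(n):
                # (J D J)_{ij} = signs[i] * D[i][j] * signs[j]
                avg[i][j] += signs[i] * D[i][j] * signs[j]
    return [[v / 2**n for v in row] for row in avg]

D = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0],
     [7.0, 8.0, 9.0]]
averaged = sign_flip_average(D)
print(averaged)
```

Off-diagonal entries cancel because signs[i] * signs[j] is +1 for exactly half of the sign patterns and −1 for the other half, while diagonal entries are always multiplied by +1.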

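Part 3 of the proof: with $D = \mathrm{diag}(d_1, \dots, d_n)$, problem (A) ⇔ (B) reduces to

\[
\begin{array}{ll}
\min_{d_i, \lambda} & L^2 \lambda + \sum_i \left( d_i^2 / \sigma_i \right) \\
\text{s.t.} & (1 - d_i)^2 \le \lambda, \quad \forall i
\end{array}
\]

This problem can be solved analytically, which gives the final result claimed in Theorem II.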
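A quick numerical check of this last step, under the following reasoning: for a given λ ≤ 1 the cheapest feasible choice is d_i = 1 − √λ for every i, so all d_i share a common value d with λ = (1 − d)², and the problem becomes a one-dimensional minimization of L²(1 − d)² + d² Σ_i 1/σ_i. The data below are illustrative.

```python
def reduced_objective(dval, L, sigmas):
    """Objective of the diagonal problem with d_i = dval for all i and lambda = (1 - dval)^2."""
    lam = (1 - dval) ** 2
    return L**2 * lam + dval**2 * sum(1.0 / s for s in sigmas)

L = 2.0
sigmas = [1.0, 4.0, 0.5]          # eigenvalues of H^T C^{-1} H (illustrative)

grid = [k / 10000 for k in range(10001)]            # d in [0, 1]
best_d = min(grid, key=lambda dv: reduced_objective(dv, L, sigmas))

# Shrinkage factor from Theorem II: Tr((H^T C^{-1} H)^{-1}) = sum of 1/sigma_i
alpha = L**2 / (L**2 + sum(1.0 / s for s in sigmas))
print(best_d, alpha)
```

The grid minimizer coincides with α up to the grid resolution, which matches the shrinkage form α · x̂_LS of the minmax estimator.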

[Figure: panels showing the true signal, the observations, the LS estimate and the minmax MSE estimate.]

Adjustable Robust Optimization

Challenges: Adjustable Robust Optimization

♣ Aside from applications of the RO methodology in various subject areas, an important avenue of RO-related research is extending the RO methodology beyond the scope of the RC approach as presented so far.

The most important in this respect is, we believe, passing to Adjustable Robust Optimization, where the decision variables are allowed to “adjust themselves”, to some extent, to the true values of the uncertain data.

♠ One of the central assumptions which led us to the notion of Robust Counterpart reads:

A.1. All decision variables in an uncertain problem represent “here and now” decisions; they should be assigned specific numerical values as a result of solving the problem before the actual data “reveals itself.”

While being adequate to many decision making situations, A.1 is not a “universal truth.”

♠ In some cases, not all decision variables represent “here and now” decisions. In dynamical decision making some of the variables represent “wait and see” decisions and as such can depend on the portion of the true data which “reveals itself” before the moment when the decision is being made.

Example: In an inventory affected by uncertain demand, there are no reasons to specify all replenishment orders in advance; the true time to specify the replenishment order of period t is the beginning of this period, and thus we can allow this order to depend on the actual demands in periods 1, ..., t − 1.

♠ Usually, not all decision variables represent actual decisions; there exist also “analysis” (or slack) variables which do not represent decisions at all and are used to convert the problem into a desired form, e.g., one of a LO problem. Since the analysis variables do not represent actual decisions, why not allow them to depend on the entire true data?

Adjustable Robust Optimization

\[
\min_x \; f(x) \quad \text{s.t.} \quad g(x, z) \le 0 \tag{NLP}
\]

z is the uncertain parameter vector, z ∈ U (the uncertainty set).

E.g., in the LP case

\[
\min_x \; c^T x \quad \text{s.t.} \quad Ax - b \le 0, \qquad z = (c, A, b). \tag{LP}
\]

Robust counterpart of (NLP):

\[
\min_x \; f(x) \quad \text{s.t.} \quad g(x, z) \le 0 \quad \forall z \in U. \tag{RC}
\]

Implicitly, it is understood that the optimal x has to be found prior to the realization of the data z.

• In many practical cases, part of the decision variables can be determined after (part of) the data vector z is realized (e.g., multi-stage decision problems).

• Adjustable/nonadjustable variables.

Uncertain LP with Adjustable Variables

\[
\left\{ \min_{u, v} \; c^T u \; : \; Uu + Vv \le b \right\}_{(U, V, b) \in Z} \tag{LP$_Z$}
\]

u = vector of nonadjustable variables (“here and now” decisions)

v = vector of adjustable variables (“wait and see” decisions)

V is called the recourse matrix.

The (usual) RC of (LPz):

\[
\min_{u, v} \; c^T u \quad \text{s.t.} \quad Uu + Vv \le b, \quad \forall (U, V, b) \in Z \tag{RC}
\]

The new adjustable RC (ARC) is defined as

\[
\min_{u} \; c^T u \quad \text{s.t.} \quad \forall (U, V, b) \in Z \;\; \exists v : \; Uu + Vv \le b \tag{ARC}
\]

ARC is less conservative.

Example (RC versus ARC)

Consider the following simple example:

\[
\left\{ \min_{u, v} \; -u \; : \; (1 - 2\xi) u + v \ge 0, \;\; \xi u - v \ge 0, \;\; u \le 1 \right\}_{0 \le \xi \le 1}
\]

The RC and the ARC are, respectively, the problems:

\[
\min_{u} \left\{ -u \; : \; \exists v \;\; \forall \xi \in [0, 1] : \; (1 - 2\xi) u + v \ge 0, \;\; \xi u - v \ge 0, \;\; u \le 1 \right\} \tag{RC}
\]

\[
\min_{u} \left\{ -u \; : \; \forall \xi \in [0, 1] \;\; \exists v : \; (1 - 2\xi) u + v \ge 0, \;\; \xi u - v \ge 0, \;\; u \le 1 \right\} \tag{ARC}
\]

Let us solve the RC:

\[
\xi = 1 \Rightarrow \begin{cases} -u + v \ge 0 \\ u - v \ge 0 \end{cases}; \qquad
\xi = 0 \Rightarrow \begin{cases} u + v \ge 0 \\ -v \ge 0 \end{cases}.
\]

The only u, v that satisfy these inequalities are u = v = 0; therefore the only feasible solution to the RC is u = 0, and the optimal value is 0. When solving the ARC, for every u ∈ [0, 1], taking v = ξu, we see that u is feasible. Thus, the optimal solution to the ARC is u = 1, and the optimal value is −1. Hence, the ARC of our simple problem is completely different from its RC, i.e., the ARC is essential.
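The gap between the two counterparts can be reproduced with a brute-force grid check. In the sketch below the grids over u, v and ξ are assumptions made for illustration, and ARC feasibility is tested only through the rule v = ξu used in the argument above, which is sufficient for finding the largest feasible u.

```python
def constraints_hold(u, v, xi, tol=1e-12):
    """The three constraints of the example for a given (u, v) and data xi."""
    return (1 - 2 * xi) * u + v >= -tol and xi * u - v >= -tol and u <= 1 + tol

xis = [k / 100 for k in range(101)]            # xi in [0, 1]
grid = [k / 100 for k in range(-100, 101)]     # candidate values for u and v

def rc_feasible(u):
    # One v must work for every xi ("exists v, for all xi")
    return any(all(constraints_hold(u, v, xi) for xi in xis) for v in grid)

def arc_feasible(u):
    # v may depend on xi ("for all xi, exists v"); here v = xi * u
    return all(constraints_hold(u, xi * u, xi) for xi in xis)

rc_best = min(-u for u in grid if rc_feasible(u))
arc_best = min(-u for u in grid if arc_feasible(u))
print(rc_best, arc_best)
```

The rule v = ξu is itself linear in the uncertain data, so this example is also solved exactly by a linear decision rule.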


Tractability Status of the ARC

Consider an uncertain Linear Programming problem

\[
\left\{ \min_{u} \; c^T u \quad \text{s.t.} \quad Uu + Vv \le b \right\}_{[U, V, b] \in Z} \tag{LP$_Z$}
\]

Theorem 1 In the case when (fixed recourse)

\[
Z = \mathrm{Conv}\left\{ [U_1, V, b_1], \dots, [U_N, V, b_N] \right\},
\]

the ARC of LPz is equivalent to the usual LP problem

\[
\min_{u, v_1, \dots, v_N} \left\{ c^T u \; : \; U_\ell u + V v_\ell \le b_\ell, \;\; \ell = 1, \dots, N \right\}.
\]

Corollary 1 In the case when

1) the uncertainty set Z is a polytope given as a convex hull of a finite set of scenarios,

2) V is certain (fixed recourse),

the Adjustable Robust Counterpart of the LPz is an explicit LP program and thus is computationally tractable.

However, if only one of the properties 1) and 2) takes place, the ARC is not necessarily computationally tractable.

Computationally Intractable ARC’s

Theorem 2 In the case of an uncertainty set defined as a convex hull of finitely many scenarios with varying (from scenario to scenario) matrices V_i:

\[
Z = \mathrm{Conv}\left\{ [U_1, V_1, b_1], \dots, [U_N, V_N, b_N] \right\},
\]

the ARC of LPz can be computationally intractable.

Theorem 3 In the case when Z is a general-type polytope given by a list of linear inequalities, the ARC of LPz is a computationally intractable problem even when the coefficients of the analysis variables are certain.

• What can be done when the ARC is computationally intractable?

A natural solution in these cases would be to switch from “exact” ARC’s to approximate ones.

Approximated Adjustable Robust Counterpart

When passing from an uncertain problem to its adjustable

Robust Counterpart, we allow the analysis variables v to

tune themselves to the true data ζ. Now, let us impose a

restriction on how the analysis variables can be tuned to the

data; specifically, assume that for u given v is allowed to be

an affine function of the data:

(LDR) v = w+Wζ .

With this approach, we are interested in the decision

variables u that can be extended, by properly chosen w, W ,

to a solution of the infinite system of inequalities

\[
Uu + V(w + W\zeta) \le b, \quad \forall \zeta = [U, V, b] \in Z
\]

in variables u, w, W, and the approximate ARC of LPz becomes the optimization program

\[
\text{(AARC)} \qquad \min_{u, w, W} \left\{ c^T u \; : \; Uu + V(w + W\zeta) \le b \quad \forall \zeta = [U, V, b] \in Z \right\}.
\]

Note that the AARC is “in-between” the usual RC of problem Pz and the Adjustable RC of the problem (AARC is less conservative than RC; solution of AARC is a “policy”); to get the RC, one should set to zero the variable W. Since the AARC seems simpler than the general ARC of LPz, there are chances to arrive at computationally tractable “approximate” ARC in the cases where the ARC is intractable.

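Let the uncertainty set Z be given in the parametric form

\[
Z = \left\{ [U, V, b] = [U^0, V^0, b^0] + \sum_{\ell=1}^{L} \xi_\ell [U^\ell, V^\ell, b^\ell] \; : \; \xi \in X \right\},
\]

using the LDR:

\[
v = v(\xi) = v^0 + \sum_{\ell} \xi_\ell v^\ell .
\]

The AARC of an uncertain LP becomes:

\[
\begin{array}{ll}
\min_{u, v^0, v^1, \dots, v^L} & c^T u \\
\text{s.t.} & \left[ U^0 + \sum_\ell \xi_\ell U^\ell \right] u + \left[ V^0 + \sum_\ell \xi_\ell V^\ell \right] \left[ v^0 + \sum_\ell \xi_\ell v^\ell \right] \le \left[ b^0 + \sum_\ell \xi_\ell b^\ell \right], \quad \forall \xi \in X .
\end{array}
\]

For which perturbation set X is this SIP tractable?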

Theorem 4 Consider an uncertain LP with fixed recourse and a cone-representable perturbation set:

\[
X = \left\{ \xi \mid \exists \omega : A\xi + B\omega - d \in K \right\} \subset \mathbb{R}^L .
\]

When the cone K is

1. a nonnegative orthant $\mathbb{R}^n_+$

2. a (finite) direct product of Lorentz cones

\[
L^m = \left\{ x \in \mathbb{R}^m : x_m \ge \sqrt{x_1^2 + \dots + x_{m-1}^2} \right\}
\]

3. a semidefinite cone (cone of positive semidefinite symmetric matrices),

then: the AARC of the uncertain LP is an explicit Linear, Conic Quadratic or Semidefinite Programming problem, respectively, of sizes polynomial in those of the description of the perturbation set and of the parametrization mapping.

Bad News: In the non-fixed recourse case, the ARC can become NP-hard.

Remedy ??

Tight Approximation on AARC

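Recall the “perturbation-based” model of AARC:

\[
\begin{array}{ll}
\min_{u, v^0, v^1, \dots, v^L} & c^T u \\
\text{s.t.} & \left[ U^0 + \sum_\ell \xi_\ell U^\ell \right] u + \left[ V^0 + \sum_\ell \xi_\ell V^\ell \right] \left[ v^0 + \sum_\ell \xi_\ell v^\ell \right] \ge \left[ b^0 + \sum_\ell \xi_\ell b^\ell \right], \quad \forall \xi \in X .
\end{array}
\]

with a ∩-ellipsoids perturbation set:

\[
X = X_\rho \equiv \left\{ \xi \mid \xi^T S_\ell \xi \le \rho^2, \;\; \ell = 1, \dots, L \right\},
\]

with $\rho > 0$, $S_\ell \succeq 0$, $\sum_\ell S_\ell \succ 0$.

Note that this ellipsoidal uncertainty allows for a wide variety of symmetric (with respect to the origin) and convex perturbation sets. For example,

• setting L = 1, the perturbation set is an ellipsoid centered at the origin;

• setting L = dim ξ and $\xi^T S_\ell \xi = a_\ell^{-2} \xi_\ell^2$, ℓ = 1, . . . , L, we get, as the perturbation set, the box $X = \{ |\xi_\ell| \le \rho a_\ell, \; \ell = 1, \dots, L \}$ centered at the origin;

• choosing as $S_\ell$ dyadic matrices $g_\ell g_\ell^T$, we can get, as the perturbation set, a polytope (centered at the origin) $X = \{ \xi : |g_\ell^T \xi| \le \rho, \; \ell = 1, \dots, L \}$.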

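Theorem 5 Consider an uncertain LP with an ellipsoidal perturbation set:

\[
X = X_\rho \equiv \left\{ \xi \mid \xi^T S_\ell \xi \le \rho^2, \;\; \ell = 1, \dots, L \right\},
\]

where $\rho > 0$, $S_\ell \succeq 0$, $\sum_\ell S_\ell \succ 0$, along with the semidefinite program

\[
\text{(SDP)} \qquad
\begin{array}{ll}
\min_{\lambda^1, \dots, \lambda^m, \; x = [u, v^0, v^1, \dots, v^L]} & c^T u \\
\text{s.t.} & \begin{pmatrix} \Gamma_i(x) - \rho^{-2} \sum_\ell \lambda^i_\ell S_\ell & \beta_i(x) \\ \beta_i^T(x) & \alpha_i(x) - \sum_\ell \lambda^i_\ell \end{pmatrix} \succeq 0, \\
& \lambda^i \ge 0, \quad i = 1, \dots, m,
\end{array}
\]

where $x \equiv [u, v^0, v^1, \dots, v^L]$ and

• $\alpha_i(x) \equiv [U^0_i u + V^0_i v^0 - b^0_i]$

• $\beta^\ell_i(x) \equiv \dfrac{[U^\ell_i u + V^0_i v^\ell + V^\ell_i v^0 - b^\ell_i]}{2}$, ℓ = 1, . . . , L

• $\Gamma^{(\ell,k)}_i(x) \equiv \dfrac{V^k_i v^\ell + V^\ell_i v^k}{2}$, ℓ, k = 1, . . . , L.

Then: Problem (SDP) is a “conservative approximation” to the AARC: whenever x can be extended, by some $\lambda^1, \dots, \lambda^m$, to a feasible solution of this semidefinite program, x is feasible for the AARC.

What is the level of conservativeness?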


Theorem 6 In the case of simple ellipsoidal uncertainty (i.e. L = 1), problem (SDP) is exactly equivalent to the AARC.

(Proof based on S-lemma)

Theorem 7 In the case of L > 1, (SDP) is a tight approximation of the AARC. Specifically, the projection on the x-space of the feasible set of (SDP) is contained in the feasible set of the AARC, the perturbation level being ρ, and contains the feasible set of the AARC, the perturbation level being Ωρ, where

Ω = O(1) ln L

In particular, the optimal value in (SDP) is in-between the optimal values of the AARC’s corresponding to the perturbation levels ρ and Ωρ.

(Proof based on Approx. S-Lemma)

Application: Inventory Model

Consider a single product inventory system, which is comprised of a warehouse and of I factories. The planning horizon is T periods. At a period t:

• d_t is the demand for the product. All the demands must be satisfied;

• v(t) is the amount of the product in the warehouse at the beginning of the period (v(1) is given);

• p_i(t) – the i’th order of the period – is the amount of the product that will be produced during the period by factory i and used to satisfy the demand of the period (and, perhaps, to replenish the warehouse); these are the decision variables.

• P_i(t) is the maximal production capacity of factory i;

• c_i(t) is the cost of producing one unit of the product at factory i.

Other parameters of the problem are:

• V_min is the minimal level of inventory that must be at the warehouse at each given moment;

• V_max is the maximal storing capacity of the warehouse;

• Q_i is the total maximal production capacity of the i’th factory throughout the planning horizon.

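Our goal is to minimize the total production cost over all factories and the entire planning horizon. The Linear Programming problem modeling this is the following:

\[
\begin{array}{ll}
\min_{p_i(t), v(t), F} & F \\
\text{s.t.} & \sum_{t=1}^{T} \sum_{i=1}^{I} c_i(t) p_i(t) \le F \\
& 0 \le p_i(t) \le P_i(t), \quad 1 \le i \le I, \; 1 \le t \le T \\
& \sum_{t=1}^{T} p_i(t) \le Q(i), \quad 1 \le i \le I \\
& v(t+1) = v(t) + \sum_{i=1}^{I} p_i(t) - d_t, \quad 1 \le t \le T \\
& V_{\min} \le v(t) \le V_{\max}, \quad 2 \le t \le T + 1 .
\end{array}
\]

We can eliminate the v variables:

\[
\begin{array}{ll}
\min_{p_i(t), F} & F \\
\text{s.t.} & \sum_{t=1}^{T} \sum_{i=1}^{I} c_i(t) p_i(t) \le F \\
& 0 \le p_i(t) \le P_i(t), \quad 1 \le i \le I, \; 1 \le t \le T \\
& \sum_{t=1}^{T} p_i(t) \le Q(i), \quad 1 \le i \le I \\
& V_{\min} \le v(1) + \sum_{s=1}^{t} \sum_{i=1}^{I} p_i(s) - \sum_{s=1}^{t} d_s \le V_{\max}, \quad 1 \le t \le T .
\end{array}
\]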

The decision on orders p_i(t) is made at the beginning of period t, and we are allowed to make these decisions on the basis of demands d_r, r ∈ I_t, where I_t is a given subset of the segment {1, . . . , t}. The only uncertain data in the problem are the demands d_t, and all we know is that

\[
d_t \in [d^*_t - \theta d^*_t, \; d^*_t + \theta d^*_t], \quad t = 1, \dots, T,
\]

with given positive θ and d∗_t. Applying the AARC methodology, we restrict our decision-making policy to affine decision rules

\[
p_i(t) = \pi^0_{i,t} + \sum_{r \in I_t} \pi^r_{i,t} d_r ,
\]

where the coefficients $\pi^r_{i,t}$ are our new decision variables. With this approach, we get an AARC problem which is itself an LP.

Illustrative Example: The Data

There are 3 factories for production of umbrellas, and one warehouse. The decisions concerning production are made every two weeks, and we are planning production for one year. Thus, the time horizon is 24 periods. The nominal demand d∗_t is seasonal, reaching its maximum in winter, and behaves according to the following function:

\[
d^*_t = 1000 \left( 1 - \frac{1}{2} \sin\left( \frac{\pi (t - 1)}{12} \right) \right), \quad t = 1, \dots, 24.
\]

We assume that the uncertainty level is θ, i.e., d_t ∈ [(1 − θ)d∗_t, (1 + θ)d∗_t].
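For reference, the nominal demand profile and the corresponding uncertainty intervals can be tabulated in a few lines. The function names are hypothetical; the formula and the level θ are the ones stated above.

```python
import math

def nominal_demand(t):
    """Nominal demand d*_t for period t = 1, ..., 24."""
    return 1000 * (1 - 0.5 * math.sin(math.pi * (t - 1) / 12))

def demand_interval(t, theta):
    """Interval [(1 - theta) d*_t, (1 + theta) d*_t] of possible demands."""
    d = nominal_demand(t)
    return (1 - theta) * d, (1 + theta) * d

demands = [nominal_demand(t) for t in range(1, 25)]
peak_period = 1 + max(range(24), key=lambda k: demands[k])
low, high = demand_interval(1, 0.2)
print(min(demands), max(demands), peak_period, (low, high))
```

With this formula the nominal demand ranges between 500 and 1500 units per period.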


The production costs per unit of the product depend on the factory and on time, and follow the same seasonal pattern as the demand, i.e., rise in winter and fall in summer. Cost per unit for a factory i at a period t is given by:

\[
c_i(t) = \alpha_i \left( 1 - \frac{1}{2} \sin\left( \frac{\pi (t - 1)}{12} \right) \right), \quad t = 1, \dots, 24,
\qquad \alpha_1 = 1, \; \alpha_2 = 1.5, \; \alpha_3 = 2 .
\]

The maximal production capacity of each one of the factories at each two-week period is P_i(t) = 567 units, and the total production capacity of each one of the factories for a year is Q_i = 13600. The inventory at the warehouse should not be less than 500 (V_min) units, and cannot exceed 2000 (V_max) units.

The initial inventory at the warehouse is v(1) = 1250.

The Experiments

In every one of the experiments, the corresponding management policy was tested against a given number (100) of simulations; in every one of the simulations, the actual demand d_t of period t was drawn at random, according to the uniform distribution, from the segment [(1 − θ)d∗_t, (1 + θ)d∗_t], where θ was the “uncertainty level” characteristic for the experiment. The demands of distinct periods were independent of each other.

We conducted two series of experiments.

1. We checked the influence of the demand uncertainty θ on the total production costs corresponding to the management policy yielded by the optimal solution to the AARC. We compared this cost to the “Perfect Hindsight” (PH) one, i.e., the cost we would have paid in the case when all the demands were known to us in advance and we were using the corresponding optimal management policy as given by the optimal solution of the certain LP.

The Results

• The influence of the uncertainty level on the management cost.

Here we tested the robustly adaptive management policy with the standard information basis against different levels of uncertainty, specifically, the levels of 20%, 10% and 5%.

Uncertainty   AARC cost (mean)   PH cost (mean)   “Price of robustness”
5%            24615              23782            3.5%
10%           25211              23607            5.9%
20%           26860              23735            13%

Std. in range 100–780.

The Results

• The influence of the “information basis” on the management cost: Robust Counterpart

An interesting question is what uncertainty level still allows for a priori decisions. To get an answer, we solved the RC of our uncertain problem for the uncertainty levels 20%, 10% and 5%, and each time obtained an infeasible RC. Only at an uncertainty level as small as 2.5% does the RC become feasible, yielding the following management costs:

Uncertainty   RC cost (mean)   PH cost (mean)   Price of robustness
2.5%          25287            23842            6%

AARC → 2%

Note that even at this unrealistically small uncertainty level, the cost of the policy yielded by the RC exceeds the PH cost by 6% (while for the AARC this difference is just 2%).

The Price of Robustness?

Example: Robust Optimization of Multi-Period Production Planning under Uncertainty

Problem definition

Single-product production planning problem

Finite horizon T

Uncertain demand

Discrete periods

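Expected (nominal) demand follows a typical life-cycle pattern: $\bar d_t$ is a fifth-degree polynomial in $t$ with coefficient magnitudes 0.17 (constant), 0.25 ($t$), 1.23 ($t^2$), 0.18 ($t^3$), 0.004 ($t^4$) and 0.0002 ($t^5$).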

[Figure: expected demand versus period (0–12). Series: Nominal, Poly. (Nominal). Life-cycle phases: Introduction, Growth, Maturity, Retirement.]

Problem definition (cont.)

In each period t = 1,…,T:
• Holding and shortage costs are incurred for each unit of surplus or shortage, respectively
• Income is realized through sales

In each period t = 1,…,T−1:
• Unsatisfied demand is backlogged

In period T:
• Salvage value is gained for surplus units
• Unsatisfied demand is lost

Mathematical model

Denote

  $T$          planning horizon
  $c_t$        production cost per unit
  $q_t$        production quantity in units
  $m_t$        selling price per unit
  $d_t$        demand in units
  $s_T$        salvage value per unit
  $\bar d_t$   nominal demand in units
  $h_t$        holding cost per unit
  $I_t$        inventory level
  $p_t$        shortage cost per unit
  $I_0$        initial inventory (non-negative)

$I_t = I_0 + \sum_{i=1}^{t} (q_i - d_i); \quad I_t^+ = \max\{0, I_t\}; \quad I_t^- = \max\{0, -I_t\}$

The model:

$\max_q F(q, d) = \underbrace{\sum_{t=1}^{T} m_t d_t - m_T \max\{0, -I_T\}}_{\text{Sales}} + \underbrace{s_T \max\{0, I_T\}}_{\text{Salvage}} - \sum_{t=1}^{T} \Big[ \underbrace{c_t q_t}_{\text{Production}} + \underbrace{h_t \max\{0, I_t\}}_{\text{Holding}} + \underbrace{p_t \max\{0, -I_t\}}_{\text{Shortage}} \Big] \quad (1)$
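The profit described on this slide (sales income, minus lost sales in the last period, plus salvage, minus production, holding and shortage costs) can be evaluated for a given plan and demand realization. The Python sketch below is one reading of that model; the function name, argument order and the toy numbers are assumptions made for illustration.

```python
def profit(q, d, m, c, h, p, s_T, I0=0.0):
    """Profit F(q, d) of production plan q under demand realization d.

    q, d, m, c, h, p are lists of length T; s_T is the salvage value per unit.
    """
    T = len(q)
    total = 0.0
    inventory = I0
    for t in range(T):
        inventory += q[t] - d[t]            # I_t = I_0 + sum_{i<=t} (q_i - d_i)
        surplus = max(0.0, inventory)       # units held in stock
        shortage = max(0.0, -inventory)     # backlogged units
        total += m[t] * d[t]                # sales income
        total -= c[t] * q[t] + h[t] * surplus + p[t] * shortage
    # End of horizon: unmet demand is lost, leftover units are salvaged.
    total -= m[T - 1] * max(0.0, -inventory)
    total += s_T * max(0.0, inventory)
    return total


# Toy two-period instance (hypothetical numbers).
prices, costs, holding, shortage_cost = [10.0, 10.0], [2.0, 2.0], [1.0, 1.0], [3.0, 3.0]
short_case = profit([5.0, 5.0], [4.0, 8.0], prices, costs, holding, shortage_cost, s_T=1.0)
surplus_case = profit([5.0, 5.0], [4.0, 4.0], prices, costs, holding, shortage_cost, s_T=1.0)
```

In the first case two units of demand are lost at the end of the horizon; in the second, two leftover units are salvaged.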


LP model

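The piecewise linear model (1) can be written as the following LP model:

$\max_{q,\, I^-,\, F} \; F$

s.t.

$\sum_{t=1}^{T} m_t d_t - m_T I_T^- - \sum_{t=1}^{T} \Big[ c_t q_t + p_t I_t^- + h_t \Big( I_t^- + I_0 + \sum_{i=1}^{t} (q_i - d_i) \Big) \Big] \ge F$

$I_t^- + I_0 + \sum_{i=1}^{t} (q_i - d_i) \ge 0, \quad t = 1, \ldots, T \quad (2)$

$I_t^-, \; q_t \ge 0$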

Affinely Adjustable Robust Counterpart (AARC)

Two types of decision variables: adjustable and non-adjustable
• Non-adjustable variables should be determined before the uncertain data is revealed
• Adjustable variables can depend on past realizations of the uncertain data

In the AARC the dependence of the adjustable variables on past data is a linear decision rule (LDR) as follows:

$q_t = \pi_t^0 + \sum_{r \in H_t} \pi_t^r d_r, \qquad I_t^- = \mu_t^0 + \sum_{r \in H_t} \mu_t^r d_r \quad (4)$

The price of robustness

One of the limitations of the RO methodology is its conservative optimal solutions. Its guaranteed objective value might be inferior as compared with the nominal solution (NOM)

The difference between the nominal and the robust objective values is called the price of robustness (POR)

The actual price of robustness

In practice it is rare that the demand takes its worst-case scenario values ⇒ POR is inappropriate

We base the comparison on L demand simulations:

$AP(q_A) = \frac{1}{L} \sum_{l=1}^{L} F(q_A, d^l)$ - the average "actual" profit according to the policy of method A

Define the actual price of robustness (APOR) for method A as the difference between $AP(q_{NOM})$ and $AP(q_A)$
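The APOR definition translates directly into code. The sketch below assumes that a policy is a fixed production plan and that a profit function is supplied by the caller; the single-period toy profit and all numbers are hypothetical and only illustrate the averaging.

```python
def average_profit(policy, profit_fn, demand_sims):
    """AP(q_A): mean of F(q_A, d^l) over the simulated demand vectors d^l."""
    return sum(profit_fn(policy, d) for d in demand_sims) / len(demand_sims)


def actual_price_of_robustness(nominal_policy, method_policy, profit_fn, demand_sims):
    """APOR = AP(q_NOM) - AP(q_A)."""
    return (average_profit(nominal_policy, profit_fn, demand_sims)
            - average_profit(method_policy, profit_fn, demand_sims))


def toy_profit(q, d):
    """Hypothetical single-period profit: sell min(q, d) at 10, produce q at 4."""
    return 10.0 * min(q, d) - 4.0 * q


demand_sims = [8.0, 10.0, 12.0]          # three simulated demands (illustrative)
apor = actual_price_of_robustness(10.0, 8.0, toy_profit, demand_sims)
```

Here the nominal plan produces 10 units and the more cautious plan 8 units; a positive APOR means the cautious plan earns less on average over the simulated demands.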


Simulations

• For several uncertainty levels ρ, three sets of L=100 demand vectors were generated
• Each vector consists of T=12 entries
• The demand vector entries of each set were generated from a Beta distribution with specific shape parameters, supported by the uncertainty set for the demand
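A demand set of this kind could be generated as in the Python sketch below. It assumes, as a guess not stated on the slide, that the uncertainty set is the box [(1−ρ)·nominal, (1+ρ)·nominal] in each period; the shape parameters, the nominal demand values and the seed are likewise hypothetical.

```python
import random


def beta_demand_vector(nominal, rho, a, b, rng):
    """Draw one demand vector with Beta(a, b) entries rescaled to the assumed box.

    Entry t lies in [(1 - rho) * nominal[t], (1 + rho) * nominal[t]].
    """
    vector = []
    for d_bar in nominal:
        low, high = (1.0 - rho) * d_bar, (1.0 + rho) * d_bar
        vector.append(low + (high - low) * rng.betavariate(a, b))
    return vector


rng = random.Random(0)
nominal = [float(2 + t) for t in range(12)]   # placeholder nominal demand, T = 12
demand_set = [beta_demand_vector(nominal, rho=0.3, a=2.0, b=5.0, rng=rng)
              for _ in range(100)]            # one set of L = 100 vectors
```

Changing the shape parameters `a` and `b` skews the demands toward the lower or upper end of the interval, which is presumably how the three sets differ.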

Robust Solutions of Uncertainty-Affected Linear Dynamic Systems

♣ Generic application: Affine control of uncertainty-affected Linear Dynamical Systems.

♠ Consider a Linear Time-Varying Dynamical system

$x_{t+1} = A_t x_t + B_t u_t + R_t d_t$
$y_t = C_t x_t$
$x_0 = z$        (S)

• $x_t$: state; • $u_t$: control; • $y_t$: output; • $d_t$: uncertain input; • $z$: initial state

to be controlled over finite time horizon t = 0, 1, ..., T.

♠ Assume that a “desired behaviour” of the system is given by a system of convex inclusions

$D_i w - b_i \in Q_i$, i = 1, ..., m

on the state-control trajectory

$w = (x_0, x_1, ..., x_{T+1}, u_0, u_1, ..., u_T)$,

and the goal of the control is to minimize a given linear objective f(w).

$x_{t+1} = A_t x_t + B_t u_t + R_t d_t$
$y_t = C_t x_t$
$x_0 = z$        (S)

♠ Restricting ourselves to affine output-based control laws

$u_t = \xi_t^0 + \sum_{\tau=0}^{t} \Xi_t^\tau y_\tau$, (∗)

the problem of interest is

(!) Find an affine control law (∗) which ensures that the resulting state-control trajectory w satisfies the system of convex inclusions

$D_i w - b_i \in Q_i$, i = 1, ..., m

and minimizes, under this restriction, a given linear objective f(w).

Dynamics (S) makes w a known function of inputs $d = (d_0, d_1, ..., d_T)$, the initial state z and the parameters ξ of the control law (∗):

$w = W(\xi; d, z)$.

Consequently, (!) is the optimization problem

$\min_\xi \{ f(W(\xi; d, z)) : D_i W(\xi; d, z) - b_i \in Q_i, \ i = 1, ..., m \}$ (U)

open loop dynamics:

$x_{t+1} = A_t x_t + B_t u_t + R_t d_t$
$y_t = C_t x_t$
$x_0 = z$

control law: $u_t = \xi_t^0 + \sum_{\tau=0}^{t} \Xi_t^\tau y_\tau$

⇓
$w := (u_0, ..., u_T, x_0, ..., x_{T+1}) = W(\xi; d, z)$

⇓
$\min_\xi \{ f(W(\xi; d, z)) : D_i W(\xi; d, z) - b_i \in Q_i, \ i = 1, ..., m \}$ (U)

Note: Due to the presence of the uncertain input trajectory d and possible uncertainty in the initial state, (U) is an uncertain problem.

Difficulty: While linearity of the dynamics and the control law makes W(ξ; d, z) linear in (d, z), the dependence of W(·, ·) on the parameters $\xi = \{\xi_t^0, \Xi_t^\tau\}_{0 \le \tau \le t \le T}$ of the control law is highly nonlinear ⇒ (U) is not a bi-affine problem, which makes the theory we have developed inapplicable. In fact, (U) seems to be intractable already when there is no uncertainty in d, z!
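The nonlinearity in the control-law parameters already shows up in a tiny example. The Python sketch below uses a scalar, time-invariant system with a single output-feedback gain; the system and all numbers are chosen for illustration and are not taken from the slides.

```python
def simulate_output_feedback(gain, z, d):
    """Scalar system x_{t+1} = x_t + u_t + d_t, y_t = x_t, with u_t = gain * y_t."""
    x = z
    states = [x]
    for d_t in d:
        u = gain * x            # output-based law with xi_t^0 = 0
        x = x + u + d_t
        states.append(x)
    return states


d = [1.0, 1.0]
x_a = simulate_output_feedback(0.0, 1.0, d)[-1]      # final state for gain 0
x_b = simulate_output_feedback(1.0, 1.0, d)[-1]      # final state for gain 1
x_mid = simulate_output_feedback(0.5, 1.0, d)[-1]    # final state for gain 0.5
affine_prediction = 0.5 * (x_a + x_b)                # value if x were affine in the gain
```

The final state equals (1 + gain)² z + (1 + gain) d₀ + d₁, so `x_mid` (4.75) differs from `affine_prediction` (5.0): the trajectory is polynomial, not affine, in the gain.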


Remedy: suitable re-parameterization of affine control laws.

♣ Consider a closed loop system along with its model:

closed loop system:                            model:
$x_{t+1} = A_t x_t + B_t u_t + R_t d_t$        $\hat x_{t+1} = A_t \hat x_t + B_t u_t$
$y_t = C_t x_t$                                $\hat y_t = C_t \hat x_t$
$x_0 = z$                                      $\hat x_0 = 0$
$u_t = U_t(y_0, ..., y_t)$

♠ Observation: We can run the model in an on-line fashion, so that at time t, before the decision on $u_t$ should be made, we have at our disposal the purified outputs

$v_t = y_t - \hat y_t$.

♠ Fact I [Equivalence]: Every transformation (d, z) ↦ w which can be obtained from an affine control law based on outputs:

$u_t = \xi_t^0 + \sum_{\tau=0}^{t} \Xi_t^\tau y_\tau$ (∗)

can be obtained from an affine control law based on purified outputs:

$u_t = \eta_t^0 + \sum_{\tau=0}^{t} H_t^\tau v_\tau$ (∗∗)

and vice versa.
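The key property of purified outputs is that they do not depend on the controls applied, only on the disturbances and the initial state. The following Python sketch checks this numerically for a scalar, time-invariant system; the coefficients and control sequences are arbitrary illustrative choices.

```python
def purified_outputs(a, b, r, c, z, d, controls):
    """Run the true system and its disturbance-free model side by side (scalar case)."""
    x, x_hat = z, 0.0
    v = []
    for u_t, d_t in zip(controls, d):
        v.append(c * x - c * x_hat)       # v_t = y_t - y_hat_t
        x = a * x + b * u_t + r * d_t     # true system, driven by the disturbance
        x_hat = a * x_hat + b * u_t       # model: same controls, no disturbance, zero start
    return v


d = [1.0, 2.0, 3.0]
v_no_control = purified_outputs(0.5, 1.0, 1.0, 2.0, 1.0, d, [0.0, 0.0, 0.0])
v_some_control = purified_outputs(0.5, 1.0, 1.0, 2.0, 1.0, d, [5.0, -3.0, 2.0])
```

Both runs give the same purified outputs, because the difference between the state and the model state evolves without any control term.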


system:                                        model:
$x_{t+1} = A_t x_t + B_t u_t + R_t d_t$        $\hat x_{t+1} = A_t \hat x_t + B_t u_t$
$y_t = C_t x_t$                                $\hat y_t = C_t \hat x_t$
$x_0 = z$                                      $\hat x_0 = 0$

control law:
$v_t = y_t - \hat y_t$
$u_t = \eta_t^0 + \sum_{\tau=0}^{t} H_t^\tau v_\tau$ (∗∗)

(S)

♠ Fact II [bi-affinity]: The state-control trajectory w = W(η; d, z) of (S) is affine in (d, z) when the parameters $\eta = \{\eta_t^0, H_t^\tau\}_{0 \le \tau \le t \le T}$ of the control law (∗∗) are fixed, and is affine in η when (d, z) is fixed.
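Bi-affinity can be checked numerically on a small instance. The Python sketch below simulates a scalar, time-invariant closed loop under a purified-output-based law and verifies both affinity properties on convex combinations; the system coefficients and parameter values are arbitrary and purely illustrative.

```python
def trajectory(eta0, H, z, d, a=0.5, b=1.0, r=1.0, c=1.0):
    """States followed by controls of the scalar closed loop under law (**)."""
    x, x_hat = z, 0.0
    v, states, controls = [], [x], []
    for t, d_t in enumerate(d):
        v.append(c * x - c * x_hat)                        # purified output v_t
        u = eta0[t] + sum(H[t][tau] * v[tau] for tau in range(t + 1))
        x = a * x + b * u + r * d_t
        x_hat = a * x_hat + b * u
        states.append(x)
        controls.append(u)
    return states + controls


def mix(p, q, lam):
    """Componentwise combination lam * p + (1 - lam) * q."""
    return [lam * pk + (1.0 - lam) * qk for pk, qk in zip(p, q)]


lam = 0.25
eta0_a, H_a = [0.0, 1.0], [[1.0], [0.5, -1.0]]
eta0_b, H_b = [2.0, -1.0], [[-0.5], [1.0, 0.25]]
z1, d1 = 1.0, [1.0, 2.0]
z2, d2 = -2.0, [0.5, -1.0]

# Affine in (d, z) for fixed eta.
w_mixed_data = trajectory(eta0_a, H_a, lam * z1 + (1.0 - lam) * z2, mix(d1, d2, lam))
w_data_combo = mix(trajectory(eta0_a, H_a, z1, d1), trajectory(eta0_a, H_a, z2, d2), lam)
gap_data = max(abs(p - q) for p, q in zip(w_mixed_data, w_data_combo))

# Affine in eta for fixed (d, z).
H_mix = [mix(row_a, row_b, lam) for row_a, row_b in zip(H_a, H_b)]
w_mixed_eta = trajectory(mix(eta0_a, eta0_b, lam), H_mix, z1, d1)
w_eta_combo = mix(trajectory(eta0_a, H_a, z1, d1), trajectory(eta0_b, H_b, z1, d1), lam)
gap_eta = max(abs(p - q) for p, q in zip(w_mixed_eta, w_eta_combo))
```

Both gaps vanish up to rounding, in contrast with the output-based parameterization (∗), where the trajectory is not affine in the law's parameters.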


♠ Corollary: With parameterization (∗∗) of affine control laws, the problem

Find an affine control law (∗) which ensures that the resulting state-control trajectory w satisfies the system of convex inclusions

$D_i w - b_i \in Q_i$, i = 1, ..., m

and minimizes, under this restriction, a given linear objective f(w).

becomes an uncertain bi-affine optimization problem and as such can be processed via the CRC approach.

In particular, in the case when $Q_i$ are one-dimensional, the CRC of the problem is computationally tractable, provided that the normal range U of (d, z) and the associated cone L are so. If U, L and the norms used to measure distances are polyhedral, CRC is just an explicit LP program.

Supply chain control – GRC implementation for control problems

$x_t^j$ = amount echelon j orders from j−1 at the beginning of period t
$y_t^j$ = inventory level in echelon j at the end of period t
$z_j$ = initial inventory level at echelon j
$d_t$ = external demand at period t

$T_L(j) = I(j) + M(j-1) + L(j)$: the delay between the time an order is placed and received in echelon j.
$T_M(j) = I(j+1) + M(j)$: the delay between the time an order is placed and shipped from echelon

Supply chain control – GRC implementation for control problems

Main objective: minimizing cost
Sub-objective: stabilizing the system
Problem characteristics:
• Finite horizon
• Multi-echelon
• Delays
• Backlogging
• Demand must be satisfied and is uncertain

Supply chain control

Eliminating the equalities recursively will give us an LP problem of the form we discussed.
How do we control this system? What are the consequences of different types of control?

LV control

Let's assume we take the control suggested by Love [Love, 1979] using target inventory:

Resulting in the LP problem:

ILV control

We can further improve this method by making the reference inventory a decision variable rather than a constant.


GRC control

Applying the GRC to the AARC problem assuming

Robust Counterpart (AARC):
$y_t^j = y_t^j(d, z)$, $x_t^j = x_t^j(d, z)$ affine functions
constraints must hold ∀ d ∈ D, z ∈ Z

CRC: A typical constraint is of the form

$F_i(d, z) \in K_i$ ($K_i$ = interval)

Its CRC version being

$\mathrm{dist}(F_i(d, z), K_i) \le \sum_{t=1}^{n} \alpha_{it}\, \mathrm{dist}(d_t, D_t) + \sum_{j=1}^{m} \alpha_{ij}\, \mathrm{dist}(z_j, Z_j) \quad (*)$

When $D_t$, $Z_j$ are polyhedral, (*) can be converted to linear inequalities.
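For interval-valued sets the distances in such a constraint are easy to compute, so the inequality can be spot-checked at individual data points. The Python sketch below does this for a hypothetical affine function F and hypothetical sensitivity weights; it tests the inequality at sample points only and does not certify it for all (d, z).

```python
def dist_to_interval(x, low, high):
    """Distance from the point x to the interval [low, high]."""
    return max(low - x, 0.0, x - high)


def crc_holds(F, K, d, D, z, Z, alpha_d, alpha_z, tol=1e-9):
    """Check dist(F(d, z), K) <= sum_t alpha_d[t] * dist(d_t, D_t) + sum_j alpha_z[j] * dist(z_j, Z_j)."""
    lhs = dist_to_interval(F(d, z), K[0], K[1])
    rhs = sum(a * dist_to_interval(x, lo, hi) for a, x, (lo, hi) in zip(alpha_d, d, D))
    rhs += sum(a * dist_to_interval(x, lo, hi) for a, x, (lo, hi) in zip(alpha_z, z, Z))
    return lhs <= rhs + tol


def F(d, z):
    """Hypothetical constraint body: inventory left after two periods of demand."""
    return z[0] - d[0] - d[1]


K = (0.0, 10.0)                  # required range of the inventory
D = [(1.0, 3.0), (1.0, 3.0)]     # normal ranges of the demands
Z = [(6.0, 8.0)]                 # normal range of the initial inventory

inside = crc_holds(F, K, [2.0, 2.0], D, [7.0], Z, [1.0, 1.0], [1.0])
outside_ok = crc_holds(F, K, [4.0, 3.0], D, [6.0], Z, [1.0, 1.0], [1.0])
outside_tight = crc_holds(F, K, [4.0, 3.0], D, [6.0], Z, [0.5, 0.5], [0.5])
```

Inside the normal range the right-hand side is zero, so the constraint must hold exactly; outside it, the allowed violation grows in proportion to how far the data leave their normal range, scaled by the weights.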

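The purified outputs corresponding to the dynamic system (1) – (3) are here

$v_t^j = \begin{cases} z_m - \sum_{\tau \le t - T_M(m)} d_\tau, & \text{if } j = m \\ z_j, & \text{if } j < m \end{cases}$

The affine control law is here

$x_t^j = \eta^0_{x,t,j} + \sum_{l=1}^{m} \sum_{\tau=1}^{T} \eta^{\tau}_{x,t,j,l}\, v_\tau^l$

where (non anticipativity)

$\eta^{\tau}_{x,t,j,l} = 0 \quad \forall \tau \ge t$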

Example

[Love, 1979], Oscillating demand:

• Horizon: n = 20
• Echelons: m = 3
• Cost: c = 2, p = 3, h = 1
• Initial inventory: z = 12
• Lead time: L = 2
• No other delays

Inventory Behavior – “amplification of oscillation”

Method   OFV
RC       1207
AARC     1165
GRC      1065
LV       4795
ILV      3140