Introduction to Optimization: Basic concepts

Transcript of lecture slides (wotaoyin/zjuopt/ZJU_opt_Lec1_basics_of_optimization.pdf).

Page 1

Introduction to Optimization

Basic concepts

Instructor: Wotao Yin, Department of Mathematics, UCLA

Spring 2015

based on Chong-Zak, 4th Ed.

Page 2

Goals of this lecture

The general form of optimization:

minimize f (x)

subject to x ∈ Ω

We study the following topics:

• terminology

• types of minimizers

• optimality conditions

Page 3

Unconstrained vs constrained optimization

minimize f (x)

subject to x ∈ Ω

Suppose x ∈ Rn. The set Ω is called the feasible set.

• if Ω = Rn , then the problem is called unconstrained.

• otherwise, the problem is called constrained.

In general, more sophisticated techniques are needed to solve constrained

problems.

Page 4

Types of solutions

• x∗ is a local minimizer if there is ε > 0 such that f (x) ≥ f (x∗) for all x ∈ Ω \ {x∗} with ‖x − x∗‖ < ε.

• x∗ is a global minimizer if f (x) ≥ f (x∗) for all x ∈ Ω \ {x∗}.

• If “≥” is replaced with “>”, then x∗ is a strict local minimizer or a strict global minimizer, respectively.

x1: strict global minimizer; x2: strict local minimizer; x3: local minimizer
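The distinction between local and global minimizers can be illustrated numerically. A minimal sketch (not from the slides; the function x⁴ − 3x² + x is an arbitrary choice with two strict local minimizers, one of them global):

```python
import numpy as np

# f has two strict local minimizers (near x = -1.30 and x = 1.13);
# the one near -1.30 is also the global minimizer.
def f(x):
    return x**4 - 3*x**2 + x

xs = np.linspace(-2.5, 2.5, 2001)
ys = f(xs)

# interior grid points strictly below both neighbors: discrete strict local minimizers
local_idx = [i for i in range(1, len(xs) - 1)
             if ys[i] < ys[i - 1] and ys[i] < ys[i + 1]]
global_idx = int(np.argmin(ys))

print("local minimizers near:", [round(xs[i], 2) for i in local_idx])
print("global minimizer near:", round(xs[global_idx], 2))
```

A grid scan like this only locates minimizers approximately; the optimality conditions in the following slides characterize them exactly.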

Page 5

Convexity and global minimizers

• A set Ω is convex if λx + (1 − λ)y ∈ Ω for any x, y ∈ Ω and λ ∈ [0, 1].

• A function f is convex if

f (λx + (1 − λ)y) ≤ λf (x) + (1 − λ)f (y)

for any x, y ∈ Ω and λ ∈ [0, 1].

A function is convex if and only if its epigraph (the region on or above its graph) is convex.

• An optimization problem is convex if both the objective function and

feasible set are convex.

• Theorem: Any local minimizer of a convex optimization problem is a

global minimizer.
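The defining inequality can be sanity-checked numerically. A minimal sketch (not from the slides), testing the convexity inequality for f (x) = ‖x‖² on random samples:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    return float(np.sum(x**2))  # a convex function on R^n

# The defining inequality must hold for every pair x, y and every lambda in [0, 1];
# we test it on random samples (up to a floating-point tolerance).
ok = all(
    f(lam * x + (1 - lam) * y) <= lam * f(x) + (1 - lam) * f(y) + 1e-12
    for x, y, lam in ((rng.normal(size=3), rng.normal(size=3), rng.uniform())
                      for _ in range(1000))
)
print("convexity inequality held on all samples:", ok)
```

Note that sampling can refute convexity (one violated inequality suffices) but can never prove it; a proof must use the definition directly.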

Page 6

Derivatives

• First-order derivative of f : Df = [∂f/∂x1, . . . , ∂f/∂xn], a row vector.

• Gradient of f : ∇f = (Df )T , which is a column vector.

• The gradient represents the slope of the tangent to the graph of the function. It gives the linear approximation of f at a point, and it points in the direction of the greatest rate of increase.

Page 7

• Hessian (i.e., second derivative) of f : (F(x))ij = ∂²f/∂xi∂xj = ∂²f/∂xj∂xi, which is a symmetric matrix (for f ∈ C2).

• For a one-dimensional function f (x) with x ∈ R, the Hessian reduces to f ′′(x).

• F(x) is the Jacobian of ∇f (x), that is, F(x) = J (∇f (x)).

• Alternative notation: H (x) and ∇2f (x) are also used for Hessian.

• A Hessian gives a quadratic approximation of f at a point.

• Gradient and Hessian are local properties that help us recognize local solutions and determine a direction in which to move toward the next point.

Page 8

Example

Consider

f (x1, x2) = x1³ + x1² − x1x2 + x2² + 5x1 + 8x2 + 4

Then,

∇f (x) = [ 3x1² + 2x1 − x2 + 5 ]
         [ −x1 + 2x2 + 8       ]  ∈ R2

and

F(x) = [ 6x1 + 2   −1 ]
       [ −1         2 ]  ∈ R2×2.

Observation: if f is a quadratic function (remove the x1³ term in the above example), then ∇f (x) is linear (affine) in x and F(x) is a constant symmetric matrix.
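The gradient and Hessian formulas of this example can be verified with central finite differences. A quick sketch (not part of the slides; the evaluation point is arbitrary):

```python
import numpy as np

# The example f(x1, x2) = x1^3 + x1^2 - x1*x2 + x2^2 + 5*x1 + 8*x2 + 4
def f(x):
    x1, x2 = x
    return x1**3 + x1**2 - x1*x2 + x2**2 + 5*x1 + 8*x2 + 4

def grad(x):   # analytic gradient from the slide
    x1, x2 = x
    return np.array([3*x1**2 + 2*x1 - x2 + 5, -x1 + 2*x2 + 8])

def hess(x):   # analytic Hessian from the slide
    x1, _ = x
    return np.array([[6*x1 + 2, -1.0], [-1.0, 2.0]])

x = np.array([0.7, -1.2])   # arbitrary test point
h = 1e-5
e = np.eye(2)
# central finite differences of f give the gradient; of grad give the Hessian
fd_grad = np.array([(f(x + h*e[i]) - f(x - h*e[i])) / (2*h) for i in range(2)])
fd_hess = np.array([[(grad(x + h*e[j])[i] - grad(x - h*e[j])[i]) / (2*h)
                     for j in range(2)] for i in range(2)])

print(np.allclose(fd_grad, grad(x), atol=1e-6))
print(np.allclose(fd_hess, hess(x), atol=1e-4))
```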

Page 9

Taylor expansion

Suppose φ ∈ Cm (m times continuously differentiable). The Taylor expansion of φ at a point a, with step h, is

φ(a + h) = φ(a) + φ′(a)h + (φ′′(a)/2!)h² + ∙ ∙ ∙ + (φ⁽ᵐ⁾(a)/m!)hᵐ + o(hᵐ).¹

There are other ways to write the last two terms.

Example: Consider x, d ∈ Rn and f ∈ C2. Define φ(α) = f (x + αd). Then,

φ′(α) = ∇f (x + αd)T d

φ′′(α) = dT F(x + αd)d

Hence, expanding at α = 0,

f (x + αd) = f (x) + (∇f (x)T d)α + o(α)

= f (x) + (∇f (x)T d)α + (dT F(x)d/2)α² + o(α²).

¹ o(α) collects the term(s) that are “asymptotically smaller than α” near 0, that is, o(α)/α → 0 as α ↓ 0.
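The o(α²) remainder can be observed numerically on the earlier cubic example: the error of the second-order Taylor model, divided by α², should shrink as α → 0. A sketch (not from the slides; x and d are arbitrary):

```python
import numpy as np

def f(x):
    x1, x2 = x
    return x1**3 + x1**2 - x1*x2 + x2**2 + 5*x1 + 8*x2 + 4

def grad(x):
    x1, x2 = x
    return np.array([3*x1**2 + 2*x1 - x2 + 5, -x1 + 2*x2 + 8])

def hess(x):
    x1, _ = x
    return np.array([[6*x1 + 2, -1.0], [-1.0, 2.0]])

x = np.array([0.5, 0.5])
d = np.array([1.0, -2.0])
g, F = grad(x), hess(x)

# error of the 2nd-order Taylor model, scaled by alpha^2, for shrinking alpha
ratios = []
for a in (1e-1, 1e-2, 1e-3):
    model = f(x) + a * (g @ d) + 0.5 * a**2 * (d @ F @ d)
    ratios.append(abs(f(x + a * d) - model) / a**2)
print(ratios)  # decreasing toward 0
```

For this cubic f the scaled error decreases linearly in α, exactly as o(α²) predicts.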

Page 10

Feasible direction

• A vector d ∈ Rn is a feasible direction at x ∈ Ω if d ≠ 0 and x + αd ∈ Ω for all sufficiently small α > 0.

(It is possible that d is an infeasible step, that is, x + d ∉ Ω. But if there is some room in Ω to move from x in the direction d, then d is a feasible direction.)

d1 is feasible, d2 is infeasible

• If Ω = Rn or x lies in the interior of Ω, then any d ∈ Rn \ {0} is a feasible direction.

• Feasible directions are introduced to establish optimality conditions, especially for points on the boundary of the feasible set of a constrained problem.

Page 11

First-order necessary condition

Let C1 be the set of continuously differentiable functions.

Theorem 6.1 (FONC): If x∗ is a local minimizer of f ∈ C1 over Ω, then dT∇f (x∗) ≥ 0 for every feasible direction d at x∗.

Proof: Let d be any feasible direction. First-order Taylor expansion:

f (x∗ + αd) = f (x∗) + αdT∇f (x∗) + o(α).

If dT∇f (x∗) < 0, which does not depend on α, then f (x∗ + αd) < f (x∗) for all sufficiently small α > 0 (that is, all α ∈ (0, ᾱ) for some ᾱ > 0). This is a contradiction since x∗ is a local minimizer.

Page 12

Corollary 6.1 (interior case): If x∗ is a local minimizer of f ∈ C1 and x∗ is an interior point of Ω (in particular, if Ω = Rn), then ∇f (x∗) = 0.

Proof: Since any d ∈ Rn \ {0} is a feasible direction, we can set d = −∇f (x∗). From Theorem 6.1, we have dT∇f (x∗) = −‖∇f (x∗)‖² ≥ 0. Since ‖∇f (x∗)‖² ≥ 0, we have ‖∇f (x∗)‖² = 0 and thus ∇f (x∗) = 0.

Comment: This condition reduces the unconstrained problem

minimize f (x)

to solving the equation

∇f (x∗) = 0.
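This reduction can be made concrete: for an unconstrained problem, the FONC turns minimization into root-finding on the gradient. A hedged sketch (the function f (x1, x2) = exp(x1) + exp(−x1) + x2², minimized at (0, 0), is a made-up convex example, and Newton's method is one standard root-finder):

```python
import numpy as np

# Newton's method applied to grad f = 0 for the made-up convex function
# f(x1, x2) = exp(x1) + exp(-x1) + x2**2, whose unique minimizer is (0, 0).
def grad(x):
    return np.array([np.exp(x[0]) - np.exp(-x[0]), 2.0 * x[1]])

def hess(x):
    return np.array([[np.exp(x[0]) + np.exp(-x[0]), 0.0],
                     [0.0, 2.0]])

x = np.array([1.0, -1.5])          # arbitrary starting point
for _ in range(10):
    # Newton step for the nonlinear equation grad f(x) = 0
    x = x - np.linalg.solve(hess(x), grad(x))

print(x)  # close to the minimizer (0, 0)
```

Because this f is convex, the unique root of the gradient is the global minimizer; in general a gradient root is only a stationary point.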

Page 13

x1 fails to satisfy the FONC; x2 satisfies the FONC

Page 14

Second-order necessary condition

By the FONC, dT∇f (x∗) ≥ 0, leaving two possibilities:

• dT∇f (x∗) > 0;

• dT∇f (x∗) = 0.

In the first case, f (x∗ + αd) > f (x∗) for all sufficiently small α > 0.

In the second case, the vanishing dT∇f (x∗) allows us to check higher-order

derivatives.

Page 15

Let C2 be the set of twice continuously differentiable functions.

Theorem 6.2 (SONC): Let x∗ be a local minimizer of f ∈ C2 over Ω and d a feasible direction at x∗. If dT∇f (x∗) = 0, then dT F(x∗)d ≥ 0.

Proof: Assume to the contrary that ∃ a feasible direction d with dT∇f (x∗) = 0 and dT F(x∗)d < 0. By the 2nd-order Taylor expansion (with a vanishing 1st-order term), we have

f (x∗ + αd) = f (x∗) + (dT F(x∗)d/2)α² + o(α²),

where by our assumption dT F(x∗)d < 0. Hence, for all sufficiently small α > 0, we have f (x∗ + αd) < f (x∗), which contradicts that x∗ is a local minimizer.

Page 16

Page 17

The necessary conditions are not sufficient

Counterexamples:

• f (x) = x³, with f ′(x) = 3x² and f ′′(x) = 6x: the point 0 satisfies both the FONC and the SONC, yet it is not a local minimizer.

• f (x) = x1² − x2²: the point 0 is a saddle point, i.e., ∇f (0) = 0 but 0 is neither a local minimizer nor a local maximizer. Here the SONC detects this: the Hessian at 0 is not positive semidefinite, so 0 is not a local minimizer.
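The saddle-point example can be checked mechanically: the FONC holds at 0, but the Hessian's smallest eigenvalue is negative, so the SONC fails. A sketch (not from the slides):

```python
import numpy as np

# For f(x) = x1^2 - x2^2 at the origin:
grad0 = np.array([0.0, 0.0])              # gradient of f at 0 (FONC holds)
H0 = np.array([[2.0, 0.0], [0.0, -2.0]])  # Hessian of f at 0
eigs = np.linalg.eigvalsh(H0)

print("FONC holds:", bool(np.allclose(grad0, 0)))
print("SONC holds (Hessian PSD):", bool(eigs.min() >= 0))  # False: saddle point
```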

Page 18

Second-order sufficient condition

Theorem (SOSC): Suppose f ∈ C2 and x∗ is an interior point of Ω such that (1) ∇f (x∗) = 0 and (2) F(x∗) > 0. Then x∗ is a strict local minimizer.

Comments:

• part 2 states F(x∗) is positive definite: xT F(x∗)x > 0 for x ≠ 0.

• the condition is not necessary for a strict local minimizer (e.g., f (x) = x⁴ has a strict global minimizer at 0, where f ′′(0) = 0).

Proof: For any d with ‖d‖ = 1, we have dT F(x∗)d ≥ λmin(F(x∗)) > 0. Use the 2nd-order Taylor expansion:

f (x∗ + αd) = f (x∗) + (α²/2)dT F(x∗)d + o(α²) ≥ f (x∗) + (α²/2)λmin(F(x∗)) + o(α²).

Then ∃ ᾱ > 0, independent of d, such that f (x∗ + αd) > f (x∗) for all α ∈ (0, ᾱ).

Page 19

Graph of f (x) = x1² + x2².

The point 0 satisfies the SOSC.
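Checking the SOSC at a candidate point amounts to two tests: the gradient vanishes and the Hessian's smallest eigenvalue is positive. A sketch for f (x) = x1² + x2² at x∗ = 0 (not from the slides):

```python
import numpy as np

# SOSC check at x* = 0 for f(x) = x1^2 + x2^2
grad0 = np.array([0.0, 0.0])             # gradient of f at 0
H0 = np.array([[2.0, 0.0], [0.0, 2.0]])  # Hessian of f at 0
lam_min = np.linalg.eigvalsh(H0).min()

sosc = bool(np.allclose(grad0, 0)) and lam_min > 0
print("SOSC holds at 0:", sosc)
```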

Page 20

Roles of optimality conditions

• Recognize a solution: given a candidate solution, check the optimality conditions to verify that it is a solution.

• Measure the quality of an approximate solution: measure how “close” a point is to being a solution.

• Develop algorithms: reduce an optimization problem to solving a (nonlinear) equation (finding a root of the gradient).

Later, we will see other forms of optimality conditions and how they lead to equivalent subproblems, as well as algorithms.

Page 21

Quiz questions

1. Show that for Ω = {x ∈ Rn : Ax = b}, d ≠ 0 is a feasible direction at x ∈ Ω if and only if Ad = 0.

2. Show that for any unconstrained quadratic program, which has the form

minimize f (x) := (1/2)xT Qx − bT x,

if x∗ satisfies the second-order necessary condition, then x∗ is a global minimizer.

3. Show that for any unconstrained quadratic program with Q ≥ 0 (Q is symmetric and positive semidefinite), x∗ is a global minimizer if and only if x∗ satisfies the first-order necessary condition. That is, the problem is equivalent to solving Qx = b.

4. Consider minimize cT x, subject to x ∈ Ω. Suppose that c ≠ 0 and the problem has a global minimizer. Can the minimizer lie in the interior of Ω?

Page 22

1. Show that for Ω = {x ∈ Rn : Ax = b}, d ≠ 0 is a feasible direction at x ∈ Ω if and only if Ad = 0.

Proof: d ≠ 0 is a feasible direction ⇐⇒ ∃ α > 0 such that A(x + αd) = b ⇐⇒ Ad = 0 (since Ax = b).

Page 23

2. Show that for any unconstrained quadratic program, which has the form

minimize f (x) := (1/2)xT Qx − bT x,

if x∗ satisfies the second-order necessary condition, x∗ is a global minimizer.

Proof: Since the problem is unconstrained, any point is an interior point (of the feasible set Rn). By the SONC (Corollary 6.2), ∇f (x∗) = Qx∗ − b = 0 and F(x∗) = Q ≥ 0, that is, dT Qd ≥ 0 for any d ∈ Rn.

Pick any x ∈ Rn \ {x∗} and set d = x − x∗. Using b = Qx∗,

f (x) − f (x∗) = ((1/2)xT Qx − bT x) − ((1/2)x∗T Qx∗ − bT x∗)

= ((1/2)xT Qx − x∗T Qx) − ((1/2)x∗T Qx∗ − x∗T Qx∗)

= (1/2)xT Qx − x∗T Qx + (1/2)x∗T Qx∗

= (1/2)dT Qd ≥ 0.

Therefore, x∗ is a global minimizer (not necessarily strict).
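The algebra above can be verified numerically: with b = Qx∗ (so x∗ satisfies the FONC), f (x) − f (x∗) equals (1/2)dT Qd for d = x − x∗. A sketch with randomly generated data (not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 3))
Q = A.T @ A                 # symmetric positive (semi)definite
xstar = rng.normal(size=3)
b = Q @ xstar               # so that grad f(x*) = Q x* - b = 0

def f(x):
    return 0.5 * x @ Q @ x - b @ x

x = rng.normal(size=3)      # arbitrary comparison point
d = x - xstar
lhs = f(x) - f(xstar)
rhs = 0.5 * d @ Q @ d
print(np.isclose(lhs, rhs), lhs >= -1e-9)
```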

Page 24

3. Show that for any unconstrained quadratic program with Q ≥ 0 (Q is symmetric and positive semidefinite), x∗ is a global minimizer if and only if x∗ satisfies the first-order necessary condition. That is, the problem is equivalent to solving Qx = b.

Proof: The problem is unconstrained, so the FONC is ∇f (x∗) = Qx∗ − b = 0.

“⇐=” The assumption Q ≥ 0 along with the FONC means that x∗ satisfies the SONC. By the last quiz question, x∗ is a global minimizer.

“=⇒” As a global minimizer, x∗ is automatically a local minimizer and thus must satisfy the first-order necessary condition.

Three possible cases:

• if Qx = b has a unique solution x∗, it is also the strict global minimizer;

• if Qx = b has infinitely many solutions, they are all global minimizers (they achieve the same optimal objective);

• if Qx = b has no solution, then the optimization problem is unbounded below (the objective can be made arbitrarily small).
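The first case (Qx = b uniquely solvable) can be illustrated: solving the linear system yields the global minimizer, which no sampled perturbation improves on. A sketch with random data (not from the slides):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(4, 4))
Q = A.T @ A                     # symmetric positive definite (almost surely)
b = rng.normal(size=4)
xstar = np.linalg.solve(Q, b)   # FONC: solve Q x = b

def f(x):
    return 0.5 * x @ Q @ x - b @ x

fstar = f(xstar)
# no sampled perturbation of x* achieves a smaller objective value
worse = all(f(xstar + rng.normal(size=4)) >= fstar - 1e-9 for _ in range(1000))
print("x* beat all sampled points:", worse)
```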

Page 25

4. Consider minimize cT x, subject to x ∈ Ω. Suppose that c ≠ 0 and the problem has a global minimizer. Can the minimizer lie in the interior of Ω?

Answer: No.

Proof: (By contradiction.) Suppose the global minimizer x∗ (which exists by assumption) lies in the interior of Ω. Then there exists a ball B = {x : ‖x − x∗‖ < ε} ⊂ Ω, for some ε > 0, and cT x ≥ cT x∗ for all x ∈ B. However, pick x = x∗ − (ε/2)c/‖c‖ ∈ B; then cT x = cT x∗ − (ε/2)‖c‖ < cT x∗ since c ≠ 0. A contradiction is reached.