Optimisation
Michael Tehranchi
Lecture 3 - Easter 2017

Lagrangian necessity

Consider the problem

minimise f(x) subject to g(x) = b, x ∈ X.

Let

L(x, λ) = f(x) + λ⊤(b − g(x))

be the Lagrangian. Notice that for any Lagrange multiplier λ ∈ Λ we have

inf_{x∈X, g(x)=b} f(x) = inf_{x∈X, g(x)=b} [f(x) + λ⊤(b − g(x))]
                       = inf_{x∈X, g(x)=b} L(x, λ)
                       ≥ inf_{x∈X} L(x, λ)

by the inclusion {x ∈ X : g(x) = b} ⊆ X. We will say that the Lagrangian method works if there exists a Lagrange multiplier λ∗ such that there is equality, that is,

inf_{x∈X, g(x)=b} f(x) = inf_{x∈X} L(x, λ∗).
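As a quick sanity check (an illustrative sketch, not part of the original notes: the toy problem, the multiplier value λ∗ = b, and the use of scipy are all assumptions), consider minimising x_1² + x_2² subject to x_1 + x_2 = b with X = R². Taking λ∗ = b, the constrained infimum and inf_{x∈X} L(x, λ∗) coincide, both equal to b²/2:

    # A minimal sketch, not from the notes: minimise x1^2 + x2^2 subject to
    # x1 + x2 = b, with X = R^2.  Here lambda* = b, and the two infima agree.
    import numpy as np
    from scipy.optimize import minimize

    b = 3.0
    f = lambda x: x[0]**2 + x[1]**2          # objective
    g = lambda x: x[0] + x[1]                # functional constraint

    # inf { f(x) : x in X, g(x) = b }
    primal = minimize(f, x0=np.zeros(2),
                      constraints=[{"type": "eq", "fun": lambda x: g(x) - b}]).fun

    # inf_{x in X} L(x, lambda*) with lambda* = b
    lam = b
    L = lambda x: f(x) + lam * (b - g(x))
    dual = minimize(L, x0=np.zeros(2)).fun

    print(primal, dual)                      # both approximately 4.5 = b^2/2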

When does the Lagrangian method work? To answer this question, we need to define some terms.

We say that a function ψ : R^m → R has a supporting hyperplane at a point b ∈ R^m if there exists a λ ∈ R^m such that

ψ(c) ≥ ψ(b) + λ⊤(c − b)

for all c ∈ R^m. The situation is illustrated in the case m = 1 below.
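For m = 1 the definition can also be checked numerically (an illustrative sketch, not from the notes; the example functions and the sampling grid are arbitrary choices): ψ(c) = c² is supported at every b by the slope λ = 2b, whereas ψ(c) = −c² has no supporting hyperplane at 0.

    # Illustrative only: test the supporting-hyperplane inequality on a grid of c values (m = 1).
    import numpy as np

    def supports(psi, b, lam, cs):
        """True if psi(c) >= psi(b) + lam*(c - b) holds at every sampled c."""
        return all(psi(c) >= psi(b) + lam * (c - b) - 1e-12 for c in cs)

    cs = np.linspace(-5.0, 5.0, 201)
    print(supports(lambda c: c**2, 1.0, 2.0, cs))    # True: lambda = 2b supports c^2 at b = 1
    print(supports(lambda c: -c**2, 0.0, 0.0, cs))   # False: lambda = 0 fails for -c^2 at 0 (no slope works)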

Now return to the optimisation problem at hand. Define a function ϕ on R^m by

ϕ(c) = inf_{x∈X, g(x)=c} f(x).



The function ϕ is called the value function for the problem. Of course, we are really interested in the case c = b, but we will now see that it is useful to let the right-hand side of our problem vary.
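To make this concrete, here is a small sketch (illustrative, not from the notes; it reuses the toy problem above) that tabulates ϕ(c) by re-solving the problem for several values of c. For that toy problem the value function is ϕ(c) = c²/2, which is convex and so has a supporting hyperplane at every b.

    # Illustrative sketch: tabulate the value function phi(c) for
    # minimise x1^2 + x2^2 subject to x1 + x2 = c, with X = R^2.
    import numpy as np
    from scipy.optimize import minimize

    def phi(c):
        f = lambda x: x[0]**2 + x[1]**2
        con = {"type": "eq", "fun": lambda x: x[0] + x[1] - c}
        return minimize(f, x0=np.zeros(2), constraints=[con]).fun

    for c in np.linspace(-2.0, 2.0, 5):
        print(c, phi(c), c**2 / 2)   # numerical and analytic values agree to solver tolerance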

Theorem (Lagrangian necessity). The Lagrangian method works for the problem if and only if the value function has a supporting hyperplane at b.

Proof: The Lagrangian method works iff there exists a λ such that

ϕ(b) = inf_{x∈X} [f(x) + λ⊤(b − g(x))].

The value function has a supporting hyperplane at b if and only if there exists a λ such that

ϕ(b) = inf_{c∈R^m} [ϕ(c) + λ⊤(b − c)].

Therefore, the equivalence of the two hypotheses is proven by noting the equality

inf_{x∈X} [f(x) + λ⊤(b − g(x))] = inf_{c∈R^m} inf_{x∈X, g(x)=c} [f(x) + λ⊤(c − g(x)) + λ⊤(b − c)]
                                = inf_{c∈R^m} [ϕ(c) + λ⊤(b − c)],

where the second line uses that c − g(x) = 0 on the inner constraint set, and that the inner infimum of f(x) is ϕ(c). □

Shadow prices

We now consider another interpretation of Lagrange multipliers. We start with a little result:

Theorem. Suppose that ψ : R^m → R is differentiable and that for a fixed b ∈ R^m there exists a λ ∈ R^m such that

ψ(c) ≥ ψ(b) + λ⊤(c − b)

for all c ∈ R^m; that is, there is a supporting hyperplane at b. Then the gradient of ψ at b is equal to λ, that is,

∂ψ/∂b_i = λ_i for all i.

Proof. Fix a ∈ R^m and ε > 0. By the supporting hyperplane assumption we have that

[ψ(b + εa) − ψ(b)] / ε ≥ λ⊤a.

Taking the limit as ε ↓ 0 and using the assumption of differentiability yields the inequality

Σ_{i=1}^m (∂ψ/∂b_i) a_i ≥ Σ_{i=1}^m λ_i a_i.

Since a was arbitrary, we could replace a with −a in the above inequality to conclude that

Σ_{i=1}^m (∂ψ/∂b_i) a_i ≤ Σ_{i=1}^m λ_i a_i.

Hence there is equality. And since a was arbitrary, we have ∂ψ/∂b_i = λ_i for all i, as claimed. □

A corollary of the above theorem and the proof of Lagrangian necessity is this:


If the Lagrangian method works and if the value function is differentiable, then the gradient of the value function is the Lagrange multiplier.
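Continuing the toy problem used in the sketches above (illustrative only, not from the notes): there ϕ(c) = c²/2, so the gradient at b is ϕ′(b) = b, which is exactly the multiplier λ∗ = b. A finite-difference check:

    # Illustrative check that the derivative of the value function equals lambda*.
    import numpy as np
    from scipy.optimize import minimize

    def phi(c):
        f = lambda x: x[0]**2 + x[1]**2
        con = {"type": "eq", "fun": lambda x: x[0] + x[1] - c}
        return minimize(f, x0=np.zeros(2), constraints=[con]).fun

    b, eps = 3.0, 1e-3
    print((phi(b + eps) - phi(b - eps)) / (2 * eps))   # approximately 3.0 = lambda* = b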

We are now ready to give an economic interpretation of this fact. Consider a factory owner who makes n different products out of m raw materials.

• She needs to choose the amount x_i to make of the i-th product, for each i = 1, . . . , n.
• Given a vector of amounts x = (x_1, . . . , x_n)⊤ of products to manufacture, the factory requires the amount g_j(x) of the j-th raw material, for j = 1, . . . , m.
• The amount of the j-th raw material available is b_j.
• Only non-negative amounts of products can be produced.
• Given the amounts x of products, the profit earned is f(x).

The factory owner then tries to solve the problem to

maximise f(x) subject to g(x) ≤ b, x ≥ 0.

Now suppose that the factory owner is offered more raw material. How much should she pay?

Let ϕ be the value function for the problem. It would be in the owner's interest to buy an amount ε = (ε_1, . . . , ε_m)⊤ of raw materials if

ϕ(b + ε) − ϕ(b) ≥ cost of additional raw material.

When ε is small, the left-hand side is approximately

ϕ(b + ε) − ϕ(b) ≈ Σ_{j=1}^m (∂ϕ/∂b_j) ε_j,

so the highest price the factory owner would be willing to pay for a small amount of the j-th raw material is ∂ϕ/∂b_j. For this reason, the quantity ∂ϕ/∂b_j is called the shadow price of the j-th raw material.

(Notice that if the j-th constraint is not tight, so that g_j(x∗) < b_j, then the factory owner does not use all of the available j-th raw material. Hence its shadow price ∂ϕ/∂b_j is zero, since there is no extra profit in acquiring a little more. On the other hand, the j-th slack variable z_j is positive, and by complementary slackness we have λ_j = 0. So the shadow price interpretation makes sense in this case as well.)

We have now shown that if the Lagrangian method works and if the value function is differentiable, then the Lagrange multipliers can be interpreted as the shadow prices of raw materials.
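Here is a small numerical sketch of this interpretation (the production data, and the use of scipy's linprog, are illustrative assumptions rather than anything from the notes): a factory makes two products from two raw materials, and perturbing each supply b_j shows that the finite-difference slope of the value function recovers the shadow prices.

    # Illustrative sketch: maximise 3*x1 + 5*x2 subject to
    #   x1 + 2*x2 <= b1,  3*x1 + 2*x2 <= b2,  x >= 0,
    # and estimate the shadow prices by finite differences of the value function.
    import numpy as np
    from scipy.optimize import linprog

    profit = np.array([3.0, 5.0])
    A_ub = np.array([[1.0, 2.0], [3.0, 2.0]])

    def phi(b):
        """Maximal profit given raw-material supplies b (linprog minimises, so negate)."""
        res = linprog(-profit, A_ub=A_ub, b_ub=b, bounds=[(0, None), (0, None)])
        return -res.fun

    b, eps = np.array([8.0, 12.0]), 1e-4
    for j in range(2):
        e = np.zeros(2)
        e[j] = eps
        print(j, (phi(b + e) - phi(b)) / eps)   # approximately 2.25 and 0.25

For this data the optimum is x∗ = (2, 3) with both constraints tight, and the dual variables of the linear programme are (2.25, 0.25), matching the finite-difference slopes.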

Sufficient conditions for the existence of a supporting hyperplane

How can we check that the value function has a supporting hyperplane? We need to define a few terms.

A subset C ⊆ R^n is convex if

x, y ∈ C implies θx + (1 − θ)y ∈ C for all 0 ≤ θ ≤ 1.

On the next page is an example of a convex set on the left, and a set that is not convex on the right.



A function ψ : R^m → R is convex if

ψ(θx + (1 − θ)y) ≤ θψ(x) + (1 − θ)ψ(y) for all x, y ∈ R^m and 0 ≤ θ ≤ 1.

Exercise. Show that a function ψ : R^m → R is convex if and only if the set

C = {(x, y) : ψ(x) ≤ y} ⊆ R^{m+1}

is convex. (The set C defined above and illustrated below is called the epigraph of ψ.)

A useful and interesting (but not examinable) characterisation of convex functions is this:

Theorem. A function is convex if and only if it has a supporting hyperplane at each point.¹

Now, to answer the question posed at the start of this section, consider a minimisation problem. The Lagrangian method works, in the sense that there exists a Lagrange multiplier λ∗ for the problem, if the value function ϕ is convex. A problem on the example sheet is to verify that if

(1) the set X is convex,
(2) the objective function f is convex, and
(3) the functional constraint is
• either g(x) = b and g is linear, or
• g(x) ≤ b and g is convex,

then ϕ is convex.

¹ One direction of the proof is easy. Suppose ψ has a supporting hyperplane at each point. Fix x, y ∈ R^m and 0 ≤ θ ≤ 1. Let b = θx + (1 − θ)y, and let λ ∈ R^m be such that

ψ(c) ≥ ψ(b) + λ⊤(c − b)

for all c. Letting first c = x and then c = y in the above inequality yields

θψ(x) + (1 − θ)ψ(y) ≥ ψ(b) + λ⊤(θx + (1 − θ)y − b) = ψ(b),

showing that ψ is convex. (This might resemble the proof of Jensen's inequality you saw in IA Probability.)

The proof of the other direction is more involved in general, but is reasonably easy in the case where ψ is twice-differentiable. Suppose ψ is convex and fix b. By the second-order mean-value theorem, there exists a 0 ≤ θ ≤ 1 such that

ψ(c) = ψ(b) + λ⊤(c − b) + (1/2)(c − b)⊤ Hψ(b∗) (c − b)
     ≥ ψ(b) + λ⊤(c − b),

where λ is the gradient of ψ at b, and Hψ(b∗) is the Hessian of ψ evaluated at the point b∗ = θb + (1 − θ)c. We have used the fairly easy-to-prove fact that a twice-differentiable function is convex if and only if its Hessian matrix is non-negative definite everywhere.
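The Hessian criterion mentioned at the end of the footnote is also easy to probe numerically. The sketch below (illustrative only; the test function and the finite-difference step size are arbitrary choices, not from the notes) estimates the Hessian of ψ(x) = exp(x_1) + x_2² at a few random points and checks that its eigenvalues are non-negative.

    # Illustrative only: finite-difference Hessian check of convexity.
    import numpy as np

    def psi(x):
        return np.exp(x[0]) + x[1]**2   # a convex test function

    def hessian_fd(f, x, h=1e-4):
        """Central finite-difference estimate of the Hessian of f at x."""
        n = len(x)
        I = np.eye(n)
        H = np.zeros((n, n))
        for i in range(n):
            for j in range(n):
                H[i, j] = (f(x + h*I[i] + h*I[j]) - f(x + h*I[i] - h*I[j])
                           - f(x - h*I[i] + h*I[j]) + f(x - h*I[i] - h*I[j])) / (4 * h * h)
        return H

    rng = np.random.default_rng(0)
    for _ in range(5):
        x = rng.normal(size=2)
        eigenvalues = np.linalg.eigvalsh(hessian_fd(psi, x))
        print(x, eigenvalues.min() >= -1e-6)   # True: the estimated Hessian is non-negative definite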
