of 36/36
Lecture 3 Convex functions (Basic properties; Calculus; Closed functions; Continuity of convex functions; Subgradients; Optimality conditions) 3.1 First acquaintance Deﬁnition 3.1.1 [Convex function] A function f : M R deﬁned on a nonempty subset M of R n and taking real values is called convex, if the domain M of the function is convex; for any x, y M and every λ [0, 1] one has f (λx + (1 - λ)y) λf (x) + (1 - λ)f (y). (3.1.1) If the above inequality is strict whenever x 6= y and 0 <λ< 1, f is called strictly convex. A function f such that -f is convex is called concave; the domain M of a concave function should be convex, and the function itself should satisfy the inequality opposite to (3.1.1): f (λx + (1 - λ)y) λf (x) + (1 - λ)f (y), x, y M,λ [0, 1]. The simplest example of a convex function is an aﬃne function f (x)= a T x + b – the sum of a linear form and a constant. This function clearly is convex on the entire space, and the “convexity inequality” for it is equality; the aﬃne function is also concave. It is easily seen that the function which is both convex and concave on the entire space is an aﬃne function. Here are several elementary examples of “nonlinear” convex functions of one variable: 53
• date post

11-Feb-2017
• Category

Documents

• view

246

1

Embed Size (px)

Transcript of Lecture 3 Convex functions

• Lecture 3

Convex functions

(Basic properties; Calculus; Closed functions; Continuity of convex functions; Subgradients;Optimality conditions)

3.1 First acquaintance

Definition 3.1.1 [Convex function] A function f : M R defined on a nonempty subsetM of Rn and taking real values is called convex, if

the domain M of the function is convex;

for any x, y M and every [0, 1] one has

f(x+ (1 )y) f(x) + (1 )f(y). (3.1.1)

If the above inequality is strict whenever x 6= y and 0 < < 1, f is called strictly convex.

A function f such that f is convex is called concave; the domain M of a concave functionshould be convex, and the function itself should satisfy the inequality opposite to (3.1.1):

f(x+ (1 )y) f(x) + (1 )f(y), x, y M, [0, 1].

The simplest example of a convex function is an affine function

f(x) = aTx+ b

the sum of a linear form and a constant. This function clearly is convex on the entirespace, and the convexity inequality for it is equality; the affine function is also concave.It is easily seen that the function which is both convex and concave on the entire space isan affine function.

Here are several elementary examples of nonlinear convex functions of one variable:

53

• 54 LECTURE 3. CONVEX FUNCTIONS

functions convex on the whole axis:x2p, p being positive integer;

exp{x};

functions convex on the nonnegative ray:xp, 1 p;xp, 0 p 1;x lnx;

functions convex on the positive ray:1/xp, p > 0;

lnx.

To the moment it is not clear why these functions are convex; in the mean time we willderive a simple analytic criterion for detecting convexity which immediately demonstratesthat the above functions indeed are convex.

More examples of convex functions: norms. Recall that a real-valued function (x)on Rn is called a norm, if it is nonnegative everywhere, positive outside of the origin, ishomogeneous:

(tx) = |t|p(x)

and satisfies the triangle inequality

(x+ y) (x) + (y).

We are about to prove that every norm is convex:

Proposition 3.1.1 Let (x) be a real-valued function on Rn which is positively homogeneousof degree 1:

(tx) = t(x) x Rn, t 0.

is convex if and only if it is sub-additive:

(x+ y) (x) + (y) x, y Rn.

In particular, a norm (which by definition is positively homogeneous of degree 1 and is sub-additive) is convex.

Proof is immediate:

(x+ (1 )y) (x) + ((1 )y) triangle inequality= (x) + (1 )(y) homogeneity,

for any x, y Rn and 0 1.

• 3.1. FIRST ACQUAINTANCE 55

The most important (to us) examples of norms are so-called lp-norms:

|x|p =(

ni=1

|xi|p)1/p

, 1 p ,

(the right hand side of the latter relation for p = is, by definition, maxi |xi|) though itis not clear in advance that these functions are indeed norms. However, you certainly knowquite well three members of the above family of norms:

The Euclidean norm |x| = |x|2 =xTx =

ni=1 x

2i . This is indeed a norm due to the

Cauchy-Schwartz inequality.

The l1-norm |x|1 =ni=1 |xi|.

The l-norm or uniform norm (or Tchebychev norm) |x| = max1in |xi|.

Epigraph of a convex function. A very convenient equivalent definition of a convexfunction is in terms of its epigraph. Given a real-valued function f defined on a nonemptysubset M of Rn, we define its epigraph as the set

Epi (f) = {(t, x) Rn+1 | x M, t f(x)};

geometrically, to define the epigraph, you should take the graph of the function the surface{t = f(x), x M} in Rn+1 and add to this surface all points which are above it. Theequivalent, more geometrical, definition of a convex function is given by the following simple

Proposition 3.1.2+

[Definition of convexity in terms of the epigraph]A function f defined on a subset of Rn is convex if and only if its epigraph is a nonempty

convex set in Rn+1.

We will see later that the behavior of a general convex function on the boundary of itsdomain is sometimes out of control. The following definition will be useful in the futureanalysis.

Definition 3.1.2 [Closed convex function] A convex function f is called closed if its epi-graph is a closed set.

Note that the function which is convex and continuous on a closed domain is a closedfunction. For instance, the norms are closed convex functions. Further, all differentiableconvex functions are closed with Dom f = Rn. The function f(x) = 1

x, x [0,+] is

convex and closed, however, its domain, which is (0,) is open.When speaking about convex functions, it is extremely convenient to think that the func-

tion outside its domain also has a value, namely, takes the value +; with this convention,we can say that

• 56 LECTURE 3. CONVEX FUNCTIONS

a convex function f on Rn is a function taking values in the extended real axis R {+}such that the domain Dom f of the function the set of those xs where f(x) is finite isnonempty, and for all x, y Rn and all [0, 1] one has

f(x+ (1 )y) f(x) + (1 )f(y). (3.1.2)

If the expression in the right hand side involves infinities, it is assigned the value according tothe standard and reasonable conventions on what are arithmetic operations in the extendedreal axis R {+} {}.

The following simple observation is maybe the most useful mathematical statement Iknow:

Proposition 3.1.3 [Jensens inequality] Let f be convex. Then for any convex combination

Ni=1

ixi

one has

f(Ni=1

ixi) Ni=1

if(xi).

The proof is immediate: the points (f(xi), xi) clearly belong to the epigraph of f ; since f isconvex, its epigraph is a convex set, so that the convex combination

Ni=1

i(f(xi), xi) = (Ni=1

if(xi),Ni=1

ixi)

of the points also belongs to Epi (f). By definition of the epigraph, the latter means exactlythat

Ni=1 if(xi) f(

Ni=1 ixi).

Note that the definition of convexity of a function f is exactly the requirement on f tosatisfy the Jensen inequality for the case of N = 2; we see that to satisfy this inequality forN = 2 is the same as to satisfy it for all N .

The following statement is a trivial corollary of the Jensen inequality:

Corollary 3.1.1 Let f be a convex function and let x be a convex combination of the pointsx1, ..., xN . Then

f(x) max1iN

f(xi).

In other words, if is a convex hull of x1, ..., xN , i.e.

= Conv {x1, ..., xN} {x Rn|x =Ni=1

ixi, 0,Ni=1

i = 1},

then maxx f(x) max1iN f(xi).

• 3.1. FIRST ACQUAINTANCE 57

Convexity of level sets of a convex function

The following simple observation is also very useful:

Proposition 3.1.4 [Convexity of level sets] Let f be a convex function with the domain M .Then, for any real , the set

lev (f) = {x M | f(x) }

the level set of f is convex.

The proof takes one line: if x, y lev (f) and [0, 1], then f(x + (1 )y) f(x) +(1 )f(y) + (1 ) = , so that x+ (1 )y lev (f).

Note that the convexity of level sets does not characterize convex functions; there arenonconvex functions which share this property (e.g., any monotone function on the axis). Theproper characterization of convex functions in terms of convex sets is given by Proposition3.1.2 convex functions are exactly the functions with convex epigraphs. Convexity of levelsets specifies a wider family of functions, the so called quasiconvex ones.

Proposition 3.1.5+

[Closeness of level sets] If a convex function f is closed, then all itslevel sets are closed.

Recall that an empty set is closed (and, by the way, is open).

Example 3.1.1 [ -ball] The unit ball of norm the set

{x Rn | x 1},

same as any other -ball{x | x a r}

(a Rn and r 0 are fixed) is convex and closed.In particular, Euclidean balls (| |-balls associated with the standard Euclidean norm

| | = | |2) are convex and closed.

Let us look at the lp-balls. For p = 2 we get the usual Euclidean norm; of course,you know how the Euclidean ball looks. When p = 1, we get

|x|1 =ni=1

|xi|,

and the unit ball is the hyperoctahedron

V = {x Rn |ni=1

|xi| 1}

When p =, we get|x| = max

1in|xi|,

and the unit ball is the hypercube

V = {x Rn | 1 xi 1, 1 i n}.

It makes sense to draw the unit | |1- and | |-balls in R2.

• 58 LECTURE 3. CONVEX FUNCTIONS

Example 3.1.2 [Ellipsoid] Let Q be a n n matrix which is symmetric (Q = QT )and positive definite (xTQx 0, with being = if and only if x = 0). Then, for anynonnegative r, the Q-ellipsoid of radius r centered at a the set

{x | (x a)TQ(x a) r2}

is convex and closed.

The simplest way to prove that an ellipsoid is convex and closed is as follows: given apositive definite symmetric matrix Q, one can associate with it the Q-inner product

x, y = xTQy

which, as it is immediately seen, satisfies the characteristic properties bilinearity,symmetry and positivity of the standard inner product xT y (in fact these threeproperties of a Q-inner product, taken together, are exactly equivalent to symmetryand positive definiteness of Q). It follows that the Q-norm the function

|x|Q =xTQx

is a norm. It is clearly seen that a Q-ellipsoid is nothing but a ball in the norm | |Q,so that its convexity is given by Example 3.1.1.

Example 3.1.3+

[-neighborhood of a convex set]Let M be a convex set in Rn, and let > 0. Then, for any norm on Rn, the-neighborhood of M , i.e., the set

M = {y Rn | dist(y,M) infxMy x }

is convex and closed.

3.2 How to detect convexity

In an optimization problem

f(x) min | gj(x) 0, j = 1, ...,m

convexity of the objective f and the constraints gi is crucial: it turns out that problemswith this property possess nice theoretical properties (e.g., the local necessary optimalityconditions for these problems are sufficient for global optimality); and what is much moreimportant, convex problems can be efficiently (both in theoretical and, to some extent, inthe practical meaning of the word) solved numerically, which is not, unfortunately, the casefor general nonconvex problems. This is why it is so important to know how one can detectconvexity of a given function. This is the issue we are coming to.

The scheme of our investigation is typical for mathematics. Let me start with the examplewhich you know from Analysis. How do you detect continuity of a function? Of course, there

• 3.2. HOW TO DETECT CONVEXITY 59

is a definition of continuity in terms of and , but it would be an actual disaster if each timewe need to prove continuity of a function, we were supposed to write down the proof thatfor any positive there exists positive such that .... In fact we use another approach: welist once for ever a number of standard operations which preserve continuity, like addition,multiplication, taking superpositions, etc., and point out a number of standard examplesof continuous functions like the power function, the exponent, etc. To prove that theoperations in the list preserve continuity, same as to prove that the standard functions arecontinuous, this takes certain effort and indeed is done in terms; but after this effort isonce invested, we normally have no difficulties with proving continuity of a given function:it suffices to demonstrate that the function can be obtained, in finitely many steps, from ourraw materials the standard functions which are known to be continuous by applying ourmachinery the combination rules which preserve continuity. Normally this demonstrationis given by a single word evident or even is understood by default.

This is exactly the case with convexity. Here we also should point out the list of operationswhich preserve convexity and a number of standard convex functions.

3.2.1 Operations preserving convexity of functions

These operations are as follows:

[stability under taking weighted sums] if f, g are convex (and closed) functions on Rn,then their linear combination f + g with nonnegative coefficients again is convex(and closed), provided that it is finite at least at one point;

[this is given by straightforward verification of the definition]

[stability under affine substitutions of the argument] the superposition (x) = f(Ax+b)of a convex (and closed) function f on Rn and affine mapping x 7 Ax + b from Rminto Rn is convex (and closed), provided that it is finite at least at one point.

[we can prove it directly by verifying the definition: let x1 and x2 in Rm and yi = Axi+b,

i = 1, 2. Then for 0 1 we have:

(x1 + (1 )x2) = f(A(x1 + (1 )x2) + b) = f(y1 + (1 )y2) f(y1) + (1 )f(y2) = (x1) + (1 )(x2).

The closeness of the epigraph of follows from the continuity of the affine mapping.]

[stability under taking pointwise sup] upper bound sup f() of any family of convex(and closed) functions on Rn is convex (and closed), provided that this bound is finiteat least at one point.

[to understand it, note that the epigraph of the upper bound clearly is the intersectionof epigraphs of the functions from the family; recall that the intersection of any familyof convex (closed) sets is convex (closed)]

• 60 LECTURE 3. CONVEX FUNCTIONS

[Convex Monotone superposition] Let f(x) = (f1(x), ..., fk(x)) be vector-functionon Rn with convex components fi, and assume that F is a convex function on R

k

which is monotone, i.e., such that z z always implies that F (z) F (z). Then thesuperposition

(x) = F (f(x)) = F (f1(x), ..., fk(x))

is convex on Rn, provided that it is finite at least at one point.

Remark 3.2.1 The expression F (f1(x), ..., fk(x)) makes no evident sense at a point xwhere some of fis are +. By definition, we assign the superposition at such a pointthe value +.

[To justify the rule, note that if (0, 1) and x, x Dom, then z = f(x), z =f(x) are vectors from Rk which belong to DomF , and due to the convexity of thecomponents of f we have

f(x+ (1 )x) z + (1 )z;

in particular, the left hand side is a vector from Rk it has no infinite entries, andwe may further use the monotonicity of F :

(x+ (1 )x) = F (f(x+ (1 )x)) F (z + (1 )z)

and now use the convexity of F :

F (z + (1 )z) F (z) + (1 )F (z)

to get the required relation

(x+ (1 )x) (x) + (1 )(x).

]

Two more rules are as follows:

[stability under partial minimization] if f(x, y) : RnxRmy is convex (as a functionof z = (x, y); this is called joint convexity) and the function

g(x) = infyf(x, y)

is proper, i.e., is > everywhere and is finite at least at one point, then g isconvex

[this can be proved as follows. We should prove that if x, x Dom g and x =x+ (1)x with [0, 1], then x Dom g and g(x) g(x) + (1)g(x).Given positive , we can find y and y such that (x, y) Dom f , (x, y) Dom fand g(x) + f(x, y), g(y) + f(x, y). Taking weighted sum of these twoinequalities, we get

g(x) + (1 )g(y) + f(x, y) + (1 )f(x, y)

• 3.2. HOW TO DETECT CONVEXITY 61

[since f is convex]

f(x+ (1 )x, y + (1 )y) = f(x, y + (1 )y)

(the last follows from the convexity of f). The concluding quantity in thechain is g(x), and we get g(x) g(x) + (1 )g(x) + . In particular,x Dom g (recall that g is assumed to take only the values from R and thevalue +). Moreover, since the resulting inequality is valid for all > 0, wecome to g(x) g(x) + (1 )g(x), as required.]

the conic transformation of a convex function f on Rn the function g(y, x) =yf(x/y) is convex in the half-space y > 0 in Rn+1.

We have just learned (cf Example 3.1.1) that a (unit) ball of the norm is aconvex set. The inverse is also true: any convex and closed set M Rn which is alsosymmetric an spans the whole Rn can be used to generate a norm on Rn:

Example 3.2.1 Consider the support function of a convex set M :

M (x) = sup{yTx| y M}.

Note that M is convex and closed as the maximum of linear functions. Further, thisfunction is positively homogenous of degree 1:

M (tx) = tM (x), x DomM , t 0,

and if the set M is bounded then DomM = Rn.

Let us consider the function (x, ) = supyM(y, x, ), where

(y, x, ) = yTx 2|y|22.

This function is convex and closed. Let us look at its properties.If M is bounded then Dom = Rn. Consider the case M = Rn. Clearly, Dom

contains only points with 0. If = 0, the only possible value of x is zero,since otherwise the function (y, x, 0) is unbounded. Finally, if > 0, then the point

maximizing (y, x, ) with respect to y is y = x and (x, ) =|x|222 .

When summing up,

(x, ) =

{0, if x = 0, = 0;|x|222 if > 0,

and the domain of is the set Rn{ > 0}{0, 0}. This set is neither open nor close,nevertheless, is a closed convex function. Note that this function is not continuousat the origin:

lim0

(x, ) =

1

2|x|22.

Now we know what are the basic operations preserving convexity. Let us look what are thestandard functions these operations can be applied to. A number of examples was alreadygiven, but we still do not know why the functions in the examples are convex. The usualway to check convexity of a simple given by a simple formula function is based ondifferential criteria of convexity. Let us look what are these criteria.

• 62 LECTURE 3. CONVEX FUNCTIONS

3.2.2 Differential criteria of convexity

From the definition of convexity of a function it immediately follows that convexity is one-dimensional property: a proper (i.e., finite at least at one point) function f on Rn takingvalues in R {+} is convex if and only if its restriction on any line, i.e., any function ofthe type g(t) = f(x+ th) on the axis, is either convex, or is identically +.

It follows that to detect convexity of a function, it, in principle, suffices to know how todetect convexity of functions of one variable. This latter question can be resolved by thestandard Calculus tools. Namely, in the Calculus they prove the following simple

Proposition 3.2.1 [Necessary and Sufficient Convexity Condition for smooth functions onthe axis] Let (a, b) be an interval on the real axis (we do not exclude the case of a = and/or b = +). Then

(i) A differentiable everywhere on (a, b) function f is convex on (a, b) if and only if itsderivative f is monotonically nondecreasing on (a, b);

(ii) A twice differentiable everywhere on (a, b) function f is convex on (a, b) if and onlyif its second derivative f is nonnegative everywhere on (a, b).

With the Proposition, you can immediately verify that the functions listed as examples ofconvex functions in Section 3.1 indeed are convex. The only difficulty which you may meet isthat some of these functions (e.g., xp, p 1, and xp, 0 p 1, were claimed to be convexon the half-interval [0,+), while the Proposition speaks about convexity of functions onintervals. To overcome this difficulty, we will use the following simple

Proposition 3.2.2 Let M be a convex set and f be a function with Dom f = M . Assumethat f is convex on riM and is continuous on M , i.e.,

f(xi) f(x), i,

whenever xi, x M and xi x as i. Then f is convex on M .

In fact, for functions of one variable there is a differential criterion of convexitywhich does not preassume any smoothness (we will not prove this criterion):

Proposition 3.2.3 [convexity criterion for univariate functions]

Let g : R R {+} be a function. Let the domain = {t | g(t) < } of thefunction be a convex set which is not a singleton, i.e., let it be an interval (a, b) withpossibly added one or both endpoints ( a < b ). g is convex if and only if itsatisfies the following 3 requirements:

1) g is continuous on (a, b);

2) g is differentiable everywhere on (a, b), excluding, possibly, a countable set ofpoints, and the derivative g(t) is nondecreasing on its domain;

3) at each endpoint u of the interval (a, b) which belongs to g is upper semicon-tinuous:

g(u) lim supt(a,b),tug(t).

• 3.2. HOW TO DETECT CONVEXITY 63

Proof of Proposition 3.2.2: Let x, y M and z = x + (1 )y, [0, 1], and letus prove that

f(z) f(x) + (1 )f(y).As we know, there exist sequences xi riM and yi riM converging, respectively to x andto y. Then zi = xi + (1 )yi converges to z as i, and since f is convex on riM , wehave

f(zi) f(xi) + (1 )f(yi);passing to limit and taking into account that f is continuous on M and xi, yi, zi converge,as i, to x, y, z M , respectively, we obtain the required inequality.

From Propositions 3.2.1.(ii) and 3.2.2 we get the following convenient necessary andsufficient condition for convexity of a smooth function of n variables:

Corollary 3.2.1 [Convexity Criterion for smooth functions on Rn]Let f : Rn R {+} be a function. Assume that the domain M of f is a convex set

and that f is

continuous on Mand

twice differentiable on riM .

Then f is convex if and only if its Hessian is positive semidefinite on riM :

hTf (x)h 0 x riM h Rn.

Proof. The only if part is evident: if f is convex and x M = riM , then thefunction of one variable

g(t) = f(x+ th)

(h is an arbitrary fixed direction in Rn) is convex in certain neighborhood of the pointt = 0 on the axis (recall that affine substitutions of argument preserve convexity).Since f is twice differentiable in a neighborhood of x, g is twice differentiable in aneighborhood of t = 0, so that g(0) = hT f (x)h 0 by Proposition 3.2.1.

Now let us prove the if part, so that we are given that hT f (x)h 0 for everyx riM and every h Rn, and we should prove that f is convex.

Let us first prove that f is convex on the relative interior M of the domain M .First, clearly, M is a convex set. Since, as it was already explained, the convexity ofa function on a convex set is one-dimensional fact, all we should prove is that everyone-dimensional function

g(t) = f(x+ t(y x)), 0 t 1

(x and y are from M ) is convex on the segment 0 t 1. Since f is continuous onM M , g is continuous on the segment; and since f is twice continuously differentiableon M , g is continuously differentiable on (0, 1) with the second derivative

g(t) = (y x)T f (x+ t(y x))(y x) 0.

• 64 LECTURE 3. CONVEX FUNCTIONS

Consequently, g is convex on [0, 1] (Propositions 3.2.1.(ii) and 3.2.2). Thus, f is convex

on M . It remains to note that f , being convex on M and continuous on M , is convex

on M by Proposition 3.2.2.

Applying the combination rules preserving convexity to simple functions which pass the in-finitesimal convexity tests, we can prove convexity of many complicated functions. Consider,e.g., an exponential posynomial a function

f(x) =Ni=1

ci exp{aTi x}

with positive coefficients ci (this is why the function is called posynomial). How could weprove that the function is convex? This is immediate:

exp{t} is convex (since its second order derivative is positive and therefore the firstderivative is monotone, as required by the infinitesimal convexity test for smooth functionsof one variable);

consequently, all functions exp{aTi x} are convex (stability of convexity under affine sub-stitutions of argument);

consequently, f is convex (stability of convexity under taking linear combinations withnonnegative coefficients).

And if we were supposed to prove that the maximum of three posynomials is convex?Ok, we could add to our three steps the fourth, which refers to stability of convexity undertaking pointwise supremum.

3.3 Lipschitz continuity of convex functions

Convex functions possess very nice local properties.

Theorem 3.3.1 [Boundedness and Lipschitz continuity of convex function]Let f be a convex function and let K be a closed and bounded set contained in the relative

interior of the domain Dom f of f . Then f is Lipschitz continuous on K there existsconstant L the Lipschitz constant of f on K such that

|f(x) f(y)| L|x y| x, y K. (3.3.3)

In particular, f is bounded on K.

Remark 3.3.1 All three assumptions on K (1) closeness, (2) boundedness and the as-sumption (3) K ri Dom f are essential, as it is seen from the following four examples:

f(x) = 1/x, DomF = (0,+),K = (0, 1]. We have (2), (3) and not (1); f is neitherbounded, nor Lipschitz continuous on K.

f(x) = x2, Dom f = R, K = R. We have (1), (3) and not (2); f is neither boundednor Lipschitz continuous on K.

• 3.3. LIPSCHITZ CONTINUITY OF CONVEX FUNCTIONS 65

f(x) = x, Dom f = [0,+), K = [0, 1]. We have (1), (2) and not (3); f is not

Lipschitz continuous on K 1), although is bounded.

consider the following function of two variables:

f(x, y) =

{0 if x2 + y2 < 1,(x, y) if x2 + y2 = 1,

where is an arbitrary nonnegative function on the unit circle. The domain of thisfunction is the unit disk, which is closed and convex. However, f(x, y) is neitherbounded nor Lipschitz continuous on Dom f 2).

Proof of Theorem 3.3.1. We will start with the following local version of theTheorem.

Proposition 3.3.1 Let f be a convex function, and let x be a point from the relativeinterior of the domain Dom f of f . Then

(i) f is bounded at x: there exists a positive r such that f is bounded in the r-neighborhood Ur(x) of x in the affine hull of Dom f :

r > 0, C : |f(x)| C x Ur(x) = {x Aff (Dom f) | |x x| r};

(ii) f is Lipschitz continuous at x, i.e., there exists a positive and a constant Lsuch that

|f(x) f(x)| L|x x| x, x U(x).

Implication Proposition 3.3.1 Theorem 3.3.1 is given by standard Analysisreasoning. All we need is to prove that if K is a bounded and closed (i.e., a compact)subset of ri Dom f , then f is Lipschitz continuous on K (the boundedness of f on Kis an evident consequence of its Lipschitz continuity on K and boundedness of K).Assume, on contrary, that f is not Lipschitz continuous on K; then for every integer ithere exists a pair of points xi, yi K such that

f(xi) f(yi) i|xi yi|. (3.3.4)

Since K is compact, passing to a subsequence we can ensure that xi x K andyi y K. By Proposition 3.3.1 the case x = y is impossible by Propositionf is Lipschitz continuous in a neighborhood B of x = y; since xi x, yi y, thisneighborhood should contain all xi and yi with large enough indices i; but then, fromthe Lipschitz continuity of f in B, the ratios (f(xi) f(yi))/|xi yi| form a boundedsequence, which we know is not the case. Thus, the case x = y is impossible. The casex 6= y is even less possible since, by Proposition, f is continuous on Dom f at boththe points x and y (note that Lipschitz continuity at a point clearly implies the usualcontinuity at it), so that we would have f(xi) f(x) and f(yi) f(y) as i .

1)indeed, we have limt+0f(0)f(t)

t = limt+0 t1/2 = +, while for a Lipschitz continuous f the ratios

t1(f(0) f(t)) should be bounded2)note that in this case the function f is not closed. It is clear that to close the function f we should

put (x, y) 0

• 66 LECTURE 3. CONVEX FUNCTIONS

Thus, the left hand side in (3.3.4) remains bounded as i. In the right hand sideone factor i tends to , and the other one has a nonzero limit |x y|, so that theright hand side tends to as i; this is the desired contradiction.Proof of Proposition 3.3.1.

10. We start with proving the above boundedness of f in a neighborhood of x.This is immediate: we know that there exists a neighborhood Ur(x) which is containedin Dom f (since, by assumption, x is a relative interior point of Dom f). Now, wecan find a small simplex of the dimension m = dim Aff (Dom f) with the verticesx0, ..., xm in Ur(x) in such a way that x will be a convex combination of the vectors xiwith positive coefficients, even with the coefficients 1/(m+ 1):

x =mi=0

1

m+ 1xi

3).

We know that x is the point from the relative interior of (see the proof of The-orem 1.1.1.(ii)); since spans the same affine set as Dom f (m is the dimension ofAff (Dom f)!), it means that contains Ur(x) with certain r > 0. Now, in

= {mi=0

ixi | i 0,i

i = 1}

f is bounded from above by the quantity max0im f(xi) by Jensens inequality:

f(mi=0

ixi) mi=0

if(xi) maxif(xi).

Consequently, f is bounded from above, by the same quantity, in Ur(x).

3 As an exercise, let us verify that the required exists. Let us act as follows: first, the case of Dom fbeing a singleton is evident, so that we can assume that Dom f is a convex set of dimension m 1.Let us take arbitrary affine basis y0, ..., ym in M = Aff (Dom f) and then pass from this basis to the setz0 = y0, z1 = y0 + (y1 y0), z2 = y0 + (y2 y0), ..., zm = y0 + (ym y0) with some > 0. The vectorszi clearly belong to M and form an affine basis (the latter follows from the fact that the vectors zi z0,i = 1, ...,m, are times the vectors yi y0; the latter vectors form a basis in the linear subspace L suchthat M = y0 + L; consequently, the vectors zi z0, i = 1, ...,m, also form a basis in L, whence, by thesame Corollary, z0, ..., zm form am affine basis in M). Choosing > 0 small enough, we may enforce all thevectors z0, ..., zm be in the (r/10)-neighbourhood of the vector z0. Now let

be the convex hull of z0, ..., zm;this is a simplex with the vertices contained in the neighbourhood of z0 of the radius r/10 (of course, weare speaking about the ball in M). This neighbourhood is an intersection of a Euclidean ball, which is aconvex set, and M , which also is convex; therefore the neighbourhood is convex. Since the vertices of arecontained in it, the entire is contained in the neighbourhood. Now let z = (m+ 1)1

mi=0 zi;

clearlyis contained in the 2 (r/10) = r/5 neighbourhood of z in M . Setting = [x z] + , we get the simplexwith the vertices xi = zi + x z which is contained in the r/5-neighbourhood of x in M and is such that(m+ 1)1

mi=0 xi (m+ 1)1

mi=0[zi + x z] = z + x z = x, as required.

I gave this awful explanation to demonstrate how many words we need to make rigorous evident rec-ommendations like let us take a small simplex with the average of vertices equal to x. The explanationsof this type should (and will) be omitted, since they kill even the most clear reasoning. Yet, you are supposedto be able to do this routine work by yourselves; and to this end you should remember what is the exactmeaning of the words we are using and what are the basic relations between the corresponding concepts.

• 3.3. LIPSCHITZ CONTINUITY OF CONVEX FUNCTIONS 67

20. Now let us prove that if f is above bounded, by some C, in Ur(x), then it in factis below bounded in this neighborhood (and, consequently, is bounded in Ur). Indeed,let x Ur, so that x Aff (Dom f) and |x x| r. Setting x = x [x x] = 2x x,we get x Aff (Dom f) and |x x| = |x x| r, so that x Ur. Since x = 12 [x+x

],we have

2f(x) f(x) + f(x),

whence

f(x) 2f(x) f(x) 2f(x) C, x Ur(x),

and f indeed is below bounded in Ur.

(i) is proved.

30. (ii) is an immediate consequence of (i). Indeed, let us prove that f is Lipschitzcontinuous in the neighborhood Ur/2(x), where r > 0 is such that f is bounded inUr(x) (we already know from (i) that the required r does exist). Let |f | C in Ur,and let x, x Ur/2, x 6= x. Let us extend the segment [x, x] through the point xuntil it reaches, at certain point x, the (relative) boundary of Ur; then we will get

x (x, x); |x x| = r.

By the convexity of f we have for = |xx||xx| (0, 1),

f(x) f(x) (f(x) f(x)) |x x|f(x) f(x)|x x|

.

The second factor in the right hand side does not exceed the quantity (2C)/(r/2) =4C/r; indeed, the numerator is, in absolute value, at most 2C (since |f | is bounded byC in Ur and both x, x

belong to Ur), and the denominator is at least r/2 (indeed, xis at the distance at most r/2 from x, and x is at the distance exactly r from x, sothat the distance between x and x, by the triangle inequality, is at least r/2). Thus,we have

f(x) f(x) (4C/r)|x x|, x, x Ur/2;

swapping x and x, we come to

f(x) f(x) (4C/r)|x x|,

whence

|f(x) f(x)| (4C/r)|x x|, x, x Ur/2,

as required in (ii).

Let us show that convex functions possess a kind of differentiability. In what follows, for thesake of conciseness, we will assume that the the domain of f is full=dimensional, meaningthat Aff (Dom f) = Rn. This said, you are expected to be able to translate the correspondingstatements for the case when Aff (Dom f) is smaller than the whole Rn (so that interiorbecomes relative interior, h Rn transforms into h L, such that Aff (Dom f) = a + L,etc.) .

• 68 LECTURE 3. CONVEX FUNCTIONS

Definition 3.3.1 [Directional derivative] Let x Dom f . We call the function f differen-tiable in the direction h at x if the following limit exists:

f h(x) = limt0

f(x+ ht) f(x)t

. (3.3.5)

Theorem 3.3.2 [Differentiability of convex functions] Convex function f is differentiablein any direction h Rn at any point x int Dom f .

Proof: Let x int Dom f . Consider the function

(t) =f(x+ ht) f(x)

t, t > 0.

Let (0, 1] and let t (0, ] be small enough to have x + h int Dom f . Then byconvexity of f ,

f(x+ ht) = f((1 )x+ (x+ ht)) (1 )f(x) + f(x+ ht).

Therefore

(t) =1

t[f(x+ ht) f(x)] 1

t[f(x+ ht) f(x)],

so that (t) decreases as t 0 and due to the local Lipschitz property is below bounded sothat the limit in (3.3.5) exists.

Theorem 3.3.1 says that a convex function f is bounded on every compact (i.e., closedand bounded) subset of the interior of Dom f . In fact there is much stronger statement onthe below boundedness of f : f is below bounded on any bounded subset of Rn! This resultsis an immediate consequence of the following lower bound:

Proposition 3.3.2 [Global lower bound] Let f be a convex function and x int Dom f .Then f h(x) is convex positive homogenous (of degree 1) function of h, and for any y Dom f

f(y) f(x) + f yx(x). (3.3.6)

Proof: Let us prove first that the directional derivative is homogenous. Indeed, for anyh Rn and > 0

f h(x) = limt0

f(x+ ht) f(x)t

= lim0

f(x+ h) f(x)

= f h(x).

Further, for any h1, h2 Rn, and [0, 1], by the convexity of f we get

f h1+(1)h2(x) = limt0

1

t[f(x+ (h1 + (1 )h2)t) f(x)]

limt0

1

t{[f(x+ th1) f(x)] + (1 )[f(x+ th2) f(x)]}

= f h1(x) + (1 )fh2

(x).

• 3.4. SUBGRADIENTS OF CONVEX FUNCTIONS 69

Thus f h(x) is convex in h. Finally, let t (0, 1], y Dom f and

yt = x+ t(y x) = (1 t)x+ ty.

Then

f(y) = f(yt +

1

t(1 t)(yt x)

) f(yt) +

1 tt

[f(yt) f(x)],

(the latter inequality is nothing but the convexity property of f : f(yt) tf(y)+(1 t)f(x))and we conclude (3.3.6) taking the limit as t 0.

We are now ready to introduce a good surrogate of the notion of the gradient for a convexfunction. Let f be a convex function, and let x Dom f . It may happen that there existsan affine minorant dTx a of f which coincides with f at x:

f(y) dTy a y, f(x) = dTx a.

From the equality in the latter relation we get a = dTx f(x), and substituting this repre-sentation of a into the first inequality, we get

f(y) f(x) + dT (y x) y. (3.4.7)

Thus, if f admits an affine minorant which is exact at x, then there exists d which gives riseto inequality (3.4.7). Vice versa, if d is such that (3.4.7) takes place, then the right handside of (3.4.7), regarded as a function of y, is an affine minorant of f which is exact at x.

Now note that (3.4.7) express certain property of a vector d.

Definition 3.4.1 [Subgradient of convex function] Let f be a convex function. A vector his called subgradient of function f at a point x Dom f if for any y Domf we have

f(y) f(x) + hT (y x).

The set f(x) of all subgradients of f at x is called the subdifferential of f at the point x.

Subgradients of convex functions play an important role in the theory and numerical methodsof Convex Programming they are quite reasonable surrogates of the gradients. The mostelementary properties of the subgradients are summarized in the following statement:

Proposition 3.4.1 Let f be a convex function and x be a point from int Dom f . Then(i) f(x) is a closed convex set which for sure is nonempty and bounded(ii) for any h Rn

f h(x) = max{hTd| d f(x)}.In other words, the directional derivative is nothing but the support function of the set

f .(iii) If f is differentiable at x, then f(x) is the singleton comprised of the usual gradient

of f at x.

• 70 LECTURE 3. CONVEX FUNCTIONS

Proof. (i): Closeness and convexity of f(x) are evident (3.4.7) is an infinite system ofnonstrict linear inequalities with respect to d, the inequalities being indexed by y Rn.Nonemptiness of f(x) for the case when x int Dom f this is the most important factabout the subgradients is readily given by our preceding results. Indeed, we should provethat if x int Dom f , then there exists an affine minorant of f which is exact at x. Notethat the point (f(x), x) belongs to the boundary of epi (f). Hence, in view of Theorem 1.2.1there is a linear form (, d) which properly separates x and epi (f):

dTy dTx f(x)

for any (, y) epi (f). Note that we can take

|d|2 + 2 = 1 (3.4.8)

and such that (, d) Aff (epi (f)). And since for any f(x) the point (, x) belongsto the epigraph of f , we conclude that 0.

Now recall that a convex function is locally Lipschitz continuous on the interior of itsdomain (Proposition 3.3.1). This means that there exist some > 0 and M > 0 such thatthe ball of the radius , centered at x belongs to Dom f and for any y in this ball

f(y) f(x) M |y x|.

Thus, for any y in the ball

dT (y x) (f(y) f(x)) M |y x|.

When choosing y = x + d we get |d|2 M|d| and in view of the normalizing equation(3.4.8),

11 +M2

.

Now we can choose h = d/ to get

f(y) f(x) + hT (y x).

Finally, if h f(x), h 6= 0, then by choosing y = x+ h/|h| we obtain:

|h| = hT (y x) f(y) f(x) M |y x| = M,

what implies the boundedness of f(x).(ii) Note that (as f 0(x) = 0)

f h(x) f 0(x) = f h(x) = limt0

f(x+ ht) f(x)t

hTd, (3.4.9)

for any vector d from f(x). Therefore, the subdifferential of the function f h(x) at h = 0exists and

f(x) hf 0(x). (3.4.10)

• 3.4. SUBGRADIENTS OF CONVEX FUNCTIONS 71

As f h(x) is convex in h (cf Proposition 3.3.2) so that

f yx(x) = fyx(x) f 0(x) dT (y x)

for any d hf 0(x), and by Proposition 3.3.2 we have for any y Dom f ,

f(y) f(x) + f yx(x) f(x) + dT (y x) for d hf 0(x).

We conclude that hf0(x) f(x) and by (3.4.10), hf 0(x) f(x).

Let now dh hf h(x). Then for any v Rn and > 0 we have

f v(x) = fv(x) f h(x) + dTh (v h),

so that when we obtain f v(x) dTh v what means that dh hf 0(x).On the other hand, when 0 we get f h(x) dThh 0 and by (3.4.9) we conclude that

dThh = fh(x), what implies (ii).

(iii): If x int Dom f and f is differentiable at x, then f(x) f(x) by Proposition3.3.2 (notice that f p(x) = p

Tf(x) in this case). To prove thatf(x) is the only subgradientof f at x, note that if d f(x), then, by definition,

f(y) f(x) dT (y x) y

Substituting yx = th, h being a fixed direction and t being > 0, dividing both sides of theresulting inequality by t and passing to limit as t +0, we get

hTf(x) hTd.

This inequality should be valid for all h Rn, which is possible if and only if d = f(x).Proposition 3.4.1 explains why subgradients are good surrogates of gradients: at a point

where the gradient exists, it is the only subgradient, but, in contrast to the gradient, asubgradient exists basically everywhere (for sure in the interior of the domain of the function).E.g., let us look at the simple function

f(x) = |x|

on the axis. It is, of course, convex (as maximum of two linear forms x and x). Wheneverx 6= 0, f is differentiable at x with the derivative +1 for x > 0 and 1 for x < 0. At thepoint x = 0 f is not differentiable; nevertheless, it must have subgradients at this point(since 0 is an interior point of the domain of the function). And indeed, it is immediatelyseen (why?) that the subgradients of |x| at x = 0 are exactly the reals from the segment[1, 1]. Thus,

|x| =

{1}, x < 0,[1, 1], x = 0,{+1}, x > 0.

Note also that if x is a boundary point of the domain of a convex function, even a goodone, the set of subgradients of f at x may be empty, as it is the case with the function

f(y) ={y, y 0,

+, y < 0;

• 72 LECTURE 3. CONVEX FUNCTIONS

it is clear that there is no non-vertical supporting line to the epigraph of the function at thepoint (0, f(0)), and, consequently, there is no affine minorant of the function which is exactat x = 0.

It is worth to note that the subgradient property is characteristic of convex functions.We have also the following simple

Lemma 3.4.1 Let a continuous function f be such that for any x int Dom f the subdif-ferential f(x) is not empty. Then f is convex.

Proof takes a line: let x, y int Dom f , 0 1 and z = x+ (y x)( int Dom f). Leth f(z), then

f(y) f(z) + hT (y z) = f(z) + (1 )hT (y x);f(x) f(z) + hT (x z) = f(z) hT (y x),

andf(y) + (1 )f(x) f(z).

We conclude now by Proposition 3.2.2.

As we already know from Proposition 3.4.1.(ii), the directional derivative f h(x) is the sup-port function of the subdifferential f(x). This basic observation is the basis of our futuredevelopments. A part of Convex Analysis deals with subgradient calculus rules for com-puting subgradients of composite functions, like sums, superpositions, maxima, etc., givensubgradients of the operands. For instance, the list of the operations preserving convexity ofSection 3.2.1 can be completed with the corresponding rules of subgradient computations.These rules extend onto nonsmooth convex case the standard Calculus rules and are verynice and instructive. We present here some of them:

[taking weighted sums] if f, g are convex functions on Rn and , > 0 then thesubgradient of the function h(x) = f(x) + g(x) satisfies

h(x) = f(x) + g(x) (3.4.11)

for any x int Domh. [ Let x int Dom f int Dom g. Then for any h Rn wehave by Proposition 3.4.1.(ii):

f h(x) = fh(x) + g

h(x)

= max{hTd1| d1 f(x)}+ max{hTd2| d2 g(x)}= max{hT (d1 + d2)| d1 f(x), d2 g(x)}= max{hTd| d f(x) + g(x)}

Using Corollary 1.2.1 we obtain (3.4.11).]

• 3.4. SUBGRADIENTS OF CONVEX FUNCTIONS 73

[affine substitution of the argument] let the function f(y) be convex with Dom f Rm.Consider the affine operator

A(x) = Ax+ b | Rn Rm

and (x) = f(A(x)).Then for any x Dom = {x Rn| A(x) Dom f}

(x) = ATf(A(x)).

[The proof of the subdifferential relation is immediate: if y = A(x) then for any h Rnwe have

h(x) = fAh(y) = max{dTAh| d f(y)} = max{dTh| d ATf(y)}.

Now by Proposition 3.4.1.(ii) and Corollary 1.2.1 we get (x) = ATf(A(x)).]

[taking pointwise sup] let f be an upper bound supi fi of a family of convex functionson Rn. Then its subgradient at any x int Dom f satisfies

f = Conv {fi|i I(x)}, (3.4.12)

where I(x) = {i : fi(x) = f(x)} the set of functions fi which are active at x.[let x iint Dom fi, and assume that I(x) = 1, .., k. Then for any h Rn we haveby Proposition 3.4.1.(ii)

f h(x) = max1ik

f ih(x) = max1ik

max{hTdi| di fi(x)}.

Note that for any numbers a1, ..., ak, ... we have

max ai = max{i

iai| k},

where k = { 0,i i = 1} is the standard simplex in Rk. Thus

f h(x) = maxk

ki=1

i max{hTdi| di fi(x)}

= max

{hT(

ki=1

idi

)| di fi(x), k

}

= max

{hTd|d =

ki=1

idi, di fi(x), k}

= max{hTd| d Conv {fi(x), 1 i k

}.]

Unfortunately, the above rule does not have a closed form. However, it allows tocompute elements of the subgradient.

• 74 LECTURE 3. CONVEX FUNCTIONS

One more rule, which we state without proof, is as follows:

Lemma 3.4.2 Let f be an upper bound supA f of an arbitrary family A of convexand closed functions. Then for any x from Dom f = Dom f

f(x) Conv {f(x)| (x)},

where (x) = {| f(x) = f(x)} Furthermore, if the set A is compact (in some metric)and the function f(x) is closed, then

f(x) = Conv {f(x)| (x)},

Example 3.4.1

Consider the function

f(x) =mi=1

|aTi x bi|.

Let for x Rn,

I(x) = {i : aTi x bi < 0},I+(x) = {i : aTi x bi > 0},I0(x) = {i : aTi x bi = 0}.

Thenf(x) =

iI(x)

ai +

iI+(x)ai +

iI0(x)

[ai, ai].

Consider the function f(x) = max1in x(i) (here x(i) are the components of x) anddenote I(x) = {i : x(i) = f(x)}. Then f(x) = Conv {ei| i I(x)} (here ei arethe orths of the canonical basis of Rn). In particular, for x = 0 the subdifferentialis the standard simplex a convex hull of the origin and canonical orths: f(x) =Conv {ei| 1 i n}.

For the Euclidean norm f(x) = |x| we have

f(x) =

{ {x|x|

}for x 6= 0,

B2(0, 1) = {x Rn| |x| 1} for x = 0.

For the l-norm f(x) = |x| = max1in |x(i)| we have

f(x) = Conv {{ei| i I+(x)} {ej| j I(x)}} ,

where I+(x) = {i : x(i) = |x|}, I(x) = {i : x(i) = |x|}. In particular, f(0) =B1(0, 1) = {x Rn| |x|1 1}.

• 3.5. OPTIMALITY CONDITIONS 75

For the l1-norm f(x) = |x|1 =ni=1 |x(i)| we have

f(x) =iI+

ei iI

ei +iI0

[ei, ei].

where I+(x) = {i : x(i) > 0}, I(x) = {i : x(i) < 0} and I0(x) = {i : x(i) = 0}. Inparticular, f(0) = B(0, 1) = {x Rn| |x| 1}.

Maximum eigenvalue of a symmetric matrix. f(x) = max(A(x)), where

A(x) = A0 + x1A1 + ...+ xnAn,

with m m symmetric matrices A1, ..., An m m and x Rn. Here max(A) standsfor the maximal eigenvalue of A. We can express f as the pointwise supremum ofconvex functions, using Rayleighs variational definition of the maximal eigenvalue ofa symmetric matrix A: max(A(x)) = sup|y|=1 y

TAy. Here the index set A is the unitsphere: A = {y Rn| |y| = 1}. Each of the functions fy(x) = yTA(x)y is affine in xfor fixed y, as can be easily seen from

yTA(x)y = yTA0y + x1yTA1y + ...+ xny

TAny,

so it is differentiable with gradient fy(x) = (yTA1y, ..., yTAny).The active functions yTA(x)y are those associated with the eigenvectors y correspondingto the maximum eigenvalue. Hence to find a subgradient, it suffices to compute computean eigenvector y with eigenvalue max, normalized to have unit norm, and take

g = (yTA1y, ..., yTAny).

Therefore, by Lemma 3.4.2 we come to

f(x) = Conv {fy(x)| A(x)y = max(A(x))y, |y| = 1}.

3.5 Optimality conditions

As it was already mentioned, optimization problems involving convex functions possess nicetheoretical properties. One of the most important of these properties is given by the following

Theorem 3.5.1 [Unimodality] Let f be a convex function on a convex set M Rn, andlet x M Dom f be a local minimizer of f on M :

(r > 0) : f(y) f(x) y M, |y x| < r. (3.5.13)

Then x is a global minimizer of f on M :

f(y) f(x) y M. (3.5.14)

Moreover, the set ArgminM f of all local ( global) minimizers of f on M is convex.If f is strictly convex (i.e., the convexity inequality f(x+(1)y) f(x)+(1)f(y)

is strict whenever x 6= y and (0, 1)), then the above set is either empty or is a singleton.

• 76 LECTURE 3. CONVEX FUNCTIONS

Proof. 1) Let x be a local minimizer of f on M and y M , y 6= x; we should provethat f(y) f(x). There is nothing to prove if f(y) = +, so that we may assume thaty Dom f . Note that also x Dom f for sure by definition of a local minimizer.

By the convexity of f , for all (0, 1) we have for x = y + (1 )x:

f(x ) f(x) (f(y) f(x)).

Since x is a local minimizer of f , the left hand side in this inequality is nonnegative forall small enough values of > 0. We conclude that the right hand side is nonnegative, i.e.,f(y) f(x).

2) To prove convexity of ArgminQ f , note that ArgminM f is nothing but the level setlev (f) of f associated with the minimal value minM f of f on M ; as a level set of a convexfunction, this set is convex (Proposition 3.1.4).

3) To prove that the set ArgminQ f associated with a strictly convex f is, if nonempty, asingleton, note that if there were two distinct minimizers x, x, then, from strict convexity,we would have

f(1

2x +

1

2x) 0}.

• 3.5. OPTIMALITY CONDITIONS 77

Geometrically, this is the set of all directions leading from x inside M , so that a smallenough positive step from x along the direction keeps the point in M . From the convexityof M it immediately follows that the tangent cone indeed is a (convex) cone (not necessaryclosed). E.g., when x is an interior point of M , then the tangent cone to M at x clearly isthe entire Rn. A more interesting example is the tangent cone to a polyhedral set

M = {x | Ax b} = {x | aTi x bi, i = 1, ...,m}; (3.5.15)

for x M the corresponding tangent cone clearly is the polyhedral cone

{h | aTi h 0 i : aTi x = bi} (3.5.16)

corresponding to the active at x (i.e., satisfied at the point as equalities rather than as strictinequalities) constraints aTi x bi from the description of M .

Now, for the functions in question (i.e., convex on M and differentiable at x) the neces-sary and sufficient condition of Theorem 3.5.2 for x to be a minimizer of f on M reads asfollows:

(*) the derivative of f taken at x along every direction from TM(x) should be nonneg-

ative:

hTf(x) 0 h TM(x). (3.5.17)

Note that the necessity of (3.5.17) is an evident fact which has nothing in commonwith convexity: assuming that x is a local minimizer of f on M , we note that if therewere h TM (x) with hTf(x) < 0, then we would have

f(x + th) < f(x)

for all small enough positive t. On the other hand, x + th M for all small enoughpositive t due to h TM (x). Combining these observations, we conclude that in everyneighborhood of x there are points from M with strictly better than the one at x

values of f ; this contradicts the assumption that x is a local minimizer of f on M .The sufficiency is immediately given by the (sub-)gradient inequality: if x M ,

then h = x x TM (x) and

f(x) f(x) + (x x)f(x) f(x).

Condition (*) says that whenever f is convex on M and differentiable at x M , thenecessary and sufficient condition for x to be a minimizer of f on M is that the linear formgiven by the gradient f(x) of f at x should be nonnegative at all directions from thetangent cone TM(x

). The linear forms nonnegative at all directions from the tangent conealso form a cone; it is called the cone normal to M at x and is denoted NM(x

). Thus, (*)says that the necessary and sufficient condition for x to minimize f on M is the inclusionf(x) NM(x). What does this condition actually mean, it depends on what is thenormal cone: whenever we have an explicit description of it, we have an explicit form of theoptimality condition.

• 78 LECTURE 3. CONVEX FUNCTIONS

E.g., when TM(x) = Rn (it is the same as to say that x is an interior point of M), then

the normal cone is comprised of the linear forms nonnegative at the entire space, i.e., it isthe trivial cone {0}; consequently, for the case in question the optimality condition becomesthe Fermat rule f(x) = 0, as we already know.

When M is the polyhedral set (3.5.15), the tangent cone is the polyhedral cone (3.5.16);it is comprised of all directions which have nonpositive inner products with all ai comingfrom the active, in the aforementioned sense, constraints. The normal cone is comprised ofall vectors which have nonnegative inner products with all these directions, i.e., of

all vectors a such that the inequality hTa 0 is a consequence of the inequal-ities hTai 0, i I(x) {i | aTi x = bi}.

From the Homogeneous Farkas Lemma (Lemma 2.5.1) we conclude that the normal coneNM(x

) is simply the conic hull of the vectors ai, i I(x):

NM(x) = {z Rn| z =

iI(x)

iai, i 0}.

Thus, in the case in question (*) reads:x M is a minimizer of f on M if and only if there exist nonnegative reals i associated

with active (those from I(x)) values of i such that

f(x) +

iI(x)i ai = 0.

The next result is of main importance in optimization it is the basis of the cutting planescheme we consider in the sequel.

Theorem 3.5.3 [Support vectors for level sets] For any x Dom f , all vectors d f(x)satisfy

dT (x y) 0, for any y lev f(x)(f),

where lev f(x)(f) is the level set of f:

lev f(x)(f) = {y Dom f | f(y) f(x)}

(we say that such vectors d are supporting to the set lev f(x)(f) at x).

Proof: indeed, if f(y) f(x) and d f(x), then f(x) + dT (y x) f(y) f(x).We have the following trivial though useful

Corollary 3.5.1 Let M Dom f be a convex closed set, x M and

x = argminxM

f(x).

Then for any d f(x) we have: dT (x x) 0.

• 3.5. OPTIMALITY CONDITIONS 79

We conclude for now our study of minima of a convex function with the followingoptimality condition for constrained optimization:

Theorem 3.5.4 [Karush-Kuhn-Tucker] Let f, gi i = 1, ...,m be differentiable convexfunctions. Suppose that there exists a point x such that

(Slater condition) gi(x) < 0, i = 1, ...,m.

A point x is a solution to the optimization problem

min f(x) s.t. gi(x) 0, i = 1, ...,m (3.5.18)

if and only if there exist nonnegative real i, i=1,...,m such that

f (x) +mi=1

igi(x) = 0

and

(Complementary slackness) igi(x) = 0, i = 1, ...,m.

Proof: we reduce the constrained problem (3.5.18) to an unconstrained though notsmooth optimization problem. To this end look at the following construction: considerthe parametric maxtype function

f(t;x) = max{f(x) t; gi, i = 1, ...,m}

and the function h(t) = minx f(t;x).

Lemma 3.5.1 Let f be the optimal value of the optimization problem (3.5.18). Then

h(t) 0 for all t f;h(t) > 0 for all t < f.

Proof of the lemma: let x be an optimal solution to (3.5.18). If t f = f(x)then

h(t) f(t;x) = max{f(x) t; gi(x)} max{f t; gi(x)} 0.

On the other hand, if t < f and h(t) 0 then there exists x Rn such that

f(x) t < f, and gi(x) 0, i = 1, ...,m.

Thus, f cannot be the optimal value of (3.5.18).

Now we are done: indeed, in view of the lemma, x is an optimal solution of theproblem (3.5.18) if and only if it is a global minimizer of the function

(x) = max{f(x) f; gi, i = 1, ...,m}

We have from Theorem 3.5.2 that this is the case if and only if 0 (x). As weknow this is the case (cf. the corresponding rule (3.4.12)) if 0 belongs to the convex

• 80 LECTURE 3. CONVEX FUNCTIONS

envelope of those among the vectors f (x), f 1(x), ..., f m(x

) which are active at x,i.e.

0f(x) +

iI

igi(x) = 0, 0, 0 +

iI

i = 1,

where I = {1 i m | gi(x) = 0}. Further, if we had 0 = 0 thenmi=1 ig

i(x) = 0

and iI

igi(x) iI

i[gi(x) + gi(x

)T (x x)] = 0,

what is a contradiction. Therefore, 0 6= 0 and we can set i = i/0 for i I.

The above results demonstrate that the fact that a point x Dom f is a global minimizerof a convex function f depends only on the local behavior of f at x. This is not the casewith maximizers of a convex function. In fact, such a maximizer, if exists, in all nontrivialcases should belong to the boundary of the domain of the function:

Theorem 3.5.5 Let f be convex, and let M be the domain of f . Assume that f attains itsmaximum on M at a point x from the relative interior of M . Then f is constant on M .

Proof. Let y M ; we should prove that f(y) = f(x). There is nothing to prove if y = x,so that we may assume that y 6= x. Since, by assumption, x riM , we can extend thesegment [x, y] through the endpoint x, keeping the left endpoint of the segment in M ; inother words, there exists a point y M such that x is an interior point of the segment[y, y]:

x = y + (1 )y

for certain (0, 1). From the definition of convexity

f(x) f(y) + (1 )f(y).

Since both f(y) and f(y) do not exceed f(x) (x is a maximizer of f on M !) and boththe weights and 1 are strictly positive, the indicated inequality can be valid only iff(y) = f(y) = f(x).

The next two theorems give further information on maxima of convex functions:

Theorem 3.5.6 Let f be a convex function on Rn and E be a subset of Rn. Then

supConvE

f = supEf. (3.5.19)

In particular, if S Rn is convex and compact set, then the upper bound of f on S isequal to the upper bound of f on the set of extreme points of S:

supSf = sup

Ext(S)f (3.5.20)

• 3.5. OPTIMALITY CONDITIONS 81

Proof. To prove (3.5.19), let x ConvE, so that x is a convex combination of pointsfrom E (Theorem 1.1.3 on the structure of convex hull):

x =i

ixi [xi E, i 0,i

i = 1].

Applying Jensens inequality (Proposition 3.1.3), we get

f(x) i

if(xi) i

i supEf = sup

Ef,

so that the left hand side in (3.5.19) is the right hand one; the inverse inequality isevident, since ConvE E.

To derive (3.5.20) from (3.5.19), it suffices to note that from the Krein-MilmanTheorem (Theorem 1.3.1) for a convex compact set S one has S = Conv Ext(S).

The last theorem on maxima of convex functions is as follows:

Theorem 3.5.7 Let f be a convex function such that the domain M of f is closedand does not contain lines. Then

(i) If the setArgmax

Mf {x M | f(x) f(y) y M}

of global maximizers of f is nonempty, then it intersects the set Ext(M) of the extremepoints of M , so that at least one of the maximizers of f is an extreme point of M .

Proof. Let us start with (i). We will prove this statement by induction on thedimension of M . The base dimM = 0, i.e., the case of a singleton M , is trivial, sincehere M = ExtM = ArgmaxM f . Now assume that the statement is valid for the caseof dimM p, and let us prove that it is valid also for the case of dimM = p+ 1. Letus first verify that the set ArgmaxM f intersects with the (relative) boundary of M .Indeed, let x ArgmaxM f . There is nothing to prove if x itself is a relative boundarypoint of M ; and if x is not a boundary point, then, by Theorem 3.5.5, f is constant onM , so that ArgmaxM f = M ; and since M is closed, any relative boundary point of M(such a point does exist, since M does not contain lines and is of positive dimension)is a maximizer of f on M , so that here again ArgmaxM f intersects riM .

Thus, among the maximizers of f there exists at least one, let it be x, which belongsto the relative boundary of M . Let H be the hyperplane which properly supports Mat x (see Section 1.2.4), and let M = M H. The set M is closed and convex (sinceM and H are), nonempty (it contains x) and does not contain lines (since M doesnot). We have maxM f = f(x) = maxM f (note that M

M), whence

6= ArgmaxM

f ArgmaxM

f.

Same as in the proof of the Krein-Milman Theorem (Theorem 1.3.1), we have dimM |x|}

of R2.

For which , the function

ln(t2 x2) + t x

attains its minimum on the angle

{(t, x) R2 | t > |x|}

of R2? What is the value of this minimum?

Exercise 3.7.3 Let 0 < p < 1. Show that the function

xpy1p

is convex on the domain {(x, y) | x, y > 0}.

Let n be a positive integer and let 0 < 1/n. Prove that the function

f(x) = (x1x2...xn)

is convex on {x Rn | x1, ..., xn 0}.

• 3.7. EXERCISES: CONVEX FUNCTIONS 87

Hint: please, do not differentiate! Try to use the conic transformation first.

Exercise 3.7.4 Prove that the following functions are convex on the indicated domains:

x2y

on {(x, y) R2 | y > 0}

ln(exp{x}+ exp{y}) on the 2D plane.

Exercise 3.7.5 A function f defined on a convex set M is called log-convex on M , if ittakes real positive values on M and the function ln f is convex on M . Prove that

a log-convex on M function is convex on M

the sum (more generally, linear combination with positive coefficients) of two log-convexfunctions on M also is log-convex on the set.

Hint: use the result of the previous Exercise + your knowledge on operations preservingconvexity.

Exercise 3.7.6 Consider a Linear Programming program

cTx min | Ax b

with mn matrix A, and let x be an optimal solution to the problem. It means that x is aminimizer of differentiable convex function f(x) = cTx with convex differentiable constraintAx b, so that the necessary and sufficient condition of optimality given by Theorem 3.5.4should be satisfied at x. What does this condition mean in terms of the data A, b, c?

Exercise 3.7.7 Find the minimizer of a linear function

f(x) = cTx

on the set

Vp = {x Rn |ni=1

|xi|p 1};

here p, 1 < p

• 88 LECTURE 3. CONVEX FUNCTIONS