
Generating Eigenvalue Bounds Using Optimization

Henry Wolkowicz

Dedicated to the memory of Professor George Isac

Abstract. This paper illustrates how optimization can be used to derive known and new theoretical results about perturbations of matrices and sensitivity of eigenvalues. More specifically, the Karush-Kuhn-Tucker conditions, the shadow prices, and the parametric solution of a fractional program are used to derive explicit formulae for bounds for functions of matrix eigenvalues.

    1 Introduction

Many classical and new inequalities can be derived using optimization techniques. One first formulates the desired inequality as the maximum (minimum) of a function subject to appropriate constraints. The inequality, along with conditions for equality to hold, can then be derived and proved, provided that the optimization problem can be explicitly solved.

For example, consider the Rayleigh principle

$$\lambda_{\max} = \max\{\langle x, Ax\rangle : x \in \mathbb{R}^n,\ \|x\| = 1\}, \qquad (1)$$

where $A$ is an $n \times n$ Hermitian matrix, $\lambda_{\max}$ is the largest eigenvalue of $A$, $\langle \cdot,\cdot\rangle$ is the Euclidean inner product, and $\|\cdot\|$ is the associated norm. Typically, this principle is proved by maximizing the quadratic function $\langle x, Ax\rangle$ subject to the equality constraint $\|x\|^2 = 1$. An explicit solution can be found using the classical and well known Euler-Lagrange multiplier rule of calculus. (See Example 1 below.)

Henry Wolkowicz, Department of Combinatorics and Optimization, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada (Research Report CORR 2009-02)

It is an interesting coincidence that $\lambda$ is the standard symbol used in the literature for both eigenvalues and Lagrange multipliers; and the eigenvalue and Lagrange multiplier coincide in the above derivation. Not so well known are the multiplier rules for inequality constrained programs. The Hölder inequality

$$\langle x, y\rangle = \sum_{i=1}^n x_i y_i \le \Big(\sum_{i=1}^n x_i^p\Big)^{1/p}\Big(\sum_{i=1}^n y_i^q\Big)^{1/q},$$

where $x, y \in \mathbb{R}^n_+$, $p > 1$, $q = p/(p-1)$, can be proved by solving the optimization problem $h(x) := \max_y\{\sum_i x_i y_i : \sum_i y_i^q - 1 \le 0,\ y_i \ge 0,\ \forall i\}$. The John multiplier rule yields the explicit solution. (See [8] and Example 3 below.) The classical arithmetic-geometric mean inequality $(\lambda_1 \cdots \lambda_n)^{1/n} \le \frac{1}{n}(\lambda_1 + \cdots + \lambda_n)$, where $\lambda_i > 0$, $i = 1, \dots, n$, can be derived by solving the geometric programming problem

$$\max\Big\{\prod_{i=1}^n \lambda_i : \sum_{i=1}^n \lambda_i = 1,\ \lambda_i \ge 0,\ \forall i\Big\}.$$
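These two mean inequalities are easy to spot-check numerically; the following small NumPy sketch (an illustration, not from the paper) verifies the arithmetic-geometric mean inequality and the fact that the constrained maximum of $\prod_i \lambda_i$ over the simplex is $(1/n)^n$, attained at the uniform vector:

```python
import numpy as np

rng = np.random.default_rng(1)

# AGM inequality on random positive vectors, plus the equality case.
for _ in range(1000):
    lam = rng.uniform(0.1, 10.0, size=5)
    assert lam.prod() ** (1 / 5) <= lam.mean() + 1e-12
lam = np.full(5, 3.0)
assert abs(lam.prod() ** (1 / 5) - lam.mean()) < 1e-12

# Random feasible points of the simplex never beat the value (1/n)^n
# of the geometric programming problem above.
n = 4
for _ in range(1000):
    lam = rng.uniform(0.0, 1.0, n)
    lam /= lam.sum()                     # project onto sum = 1
    assert lam.prod() <= (1 / n) ** n + 1e-15
```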

Convexity properties of the functions, which arise when reformulating the inequalities as programming problems, can prove very helpful. For example, convexity can guarantee that sufficiency, rather than only necessity, holds in optimality conditions. The quasi-convexity of the function

$$\varphi(f) = \frac{\int f \, d\mu}{\int (1/f)\, d\nu}, \qquad (2)$$

where $\mu$ and $\nu$ are two nontrivial positive measures on a measurable space $X$, can be used to derive the Kantorovich inequality, [1]. (We prove the Kantorovich inequality using optimization in Example 2.) We rely heavily on the convexity and pseudo-convexity of the functions.

Optimality conditions, such as the Lagrange and Karush-Kuhn-Tucker multiplier rules, are needed to numerically solve mathematical programming problems. The purpose of this paper is to show how to use optimization techniques to generate known, as well as new, explicit eigenvalue inequalities. Rather than include all possible results, we concentrate on just a few, which allow us to illustrate several useful techniques. For example, suppose that $A$ is an $n \times n$ complex matrix with real eigenvalues $\lambda_1 \ge \dots \ge \lambda_n$. A lower bound for $\lambda_k$ can be found if we can explicitly solve the problem

$$\begin{array}{ll}
\min & \lambda_k \\
\text{subject to} & \sum_{i=1}^n \lambda_i = \operatorname{trace} A \\
& \sum_{i=1}^n \lambda_i^2 \le \operatorname{trace} A^2 \\
& \lambda_k - \lambda_i \le 0, \quad i = 1, \dots, k-1 \\
& \lambda_i - \lambda_k \le 0, \quad i = k+1, \dots, n.
\end{array} \qquad (3)$$

We can use the Karush-Kuhn-Tucker necessary conditions for optimality to find the explicit solution. (See Theorem 6.) Sufficiency guarantees that we actually have the solution. This yields the best lower bound for $\lambda_k$ based on the known data. (Further results along these lines can be found in [4].)

In addition, the Lagrange multipliers, obtained when solving the program (3), provide shadow prices. These shadow prices are sensitivity coefficients with respect to perturbations in the right-hand sides of the constraints. We use these shadow prices to improve the lower bound in the case that we have additional information about the eigenvalues. (See e.g. Corollaries 2 and 3.)

    1.1 Outline

In Section 2 we introduce the optimality conditions and use them to prove the well known: (i) Rayleigh principle; (ii) Hölder inequality; and (iii) Kantorovich inequality. In Section 3 we show how to use the convex multiplier rule (or the Karush-Kuhn-Tucker conditions) to generate bounds for functions of the eigenvalues of an $n \times n$ matrix $A$ with real eigenvalues. Some of these results have appeared in [7, 12, 4]. Included are bounds for $\lambda_k$, $\lambda_k + \lambda_\ell$ and $\lambda_k - \lambda_\ell$. We also show how to use the Lagrange multipliers (shadow prices) to strengthen the bounds. Section 4 uses fractional programming techniques to generate bounds for the ratios $(\lambda_k - \lambda_\ell)/(\lambda_k + \lambda_\ell)$. Some of the inequalities obtained here are given in [7, 12] but with proofs using elementary calculus techniques rather than optimization.

    2 Optimality Conditions

    2.1 Equality Constraints

    First, consider the program

$$\min\{f(x) : h_k(x) = 0,\ k = 1, \dots, q,\ x \in U\}, \qquad (4)$$

where $U$ is an open subset of $\mathbb{R}^n$ and the functions $f$, $h_k$, $k = 1, \dots, q$, are continuously differentiable. The function $f$ is called the objective function of the program. The feasible set, denoted by $\mathcal{F}$, is the set of points in $\mathbb{R}^n$ which satisfy the constraints. Then, the classical Euler-Lagrange multiplier rule states, e.g. [8],

Theorem 1. Suppose that $a \in \mathbb{R}^n$ solves (4) and that the gradients $\nabla h_1(a), \dots, \nabla h_q(a)$ are linearly independent. Then,

$$\nabla f(a) + \sum_{k=1}^q \lambda_k \nabla h_k(a) = 0, \qquad (5)$$

for some (Lagrange multipliers) $\lambda_k \in \mathbb{R}$, $k = 1, \dots, q$.

Example 1. Suppose that $A$ is an $n \times n$ Hermitian matrix with eigenvalues $\lambda_1 \ge \dots \ge \lambda_n$. To prove the Rayleigh principle (1), consider the equivalent program

$$\min\Big\{-\langle x, Ax\rangle : 1 - \sum_{i=1}^n x_i^2 = 0,\ x \in \mathbb{R}^n\Big\}. \qquad (6)$$

Since the objective function is continuous while the feasible set is compact, the minimum is attained at some $a \in \mathcal{F} \subset \mathbb{R}^n$. If we apply Theorem 1, we see that there exists a Lagrange multiplier $\lambda \in \mathbb{R}$ such that

$$-2Aa + 2\lambda a = 0,$$

i.e. $a$ is an eigenvector corresponding to the eigenvalue equal to the Lagrange multiplier $\lambda$. Since the objective function satisfies

$$\langle a, Aa\rangle = \lambda\langle a, a\rangle = \lambda,$$

we conclude that $\lambda$ must be the largest eigenvalue and we get the desired result. If we now add the constraint that $x$ be restricted to the $(n-1)$-dimensional subspace orthogonal to $a$, then we recover the second largest eigenvalue. Continuing in this manner, we get all the eigenvalues. More precisely, if $a_1, a_2, \dots, a_k$ are $k$ mutually orthonormal eigenvectors corresponding to the $k$ largest eigenvalues of $A$, $\lambda_1 \ge \dots \ge \lambda_k$, then we solve (6) with the added constraints

$$\langle x, a_i\rangle = 0, \quad i = 1, \dots, k.$$

The gradients of the constraints are necessarily linearly independent since the vectors $x$ and $a_i$, $i = 1, \dots, k$, are (mutually) orthonormal. Now if $a$ is a solution, then (5) yields

$$-2Aa + 2\lambda a + \sum_{i=1}^k \gamma_i a_i = 0,$$

for some Lagrange multipliers $\lambda, \gamma_i$, $i = 1, \dots, k$. However, taking the inner product with a fixed $a_i$, and using the fact that

$$\langle Aa, a_i\rangle = \langle a, Aa_i\rangle = \lambda_i \langle a, a_i\rangle = 0,$$

we see that $\gamma_i = 0$, $i = 1, \dots, k$, and so $Aa = \lambda a$, i.e. $a$ is the eigenvector corresponding to the $(k+1)$st largest eigenvalue. This argument also shows that $A$ necessarily has $n$ (real) mutually orthonormal eigenvectors.
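Example 1 can be replayed numerically. The NumPy sketch below (an illustration, not from the paper) checks that no unit vector beats $\lambda_1$ in Rayleigh quotient, and then recovers all eigenvalues by maximizing $\langle x, Ax\rangle$ over successive orthogonal complements, exactly as in the deflation argument above:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
B = rng.standard_normal((n, n))
A = (B + B.T) / 2                              # random Hermitian (symmetric) matrix
target = np.linalg.eigvalsh(A)[::-1]           # lam_1 >= ... >= lam_n

# Rayleigh quotients of unit vectors never exceed lam_1 (principle (1)).
for _ in range(2000):
    x = rng.standard_normal(n)
    x /= np.linalg.norm(x)
    assert x @ A @ x <= target[0] + 1e-10

# Recover all eigenvalues by repeatedly maximizing <x, Ax> over the
# subspace orthogonal to the eigenvectors found so far.
found, vecs = [], []
for k in range(n):
    if vecs:
        Q, _ = np.linalg.qr(np.column_stack(vecs), mode="complete")
        N = Q[:, len(vecs):]                   # basis of the orthogonal complement
    else:
        N = np.eye(n)
    w, V = np.linalg.eigh(N.T @ A @ N)         # Rayleigh maximization on the subspace
    found.append(w[-1])
    vecs.append(N @ V[:, -1])

assert np.allclose(found, target)
```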

Example 2. Consider the Kantorovich inequality, e.g. [1, 3],

$$1 \le \langle x, Ax\rangle\,\langle x, A^{-1}x\rangle \le \frac{1}{4}\left(\sqrt{\frac{\lambda_1}{\lambda_n}} + \sqrt{\frac{\lambda_n}{\lambda_1}}\right)^2, \qquad (7)$$

where $A$ is an $n \times n$ positive definite Hermitian matrix with eigenvalues $\lambda_1 \ge \dots \ge \lambda_n > 0$, $x \in \mathbb{R}^n$, and $\|x\| = 1$. This inequality is useful in obtaining bounds for the rate of convergence of the method of steepest descent, e.g. [5]. To prove the inequality we consider the following (two) optimization problems

$$\min(\max)\ f_1(a) := \Big(\sum_{i=1}^n a_i^2 \lambda_i\Big)\Big(\sum_{i=1}^n a_i^2 \lambda_i^{-1}\Big) \quad \text{subject to} \quad g(a) := 1 - \sum_{i=1}^n a_i^2 = 0, \qquad (8)$$

where $a = (a_i) \in \mathbb{R}^n$, $a_i = \langle x, u_i\rangle$, and $u_i$, $i = 1, \dots, n$, is an orthonormal set of eigenvectors of $A$ corresponding to the eigenvalues $\lambda_i$, $i = 1, \dots, n$, respectively. Thus, $f_1(a)$ is the middle expression in (7). Suppose that the vector $a = (a_i)$ solves (8). Then, the necessary conditions of optimality state that ($\lambda$ is the Lagrange multiplier)

$$a_i \lambda_i \Big(\sum_j a_j^2 \lambda_j^{-1}\Big) + a_i \lambda_i^{-1} \Big(\sum_j a_j^2 \lambda_j\Big) - \lambda a_i = 0, \quad i = 1, \dots, n; \qquad \sum_i a_i^2 = 1. \qquad (9)$$

Thus,

$$f_2(a) := \lambda_i \Big(\sum_j a_j^2 \lambda_j^{-1}\Big) + \lambda_i^{-1} \Big(\sum_j a_j^2 \lambda_j\Big) = \lambda, \quad \text{if } a_i \ne 0. \qquad (10)$$

On the other hand, if we multiply (9) by $a_i$ and sum over $i$, we get

$$\lambda = 2\Big(\sum_j a_j^2 \lambda_j\Big)\Big(\sum_j a_j^2 \lambda_j^{-1}\Big) = 2 f_1(a). \qquad (11)$$

By (10) and (11), we can replace $f_1(a)$ in (8) by the middle expression in (10), i.e. by $f_2(a)$. The new necessary conditions for optimality (with $\lambda$ playing the role of the Lagrange multiplier again and $a_i \ne 0$) are

$$a_j \frac{\lambda_i}{\lambda_j} + a_j \frac{\lambda_j}{\lambda_i} - \lambda a_j = 0, \quad j = 1, \dots, n.$$

Now, if both $a_j \ne 0$ and $a_i \ne 0$, we get

$$f_3(a) := \frac{\lambda_i}{\lambda_j} + \frac{\lambda_j}{\lambda_i} = \lambda. \qquad (12)$$

And, multiplying (12) by $a_j^2$ and summing over $j$ yields

$$\lambda = \sum_j a_j^2\Big(\frac{\lambda_i}{\lambda_j} + \frac{\lambda_j}{\lambda_i}\Big) = f_2(a).$$

Thus, we can now replace $f_2(a)$ (and so $f_1(a)$) in (8) by $f_3(a)$. Note that $a$ does not appear explicitly in $f_3(a)$. However, the $\lambda_i$ and $\lambda_j$ must correspond to $a_i \ne 0$ and $a_j \ne 0$. Consider the function

$$h(x, y) = \frac{x}{y} + \frac{y}{x}, \qquad (13)$$

where $0 < \lambda_n \le x \le y \le \lambda_1$. Since

$$\frac{d}{dx} h(x, y) = \frac{x^2 - y^2}{x^2 y} \le 0 \quad (< 0 \text{ if } x \ne y),$$

and, similarly, $\frac{d}{dy} h(x, y) \ge 0$ ($> 0$ if $x \ne y$), we see that $h$ attains its maximum at $x = \lambda_n$ and $y = \lambda_1$, and it attains its minimum at $x = y$. This shows that $2 \le f_3(a)$ and that $f_3$ attains its maximum at $\frac{\lambda_1}{\lambda_n} + \frac{\lambda_n}{\lambda_1}$, i.e. at $a_1 \ne 0$ and $a_n \ne 0$. The left-hand side of (7) now follows from $2 f_1(a) = f_2(a) = f_3(a) \ge 2$. Now, to have $f_3(a) = f_2(a)$, we must choose $a_1 = a_n = \frac{1}{\sqrt{2}}$, and $a_i = 0$, $1 < i < n$. Substituting this choice of $a$ in $f_1(a)$ yields the right-hand side of (7).

    2.2 Equality and Inequality Constraints

    Now suppose that program (4) has, in addition, the inequality constraints (continu-ously differentiable)

    gi(x) 0, i = 1, . . . ,m. (14)Then, we obtain the John necessary conditions of optimality. (See e.g. [8].)

Theorem 2. Suppose that $a \in \mathbb{R}^n$ solves (4) with the additional constraints (14). Then, there exist Lagrange multiplier vectors $\lambda = (\lambda_0, \lambda_1, \dots, \lambda_m) \in \mathbb{R}^{m+1}_+$ and $\mu \in \mathbb{R}^q$, not both zero, such that

$$\lambda_0 \nabla f(a) + \sum_{i=1}^m \lambda_i \nabla g_i(a) + \sum_{j=1}^q \mu_j \nabla h_j(a) = 0, \qquad \lambda_i g_i(a) = 0, \quad i = 1, \dots, m. \qquad (15)$$

The first condition in (15) is dual feasibility. The second condition in (15) is called complementary slackness. It shows that either the multiplier $\lambda_i = 0$ or the constraint is binding, i.e. $g_i(a) = 0$. The Karush-Kuhn-Tucker conditions (e.g. [8]) assume a constraint qualification and have $\lambda_0 = 1$.

Example 3. Hölder's inequality states that if $x, y \in \mathbb{R}^n_{++}$ are (positive) vectors, $p > 1$, and $q = p/(p-1)$, then

$$\langle x, y\rangle = \sum_{i=1}^n x_i y_i \le \Big(\sum_i x_i^p\Big)^{1/p}\Big(\sum_i y_i^q\Big)^{1/q} = \|x\|_p \|y\|_q.$$

    We now include a proof of this inequality using the John Multiplier Rule. (Thisproof corrects the one given in [8].)

Fix $y = (y_i) \in \mathbb{R}^n_{++}$ and consider the program

$$\begin{array}{ll}
\min & f(x) := -\sum_{i=1}^n x_i y_i \\
\text{subject to} & g(x) := \sum_{i=1}^n x_i^p - 1 \le 0 \\
& h_i(x) := -x_i \le 0, \quad i = 1, \dots, n.
\end{array}$$

Hölder's inequality follows if the optimal value is $-\|y\|_q$. Since the feasible set is compact, the minimum is attained at, say, $a = (a_i) \in \mathbb{R}^n_+$. Then, there exist constants (Lagrange multipliers) $\lambda_0 \ge 0$, $\lambda_1 \ge 0$, $\gamma_i \ge 0$, not all zero, such that

$$-\lambda_0 y_i + \lambda_1 p\, a_i^{p-1} - \gamma_i = 0,\ \forall i; \qquad \lambda_1 g(a) = 0; \qquad \gamma_i a_i = 0,\ \forall i.$$

This implies that, for each $i$, we have

$$-\lambda_0 y_i + \lambda_1 p\, a_i^{p-1} = \gamma_i = 0, \quad \text{if } a_i > 0; \qquad -\lambda_0 y_i = \gamma_i \ge 0, \quad \text{if } a_i = 0.$$

Since $y_i > 0$ and $\lambda_0 \ge 0$, we conclude that $\lambda_0 y_i = \gamma_i = 0$ if $a_i = 0$. Therefore, we get

$$-\lambda_0 y_i + \lambda_1 p\, a_i^{p-1} = \gamma_i = 0, \quad \forall i. \qquad (16)$$

The remainder of the proof now follows as in [8]. More precisely, since not all the multipliers are 0, if $\lambda_0 = 0$, then $\lambda_1 > 0$. This implies that

$$g(a) = 0, \qquad (17)$$

and, by (16), that $a = 0$, a contradiction. On the other hand, if $\lambda_1 = 0$, then $\lambda_0 > 0$, which implies $y = 0$, a contradiction. Thus, both $\lambda_0$ and $\lambda_1$ are positive and we can assume, without loss of generality, that $\lambda_0 = 1$. Moreover, we conclude that (17) holds.

From (16) and (17) we get

$$f(a) = -\sum_{i=1}^n a_i y_i = -\lambda_1 p \sum_{i=1}^n a_i^p = -\lambda_1 p.$$

Since $q = p/(p-1)$, (16) and (17) now imply that

$$\sum_{i=1}^n y_i^q = \sum_{i=1}^n (\lambda_1 p)^q a_i^{(p-1)q} = (\lambda_1 p)^q \sum_{i=1}^n a_i^p = (\lambda_1 p)^q = (-f(a))^q.$$


    2.3 Sensitivity Analysis

Consider now the convex (perturbed) program

$$(P_\varepsilon) \qquad \varphi(\varepsilon) = \begin{array}[t]{ll}
\min & f(x) \\
\text{subject to} & g_i(x) \le \varepsilon_i, \quad i = 1, \dots, m, \\
& h_j(x) = \varepsilon_j, \quad j = m+1, \dots, q, \\
& x \in U,
\end{array} \qquad (18)$$

where $U$ is an open subset of $\mathbb{R}^n$, the functions $f$ and $g_i$, $i = 1, \dots, m$, are convex, and $h_j$, $j = m+1, \dots, q$, are affine. The generalized Slater constraint qualification (CQ) for $(P_\varepsilon)$ states that

$$\text{there exists } \hat{x} \in \operatorname{int} U \text{ such that } g_i(\hat{x}) < \varepsilon_i,\ i = 1, \dots, m, \text{ and } h_j(\hat{x}) = \varepsilon_j,\ j = m+1, \dots, q. \qquad (19)$$

We can now state the convex multiplier rule and the corresponding shadow price interpretation of the multipliers. (See e.g. [8, 9].)

Theorem 3. Suppose that the CQ in (19) holds for $(P_0)$ in (18). Then,

$$\varphi(0) = \min\Big\{f(x) + \sum_{i=1}^m \lambda_i g_i(x) + \sum_{j=m+1}^q \mu_j h_j(x) : x \in U\Big\}, \qquad (20)$$

for some $\mu_j \in \mathbb{R}$, $j = m+1, \dots, q$, and $\lambda_i \ge 0$, $i = 1, \dots, m$. If $a \in \mathcal{F}$ solves $(P_0)$, then in addition

$$\lambda_i g_i(a) = 0, \quad i = 1, \dots, m. \qquad (21)$$

Theorem 4. Suppose that $a \in \mathcal{F}$. Then, (20) and (21) imply that $a$ solves $(P_0)$.

Theorem 5. Suppose that $a^1$ and $a^2$ are solutions to $(P_{\varepsilon^1})$ and $(P_{\varepsilon^2})$, respectively, with corresponding multiplier vectors $\lambda^1$ and $\lambda^2$. Then,

$$\langle \varepsilon^2 - \varepsilon^1, \lambda^2\rangle \le f(a^1) - f(a^2) \le \langle \varepsilon^2 - \varepsilon^1, \lambda^1\rangle. \qquad (22)$$

Note that since the functions are convex and the problem (20) is an unconstrained minimization problem, we see that if $a \in \mathcal{F}$ solves $(P_0)$, then (20) and (21) are equivalent to the system

$$\nabla f(a) + \sum_{i=1}^m \lambda_i \nabla g_i(a) + \sum_{j=m+1}^q \mu_j \nabla h_j(a) = 0, \qquad \lambda_i \ge 0, \quad \lambda_i g_i(a) = 0, \quad i = 1, \dots, m. \qquad (23)$$

Moreover, since $f(a^i) = \varphi(\varepsilon^i)$ when $a^i$ solves $(P_{\varepsilon^i})$, (22) implies that $-\lambda^i \in \partial\varphi(\varepsilon^i)$, i.e. the negative of the multiplier $\lambda^i$ is in the subdifferential of the perturbation function $\varphi(\cdot)$ at $\varepsilon^i$. In fact (see [9])

$$\partial\varphi(0) = \{-\lambda : \lambda \text{ is a multiplier vector for } (P_0)\}.$$

If $\lambda$ is unique, this implies that $\varphi$ is differentiable at 0 and $\nabla\varphi(0) = -\lambda$. Note that

$$\partial\varphi(a) = \{\phi \in \mathbb{R}^q : \langle \phi, \varepsilon - a\rangle \le \varphi(\varepsilon) - \varphi(a),\ \forall \varepsilon\}.$$

We will apply the convex multiplier rule in the sequel. Note that the necessity of (23) requires a constraint qualification, such as Slater's condition, while sufficiency does not. Thus, in our applications we do not have to worry about any constraint qualification: as soon as we can solve (23), sufficiency guarantees optimality. Necessity, on the other hand, is used in numerical algorithms.
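The shadow-price relation $\nabla\varphi(0) = -\lambda$ and the convex lower estimate $\varphi(\varepsilon) \ge \varphi(0) - \langle\lambda, \varepsilon\rangle$ can be seen on a one-variable example. The program below is hypothetical (not from the paper): minimize $x^2$ subject to $1 - x \le \varepsilon$, so $\varphi(\varepsilon) = (1-\varepsilon)^2$ for $\varepsilon \le 1$, and the KKT multiplier at $\varepsilon = 0$ is $\lambda = 2$ (from $2x - \lambda = 0$ at $x = 1$):

```python
# Perturbation function of:  min x^2  subject to  1 - x <= eps.
def phi(eps):
    x = max(1 - eps, 0.0)          # past eps = 1 the unconstrained minimum 0 takes over
    return x * x

lam = 2.0                          # KKT multiplier at eps = 0

# phi'(0) = -lam: the shadow-price relation, checked by central differences.
h = 1e-6
deriv = (phi(h) - phi(-h)) / (2 * h)
assert abs(deriv - (-lam)) < 1e-4

# Convexity of phi gives the global lower estimate phi(eps) >= phi(0) - lam*eps,
# which is exactly how the multipliers are used in Corollaries 2 and 3 below.
for eps in [-0.5, -0.1, 0.2, 0.8, 1.5]:
    assert phi(eps) >= phi(0) - lam * eps - 1e-12
```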

    3 Generating Eigenvalue Bounds

We consider the $n \times n$ matrix $A$ which has real eigenvalues $\lambda_1 \ge \dots \ge \lambda_n$. We have seen how to apply optimization techniques in order to prove several known inequalities. Now suppose that we are given several facts about the matrix $A$, e.g. $n$, $\operatorname{trace} A$, and/or $\det A$, etc. In order to find upper (lower) bounds for $f(\lambda)$, a function of the eigenvalues, we could then maximize (minimize) $f(\lambda)$ subject to the constraints corresponding to the given facts about $A$. An explicit solution to the optimization problem would then provide the, previously unknown, best upper (lower) bounds for $f(\lambda)$ given these facts. To be able to obtain an explicit solution we must choose simple enough constraints and/or have a lot of patience.

Suppose we wish to obtain a lower bound for $\lambda_k$, the $k$-th largest eigenvalue, given the facts that

$$K := \operatorname{trace} A, \qquad m := \frac{K}{n}, \qquad L := \operatorname{trace} A^2, \qquad s^2 := \frac{L}{n} - m^2.$$

Then we can try and solve the program

$$\begin{array}{lll}
\min & \lambda_k & \\
\text{subject to} & \text{(a)} \quad \sum_{i=1}^n \lambda_i = K, & \\
& \text{(b)} \quad \sum_{i=1}^n \lambda_i^2 \le L, & \\
& \text{(c)} \quad \lambda_k - \lambda_i \le 0, & i = 1, \dots, k-1, \\
& \text{(d)} \quad \lambda_i - \lambda_k \le 0, & i = k+1, \dots, n.
\end{array} \qquad (24)$$

This is a program in the variables $\lambda_i$ with $n, k, K$ and $L$ fixed. We have replaced the constraint $\sum \lambda_i^2 = L$ with $\sum \lambda_i^2 \le L$. This increases the feasible set of vectors $\lambda = (\lambda_i)$, and so the solution of (24) still provides a lower bound for $\lambda_k$. However, the program now becomes a convex program. Note that $(\operatorname{trace} A)^2 = (\sum \lambda_i)^2 \le n \sum \lambda_i^2 = n \operatorname{trace} A^2$, by the Cauchy-Schwarz inequality, with equality if and only if $\lambda_1 = \lambda_2 = \dots = \lambda_n$. Thus, if $(\operatorname{trace} A)^2 = n \operatorname{trace} A^2$, then we can immediately conclude that $\lambda_i = \operatorname{trace} A / n$, $i = 1, \dots, n$. Moreover, if $nL \ne K^2$, then $nL > K^2$, and we can always find a feasible solution to the constraints which strictly satisfies $\sum \lambda_i^2 < L$; hence we can always satisfy the generalized Slater CQ.

Theorem 6. If $K^2 < nL$ and $1 < k \le n$, then the (unique) explicit solution to (24) is

$$\lambda_1 = \dots = \lambda_{k-1} = m + s\Big(\frac{n-k+1}{k-1}\Big)^{\frac12}, \qquad \lambda_k = \dots = \lambda_n = m - s\Big(\frac{k-1}{n-k+1}\Big)^{\frac12}, \qquad (25)$$

with Lagrange multipliers for the constraints (a) to (d) in (24) being

$$\mu = -\frac{m}{ns}\Big(\frac{k-1}{n-k+1}\Big)^{\frac12} - \frac{1}{n}, \qquad \nu = \Big(\frac{k-1}{n-k+1}\Big)^{\frac12}\frac{1}{2ns}, \qquad \alpha_i = 0, \quad i = 1, \dots, k-1, \qquad \beta_i = \frac{1}{n-k+1}, \quad i = k+1, \dots, n, \qquad (26)$$

respectively.

Proof. Since (24) is a convex program, the Karush-Kuhn-Tucker conditions are sufficient for optimality. Thus, we need only verify that the above solution satisfies both the constraints and (23). However, let us suppose that the solution is unknown beforehand, and show that we can use the necessity of (23) to find it. We get

$$\mu + 2\nu\lambda_i - \alpha_i = 0, \quad i = 1, \dots, k-1, \qquad (27a)$$

$$1 + \mu + 2\nu\lambda_k + \sum_{i=1}^{k-1} \alpha_i - \sum_{i=k+1}^n \beta_i = 0, \qquad (27b)$$

$$\mu + 2\nu\lambda_i + \beta_i = 0, \quad i = k+1, \dots, n, \qquad (27c)$$

$$\mu \in \mathbb{R}, \quad \nu \ge 0, \quad \nu\Big(\sum_{i=1}^n \lambda_i^2 - L\Big) = 0, \quad \alpha_i, \beta_i \ge 0, \quad \alpha_i(\lambda_k - \lambda_i) = 0, \quad \beta_i(\lambda_i - \lambda_k) = 0, \quad \forall i. \qquad (27d)$$

Now, if $\nu = 0$, then

$$\alpha_i = \mu = -\beta_j, \quad i = 1, \dots, k-1, \quad j = k+1, \dots, n.$$

This implies that they are all 0 (or all $> 0$ if $k = n$), which contradicts (27b). Thus, $\nu > 0$ and, by (27d),

$$\sum_i \lambda_i^2 = L. \qquad (28)$$

From (27a) to (27d), we now have

$$\lambda_i = -\frac{\mu}{2\nu} + \frac{\alpha_i}{2\nu}, \quad i = 1, \dots, k-1,$$

$$\lambda_j = -\frac{\mu}{2\nu} - \frac{\beta_j}{2\nu}, \quad j = k+1, \dots, n,$$

$$\lambda_k = -\frac{\mu}{2\nu} - \frac{1}{2\nu} - \sum_{i=1}^{k-1} \frac{\alpha_i}{2\nu} + \sum_{j=k+1}^n \frac{\beta_j}{2\nu}.$$

Suppose $\alpha_{i_0} > 0$, where $1 \le i_0 \le k-1$. Then (27a) and (27d) imply that $\lambda_k = \lambda_{i_0} = -\frac{\mu}{2\nu} + \frac{\alpha_{i_0}}{2\nu}$. On the other hand, since we need

$$\lambda_i = -\frac{\mu}{2\nu} + \frac{\alpha_i}{2\nu} \ \ge\ \lambda_k = -\frac{\mu}{2\nu} + \frac{\alpha_{i_0}}{2\nu} \ >\ -\frac{\mu}{2\nu}, \quad i = 1, \dots, k-1,$$

we must have $\alpha_i > 0$, $i = 1, \dots, k-1$, and, by complementary slackness,

$$\lambda_i = \lambda_k = \frac{-\mu + \alpha_i}{2\nu}, \quad i = 1, \dots, k-1. \qquad (29)$$

But then

$$\lambda_k = -\frac{\mu}{2\nu} - \frac{1}{2\nu} - \sum_{j=1}^{k-1} \frac{\alpha_j}{2\nu} + \sum_{j=k+1}^n \frac{\beta_j}{2\nu} = -\frac{\mu}{2\nu} + \frac{\alpha_i}{2\nu}, \quad i = 1, \dots, k-1,$$

which implies $\sum_{j=k+1}^n \beta_j > 0$. But $\beta_j > 0$ implies $\lambda_j = \lambda_k$. This yields $\lambda_1 = \dots = \lambda_k = \dots = \lambda_n$, a contradiction, since we assumed $nL > K^2$. Thus, we conclude that

$$\alpha_i = 0, \quad i = 1, \dots, k-1.$$

Now if $\beta_{j_0} > 0$, for some $k+1 \le j_0 \le n$, then $\lambda_{j_0} = \lambda_k = -\frac{\mu}{2\nu} - \frac{1}{2\nu} + \sum_{j=k+1}^n \frac{\beta_j}{2\nu}$. Since $\lambda_j = -\frac{\mu + \beta_j}{2\nu} \le \lambda_k$, we must have $\beta_j > 0$ for all $j = k+1, \dots, n$. Note that $\beta_j = 0$ for all $j = k+1, \dots, n$ leads to a contradiction, since then $\lambda_j = -\frac{\mu}{2\nu} > \lambda_k = -\frac{\mu + 1}{2\nu}$.

Thus we have shown that the $\lambda$ is split into two parts,

$$\lambda_1 = \dots = \lambda_{k-1} > \lambda_k = \dots = \lambda_n. \qquad (30)$$

The Lagrange multipliers also split into two parts,

$$\alpha_1 = \dots = \alpha_{k-1} = 0, \qquad \beta_{k+1} = \dots = \beta_n = \beta.$$

We now explicitly solve for $\lambda_1, \lambda_k, \mu, \nu$, and $\beta$. From the first two constraints and (28) we get

$$(k-1)\lambda_1 + (n-k+1)\lambda_k = K, \qquad (k-1)\lambda_1^2 + (n-k+1)\lambda_k^2 = L. \qquad (31)$$

Eliminating one of the variables in (31) and solving the resulting quadratic yields (25). Uniqueness of (25) follows from the necessity of the optimality conditions. It also follows from the strict convexity of the quadratic constraint in the program (24). Using the partition in (30), we can substitute (27c) in (27b) to get

$$1 + \mu + 2\nu\lambda_k - (n-k)\beta = 0,$$

i.e., since $\beta = -\mu - 2\nu\lambda_k$,

$$\beta = \frac{1}{n-k+1}. \qquad (32)$$

In addition, $\lambda_1 - \lambda_k = \frac{\beta}{2\nu}$ implies

$$\nu = \frac{\beta}{2(\lambda_1 - \lambda_k)} = \Big(\frac{k-1}{n-k+1}\Big)^{\frac12}\frac{1}{2ns}, \qquad (33)$$

while

$$\mu = -2\nu\lambda_1 = -\frac{m}{ns}\Big(\frac{k-1}{n-k+1}\Big)^{\frac12} - \frac{1}{n}. \qquad (34)$$

In the above, we have made use of the necessity of the Karush-Kuhn-Tucker (KKT) conditions to eliminate non-optimal feasible solutions. Sufficiency of the KKT conditions in the convex case then guarantees that we have actually found the optimal solution, and so we need not worry about any constraint qualification. We can verify our solution by substituting into (27).

The explicit optimal solution yields the lower bound as well as conditions for it to be attained.

Corollary 1. Let $1 < k \le n$. Then

$$\lambda_k \ge m - \Big(\frac{k-1}{n-k+1}\Big)^{\frac12} s, \qquad (35)$$

with equality if and only if $\lambda_1 = \dots = \lambda_{k-1}$ and $\lambda_k = \dots = \lambda_n$.
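Both the feasibility of the optimizer (25) and the bound (35) can be checked numerically. In the sketch below (an illustration, not from the paper) the data $n, k, K, L$ in the first part are arbitrary choices satisfying $K^2 < nL$:

```python
import numpy as np

rng = np.random.default_rng(5)

# 1) The two-block vector (25) is feasible for (24): it has the right trace,
#    attains the quadratic constraint with equality, and is ordered.
n, k, K, L = 7, 3, 4.0, 20.0
assert K ** 2 < n * L
m = K / n
s = np.sqrt(L / n - m ** 2)
lam = np.empty(n)
lam[: k - 1] = m + s * np.sqrt((n - k + 1) / (k - 1))
lam[k - 1 :] = m - s * np.sqrt((k - 1) / (n - k + 1))
assert abs(lam.sum() - K) < 1e-9
assert abs((lam ** 2).sum() - L) < 1e-9
assert np.all(np.diff(lam) <= 1e-12)

# 2) The bound (35) holds for eigenvalues of random symmetric matrices.
for _ in range(200):
    n = int(rng.integers(3, 9))
    B = rng.standard_normal((n, n))
    A = (B + B.T) / 2
    w = np.linalg.eigvalsh(A)[::-1]            # descending eigenvalues
    m = np.trace(A) / n
    s = np.sqrt(np.trace(A @ A) / n - m ** 2)
    for kk in range(2, n + 1):                 # (35) needs 1 < k <= n
        bound = m - s * np.sqrt((kk - 1) / (n - kk + 1))
        assert w[kk - 1] >= bound - 1e-9
```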

The above corollary is given in [12] but with a different proof. From the proof of Theorem 6, we see that $\nu = 0$ if $k = 1$, and so the quadratic constraint $\sum \lambda_i^2 \le L$ may not be binding at the optimum. Thus the solution may violate the fact that $\sum \lambda_i^2 = \operatorname{trace} A^2$. This suggests that we can do better if we replace the inequality constraint by the equality constraint $\sum \lambda_i^2 = L$. We, however, lose the nice convexity properties of the problem. Applying the John conditions, Theorem 2, and using a similar argument to the proof of Theorem 6, yields the explicit solution

$$\lambda_1 = \dots = \lambda_{n-1} = m + s/(n-1)^{\frac12}, \qquad \lambda_n = m - (n-1)^{\frac12} s,$$

i.e. we get the lower bound

$$\lambda_1 \ge m + s/(n-1)^{\frac12},$$

with equality if and only if $\lambda_1 = \dots = \lambda_{n-1}$. (This result is also given in [12] but with a different proof.)

The Lagrange multipliers obtained in Theorem 6 also provide the sensitivity coefficients for program (24). (In fact, the multipliers are unique, and so the perturbation function is differentiable.) This helps in obtaining further bounds for the eigenvalues when we have some additional information, e.g. from Gershgorin discs. We can now improve our lower bound and also obtain lower bounds for other eigenvalues.

Corollary 2. Let $1 < k < n$. Suppose that we know

$$\lambda_{k+i} - \lambda_k \le -\varepsilon_i, \qquad (36)$$

where $\varepsilon_i \ge 0$, $i = 1, \dots, n-k$. Then

$$\lambda_k \ge m - \Big(\frac{k-1}{n-k+1}\Big)^{\frac12} s + \frac{1}{n-k+1}\sum_{i=1}^{n-k} \varepsilon_i. \qquad (37)$$

Proof. The result follows immediately from the left-hand side of (22), if we perturb the constraints in program (24) as given in (36) and use the multipliers $\beta_i = \frac{1}{n-k+1}$, $i = k+1, \dots, n$. Note that $\lambda_k$ remains the $k$-th largest eigenvalue.
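Corollary 2 can be exercised by taking $\varepsilon_i$ to be the true gaps $\lambda_k - \lambda_{k+i}$, which always satisfy (36); a NumPy sketch (an illustration, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(6)
for _ in range(200):
    n, k = 8, 3
    B = rng.standard_normal((n, n))
    A = (B + B.T) / 2
    lam = np.linalg.eigvalsh(A)[::-1]          # descending eigenvalues
    m = np.trace(A) / n
    s = np.sqrt(np.trace(A @ A) / n - m ** 2)
    plain = m - s * np.sqrt((k - 1) / (n - k + 1))

    # Known data (36): eps_i = lam_k - lam_{k+i} >= 0.
    eps = lam[k - 1] - lam[k:]
    improved = plain + eps.sum() / (n - k + 1)

    assert np.all(eps >= -1e-12)               # (36) holds for the true gaps
    assert improved >= plain - 1e-12           # the bound never gets worse
    assert lam[k - 1] >= improved - 1e-9       # and it is still a valid lower bound
```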

Corollary 3. Let $1 < k < n$. Suppose that we know

$$\lambda_{k+i-1} - \lambda_{k+t} \le \varepsilon_i,$$

for some $\varepsilon_i \ge 0$, $i = 1, \dots, t$. Then

$$\lambda_{k+t} \ge m - \Big(\frac{k-1}{n-k+1}\Big)^{\frac12} s - \frac{1}{n-k+1}\sum_{i=1}^t \varepsilon_i.$$

Proof. Suppose that we perturb the constraints in program (24) to obtain

$$\lambda_{k+i} - \lambda_k \le \varepsilon_i, \quad i = 1, \dots, t. \qquad (38)$$

Since $\varepsilon_i \ge 0$, this allows a change in the ordering of the $\lambda_i$, for then we can have $\lambda_{k+i} = \lambda_k + \varepsilon_i \ge \lambda_k$. Thus the perturbation in the hypothesis is equivalent to (38). From (22), the result follows, since the $k$-th ordered $\lambda_i$ has become the $(k+t)$-th ordered $\lambda_i$.


The results obtained using perturbations in the above two corollaries can be approached in a different way. Since the perturbation function $\varphi$ is convex (see e.g. [9]), we are obtaining a lower estimate of the perturbed value $\varphi(\varepsilon)$ by using the multiplier whose negative is an element of the subdifferential $\partial\varphi(\varepsilon)$. We can, however, obtain better estimates by solving program (24) with the new perturbed constraints.

Theorem 7. Under the hypotheses of Corollary 2, we get

$$\lambda_k \ge m - \Big(\frac{k-1}{n-k+1}\Big)^{\frac12}\tilde{s} + \frac{1}{n-k+1}\sum_{j=1}^t \varepsilon_j, \qquad (39)$$

where

$$\tilde{s}^2 = s^2 - \frac{(n-k+1)\sum_{j=1}^t \varepsilon_j^2 - \big(\sum_{j=1}^t \varepsilon_j\big)^2}{n(n-k+1)}.$$

Equality holds if and only if

$$\lambda_1 = \dots = \lambda_{k-1}; \qquad \lambda_{k+i} - \lambda_k = -\varepsilon_i, \quad i = 1, \dots, t.$$

Proof. We replace the last set of constraints in program (24) by the perturbed constraints (36), for $i = k+1, \dots, k+t$. The arguments in the proof of Theorem 6 show that the solution must satisfy (28) and

$$\lambda_1 = \dots = \lambda_{k-1}; \qquad \lambda_{k+j} - \lambda_k = -\varepsilon_j, \quad j = 1, \dots, t.$$

We can assume that $k + t = n$, since we must have $\lambda_{k+t+j} \le \lambda_{k+t}$, and so we can add the constraints

$$\lambda_{k+t+j} - \lambda_k \le -\varepsilon_t, \quad j \ge 1,$$

if required. This leads to the system

$$(k-1)\lambda_1 + \lambda_k + \sum_{j=1}^t (\lambda_k - \varepsilon_j) = K, \qquad (k-1)\lambda_1^2 + \lambda_k^2 + \sum_{j=1}^t (\lambda_k - \varepsilon_j)^2 = L. \qquad (40)$$

Let $\varepsilon := \sum_{j=1}^t \varepsilon_j$ and $\delta := \sum_{j=1}^t \varepsilon_j^2$. Then (40) reduces to

$$(k-1)\lambda_1 + (n-k+1)\lambda_k = \tilde{K} := K + \varepsilon, \qquad (k-1)\lambda_1^2 + (n-k+1)\lambda_k^2 - 2\varepsilon\lambda_k = \tilde{L} := L - \delta.$$

Then

$$\lambda_k = \big(\tilde{K} - (k-1)\lambda_1\big)/(n-k+1).$$

Substituting for $\lambda_k$ yields the quadratic

$$n(k-1)\lambda_1^2 - 2(k-1)(\tilde{K} - \varepsilon)\lambda_1 + \tilde{K}^2 - 2\tilde{K}\varepsilon - (n-k+1)\tilde{L} = 0,$$

which implies

$$\lambda_1 = \frac{K}{n} + \Big(\frac{n-k+1}{k-1}\Big)^{\frac12}\bigg\{\frac{\tilde{L}}{n} + \frac{\varepsilon^2}{n(n-k+1)} - \Big(\frac{K}{n}\Big)^2\bigg\}^{\frac12} \qquad (41)$$

and

$$\lambda_k = \frac{K}{n} + \frac{\varepsilon}{n-k+1} - \Big(\frac{k-1}{n-k+1}\Big)^{\frac12}\bigg\{\frac{\tilde{L}}{n} + \frac{\varepsilon^2}{n(n-k+1)} - \Big(\frac{K}{n}\Big)^2\bigg\}^{\frac12}. \qquad (42)$$

The expression in braces equals $\tilde{s}^2$, so (42) is exactly the right-hand side of (39).

Note that the partial derivative with respect to $\varepsilon_j$, at $\varepsilon_j = 0$, of the lower bound for $\lambda_k$ in (39) is $1/(n-k+1)$. This agrees with the fact that the corresponding multiplier is $\beta_j = 1/(n-k+1)$.
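Theorem 7 can likewise be tested with the true gaps as the known data; this sketch (an illustration, not from the paper) takes $t = n-k$, so that $k+t = n$ as in the proof:

```python
import numpy as np

rng = np.random.default_rng(9)
for _ in range(200):
    n, k = 8, 3
    B = rng.standard_normal((n, n))
    A = (B + B.T) / 2
    lam = np.linalg.eigvalsh(A)[::-1]
    m = np.trace(A) / n
    s2 = np.trace(A @ A) / n - m ** 2
    r = n - k + 1

    # Known data: eps_j = lam_k - lam_{k+j} >= 0, j = 1, ..., t = n-k.
    eps_j = lam[k - 1] - lam[k:]
    eps, delta = eps_j.sum(), (eps_j ** 2).sum()

    s_tilde2 = s2 - (r * delta - eps ** 2) / (n * r)
    assert s_tilde2 >= -1e-9                   # the modified variance stays nonnegative
    bound = m - np.sqrt((k - 1) / r) * np.sqrt(max(s_tilde2, 0.0)) + eps / r
    assert lam[k - 1] >= bound - 1e-8          # (39) is a valid lower bound
```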

Corollary 3 can be improved in the same way that Theorem 7 improves Corollary 2. We need to consider the program (24) with the new constraints

$$\lambda_{k+i} - \lambda_k \le \varepsilon_i, \quad i = 1, \dots, t,$$

where $\varepsilon_i \ge 0$ and $k$ has replaced $k + t$. Further improvements can be obtained if more information is known. For example, we might know that

$$\lambda_{t+i} - \lambda_t \le \varepsilon_i, \quad i = 1, \dots, s,$$

where $1 + s < t + s < k$ or $k + s < t + s < n$. In these cases we would obtain a result as in Theorem 7.

In the remainder of this section we consider bounds for $\lambda_k + \lambda_\ell$ and $\lambda_k - \lambda_\ell$. To obtain a lower bound for $\lambda_k + \lambda_\ell$ we consider the program

$$\begin{array}{lll}
\min & \lambda_k + \lambda_\ell & \\
\text{subject to} & \text{(a)} \quad \sum \lambda_i = K, & \\
& \text{(b)} \quad \sum \lambda_i^2 \le L, & \\
& \text{(c)} \quad \lambda_i - \lambda_k \le 0, & i = k+1, \dots, \ell, \\
& \text{(d)} \quad \lambda_j - \lambda_\ell \le 0, & j = \ell+1, \dots, n.
\end{array} \qquad (43)$$

Note that we have ignored the constraints $\lambda_i - \lambda_k \ge 0$, $i = 1, \dots, k-1$. From our previous work in the proof of Theorem 6, we see that the Lagrange multipliers for these constraints should all be 0, i.e. we can safely ignore these constraints without weakening the bound.

Theorem 8. Suppose that $K^2 < nL$ and $1 \le k < \ell \le n$. Then the explicit solution to (43) is as follows.

1. If $n - \ell > \ell - k - 1$, then

$$\lambda_1 = \dots = \lambda_{k-1} = m + s\Big(\frac{n-k+1}{k-1}\Big)^{\frac12}, \qquad \lambda_k = \dots = \lambda_n = m - s\Big(\frac{k-1}{n-k+1}\Big)^{\frac12}, \qquad (44)$$

with Lagrange multipliers for the constraints

$$\mu = -2\nu\lambda_k - \frac{2}{n-k+1}, \qquad \nu = \frac{1}{ns}\Big(\frac{k-1}{n-k+1}\Big)^{\frac12},$$

$$\alpha_i = \beta_j = \frac{2}{n-k+1}, \quad i = k+1, \dots, \ell-1, \quad j = \ell+1, \dots, n, \qquad \gamma = \frac{2(n-\ell+1)}{n-k+1} - 1, \qquad (45)$$

where $\gamma := \alpha_\ell$ denotes the multiplier of the constraint $\lambda_\ell - \lambda_k \le 0$.

2. If $n - \ell \le \ell - k - 1$, then $\lambda_1$ is the solution of the quadratic (51) and

$$\lambda_1 = \dots = \lambda_{k-1}, \qquad \lambda_k = \dots = \lambda_{\ell-1} = \lambda_1 + \frac{K - n\lambda_1}{2(\ell-k)}, \qquad \lambda_\ell = \dots = \lambda_n = \lambda_1 + \frac{K - n\lambda_1}{2(n-\ell+1)}, \qquad (46)$$

with Lagrange multipliers for the constraints being

$$\mu = -2\nu\lambda_1, \qquad \nu = \frac{1}{n\lambda_1 - K}, \qquad \alpha_i = \frac{1}{\ell-k}, \quad i = k+1, \dots, \ell-1, \qquad \gamma = 0, \qquad \beta_j = \frac{1}{n-\ell+1}, \quad j = \ell+1, \dots, n.$$

Proof. To simplify notation, we replace $2\nu$ by $\nu$. The Karush-Kuhn-Tucker conditions for (43) yield

$$\begin{array}{ll}
\text{(a)} & \mu + \nu\lambda_i = 0, \quad i = 1, \dots, k-1, \\
\text{(b)} & 1 + \mu + \nu\lambda_k - \sum_{i=k+1}^{\ell} \alpha_i = 0, \\
\text{(c)} & \mu + \nu\lambda_i + \alpha_i = 0, \quad i = k+1, \dots, \ell-1, \\
\text{(d)} & 1 + \mu + \nu\lambda_\ell + \gamma - \sum_{j=\ell+1}^n \beta_j = 0, \qquad \gamma := \alpha_\ell, \\
\text{(e)} & \mu + \nu\lambda_j + \beta_j = 0, \quad j = \ell+1, \dots, n, \\
\text{(f)} & \nu, \alpha_i, \beta_j \ge 0, \qquad \nu\big(\sum_{t=1}^n \lambda_t^2 - L\big) = 0, \\
\text{(g)} & \alpha_i(\lambda_i - \lambda_k) = 0, \quad \forall i, \qquad \beta_j(\lambda_j - \lambda_\ell) = 0, \quad \forall j, \qquad \gamma(\lambda_\ell - \lambda_k) = 0.
\end{array} \qquad (47)$$

First suppose that $\nu = 0$. If $k > 1$, then (a) gives $\mu = 0$, and then (c) and (e) give $\alpha_i = \beta_j = 0$ for all $i, j$, which contradicts (b). If $k = 1$, we get $\mu = -\alpha_i = -\beta_j$ for all $i, j$, and we must have $\lambda_1 = \dots = \lambda_n = m$. So we can let $k > 1$ and assume that $\nu > 0$. Then we get

$$\lambda_i = -\frac{\mu}{\nu}, \quad i = 1, \dots, k-1, \qquad \lambda_k = \frac{-1 - \mu + \sum_{i=k+1}^{\ell} \alpha_i}{\nu}, \qquad \lambda_i = -\frac{\mu + \alpha_i}{\nu}, \quad i = k+1, \dots, \ell-1,$$

$$\lambda_\ell = \frac{-1 - \mu - \gamma + \sum_{j=\ell+1}^n \beta_j}{\nu}, \qquad \lambda_j = -\frac{\mu + \beta_j}{\nu}, \quad j = \ell+1, \dots, n.$$

To simplify notation, the index $i$ will now refer to $i = k+1, \dots, \ell-1$, while the index $j$ will refer to $j = \ell+1, \dots, n$. Since $\lambda_{i_0} \le \lambda_k$, $i_0 = k+1, \dots, \ell-1$, we get

$$\sum_{i=k+1}^{\ell} \alpha_i \ge 1 - \alpha_{i_0}. \qquad (48)$$

Therefore, there exists at least one $\alpha_{i_0} > 0$. This implies that $\lambda_{i_0} = \lambda_k$. Now if $\alpha_{i_1} = 0$ for some other index, then

$$\lambda_{i_1} = -\frac{\mu}{\nu} > \lambda_{i_0} = \lambda_k,$$

which is a contradiction. We conclude that $\lambda_i = \lambda_k$ and $\alpha_i > 0$ for all $i = k+1, \dots, \ell-1$. Note that if $\gamma = 0$, we get

$$\alpha_i = 1/(\ell - k), \quad i = k+1, \dots, \ell-1.$$

Similarly, since $\lambda_{j_0} \le \lambda_\ell$, $j_0 = \ell+1, \dots, n$, at least one $\beta_{j_0} > 0$, and so $\lambda_{j_0} = \lambda_\ell$. But if $\beta_{j_1} = 0$, then

$$\lambda_{j_1} = -\frac{\mu}{\nu} > \lambda_{j_0} = \lambda_\ell,$$

a contradiction. We conclude that $\lambda_j = \lambda_\ell$ and $\beta_j > 0$ for all $j = \ell+1, \dots, n$, so that if $\gamma = 0$, we also have

$$\beta_j = 1/(n - \ell + 1), \quad j = \ell+1, \dots, n.$$

There now remain two cases to consider:

$$\gamma = 0 \quad \text{and} \quad \gamma > 0.$$

Since $\lambda_k \ge \lambda_\ell$, we must have $\alpha_i \le \beta_j$ for all $i, j$. So if $\gamma = 0$, we must have

$$\ell - k - 1 \ge n - \ell.$$

From the expressions for $\lambda_i$, $\lambda_j$, we get in this case

$$\lambda_k = \lambda_1 - \frac{1}{\nu(\ell-k)}, \qquad \lambda_\ell = \lambda_1 - \frac{1}{\nu(n-\ell+1)}. \qquad (49)$$

After substitution into the trace constraint, this yields

$$\frac{1}{\nu} = \frac{n\lambda_1 - K}{2}. \qquad (50)$$

Since $\nu > 0$, we can apply complementary slackness and substitute for $\lambda_k$ and $\lambda_\ell$. We get the quadratic

$$(k-1)\lambda_1^2 + (\ell-k)\Big(\lambda_1 + \frac{K - n\lambda_1}{2(\ell-k)}\Big)^2 + (n-\ell+1)\Big(\lambda_1 + \frac{K - n\lambda_1}{2(n-\ell+1)}\Big)^2 = L, \qquad (51)$$

or equivalently

$$\big\{4(\ell-k)(n-\ell+1)(k-1) + (n-\ell+1)\big(2(\ell-k)-n\big)^2 + (\ell-k)\big(2(n-\ell+1)-n\big)^2\big\}\lambda_1^2$$
$$+\ 2K\big\{(n-\ell+1)\big(2(\ell-k)-n\big) + (\ell-k)\big(2(n-\ell+1)-n\big)\big\}\lambda_1$$
$$+\ \big\{(n-\ell+1)K^2 + (\ell-k)K^2 - 4(\ell-k)(n-\ell+1)L\big\} = 0.$$

Note that the above implies

$$\lambda_k + \lambda_\ell = 2\lambda_1 + \frac{K - n\lambda_1}{2}\Big(\frac{1}{\ell-k} + \frac{1}{n-\ell+1}\Big). \qquad (52)$$

In the case that $\ell - k - 1 < n - \ell$, we get $\gamma > 0$. Thus, $\lambda_\ell = \lambda_k$ and

$$\lambda_i = \lambda_k, \quad i = k+1, \dots, n. \qquad (53)$$

Substitution yields the desired optimal values for $\lambda$. Moreover, by (c) and (e) the $\alpha_i$ and $\beta_j$ are constant; let $\alpha = \alpha_i$ and $\beta = \beta_j$. Then (b) and (d) give

$$\alpha = \beta, \qquad 1 - (\ell-k)\alpha = \gamma = (n-\ell+1)\beta - 1.$$

This implies

$$\alpha = \beta = \frac{2}{n-k+1}, \qquad \gamma = \frac{2(n-\ell+1)}{n-k+1} - 1. \qquad (54)$$

Now if $k > 1$, we see that

$$\nu = \frac{\alpha}{\lambda_1 - \lambda_k} = 2\Big(\frac{k-1}{n-k+1}\Big)^{\frac12}\big/(ns).$$

Then $\mu = -\nu\lambda_k - \alpha$.
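Case 1 of Theorem 8 gives the lower bound $\lambda_k + \lambda_\ell \ge 2\big(m - s((k-1)/(n-k+1))^{1/2}\big)$ whenever $n - \ell > \ell - k - 1$, since the optimal value of the relaxation (43) is attained at (44). A numerical sketch (an illustration, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(10)
for _ in range(200):
    n = 9
    B = rng.standard_normal((n, n))
    A = (B + B.T) / 2
    lam = np.linalg.eigvalsh(A)[::-1]          # descending eigenvalues
    m = np.trace(A) / n
    s = np.sqrt(np.trace(A @ A) / n - m ** 2)
    for k in range(2, n):
        for l in range(k + 1, n + 1):
            if n - l > l - k - 1:              # case 1 of Theorem 8
                bound = 2 * (m - s * np.sqrt((k - 1) / (n - k + 1)))
                assert lam[k - 1] + lam[l - 1] >= bound - 1e-9
```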

To obtain an upper bound for $\lambda_k - \lambda_\ell$, we consider the program

$$\begin{array}{lll}
\min & -\lambda_k + \lambda_\ell & \\
\text{subject to} & \sum \lambda_i = K, & \\
& \sum \lambda_i^2 \le L, & \\
& \lambda_k - \lambda_i \le 0, & i = 1, \dots, k-1, \\
& \lambda_j - \lambda_\ell \le 0, & j = \ell+1, \dots, n.
\end{array} \qquad (55)$$

Theorem 9. Suppose that $K^2 < nL$ and $1 \le k < \ell \le n$. Let

$$\tilde{n} := k + n - \ell + 1, \qquad \tilde{L} := L - (\ell-k-1)m^2, \qquad \tilde{s}^2 := \frac{\tilde{L}}{\tilde{n}} - m^2.$$

Then the explicit solution to program (55) is

$$\lambda_1 = \dots = \lambda_k = m + \Big(\frac{n-\ell+1}{k}\Big)^{\frac12}\tilde{s} = m + \frac{1}{2\nu k}, \qquad \lambda_{k+1} = \dots = \lambda_{\ell-1} = m,$$

$$\lambda_\ell = \dots = \lambda_n = m - \Big(\frac{k}{n-\ell+1}\Big)^{\frac12}\tilde{s} = m - \frac{1}{2\nu(n-\ell+1)}, \qquad (56)$$

with Lagrange multipliers for the four sets of constraints being

$$\mu = -2\nu m, \qquad \nu = \frac{\big(\frac{1}{k} + \frac{1}{n-\ell+1}\big)^{\frac12}}{2\,\tilde{n}^{\frac12}\,\tilde{s}}, \qquad \alpha_i = \frac{1}{k}, \quad i = 1, \dots, k-1, \qquad \beta_j = \frac{1}{n-\ell+1}, \quad j = \ell+1, \dots, n, \qquad (57)$$

respectively.


Proof. The proof is similar to that of Theorem 8. Alternatively, sufficiency of the KKT conditions can be used.

The theorem yields the upper bound

$$\lambda_k - \lambda_\ell \le \tilde{n}^{\frac12}\,\tilde{s}\,\Big(\frac{1}{k} + \frac{1}{n-\ell+1}\Big)^{\frac12}.$$
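Since $\tilde{L} - \tilde{n}m^2 = L - nm^2$, we have $\tilde{s}^2 = ns^2/\tilde{n}$, so the upper bound can also be written as $\sqrt{n}\,s\,(1/k + 1/(n-\ell+1))^{1/2}$; for $k = 1$, $\ell = n$ it reduces to the familiar spread bound $\lambda_1 - \lambda_n \le s\sqrt{2n}$. This form is easy to test numerically (a sketch, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(7)
for _ in range(200):
    n = int(rng.integers(3, 9))
    B = rng.standard_normal((n, n))
    A = (B + B.T) / 2
    lam = np.linalg.eigvalsh(A)[::-1]          # descending eigenvalues
    m = np.trace(A) / n
    s = np.sqrt(np.trace(A @ A) / n - m ** 2)
    for k in range(1, n):
        for l in range(k + 1, n + 1):
            bound = np.sqrt(n) * s * np.sqrt(1 / k + 1 / (n - l + 1))
            assert lam[k - 1] - lam[l - 1] <= bound + 1e-9
```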

    4 Fractional Programming

    We now apply techniques from the theory of fractional programming to derivebounds for the Kantorovich ratio

    kk +

    . (58)

    This ratio is useful in deriving rates of convergence for the accelerated steepestdescent method, e.g. [6].

    Consider the fractional program (e.g. [10, 11])

    max

    {

    f (x)g(x)

    : x F}

    . (59)

    If f is concave and g is convex and positive, then h = fg is a pseudo-concave func-tion, i.e. h : Rn R satisfies (y x)t h(x) 0 implies h(y) h(x). The convexmultiplier rules still hold if the objective function is pseudo-convex. We could there-fore generate bounds for the ratio (58) as was done for k in Section 3. However, itis simpler to use the following parametric technique. Let

h(q) := max{ f(x) − q g(x) : x ∈ F }.   (60)

Lemma 1 ([2]). Suppose that g(x) > 0 for all x ∈ F, and that q* is a zero of h(q), with corresponding maximizer x* ∈ F in (60). Then x* solves (59).

Proof. Suppose not. Then there exists x̂ ∈ F such that

q* = f(x*)/g(x*) < f(x̂)/g(x̂).

Since g(x̂) > 0, this gives f(x̂) − q* g(x̂) > 0, which contradicts h(q*) = 0.
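Lemma 1 is the basis of Dinkelbach's algorithm [2]: solve (60) for the current q, then update q to the ratio f(x)/g(x) at the maximizer, until h(q) = 0. A minimal sketch (ours; the grid search merely stands in for an exact solver of (60)) on a toy ratio:

```python
import numpy as np

def dinkelbach(f, g, X, tol=1e-10, max_iter=50):
    """Maximize f(x)/g(x) over a finite set X, assuming g > 0 on X."""
    q = f(X[0]) / g(X[0])
    for _ in range(max_iter):
        vals = f(X) - q * g(X)       # the parametric objective of (60)
        i = int(np.argmax(vals))
        if vals[i] <= tol:           # h(q) ~ 0, so q is the optimal ratio
            return X[i], q
        q = f(X[i]) / g(X[i])        # Dinkelbach update: q <- f(x)/g(x)
    return X[i], q

# toy instance: maximize (x + 1)/(x^2 + 1) on [0, 2]; optimum at x = sqrt(2) - 1
X = np.linspace(0.0, 2.0, 20001)
x_best, q_best = dinkelbach(lambda x: x + 1, lambda x: x**2 + 1, X)
```

Since h(q) is strictly decreasing in q, the iterates q increase monotonically to the optimal ratio, here (1 + √2)/2.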

We now apply Lemma 1 to the program

max { (λ_k − λ_l)/(λ_k + λ_l) : Σ_i λ_i = K, Σ_i λ_i² ≤ L,
λ_k − λ_i ≤ 0, i = 1, ..., k − 1,   λ_j − λ_l ≤ 0, j = l + 1, ..., n }.   (61)

Theorem 10. Suppose that K² < nL, that 1 ≤ k < l ≤ n, and that λ_k + λ_l > 0 on the feasible set of (61). Then the explicit solution to (61) is

λ_1 = ⋯ = λ_k = a(1 − q)/k + b,
λ_{k+1} = ⋯ = λ_{l−1} = b,
λ_l = ⋯ = λ_n = b − a(1 + q)/(n − l + 1),   (62)

with optimal value (λ_k − λ_l)/(λ_k + λ_l) = q, where

q = [ −ns²(k − (n − l + 1)) − { n²s⁴(k − (n − l + 1))² − [(ns² − 4km² − 4ks²)(n − l + 1) + ns²k] ns²(n − l + 1 + k) }^{1/2} ] / [ (ns² − 4km² − 4ks²)(n − l + 1) + ns²k ],

a = { (L − nm²) / [ (1 − q)²/k + (1 + q)²/(n − l + 1) − 4q²/n ] }^{1/2},   b = (2aq + K)/n.

Proof. Let F denote the feasible set of (61), i.e., the set of λ = (λ_i) ∈ R^n satisfying the constraints. We consider the following parametric program

(P_q)   h(q) := max{ (λ_k − λ_l) − q(λ_k + λ_l) : λ ∈ F }.

Then h(q) is a strictly decreasing function of q and, if λ* solves (P_{q*}) with h(q*) = 0, then, by Lemma 1 above, λ* also solves the initial program (61).


The objective function of (P_q) can be rewritten as (1 − q)λ_k − (1 + q)λ_l. With e_i the i-th unit vector and e the vector of ones, the Karush-Kuhn-Tucker conditions for (P_q) now yield (with multipliers α, β, u_i, v_j for the four sets of constraints):

(1 − q) e_k − (1 + q) e_l = α e + 2β λ + Σ_{i=1}^{k−1} u_i (e_k − e_i) + Σ_{j=l+1}^{n} v_j (e_j − e_l);

β ≥ 0;   u_i ≥ 0, i = 1, ..., k − 1;   v_j ≥ 0, j = l + 1, ..., n;   λ ∈ F;

β ( Σ_i λ_i² − L ) = 0;   u_i (λ_k − λ_i) = 0, for all i;   v_j (λ_j − λ_l) = 0, for all j.

Since λ_k ≥ λ_l and we seek q such that h(q) = 0, we need only consider q > 0. Further, if β = 0, then we get the following cases:

k < i < l:   0 = α + 2βλ_i,  which implies α = 0;
i < k:       0 = α + 2βλ_i − u_i,  which implies u_i = α = 0;
l < i:       0 = α + 2βλ_i + v_i,  which implies v_i = −α = 0;
i = k:       0 = −(1 − q) + α + 2βλ_k + Σ_i u_i,  which implies Σ_i u_i = 1 − q;
i = l:       0 = (1 + q) + α + 2βλ_l − Σ_j v_j,  which implies Σ_j v_j = 1 + q.   (63)

These equations are inconsistent: each v_j = 0, yet Σ_j v_j = 1 + q > 0. Therefore, we can assume β > 0, which implies that Σ_i λ_i² = L.

Now, for i < k, either λ_k = λ_i or u_i = 0, in which case λ_i = −α/(2β). Similarly, for l < i, either λ_l = λ_i or λ_i = −α/(2β). And, for k < i < l, λ_i = −α/(2β). We can therefore see that our solution must satisfy

λ_i = λ_k, i = 1, ..., k;
λ_i = −α/(2β), i = k + 1, ..., l − 1;
λ_i = λ_l, i = l, ..., n.

Now, rather than continuing in this way, we can apply Lemma 2. Let w = (w_i), with

w_i = (1 − q)/k, i = 1, ..., k;
w_i = 0, i = k + 1, ..., l − 1;
w_i = −(1 + q)/(n − l + 1), i = l, ..., n.

Then

(1 − q)λ_k − (1 + q)λ_l = ((1 − q)/k) Σ_{i=1}^{k} λ_i − ((1 + q)/(n − l + 1)) Σ_{i=l}^{n} λ_i = w^T λ;
m w^T e = m((1 − q) − (1 + q)) = −2mq;
w^T C w = w^T I w − (1/n) w^T e e^T w = (1 − q)²/k + (1 + q)²/(n − l + 1) − (1/n)(4q²);
n w^T C w = n(1 − q)²/k + n(1 + q)²/(n − l + 1) − 4q².

Therefore, Lemma 2 yields

(1 − q)λ_k − (1 + q)λ_l ≤ −2mq + s { n(1 − q)²/k + n(1 + q)²/(n − l + 1) − 4q² }^{1/2},   (64)

with equality if and only if

λ = a w + b e,

for some scalars a and b with a ≥ 0. And the right-hand side of (64) equals h(q), the maximum value of (P_q).
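Inequality (64), with right-hand side h(q) = −2mq + s{n(1 − q)²/k + n(1 + q)²/(n − l + 1) − 4q²}^{1/2}, holds for every ordered spectrum with mean m and variance s², so it can be spot-checked by sampling; this NumPy sketch is ours, for a fixed q in (0, 1):

```python
import numpy as np

# Spot-check (ours) of inequality (64): for any ordered spectrum with mean m
# and variance s^2, (1-q) lambda_k - (1+q) lambda_l <= h(q), q fixed in (0,1).
rng = np.random.default_rng(1)
n, k, l = 8, 3, 6
m, s = 2.0, 0.7
q = 0.3
h_q = -2*m*q + s*np.sqrt(n*(1-q)**2/k + n*(1+q)**2/(n - l + 1) - 4*q**2)
for _ in range(500):
    z = rng.normal(size=n)
    z = (z - z.mean()) / z.std()          # mean 0, variance 1
    lam = np.sort(m + s*z)[::-1]          # feasible: mean m, variance s^2, ordered
    assert (1-q)*lam[k-1] - (1+q)*lam[l-1] <= h_q + 1e-9
```

The two ingredients are exactly those of the proof: λ_k is at most the average of the top k entries, λ_l at least the average of the bottom n − l + 1, and then the Cauchy-Schwarz step of Lemma 2 bounds w^T λ.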

We now need to find q such that h(q) = 0, i.e.,

4m²q² = s² { n(1 − q)²/k + n(1 + q)²/(n − l + 1) − 4q² };

4 k(n − l + 1) m² q² = (n − l + 1) n s² (1 − q)² + k n s² (1 + q)² − 4 k(n − l + 1) s² q²;

[ (ns² − 4km² − 4ks²)(n − l + 1) + ns²k ] q² + 2ns²(k − (n − l + 1)) q + ns²(n − l + 1 + k) = 0;

q = [ −ns²(k − (n − l + 1)) − { n²s⁴(k − (n − l + 1))² − [as above] ns²(n − l + 1 + k) }^{1/2} ] / [as above],

where "[as above]" denotes the coefficient of q². We have chosen the negative radical for the root, since the quantity in [ ] is negative and we need q > 0.

The conditions for equality in (64) yield:

λ_i = a(1 − q)/k + b, i = 1, ..., k;
λ_i = b, i = k + 1, ..., l − 1;
λ_i = −a(1 + q)/(n − l + 1) + b, i = l, ..., n,

or

(λ_k − λ_l)/(λ_k + λ_l) = [ a(1 − q)/k + b + a(1 + q)/(n − l + 1) − b ] / [ a(1 − q)/k − a(1 + q)/(n − l + 1) + 2b ]
= [ a(1 − q)(n − l + 1) + ak(1 + q) ] / [ a(n − l + 1)(1 − q) − a(1 + q)k + 2bk(n − l + 1) ].

We now solve for a and b by substituting for the λ_i in Σ_i λ_i = K and Σ_i λ_i² = L. First,

k ( a(1 − q)/k + b ) + (l − k − 1) b + (n − l + 1) ( −a(1 + q)/(n − l + 1) + b ) = K,

or

(k + l − k − 1 + n − l + 1) b = K − a(1 − q) + a(1 + q),   i.e.,   b = (2aq + K)/n.

And

k ( a(1 − q)/k + b )² + (l − k − 1) b² + (n − l + 1) ( −a(1 + q)/(n − l + 1) + b )² = L,

or

n b² − 4abq + a² [ (1 − q)²/k + (1 + q)²/(n − l + 1) ] − L = 0.

Substituting b = (2aq + K)/n gives n b² − 4abq = (K² − 4a²q²)/n, so with K = nm we are left with

[ −4q²/n + (1 − q)²/k + (1 + q)²/(n − l + 1) ] a² + nm² − L = 0,

and therefore

a = { (−nm² + L) / [ (1 − q)²/k + (1 + q)²/(n − l + 1) − 4q²/n ] }^{1/2}.

Substitution for the λ_i yields the desired results.
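The chain of substitutions above can be replayed numerically: q is the negative-radical root of [(ns² − 4km² − 4ks²)(n − l + 1) + ns²k]q² + 2ns²(k − (n − l + 1))q + ns²(n − l + 1 + k) = 0, a = {(L − nm²)/((1 − q)²/k + (1 + q)²/(n − l + 1) − 4q²/n)}^{1/2}, and b = (2aq + K)/n. This NumPy sketch (ours) rebuilds the optimal spectrum for sample data and confirms the constraints and the optimal ratio:

```python
import numpy as np

# Replay (ours) of the Theorem 10 construction for n = 6, k = 2, l = 5,
# mean m = 10 and variance s^2 = 1, so K = 60 and L = 606.
n, k, l = 6, 2, 5
m, s2 = 10.0, 1.0
K, L = n*m, n*(m**2 + s2)

# quadratic D q^2 + B q + C = 0 for the zero of h(q)
D = (n*s2 - 4*k*m**2 - 4*k*s2)*(n - l + 1) + n*s2*k
B = 2*n*s2*(k - (n - l + 1))
C = n*s2*(n - l + 1 + k)
q = (-B/2 - np.sqrt((B/2)**2 - D*C)) / D      # negative radical; D < 0, q > 0

a = np.sqrt((L - n*m**2) /
            ((1-q)**2/k + (1+q)**2/(n - l + 1) - 4*q**2/n))
b = (2*a*q + K) / n

lam = np.concatenate([np.full(k, a*(1-q)/k + b),
                      np.full(l - 1 - k, b),
                      np.full(n - l + 1, b - a*(1+q)/(n - l + 1))])

assert q > 0
assert np.isclose(lam.sum(), K) and np.isclose((lam**2).sum(), L)
assert np.isclose((lam[k-1] - lam[l-1]) / (lam[k-1] + lam[l-1]), q)
```

For these data q is about 0.123, and the constructed spectrum satisfies both trace constraints with equality, as the proof requires.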


Let φ = λ_k/λ_l. Then

(λ_k − λ_l)/(λ_k + λ_l) = (φ − 1)/(φ + 1)   and   d/dφ [ (φ − 1)/(φ + 1) ] = 2/(φ + 1)² > 0.

Thus the Kantorovich ratio (58) is isotonic in φ = λ_k/λ_l. This yields an upper bound for λ_k/λ_l, see [13]: if (to guarantee c > 0) we have (l − 1)L < K², then

λ_k/λ_l ≤ [ c + k + { ((n − l + 1)/k)(c + k)(n − l + 1 − c) }^{1/2} ] / [ c + k − { (k/(n − l + 1))(c + k)(n − l + 1 − c) }^{1/2} ],

where

c = K²/L − (l − 1).

(These inequalities are also given in [7].) Note that

(φ + 1)/(φ − 1) = (λ_k + λ_l)/(λ_k − λ_l)

is reverse isotonic (antitonic) in φ. Thus we can also derive a lower bound for this ratio.
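A concrete instance of this upper bound (our check, assuming the form above with c = K²/L − (l − 1)): for the spectrum (3, 2, 1) with k = 1 and l = 3, we have K = 6, L = 14, and (l − 1)L = 28 < 36 = K²:

```python
import numpy as np

# Concrete instance (ours) of the upper bound on lambda_k/lambda_l with
# c = K^2/L - (l - 1), for the spectrum (3, 2, 1), k = 1, l = 3.
lam = np.array([3.0, 2.0, 1.0])
n, k, l = 3, 1, 3
K, L = lam.sum(), (lam**2).sum()       # K = 6, L = 14; (l-1)L = 28 < 36 = K^2
c = K**2 / L - (l - 1)
r = (c + k) * (n - l + 1 - c)
upper = (c + k + np.sqrt((n - l + 1) / k * r)) / \
        (c + k - np.sqrt(k / (n - l + 1) * r))
assert lam[k-1] / lam[l-1] <= upper    # 3 <= upper (about 3.19)
```

For n = 2 and the spectrum (2, 1) the same formula gives exactly 2, i.e., the bound is attained.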

    5 Conclusion

We have used optimization techniques to derive bounds for functions of the eigenvalues of an n × n matrix A with real eigenvalues. By varying both the function to be minimized (maximized) and the constraints of a properly formulated program, we have been able to derive bounds for the k-th largest eigenvalue, as well as for sums, differences, and ratios of eigenvalues. Additional information about the eigenvalues was introduced to improve the bounds using the shadow prices of the program. Many more variations remain to be tried.

The results obtained are actually about ordered sets of real numbers λ_1 ≥ ⋯ ≥ λ_n and do not depend on the fact that these numbers are the eigenvalues of a matrix. We can use this to extend the bounds to complex eigenvalues. The constraints on the traces can be replaced by

Σ_i v_i = trace T,   Σ_i (v_i)² ≤ trace T*T,

where the v_i can be the real parts, the imaginary parts, or the moduli of the eigenvalues λ_i, and the matrix T is then (A + A*)/2, (A − A*)/(2i), or A, respectively. Further improvements can be made by using improvements of the Schur inequality Σ_i (v_i)² ≤ trace T*T. This approach is presented in [12] and [13].
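The real-part instance of these replacement constraints is easy to check numerically (our sketch): with T = (A + A*)/2, the sum of the Re λ_i equals trace T exactly, and the sum of their squares is bounded by trace T*T (a Schur-type inequality, via the Schur triangularization of A):

```python
import numpy as np

# Check (ours) of the real-part instance of the replacement constraints:
# with T = (A + A*)/2, sum Re(lambda_i) = trace T, and the Schur-type
# inequality sum (Re lambda_i)^2 <= trace T*T.
rng = np.random.default_rng(2)
for _ in range(100):
    A = rng.normal(size=(6, 6)) + 1j * rng.normal(size=(6, 6))
    T = (A + A.conj().T) / 2                 # Hermitian part of A
    v = np.linalg.eigvals(A).real            # v_i = Re(lambda_i)
    assert np.isclose(v.sum(), np.trace(T).real)
    assert (v**2).sum() <= np.trace(T.conj().T @ T).real + 1e-9
```

The imaginary-part case with T = (A − A*)/(2i) works the same way, and the modulus case with T = A is Schur's inequality itself.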

    References

1. A. Clausing. Kantorovich-type inequalities. Amer. Math. Monthly, 89(5):314, 327–330, 1982.

2. W. Dinkelbach. On nonlinear fractional programming. Management Sci., 13:492–498, 1967.

3. L.V. Kantorovich and G.P. Akilov. Functional Analysis. Pergamon Press, Oxford, second edition, 1982. Translated from the Russian by Howard L. Silcock.

4. R. Kumar. Bounds for eigenvalues. Master's thesis, University of Alberta, 1984.

5. D.G. Luenberger. Optimization by Vector Space Methods. John Wiley, 1969.

6. D.G. Luenberger. Algorithmic analysis in constrained optimization. In Nonlinear Programming (Proc. Sympos., New York, 1975), pages 39–51. SIAM-AMS Proc., Vol. IX, Amer. Math. Soc., Providence, RI, 1976.

7. J. Merikoski, G.P.H. Styan, and H. Wolkowicz. Bounds for ratios of eigenvalues using traces. Linear Algebra Appl., 55:105–124, 1983.

8. B.H. Pourciau. Modern multiplier rules. American Mathematical Monthly, 87(6):433–451, 1980.

9. R.T. Rockafellar. Convex Analysis. Princeton Landmarks in Mathematics. Princeton University Press, Princeton, NJ, 1997. Reprint of the 1970 original.

10. S. Schaible. Fractional programming: state of the art. In Operational Research '81 (Hamburg, 1981), pages 479–493. North-Holland, Amsterdam, 1981.

11. S. Schaible. Fractional programming. In Handbook of Global Optimization, volume 2 of Nonconvex Optim. Appl., pages 495–608. Kluwer Acad. Publ., Dordrecht, 1995.

12. H. Wolkowicz and G.P.H. Styan. Bounds for eigenvalues using traces. Linear Algebra Appl., 29:471–506, 1980.

13. H. Wolkowicz and G.P.H. Styan. More bounds for eigenvalues using traces. Linear Algebra Appl., 31:1–17, 1980.

Acknowledgements Research supported by The Natural Sciences and Engineering Research Council of Canada. The author is grateful to Wai Lung Yeung, University of Waterloo, for his help with this paper. His MATLAB file that verifies many of the inequalities in the paper is available at URL: orion.uwaterloo.ca/hwolkowi/henry/reports/ABSTRACTS.html