
Transcript of Lecture 8 - Pitt

Page 1: Lecture 8 - Pitt

Lecture 8

Econ 2001

2015 August 19

Page 2: Lecture 8 - Pitt

Lecture 8 Outline

1 Eigenvectors and Eigenvalues
2 Diagonalization
3 Quadratic Forms
4 Definiteness of Quadratic Forms
5 Unique Representation of Vectors

Page 3: Lecture 8 - Pitt

Eigenvectors and Eigenvalues

Definition
An eigenvalue of the square matrix A is a number λ such that A − λI is singular. If λ is an eigenvalue of A, then any x ≠ 0 such that (A − λI)x = 0 is called an eigenvector of A associated with the eigenvalue λ.

Therefore:

1 Eigenvalues solve the equation det(A − λI) = 0.

2 Eigenvectors are non-trivial solutions to the equation Ax = λx.

Why do we care about eigenvalues and their corresponding eigenvectors?

They enable one to relate complicated matrices to simple ones.
They play a role in the study of stability of difference and differential equations.
They make certain computations easy.
They let us define a way for matrices to be positive or negative, and that matters for calculus and optimization.
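As a quick numerical illustration (not part of the original slides), eigenvalues and eigenvectors can be computed with numpy; the 2 × 2 matrix below is an arbitrary example chosen only to check the defining property Ax = λx.

```python
import numpy as np

# Arbitrary illustrative matrix (not from the slides).
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# np.linalg.eig returns the eigenvalues and a matrix whose columns
# are the corresponding eigenvectors.
eigenvalues, eigenvectors = np.linalg.eig(A)

# Check the defining property Ax = λx for each eigenpair.
for lam, x in zip(eigenvalues, eigenvectors.T):
    assert np.allclose(A @ x, lam * x)

print(eigenvalues)   # the two eigenvalues, here 3 and 1 (in some order)
```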

Page 4: Lecture 8 - Pitt

Characteristic Equation

Definition
If A is an n × n matrix, the characteristic equation is defined as f(λ) = det(A − λI) = 0, or

$$
f(\lambda) = \det\begin{pmatrix}
\alpha_{11}-\lambda & \alpha_{12} & \cdots & \alpha_{1n} \\
\alpha_{21} & \alpha_{22}-\lambda & \cdots & \alpha_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
\alpha_{n1} & \alpha_{n2} & \cdots & \alpha_{nn}-\lambda
\end{pmatrix} = 0
$$

This is a polynomial equation in λ.

Example
For a two-by-two matrix:

$$
A_{2\times 2} - \lambda I_{2\times 2} = \begin{pmatrix} a_{11}-\lambda & a_{12} \\ a_{21} & a_{22}-\lambda \end{pmatrix}
\;\Rightarrow\; \det(A - \lambda I) = (a_{11}-\lambda)(a_{22}-\lambda) - a_{12}a_{21}
$$

Hence the characteristic equation is

λ² − (a11 + a22)λ + a11a22 − a12a21 = 0,

which typically has two solutions.
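As a sanity check (my addition; the matrix is an arbitrary example), the roots of this quadratic can be compared with the eigenvalues numpy computes directly:

```python
import numpy as np

A = np.array([[4.0, 2.0],
              [1.0, 3.0]])   # arbitrary illustrative 2x2 matrix

# Coefficients of λ^2 - (a11 + a22) λ + (a11 a22 - a12 a21).
coeffs = [1.0, -(A[0, 0] + A[1, 1]), A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]]

roots = np.roots(coeffs)          # roots of the characteristic equation
eigvals = np.linalg.eigvals(A)    # eigenvalues computed directly

print(np.sort(roots), np.sort(eigvals))   # both give 2 and 5
```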

Page 5: Lecture 8 - Pitt

Characteristic Polynomial

The characteristic equation f(λ) = det(A − λI) = 0 is a polynomial of degree n.
By the Fundamental Theorem of Algebra, it has n roots (not necessarily distinct and not necessarily real).

That is,

f(λ) = (λ − c1)(λ − c2) · · · (λ − cn)

where c1, . . . , cn ∈ C (the set of complex numbers) and the ci’s are not necessarily distinct.

Notice that f(λ) = 0 if and only if λ ∈ {c1, . . . , cn}, so the roots are all the solutions of the equation f(λ) = 0.
If λ = ci ∈ R, there is a corresponding eigenvector in Rn.
If λ = ci ∉ R, the corresponding eigenvectors are in Cn \ Rn.

Another way to write the characteristic polynomial is

P(λ) = (λ − r1)^m1 · · · (λ − rk)^mk,

where r1, r2, . . . , rk are distinct roots (ri ≠ rj when i ≠ j) and the mi are positive integers summing to n.

mi is called the multiplicity of root ri .
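For a concrete check (my addition, with an illustrative matrix that has a repeated eigenvalue), numpy's np.poly returns the coefficients of the characteristic polynomial of a square matrix, and np.roots recovers the roots with their multiplicities:

```python
import numpy as np

# Illustrative matrix whose characteristic polynomial is (λ - 2)^2 (λ - 5).
A = np.array([[2.0, 0.0, 0.0],
              [1.0, 2.0, 0.0],
              [0.0, 0.0, 5.0]])

coeffs = np.poly(A)        # coefficients of the characteristic polynomial
print(coeffs)              # [1., -9., 24., -20.]
print(np.roots(coeffs))    # roots 5, 2, 2: the root 2 has multiplicity 2
```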

Page 6: Lecture 8 - Pitt

Distinct Eigenvectors

FACT
The eigenvectors corresponding to distinct eigenvalues are linearly independent.

Proof by contradiction.
Let λ1, . . . , λk be distinct eigenvalues and x1, . . . , xk the associated eigenvectors. Suppose these vectors are linearly dependent (why is this a contradiction?).

WLOG, let the first k − 1 vectors be linearly independent, while xk is a linear combination of the others.

Thus, ∃ αi, i = 1, . . . , k − 1, not all zero, such that

$$\sum_{i=1}^{k-1} \alpha_i x_i = x_k$$

Multiply both sides by A and use the eigenvalue property (Ax = λx):

$$\sum_{i=1}^{k-1} \alpha_i \lambda_i x_i = \lambda_k x_k$$

Multiply the first equation by λk and subtract it from the second:

$$\sum_{i=1}^{k-1} \alpha_i (\lambda_i - \lambda_k) x_i = 0$$

Since the eigenvalues are distinct, λi ≠ λk for all i; hence, we have a non-trivial linear combination of the first k − 1 eigenvectors equal to 0, contradicting their linear independence.
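A quick numerical illustration of this fact (my addition, with an arbitrary triangular matrix so the eigenvalues are easy to read off): when the eigenvalues are distinct, the matrix whose columns are the eigenvectors has full rank.

```python
import numpy as np

# Triangular matrix, so its eigenvalues 1, 3, 6 are the diagonal entries
# and are distinct (illustrative example only).
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 3.0, 1.0],
              [0.0, 0.0, 6.0]])

eigenvalues, P = np.linalg.eig(A)
print(eigenvalues)                   # 1, 3, 6 (in some order)
print(np.linalg.matrix_rank(P))      # 3: the eigenvectors are linearly independent
```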

Page 7: Lecture 8 - Pitt

Diagonalization

Definition
We say B is diagonalizable if we can find matrices P and D, with D diagonal, such that

P−1BP = D

If a matrix is diagonalizable

PP−1BPP−1 = PDP−1

or

B = PDP−1

where D is a diagonal matrix.

Why do we care about this?

We can use simple (i.e. diagonal) matrices to ‘represent’ more complicated ones.
This property is handy in many applications.
An example follows: linear difference equations.
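A minimal numerical sketch of diagonalization (my addition; the matrix is an arbitrary example with distinct eigenvalues, so the theorem below applies):

```python
import numpy as np

# Arbitrary diagonalizable matrix (its two eigenvalues, 5 and 2, are distinct).
B = np.array([[4.0, 1.0],
              [2.0, 3.0]])

eigenvalues, P = np.linalg.eig(B)   # columns of P are eigenvectors of B
D = np.diag(eigenvalues)            # diagonal matrix of eigenvalues

# Verify B = P D P^{-1}, or equivalently P^{-1} B P = D.
assert np.allclose(B, P @ D @ np.linalg.inv(P))
assert np.allclose(np.linalg.inv(P) @ B @ P, D)
```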

Page 8: Lecture 8 - Pitt

Difference Equations Detour

A difference equation is an equation in which “discrete time” is one of the independent variables.

For example, the value of x today depends linearly on its value yesterday:

xt = a xt−1 ∀t = 1, 2, 3, . . .

This is a fairly common relationship in time series data and macro.

Given some initial condition x0, this equation is fairly easy to solve using ‘recursion’:

$$
\begin{aligned}
x_1 &= a x_0 \\
x_2 &= a x_1 = a^2 x_0 \\
&\;\;\vdots \\
x_{t-1} &= a x_{t-2} = a^{t-1} x_0 \\
x_t &= a x_{t-1} = a^t x_0
\end{aligned}
$$

Hence:

xt = a^t x0 ∀t = 0, 1, 2, 3, . . .
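A tiny sketch of the recursion (my addition; the values of a, x0, and T are arbitrary):

```python
# Iterate x_t = a * x_{t-1} and compare with the closed form x_t = a^t * x_0.
a, x0, T = 0.9, 2.0, 10

x = x0
for t in range(1, T + 1):
    x = a * x              # one step of the recursion

print(x, a**T * x0)        # the two numbers coincide
```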

Page 9: Lecture 8 - Pitt

Difference Equations Detour Continued

Consider now a two-dimensional linear difference equation:

$$
\begin{pmatrix} c_{t+1} \\ k_{t+1} \end{pmatrix} =
\begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix}
\begin{pmatrix} c_t \\ k_t \end{pmatrix}
\quad \forall t = 0, 1, 2, 3, \ldots
$$

given some initial condition c0, k0.

Set

$$
y_t = \begin{pmatrix} c_t \\ k_t \end{pmatrix} \;\forall t
\quad\text{and}\quad
B = \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix}
$$

and rewrite this more compactly as

yt+1 = Byt ∀t = 0, 1, 2, 3, . . .

where bij ∈ R for each i, j.
We want to find a solution yt, t = 1, 2, 3, . . . given the initial condition y0.

Such a dynamical system can arise as a characterization of the solution to a standard optimal growth problem (you will see this in macro).

This is hard to solve since the two variables interact with each other as time goes on.

Things would be much easier if there were no interactions (b12 = 0 = b21), because in that case the two equations would evolve independently.

Page 10: Lecture 8 - Pitt

Difference Equations Detour: The End

We want to solve

yt+1 = Byt ∀t = 0, 1, 2, 3, . . .

If B is diagonalizable, there exist an invertible 2 × 2 matrix P and a diagonal 2 × 2 matrix D such that

$$
P^{-1} B P = D = \begin{pmatrix} d_1 & 0 \\ 0 & d_2 \end{pmatrix}
$$

Then

$$
\begin{aligned}
y_{t+1} = B y_t \;\forall t
\;&\Longleftrightarrow\; P^{-1} y_{t+1} = P^{-1} B y_t \;\forall t \\
&\Longleftrightarrow\; P^{-1} y_{t+1} = P^{-1} B P P^{-1} y_t \;\forall t \\
&\Longleftrightarrow\; \hat{y}_{t+1} = D \hat{y}_t \;\forall t
\end{aligned}
$$

where we defined ŷt = P−1yt ∀t (this is just a change of variable).
Since D is diagonal, after this change of variable to ŷt we now have to solve two independent linear univariate difference equations

ŷit = d_i^t ŷi0 ∀t for i = 1, 2

which is easy because we can use recursion.
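A sketch of this solution method in numpy (my addition; the matrix B, the initial condition, and the horizon are arbitrary illustrative values):

```python
import numpy as np

# Solve y_{t+1} = B y_t by diagonalizing B (illustrative values).
B = np.array([[0.9, 0.1],
              [0.2, 0.7]])
y0 = np.array([1.0, 2.0])
T = 25

eigenvalues, P = np.linalg.eig(B)     # B = P D P^{-1}
z0 = np.linalg.inv(P) @ y0            # change of variable: z_t = P^{-1} y_t

# z_{t+1} = D z_t decouples, so z_T = d_i^T z_{i0} componentwise.
zT = (eigenvalues ** T) * z0
yT = P @ zT                           # change back: y_T = P z_T

# Compare with brute-force iteration of y_{t+1} = B y_t.
y = y0.copy()
for _ in range(T):
    y = B @ y
assert np.allclose(y, yT)
```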

Page 11: Lecture 8 - Pitt

Diagonalization Theorem

Theorem
If A is an n × n matrix that either has n distinct eigenvalues or is symmetric, then there exists an invertible n × n matrix P and a diagonal matrix D such that

A = PDP−1

Moreover, the diagonal entries of D are the eigenvalues of A, and the columns of P are the corresponding eigenvectors.

Note
Premultiplying by P−1 and postmultiplying by P, the theorem says:

P−1AP = D

Definition
Two square matrices A and B are similar if A = P−1BP for some invertible matrix P.

The theorem says that some square matrices are similar to diagonal matrices that have eigenvalues on the diagonal.

Page 12: Lecture 8 - Pitt

Diagonalization Theorem: Idea of Proof

We want to show that for a given A there exist a matrix P and a diagonal matrix D such that A = PDP−1, where the diagonal entries of D are the eigenvalues of A and the columns of P are the corresponding eigenvectors.

Idea of Proof (a real proof is way too difficult for me)

Suppose λ is an eigenvalue of A and x is an eigenvector. Thus Ax = λx.
If P is a matrix with column j equal to the eigenvector associated with λj, it follows that AP = PD.
The result would then follow if one could guarantee that P is invertible.
The proof works by showing that when A is symmetric, A has only real eigenvalues and one can find n linearly independent eigenvectors even if the eigenvalues are not distinct (these results use properties of complex numbers).

See a book for details.

Page 13: Lecture 8 - Pitt

A Few Computational Facts For You To Prove

Facts

detAB = detBA = detA detB

If D is a diagonal matrix, then detD is equal to the product of its diagonal elements.

detA is equal to the product of the eigenvalues of A.

Definition
The trace of a square matrix A is given by the sum of its diagonal elements. That is,

$$\operatorname{tr}(A) = \sum_{i=1}^{n} a_{ii}$$

Fact

$$\operatorname{tr}(A) = \sum_{i=1}^{n} \lambda_i,$$

where λi is the ith eigenvalue of A (eigenvalues counted with multiplicity).
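A quick numerical check of the last two facts (my addition; the matrix is an arbitrary example):

```python
import numpy as np

# det(A) equals the product of the eigenvalues and tr(A) equals their sum,
# counted with multiplicity (illustrative matrix).
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])

eigenvalues = np.linalg.eigvals(A)
assert np.isclose(np.linalg.det(A), np.prod(eigenvalues))
assert np.isclose(np.trace(A), np.sum(eigenvalues))
```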

Page 14: Lecture 8 - Pitt

Unitary Matrices

Remember
At is the transpose of A: the (i, j)th entry of At is the (j, i)th entry of A.

Definition
An n × n matrix A is unitary if At = A−1.

REMARK
By definition every unitary matrix is invertible.

Page 15: Lecture 8 - Pitt

Unitary Matrices

Notation
A basis V = {v1, . . . , vn} of Rn is orthonormal if

1 each basis element has unit length (vi · vi = 1 ∀i), and
2 distinct basis elements are orthogonal (vi · vj = 0 for i ≠ j).

Compactly, this can be written as

$$
v_i \cdot v_j = \delta_{ij} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases}
$$

Theorem
An n × n matrix A is unitary if and only if the columns of A are orthonormal.

Proof.
Let vj denote the jth column of A.

At = A−1 ⇐⇒ AtA = I ⇐⇒ vi · vj = δij ⇐⇒ {v1, . . . , vn} is orthonormal
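A quick check in numpy (my addition): a rotation matrix is a simple example of a unitary matrix in the sense of this slide, since its columns are orthonormal.

```python
import numpy as np

theta = 0.3   # arbitrary angle
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

assert np.allclose(A.T @ A, np.eye(2))        # columns are orthonormal
assert np.allclose(A.T, np.linalg.inv(A))     # A^t = A^{-1}
```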

Page 16: Lecture 8 - Pitt

Symmetric Matrices Have Orthonormal Eigenvectors

Remember
A is symmetric if aij = aji for all i, j, where aij is the (i, j)th entry of A.

Theorem
If A is symmetric, then the eigenvalues of A are all real and there is an orthonormal basis V = {v1, . . . , vn} of Rn consisting of eigenvectors of A. In this case, P in the diagonalization theorem is unitary and therefore:

A = PDPt

The proof is also beyond my ability (uses the linear algebra of complex vector spaces).
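Numerically (my addition; the symmetric matrix is an arbitrary example), numpy's routine for symmetric matrices returns exactly this decomposition:

```python
import numpy as np

# For a symmetric matrix, np.linalg.eigh returns real eigenvalues and
# orthonormal eigenvectors (the columns of P), so that A = P D P^t.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigenvalues, P = np.linalg.eigh(A)
D = np.diag(eigenvalues)

assert np.allclose(P.T @ P, np.eye(2))   # P is unitary: orthonormal columns
assert np.allclose(A, P @ D @ P.T)       # A = P D P^t
```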

Page 17: Lecture 8 - Pitt

Quadratic Forms

Think of a second-degree polynomial that has no constant term:

$$
f(x_1, \ldots, x_n) = \sum_{i=1}^{n} \alpha_{ii} x_i^2 + \sum_{i<j} \beta_{ij} x_i x_j
$$

Let

$$
\alpha_{ij} = \begin{cases} \beta_{ij}/2 & \text{if } i < j \\ \beta_{ji}/2 & \text{if } i > j \end{cases}
\quad\text{and}\quad
A = \begin{pmatrix} \alpha_{11} & \cdots & \alpha_{1n} \\ \vdots & \ddots & \vdots \\ \alpha_{n1} & \cdots & \alpha_{nn} \end{pmatrix}
$$

Then,

f(x) = xtAx

This inspires the idea of a quadratic form.
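A small sketch of this construction (my addition; the polynomial 3x1² + 4x1x2 + 5x2² is an arbitrary example): the cross-term coefficient is split in half across the two off-diagonal entries of A.

```python
import numpy as np

# Symmetric matrix representing 3 x1^2 + 4 x1 x2 + 5 x2^2:
# diagonal entries are the squared-term coefficients, off-diagonal
# entries are half the cross coefficient.
A = np.array([[3.0, 2.0],
              [2.0, 5.0]])

def Q(x):
    """The quadratic form x^t A x."""
    return x @ A @ x

x = np.array([1.0, -2.0])
print(Q(x), 3 * x[0]**2 + 4 * x[0] * x[1] + 5 * x[1]**2)   # both equal 15.0
```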

Page 18: Lecture 8 - Pitt

Quadratic Forms

Definition
A quadratic form in n variables is a function Q : Rn −→ R that can be written as

Q(x) = xtAx

where A is a symmetric n × n matrix.

Why do we care about quadratic forms? They show up in many places.

For example, think about a function from Rn to R.
We will see that the first derivative of this function is a vector (we take the derivative one component at a time).
Thus, the second derivative is a matrix (we take the derivative of the first derivative one component at a time).
This matrix can be thought of as a quadratic form.
Thus, a function’s shape could be related to the “sign” of a quadratic form.

What is the sign of a quadratic form anyhow?

Page 19: Lecture 8 - Pitt

Examples of Quadratic Forms

Example
When n = 1, a quadratic form is a function of the form ax².

Example
When n = 2, it is a function of the form

a11x1² + 2a12x1x2 + a22x2²

(remember a12 = a21 by symmetry).

Example
When n = 3, it is a function of the form

a11x1² + a22x2² + a33x3² + 2a12x1x2 + 2a13x1x3 + 2a23x2x3

Page 20: Lecture 8 - Pitt

Sign of A Quadratic Form

Definition
A quadratic form Q(x) = xtAx is

1 positive definite if Q(x) > 0 for all x ≠ 0.
2 positive semi definite if Q(x) ≥ 0 for all x.
3 negative definite if Q(x) < 0 for all x ≠ 0.
4 negative semi definite if Q(x) ≤ 0 for all x.
5 indefinite if there exist x and y such that Q(x) > 0 > Q(y).

In most cases, quadratic forms are indefinite.

What does all this mean? Hard to tell, but we can try to look at special cases to get some intuition.

Page 21: Lecture 8 - Pitt

Positive and Negative Definiteness

Idea
Think of positive and negative definiteness as a way one applies to matrices the idea of “positive” and “negative”.

In the one-variable case, Q(x) = ax² and definiteness follows the sign of a.

Obviously, there are lots of indefinite matrices when n > 1.

Diagonal matrices also help with intuition. When A is diagonal:

$$
Q(x) = x^t A x = \sum_{i=1}^{n} a_{ii} x_i^2.
$$

Therefore the quadratic form is:

positive definite if and only if aii > 0 for all i, and positive semi definite if and only if aii ≥ 0 for all i;
negative definite if and only if aii < 0 for all i, and negative semi definite if and only if aii ≤ 0 for all i; and
indefinite if A has both negative and positive diagonal entries.

Page 22: Lecture 8 - Pitt

Quadratic Forms and Diagonalization

For symmetric matrices, definiteness relates to the diagonalization theorem.

Assume A is symmetric.
By the diagonalization theorem:

A = RtDR,

where D is a diagonal matrix with (real) eigenvalues on the diagonal and R is an orthogonal matrix.

For any quadratic form Q(x) = xtAx, by definition, A is symmetric. Then we have

Q(x) = xtAx = xtRtDRx = (Rx)t D (Rx).

The definiteness of A is thus equivalent to the definiteness of its diagonal matrix of eigenvalues, D.

Think about why.

Page 23: Lecture 8 - Pitt

Quadratic Forms and Diagonalization: Analysis

A quadratic form is a function Q(x) = xtAx where A is symmetric.

Since A is symmetric, its eigenvalues λ1, . . . , λn are all real.
Let V = {v1, . . . , vn} be an orthonormal basis of eigenvectors of A with corresponding eigenvalues λ1, . . . , λn. By an earlier theorem, the P in the diagonalization theorem is unitary.

Then A = UtDU, where

$$
D = \begin{pmatrix}
\lambda_1 & 0 & \cdots & 0 \\
0 & \lambda_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \lambda_n
\end{pmatrix}
$$

and U is unitary.

We know that any x ∈ Rn can be written as x = ∑_{i=1}^n γi vi.
Then, one can rewrite a quadratic form as follows:

$$
\begin{aligned}
Q(x) = Q\Big(\sum \gamma_i v_i\Big)
&= \Big(\sum \gamma_i v_i\Big)^t A \Big(\sum \gamma_i v_i\Big)
 = \Big(\sum \gamma_i v_i\Big)^t U^t D U \Big(\sum \gamma_i v_i\Big) \\
&= \Big(U \sum \gamma_i v_i\Big)^t D \Big(U \sum \gamma_i v_i\Big)
 = \Big(\sum \gamma_i U v_i\Big)^t D \Big(\sum \gamma_i U v_i\Big) \\
&= (\gamma_1, \ldots, \gamma_n)\, D \begin{pmatrix} \gamma_1 \\ \vdots \\ \gamma_n \end{pmatrix}
 = \sum \lambda_i \gamma_i^2
\end{aligned}
$$

(The last line uses the orthonormality of the vi: U vi is the ith standard basis vector, so ∑ γi U vi = (γ1, . . . , γn)t.)

Page 24: Lecture 8 - Pitt

Quadratic Forms and Diagonalization

The algebra on the previous slide yields the following result.

Theorem
The quadratic form Q(x) = xtAx is

1 positive definite if λi > 0 for all i.
2 positive semi definite if λi ≥ 0 for all i.
3 negative definite if λi < 0 for all i.
4 negative semi definite if λi ≤ 0 for all i.
5 indefinite if there exist j and k such that λj > 0 > λk.

REMARK
We can check definiteness of a quadratic form using the eigenvalues of A.
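A minimal sketch of this eigenvalue test in numpy (my addition; the classification function and the example matrices are illustrative, and the tolerance is a numerical convenience):

```python
import numpy as np

def definiteness(A, tol=1e-12):
    """Classify a symmetric matrix by the signs of its eigenvalues."""
    eigenvalues = np.linalg.eigvalsh(A)   # real eigenvalues of a symmetric matrix
    if np.all(eigenvalues > tol):
        return "positive definite"
    if np.all(eigenvalues >= -tol):
        return "positive semi definite"
    if np.all(eigenvalues < -tol):
        return "negative definite"
    if np.all(eigenvalues <= tol):
        return "negative semi definite"
    return "indefinite"

print(definiteness(np.array([[2.0, 1.0], [1.0, 2.0]])))    # positive definite
print(definiteness(np.array([[1.0, 0.0], [0.0, -3.0]])))   # indefinite
```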

Page 25: Lecture 8 - Pitt

Principal Minors

Definition
A principal submatrix of a square matrix A is the matrix obtained by deleting any k rows and the corresponding k columns.

Definition
The determinant of a principal submatrix is called a principal minor of A.

Definition
The leading principal submatrix of order k of an n × n matrix is obtained by deleting the last n − k rows and columns of the matrix.

Definition
The determinant of a leading principal submatrix is called a leading principal minor of A.

Principal minors can be used in definiteness tests.

Page 26: Lecture 8 - Pitt

Another Definiteness Test

Theorem
A matrix is

1 positive definite if and only if all its leading principal minors are positive.
2 negative definite if and only if its odd principal minors are negative and its even principal minors are positive.
3 indefinite if one of its kth order leading principal minors is negative for an even k, or if there are two odd leading principal minors that have different signs.

This classifies definiteness of quadratic forms without finding the eigenvalues of the corresponding matrices.

Think about these conditions when applied to diagonal matrices and see if they make sense in that case.
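A short sketch of the leading-principal-minor test (my addition; the helper function and the example matrices are illustrative):

```python
import numpy as np

def leading_principal_minors(A):
    """Determinants of the upper-left k x k submatrices, k = 1, ..., n."""
    n = A.shape[0]
    return [np.linalg.det(A[:k, :k]) for k in range(1, n + 1)]

# Positive definite example: all leading principal minors are positive.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
print(leading_principal_minors(A))     # approximately [2.0, 3.0]

# Negative definite example: minors alternate in sign, starting negative.
print(leading_principal_minors(-A))    # approximately [-2.0, 3.0]
```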

Page 27: Lecture 8 - Pitt

Back to Linear Algebra Definitions

All vectors below are elements of X (a vector space) and all scalars are real numbers.

The linear combination of x1, . . . , xn with coefficients α1, . . . , αn:

$$
y = \sum_{i=1}^{n} \alpha_i x_i
$$

The set of all linear combinations of elements of V = {v1, . . . , vk}

$$
\operatorname{span} V = \Big\{ x \in X : x = \sum_{i=1}^{k} \lambda_i v_i \text{ with } v_i \in V \Big\}
$$

A set V ⊂ X spans X if span V = X.
A set V ⊂ X is linearly dependent if

∃ v1, . . . , vn ∈ V and α1, . . . , αn not all zero such that

$$
\sum_{i=1}^{n} \alpha_i v_i = 0
$$

A set V ⊂ X is linearly independent if it is not linearly dependent.
Thus, V ⊂ X is linearly independent if and only if:

$$
\sum_{i=1}^{n} \alpha_i v_i = 0 \text{ with each } v_i \in V \;\Rightarrow\; \alpha_i = 0 \;\forall i.
$$

A basis of X is a linearly independent set of vectors in X that spans X .

Page 28: Lecture 8 - Pitt

Vectors and Basis

Any vector can be uniquely written as a finite linear combination of the elements of some basis of the vector space to which it belongs.

Theorem
Let V be a basis for a vector space X over R. Every vector x ∈ X has a unique representation as a linear combination of a finite number of elements of V (with all coefficients nonzero).

Haven’t we proved this yet? Not at this level of generality.

The unique representation of 0 is 0 = ∑_{i∈∅} αi vi.

Page 29: Lecture 8 - Pitt

Any vector has a unique representation as a linear combination of finitely many elements of a basis.

Proof.
Since V spans X, any x ∈ X can be written as a linear combination of elements of V. We need to show this linear combination is unique.

Let

$$
x = \sum_{s \in S_1} \alpha_s v_s \quad\text{and}\quad x = \sum_{s \in S_2} \beta_s v_s
$$

where S1 is finite, αs ∈ R, αs ≠ 0, and vs ∈ V for each s ∈ S1, and where S2 is finite, βs ∈ R, βs ≠ 0, and vs ∈ V for each s ∈ S2.
Define S = S1 ∪ S2, and set

αs = 0 for s ∈ S2 \ S1 and βs = 0 for s ∈ S1 \ S2.

Then

$$
0 = x - x = \sum_{s \in S_1} \alpha_s v_s - \sum_{s \in S_2} \beta_s v_s
          = \sum_{s \in S} \alpha_s v_s - \sum_{s \in S} \beta_s v_s
          = \sum_{s \in S} (\alpha_s - \beta_s) v_s
$$

Since V is linearly independent, we must have αs − βs = 0, so αs = βs, for all s ∈ S.
Moreover, s ∈ S1 ⇔ αs ≠ 0 ⇔ βs ≠ 0 ⇔ s ∈ S2.
So S1 = S2 and αs = βs for s ∈ S1 = S2, and the representation is unique.

Page 30: Lecture 8 - Pitt

A Basis Always Exists

Theorem
Every vector space has a (Hamel) basis.

This follows from the axiom of choice (did we talk about this?).
An equivalent result says that if a linearly independent set is not a basis, one can always “add” to it to get a basis.

Theorem
If X is a vector space and V ⊂ X is a linearly independent set, then V can be extended to a basis for X. That is, there exists a linearly independent set W ⊂ X such that

V ⊂ W ⊂ span W = X

There can be many bases for the same vector space, but they all have the same number of elements.

Theorem
Any two Hamel bases of a vector space X have the same cardinality (are numerically equivalent).

Page 31: Lecture 8 - Pitt

Standard Basis

Definition
The standard basis for Rn consists of the set of n vectors ei, i = 1, . . . , n, where ei is the vector with component 1 in the ith position and zero in all other positions.

$$
e_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \quad
e_2 = \begin{pmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \quad \ldots, \quad
e_{n-1} = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 1 \\ 0 \end{pmatrix}, \quad
e_n = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix}
$$

1 A standard basis is a linearly independent set that spans Rn.

2 Elements of the standard basis are mutually orthogonal. When this happens, we say that the basis is orthogonal.

3 Each basis element has unit length. When this also happens, we say that the basis is orthonormal.

Verify all these.

Page 32: Lecture 8 - Pitt

Orthonormal Bases

We know an orthonormal basis exists for Rn (the standard basis).

Fact
One can always find an orthonormal basis for a vector space.

Fact
If {v1, . . . , vk} is an orthonormal basis for V, then for all x ∈ V,

$$
x = \sum_{i=1}^{k} \alpha_i v_i = \sum_{i=1}^{k} (x \cdot v_i)\, v_i
$$

This follows from the properties on the previous slide (check it).
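A numerical check of this expansion (my addition; the orthonormal basis used here is the set of eigenvectors of an arbitrary symmetric matrix, and x is an arbitrary vector):

```python
import numpy as np

# Columns of V form an orthonormal basis of R^2 (eigenvectors of a symmetric matrix).
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
_, V = np.linalg.eigh(A)

x = np.array([3.0, -1.0])

# Reconstruct x from its coordinates (x . v_i) in the orthonormal basis.
reconstruction = sum((x @ V[:, i]) * V[:, i] for i in range(V.shape[1]))
assert np.allclose(x, reconstruction)
```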

Page 33: Lecture 8 - Pitt

Dimension and Basis

Definition
The dimension of a vector space X, denoted dim X, is the cardinality of any basis of X.

Notation Reminder
For V ⊂ X, |V| denotes the cardinality of the set V.

Fact
Mm×n, the set of all m × n real-valued matrices, is a vector space over R.
A basis is given by

$$
\{E_{ij} : 1 \le i \le m,\; 1 \le j \le n\}
\quad\text{where}\quad
(E_{ij})_{k\ell} = \begin{cases} 1 & \text{if } k = i \text{ and } \ell = j \\ 0 & \text{otherwise} \end{cases}
$$

The dimension of the vector space of m × n matrices is mn.
Proving this is an exercise.

Page 34: Lecture 8 - Pitt

Dimension and Dependence

Theorem
Suppose dim X = n ∈ N. If A ⊂ X and |A| > n, then A is linearly dependent.

Proof.
If not, A is linearly independent and can be extended to a basis V of X:

A ⊂ V ⇒ |V| ≥ |A| > n,

a contradiction.

Intuitively, if A has more elements than the dimension of X, there must be some linearly dependent elements in it.

Theorem
Suppose dim X = n ∈ N, V ⊂ X, and |V| = n.

If V is linearly independent, then V spans X , so V is a basis.

If V spans X , then V is linearly independent, so V is a basis.

Prove this as part of Problem Set 8.

Page 35: Lecture 8 - Pitt

Tomorrow

We illustrate the formal connection between linear functions and matrices. Then we move to some useful geometry.

1 Linear Functions
2 Linear Functions and Matrices
3 Analytic Geometry in Rn