Linear Algebra Notes


Notes on the basics of linear algebra for graduate study.


Probability and Statistics (Bai Huang)

I. Matrix Algebra

1. Vector Spaces

1.1 Real Vectors

$\mathbb{R}$: the set of (finite) real numbers. $\mathbb{R}^m$: the m-dimensional Euclidean space (the Cartesian product of m copies of $\mathbb{R}$: $\mathbb{R} \times \mathbb{R} \times \cdots \times \mathbb{R}$).

A Cartesian product $A_1 \times A_2 \times \cdots \times A_m$: all possible ordered m-tuples whose i-th element is from $A_i$, $i = 1, \ldots, m$.

A (real) vector: a particular element of $\mathbb{R}^m$, denoted by $x = (x_1, x_2, \ldots, x_m)$.
• The $x_i$'s: the elements or components of the vector $x$.
• $m$: the order of the vector $x$.

The arithmetic operations for two vectors:

• Addition: $x + y = (x_1 + y_1, x_2 + y_2, \ldots, x_m + y_m)$
  a. $x$ and $y$ must be of the same order.
  b. [Commutativity] $x + y = y + x$
  c. [Associativity] $(x + y) + z = x + (y + z)$

• Scalar multiplication: $cx := (cx_1, cx_2, \ldots, cx_m)$, where $c$ is a constant. Two vectors $x$ and $y$ are collinear if either $x = 0$, or $y = 0$, or $y = cx$.

• Inner product: $\langle x, y \rangle = \sum_{i=1}^m x_i y_i = x'y = y'x$
  a. Properties:
     i) $\langle x, y \rangle = \langle y, x \rangle$
     ii) $\langle x, y + z \rangle = \langle x, y \rangle + \langle x, z \rangle$
     iii) $\langle x, cy \rangle = c\,\langle x, y \rangle$
     iv) $\langle x, x \rangle \ge 0$, with $\langle x, x \rangle = 0$ iff $x = 0$

  b. Norm: $\|x\| := \langle x, x \rangle^{1/2}$. It captures the geometric idea of the length of the vector $x$.

  c. A vector $x$ is said to be normalized if $\|x\| = 1$.
     Any nonzero vector $x$ can be normalized to $\tilde{x}$ by $\tilde{x} = x / \|x\|$. Since $\tilde{x}$ has unit length, we focus only on the direction.

  d. Two vectors $x$ and $y$ are orthogonal if $\langle x, y \rangle = 0$. We write $x \perp y$.

  e. If, in addition, $\|x\| = \|y\| = 1$, the two vectors are said to be orthonormal.
     Example: In $\mathbb{R}^m$, the unit vectors (or elementary vectors)
     $e_1 = (1, 0, \ldots, 0)$, $e_2 = (0, 1, 0, \ldots, 0)$, $\ldots$, $e_m = (0, \ldots, 0, 1)$
     are orthonormal.

  f. Cauchy-Schwarz inequality: $\langle x, y \rangle^2 \le \|x\|^2 \|y\|^2$, with equality iff $x$ and $y$ are collinear.
     Exercise: Prove the Cauchy-Schwarz inequality using the properties of the inner product.

  g. Triangle inequality: $\|x + y\| \le \|x\| + \|y\|$, with equality iff $x$ and $y$ are collinear.
     Exercise: Prove the triangle inequality. [Hint: use the Cauchy-Schwarz inequality.]

• The angle $\theta$ between two nonzero vectors $x$ and $y$:
  [Figure: a triangle with sides $x$, $y$, and $x - y$]
  By the cosine rule, $\|x - y\|^2 = \|x\|^2 + \|y\|^2 - 2\|x\|\|y\|\cos\theta$. After simplification, this becomes $\langle x, y \rangle = \|x\|\|y\|\cos\theta$; thus the angle between $x$ and $y$ is determined by
  $$\cos\theta = \frac{\langle x, y \rangle}{\|x\|\,\|y\|}, \quad 0 \le \theta \le \pi.$$
  a. If $x$ and $y$ are orthogonal, $\langle x, y \rangle = 0$. Thus $\cos\theta = 0$, i.e. $\theta = \pi/2$.
  b. The projection of $y$ onto $x$ is $\|y\|\cos\theta = \dfrac{\langle x, y \rangle}{\|x\|}$.
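These quantities translate directly into a few lines of NumPy. A minimal sketch (the vectors `x` and `y` are arbitrary illustrative values, not from the notes):

```python
import numpy as np

x = np.array([3.0, 0.0, 4.0])
y = np.array([1.0, 2.0, 2.0])

inner = x @ y                          # <x, y> = sum_i x_i y_i
norm_x = np.sqrt(x @ x)                # ||x|| = <x, x>^(1/2)
x_tilde = x / norm_x                   # normalized vector, unit length

# angle: cos(theta) = <x, y> / (||x|| ||y||)
cos_theta = inner / (np.linalg.norm(x) * np.linalg.norm(y))
theta = np.arccos(cos_theta)

proj_len = inner / np.linalg.norm(x)   # scalar projection of y onto x

# Cauchy-Schwarz: <x, y>^2 <= ||x||^2 ||y||^2
assert inner**2 <= (x @ x) * (y @ y)
```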

1.2 Complex Vectors

A complex number, say $u$, is denoted by $u = a + ib$, where $a$ and $b$ are real numbers and $i$ is the imaginary unit defined by $i^2 = -1$. We write $\mathrm{Re}(u) = a$ and $\mathrm{Im}(u) = b$.

If $u = a + ib$ and $v = c + id$ are two complex numbers, they are said to be equal iff $a = c$ and $b = d$.

Addition: $u + v = (a + c) + i(b + d)$

Product: $uv = (ac - bd) + i(ad + bc)$

The complex conjugate of $u = a + ib$ is defined as $\bar{u} = a - ib$.
• $\overline{(\bar{u})} = u$
• $\overline{u + v} = \bar{u} + \bar{v}$
• $\overline{uv} = \bar{u}\,\bar{v}$
• $\bar{u}v \ne u\bar{v}$ unless $u\bar{v}$ is a real number

The modulus of $u = a + ib$ is defined by $|u| = (u\bar{u})^{1/2} = \sqrt{a^2 + b^2}$.

Division: $\dfrac{u}{v} = \dfrac{u\bar{v}}{v\bar{v}} = \dfrac{u\bar{v}}{|v|^2}$

Inner product: $\langle u, v \rangle = \sum_{i=1}^m u_i \bar{v}_i$
• Norm: $\|u\| = \langle u, u \rangle^{1/2}$

$\mathbb{C}$: the set of all complex numbers
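A quick numeric sketch of these rules (the particular numbers are arbitrary; note that NumPy's `vdot` conjugates its first argument, so the argument order below is what matches the convention above):

```python
import numpy as np

u, v = 3 + 4j, 1 - 2j

assert u.conjugate() == 3 - 4j                                  # conjugate of a + ib
assert np.isclose(abs(u), np.sqrt((u * u.conjugate()).real))    # |u| = (u ubar)^(1/2)
assert np.isclose(u / v, u * v.conjugate() / abs(v)**2)         # division via the conjugate

# inner product <u, v> = sum_i u_i conj(v_i)
uu = np.array([1 + 1j, 2 - 1j])
vv = np.array([0 + 1j, 1 + 3j])
inner = np.vdot(vv, uu)                       # vdot conjugates its FIRST argument
norm_uu = np.sqrt(np.vdot(uu, uu).real)       # ||u|| = <u, u>^(1/2)
```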

1.3 Vector Spaces

A vector space $\mathcal{V}$ is a nonempty set of elements (called vectors) together with two operations and a set of axioms.

• Two operations:
  a. Addition: for any $x, y \in \mathcal{V}$, $x + y \in \mathcal{V}$.
  b. Scalar multiplication: for any $x \in \mathcal{V}$ and any real (or complex) scalar $\alpha$, $\alpha x \in \mathcal{V}$.

• Axioms:
  a. Addition
     i) $x + y = y + x$
     ii) $x + (y + z) = (x + y) + z$
     iii) $\exists$ a vector in $\mathcal{V}$ (denoted by $0$) such that $x + 0 = x$ for all $x$
     iv) $\forall x$, $\exists$ a vector in $\mathcal{V}$ (denoted by $-x$) such that $x + (-x) = 0$
  b. Scalar multiplication
     i) $\alpha(\beta x) = (\alpha\beta)x$
     ii) $1x = x$
  c. Distributive laws
     i) $\alpha(x + y) = \alpha x + \alpha y$
     ii) $(\alpha + \beta)x = \alpha x + \beta x$

• It is the scalar, rather than the vector, that determines whether the space is real or complex.

Three commonly used vector spaces:

  Complex vector space → (add inner product) → Inner product space → (add completeness) → Hilbert space

A nonempty subset $\mathcal{S}$ of a vector space $\mathcal{V}$ is called a subspace of $\mathcal{V}$ if, for all $x, y \in \mathcal{S}$, we have $x + y \in \mathcal{S}$ and $\alpha x \in \mathcal{S}$ for any scalar $\alpha$.

The intersection of two subspaces $\mathcal{S}_1$ and $\mathcal{S}_2$ in a vector space $\mathcal{V}$, denoted by $\mathcal{S}_1 \cap \mathcal{S}_2$, consists of all vectors that belong to both $\mathcal{S}_1$ and $\mathcal{S}_2$.

The union of two subspaces $\mathcal{S}_1$ and $\mathcal{S}_2$ in a vector space $\mathcal{V}$, denoted by $\mathcal{S}_1 \cup \mathcal{S}_2$, consists of all vectors that belong to at least one of $\mathcal{S}_1$ and $\mathcal{S}_2$.

The sum of two subspaces $\mathcal{S}_1$ and $\mathcal{S}_2$ in a vector space $\mathcal{V}$, denoted by $\mathcal{S}_1 + \mathcal{S}_2$, consists of all vectors of the form $a + b$, where $a \in \mathcal{S}_1$ and $b \in \mathcal{S}_2$.

A linear combination of the vectors $x_1, x_2, \ldots, x_n$ in a vector space $\mathcal{V}$ is a sum of the form $\alpha_1 x_1 + \alpha_2 x_2 + \cdots + \alpha_n x_n$.

• A finite set of vectors $x_1, x_2, \ldots, x_n$ is said to be linearly dependent if there exist scalars $\alpha_1, \alpha_2, \ldots, \alpha_n$, not all zero, such that $\alpha_1 x_1 + \alpha_2 x_2 + \cdots + \alpha_n x_n = 0$; otherwise it is linearly independent.
  Exercise: For which values of $\lambda$ are the vectors $(\lambda, 1, 0)'$, $(1, \lambda, 1)'$, and $(0, 1, \lambda)'$ linearly dependent? (See the sketch below.)
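One quick way to answer this exercise numerically is to stack the three vectors as the columns of a matrix and ask when its determinant vanishes. A minimal sketch (using SymPy for the symbolic determinant is a convenience assumption, not part of the notes):

```python
import sympy as sp

lam = sp.symbols('lambda')
M = sp.Matrix([[lam, 1,   0],
               [1,   lam, 1],
               [0,   1,   lam]])   # columns are the three vectors

# the vectors are linearly dependent exactly when det(M) = 0
print(sp.factor(M.det()))          # lambda*(lambda**2 - 2)
print(sp.solve(M.det(), lam))      # [0, -sqrt(2), sqrt(2)]
```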

• An arbitrary set of elements $A$ of $\mathcal{V}$ (containing possibly an infinite number of vectors) is linearly independent if every nonempty finite subset of $A$ is linearly independent; otherwise it is linearly dependent.

• Let $A$ be a nonempty set of vectors from a vector space $\mathcal{V}$. The set consisting of all linear combinations of vectors in $A$ is called the subspace spanned (or generated) by $A$.

• Any set of $n$ linearly independent real vectors $x_1, x_2, \ldots, x_n$ spans $\mathbb{R}^n$.

• If a vector space $\mathcal{V}$ contains a finite set of $n$ linearly independent vectors $x_1, x_2, \ldots, x_n$, but any set of $n + 1$ vectors is linearly dependent, then the dimension of $\mathcal{V}$ is $n$. We write $\dim(\mathcal{V}) = n$. In this case, $\mathcal{V}$ is said to be finite dimensional. If no such $n$ exists, $\mathcal{V}$ is infinite dimensional. In particular, for $\mathcal{V} = \{0\}$, we say $\dim(\mathcal{V}) = 0$.

• If $\mathcal{V}$ is a vector space, finite or infinite dimensional, and $A$ is a linearly independent set of vectors from $\mathcal{V}$, then $A$ is a basis of $\mathcal{V}$ if $\mathcal{V}$ is spanned by $A$.

Example:
a. What is the dimension of $\mathbb{C}^n$ when the field of scalars is $\mathbb{R}$?
b. What is the dimension of $\mathbb{C}^n$ when the field of scalars is $\mathbb{C}$?

A complex vector space $\mathcal{V}$ is an inner product space if, for all $x, y \in \mathcal{V}$, there exists a complex-valued function $\langle x, y \rangle$, called the inner product of $x$ and $y$, such that
  i) $\langle x, y \rangle = \overline{\langle y, x \rangle}$
  ii) $\langle x + y, z \rangle = \langle x, z \rangle + \langle y, z \rangle$
  iii) $\langle \alpha x, y \rangle = \alpha \langle x, y \rangle$
  iv) $\langle x, x \rangle \ge 0$, with equality iff $x = 0$

• A real vector space $\mathcal{V}$ is an inner product space if, for all $x, y \in \mathcal{V}$, there exists a real number $\langle x, y \rangle$ satisfying conditions i)-iv).

Consider a real-valued function $\rho(x)$ defined on an inner product space. It is a norm if $\rho(x)$ satisfies
  i) $\rho(\alpha x) = |\alpha|\,\rho(x)$
  ii) $\rho(x) \ge 0$, with equality iff $x = 0$
  iii) $\rho(x + y) \le \rho(x) + \rho(y)$
Exercise: Show that $\|x\| = \langle x, x \rangle^{1/2}$ is a norm.

The concept of an inner product induces not only the idea of length (the norm), $\|x\| = \langle x, x \rangle^{1/2}$, but also the distance between two vectors $x$ and $y$, $d(x, y) = \|x - y\|$, which satisfies
  i) $d(x, x) = 0$
  ii) $d(x, y) > 0$ if $x \ne y$
  iii) $d(x, y) = d(y, x)$
  iv) $d(x, y) \le d(x, z) + d(z, y)$

In any inner product space, the Cauchy-Schwarz inequality and the triangle inequality hold. Besides, an equality, called the parallelogram law, states that
$$\|x + y\|^2 + \|x - y\|^2 = 2\|x\|^2 + 2\|y\|^2.$$
Example: Prove the parallelogram law both algebraically and geometrically.
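The factor of 2 on the right-hand side is easy to confirm numerically; a throwaway check with random vectors (seed and dimension arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.normal(size=3), rng.normal(size=3)

lhs = np.linalg.norm(x + y)**2 + np.linalg.norm(x - y)**2
rhs = 2 * np.linalg.norm(x)**2 + 2 * np.linalg.norm(y)**2
assert np.isclose(lhs, rhs)   # parallelogram law
```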

For two vectors $x$ and $y$ in an inner product space, we say that $x$ and $y$ are orthogonal if $\langle x, y \rangle = 0$; we write $x \perp y$.

• [Pythagorean Theorem] If $x \perp y$ in an inner product space, then $\|x + y\|^2 = \|x\|^2 + \|y\|^2$.
  Exercise: Does the converse hold?

A set $A$ is called an orthogonal set if each pair of vectors in $A$ is orthogonal. If, in addition, each vector in $A$ has unit length, then $A$ is called an orthonormal set.
Example: Prove that any orthonormal set is linearly independent.

Two subspaces $\mathcal{S}_1$ and $\mathcal{S}_2$ of an inner product space are said to be orthogonal if every vector in $\mathcal{S}_1$ is orthogonal to every vector in $\mathcal{S}_2$.

If $\mathcal{S}$ is a subspace of an inner product space $\mathcal{V}$, then the space of all vectors orthogonal to $\mathcal{S}$ is called the orthogonal complement of $\mathcal{S}$, denoted by $\mathcal{S}^\perp$.
Example: Prove that $\mathcal{S}^\perp$ is a subspace of $\mathcal{V}$.

A sequence $\{x_n\}_{n=1}^\infty$ in an inner product space is said to converge in the norm to $x$ if $\|x_n - x\| \to 0$ as $n \to \infty$.

[Continuity of the inner product]
If $\{x_n\}$, $\{y_n\}$ are two sequences in an inner product space such that $\|x_n - x\| \to 0$ and $\|y_n - y\| \to 0$, then
  i) $\|x_n\| \to \|x\|$, $\|y_n\| \to \|y\|$
  ii) $\langle x_n, y_n \rangle \to \langle x, y \rangle$

A sequence $\{x_n\}_{n=1}^\infty$ in an inner product space is a Cauchy sequence if $\|x_n - x_m\| \to 0$ as $n, m \to \infty$.
($\forall \varepsilon > 0$, $\exists N(\varepsilon) > 0$ s.t. $\|x_n - x_m\| < \varepsilon$, $\forall n, m > N(\varepsilon)$)

An inner product space $\mathcal{H}$ is a Hilbert space if it is complete, namely, every Cauchy sequence in $\mathcal{H}$ converges in the norm to an element $x \in \mathcal{H}$.

2. Matrices

2.1 Matrix Terminology

A matrix is a rectangular array of numbers, denoted
$$A_{m \times n} = [a_{ij}]_{i=1,\ldots,m;\; j=1,\ldots,n} = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix},$$
where the $a_{ij}$'s are called the entries or elements of $A$.

The dimensions of a matrix are the numbers of its rows and columns. So the dimension of matrix $A$ is $m \times n$ ($m$ by $n$); or $A$ is an $m \times n$ matrix. Sometimes the dimension is also called the order.

A matrix with all entries zero is called a zero matrix or null matrix.

When $m = n$, $A$ becomes a square matrix.

Several types of square matrices are listed here:

• A symmetric matrix is one in which $a_{ij} = a_{ji}$ for all $i, j$.

• A diagonal matrix is a square matrix whose only nonzero elements appear on the main diagonal.

• An identity matrix is a diagonal matrix with all main-diagonal entries equal to 1. It is denoted by $I$. For example, $I_3$ is a $3 \times 3$ identity matrix.

• A triangular matrix is a square matrix that has only zeros either above or below the main diagonal. If the zeros are above the diagonal, the matrix is said to be lower triangular; otherwise, it is upper triangular.

• An orthogonal matrix $P$ is a square matrix that satisfies $PP' = P'P = I$.

Here are some concepts for square matrices: inverse, determinant, and trace.

• A square matrix $A_{n \times n}$ is invertible if $\exists\, B_{n \times n}$ s.t. $AB = BA = I_n$. We write $B = A^{-1}$. Also, $B$ is invertible and $A = B^{-1}$.

  a. An inverse, if it exists, is unique.
     Exercise: Prove this.

  b. $A$ is invertible $\iff$ the column vectors of $A$ are linearly independent.

  c. Examples:
     i) A zero matrix is non-invertible.
     ii) The inverse of an identity matrix is itself.
     iii) The inverse of a diagonal matrix:
          $$\begin{pmatrix} a_1 & & \\ & \ddots & \\ & & a_n \end{pmatrix}^{-1} = \begin{pmatrix} a_1^{-1} & & \\ & \ddots & \\ & & a_n^{-1} \end{pmatrix}, \quad \text{if all } a_i \ne 0.$$
     iv) The inverse of a $2 \times 2$ matrix:
          $$\begin{pmatrix} a & b \\ c & d \end{pmatrix}^{-1} = \frac{1}{ad - bc}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}, \quad \text{if } ad - bc \ne 0.$$

  d. Calculation of the inverse:
     i) Definition Method
     ii) Formula Method
     iii) Elementary Transformation Method

  e. Properties:
     i) $(A^{-1})^{-1} = A$
     ii) $(A')^{-1} = (A^{-1})'$
     iii) $(AB)^{-1} = B^{-1}A^{-1}$
     iv) $(A + BDC)^{-1} = A^{-1} - A^{-1}B(D^{-1} + CA^{-1}B)^{-1}CA^{-1}$
     Exercise: Try to figure out the special cases for iv) when
     (1) $B = C'$?
     (2) $D = I$, $B = b$, $C = c'$?
     (A numerical check of iv) appears below.)
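Property iv) (a Woodbury-type identity) is easy to sanity-check numerically. A minimal sketch with random matrices (the sizes and the diagonal shifts that keep the matrices well-conditioned are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 5, 2
A = rng.normal(size=(n, n)) + n * np.eye(n)   # keep A well-conditioned
B = rng.normal(size=(n, k))
C = rng.normal(size=(k, n))
D = rng.normal(size=(k, k)) + k * np.eye(k)

inv = np.linalg.inv
lhs = inv(A + B @ D @ C)
rhs = inv(A) - inv(A) @ B @ inv(inv(D) + C @ inv(A) @ B) @ C @ inv(A)
assert np.allclose(lhs, rhs)   # the identity holds to machine precision
```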

• The determinant of a square matrix is a function of the elements of the matrix $A$, denoted by $\det(A)$ or $|A|$.

  a. Definition (expansion by cofactors): using any row, say $i$, we obtain
     $$|A| = \sum_{j=1}^n (-1)^{i+j} a_{ij} |A_{ij}|, \quad i = 1, \ldots, n,$$
     where $A_{ij}$ is the matrix obtained from $A$ by deleting row $i$ and column $j$. The determinant of $A_{ij}$ is called a minor of $A$, or the $(i, j)$ minor of $A$. When the correct sign, $(-1)^{i+j}$, is attached, it becomes a cofactor, or the $(i, j)$ cofactor. The expansion can be done along any column as well.
     Obviously, it is easiest to choose the row or column with the most zeros for the expansion. It is unlikely, though, that you will ever calculate any determinant larger than $3 \times 3$ without a computer.

  b. The determinant provides important information when the matrix is that of the coefficients of a system of linear equations. The system has a unique solution if and only if the determinant is nonzero.

  c. When the determinant corresponds to a linear transformation of a vector space, the transformation has an inverse operation if and only if the determinant is nonzero.

  d. $A$ is invertible $\iff |A| \ne 0$.

  e. A geometric interpretation can be given to the value of the determinant of a square matrix with real entries: the absolute value of the determinant gives the scale factor by which area or volume is multiplied under the associated linear transformation, while its sign indicates whether the transformation preserves orientation.

  Example:
  i) $\begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc$
  ii) $\begin{vmatrix} a & b & c \\ d & e & f \\ g & h & i \end{vmatrix} = a\begin{vmatrix} e & f \\ h & i \end{vmatrix} - b\begin{vmatrix} d & f \\ g & i \end{vmatrix} + c\begin{vmatrix} d & e \\ g & h \end{vmatrix} = aei - afh - bdi + bfg + cdh - ceg$

  f. Properties:
     i) Switching two rows or columns changes the sign of the determinant.
     ii) Any determinant with two identical rows or columns has value 0.
     iii) A determinant with a row or column of zeros has value 0.
     iv) Adding a scalar multiple of one row (or column) to another does not change the determinant.
     v) $|\lambda A_{n \times n}| = \lambda^n |A|$
     vi) $|A'| = |A|$
     vii) $|A^{-1}| = |A|^{-1}$
     viii) If $A_1, A_2, \ldots, A_K$ are all $n \times n$ matrices, $|A_1 A_2 \cdots A_K| = |A_1||A_2|\cdots|A_K|$.
     ix) For a triangular matrix,
         $$\begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ & a_{22} & \cdots & a_{2n} \\ & & \ddots & \vdots \\ & & & a_{nn} \end{vmatrix} = \begin{vmatrix} a_{11} & & & \\ a_{21} & a_{22} & & \\ \vdots & \vdots & \ddots & \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{vmatrix} = \prod_i a_{ii}$$
     x) $\begin{vmatrix} A_{11} & A_{12} \\ O & A_{22} \end{vmatrix} = \begin{vmatrix} A_{11} & O \\ A_{21} & A_{22} \end{vmatrix} = |A_{11}||A_{22}|$

  Exercise: Using the concepts of inverse and determinant, prove the following properties for an orthogonal matrix $P$:
  i) $P^{-1} = P'$
  ii) $|P| = \pm 1$
  iii) $PQ$ is orthogonal when $Q$ is orthogonal.

• The trace of a square matrix is defined to be the sum of the entries on the main diagonal: $tr(A_{n \times n}) = \sum_{i=1}^n a_{ii}$.

• Properties:
  i) The trace is invariant under cyclic permutations, for example
     $tr(ABCD) = tr(BCDA) = tr(CDAB) = tr(DABC)$
  ii) $tr(A) = tr(A')$
  iii) For two matrices of the same dimensions,
       $tr(A'B) = tr(AB') = tr(B'A) = tr(BA') = \sum_{i,j} a_{ij} b_{ij}$
  iv) $tr(A + B) = tr(A) + tr(B)$
  v) $tr(cA) = c\, tr(A)$
  vi)* $E(tr(X)) = tr(E(X))$
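A quick NumPy check of the cyclic-permutation property (random conformable matrices; the sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(2, 3))
B = rng.normal(size=(3, 4))
C = rng.normal(size=(4, 5))
D = rng.normal(size=(5, 2))

t = np.trace
assert np.isclose(t(A @ B @ C @ D), t(B @ C @ D @ A))  # one cyclic shift
assert np.isclose(t(A @ B @ C @ D), t(C @ D @ A @ B))  # two cyclic shifts
# note: a non-cyclic reordering such as tr(BACD) is NOT equal in general
```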

An $m \times n$ matrix can be viewed as a set of $n$ column vectors in $\mathbb{R}^m$, or as a set of $m$ row vectors in $\mathbb{R}^n$. Thus, associated with a matrix $A$ are two vector spaces: the column space and the row space.

• The column space of $A$, denoted by $\mathrm{col}\,A$, consists of all linear combinations of the columns of $A$:
  $\mathrm{col}\,A = \{x \in \mathbb{R}^m : x = Ay \text{ for some } y \in \mathbb{R}^n\}$.

• The row space of $A$, denoted by $\mathrm{col}\,A'$, consists of all linear combinations of the rows of $A$, or the columns of $A'$:
  $\mathrm{col}\,A' = \{y \in \mathbb{R}^n : y = A'x \text{ for some } x \in \mathbb{R}^m\}$.

• The column rank of $A$ is the maximum number of linearly independent columns it contains, namely, the dimension of the vector space spanned by its columns, $\dim(\mathrm{col}\,A)$.

• The row rank of $A$ is the maximum number of linearly independent rows it contains, namely, the dimension of the vector space spanned by its rows, $\dim(\mathrm{col}\,A')$.

• [The Rank Theorem] The nontrivial fact that $\dim(\mathrm{col}\,A) = \dim(\mathrm{col}\,A')$ implies that the column rank of $A$ is equal to its row rank. It follows that $rk(A) = \dim(\mathrm{col}\,A)$.

• A square $n \times n$ matrix $A$ is said to be nonsingular if $rk(A) = n$; otherwise, the matrix is singular. In fact,
  $A$ is invertible $\iff$ $A$ is nonsingular $\iff$ $rk(A) = n$ $\iff$ $|A| \ne 0$.
  Example: For a square $n \times n$ matrix $A$, show that $|A| = 0 \iff rk(A) < n$.

• [The Rank Factorization Theorem] Every $m \times n$ matrix $A$ of rank $r$ can be written as $A = BC'$, where $B$ ($m \times r$) and $C$ ($n \times r$) both have rank $r$.
  Example: Prove this theorem.

• Simple properties of rank: let $A$ be an $m \times n$ matrix,
  i) $0 \le rk(A) = rk(A') \le \min(m, n)$; $A$ is said to be of full rank if $rk(A) = \min(m, n)$.
  ii) $rk(A) = 0 \iff A = O$
  iii) $rk(I_n) = n$
  iv) $rk(cA) = rk(A)$ if $c \ne 0$

• Rank inequalities (sum):
  i) $rk(A + B) \le rk(A) + rk(B)$
  ii) $rk(A \pm B) \ge |rk(A) - rk(B)|$
  Example: Prove i) and ii).

• Rank inequalities (product):
  i) $\mathrm{col}(AB) \subseteq \mathrm{col}(A)$
  ii) $rk(AB) \le \min(rk(A), rk(B))$
  Example 1: Prove i) and ii).
  Example 2: Let $A$ be an $m \times n$ matrix. If $m$ ...

2.2 Matrix Operations

The product of an $m \times n$ matrix $A$ and an $n \times p$ matrix $B$ is the $m \times p$ matrix $C = AB = [c_{ij}] = [a_i' b_j]$, where $a_i'$ is the $i$-th row of $A$ and $b_j$ is the $j$-th column of $B$.

• In general, $AB \ne BA$. $A$ and $B$ commute if $AB = BA$.

• $A0 = 0A = 0$ {the three zero matrices may not be of the same dimensions}

• $A_{m \times r} I_r = I_m A_{m \times r} = A$

• $(AB)C = A(BC)$

• $(A + B)C = AC + BC$

• $(AB)' = B'A'$

Matrix representations of summation:

• $\sum_i x_i = (1, 1, \ldots, 1)(x_1, x_2, \ldots, x_n)'$

• $\sum_i x_i^2 = x'x$

• $\sum_i x_i y_i = x'y = y'x$

For two real matrices $A$ and $B$ of the same dimension, we define the inner product as
$$\langle A, B \rangle = \sum_{i,j} a_{ij} b_{ij} = tr(A'B),$$
which induces the norm
$$\|A\| = \langle A, A \rangle^{1/2} = \Big(\sum_{i,j} a_{ij}^2\Big)^{1/2} = \big(tr(A'A)\big)^{1/2}.$$

A calculation that may help to condense the notation or simplify the coding is the Kronecker product, denoted by $\otimes$. For general matrices $A$ of dimension $m \times n$ and $B$ of dimension $p \times q$,
$$A \otimes B = \begin{pmatrix} a_{11}B & a_{12}B & \cdots & a_{1n}B \\ a_{21}B & a_{22}B & \cdots & a_{2n}B \\ \vdots & \vdots & & \vdots \\ a_{m1}B & a_{m2}B & \cdots & a_{mn}B \end{pmatrix} \text{ is of dimension } (mp) \times (nq).$$

Example: $I_2 \otimes B = \begin{pmatrix} B & O \\ O & B \end{pmatrix}$, a block-diagonal matrix.

• $(A \otimes B)^{-1} = A^{-1} \otimes B^{-1}$

• $(A \otimes B)' = A' \otimes B'$

• For square matrices $A_{m \times m}$ and $B_{n \times n}$,
  a. $|A \otimes B| = |A|^n |B|^m$
  b. $tr(A \otimes B) = tr(A)\, tr(B)$

• $(A \otimes B)(C \otimes D) = (AC) \otimes (BD)$
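These identities map directly onto `np.kron`; a small sanity check (random matrices, sizes chosen arbitrarily so that here $m = 2$ and $n = 3$):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(2, 2))
B = rng.normal(size=(3, 3))
C = rng.normal(size=(2, 2))
D = rng.normal(size=(3, 3))

K = np.kron(A, B)                      # (2*3) x (2*3) = 6 x 6
assert np.allclose(np.linalg.inv(K), np.kron(np.linalg.inv(A), np.linalg.inv(B)))
assert np.allclose(K.T, np.kron(A.T, B.T))
# |A (x) B| = |A|^n |B|^m with m = 2, n = 3
assert np.isclose(np.linalg.det(K), np.linalg.det(A)**3 * np.linalg.det(B)**2)
assert np.isclose(np.trace(K), np.trace(A) * np.trace(B))
assert np.allclose(K @ np.kron(C, D), np.kron(A @ C, B @ D))
```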

A complex matrix $U$ can be written as $U = A + iB$, where $i$ is the imaginary unit.
• The conjugate transpose of $U$, denoted by $U^*$, is defined as $U^* = A' - iB'$.
• If $U$ is real, then $U^* = U'$.
• A square matrix $U$ is said to be Hermitian if $U^* = U$.
• A square matrix $U$ is said to be unitary if $U^*U = UU^* = I$.

2.3 Systems of Equations

Consider the set of $n$ linear equations $Ax = b$, where $x$ contains the unknowns, $A$ is a known matrix of coefficients, and $b$ is a specified vector of values. We are interested in:
(1) whether a solution exists;
(2) if so, how to obtain it;
(3) if it does exist, whether it is unique.
We only consider a square equation system here (i.e. one with an equal number of equations and unknowns).

A homogeneous equation system is of the form $Ax = 0$.
• Every homogeneous system has at least one solution, known as the zero solution (or trivial solution).
• If the system has a nonsingular matrix $A$, then zero is the only solution.
• If the system has a singular matrix, then there is a solution set with an infinite number of solutions. This solution set is closed under addition and scalar multiplication.

A nonhomogeneous equation system is of the form $Ax = b$, where $b$ is a nonzero vector.
• A nonhomogeneous equation system has a unique non-trivial solution $x = A^{-1}b$ if and only if $A$ is nonsingular.
• If $A$ is singular, then the system has either no solution or an infinite number of solutions.

[Gaussian Elimination Algorithm]:
$[A : b] \to [I : x]$

Example: Solve the equation
$$\begin{pmatrix} 2 & -1 & -3 \\ 5 & -2 & -6 \\ 3 & 1 & -4 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 1 \\ 5 \\ 7 \end{pmatrix}.$$

SOLN:
$$\left(\begin{array}{ccc|c} 2 & -1 & -3 & 1 \\ 5 & -2 & -6 & 5 \\ 3 & 1 & -4 & 7 \end{array}\right) \to \left(\begin{array}{ccc|c} 1 & 0 & 0 & 3 \\ 0 & 1 & 0 & 2 \\ 0 & 0 & 1 & 1 \end{array}\right) \implies \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 3 \\ 2 \\ 1 \end{pmatrix}.$$

[Cramer's Rule]:
Cramer's Rule is an explicit formula for the solution of a system of linear equations, with each variable given by a quotient of two determinants: $x_i = |A_i| / |A|$, where $A_i$ is the matrix $A$ with its $i$-th column replaced by $b$.

Example: Solve the equation
$$\begin{pmatrix} 2 & -1 & -3 \\ 5 & -2 & -6 \\ 3 & 1 & -4 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 1 \\ 5 \\ 7 \end{pmatrix}.$$

SOLN:
$$x_1 = \frac{\begin{vmatrix} 1 & -1 & -3 \\ 5 & -2 & -6 \\ 7 & 1 & -4 \end{vmatrix}}{\begin{vmatrix} 2 & -1 & -3 \\ 5 & -2 & -6 \\ 3 & 1 & -4 \end{vmatrix}} = \frac{-21}{-7} = 3, \quad x_2 = \frac{\begin{vmatrix} 2 & 1 & -3 \\ 5 & 5 & -6 \\ 3 & 7 & -4 \end{vmatrix}}{-7} = \frac{-14}{-7} = 2, \quad x_3 = \frac{\begin{vmatrix} 2 & -1 & 1 \\ 5 & -2 & 5 \\ 3 & 1 & 7 \end{vmatrix}}{-7} = \frac{-7}{-7} = 1.$$

A Least Squares Problem
Given a vector $y$ and a matrix $X$, we are interested in expressing $y$ as a linear combination of the columns of $X$. There are two possibilities.
• If $y$ lies in the column space of $X$, then we shall be able to find a vector $b$ such that $y = Xb$.
• Suppose that $y$ is not in the column space of $X$. Then there is no $b$ such that $y = Xb$ holds. We can, however, write $y = Xb + e$, where $e$ is the difference between $y$ and $Xb$, or the residual.
• We try to solve for $b = \arg\min e'e = \arg\min (y - Xb)'(y - Xb)$. Using matrix calculus, $b$ is found to be the solution to the nonhomogeneous system $X'y = X'Xb$. It follows that $b = (X'X)^{-1}X'y$.
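A minimal numerical sketch of the normal equations (the simulated data are arbitrary; `np.linalg.lstsq` is used only as an independent cross-check):

```python
import numpy as np

rng = np.random.default_rng(4)
n, K = 50, 3
X = rng.normal(size=(n, K))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=n)

# normal equations: X'X b = X'y
b = np.linalg.solve(X.T @ X, X.T @ y)

b_check, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(b, b_check)

e = y - X @ b
assert np.allclose(X.T @ e, 0)   # the residual is orthogonal to col(X)
```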

2.4 Partitioned Matrices

A partitioned matrix is a matrix of the form
$$Z = \begin{pmatrix} A_{m \times p} & B_{m \times q} \\ C_{n \times p} & D_{n \times q} \end{pmatrix}.$$
None of the blocks needs to be square, but $A$ and $B$ must have the same number of rows, $A$ and $C$ must have the same number of columns, and so on. We will mainly focus on partitioned matrices with two row blocks and two column blocks. This can be extended to the case of $m$ row blocks and $n$ column blocks, such as
$$Z = \begin{pmatrix} Z_{11} & Z_{12} & \cdots & Z_{1n} \\ Z_{21} & Z_{22} & \cdots & Z_{2n} \\ \vdots & \vdots & & \vdots \\ Z_{m1} & Z_{m2} & \cdots & Z_{mn} \end{pmatrix}.$$

As a special case, we say that a square matrix is block-diagonal if it takes the form
$$Z = \begin{pmatrix} Z_{11} & & & \\ & Z_{22} & & \\ & & \ddots & \\ & & & Z_{rr} \end{pmatrix},$$
where all diagonal blocks are square, not necessarily of the same order.

A General Principle
The main tool in obtaining the inverse, determinant, and rank of a partitioned matrix is to write the matrix as a product of simpler matrices, that is, matrices in which one (or two) of the four blocks is the null matrix.

Some Basic Results

• Partitioned sum: let
  $Z_1 = \begin{pmatrix} A_1 & B_1 \\ C_1 & D_1 \end{pmatrix}$ and $Z_2 = \begin{pmatrix} A_2 & B_2 \\ C_2 & D_2 \end{pmatrix}$, then
  $$Z = Z_1 + Z_2 = \begin{pmatrix} A_1 + A_2 & B_1 + B_2 \\ C_1 + C_2 & D_1 + D_2 \end{pmatrix}$$

• Partitioned product: with $Z_1$ and $Z_2$ defined above,
  $$Z = Z_1 Z_2 = \begin{pmatrix} A_1 A_2 + B_1 C_2 & A_1 B_2 + B_1 D_2 \\ C_1 A_2 + D_1 C_2 & C_1 B_2 + D_1 D_2 \end{pmatrix}$$

• Partitioned transpose:
  $$\begin{pmatrix} A & B \\ C & D \end{pmatrix}' = \begin{pmatrix} A' & C' \\ B' & D' \end{pmatrix}$$

• Trace of a partitioned matrix:
  $$tr\begin{pmatrix} A & B \\ C & D \end{pmatrix} = tr(A) + tr(D)$$

• Elementary row-block operations
  i) $\begin{pmatrix} O & I_n \\ I_m & O \end{pmatrix}\begin{pmatrix} A & B \\ C & D \end{pmatrix} = \begin{pmatrix} C & D \\ A & B \end{pmatrix}$
  ii) $\begin{pmatrix} E & O \\ O & I_n \end{pmatrix}\begin{pmatrix} A & B \\ C & D \end{pmatrix} = \begin{pmatrix} EA & EB \\ C & D \end{pmatrix}$
  iii) $\begin{pmatrix} I_m & E \\ O & I_n \end{pmatrix}\begin{pmatrix} A & B \\ C & D \end{pmatrix} = \begin{pmatrix} A + EC & B + ED \\ C & D \end{pmatrix}$

• Elementary column-block operations
  i) $\begin{pmatrix} A & B \\ C & D \end{pmatrix}\begin{pmatrix} O & I_p \\ I_q & O \end{pmatrix} = \begin{pmatrix} B & A \\ D & C \end{pmatrix}$
  ii) $\begin{pmatrix} A & B \\ C & D \end{pmatrix}\begin{pmatrix} F & O \\ O & I_q \end{pmatrix} = \begin{pmatrix} AF & B \\ CF & D \end{pmatrix}$
  iii) $\begin{pmatrix} A & B \\ C & D \end{pmatrix}\begin{pmatrix} I_p & F \\ O & I_q \end{pmatrix} = \begin{pmatrix} A & AF + B \\ C & CF + D \end{pmatrix}$

• If $A$ is nonsingular, we have
  $$\begin{pmatrix} I_m & O \\ -CA^{-1} & I_n \end{pmatrix}\begin{pmatrix} A & B \\ C & D \end{pmatrix}\begin{pmatrix} I_m & -A^{-1}B \\ O & I_q \end{pmatrix} = \begin{pmatrix} A & O \\ O & D - CA^{-1}B \end{pmatrix}.$$
  The matrix $D - CA^{-1}B$ is called the Schur complement of $A$.

• Similarly, if $D$ is nonsingular, we have
  $$\begin{pmatrix} I_m & -BD^{-1} \\ O & I_n \end{pmatrix}\begin{pmatrix} A & B \\ C & D \end{pmatrix}\begin{pmatrix} I_p & O \\ -D^{-1}C & I_n \end{pmatrix} = \begin{pmatrix} A - BD^{-1}C & O \\ O & D \end{pmatrix}.$$
  The matrix $A - BD^{-1}C$ is called the Schur complement of $D$.

Inverses

• If $A$ and $D$ are nonsingular,
  $$\begin{pmatrix} A & O \\ O & D \end{pmatrix}^{-1} = \begin{pmatrix} A^{-1} & O \\ O & D^{-1} \end{pmatrix}.$$

• If $B$ and $C$ are nonsingular,
  $$\begin{pmatrix} O & B \\ C & O \end{pmatrix}^{-1} = \begin{pmatrix} O & C^{-1} \\ B^{-1} & O \end{pmatrix}.$$

  Example: When will the matrix $\begin{pmatrix} O & B \\ C & O \end{pmatrix}$ be orthogonal?

• If $A$ and its Schur complement $E = D - CA^{-1}B$ are nonsingular,
  $$\begin{pmatrix} A & B \\ C & D \end{pmatrix}^{-1} = \begin{pmatrix} A^{-1} + A^{-1}BE^{-1}CA^{-1} & -A^{-1}BE^{-1} \\ -E^{-1}CA^{-1} & E^{-1} \end{pmatrix}.$$

• If $D$ and its Schur complement $F = A - BD^{-1}C$ are nonsingular,
  $$\begin{pmatrix} A & B \\ C & D \end{pmatrix}^{-1} = \begin{pmatrix} F^{-1} & -F^{-1}BD^{-1} \\ -D^{-1}CF^{-1} & D^{-1} + D^{-1}CF^{-1}BD^{-1} \end{pmatrix}.$$
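The block-inverse formula is easy to verify numerically; a sketch with random, well-conditioned blocks (sizes arbitrary; the last line previews the partitioned determinant result of the next subsection):

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 3, 2
A = rng.normal(size=(m, m)) + m * np.eye(m)
B = rng.normal(size=(m, n))
C = rng.normal(size=(n, m))
D = rng.normal(size=(n, n)) + n * np.eye(n)

Z = np.block([[A, B], [C, D]])
inv = np.linalg.inv

E = D - C @ inv(A) @ B                        # Schur complement of A
top_left = inv(A) + inv(A) @ B @ inv(E) @ C @ inv(A)
Z_inv = np.block([[top_left,             -inv(A) @ B @ inv(E)],
                  [-inv(E) @ C @ inv(A),  inv(E)]])
assert np.allclose(Z_inv, inv(Z))
assert np.isclose(np.linalg.det(Z), np.linalg.det(A) * np.linalg.det(E))
```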

Determinants

Let $Z = \begin{pmatrix} A & B \\ C & D \end{pmatrix}$.

• If $A$ is nonsingular, $|Z| = |A|\,|D - CA^{-1}B|$.

• If $D$ is nonsingular, $|Z| = |D|\,|A - BD^{-1}C|$.

• Let $A$ and $D$ be square matrices, of order $m$ and $n$, respectively.
  a. For any $m \times m$ matrix $E$,
     $$\begin{vmatrix} EA & EB \\ C & D \end{vmatrix} = |E|\begin{vmatrix} A & B \\ C & D \end{vmatrix}$$
  b. For any $n \times m$ matrix $E$,
     $$\begin{vmatrix} A & B \\ C + EA & D + EB \end{vmatrix} = \begin{vmatrix} A & B \\ C & D \end{vmatrix}$$

  Example: Will $\begin{vmatrix} C & D \\ A & B \end{vmatrix} = \begin{vmatrix} A & B \\ C & D \end{vmatrix}$ always hold?

2.5 Characteristic Roots and Vectors

A useful set of results for analyzing a square matrix $A$, real or complex, arises from the solutions to the set of equations $Ax = \lambda x$.
The pairs of solutions are the characteristic roots (or eigenvalues) $\lambda$ and their associated characteristic vectors (or eigenvectors) $x$.
It is easy to see that the solution set for $x$ is closed under scalar multiplication. To remove the indeterminacy (apart from sign), $x$ is normalized so that $x^*x = 1$ ($x'x = 1$ when $x$ is real). The solution then consists of $\lambda$ and the $n - 1$ unknown elements in $x$.
$Ax = \lambda x \iff (A - \lambda I)x = 0$, which is a homogeneous equation system. It has a nonzero solution if and only if the matrix $(A - \lambda I)$ is singular, i.e. $|A - \lambda I| = 0$. This polynomial in $\lambda$ is the characteristic equation of $A$.

Note: For a matrix $A$ of order $n$, we use $\lambda_1, \lambda_2, \ldots, \lambda_n$ to denote its eigenvalues.
a. $\lambda_1, \lambda_2, \ldots, \lambda_n$ can be real or complex. But for a symmetric matrix $A$, the eigenvalues are always real numbers.
b. If $\lambda$ appears $n_\lambda > 1$ times, then it is called a multiple eigenvalue and the number $n_\lambda$ is the (algebraic) multiplicity of $\lambda$; if $\lambda$ appears only once it is called a simple eigenvalue.

Example:
i) What are the eigenvalues of a diagonal matrix?
ii) What are the eigenvalues of a triangular matrix?

Although the eigenvalues are an excellent way to characterize a matrix, they do not characterize a matrix completely. Two different matrices may have the same eigenvalues.

Can one eigenvector be associated with two distinct eigenvalues? Can two distinct vectors be associated with the same eigenvalue?
Example: Prove or think of an example.

Any linear combination of eigenvectors associated with the same eigenvalue is an eigenvector for that eigenvalue.
Exercise: Prove this statement.

The geometric multiplicity of an eigenvalue is the dimension of the space spanned by the associated eigenvectors. This dimension cannot exceed the algebraic multiplicity.

Example:
i) Find the eigenvalues and eigenvectors of the following two matrices:
   a. $\begin{pmatrix} 1 & 3 & 0 \\ 0 & 1 & 0 \\ 2 & 1 & 5 \end{pmatrix}$
   b. $\begin{pmatrix} 3 & 0 & 0 \\ 0 & 0 & 6 \\ 0 & 6 & 1 \end{pmatrix}$
ii) For the matrix in a., what are the (algebraic) multiplicity and the geometric multiplicity of each eigenvalue? What about the matrix in b.?
iii) Are the eigenvectors of each matrix linearly independent? Are they orthogonal? (A numerical check appears below.)
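NumPy can confirm whatever you derive by hand here. A sketch (the matrices are transcribed as printed; any minus signs lost in the source would change the numbers):

```python
import numpy as np

Ma = np.array([[1., 3., 0.],
               [0., 1., 0.],
               [2., 1., 5.]])
Mb = np.array([[3., 0., 0.],
               [0., 0., 6.],
               [0., 6., 1.]])

for M in (Ma, Mb):
    vals, vecs = np.linalg.eig(M)
    print(np.round(vals, 4))
    # each column of vecs is an eigenvector: M v = lambda v
    for lam, v in zip(vals, vecs.T):
        assert np.allclose(M @ v, lam * v)

# Mb is symmetric, so its eigenvectors are mutually orthogonal
_, Q = np.linalg.eigh(Mb)
assert np.allclose(Q.T @ Q, np.eye(3))
```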

Eigenvectors associated with distinct eigenvalues are linearly independent, but not necessarily orthogonal.

If $\lambda$ is an eigenvalue of $A$, then $t\lambda$ is an eigenvalue of $tA$, with the same eigenvector(s). If $x$ is an eigenvector of $A$, then $tx$ ($t \ne 0$) is also an eigenvector of $A$, associated with the same eigenvalue.

Do $A$ and $A'$ have the same eigenvalues? Do they have the same eigenvectors? Either provide a proof or a counterexample.

$A$ is nonsingular if and only if all its eigenvalues are nonzero.
Example: Prove this theorem.

[Approximate inverse of a singular matrix] If $A$ is singular, then there exists a scalar $\varepsilon \ne 0$ such that $A + \varepsilon I$ is nonsingular.
Example: Prove this theorem.

Let $\lambda$ be a simple eigenvalue of a square matrix $A$, so that $Ax = \lambda x$ for some eigenvector $x$. If $A$ and $B$ commute, then $x$ is an eigenvector of $B$ too.
Example: Prove this theorem.

Similarity
• Two matrices of the same dimension and the same rank are said to be equivalent. If $A$ and $B$ are equivalent matrices, then there exist nonsingular matrices $E$ and $F$ such that $B = EAF$.
• When, in addition, $A$ and $B$ are square and there exists a nonsingular matrix $T$ such that $T^{-1}AT = B$, then they are said to be similar.
  Example: Prove the claim that similar matrices have the same set of eigenvalues. Do they have the same set of eigenvectors as well?

Properties:
For an $n \times n$ matrix $A$ with eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$,

• $|A| = \prod_{i=1}^n \lambda_i$

• $A^k$ has eigenvalues $\lambda_1^k, \lambda_2^k, \ldots, \lambda_n^k$

• $tr(A) = \sum_{i=1}^n \lambda_i$

Properties of a symmetric matrix $A$:

• The eigenvalues are all real.
• Eigenvectors associated with distinct eigenvalues are orthogonal. (not sufficient and necessary)
• The eigenvectors span $\mathbb{R}^n$.
• The rank is equal to the number of nonzero eigenvalues.
  Example:
  a. Find a square matrix that does not possess this property.
  b. Prove the theorem that the rank of any matrix $A$ equals the number of nonzero eigenvalues of $AA'$.
• The matrix can be diagonalized.

A matrix $A$ can be diagonalized if there exists a matrix $T$ such that $T^{-1}AT = \Lambda$, where $\Lambda$ is a diagonal matrix containing the eigenvalues of $A$.

The factorization theorems try to diagonalize a matrix. If a matrix cannot be diagonalized, we ask how close to a diagonal representation we can get. Because of the central role of the factorization theorems, let us list them below.

If $A$ is an $m \times n$ matrix of rank $r$, then

• $A = BC'$ with $B$ ($m \times r$) and $C$ ($n \times r$) both of rank $r$.

• $EAF = \mathrm{diag}(I_r, O)$ with $E$ and $F$ nonsingular.

• [QR Decomposition] $A = QR$ (when $r = n$) with $Q^*Q = I_n$ and $R$ an upper triangular matrix with positive diagonal elements; if $A$ is real, then $Q$ and $R$ are real as well.

• [Singular Value Decomposition] $A = U\Sigma V^*$ with $U$ ($m \times m$) and $V$ ($n \times n$) unitary, and $\Sigma$ ($m \times n$) rectangular diagonal with nonnegative real numbers on the diagonal. The diagonal entries of $\Sigma$ are known as the singular values of $A$. The $m$ columns of $U$ and the $n$ columns of $V$ are called the left-singular vectors and right-singular vectors of $A$.

If $A$ is a square matrix of order $n$, then

• [Schur Decomposition] $P^*AP = M$ with $P$ unitary and $M$ upper triangular.

• [Spectral Theorem] $P^*AP = \Lambda$ (diagonal) with $P$ unitary, if and only if $A$ is normal.

• [Spectral Decomposition] $P'AP = \Lambda$ (diagonal) with $P$ orthogonal, if $A$ is symmetric.

• $P'AP = \Lambda$ (diagonal) and $P'BP = M$ (diagonal) with $A$ and $B$ symmetric and $P$ orthogonal, if and only if $A$ and $B$ commute.

If $A$ is a square matrix of order $n$, then also

• [Jordan Decomposition] $T^{-1}AT = J$ (Jordan matrix) with $T$ nonsingular.

• $T^{-1}AT = \Lambda$ (diagonal) with $T$ nonsingular, if $A$ has distinct eigenvalues.

• $T^{-1}AT = \mathrm{diag}(A)$ with $T$ unit upper triangular and nonsingular, if $A$ is upper triangular with distinct diagonal elements.

• $T^{-1}AT = \Lambda$ (diagonal) and $T^{-1}BT = M$ (diagonal) with $T$ nonsingular, if $A$ has only simple eigenvalues and commutes with $B$.

Example:
a. [Singular Value Decomposition]
   Consider the $4 \times 5$ matrix
   $$A = \begin{pmatrix} 1 & 0 & 0 & 0 & 2 \\ 0 & 0 & 3 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 4 & 0 & 0 & 0 \end{pmatrix}.$$
   A singular value decomposition of this matrix is given by $A = U\Sigma V^*$, with singular values (the diagonal of $\Sigma$) equal to $4, 3, \sqrt{5}, 0$. It can be verified that $UU^* = I_4$ and $VV^* = I_5$. In fact, this particular singular value decomposition is not unique: choosing a different $V$ also yields a valid singular value decomposition.

b. [Spectral Decomposition]
   Consider the symmetric matrix $A = \begin{pmatrix} 1 & 2\sqrt{2} \\ 2\sqrt{2} & 3 \end{pmatrix}$.
   $$|A - \lambda I| = (1 - \lambda)(3 - \lambda) - 8 = (\lambda + 1)(\lambda - 5) = 0 \implies \lambda_1 = -1,\ \lambda_2 = 5.$$
   For $\lambda = -1$: $x_1 = -\sqrt{2}\, x_2$, so the normalized eigenvector is $x = \left(-\tfrac{\sqrt{2}}{\sqrt{3}}, \tfrac{1}{\sqrt{3}}\right)'$.
   For $\lambda = 5$: $x_2 = \sqrt{2}\, x_1$, so the normalized eigenvector is $x = \left(\tfrac{1}{\sqrt{3}}, \tfrac{\sqrt{2}}{\sqrt{3}}\right)'$.
   With $P = \begin{pmatrix} -\sqrt{2}/\sqrt{3} & 1/\sqrt{3} \\ 1/\sqrt{3} & \sqrt{2}/\sqrt{3} \end{pmatrix}$,
   $$P'AP = \begin{pmatrix} -1 & 0 \\ 0 & 5 \end{pmatrix} = \Lambda.$$

Quadratic forms and definite matrices

Many optimization problems involve double sums of the form
$$q = \sum_{i=1}^n \sum_{j=1}^n x_i x_j a_{ij}.$$
This quadratic form can be written as $q = x'Ax$, where $A$ is a symmetric matrix. In general, $q$ may be positive, negative, or zero; it depends on $A$ and $x$. There are some matrices, however, for which the sign of $q$ is determined regardless of $x$. For a given matrix $A$:

• If $x'Ax > (<)\ 0$ for all nonzero $x$, then $A$ is positive (negative) definite.

• If $x'Ax \ge (\le)\ 0$ for all nonzero $x$, then $A$ is nonnegative definite or positive semidefinite (nonpositive definite).

• Let $A$ be a symmetric matrix:
  a. If all eigenvalues of $A$ are positive (negative), then $A$ is positive definite (negative definite).
  b. If some of the roots are zero, then $A$ is nonnegative (nonpositive) definite if the remainder are positive (negative).
  c. If $A$ has both negative and positive roots, then $A$ is indefinite.
  Example: Use the Spectral Decomposition to explain.
  Note: The "if" part is in fact "if and only if".

• If $A$ is positive definite, then $|A| > 0$. {What if $A$ is negative definite?}

• If $A$ is positive definite, so is $A^{-1}$.

• The identity matrix is positive definite.

• If $A_{n \times K}$ ($n > K$) is of full rank, then $A'A$ is positive definite and $AA'$ is nonnegative definite.

• If $A$ is positive definite and $B$ is a nonsingular matrix, then $B'AB$ is positive definite.

• Define $d = x'Ax - x'Bx = x'(A - B)x$. If $d$ is always positive for any nonzero vector $x$, then $A$ is said to be greater than $B$. We write $A > B$; equivalently, $A - B$ is positive definite.
  a. Suppose $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$ are the eigenvalues of $A$ and $\mu_1 \ge \mu_2 \ge \cdots \ge \mu_n$ are those of $B$. If $A - B$ is nonnegative definite, then $\lambda_i \ge \mu_i$, $i = 1, \ldots, n$.
  b. If $A > B$, with both positive definite, then $B^{-1} > A^{-1}$.

Real Powers of a Positive Definite Matrix

For a positive definite matrix $A$, $A^r = P\Lambda^r P'$, for any real number $r$. {What if $A$ is only nonnegative definite?} If $A$ is nonnegative definite, then the powers $A^r$ can only be defined for $r \ge 0$.

Cholesky Decomposition
Let $A$ be a Hermitian, positive-definite matrix. Then $A$ can be decomposed as $A = LL^*$, where $L$ is a lower triangular matrix with strictly positive diagonal entries, and $L^*$ denotes the conjugate transpose of $L$. The Cholesky decomposition is of great importance in numerical computation, especially for solving systems of linear equations.
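A sketch of both ideas: a matrix square root via $A^r = P\Lambda^r P'$, and a Cholesky-based linear solve (SciPy is assumed available for the triangular solves; the matrix is an arbitrary positive definite example):

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

A = np.array([[4.0, 2.0],
              [2.0, 3.0]])           # symmetric positive definite

# real power via A^r = P Lambda^r P', here r = 1/2
vals, P = np.linalg.eigh(A)
A_half = P @ np.diag(vals**0.5) @ P.T
assert np.allclose(A_half @ A_half, A)

# Cholesky: A = L L', then solve Ax = b by two triangular solves
L = cholesky(A, lower=True)
assert np.allclose(L @ L.T, A)
b = np.array([1.0, 2.0])
y = solve_triangular(L, b, lower=True)      # L y = b
x = solve_triangular(L.T, y, lower=False)   # L' x = y
assert np.allclose(A @ x, b)
```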

An idempotent matrix $A$ is one that is equal to its square, that is, $A^2 = A$. All of the idempotent matrices we shall encounter are symmetric, though idempotent matrices can be either symmetric or not.

• Properties: let $A$ be a symmetric idempotent matrix,
  a. $I - A$ is also a symmetric idempotent matrix.
  b. All the eigenvalues of $A$ are 0 or 1.
  c. $rk(A) = tr(A)$.
  d. $A$ is nonsingular $\iff A = I$.
  Example: Prove a.-d.

• A useful idempotent matrix
  Recall the solution of the least squares problem, $b = (X'X)^{-1}X'y$, where $X_{n \times K}$ ($n > K$) is of full rank and $y$ is a vector of order $n$. We define the projection matrix $H$ as $H = X(X'X)^{-1}X'$.
  a. Show that $H$ is a symmetric idempotent matrix.
  b. What is $rk(H)$?
  c. Show that $I - H$ is a symmetric idempotent matrix.
  d. What is $rk(I - H)$?
  (A numerical check appears below.)
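A numerical sketch of a.-d. with a random full-rank $X$ (by property c., the rank of $H$ should equal its trace, which works out to $K$):

```python
import numpy as np

rng = np.random.default_rng(6)
n, K = 20, 4
X = rng.normal(size=(n, K))

H = X @ np.linalg.inv(X.T @ X) @ X.T
M = np.eye(n) - H

assert np.allclose(H, H.T) and np.allclose(H @ H, H)   # symmetric idempotent
assert np.allclose(M @ M, M)                           # so is I - H
assert np.isclose(np.trace(H), K)                      # rk(H) = tr(H) = K
assert np.linalg.matrix_rank(H) == K
assert np.linalg.matrix_rank(M) == n - K               # rk(I - H) = n - K
```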

Some results concerning eigenvalues

Let $A$ be a symmetric $n \times n$ matrix with eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$. The associated eigenvectors are denoted by $x_1, x_2, \ldots, x_n$. Let $x$ be a point on the unit sphere, that is, $x'x = 1$.
a. $\max_x\, x'Ax = \lambda_1$, obtained when $x = x_1$.
b. $\min_x\, x'Ax = \lambda_n$, obtained when $x = x_n$.
c. $\max_{x \perp x_1, \ldots, x_k}\, x'Ax = \lambda_{k+1}$, obtained when $x = x_{k+1}$.

2.6 Calculus and Matrix Algebra

Scalar function $f(x)$ of a scalar $x$

A variable $y$ is a function of another variable $x$, say $y = f(x)$, if each value of $x$ is associated with a single value of $y$. In this relationship, $y$ and $x$ are respectively labeled the dependent variable and the independent variable.

Assuming that the function $f(x)$ is continuous and differentiable, we obtain the following derivatives:
$$f'(x) = \frac{dy}{dx}, \quad f''(x) = \frac{d^2y}{dx^2},$$
and so on.

• The addition rule, product rule, quotient rule, etc.

• The chain rule:
  If $h(x) = (g \circ f)(x) = g(f(x))$, then $h'(x) = g'(f(x))\, f'(x)$.

Example: If $h(x) = \sin(2x)\, e^{\cos(x^2)}$, what is $h'(x)$? What is $h''(x)$?

Scalar function $f(\mathbf{x})$ of a vector $\mathbf{x}$

We can regard a function $y = f(x_1, x_2, \ldots, x_n)$ as a scalar-valued function of a vector, that is, $y = f(\mathbf{x})$.

• The vector of partial derivatives, or gradient vector, or simply gradient, is
  $$\frac{\partial f(\mathbf{x})}{\partial \mathbf{x}} = \begin{pmatrix} \partial y / \partial x_1 \\ \partial y / \partial x_2 \\ \vdots \\ \partial y / \partial x_n \end{pmatrix}.$$
  Also we have
  $$\frac{\partial f(\mathbf{x})}{\partial \mathbf{x}'} = \left( \frac{\partial y}{\partial x_1}, \frac{\partial y}{\partial x_2}, \ldots, \frac{\partial y}{\partial x_n} \right).$$

For a matrix $A_{m \times n} = [a_{ij}]$, the derivative is defined as
$$\frac{\partial f(A)}{\partial A} = \begin{pmatrix} \partial f(A)/\partial a_{11} & \partial f(A)/\partial a_{12} & \cdots & \partial f(A)/\partial a_{1n} \\ \partial f(A)/\partial a_{21} & \partial f(A)/\partial a_{22} & \cdots & \partial f(A)/\partial a_{2n} \\ \vdots & \vdots & & \vdots \\ \partial f(A)/\partial a_{m1} & \partial f(A)/\partial a_{m2} & \cdots & \partial f(A)/\partial a_{mn} \end{pmatrix}.$$

Note: The shape of the derivative is determined by the denominator of the derivative.

• Commonly used derivatives

  i) $f(\mathbf{x}) = \mathbf{x}'y$, then $\dfrac{\partial f(\mathbf{x})}{\partial \mathbf{x}} = y$.

  ii) $f(B) = \mathbf{x}'By$, where the vector $\mathbf{x}$ is of order $m$, $y$ is of order $n$, and the matrix $B$ is an $m \times n$ matrix, then $\dfrac{\partial f(B)}{\partial B} = \mathbf{x}y'$.

  iii) $f(\mathbf{x}) = \mathbf{x}'By$, then $\dfrac{\partial f(\mathbf{x})}{\partial \mathbf{x}} = By$.

  iv) $f(y) = \mathbf{x}'By$, then $\dfrac{\partial f(y)}{\partial y} = B'\mathbf{x}$.

  v) $f(\mathbf{x}) = \mathbf{x}'A\mathbf{x}$, where the vector $\mathbf{x}$ is of order $n$ and the matrix $A$ is an $n \times n$ matrix, then $\dfrac{\partial f(\mathbf{x})}{\partial \mathbf{x}} = (A + A')\mathbf{x}$.
     In particular, when $A$ is symmetric, $\dfrac{\partial f(\mathbf{x})}{\partial \mathbf{x}} = 2A\mathbf{x}$.
     Example: Let $A = \begin{pmatrix} 1 & 3 \\ 3 & 4 \end{pmatrix}$; show that $\dfrac{\partial (\mathbf{x}'A\mathbf{x})}{\partial \mathbf{x}} = 2A\mathbf{x}$.

  vi) $f(A) = \log|A|$, then $\dfrac{\partial f(A)}{\partial A} = (A')^{-1} = (A^{-1})'$, which comes from the fact that $\dfrac{\partial |A|}{\partial a_{ij}} = (-1)^{i+j}|A_{ij}|$ (the cofactor expansion) and from the chain rule.
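Identities like v) can be checked against finite differences; a sketch using the matrix from the example above (the test point and step size are arbitrary):

```python
import numpy as np

A = np.array([[1.0, 3.0],
              [3.0, 4.0]])            # symmetric

def f(x):
    return x @ A @ x                  # f(x) = x'Ax

def num_grad(f, x, h=1e-6):
    g = np.zeros_like(x)
    for i in range(x.size):           # central differences, coordinate-wise
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

x = np.array([0.7, -1.2])
assert np.allclose(num_grad(f, x), 2 * A @ x, atol=1e-5)   # d(x'Ax)/dx = 2Ax
```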

The vector function $\mathbf{f}(\mathbf{x})$ of a vector $\mathbf{x}$

• Let $\mathbf{f}(\mathbf{x}) = \begin{pmatrix} f_1(\mathbf{x}) \\ \vdots \\ f_m(\mathbf{x}) \end{pmatrix}$, where $\mathbf{x}$ is a vector of order $n$. Then
  $$\frac{\partial \mathbf{f}(\mathbf{x})}{\partial \mathbf{x}'} = \begin{pmatrix} \dfrac{\partial f_1(\mathbf{x})}{\partial x_1} & \cdots & \dfrac{\partial f_1(\mathbf{x})}{\partial x_n} \\ \vdots & & \vdots \\ \dfrac{\partial f_m(\mathbf{x})}{\partial x_1} & \cdots & \dfrac{\partial f_m(\mathbf{x})}{\partial x_n} \end{pmatrix}.$$
  More commonly, this matrix $\dfrac{\partial \mathbf{f}(\mathbf{x})}{\partial \mathbf{x}'}$ is called the Jacobian matrix. The Jacobian determinant is the determinant of the Jacobian matrix when $m = n$.

• Hessian matrix

  If we take the derivative of the gradient $\dfrac{\partial f(\mathbf{x})}{\partial \mathbf{x}}$, then a second-derivative matrix, the Hessian matrix, is computed as
  $$H = \begin{pmatrix} \dfrac{\partial^2 f(\mathbf{x})}{\partial x_1^2} & \dfrac{\partial^2 f(\mathbf{x})}{\partial x_1 \partial x_2} & \cdots & \dfrac{\partial^2 f(\mathbf{x})}{\partial x_1 \partial x_n} \\ \dfrac{\partial^2 f(\mathbf{x})}{\partial x_2 \partial x_1} & \dfrac{\partial^2 f(\mathbf{x})}{\partial x_2^2} & \cdots & \dfrac{\partial^2 f(\mathbf{x})}{\partial x_2 \partial x_n} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial^2 f(\mathbf{x})}{\partial x_n \partial x_1} & \dfrac{\partial^2 f(\mathbf{x})}{\partial x_n \partial x_2} & \cdots & \dfrac{\partial^2 f(\mathbf{x})}{\partial x_n^2} \end{pmatrix}.$$

  In general, $H$ is a square, symmetric matrix. (The symmetry is obtained for continuous and continuously differentiable functions from Young's theorem.)

• The chain rule:

  Let $h(\mathbf{x}) = g(\mathbf{f}(\mathbf{x}))$, where $\mathbf{f}(\mathbf{x}) = \begin{pmatrix} f_1(\mathbf{x}) \\ \vdots \\ f_m(\mathbf{x}) \end{pmatrix}$. Then
  $$\frac{\partial h(\mathbf{x})}{\partial \mathbf{x}'} = \frac{\partial g(\mathbf{f}(\mathbf{x}))}{\partial \mathbf{f}'} \, \frac{\partial \mathbf{f}(\mathbf{x})}{\partial \mathbf{x}'}.$$

Example 1: Consider the problem $\max_{\mathbf{x}} R = \mathbf{a}'\mathbf{x} - \mathbf{x}'A\mathbf{x}$, where
$$\mathbf{a} = \begin{pmatrix} 5 \\ 4 \end{pmatrix} \quad \text{and} \quad A = \begin{pmatrix} 3 & 3 \\ 3 & 5 \end{pmatrix}.$$

Example 2: The least squares problem. Suppose we have $y = Xb + e$, where $y$ and $e$ are both vectors of order $n$, $b$ is a vector of order $K$, and $X$ is thus an $n \times K$ matrix. Solve for $b = \arg\min e'e = \arg\min (y - Xb)'(y - Xb)$.
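A sketch solving Example 1 with the derivative rules above: the first-order condition is $\partial R / \partial \mathbf{x} = \mathbf{a} - 2A\mathbf{x} = 0$, so $\mathbf{x}^* = \tfrac{1}{2}A^{-1}\mathbf{a}$, and the Hessian $-2A$ is negative definite since $A$ is positive definite, so this is a maximum:

```python
import numpy as np

a = np.array([5.0, 4.0])
A = np.array([[3.0, 3.0],
              [3.0, 5.0]])

x_star = 0.5 * np.linalg.solve(A, a)       # from a - 2Ax = 0
grad = a - 2 * A @ x_star                  # gradient at the optimum
assert np.allclose(grad, 0)

# A is positive definite (all eigenvalues > 0), so -2A is negative definite
assert np.all(np.linalg.eigvalsh(A) > 0)

R = a @ x_star - x_star @ A @ x_star
print(x_star, R)                           # x* = (13/12, -1/4)
```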