PS699: STATISTICAL METHODS II Quick Reviews of Matrix Algebra & Basic Calculus

1. Scalar:
a) Definition: a plain-old, ordinary, every-day, regular number
b) Standard Notation: usu. plain or italic lower-case (Greek or Roman): a, α, b, β; a, α, b, β
c) Examples:
(1) a = 7, α = 7.3, b = 4/5, β = .77378, aj=3, aij=4.125
(2) US Population on 1/4/60 at noon, Number of countries in Africa on 1/12/97, etc.
2. (Column) Vector:
a) Definition: a vertical array of numbers
b) Standard Notation:
(1) Usually written in bold, lower-case, Roman or Greek letters: a, b, β
(2) Sometimes written as lower-case letter with tilde (~), line (–), or caret/hat (^), under or over
(3) Occasionally omitted once clearly established what’s scalar, vector, matrix, function, etc.
c) Examples:
3. Row Vector:
a) Definition: a horizontal array of numbers;
b) Standard Notation: as column vector, except w/ prime ( a′ ) or superscript T ( a^T )
c) Examples: $\mathbf{a}' = [1\ \ 3\ \ 5],\ a'_2 = a'_{12} = 3$; $\boldsymbol{\beta}' = [0\ \ 1]$; $\mathbf{b}^T = [1.4\ \ {-2}\ \ .0035],\ b^T_2 = b^T_{12} = -2$
(1) NOTE: since b12 and the like are scalars, sometimes written italics, like b12
4. Matrix:
a) Definition: a rectangular array of numbers
(1) Thus, can view as a set of row vectors stacked vertically or a set of column vectors concatenated horizontally.
(2) Usually some substantive interpretation to rows and columns (e.g., see data matrix below).
b) Standard Notation:
(1) Usually denoted by upper-case letter, Greek or Roman, usually in bold: A, Σ, B, etc.;
(2) Can also refer to matrix as the set, { }, of its elements, e.g., $\mathbf{A}=\{a_{ij}\}$, or by referencing a characteristic element of the matrix, e.g., $[a_{ij}]=\mathbf{A}$.
(3) Vertical, horizontal, and/or diagonal ellipses often used to denote generic ranges of elements or repeated elements of a particular defined matrix (see below).
c) Examples:
$$\mathbf{A}_{2\times 2} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}; \qquad \mathbf{X}_{5\times(k+1)} = \begin{bmatrix} x_{10} & x_{11} & x_{12} & \cdots & x_{1k} \\ x_{20} & x_{21} & x_{22} & \cdots & x_{2k} \\ x_{30} & x_{31} & x_{32} & \cdots & x_{3k} \\ x_{40} & x_{41} & x_{42} & \cdots & x_{4k} \\ x_{50} & x_{51} & x_{52} & \cdots & x_{5k} \end{bmatrix}$$
d) NOTE: scalars and vectors are subset/special-cases (1×1 and k×1 or 1×k) of matrices.
5. Data Matrix: a matrix of data (duh).
a) Each column is a variable, usu. indexed j=1…k or j=0…k, or k=1…K or k=0…K (& therein lies no small amount of confusion!)
b) Each row is an observation on (each & all of) those variables, usu. i=1…n or i=1…N.
6. Elements of a matrix: the scalars that comprise a matrix.
a) Thus, e.g., in a Data Matrix each element is one observation on one variable.
b) Since each element is a scalar, we can write it as such, e.g.: a, α36, etc. (see above).
c) Elements’ positions or Coordinates in a matrix indexed by subscripts, usu. i for row, j or k for column: aij or aik, sometimes separated by comma: ai,j or ai,k (see above).
d) Can refer to elements by their position such as “the ijth element of B”; in a data matrix, this might be said “the ith observation” or “the nth observation on k.”
7. (Prime) Diagonal of a Matrix: formally, the set of elements aij where i=j; informally, those on the diagonal from top-left to bottom-right of the matrix.
a) The off-diagonal runs bottom-left to top-right, less-often substantively important, so term often used generically for any element off the (prime) diagonal.
b) First-minor diagonal: the elements on the diagonal just below (lower first-minor) or just above (upper first-minor) the prime diagonal. Terms second-minor and so on also exist.
8. Dimensions of a matrix: number of rows (usu. n) and columns (usu. k)
a) Written n×k; read “n by k” as in “X is an n by k matrix...” (rows first, then columns: “Roman Catholic”)
b) NOTE: column vectors are n×1 matrices, row vectors 1×k matrices. (So, scalars are?)
c) Examples: from above, A, B, Ω are each 2×2; X is 5×(k+1); b is 3×1, and b' is 1×3.
9. Special Types of Matrices: (the types are ordered as nested special cases)
a) Square Matrix: Matrix w/ n=k, i.e., w/ equal # rows & columns. E.g.: A, B, Ω, Σ, and a 4×4 matrix Z.
b) Symmetric Matrix: formally, square matrix A w/ aij=aji, or, equivalently, matrix A such that A=AT (see transposition below); informally, matrix w/ elements above prime diagonal a “reflection” of those below it. Matrix appears “mirrored” about its diagonal.
(1) Variance-Covariance Matrices are symmetric (must be; why?); in fact, any matrix premultiplied by its own transpose, such as X'X, is symmetric (must be; why?).
(2) Examples:
$$\begin{bmatrix} 1 & .5 & 4 & 1.97 \\ .5 & .25 & .1 & 1.07 \\ 4 & .1 & 0 & 1.11 \\ 1.97 & 1.07 & 1.11 & 16 \end{bmatrix}_{(4\times 4)}; \qquad V(\boldsymbol{\varepsilon}) = E\!\left[(\boldsymbol{\varepsilon}-E\boldsymbol{\varepsilon})(\boldsymbol{\varepsilon}-E\boldsymbol{\varepsilon})'\right] = E[\boldsymbol{\varepsilon}\boldsymbol{\varepsilon}'] = \begin{bmatrix} E(\varepsilon_1\varepsilon_1) & E(\varepsilon_1\varepsilon_2) & E(\varepsilon_1\varepsilon_3) & E(\varepsilon_1\varepsilon_4) \\ E(\varepsilon_2\varepsilon_1) & E(\varepsilon_2\varepsilon_2) & E(\varepsilon_2\varepsilon_3) & E(\varepsilon_2\varepsilon_4) \\ E(\varepsilon_3\varepsilon_1) & E(\varepsilon_3\varepsilon_2) & E(\varepsilon_3\varepsilon_3) & E(\varepsilon_3\varepsilon_4) \\ E(\varepsilon_4\varepsilon_1) & E(\varepsilon_4\varepsilon_2) & E(\varepsilon_4\varepsilon_3) & E(\varepsilon_4\varepsilon_4) \end{bmatrix}_{(4\times 4)} \equiv \{E(\varepsilon_i\varepsilon_j)\}$$
c) Diagonal Matrix: formally, symmetric matrix w/ aij=0 ∀ i≠j; informally, symmetric matrix w/ only its diagonal elements (possibly) non-zero. Example: Pure heteroskedastic error v-cov matrices are symmetric (as all v-cov) & diagonal but not scalar matrices (see below):
$$V(\boldsymbol{\varepsilon}) = \begin{bmatrix} \sigma_1^2 & 0 & 0 & 0 \\ 0 & \sigma_2^2 & 0 & 0 \\ 0 & 0 & \sigma_3^2 & 0 \\ 0 & 0 & 0 & \sigma_4^2 \end{bmatrix}_{(4\times 4)}$$
d) Scalar Matrix: formally, diagonal matrix with aii=ajj ∀ i,j; informally, diagonal matrix w/ diagonal elements all the same number (scalar). Example: Homoskedastic error v-cov mats are symmetric, diagonal, and scalar:
$$V(\boldsymbol{\varepsilon}) = \sigma^2\mathbf{I}_4 = \begin{bmatrix} \sigma^2 & 0 & 0 & 0 \\ 0 & \sigma^2 & 0 & 0 \\ 0 & 0 & \sigma^2 & 0 \\ 0 & 0 & 0 & \sigma^2 \end{bmatrix}_{(4\times 4)}$$
e) (Multiplicative) Identity Matrix:
(1) Definition: formally, matrix w/ aii=1 ∀ i & aij=0 ∀ i≠j; informally, scalar matrix w/ scalar of 1, i.e., ones on diagonal & zeros elsewhere. (multiplicative usu. omitted b/c additive identity matrix—a matrix with all elements equal to zero—rarely used)
(2) Standard Notation: usu. I, often w/ its dim (symmetric so 1 dim suffices) subscripted, In;
$$\mathbf{I}_4 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$
(a) NOTE: i is column-vector of ones; not to be confused w/ an identity vector of any sort. In fact, inner-product (see below) i'x is the sum of the elements of x. Its dim. also often sub’d.:
$$\mathbf{i}_4'\,\mathbf{x}_{4\times 1} = \sum_{j=1}^{4} x_j;\ \text{e.g., } \begin{bmatrix}1 & 1 & 1 & 1\end{bmatrix}\begin{bmatrix}1\\3\\5\\7\end{bmatrix} = 1+3+5+7 = 16$$
(b) NOTE: What’s the additive-identity scalar? Multiplicative-identity scalar? So, I is, colloquially, like “the matrix equivalent of 1,” and you will often hear me call it that.
f) Triangular Matrix: formally, square matrix w/ aij=0 ∀ i>j or aij=0 ∀ j>i; informally, square matrix w/ all zeros either above or below the diagonal. If all-zeros above, then lower-triangular; if all-zeros below, then upper-triangular. (Diagonal can be anything.)
$$\mathbf{T}_{lower} = \begin{bmatrix} 1.5 & 0 & 0 & 0 \\ 3 & 2.2 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 4 & 4 & 2 & 1.12 \end{bmatrix}; \qquad \mathbf{T}_{upper} = \begin{bmatrix} 0 & 1.5 & 1 & 1 \\ 0 & 0 & 3.5 & 2 \\ 0 & 0 & 0 & 5.5 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$
B. Operations: transpose, add (subtract), multiply, inverse.
1. Comparing Matrices: Generally, compare matrices element by element.
a) Equality & Inequality: element-by-element comparison; matrices are equal iff every element in one equals the corresponding element in the other, which implies, e.g., that they must have the same dimensions. All else: unequal. Formally, A=B ⇔ aij=bij ∀ i,j; else: A≠B.
$$\mathbf{A} = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix};\ \mathbf{B} = \begin{bmatrix} 1 & 3 \\ 4 & 2 \end{bmatrix};\ \mathbf{C} = \begin{bmatrix} 1 & 4 \\ 3 & 2 \end{bmatrix};\ \mathbf{D} = \begin{bmatrix} 1 & 4 \\ 3 & 2 \end{bmatrix};\qquad \mathbf{A}\neq\mathbf{B}\neq\mathbf{C}=\mathbf{D}$$
b) Greater & Less Than: also element-by-element comparison: A >, <, ≥, ≤ B ⇔ aij >, <, ≥, ≤ bij ∀ i,j; else: A ≯, ≮, ≱, ≰ B.
$$\mathbf{A} = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix};\ \mathbf{B} = \begin{bmatrix} 1 & 2 \\ 3 & 6 \end{bmatrix};\ \mathbf{C} = \begin{bmatrix} 0 & 3 \\ 2 & 1 \end{bmatrix};\ \mathbf{D} = \begin{bmatrix} 1 & 4 \\ 3 & 2 \end{bmatrix};\qquad \mathbf{A}\le\mathbf{B}\ (\mathbf{B}\ge\mathbf{A});\ \ \mathbf{C}<\mathbf{D}\ (\mathbf{D}>\mathbf{C})$$
c) Matrix positive⇔all elements >0, negative ⇔ all elements <0, weakly positive (non-negative) ⇔ ≥0, weakly negative (non-positive) ⇔ ≤0; all these are also “for all elements, else not”.
d) Positive & negative definite; positive & negative semi-definite. Note: A must be (square and) symmetric to be any of these things.
(1) A is positive definite ⇔ x'Ax>0 ∀x≠0. (Terminology: quadratic form = $\mathbf{x}'\mathbf{A}\mathbf{x} = \sum_{i=1}^{n}\sum_{j=1}^{n} x_i x_j a_{ij}$.)
(2) A is negative definite ⇔ x'Ax<0 ∀x≠0.
(3) A is positive semi-definite ⇔ x'Ax≥0 ∀x≠0. (a.k.a. non-negative definite)
(4) A is negative semi-definite ⇔ x'Ax≤0 ∀x≠0. (a.k.a. non-positive definite)
(5) If none of the above: indefinite. So what?
(a) If A definite, then |A|≠0, and so A-1 exists. (What’s |A|? What’s A-1? So what? See below.)
(i) If A (positive/negative) (semi-)definite, then |A|≠0 (>,<,≥,≤0, respectively, intuitively).
(b) Many regression quantities of interest have form x'Ax. Examples:
(i) With x a vector of coefficient estimates, and A their estimated variance-covariance matrix, x'Ax is the estimated variance of the sum of those coefficients. Variance-covariance matrices ∴ must be positive definite.
(ii) With x a vector of variable values and b their associated coefficients, $V(\mathbf{x}'\mathbf{b}) = V(\hat{y}) = \mathbf{x}'V(\mathbf{b})\mathbf{x}$; again, the matrix V(b), the variance-covariance matrix of b, ∴ must be positive definite.
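A minimal numpy sketch (added for illustration; the matrix and data are invented) of what these definitions mean in practice, checking definiteness both by evaluating quadratic forms and by inspecting eigenvalues:

```python
import numpy as np

# A hypothetical symmetric matrix (think: an estimated v-cov matrix); values invented.
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])

# Check 1: quadratic forms x'Ax should be > 0 for nonzero x if A is positive definite.
rng = np.random.default_rng(0)
quad_forms = [x @ A @ x for x in rng.normal(size=(1000, 2))]
print(min(quad_forms) > 0)              # True (suggestive, not a proof)

# Check 2: a symmetric A is positive definite iff all its eigenvalues are > 0.
eigenvalues = np.linalg.eigvalsh(A)
print(np.all(eigenvalues > 0))          # True

# Consequence: |A| != 0, so the inverse exists.
A_inv = np.linalg.inv(A)
print(np.linalg.det(A) != 0, np.allclose(A @ A_inv, np.eye(2)))   # True True
```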
2. Transposition: Intuitively: Flip matrix along its axis, making each column into the corresponding row (column one becomes row one, etc.) & each row into col.
a) Standard Notation: A' or AT; note: (col.) vectors transpose to row vectors: a' or aT & v.v.
b) Formal Definition: Z≡X' ⇔ zij=xji ∀ i,j; i.e., transposition swaps each element’s index.
$$\begin{bmatrix} 1 & 3 & 5 \\ 2 & 4 & 6 \end{bmatrix}' = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix}\ (2\times 3 \to 3\times 2);\qquad \begin{bmatrix} 1 & 2 & 3 & 4 \\ 5 & 6 & 7 & 8 \end{bmatrix}' = \begin{bmatrix} 1 & 5 \\ 2 & 6 \\ 3 & 7 \\ 4 & 8 \end{bmatrix}\ (2\times 4 \to 4\times 2)$$
3. Matrix Addition (& Subtraction):
a) Matrices add (& subtract) element-by-element; matrices must have the same dimensions to be conformable for addition: A & B conformable for addition ⇔ dim(A)=dim(B).
(1) Exception 1: may add scalar to anything, element-by-element: b+A=[aij+b]; example:
(2) Exception 2: some may allow adding a vector to a matrix w/ same number of columns or of rows, row-by-row or column-by-column.
b) Formally, adding or subtracting element by element A±B≡[aij±bij]; examples:
(a) Exception 1: $b+\mathbf{A}\equiv[\,b+a_{ij}\,]$; such as: $2+\begin{bmatrix}1&3\\2&4\end{bmatrix}=\begin{bmatrix}2+1&2+3\\2+2&2+4\end{bmatrix}=\begin{bmatrix}3&5\\4&6\end{bmatrix}$
(b) Exception 2: some scholars may allow (n×1)+(n×k) and (1×m)+(n×m) as (column- and row-)conformable, yielding, with $\mathbf{A}=\begin{bmatrix}1&3\\2&4\end{bmatrix}$ and $\mathbf{a}'=[1\ \ 2]$: $\mathbf{a}'+\mathbf{A}=\begin{bmatrix}2&5\\3&6\end{bmatrix}$ (added to each row) and $\mathbf{a}+\mathbf{A}=\begin{bmatrix}2&4\\4&6\end{bmatrix}$ (added to each column).
(2) More examples follow on next slide...
c) Subtraction is defined as multiplying the latter matrix by -1 and then adding, which means it is also conducted element-by-element.
d) The matrix of 0’s is the additive identity matrix, 0 or 0n, b/c A+0=A ∀A (assuming conformability).
e) Matrix Addition Properties:
(1) Commutative: A + B = B + A
(a) Proof: A + B = [aij] + [bij] = [aij + bij] = [bij + aij] = [bij] + [aij] = B + A
(2) Associative: (A + B) + C = A + (B + C)
(a) Proof: (A + B) + C = [aij + bij] + [cij] = [aij + bij + cij] = [aij] + [bij + cij] = A + (B + C)
(3) Transposition is Distributive over Addition: (A + B)' = A' + B' (prove in pset1)
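A quick numpy sketch (mine, not from the notes; the matrices are invented) of element-by-element addition and the two “exceptions” above; numpy’s broadcasting happens to implement both the scalar and the row/column-vector cases:

```python
import numpy as np

A = np.array([[1, 3],
              [2, 4]])
B = np.array([[1, 2],
              [3, 4]])

print(A + B)            # element-by-element sum
print(2 + A)            # Exception 1, scalar + matrix: [[3 5] [4 6]]

a_row = np.array([1, 2])          # 1x2 row vector
a_col = np.array([[1], [2]])      # 2x1 column vector
print(a_row + A)        # Exception 2, added to each row:    [[2 5] [3 6]]
print(a_col + A)        # Exception 2, added to each column: [[2 4] [4 6]]

print(A + B == B + A)   # commutativity holds element-by-element: all True
```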
4. Matrix Multiplication (and “Division”):
a) Just like in scalar multiplication, absence of an operation sign implies multiplication.
b) Scalar Multiplication:
(1) You already know all about multiplying two or more scalars.
(2) A scalar times a vector or matrix: scalar multiplies every element.
c) Vector Multiplication: Inner Products
(1) Written $\mathbf{a}'\mathbf{b}$ or $\mathbf{a}\cdot\mathbf{b}$
(2) Formal Definition: $\mathbf{a}'\mathbf{b} = a_1b_1 + a_2b_2 + \cdots + a_nb_n = \sum_{i=1}^{n} a_i b_i$
(3) Properties:
(a) inner-product multiplication is commutative: ′ ′=a b b a
(b) ′a a =“sum of squares”, i.e. sum squared elements of a
(4) Examples:
$$\mathbf{i}'\mathbf{x} = \sum_{j=1}^{4} x_j;\ \text{e.g., } \begin{bmatrix}1 & 1 & 1 & 1\end{bmatrix}\begin{bmatrix}1\\3\\5\\7\end{bmatrix} = 1+3+5+7 = 16$$
d) Matrix Multiplication: first, since vectors are just special cases of matrices, we already know that matrix multiplication must work something like inner products
(1) Formal definition: $\mathbf{C}=\mathbf{A}\mathbf{B} \Leftrightarrow c_{ik} = \sum_{j=1}^{J} a_{ij}b_{jk} = a_{i1}b_{1k} + a_{i2}b_{2k} + a_{i3}b_{3k} + \cdots + a_{iJ}b_{Jk}$. I.e., each $c_{ik}$ element is an inner product of row i of A and column k of B.
(2) ⇒ for two matrices to be conformable for multiplication, number of columns in first must equal number of rows in second (k=m in their dimensions, n×k, m×q); & result will be n×q.
(3) An Informal Recipe for Solutions to Matrix Multiplication Problems
(a) Start by noting dimensions of the matrices to be multiplied: (a × b)(c × d).
(i) first, b must equal c; if not, the matrices are not conformable and it can’t be done so you’re done
(ii) second, if b=c, then the matrix solution will be (a × d). Draw an (a × d) matrix box for the answer.
(b) Then, the ijth element of the solution is the inner product of the ith row of the first matrix and the jth column of the second. Fill in the answer element-by-element.
(4) Examples:
(a) From the definition, can see why any matrix premultiplied by its own transpose (e.g., X'X) is symmetric: element ij and element ji are just the same set of scalar multiplications in reverse order, x1'x2 v. x2'x1, as {12} v. {21}.
(b) Note if x & y are mean-deviated, i.e., x−x̄ & y−ȳ, then the 1st example, x'y, is the covariation of (x,y), & the 2nd example, X'X, is the variation-covariation matrix of the columns of X.
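A small numpy sketch (added; the data values are invented) of the conformability rule and the row-by-column recipe, plus the symmetry of X'X:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])          # 2x3
B = np.array([[1, 0],
              [0, 1],
              [2, 2]])             # 3x2

# (2x3)(3x2) -> 2x2; the inner dimensions (3 and 3) must match.
C = A @ B
print(C.shape)                     # (2, 2)

# The ik-th element is the inner product of row i of A and column k of B.
print(C[0, 1] == A[0, :] @ B[:, 1])        # True

# X'X is symmetric: elements (j,k) and (k,j) are the same inner product of columns.
X = np.array([[1.0, 2.0],
              [1.0, 0.5],
              [1.0, 3.0]])
print(np.allclose(X.T @ X, (X.T @ X).T))   # True
```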
(5) Some Properties and Facts (assuming conformability):
(a) When (n×k) is multiplied times (k×m), you get an (n×m): (n×k)(k×m)⇒(n×m)
(b) Pre-Multiplication and Post-Multiplication are different and may not both exist!
(i) B pre-multiplied by A is AB.
(ii) B post-multiplied by A is BA.
(c) Not Commutative: AB does not necessarily equal BA (may not even both exist). EXAMPLES:
(d) Associative: (AB)C = A(BC). EXAMPLES
(e) Distributive: A(B+C)=AB+AC (not commutative, so order matters; BA & CA may not even exist): e.g…
(f) Identity: AI = IA = A.
(g) Transposition is Distributive in Reverse Order: (AB)' = B'A' …a worthwhile exercise to prove ;)
5. More Terms & Operations…
a) Idempotent: Matrix A is Idempotent iff AA=A (implies A must be square); if A is symmetric, then A'A and A'A' and AA' are also A (since A'= A). E.g., I is idempotent.
b) Trace: the sum of the diagonal elements of a square matrix, trace(A) = Σi aii. Some useful prop’s; e.g., can cycle matrices inside from back to front or v.v.: trace(ABC)=trace(BCA)=trace(CAB).
c) Kronecker Product: post-multiply each element of the first matrix by the entire second matrix, (k×l)⊗(m×n)=(km×ln), like so:
$$\mathbf{A}\otimes\mathbf{B} = \begin{bmatrix} a_{11}\mathbf{B} & \cdots & a_{1l}\mathbf{B} \\ \vdots & \ddots & \vdots \\ a_{k1}\mathbf{B} & \cdots & a_{kl}\mathbf{B} \end{bmatrix}; \qquad \text{e.g., } \mathbf{i}\otimes\mathbf{x} \text{ stacks copies of } \mathbf{x}.$$
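A tiny numpy illustration (added; matrices invented) of the Kronecker product, the dimension rule, and the stacking use of i⊗x:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])        # 2x2
i = np.ones((3, 1))           # 3x1 vector of ones

K = np.kron(A, i)             # each element of A multiplied by the whole of i
print(K.shape)                # (6, 2) = (2*3, 2*1)

x = np.array([[5.0], [7.0]])  # 2x1
print(np.kron(i, x).ravel())  # stacks x three times: [5. 7. 5. 7. 5. 7.]
```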
d) Three central regression matrices: (n.b., M & N are both symmetric & idempotent)
$$\mathbf{A} \equiv (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}';\quad \mathbf{N} \equiv \mathbf{X}\mathbf{A} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}';\quad \mathbf{M} \equiv \mathbf{I} - \mathbf{N} = \mathbf{I} - \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$$
$$\mathbf{A}\mathbf{y} = \mathbf{b}_{LS};\qquad \mathbf{N}\mathbf{y} = \hat{\mathbf{y}}_{LS};\qquad \mathbf{M}\mathbf{y} = \mathbf{e}_{LS}$$
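A numpy sketch (illustrative only, with made-up data) verifying that N and M are symmetric and idempotent and that Ay, Ny, and My return the LS coefficients, fitted values, and residuals:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])   # n x (k+1), with constant
y = rng.normal(size=n)

A = np.linalg.inv(X.T @ X) @ X.T       # (X'X)^{-1} X'
N = X @ A                              # projection ("hat") matrix
M = np.eye(n) - N                      # "residual-maker"

print(np.allclose(N, N.T), np.allclose(N @ N, N))   # symmetric, idempotent: True True
print(np.allclose(M, M.T), np.allclose(M @ M, M))   # True True

b_ls = A @ y                           # LS coefficients
y_hat = N @ y                          # fitted values
e_ls = M @ y                           # residuals
print(np.allclose(b_ls, np.linalg.lstsq(X, y, rcond=None)[0]))  # True
print(np.allclose(y_hat + e_ls, y))                             # True
```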
e) Writing Sets of Equations in Matrix Notation (big part of The Whole Point, really. Einstein once noted all advancement in mathematics is advance in notation.)
$$\begin{aligned}
y_1 &= \beta_0 + \beta_1 x_{11} + \beta_2 x_{12} + \beta_3 x_{13} + \cdots + \beta_k x_{1k} + \varepsilon_1 \\
y_2 &= \beta_0 + \beta_1 x_{21} + \beta_2 x_{22} + \beta_3 x_{23} + \cdots + \beta_k x_{2k} + \varepsilon_2 \\
y_3 &= \beta_0 + \beta_1 x_{31} + \beta_2 x_{32} + \beta_3 x_{33} + \cdots + \beta_k x_{3k} + \varepsilon_3 \\
&\;\;\vdots \\
y_n &= \beta_0 + \beta_1 x_{n1} + \beta_2 x_{n2} + \beta_3 x_{n3} + \cdots + \beta_k x_{nk} + \varepsilon_n
\end{aligned}$$
$$\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} 1 & x_{11} & x_{12} & x_{13} & \cdots & x_{1k} \\ 1 & x_{21} & x_{22} & x_{23} & \cdots & x_{2k} \\ 1 & x_{31} & x_{32} & x_{33} & \cdots & x_{3k} \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & x_{n2} & x_{n3} & \cdots & x_{nk} \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{bmatrix} + \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \vdots \\ \varepsilon_n \end{bmatrix}$$
$$\underset{n\times 1}{\mathbf{y}} = \underset{n\times(k+1)}{\mathbf{X}}\;\underset{(k+1)\times 1}{\boldsymbol{\beta}} + \underset{n\times 1}{\boldsymbol{\varepsilon}}; \qquad \text{and, for the estimated model, } \mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e}$$
f) Linear Independence, Rank, & Determinants:
(1) Rank: Rank of a matrix is # of linearly independent columns. Full column-rank: all columns linearly independent. Why care?
(a) Rule #1: Can’t have more dimensions (higher rank) than columns; recall, each column contains coordinates of a point. Can’t have more dimensions than points.
(b) Rule #2: Can’t have more dimensions (higher rank) than rows; recall, each row provides the actual values for those coordinates. Can’t have more dim’s than you give values to.
(c) Rules 1 & 2 Rule #3: Rank(A) ≤min{rows(A),cols(A)}
(d) Rule #4: Rank(AB) ≤ min{Rank(A),Rank(B)}: since the columns of AB are just linear combinations of the columns of A (and its rows linear combinations of the rows of B), multiplication can’t create any new linearly independent information. Thus, AB can span no more dimensions (provide no more information) than the lesser of the two, and could provide less if the combination introduces additional linear dependencies.
(2) Determinant: How to check whether a matrix is of full column-rank? If it is not, its determinant is zero, so the matrix inverse will not exist and the matrix is said to be singular (see below).
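An illustrative numpy check (not in the original; the matrices are constructed deliberately) of the link between deficient column-rank, a zero determinant, and singularity, plus Rank Rule #4:

```python
import numpy as np

# Second column is exactly twice the first -> columns are linearly dependent.
A = np.array([[1.0, 2.0],
              [2.0, 4.0]])

print(np.linalg.matrix_rank(A))   # 1: not full column-rank
print(np.linalg.det(A))           # 0.0: the determinant collapses to zero

try:
    np.linalg.inv(A)              # the inverse does not exist
except np.linalg.LinAlgError:
    print("A is singular; no inverse")

# Rank Rule #4: rank(AB) <= min(rank(A), rank(B))
B = np.eye(2)
print(np.linalg.matrix_rank(A @ B) <= min(np.linalg.matrix_rank(A),
                                          np.linalg.matrix_rank(B)))   # True
```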
(3) Something approaching an intuition for why |A|=0 if not full-column rank…
(4) Some special properties of determinants of diagonal matrices (only), D or C:
$$|\mathbf{D}| = \prod_{i=1}^{n} d_{ii}\ \text{(the product of its diagonal elements)};\qquad |\mathbf{D}^{-1}| = 1/|\mathbf{D}|$$
(5) Opposite the perfect collinearity that collapses determinants (hypervolumes) to zero is…
g) Orthogonality: vectors a and b are orthogonal ⇔ a ⊥ b ⇔ a'b = 0.
(1) Why? $\mathbf{a}'\mathbf{b} = \sum_i a_i b_i$, so the more a & b “go together” (positively), the larger this is; the more they relate oppositely (negatively), the more negative. If no relation, i.e., orthogonal, then 0.
h) Orthogonal Projections: Suppose now we have 2 dimensions of information & we’d like to summarize it in 1 dimension. Or, suppose we have some y and we’d like to express it as nearly as possible using some other vector of information x (perhaps x is more readily available), but we want to keep everything linear (because curved lines give us a headache). We want to linearly rescale the info in x to get as close as possible to y:
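A compact sketch of that projection/minimization argument (a standard derivation, included here for reference; notation follows the notes):
$$\min_b\ (\mathbf{y}-\mathbf{x}b)'(\mathbf{y}-\mathbf{x}b) = \mathbf{y}'\mathbf{y} - 2b\,\mathbf{x}'\mathbf{y} + b^2\,\mathbf{x}'\mathbf{x}$$
$$\frac{d}{db}:\ -2\,\mathbf{x}'\mathbf{y} + 2b\,\mathbf{x}'\mathbf{x} = 0 \;\Longleftrightarrow\; \mathbf{x}'(\mathbf{y}-\mathbf{x}b) = 0 \;\Longrightarrow\; b = (\mathbf{x}'\mathbf{x})^{-1}\mathbf{x}'\mathbf{y} = \frac{\sum_i x_i y_i}{\sum_i x_i^2},$$
i.e., b is chosen so that the residual y − xb is orthogonal to x.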
(1) Don’t look now, but we just derived the least-squares formula for b in the bivariate case. Unfortunately, this particular solution only works when x'x is a scalar. (When will and won’t it be?) Suppose, instead, we have X'X, a matrix; how could we solve the analogous problem?
(2) That is, suppose that now we have k columns of information in X (i.e., X is of full rank k) and that we’d like to use all of this information to get as close as possible to the vector y.
i) DEF: homogeneous equation-system and nonhomogeneous equation-system
(1) A homogeneous system of equations is of the form Ax=0.
(2) A nonhomogeneous system of equations is of the form Ax=b, where b is a nonzero vector.
That (regression as a projection problem; scaling the info in X by b to get as close as possible to y) was so fun, let’s do it again!
Repeating the steps from solving systems of equations thru deriving the LS coefficient formula:
Some things we can already say about A-1:
But, returning to our main goal, we’re looking for B≡A-1 such that BA=I...
j) Matrix Inversion: BA=I=AB ⇔ B=A-1 , A=B-1 (...the following is repeated on next)
k) For a diagonal matrix, D, the inverse is simply:
$$\mathbf{D}^{-1} = \begin{bmatrix} d_{11}^{-1} & 0 & \cdots & 0 \\ 0 & d_{22}^{-1} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & d_{nn}^{-1} \end{bmatrix}$$
i.e., the diagonal matrix of reciprocals of the original diagonal elements.
l) Additional notes on “the updating formula” for how (X'X)-1 changes as one adds rows (i.e., obs.) to X are in Greene A.4.2 (eq’s A-66 & assoc. para.).
Again, what I call the Goldberger matrices (b/c that’s where I met them):
m) Further Useful Topics:
(2) Cholesky decomposition: Any symmetric, positive-definite A is expressible as the product of a lower-triangular, L, & an upper-triangular, L'=U; so A=LU=LL'. Useful for A-1=U-1L-1 & to find “A-1/2”.
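A numpy illustration (added; the matrix is made up): numpy’s `np.linalg.cholesky` returns the lower-triangular factor L with A = LL' (so U = L'), and the inverse of the product reverses the order:

```python
import numpy as np

A = np.array([[4.0, 2.0],
              [2.0, 3.0]])             # symmetric, positive definite

L = np.linalg.cholesky(A)              # lower-triangular factor
U = L.T                                # upper-triangular factor, U = L'
print(np.allclose(A, L @ U))           # True: A = LU = LL'

# A^{-1} = U^{-1} L^{-1}: inverting a product reverses the order.
A_inv = np.linalg.inv(U) @ np.linalg.inv(L)
print(np.allclose(A_inv, np.linalg.inv(A)))            # True

# L' acts as a "square root" of A in quadratic forms: z'Az = (L'z)'(L'z).
z = np.array([1.0, -1.0])
print(np.isclose((L.T @ z) @ (L.T @ z), z @ A @ z))    # True
```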
(3) Working with partitioned matrices… See Greene section A.5.
(4) Partitioned Determinants:
(a) For a block-diagonal matrix: $|\mathbf{A}| = |\mathbf{A}_{11}|\times|\mathbf{A}_{22}|$.
(b) In general: $|\mathbf{A}| = |\mathbf{A}_{22}|\times|\mathbf{A}_{11} - \mathbf{A}_{12}\mathbf{A}_{22}^{-1}\mathbf{A}_{21}| = |\mathbf{A}_{11}|\times|\mathbf{A}_{22} - \mathbf{A}_{21}\mathbf{A}_{11}^{-1}\mathbf{A}_{12}|$.
(5) Partitioned Inverses:
(a) For a diagonal matrix D, as above: D-1 is simply the diagonal matrix of reciprocals of its diagonal elements.
(c) In general:
$$\begin{bmatrix}\mathbf{A}_{11} & \mathbf{A}_{12} \\ \mathbf{A}_{21} & \mathbf{A}_{22}\end{bmatrix}^{-1} = \begin{bmatrix}\mathbf{A}_{11}^{-1}\left(\mathbf{I} + \mathbf{A}_{12}\mathbf{F}_2\mathbf{A}_{21}\mathbf{A}_{11}^{-1}\right) & -\mathbf{A}_{11}^{-1}\mathbf{A}_{12}\mathbf{F}_2 \\ -\mathbf{F}_2\mathbf{A}_{21}\mathbf{A}_{11}^{-1} & \mathbf{F}_2\end{bmatrix},$$
where $\mathbf{F}_2 \equiv \left(\mathbf{A}_{22} - \mathbf{A}_{21}\mathbf{A}_{11}^{-1}\mathbf{A}_{12}\right)^{-1}$ and, analogously, $\mathbf{F}_1 \equiv \left(\mathbf{A}_{11} - \mathbf{A}_{12}\mathbf{A}_{22}^{-1}\mathbf{A}_{21}\right)^{-1}$.
Not that these last “in generals” are so terribly intuitive, but useful to have them at hand…
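Since the “in general” formula is hard to eyeball, here is a throwaway numpy check (with an arbitrary invertible matrix, invented for the purpose) that it reproduces the ordinary inverse:

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.normal(size=(5, 5))
A = M @ M.T + 5 * np.eye(5)        # symmetric positive definite, so safely invertible

# Partition into blocks: A11 (3x3), A12 (3x2), A21 (2x3), A22 (2x2).
A11, A12 = A[:3, :3], A[:3, 3:]
A21, A22 = A[3:, :3], A[3:, 3:]

A11_inv = np.linalg.inv(A11)
F2 = np.linalg.inv(A22 - A21 @ A11_inv @ A12)

top_left  = A11_inv @ (np.eye(3) + A12 @ F2 @ A21 @ A11_inv)
top_right = -A11_inv @ A12 @ F2
bot_left  = -F2 @ A21 @ A11_inv
bot_right = F2

A_inv_blocks = np.block([[top_left, top_right],
                         [bot_left, bot_right]])
print(np.allclose(A_inv_blocks, np.linalg.inv(A)))   # True
```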
(6) Useful sections of Greene, Appendix A (Matrix Algebra) not explicitly or fully covered here:
(a) $\mathbf{X}'\mathbf{X} = \sum_i \mathbf{x}_i\mathbf{x}_i'$; its $kl$th element is the inner product of columns $k$ and $l$ of $\mathbf{X}$, called a cross-product. (I.e., the sum of matrices of products & cross-products: the var-covar matrix.)
(b) A.2.8 very useful symmetric, idempotent mat, M0, which mean-deviates what it multiplies:
$$\mathbf{M}^0 \equiv \mathbf{I}_n - \frac{1}{n}\,\mathbf{i}\,\mathbf{i}' = \begin{bmatrix} 1-\tfrac{1}{n} & -\tfrac{1}{n} & \cdots & -\tfrac{1}{n} \\ -\tfrac{1}{n} & 1-\tfrac{1}{n} & \cdots & -\tfrac{1}{n} \\ \vdots & \vdots & \ddots & \vdots \\ -\tfrac{1}{n} & -\tfrac{1}{n} & \cdots & 1-\tfrac{1}{n} \end{bmatrix}$$
(i) Convince yourself that this symmetric matrix is 1-1/n on diagonal & -1/n for all off-diagonal elements.
(ii) Convince yourself that M0x = x − x̄·i (each element of x minus the mean of x), and also, so, e.g., that M0i = 0.
(c) Definition A.10, length (or norm) of a vector, x: $\|\mathbf{x}\| = \sqrt{\mathbf{x}'\mathbf{x}}$
(d) Cosine Law: The angle, θ, between two vectors, a and b, satisfies $\cos\theta = \dfrac{\mathbf{a}'\mathbf{b}}{\|\mathbf{a}\|\,\|\mathbf{b}\|}$
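An illustrative numpy snippet (added; the data vectors are arbitrary) for M0, the vector norm, and the cosine law:

```python
import numpy as np

n = 4
i = np.ones(n)
M0 = np.eye(n) - np.outer(i, i) / n     # 1 - 1/n on the diagonal, -1/n elsewhere

x = np.array([1.0, 3.0, 5.0, 7.0])
print(M0 @ x)                           # x minus its mean: [-3. -1.  1.  3.]
print(M0 @ i)                           # the zero vector
print(np.allclose(M0 @ M0, M0), np.allclose(M0, M0.T))   # idempotent & symmetric: True True

# Length (norm) and the cosine law.
a, b = np.array([1.0, 0.0]), np.array([1.0, 1.0])
norm = lambda v: np.sqrt(v @ v)
cos_theta = (a @ b) / (norm(a) * norm(b))
print(np.degrees(np.arccos(cos_theta)))  # approximately 45.0
```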
If A is not symmetric, then: $\partial(\mathbf{x}'\mathbf{A}\mathbf{x})/\partial\mathbf{x} = (\mathbf{A} + \mathbf{A}')\mathbf{x}$. A few other rules follow (Greene eq. A-132).
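A quick numerical check (mine, with an invented non-symmetric A) of this gradient rule against a finite-difference approximation:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [0.0, 3.0]])             # deliberately non-symmetric
x = np.array([0.5, -1.0])

analytic = (A + A.T) @ x               # the rule: (A + A')x

# Central finite-difference approximation of the gradient of f(x) = x'Ax.
f = lambda v: v @ A @ v
h = 1e-6
numeric = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h) for e in np.eye(2)])

print(analytic, numeric)               # approximately equal
print(np.allclose(analytic, numeric, atol=1e-5))   # True
```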
$$\min_b \sum_{i=1}^{n}(y_i - x_i b)^2 = \sum_{i=1}^{n}\left(y_i^2 - 2\,y_i x_i b + x_i^2 b^2\right)$$
First-Order Condition ($y_i$ is not $f(b)$, nor is $x_i$; only $x_i b$ is), so:
$$\frac{\partial \sum_i (y_i - x_i b)^2}{\partial b} = -2\sum_i x_i y_i + 2b\sum_i x_i^2 = 0 \;\Rightarrow\; b = \frac{\sum_i x_i y_i}{\sum_i x_i^2}$$
$$\min_{\mathbf{b}}\ \mathbf{e}'\mathbf{e},\ \text{where } \mathbf{e} = \mathbf{y} - \mathbf{X}\mathbf{b}:$$
$$\mathbf{e}'\mathbf{e} = (\mathbf{y}-\mathbf{X}\mathbf{b})'(\mathbf{y}-\mathbf{X}\mathbf{b}) = \mathbf{y}'\mathbf{y} - \mathbf{b}'\mathbf{X}'\mathbf{y} - \mathbf{y}'\mathbf{X}\mathbf{b} + \mathbf{b}'\mathbf{X}'\mathbf{X}\mathbf{b} = \mathbf{y}'\mathbf{y} - 2\,\mathbf{b}'\mathbf{X}'\mathbf{y} + \mathbf{b}'\mathbf{X}'\mathbf{X}\mathbf{b}$$
First-Order Condition (And Again!!):
$$\frac{\partial(\mathbf{e}'\mathbf{e})}{\partial\mathbf{b}} = -2\,\mathbf{X}'\mathbf{y} + 2\,\mathbf{X}'\mathbf{X}\mathbf{b} = \mathbf{0} \;\Rightarrow\; \mathbf{X}'\mathbf{X}\mathbf{b} = \mathbf{X}'\mathbf{y} \;\Rightarrow\; \mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$$
The second derivative confirms a minimum. Scalar version:
$$\frac{\partial \sum_i\left(y_i^2 - 2\,y_i x_i b + x_i^2 b^2\right)}{\partial b} = -2\sum_i x_i y_i + 2b\sum_i x_i^2;\qquad \frac{\partial^2 \sum_i(\cdot)}{\partial b^2} = 2\sum_i x_i^2 > 0$$
Matrix version:
$$\nabla_{\mathbf{b}}\left(\mathbf{y}'\mathbf{y} - 2\,\mathbf{b}'\mathbf{X}'\mathbf{y} + \mathbf{b}'\mathbf{X}'\mathbf{X}\mathbf{b}\right) = -2\,\mathbf{X}'\mathbf{y} + 2\,\mathbf{X}'\mathbf{X}\mathbf{b};\qquad \nabla^2_{\mathbf{b}} = 2\,\mathbf{X}'\mathbf{X},$$
which is positive definite (given X of full column-rank), so $\mathbf{b}_{LS}$ minimizes $\mathbf{e}'\mathbf{e}$.
One more time! This time also highlights the first-order condition “normal equations”, which demonstrate what we had also from the geometry of the problem: that y-xb is orthogonal to x.
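For reference, the “normal equations” referred to here are the first-order condition written as (standard result):
$$\mathbf{X}'(\mathbf{y} - \mathbf{X}\mathbf{b}) = \mathbf{X}'\mathbf{e} = \mathbf{0} \;\Longleftrightarrow\; \mathbf{X}'\mathbf{X}\mathbf{b} = \mathbf{X}'\mathbf{y},$$
i.e., the LS residuals are orthogonal to (every column of) X.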
This time in matrix algebra, highlighting the dimensions of the problem & showing explicitly what derivative rules are being used:
Lagrange-Multiplier (LM) tests are based on d(ln(L))/dβ at the unconstrained vs. the constrained estimates (though often one need estimate only the constrained model, b/c the solution at the unconstrained estimates, a zero score, is already known…).
They can also be based on a test of λ*=0 in a suitably constructed auxiliary regression s.t. coefficients=multipliers.
Taylor Series (Linear) Approximations
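As a reminder of the form named in the heading (standard result): the first-order (linear) Taylor approximation of f around a point x0 is
$$f(x) \approx f(x_0) + f'(x_0)\,(x - x_0),$$
and, for a vector argument, $f(\mathbf{x}) \approx f(\mathbf{x}_0) + \nabla f(\mathbf{x}_0)'(\mathbf{x} - \mathbf{x}_0)$.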