Transcript of PS699: STATISTICAL METHODS II Quick Reviews of Matrix Algebra
1. Scalar:
a) Definition: a plain-old, ordinary, every-day, regular
number
b) Standard Notation: usu. plain or italic lower-case (Greek or
Roman): a, α, b, β; a, α, b, β
c) Examples:
(1) a = 7, α = 7.3, b = 4/5, β = .77378, aj=3, aij=4.125
(2) US Population on 1/4/60 at noon, Number of countries in Africa
on 1/12/97, etc.
2. (Column) Vector:
a) Definition: a vertical array of numbers
b) Standard Notation:
(1) Usually written in bold, lower-case, Roman or Greek letters: a,
b, β
(2) Sometimes written as lower-case letter with tilde (~), line
(–), or caret/hat (^), under or over
(3) Occasionally omitted once clearly established what’s scalar,
vector, matrix, function, etc.
c) Examples: a = [1; 3; 5], β = [0; 1], b = [1.4; −2; .0035]
(shown transposed under Row Vector below)
3. Row Vector:
a) Definition: a horizontal array of numbers;
b) Standard Notation: as column vector, except w/ prime ( a′ ) or
superscript T ( aT )
c) Examples: a′ = [1 3 5] (so a′2 = 3); β′ = [0 1];
bT = [1.4 −2 .0035] (so bT12 = −2)
(1) NOTE: since b12 and the like are scalars, they are sometimes
written in italics, like b12
4. Matrix:
a) Definition: a rectangular array of numbers
(1) Thus, can view as a set of row vectors stacked vertically or of
column vectors concatenated horizontally
(2) Usually some substantive interpretation to rows and columns
(e.g., see data matrix below).
b) Standard Notation:
(1) Usually denoted by upper-case letter, Greek or Roman, usually
in bold: A, Σ, B, etc.;
(2) Can also refer to matrix as the set, {·}, of its elements,
e.g., A = {aij}, or by referencing a characteristic element of the
matrix, e.g., [A]ij = aij.
(3) Vertical, horizontal, and/or diagonal ellipses often used to
denote generic ranges of elements or repeated elements of a
particular defined matrix (see below).
c) Examples: e.g., the 2×2 matrices A, B, and Ω and the 5×(k+1)
data matrix X used below (see dimensions below).
d) NOTE: scalars and vectors are subset/special-cases (1×1 and k×1
or 1×k) of matrices.
5. Data Matrix: a matrix of data (duh).
a) Each column is a variable, usu. indexed j=1…k or j=0…k, or k=1…K
or k=0…K (& therein lies no small amount of confusion!)
b) Each row is an observation on (each & all of) those
variables, usu. i=1…n or i=1…N.
6. Elements of a matrix: the scalars that comprise a matrix.
a) Thus, e.g., in a Data Matrix each element is an (1) observation
on a (1) variable.
b) Since each element is a scalar, we can write it as such, e.g.:
a, α36, etc. (see above).
c) Elements’ positions or Coordinates in a matrix indexed by
subscripts, usu. i for row, j or k for column: aij or aik,
sometimes separated by comma: ai,j or ai,k (see above).
d) Can refer to elements by their position, such as “the ijth
element of B”; in a data matrix, this might be said “the ith
observation” or “the ith observation on (variable) k.”
7. (Prime) Diagonal of a Matrix: formally, the set of elements aij
where i=j; informally, those on the diagonal from top-left to
bottom-right of the matrix.
a) The off-diagonal runs bottom-left to top-right, less-often
substantively important, so term often used generically for any
element off the (prime) diagonal.
b) First-minor diagonal: the elements on the diagonal just below
(lower first-minor) or just above (upper first-minor) the prime
diagonal. Terms second-minor and so on also exist.
8. Dimensions of a matrix: number of rows (usu. n) and columns
(usu. k)
a) Written n×k; read “n by k” as in “X is an n by k matrix...”
(rows first: roman catholic)
b) NOTE: column vectors are n×1 matrices, row vectors 1×k matrices.
(So, scalars are?)
c) Examples: from above, A, B, Ω are each 2×2; X is 5×(k+1); b is
3×1, and b' is 1×3.
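These notes predate any code, but a quick NumPy sketch (values loosely borrowed from the examples above) shows how dimensions are reported as (rows, columns) and how transposition swaps them:

```python
import numpy as np

# Illustrative arrays: a 2x2 matrix and the 3x1 column vector b
# whose transpose b' = [1.4 -2 .0035] is 1x3.
A = np.array([[1.0, 2.0], [3.0, 4.0]])     # a 2x2 matrix
b = np.array([[1.4], [-2.0], [0.0035]])    # a 3x1 column vector
print(A.shape)      # (2, 2)
print(b.shape)      # (3, 1)
print(b.T.shape)    # (1, 3): transposing swaps rows and columns
```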
9. Special Types of Matrices: (the types are ordered as nested
special cases)
a) Square Matrix: Matrix w/ n=k, i.e., w/ equal # rows &
columns. E.g.: A, B, Ω, Σ,
e.g., Z, a 4×4 square matrix.
b) Symmetric Matrix: formally, square matrix A w/ aij=aji, or,
equivalently, matrix A such that A=AT (see transposition below);
informally, matrix w/ elements above prime diagonal a “reflection”
of those below it. Matrix appears “mirrored” about its
diagonal.
(1) Variance-Covariance Matrices are symmetric (must be; why?); in
fact, any matrix′matrix product of the form X′X is symmetric (must
be; why?).
(2) Examples:
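The X′X claim in the NOTE above is easy to verify numerically; in this sketch X is an arbitrary 5×3 stand-in for a data matrix (values are random, not from the notes):

```python
import numpy as np

# Any product of the form X'X is symmetric: its (j,k) element, the
# inner product of columns j and k of X, equals its (k,j) element.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
XtX = X.T @ X
print(np.allclose(XtX, XtX.T))   # True: X'X equals its own transpose
```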
c) Diagonal Matrix: formally, symmetric matrix w/ aij=0 ∀ i≠j;
informally, symmetric matrix w/ only its diagonal elements
(possibly) non-zero. Example: Pure heteroskedastic error v-cov
matrices are symmetric (as all v-cov) & diagonal, but not scalar
matrices (see below).
(A 4×4 symmetric example:
[  1    .5    4    1.97
  .5   .25   .1   1.07
   4    .1    0    1.11
 1.97  1.07  1.11   16  ] )
V(ε) ≡ E[(ε − E(ε))(ε − E(ε))′] = E[εε′] =
[ E(ε1ε1)  E(ε1ε2)  E(ε1ε3)  E(ε1ε4)
  E(ε2ε1)  E(ε2ε2)  E(ε2ε3)  E(ε2ε4)
  E(ε3ε1)  E(ε3ε2)  E(ε3ε3)  E(ε3ε4)
  E(ε4ε1)  E(ε4ε2)  E(ε4ε3)  E(ε4ε4) ]   (4×4)
e.g., under pure heteroskedasticity (& no autocorrelation):
V(ε) = [ σ1²  0   0   0
          0  σ2²  0   0
          0   0  σ3²  0
          0   0   0  σ4² ]   (4×4)
d) Scalar Matrix: formally, diagonal matrix with aii=ajj ∀ i,j;
informally, diagonal matrix w/ diagonal elements all the same
number (scalar). Example: Homoskedastic error v-cov mats are
symmetric, diagonal, and scalar:
V(ε) = σ²I4 = [ σ²  0   0   0
                0   σ²  0   0
                0   0   σ²  0
                0   0   0   σ² ]   (4×4)
e) (Multiplicative) Identity Matrix:
(1) Definition: formally, matrix w/ aii=1 ∀ i & aij=0 ∀ i≠j;
informally, scalar matrix w/ scalar of 1, i.e., ones on diagonal
& zeros elsewhere. (multiplicative usu. omitted b/c additive
identity matrix—a matrix with all elements equal to zero—rarely
used)
(2) Standard Notation: usu. I, often w/ its dim (symmetric so 1 dim
suffices) subscripted, In;
I4 = [ 1 0 0 0
       0 1 0 0
       0 0 1 0
       0 0 0 1 ]
(a) NOTE: i is column-vector of ones; not to be confused w/ an
identity vector of any sort. In fact, inner-product (see below) i'x
is the sum of the elements of x. Its dim. also often sub’d.:
e.g., with x = [1 3 5 7]′ and i4 the 4×1 vector of ones:
i′x = [1 1 1 1][1 3 5 7]′ = 1 + 3 + 5 + 7 = 16 = Σj xj
(b) NOTE: What’s the additive-identity scalar?
Multiplicative-identity scalar? So, I is, colloquially, like “the
matrix equivalent of 1,” and you will often hear me call it
that.
f) Triangular Matrix: formally, square matrix w/ aij=0 ∀ i>j or
aij=0 ∀ j>i; informally, square matrix w/ all zeros either above
or below the diagonal. If all-zeros above, then lower-triangular;
if all-zeros below, then upper-triangular. (Diagonal can be
anything.)
[ 1.5   0    0    0          [ 0   1.5   1    1
   3   2.2   0    0            0    0   3.5   2
   0    1    0    0            0    0    0   5.5
   4    4    2   1.12 ]        0    0    0    1  ]
  lower-triangular            upper-triangular
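A NumPy aside (not in the original notes): `np.tril` and `np.triu` extract exactly these triangular forms from an arbitrary square matrix, which makes the definition easy to check:

```python
import numpy as np

# np.tril / np.triu zero out the elements above / below the diagonal,
# turning an arbitrary square matrix into lower- / upper-triangular.
A = np.arange(1.0, 17.0).reshape(4, 4)   # arbitrary 4x4 values
L = np.tril(A)   # lower-triangular: all zeros above the diagonal
U = np.triu(A)   # upper-triangular: all zeros below the diagonal
print(np.allclose(np.triu(L, k=1), 0))   # True
print(np.allclose(np.tril(U, k=-1), 0))  # True
```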
B. Operations: transpose, add (subtract), multiply, inverse.
1. Comparing Matrices: Generally, compare matrices element by
element.
a) Equality & Inequality: Element-by-element comparison: matrices
are equal iff every element in one equals the corresponding element
in the other, which implies, e.g., that they must have the same
dimensions. All else: unequal. Formally, A=B ⇔ aij=bij ∀ i,j; else:
A≠B.
A = [1 2; 3 4]; B = [1 3; 4 2]; C = [1 4; 3 2]; D = [1 4; 3 2];
A ≠ B; B ≠ C; C = D
b) Greater & Less Than: Also element-by-element comparison:
A >,<,≥,≤ B ⇔ aij >,<,≥,≤ bij ∀ i,j; else the relation fails to
hold (A ≯, ≮, ≱, ≰ B).
A = [1 2; 3 4]; B = [1 2; 3 6]; C = [0 3; 2 1]; D = [1 4; 3 2];
A ≤ B (B ≥ A); C < D (D > C)
c) Matrix positive⇔all elements >0, negative ⇔ all elements
<0, weakly positive (non-negative) ⇔ ≥0, weakly negative
(non-positive) ⇔ ≤0; all these are also “for all elements, else
not”.
d) Positive & negative definite; positive & negative
semi-definite. Note: A must be (square and) symmetric to be any of
these things.
(1) A is positive definite ⇔ x′Ax > 0 ∀ x≠0. (Terminology:
quadratic form = x′Ax = Σi=1..n Σj=1..n xi xj aij)
(2) A is negative definite ⇔ x'Ax<0 ∀x≠0.
(3) A is positive semi-definite ⇔ x'Ax≥0 ∀x≠0. (a.k.a. non-negative
definite)
(4) A is negative semi-definite ⇔ x'Ax≤0 ∀x≠0. (a.k.a. non-positive
definite)
(5) If none of the above: indefinite. So what?
(a) If A definite, then |A|≠0, and so A-1 exists. (What’s |A|?
What’s A-1? So what? See below.)
(i) If A is positive/negative (semi-)definite, then |A| >, <, ≥, ≤
0, respectively (intuitively).
(b) Many regression quantities of interest have form x'Ax.
Examples:
(i) With x a vector of weights (e.g., a vector of ones) and A the
estimated variance-covariance matrix of the coefficient estimates,
x′Ax is the estimated variance of the (weighted) sum of those
coefficients. Variance-covariance matrices ∴ must be positive
definite.
(ii) With x a vector of variable values and b their associated
coefficients, V(ŷ) = V(x′b) = x′V(b)x; again, the matrix V(b), the
variance-covariance matrix of b, ∴ must be positive definite.
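A numerical sketch of item d) (arbitrary random values, not from the notes): for a symmetric matrix, positive definiteness can be checked either through the quadratic form or through the eigenvalues, and A = X′X with full-column-rank X is the canonical positive-definite example:

```python
import numpy as np

# For symmetric A: positive definite <=> x'Ax > 0 for all x != 0
# <=> all eigenvalues of A are > 0.
rng = np.random.default_rng(1)
X = rng.normal(size=(6, 3))          # full column rank (w.p. 1)
A = X.T @ X                          # symmetric by construction
eigvals = np.linalg.eigvalsh(A)      # eigenvalues of a symmetric matrix
print(np.all(eigvals > 0))           # True: positive definite
x = np.array([1.0, -2.0, 0.5])       # an arbitrary nonzero x
print(x @ A @ x > 0)                 # True: the quadratic form is positive
```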
2. Transposition: Intuitively: Flip matrix along its axis, making
each column into the corresponding row (column one becomes row one,
etc.) & each row into the corresponding column.
a) Standard Notation: A' or AT; note: (col.) vectors transpose to
row vectors: a' or aT & v.v.
b) Formal Definition: Z≡X' ⇔ zij=xji ∀ i,j; i.e., transposition
swaps each element’s index.
A(2×3) = [1 3 5; 2 4 6],  A′(3×2) = [1 2; 3 4; 5 6];
B(2×4) = [1 2 3 4; 5 6 7 8],  B′(4×2) = [1 5; 2 6; 3 7; 4 8]
3. Matrix Addition (& Subtraction):
a) Matrices add (& subtract) element-by-element; matrices must
have the same dimensions to be conformable for addition: A & B
conformable for addition ⇔ dim(A)=dim(B).
(1) Exception 1: may add scalar to anything, element-by-element:
b+A=[aij+b]; example:
(2) Exception 2: some may allow adding a vector to a matrix w/ same
number of columns or of rows, row-by-row or column-by-column.
b) Formally, adding or subtracting element by element
A±B≡[aij±bij]; examples:
(a) Exception 1:
2 + A = 2 + [1 3; 2 4] = [1+2 3+2; 2+2 4+2] = [3 5; 4 6];
such as: 2 + A ≡ [aij + 2]
(b) Exception 2: some scholars may allow (n×1)+(n×k) and
(1×m)+(n×m) as (row- and column-)conformable, yielding, with
A ≡ [1 3; 2 4] and a′ = [1 2]:
a′ + A = [2 5; 3 6] (adding a′ to each row), and
a + A = [2 4; 4 6] (adding a to each column)
(2) More examples follow on next slide...
c) Subtraction is defined as multiplying the latter matrix by −1
and then adding, which means it too is conducted
element-by-element.
d) The matrix of 0’s is the additive identity matrix, 0 or 0n, b/c
A+0=A ∀A (assumes conformability)
e) Matrix Addition Properties:
(1) Commutative: A + B = B + A
(a) Proof: A + B = [aij] + [bij] = [aij + bij] = [bij + aij] =
[bij] + [aij] = B + A
(2) Associative: (A + B) + C = A + (B + C)
(a) Proof: (A + B) + C = [aij + bij] + [cij] = [aij + bij + cij] =
[aij] + [bij + cij] = A + (B + C)
(3) Transposition is Distributive over Addition: (A + B)' = A' + B'
(prove in pset1)
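The three properties above are easy to confirm numerically (illustrative values, chosen arbitrarily):

```python
import numpy as np

# Element-by-element addition is commutative and associative, and
# transposition distributes over it: (A + B)' = A' + B'.
A = np.array([[1.0, 3.0], [2.0, 4.0]])
B = np.array([[0.0, -1.0], [5.0, 2.0]])
C = np.array([[2.0, 2.0], [2.0, 2.0]])
print(np.array_equal(A + B, B + A))              # True: commutative
print(np.array_equal((A + B) + C, A + (B + C)))  # True: associative
print(np.array_equal((A + B).T, A.T + B.T))      # True: (A+B)' = A'+B'
```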
4. Matrix Multiplication (and “Division”):
a) Just like in scalar multiplication, absence of an operation sign
implies multiplication.
b) Scalar Multiplication:
(1) You already know all about multiplying two or more
scalars.
(2) A scalar times a vector or matrix: scalar multiplies every
element.
c) Vector Multiplication: Inner Products
(1) Written a·b or a′b (or aTb)
(2) Definition: a′b ≡ a1b1 + a2b2 + … + anbn = Σi aibi (a scalar)
(3) Properties:
(a) inner-product multiplication is commutative: a′b = b′a
(b) a′a = “sum of squares”, i.e., the sum of the squared elements
of a
(4) Examples:
e.g., i′x = [1 1 1 1][1 3 5 7]′ = 1 + 3 + 5 + 7 = 16 = Σj xj
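In NumPy the same inner products are one-liners (an aside, not part of the original notes):

```python
import numpy as np

# i'x sums the elements of x, and x'x is the "sum of squares".
x = np.array([1.0, 3.0, 5.0, 7.0])
i = np.ones(4)                  # i: the vector of ones
print(i @ x)                    # 16.0 = 1 + 3 + 5 + 7
print(x @ x)                    # 84.0 = 1 + 9 + 25 + 49
print(i @ x == x.sum())         # True: i'x is the sum of x's elements
```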
d) Matrix Multiplication: first, since vectors are just special
cases of matrices, we already know that matrix multiplication must
work something like inner products
(1) Formal definition: C = AB ⇔ cik = Σj=1..J aij bjk
= ai1b1k + ai2b2k + ai3b3k + … + aiJbJk. I.e., each cik
element is an inner product of row i of A and column k of B.
(2) ⇒ for two matrices to be conformable for multiplication, number
of columns in first must equal number of rows in second (k=m in
their dimensions, n×k, m×q); & result will be n×q.
(3) An Informal Recipe for Solutions to Matrix Multiplication
Problems
(a) Start by noting dimensions of the matrices to be multiplied: (a
× b)(c × d).
(i) first, b must equal c; if not, the matrices are not conformable
and it can’t be done so you’re done
(ii) second, if b=c, then the matrix solution will be (a × d). Draw
an (a × d) matrix box for the answer.
(b) Then, the ijth element of the solution is the inner product of
the ith row of the first matrix and the jth column of the second.
Fill in the answer element-by-element.
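The recipe in (3) can be walked through numerically (arbitrary illustrative values):

```python
import numpy as np

# The recipe: (a x b)(c x d) is conformable only if b == c, and the
# answer is (a x d); each element is a row-by-column inner product.
A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])  # 3x2
B = np.array([[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]])    # 2x3
C = A @ B                                            # (3x2)(2x3) => 3x3
print(C.shape)     # (3, 3)
print(C[0, 2])     # 4.0 = 1*2 + 2*1: row 1 of A times column 3 of B
```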
(4) Examples:
(a) From the definition, can see why any matrix′matrix product X′X
is symmetric: element ij vs. ji just reverses the order of the same
set of scalar multiplications — x1′x2 vs. x2′x1 for elements {12}
vs. {21}.
(b) Note: if x & y are mean-deviated, x − x̄ & y − ȳ, then the 1st
example, x′y, is the covariation(x,y) & the 2nd example, X′X, is
the variation-covariation matrix of the columns of X.
(5) Some Properties and Facts (assuming conformability):
(a) When (n×k) is multiplied times (k×m), you get an (n×m):
(n×k)(k×m)⇒(n×m)
(b) Pre-Multiplication and Post-Multiplication are different and
may not both exist!
(i) B pre-multiplied by A is AB.
(ii) B post-multiplied by A is BA.
(c) Not Commutative: AB does not necessarily equal BA (may not even
both exist). EXAMPLES:
(d) Associative: (AB)C = A(BC). EXAMPLES
(e) Distributive: A(B+C)=AB+AC (not commutative, so order matters;
BA & CA may not even exist): e.g…
(f) Identity:
(g) Transposition is Distributive in Reverse Order: (AB)' = B'A' …a
worthwhile exercise to prove ;)
5. More Terms & Operations…
a) Idempotent: Matrix A is idempotent iff AA=A (implies A must be
square); if A is also symmetric, then A′A, A′A′, and AA′ all equal
A as well (since A′=A). E.g., I is idempotent.
b) Trace: trace(A) ≡ Σi aii, the sum of the diagonal elements of a
square matrix. Some useful prop’s; e.g., can cycle elements inside
from back to front or v.v.: trace(ABC)=trace(BCA)=trace(CAB).
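The cyclic property is easy to verify with random matrices (a sketch, not from the notes):

```python
import numpy as np

# trace(ABC) = trace(BCA) = trace(CAB): factors cycle inside a trace.
rng = np.random.default_rng(2)
A = rng.normal(size=(3, 3))
B = rng.normal(size=(3, 3))
C = rng.normal(size=(3, 3))
t_abc = np.trace(A @ B @ C)
t_bca = np.trace(B @ C @ A)
t_cab = np.trace(C @ A @ B)
print(np.isclose(t_abc, t_bca) and np.isclose(t_bca, t_cab))  # True
```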
c) Kronecker Product: post-multiply each element of the first
matrix by the entire second matrix, (k×l)⊗(m×n)=(km×ln), like so:
A⊗B ≡ [aijB]. E.g., iN ⊗ iT = iNT, an NT×1 vector of ones.
d) Three central regression matrices (n.b., M & N are both
symmetric & idempotent):
A ≡ (X′X)⁻¹X′;  N ≡ XA = X(X′X)⁻¹X′;  M ≡ I − N = I − X(X′X)⁻¹X′
bLS = Ay;  ŷLS = Ny;  eLS = My
e) Writing Sets of Equations in Matrix Notation (a big part of The
Whole Point, really. Einstein once noted that all advancement in
mathematics is advance in notation.)
y1 = β0 + β1x11 + β2x12 + β3x13 + … + βkx1k + ε1
y2 = β0 + β1x21 + β2x22 + β3x23 + … + βkx2k + ε2
y3 = β0 + β1x31 + β2x32 + β3x33 + … + βkx3k + ε3
…
yn = β0 + β1xn1 + β2xn2 + β3xn3 + … + βkxnk + εn
⇔  y = Xβ + ε, with dimensions
(n×1) = (n×(k+1))·((k+1)×1) + (n×1); estimated: y = Xb + e
(1) Rank: Rank of a matrix is # of linearly independent columns.
Full column-rank: all columns linearly independent. Why care?
(a) Rule #1: Can’t have more dimensions (higher rank) than columns;
recall, each column contains coordinates of a point. Can’t have
more dimensions than points.
(b) Rule #2: Can’t have more dimensions (higher rank) than rows;
recall, each row provides the actual values for those coordinates.
Can’t have more dim’s than you give values to.
(c) Rules 1 & 2 ⇒ Rule #3: Rank(A) ≤ min{rows(A),cols(A)}
(d) Rule #4: Rank(AB) ≤ min{Rank(A),Rank(B)}: since each column of
AB is just a linear combination of the columns of A (and each row a
linear combination of the rows of B), the product can’t create any
new linearly independent information. Thus, it can span no more
dimensions (provide no more information) than the lesser of the
two. It could provide less information if some column(s) in A are
linearly dependent on some column(s) in B or vice versa.
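The rank rules above can be confirmed with `np.linalg.matrix_rank` (illustrative matrices chosen so the ranks are obvious):

```python
import numpy as np

# Rank rules: rank(A) <= min(rows, cols), and
# rank(AB) <= min(rank(A), rank(B)).
A = np.array([[1.0, 2.0], [2.0, 4.0], [0.0, 1.0]])  # 3x2, rank 2
B = np.array([[1.0, 2.0], [2.0, 4.0]])  # 2x2 but col 2 = 2 * col 1
print(np.linalg.matrix_rank(A))       # 2
print(np.linalg.matrix_rank(B))       # 1: B is singular
print(np.linalg.matrix_rank(A @ B))   # 1: can't exceed min(2, 1)
```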
(2) Determinant: How to check whether a matrix is full column-rank?
If it is not, its determinant is zero, so the matrix inverse will
not exist and the matrix is said to be singular (see below).
(3) Something approaching an intuition for why |A|=0 if not
full-column rank…
(4) Some special properties of determinants of diagonal matrices
(only), D: |D| = ∏i dii, the product of the diagonal elements (so
|D⁻¹| = 1/|D|).
(5) Opposite the perfect collinearity that collapses determinants
(hypervolumes) to zero is…
g) Orthogonality: vectors a and b are orthogonal ⇔ a ⊥ b ⇔
a′b = 0.
(1) Why? a′b = Σi aibi, so the more a & b “go together”
(positively), the larger this is; the more they relate oppositely
(negatively), the more negative. If no relation, i.e., orthogonal,
then 0.
h) Orthogonal Projections: Suppose now we have 2 dimensions of
information & we’d like to summarize it in 1 dimension. Or,
suppose we have some y and we’d like to express it as nearly as
possible using some other vector of information x (perhaps x is
more readily available), but we want to keep everything linear
(because curved lines give us a headache). We want to linearly
rescale the info in x to get as close as possible to y:
(1) Don’t look now, but we just derived the least-squares formula
for b in the bivariate case. Unfortunately, this particular
solution only works when x'x is a scalar. (When will and won’t it
be?) Suppose, instead, we have X'X, a matrix; how could we solve
the analogous problem?
(2) That is, suppose that now we have k columns of information in X
(i.e., X is of full rank k) and that we’d like to use all of this
information to get as close as possible to the vector y.
i) DEF: homogeneous equation-system and nonhomogeneous
equation-system
(1) A homogeneous system of equations is of the form Ax=0.
(2) A nonhomogeneous system of equations is of the form Ax=b, where
b is a nonzero vector.
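A small worked nonhomogeneous system (illustrative values, not from the notes): when A is square and nonsingular, Ax = b has the unique solution x = A⁻¹b.

```python
import numpy as np

# Solve the nonhomogeneous system Ax = b:
#   2x1 +  x2 =  5
#    x1 + 3x2 = 10
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([5.0, 10.0])
x = np.linalg.solve(A, b)     # numerically preferable to inv(A) @ b
print(x)                      # [1. 3.]
print(np.allclose(A @ x, b))  # True: the solution satisfies Ax = b
```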
That (regression as a projection problem; scaling the info in X by
b to get as close as possible to y) was so fun, let’s do it
again!
Repeating the steps from solving systems of equations through
deriving the LS coefficient formula:
Some things we can already say about A-1:
But, returning to our main goal, we’re looking for B≡A-1 such that
BA=I...
j) Matrix Inversion: BA = I = AB ⇔ B = A⁻¹, A = B⁻¹
k) For a diagonal matrix, D, the inverse is simply:
D⁻¹ = [ 1/d11    0     …     0
          0    1/d22   …     0
          …      …     …     …
          0      0     …   1/dnn ]
l) Additional notes on “the updating formula” for how (X′X)⁻¹
changes as one adds rows (i.e., obs.) to X are in Greene A.4.2
(eq’s A-66 & assoc. para.).
Again, what I call the Goldberger matrices (b/c that’s where I met
them):
m) Further Useful Topics:
(2) Cholesky decomposition: any symmetric, positive-definite A is
expressible as the product of a lower-triangular L & an
upper-triangular L′=U; so A = LU = LL′. Useful for A⁻¹ = U⁻¹L⁻¹ &
to find “A^(-1/2)”.
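NumPy's `cholesky` returns exactly this lower-triangular factor, so the decomposition is easy to verify (a small positive-definite example, chosen arbitrarily):

```python
import numpy as np

# Cholesky: a symmetric positive-definite A factors as A = L L',
# with L lower-triangular (so U = L' is upper-triangular).
A = np.array([[4.0, 2.0], [2.0, 3.0]])  # symmetric, positive definite
L = np.linalg.cholesky(A)               # lower-triangular factor
print(np.allclose(L @ L.T, A))          # True: A = LL'
print(np.allclose(np.triu(L, k=1), 0))  # True: L is lower-triangular
```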
(3) Working with partitioned matrices… See Greene section
A.5.
(4) Partitioned Determinants:
(a) If A is block-diagonal: |A| = |A11| × |A22|.
(b) In general:
|A| = |A11| × |A22 − A21A11⁻¹A12| = |A22| × |A11 − A12A22⁻¹A21|
(5) Partitioned Inverses:
(a) For block-diagonal A:
A⁻¹ = [ A11⁻¹   0
          0   A22⁻¹ ]
(cf. the diagonal-matrix inverse above)
(c) In general:
[ A11 A12 ; A21 A22 ]⁻¹ =
[ A11⁻¹(I + A12F2A21A11⁻¹)   −A11⁻¹A12F2
        −F2A21A11⁻¹                F2      ]
where F2 ≡ (A22 − A21A11⁻¹A12)⁻¹ and, symmetrically,
F1 ≡ (A11 − A12A22⁻¹A21)⁻¹.
Not that these last “in generals” are so terribly intuitive, but
it’s useful to have them at hand…
(6) Useful sections of Greene, Appendix A (Matrix Algebra) not
explicitly or fully covered here:
(a) A.2.7: X′X = Σi xixi′ (i.e., the sum of matrices of products &
cross-products; the basis of the var-covar matrix); its kl-th
element, Σi xikxil, is the inner product of columns k and l of X,
called their cross-product.
(b) A.2.8: a very useful symmetric, idempotent matrix, M0, which
mean-deviates what it multiplies:
M0 ≡ I − (1/n)ii′
(i) Convince yourself that this symmetric matrix is 1−1/n on the
diagonal & −1/n for all off-diagonal elements.
(ii) Convince yourself that M0x = x − x̄ (element-by-element) and
so, e.g., that M0i = 0.
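Both "convince yourself" exercises can be checked directly (x is an arbitrary illustrative vector with mean 4):

```python
import numpy as np

# M0 = I - (1/n) i i' mean-deviates whatever it multiplies.
n = 4
i = np.ones(n)
M0 = np.eye(n) - np.outer(i, i) / n
x = np.array([1.0, 3.0, 5.0, 7.0])    # mean is 4
print(M0 @ x)                    # [-3. -1.  1.  3.]: x minus its mean
print(np.allclose(M0 @ i, 0))    # True: ones deviated from their mean
print(np.allclose(M0 @ M0, M0))  # True: M0 is idempotent
```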
(c) Definition A.10: length (or norm) of a vector, x: ‖x‖ = √(x′x)
(d) Cosine Law: the angle, θ, between two vectors, a and b,
satisfies cos θ = a′b / (‖a‖‖b‖)
If A is not symmetric, then: ∂(x′Ax)/∂x = (A + A′)x. A few other
rules follow (Greene eq. A-132).
Min over b of Σi (yi − xib)²:
First-Order Condition: ∂[Σi (yi − xib)²]/∂b = Σi 2(yi − xib)(−xi) = 0
⇒ Σi xiyi = b Σi xi²  ⇒  b = Σi xiyi / Σi xi²
(And Again!!) In matrix form:
e′e = (y − Xb)′(y − Xb) = y′y − b′X′y − y′Xb + b′X′Xb
    = y′y − 2b′X′y + b′X′Xb
(y′y is not a function of b, nor is X′y; only the terms in b
matter, so:)
First-Order Condition: ∇b(e′e) = −2X′y + 2X′Xb = 0
⇒ X′Xb = X′y (the “normal equations”)
⇒ b = (X′X)⁻¹X′y
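The formula the derivation arrives at can be checked against NumPy's own least-squares solver (random X and y as stand-ins):

```python
import numpy as np

# The normal equations X'Xb = X'y give b = (X'X)^{-1} X'y;
# np.linalg.lstsq solves the same minimization problem.
rng = np.random.default_rng(4)
X = rng.normal(size=(20, 3))
y = rng.normal(size=20)
b = np.linalg.inv(X.T @ X) @ (X.T @ y)           # the LS formula
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)  # library solution
print(np.allclose(b, b_lstsq))                   # True
```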
The derivatives used above, scalar and matrix versions side by
side:
∂[Σi (yi − xib)²]/∂b = Σi 2(yi − xib)(−xi) = −2 Σi xi(yi − xib);
∂²[Σi (yi − xib)²]/∂b² = 2 Σi xi² > 0 (so the FOC gives a minimum).
Matrix version:
∇b(y′y − 2b′X′y + b′X′Xb) = −2X′y + 2X′Xb;
∇b²(y′y − 2b′X′y + b′X′Xb) = 2X′X, positive definite (a minimum).
One more time! This time also highlights the first-order condition
“normal equations”, which demonstrate what we had also from the
geometry of the problem: that y-xb is orthogonal to x.
This time in matrix algebra highlights the dimensions of the
problem & shows explicitly what derivative rules are being used:
Lagrange-Multiplier (LM) tests are based on d(ln(L))/dβ evaluated
at the constrained vs. unconstrained estimates (though often one
need estimate only the constrained model b/c the solution at the
unconstrained optimum is known: the score is zero there…).
Also based on a test of λ*=0 for a suitably constructed auxiliary
regression s.t. coefficients = multipliers.
Taylor Series (Linear) Approximations