1. Preliminaries
Copyright © 2019 Dan Nettleton (Iowa State University). Statistics 510.
Notation for Scalars, Vectors, and Matrices
Lowercase letters =⇒ scalars: x, c, σ.
Boldface, lowercase letters =⇒ vectors: x, y, β.
Boldface, uppercase letters =⇒ matrices: A, X, Σ.
Notation for Dimensions and Elements of a Matrix
Suppose A is a matrix with m rows and n columns.
Then we say that A has dimensions m× n.
Let aij ∈ R be the element or entry in the ith row and jth column of A.
We convey all this information with the notation A = [aij], where the dimensions m × n are understood (and are often written beneath the symbol A).
The Product of a Scalar and a Matrix
Suppose A = [aij] is an m × n matrix.
For any c ∈ R,
cA = Ac = [caij];
i.e., the product of the scalar c and the matrix A = [aij] is the matrix
whose entry in the ith row and jth column is c times aij for each
i = 1, . . . ,m and j = 1, . . . , n.
The Sum of Two Matrices
Suppose A = [aij] and B = [bij] are both m × n. Then

A + B = C = [cij = aij + bij];
i.e., the sum of m× n matrices A and B is an m× n matrix whose entry
in the ith row and jth column is the sum of the entry in the ith row and
jth column of A and the entry in the ith row and jth column of B
(i = 1, . . . ,m and j = 1, . . . , n).
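As a quick sanity check of the two definitions above, here is a minimal NumPy sketch (NumPy is an assumed tool here, not part of the notes; its scalar `*` and matrix `+` act entrywise, exactly as defined):

```python
import numpy as np

# A 2 x 3 matrix A = [a_ij] and a scalar c
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
c = 2.0

# cA = Ac = [c * a_ij]: each entry is multiplied by c
cA = c * A

# For B with the same dimensions, A + B = C = [a_ij + b_ij]
B = np.array([[10.0, 20.0, 30.0],
              [40.0, 50.0, 60.0]])
S = A + B
```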
Vector and Vector Transpose
In STAT 510, a vector is a matrix with one column:

x =
  [ x1 ]
  [ ⋮  ]
  [ xn ]

In STAT 510, we use x′ to denote the transpose of the vector x:

x′ = [x1, . . . , xn];

i.e., x is a matrix with one column, and x′ is the matrix with the same entries as x but written as a row rather than a column.
Transpose of a Matrix
Suppose A is an m × n matrix. Then we may write A as [a1, . . . , an], where ai is the ith column of A for each i = 1, . . . , n.

The transpose of the matrix A is

A′ = [a1, . . . , an]′ =
  [ a′1 ]
  [ ⋮   ]
  [ a′n ]

i.e., the ith row of A′ is the transpose of the ith column of A.
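A short NumPy illustration of these conventions (a sketch, not part of the original notes; the vector is stored as an n × 1 matrix to match the course's definition, since `.T` on a 1-D NumPy array is a no-op):

```python
import numpy as np

# x is a vector: a matrix with one column (3 x 1)
x = np.array([[1.0], [2.0], [3.0]])
x_t = x.T                      # x' is 1 x 3: same entries, written as a row

# For a 3 x 2 matrix A = [a_1, a_2], row i of A' is a_i'
A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
A_t = A.T                      # 2 x 3
```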
Matrix Multiplication
Suppose A = [aij] is m × n and B = [bij] is n × k.

Then AB = C = [cij], an m × k matrix with

cij = ai1b1j + ai2b2j + · · · + ainbnj (i = 1, . . . ,m; j = 1, . . . , k).
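The entrywise formula can be checked directly against NumPy's built-in product (`@`); this is an illustrative sketch with arbitrary dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 2, 3, 4
A = rng.standard_normal((m, n))
B = rng.standard_normal((n, k))

# c_ij = sum over l of a_il * b_lj, computed entry by entry
C = np.zeros((m, k))
for i in range(m):
    for j in range(k):
        C[i, j] = sum(A[i, l] * B[l, j] for l in range(n))
```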
Matrix Multiplication Special Cases
If a = [a1, . . . , an]′ and b = [b1, . . . , bn]′, then

a′b = a1b1 + · · · + anbn.

Also,

a′a = a1² + · · · + an² ≡ ||a||².

||a|| ≡ √(a′a) = √(a1² + · · · + an²) is known as the Euclidean norm of a.
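For example, with a = [3, 4]′ the Euclidean norm is 5; a minimal NumPy check:

```python
import numpy as np

a = np.array([3.0, 4.0])
b = np.array([1.0, 2.0])

inner = a @ b                # a'b = 3*1 + 4*2 = 11
norm_sq = a @ a              # a'a = ||a||^2 = 9 + 16 = 25
norm = np.sqrt(norm_sq)      # Euclidean norm ||a|| = 5
```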
Another Look at Matrix Multiplication
Suppose A = [aij] is m × n with columns a1, . . . , an and rows a′(1), . . . , a′(m), and B = [bij] is n × k with columns b1, . . . , bk and rows b′(1), . . . , b′(n).

Then

AB = C = [cij], where cij = ai1b1j + · · · + ainbnj = a′(i)bj.

Equivalently,

AB = [Ab1, . . . ,Abk] (the jth column of AB is Abj),

the ith row of AB is a′(i)B, and

AB = a1b′(1) + · · · + anb′(n) (a sum of outer products).
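The equivalent views of the product (columns Abj, rows a′(i)B, and a sum of outer products) can be verified numerically; a sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4))    # columns a_1..a_4, rows a'_(1)..a'_(3)
B = rng.standard_normal((4, 2))    # columns b_1, b_2, rows b'_(1)..b'_(4)
AB = A @ B

# Column view: jth column of AB is A b_j
cols = np.column_stack([A @ B[:, j] for j in range(B.shape[1])])

# Row view: ith row of AB is a'_(i) B
rows = np.vstack([A[i, :] @ B for i in range(A.shape[0])])

# Outer-product view: AB = sum over l of a_l b'_(l)
outers = sum(np.outer(A[:, l], B[l, :]) for l in range(A.shape[1]))
```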
Transpose of a Matrix Product
The transpose of a matrix product is a product of the transposes in reverse order; i.e.,
(AB)′ = B′A′.
Zero and One Vectors and the Identity Matrix
0 = [0, 0, . . . , 0]′ and 1 = [1, 1, . . . , 1]′ are vectors of all zeros and all ones, respectively, and

I =
  [ 1 0 0 · · · 0 ]
  [ 0 1 0 · · · 0 ]
  [ 0 0 1 · · · 0 ]
  [ ⋮ ⋮ ⋮ ⋱ ⋮ ]
  [ 0 0 0 · · · 1 ]

is the identity matrix.

0 is also used to denote a matrix whose entries are all zero.
Linear Combination
If c1, . . . , cn ∈ R and a1, . . . , an ∈ Rm, then
c1a1 + · · · + cnan
is a linear combination (LC) of a1, . . . , an.
The coefficients of the LC are c1, . . . , cn.
Column Spaces
Ac is a linear combination of the columns of an m× n matrix A:
Ac = [a1, . . . , an][c1, . . . , cn]′ = c1a1 + · · · + cnan.
The set of all possible linear combinations of the columns of A is
called the column space of A and is written as
C(A) = {Ac : c ∈ Rn}.
Note that C(A) ⊆ Rm.
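A small numerical sketch of Ac as a linear combination of the columns of A (the matrix here is arbitrary, chosen for illustration):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])                   # columns a_1, a_2 in R^3
c = np.array([2.0, -1.0])

Ac = A @ c                                   # an element of C(A), a subset of R^3
lin_comb = c[0] * A[:, 0] + c[1] * A[:, 1]   # c_1 a_1 + c_2 a_2
```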
Linear Independence and Linear Dependence
The vectors a1, . . . , an are linearly independent (LI) iff

c1a1 + · · · + cnan = 0 =⇒ c1 = · · · = cn = 0.

The vectors a1, . . . , an are linearly dependent (LD) iff there exist c1, . . . , cn, not all 0, such that c1a1 + · · · + cnan = 0.
Rank and Trace
The rank of a matrix A is written as rank(A) and is the maximum number of linearly independent rows (or columns) of A.

The trace of an n × n matrix A is written as trace(A) and is the sum of the diagonal elements of A; i.e., trace(A) = a11 + a22 + · · · + ann.
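A sketch using NumPy's rank and trace routines; in the example below the middle row is twice the first, so the rows are linearly dependent and the rank drops to 2:

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],    # = 2 * row 1, a linear dependence
              [0.0, 1.0, 1.0]])

r = np.linalg.matrix_rank(A)      # max number of LI rows (or columns)
tr = np.trace(A)                  # a_11 + a_22 + a_33 = 1 + 4 + 1
```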
Idempotent Matrices
A matrix A is said to be idempotent iff AA = A.
The rank of an idempotent matrix is equal to its trace; i.e.,
rank(A) = trace(A).
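A familiar example of an idempotent matrix is the orthogonal projection X(X′X)⁻¹X′ onto C(X) (this particular X is a hypothetical illustration); its rank equals its trace, as stated above:

```python
import numpy as np

X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
P = X @ np.linalg.inv(X.T @ X) @ X.T   # projection onto C(X)

idem = np.allclose(P @ P, P)           # PP = P, so P is idempotent
rank_P = np.linalg.matrix_rank(P)
trace_P = np.trace(P)                  # equals rank_P for idempotent P
```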
Square Matrices
An m× n matrix A is said to be square iff m = n.
If A is an m× n matrix, then A′A is an n× n matrix.
Thus, A′A is a square matrix for any matrix A.
Inverse of a Matrix
A square matrix A is nonsingular or invertible iff there exists a
square matrix B such that AB = I.
If A is nonsingular and AB = I, then B is the unique inverse of A
and is written as A−1.
For a nonsingular matrix A, we have AA−1 = I. (It is also true that
A−1A = I.)
A square matrix without an inverse is called singular.
An n× n matrix A is singular iff rank(A) < n.
Generalized Inverses
G is a generalized inverse of an m× n matrix A iff AGA = A.
We usually denote a generalized inverse of A by A−.
If A is nonsingular, i.e., if A−1 exists, then A−1 is the one and only
generalized inverse of A.
AA−1A = IA = AI = A
If A is singular, i.e., if A−1 does not exist, then there are infinitely
many generalized inverses of A.
Finding a Generalized Inverse of a Matrix A
1. Find any r × r nonsingular submatrix of A, where r = rank(A). Call this matrix W.
2. Invert and transpose W, i.e., compute (W−1)′.
3. Replace each element of W in A with the corresponding element of (W−1)′.
4. Replace all other elements in A with zeros.
5. Transpose the resulting matrix to obtain G, a generalized inverse for A.
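The five steps above translate directly into code. A sketch (the helper name and the example matrix are my own; the caller supplies row and column indices of a nonsingular r × r submatrix W):

```python
import numpy as np

def generalized_inverse(A, rows, cols):
    """Apply steps 1-5: rows, cols index a nonsingular r x r
    submatrix W of A, where r = rank(A)."""
    W = A[np.ix_(rows, cols)]            # step 1: extract W
    Wit = np.linalg.inv(W).T             # step 2: invert and transpose W
    G = np.zeros_like(A, dtype=float)    # step 4: zeros elsewhere
    G[np.ix_(rows, cols)] = Wit          # step 3: place (W^-1)' over W
    return G.T                           # step 5: transpose the result

# Rank-2 example: row 3 = row 1 + row 2, and the top-left
# 2 x 2 block is nonsingular
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [5.0, 7.0, 9.0]])
G = generalized_inverse(A, rows=[0, 1], cols=[0, 1])
```

One can confirm that AGA = A, so G is a generalized inverse even though A itself is singular.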
Positive and Non-Negative Definite Matrices
x′Ax is known as a quadratic form.
We say that an n× n matrix A is positive definite (PD) iff
A is symmetric (i.e., A = A′), and
x′Ax > 0 for all x ∈ Rn \ {0}.
We say that an n× n matrix A is non-negative definite (NND) iff
A is symmetric (i.e., A = A′), and
x′Ax ≥ 0 for all x ∈ Rn.
Positive and Non-Negative Definite Matrices
A matrix that is positive definite is nonsingular; i.e.,
A positive definite =⇒ A−1 exists.
A matrix that is non-negative definite but not positive definite is singular.
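For a symmetric matrix, positive definiteness is equivalent to all eigenvalues being positive, which gives a practical numerical test; a sketch (the helper name is my own):

```python
import numpy as np

def is_positive_definite(A, tol=1e-10):
    """Symmetric with x'Ax > 0 for all nonzero x,
    equivalently all eigenvalues positive."""
    if not np.allclose(A, A.T):
        return False
    return bool(np.all(np.linalg.eigvalsh(A) > tol))

A_pd = np.array([[2.0, 1.0],
                 [1.0, 2.0]])    # eigenvalues 1 and 3: PD
A_nnd = np.array([[1.0, 1.0],
                  [1.0, 1.0]])   # eigenvalues 0 and 2: NND but not PD
```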
Random Vectors
A random vector is a vector whose components are random variables.
y = [y1, y2, . . . , yn]′
Expected Value of a Random Vector
The expected value, or mean, of a random vector y is the vector of expected values of the components of y:

y = [y1, y2, . . . , yn]′ =⇒ E(y) = [E(y1),E(y2), . . . ,E(yn)]′.

Likewise, if A = [aij] is a matrix of random variables, then E(A) = [E(aij)]; i.e., the expected value of A is the matrix of expected values of the elements of A.
Variance of a Random Vector
The variance of a random vector y = [y1, y2, . . . , yn]′ is the matrix
whose i, jth element is Cov(yi, yj) (i, j ∈ {1, . . . , n}).
Var(y) =
  [ Cov(y1, y1)  Cov(y1, y2)  · · ·  Cov(y1, yn) ]
  [ Cov(y2, y1)  Cov(y2, y2)  · · ·  Cov(y2, yn) ]
  [      ⋮            ⋮         ⋱         ⋮      ]
  [ Cov(yn, y1)  Cov(yn, y2)  · · ·  Cov(yn, yn) ]
Variance of a Random Vector
The covariance of a random variable with itself is the variance of that random variable. Thus,

Var(y) =
  [ Var(y1)      Cov(y1, y2)  · · ·  Cov(y1, yn) ]
  [ Cov(y2, y1)  Var(y2)      · · ·  Cov(y2, yn) ]
  [      ⋮            ⋮         ⋱         ⋮      ]
  [ Cov(yn, y1)  Cov(yn, y2)  · · ·  Var(yn)     ]
Covariance Between Two Random Vectors
The covariance between random vectors u = [u1, . . . , um]′ and
v = [v1, . . . , vn]′ is the matrix whose i, jth element is Cov(ui, vj)
(i ∈ {1, . . . ,m}, j ∈ {1, . . . , n}).
Cov(u, v) =
  [ Cov(u1, v1)  Cov(u1, v2)  · · ·  Cov(u1, vn) ]
  [ Cov(u2, v1)  Cov(u2, v2)  · · ·  Cov(u2, vn) ]
  [      ⋮            ⋮         ⋱         ⋮      ]
  [ Cov(um, v1)  Cov(um, v2)  · · ·  Cov(um, vn) ]
= E(uv′) − E(u)E(v′).
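The identity Cov(u, v) = E(uv′) − E(u)E(v′) can be checked by simulation (this setup is my own illustration; sample moments only approximate the population quantities):

```python
import numpy as np

rng = np.random.default_rng(2)
n_draws = 200_000

# Build correlated vectors u (2 x 1) and v (1 x 1) from shared normals
z = rng.standard_normal((3, n_draws))
u = np.vstack([z[0], z[0] + z[1]])      # u_1 = z_0, u_2 = z_0 + z_1
v = (z[0] - z[2]).reshape(1, -1)        # v_1 = z_0 - z_2

# Elementwise definition: (i, j) entry is Cov(u_i, v_j)
C_elem = np.array([[np.cov(u[i], v[j])[0, 1] for j in range(1)]
                   for i in range(2)])

# Matrix identity: Cov(u, v) = E(uv') - E(u)E(v')
C_ident = (u @ v.T) / n_draws - np.outer(u.mean(axis=1), v.mean(axis=1))
```

Both computations approximate the true value, which is [1, 1]′ here.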
Linear Transformation of a Random Vector
If y is an n × 1 random vector, A is an m × n matrix of constants, and b is an m × 1 vector of constants, then
Ay + b
is a linear transformation of the random vector y.
Mean, Variance, and Covariance of Linear Transformations of a Random Vector y
E(Ay + b) = AE(y) + b
Var(Ay + b) = AVar(y)A′
Cov(Ay + b,Cy + d) = AVar(y)C′
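These identities can be verified by simulation; a sketch with an arbitrary µ, Σ, A, and b of my own choosing (sample moments approximate the population ones):

```python
import numpy as np

rng = np.random.default_rng(3)
n_draws = 200_000

mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
L = np.linalg.cholesky(Sigma)
y = mu[:, None] + L @ rng.standard_normal((2, n_draws))   # y ~ N(mu, Sigma)

A = np.array([[1.0, 1.0],
              [2.0, 0.0]])
b = np.array([0.5, -0.5])
w = A @ y + b[:, None]           # w = Ay + b, one column per draw

mean_w = w.mean(axis=1)          # should be close to A mu + b
var_w = np.cov(w)                # should be close to A Sigma A'
```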
Standard Multivariate Normal Distributions
If z1, . . . , zn are iid N(0, 1), then

z = [z1, . . . , zn]′
has a standard multivariate normal distribution: z ∼ N(0, I).
Multivariate Normal Distributions
Suppose z is an n × 1 standard multivariate normal random vector, i.e., z ∼ N(0, In×n).
Suppose A is an m× n matrix of constants and µ is an m× 1
vector of constants.
Then Az + µ has a multivariate normal distribution with mean µ and variance AA′:
z ∼ N(0, I) =⇒ Az + µ ∼ N(µ,AA′).
Multivariate Normal Distributions
If µ is an m × 1 vector of constants and Σ is an m × m symmetric, non-negative definite (NND) matrix of rank n, then N(µ,Σ) signifies the multivariate normal distribution with mean µ and variance Σ.
If y ∼ N(µ,Σ), then y d= Az + µ, where z ∼ N(0, In×n) and A is an
m× n matrix of rank n such that AA′ = Σ.
Linear Transformations of Multivariate Normal Distributions are Multivariate Normal
y ∼ N(µ,Σ) =⇒ y d= Az + µ, z ∼ N(0, I), AA′ = Σ
=⇒ Cy + d d= C(Az + µ) + d
=⇒ Cy + d d= CAz + Cµ+ d
=⇒ Cy + d d= Mz + u, M ≡ CA, u ≡ Cµ+ d
=⇒ Cy + d ∼ N(u,MM′).
Non-Central Chi-Squared Distributions
If y ∼ N(µ, In×n), then

w ≡ y′y = y1² + · · · + yn²

has a non-central chi-squared distribution with n degrees of freedom and non-centrality parameter µ′µ/2:

w ∼ χ²n(µ′µ/2).
(Some define the non-centrality parameter as µ′µ rather than µ′µ/2.)
Central Chi-Squared Distributions
If z ∼ N(0, In×n), then

w ≡ z′z = z1² + · · · + zn²

has a central chi-squared distribution with n degrees of freedom:

w ∼ χ²n.

A central chi-squared distribution is a non-central chi-squared distribution with non-centrality parameter 0: w ∼ χ²n(0).
Important Distributional Result about Quadratic Forms
Suppose Σ is an n × n positive definite matrix.

Suppose A is an n × n symmetric matrix of rank m such that AΣ is idempotent (i.e., AΣAΣ = AΣ).

Then y ∼ N(µ,Σ) =⇒ y′Ay ∼ χ²m(µ′Aµ/2).
Mean and Variance of Chi-Squared Distributions
If w ∼ χ²m(θ), then

E(w) = m + 2θ and Var(w) = 2m + 8θ.
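A simulation sketch of these moment formulas (the setup is my own; with n = 3 and µ′µ/2 = 2.5 below, we expect E(w) = 8 and Var(w) = 26):

```python
import numpy as np

rng = np.random.default_rng(4)
n_draws = 400_000

mu = np.array([1.0, 2.0, 0.0])       # theta = mu'mu / 2 = 2.5
m = mu.size
theta = (mu @ mu) / 2

y = mu[:, None] + rng.standard_normal((m, n_draws))
w = (y * y).sum(axis=0)              # w = y'y ~ chi^2_m(theta)

mean_w = w.mean()                    # should approximate m + 2*theta
var_w = w.var()                      # should approximate 2*m + 8*theta
```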
Non-Central t Distributions
Suppose y ∼ N(δ, 1).
Suppose w ∼ χ²m.
Suppose y and w are independent.
Then y/√(w/m) has a non-central t distribution with m degrees of freedom and non-centrality parameter δ:

y/√(w/m) ∼ tm(δ).
Central t Distributions
Suppose z ∼ N(0, 1).
Suppose w ∼ χ²m.
Suppose z and w are independent.
Then z/√(w/m) has a central t distribution with m degrees of freedom:

z/√(w/m) ∼ tm.
The distribution tm is the same as tm(0).
Non-Central F Distributions
Suppose w1 ∼ χ²m1(θ).

Suppose w2 ∼ χ²m2.
Suppose w1 and w2 are independent.
Then (w1/m1)/(w2/m2) has a non-central F distribution with m1 numerator degrees of freedom, m2 denominator degrees of freedom, and non-centrality parameter θ:

(w1/m1)/(w2/m2) ∼ Fm1,m2(θ).
Central F Distributions
Suppose w1 ∼ χ²m1.

Suppose w2 ∼ χ²m2.
Suppose w1 and w2 are independent.
Then (w1/m1)/(w2/m2) has a central F distribution with m1 numerator degrees of freedom and m2 denominator degrees of freedom:

(w1/m1)/(w2/m2) ∼ Fm1,m2 (which is the same as the Fm1,m2(0) distribution).
Relationship between t and F Distributions
If u ∼ tm(δ), then u² ∼ F1,m(δ²/2).
Some Independence (⊥) Results
Suppose y ∼ N(µ,Σ), where Σ is an n× n PD matrix.
If A1 is an n1 × n matrix of constants and A2 is an n2 × n matrix of constants, then A1ΣA′2 = 0 =⇒ A1y ⊥ A2y.

If A1 is an n1 × n matrix of constants and A2 is an n × n symmetric matrix of constants, then A1ΣA2 = 0 =⇒ A1y ⊥ y′A2y.

If A1 and A2 are n × n symmetric matrices of constants, then A1ΣA2 = 0 =⇒ y′A1y ⊥ y′A2y.
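A sketch of the first result with Σ = I and a hypothetical pair A1, A2 satisfying A1ΣA′2 = 0 (a simulation can only exhibit the zero correlation; independence then follows because A1y and A2y are jointly normal):

```python
import numpy as np

rng = np.random.default_rng(5)
n_draws = 200_000

Sigma = np.eye(2)                      # simplest PD choice: Sigma = I
A1 = np.array([[1.0, 1.0]])            # A1 Sigma A2' = 1 - 1 = 0
A2 = np.array([[1.0, -1.0]])

y = rng.standard_normal((2, n_draws))  # y ~ N(0, I)
u = (A1 @ y).ravel()                   # A1 y across draws
v = (A2 @ y).ravel()                   # A2 y across draws

corr = np.corrcoef(u, v)[0, 1]         # sample correlation, near 0
```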