1. Preliminaries
Copyright © 2019 Dan Nettleton (Iowa State University). Statistics 510.
Notation for Scalars, Vectors, and Matrices
Lowercase letters =⇒ scalars: x, c, σ.
Boldface, lowercase letters =⇒ vectors: x, y, β.
Boldface, uppercase letters =⇒ matrices: A, X, Σ.
Notation for Dimensions and Elements of a Matrix
Suppose A is a matrix with m rows and n columns.
Then we say that A has dimensions m× n.
Let aij ∈ R be the element or entry in the ith row and jth column of A.
We convey all this information with the notation A = [aij], where the dimensions m × n are understood (and are often written beneath the symbol A).
The Product of a Scalar and a Matrix
Suppose A = [aij] is an m × n matrix.
For any c ∈ R,
cA = Ac = [caij];
i.e., the product of the scalar c and the matrix A = [aij] is the matrix
whose entry in the ith row and jth column is c times aij for each
i = 1, . . . ,m and j = 1, . . . , n.
The Sum of Two Matrices
Suppose A = [aij] and B = [bij] are both m × n. Then

A + B = C = [cij = aij + bij];
i.e., the sum of m× n matrices A and B is an m× n matrix whose entry
in the ith row and jth column is the sum of the entry in the ith row and
jth column of A and the entry in the ith row and jth column of B
(i = 1, . . . ,m and j = 1, . . . , n).
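As a quick sanity check of the two definitions above, here is a minimal NumPy sketch (NumPy is an assumed tool here, not part of the notes; its scalar `*` and matrix `+` act entrywise, exactly as defined):

```python
import numpy as np

# A 2 x 3 matrix A = [a_ij] and a scalar c
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
c = 2.0

# cA = Ac = [c * a_ij]: each entry is multiplied by c
cA = c * A

# For B with the same dimensions, A + B = C = [a_ij + b_ij]
B = np.array([[10.0, 20.0, 30.0],
              [40.0, 50.0, 60.0]])
S = A + B
```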
Vector and Vector Transpose
In STAT 510, a vector is a matrix with one column:

x =
  [ x1 ]
  [ ⋮  ]
  [ xn ]

In STAT 510, we use x′ to denote the transpose of the vector x:

x′ = [x1, . . . , xn];

i.e., x is a matrix with one column, and x′ is the matrix with the same entries as x but written as a row rather than a column.
Transpose of a Matrix
Suppose A is an m × n matrix. Then we may write A as [a1, . . . , an], where ai is the ith column of A for each i = 1, . . . , n.

The transpose of the matrix A is

A′ = [a1, . . . , an]′ =
  [ a′1 ]
  [ ⋮   ]
  [ a′n ]

i.e., the ith row of A′ is the transpose of the ith column of A.
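A short NumPy illustration of these conventions (a sketch, not part of the original notes; the vector is stored as an n × 1 matrix to match the course's definition, since `.T` on a 1-D NumPy array is a no-op):

```python
import numpy as np

# x is a vector: a matrix with one column (3 x 1)
x = np.array([[1.0], [2.0], [3.0]])
x_t = x.T                      # x' is 1 x 3: same entries, written as a row

# For a 3 x 2 matrix A = [a_1, a_2], row i of A' is a_i'
A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
A_t = A.T                      # 2 x 3
```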
Matrix Multiplication
Suppose A = [aij] is m × n and B = [bij] is n × k.

Then AB = C = [cij], an m × k matrix with

cij = ai1b1j + ai2b2j + · · · + ainbnj (i = 1, . . . ,m; j = 1, . . . , k).
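The entrywise formula can be checked directly against NumPy's built-in product (`@`); this is an illustrative sketch with arbitrary dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 2, 3, 4
A = rng.standard_normal((m, n))
B = rng.standard_normal((n, k))

# c_ij = sum over l of a_il * b_lj, computed entry by entry
C = np.zeros((m, k))
for i in range(m):
    for j in range(k):
        C[i, j] = sum(A[i, l] * B[l, j] for l in range(n))
```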
Matrix Multiplication Special Cases
If a = [a1, . . . , an]′ and b = [b1, . . . , bn]′, then

a′b = a1b1 + · · · + anbn.

Also,

a′a = a1² + · · · + an² ≡ ||a||².

||a|| ≡ √(a′a) = √(a1² + · · · + an²) is known as the Euclidean norm of a.
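For example, with a = [3, 4]′ the Euclidean norm is 5; a minimal NumPy check:

```python
import numpy as np

a = np.array([3.0, 4.0])
b = np.array([1.0, 2.0])

inner = a @ b                # a'b = 3*1 + 4*2 = 11
norm_sq = a @ a              # a'a = ||a||^2 = 9 + 16 = 25
norm = np.sqrt(norm_sq)      # Euclidean norm ||a|| = 5
```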
Another Look at Matrix Multiplication
Suppose A = [aij] is m × n with columns a1, . . . , an and rows a′(1), . . . , a′(m), and B = [bij] is n × k with columns b1, . . . , bk and rows b′(1), . . . , b′(n).

Then

AB = C = [cij], where cij = ai1b1j + · · · + ainbnj = a′(i)bj.

Equivalently,

AB = [Ab1, . . . ,Abk] (the jth column of AB is Abj),

the ith row of AB is a′(i)B, and

AB = a1b′(1) + · · · + anb′(n) (a sum of outer products).
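The equivalent views of the product (columns Abj, rows a′(i)B, and a sum of outer products) can be verified numerically; a sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4))    # columns a_1..a_4, rows a'_(1)..a'_(3)
B = rng.standard_normal((4, 2))    # columns b_1, b_2, rows b'_(1)..b'_(4)
AB = A @ B

# Column view: jth column of AB is A b_j
cols = np.column_stack([A @ B[:, j] for j in range(B.shape[1])])

# Row view: ith row of AB is a'_(i) B
rows = np.vstack([A[i, :] @ B for i in range(A.shape[0])])

# Outer-product view: AB = sum over l of a_l b'_(l)
outers = sum(np.outer(A[:, l], B[l, :]) for l in range(A.shape[1]))
```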
Transpose of a Matrix Product
The transpose of a matrix product is a product of the transposes in reverse order; i.e.,
(AB)′ = B′A′.
Zero and One Vectors and the Identity Matrix
0 = [0, 0, . . . , 0]′ and 1 = [1, 1, . . . , 1]′ are vectors of all zeros and all ones, respectively, and

I =
  [ 1 0 0 · · · 0 ]
  [ 0 1 0 · · · 0 ]
  [ 0 0 1 · · · 0 ]
  [ ⋮ ⋮ ⋮ ⋱ ⋮ ]
  [ 0 0 0 · · · 1 ]

is the identity matrix.

0 is also used to denote a matrix whose entries are all zero.
Linear Combination
If c1, . . . , cn ∈ R and a1, . . . , an ∈ Rm, then
c1a1 + · · · + cnan
is a linear combination (LC) of a1, . . . , an.
The coefficients of the LC are c1, . . . , cn.
Column Spaces
Ac is a linear combination of the columns of an m× n matrix A:
Ac = [a1, . . . , an][c1, . . . , cn]′ = c1a1 + · · · + cnan.
The set of all possible linear combinations of the columns of A is
called the column space of A and is written as
C(A) = {Ac : c ∈ Rn}.
Note that C(A) ⊆ Rm.
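A small numerical sketch of Ac as a linear combination of the columns of A (the matrix here is arbitrary, chosen for illustration):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])                   # columns a_1, a_2 in R^3
c = np.array([2.0, -1.0])

Ac = A @ c                                   # an element of C(A), a subset of R^3
lin_comb = c[0] * A[:, 0] + c[1] * A[:, 1]   # c_1 a_1 + c_2 a_2
```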
Linear Independence and Linear Dependence
The vectors a1, . . . , an are linearly independent (LI) iff

c1a1 + · · · + cnan = 0 =⇒ c1 = · · · = cn = 0.

The vectors a1, . . . , an are linearly dependent (LD) iff there exist c1, . . . , cn, not all 0, such that c1a1 + · · · + cnan = 0.
Rank and Trace
The rank of a matrix A is written as rank(A) and is the maximum number of linearly independent rows (or columns) of A.

The trace of an n × n matrix A is written as trace(A) and is the sum of the diagonal elements of A; i.e., trace(A) = a11 + a22 + · · · + ann.
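A sketch using NumPy's rank and trace routines; in the example below the middle row is twice the first, so the rows are linearly dependent and the rank drops to 2:

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],    # = 2 * row 1, a linear dependence
              [0.0, 1.0, 1.0]])

r = np.linalg.matrix_rank(A)      # max number of LI rows (or columns)
tr = np.trace(A)                  # a_11 + a_22 + a_33 = 1 + 4 + 1
```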
Idempotent Matrices
A matrix A is said to be idempotent iff AA = A.
The rank of an idempotent matrix is equal to its trace; i.e.,
rank(A) = trace(A).
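A familiar example of an idempotent matrix is the orthogonal projection X(X′X)⁻¹X′ onto C(X) (this particular X is a hypothetical illustration); its rank equals its trace, as stated above:

```python
import numpy as np

X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
P = X @ np.linalg.inv(X.T @ X) @ X.T   # projection onto C(X)

idem = np.allclose(P @ P, P)           # PP = P, so P is idempotent
rank_P = np.linalg.matrix_rank(P)
trace_P = np.trace(P)                  # equals rank_P for idempotent P
```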
Square Matrices
An m× n matrix A is said to be square iff m = n.
If A is an m× n matrix, then A′A is an n× n matrix.
Thus, A′A is a square matrix for any matrix A.
Inverse of a Matrix
A square matrix A is nonsingular or invertible iff there exists a
square matrix B such that AB = I.
If A is nonsingular and AB = I, then B is the unique inverse of A
and is written as A−1.
For a nonsingular matrix A, we have AA−1 = I. (It is also true that
A−1A = I.)
A square matrix without an inverse is called singular.
An n× n matrix A is singular iff rank(A) < n.
Generalized Inverses
G is a generalized inverse of an m× n matrix A iff AGA = A.
We usually denote a generalized inverse of A by A−.
If A is nonsingular, i.e., if A−1 exists, then A−1 is the one and only
generalized inverse of A.
AA−1A = IA = AI = A
If A is singular, i.e., if A−1 does not exist, then there are infinitely
many generalized inverses of A.
Finding a Generalized Inverse of a Matrix A
1. Find any r × r nonsingular submatrix of A, where r = rank(A). Call this matrix W.
2. Invert and transpose W, i.e., compute (W−1)′.
3. Replace each element of W in A with the corresponding element of (W−1)′.
4. Replace all other elements in A with zeros.
5. Transpose the resulting matrix to obtain G, a generalized inverse for A.
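The five steps above translate directly into code. A sketch (the helper name and the example matrix are my own; the caller supplies row and column indices of a nonsingular r × r submatrix W):

```python
import numpy as np

def generalized_inverse(A, rows, cols):
    """Apply steps 1-5: rows, cols index a nonsingular r x r
    submatrix W of A, where r = rank(A)."""
    W = A[np.ix_(rows, cols)]            # step 1: extract W
    Wit = np.linalg.inv(W).T             # step 2: invert and transpose W
    G = np.zeros_like(A, dtype=float)    # step 4: zeros elsewhere
    G[np.ix_(rows, cols)] = Wit          # step 3: place (W^-1)' over W
    return G.T                           # step 5: transpose the result

# Rank-2 example: row 3 = row 1 + row 2, and the top-left
# 2 x 2 block is nonsingular
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [5.0, 7.0, 9.0]])
G = generalized_inverse(A, rows=[0, 1], cols=[0, 1])
```

One can confirm that AGA = A, so G is a generalized inverse even though A itself is singular.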
Positive and Non-Negative Definite Matrices
x′Ax is known as a quadratic form.
We say that an n× n matrix A is positive definite (PD) iff
A is symmetric (i.e., A = A′), and
x′Ax > 0 for all x ∈ Rn \ {0}.
We say that an n× n matrix A is non-negative definite (NND) iff
A is symmetric (i.e., A = A′), and
x′Ax ≥ 0 for all x ∈ Rn.
Positive and Non-Negative Definite Matrices
A matrix that is positive definite is nonsingular; i.e.,
A positive definite =⇒ A−1 exists.
A matrix that is non-negative definite but not positive definite is singular.
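For a symmetric matrix, positive definiteness is equivalent to all eigenvalues being positive, which gives a practical numerical test; a sketch (the helper name is my own):

```python
import numpy as np

def is_positive_definite(A, tol=1e-10):
    """Symmetric with x'Ax > 0 for all nonzero x,
    equivalently all eigenvalues positive."""
    if not np.allclose(A, A.T):
        return False
    return bool(np.all(np.linalg.eigvalsh(A) > tol))

A_pd = np.array([[2.0, 1.0],
                 [1.0, 2.0]])    # eigenvalues 1 and 3: PD
A_nnd = np.array([[1.0, 1.0],
                  [1.0, 1.0]])   # eigenvalues 0 and 2: NND but not PD
```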
Random Vectors
A random vector is a vector whose components are random variables.
y = [y1, y2, . . . , yn]′
Expected Value of a Random Vector
The expected value, or mean, of a random vector y is the vector of expected values of the components of y:

y = [y1, y2, . . . , yn]′ =⇒ E(y) = [E(y1),E(y2), . . . ,E(yn)]′.

Likewise, if A = [aij] is a matrix of random variables, then E(A) = [E(aij)]; i.e., the expected value of A is the matrix of expected values of the elements of A.
Variance of a Random Vector
The variance of a random vector y = [y1, y2, . . . , yn]′ is the matrix
whose i, jth element is Cov(yi, yj) (i, j ∈ {1, . . . , n}).
Var(y) =
  [ Cov(y1, y1)  Cov(y1, y2)  · · ·  Cov(y1, yn) ]
  [ Cov(y2, y1)  Cov(y2, y2)  · · ·  Cov(y2, yn) ]
  [      ⋮            ⋮         ⋱         ⋮      ]
  [ Cov(yn, y1)  Cov(yn, y2)  · · ·  Cov(yn, yn) ]
Variance of a Random Vector
The covariance of a random variable with itself is the variance of that random variable. Thus,

Var(y) =
  [ Var(y1)      Cov(y1, y2)  · · ·  Cov(y1, yn) ]
  [ Cov(y2, y1)  Var(y2)      · · ·  Cov(y2, yn) ]
  [      ⋮            ⋮         ⋱         ⋮      ]
  [ Cov(yn, y1)  Cov(yn, y2)  · · ·  Var(yn)     ]
Covariance Between Two Random Vectors
The covariance between random vectors u = [u1, . . . , um]′ and
v = [v1, . . . , vn]′ is the matrix whose i, jth element is Cov(ui, vj)
(i ∈ {1, . . . ,m}, j ∈ {1, . . . , n}).
Cov(u, v) =
  [ Cov(u1, v1)  Cov(u1, v2)  · · ·  Cov(u1, vn) ]
  [ Cov(u2, v1)  Cov(u2, v2)  · · ·  Cov(u2, vn) ]
  [      ⋮            ⋮         ⋱         ⋮      ]
  [ Cov(um, v1)  Cov(um, v2)  · · ·  Cov(um, vn) ]
= E(uv′) − E(u)E(v′).
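The identity Cov(u, v) = E(uv′) − E(u)E(v′) can be checked by simulation (this setup is my own illustration; sample moments only approximate the population quantities):

```python
import numpy as np

rng = np.random.default_rng(2)
n_draws = 200_000

# Build correlated vectors u (2 x 1) and v (1 x 1) from shared normals
z = rng.standard_normal((3, n_draws))
u = np.vstack([z[0], z[0] + z[1]])      # u_1 = z_0, u_2 = z_0 + z_1
v = (z[0] - z[2]).reshape(1, -1)        # v_1 = z_0 - z_2

# Elementwise definition: (i, j) entry is Cov(u_i, v_j)
C_elem = np.array([[np.cov(u[i], v[j])[0, 1] for j in range(1)]
                   for i in range(2)])

# Matrix identity: Cov(u, v) = E(uv') - E(u)E(v')
C_ident = (u @ v.T) / n_draws - np.outer(u.mean(axis=1), v.mean(axis=1))
```

Both computations approximate the true value, which is [1, 1]′ here.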
Linear Transformation of a Random Vector
If y is an n × 1 random vector, A is an m × n matrix of constants, and b is an m × 1 vector of constants, then
Ay + b
is a linear transformation of the random vector y.
Mean, Variance, and Covariance of Linear Transformations of a Random Vector y
E(Ay + b) = AE(y) + b
Var(Ay + b) = AVar(y)A′
Cov(Ay + b,Cy + d) = AVar(y)C′
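These identities can be verified by simulation; a sketch with an arbitrary µ, Σ, A, and b of my own choosing (sample moments approximate the population ones):

```python
import numpy as np

rng = np.random.default_rng(3)
n_draws = 200_000

mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
L = np.linalg.cholesky(Sigma)
y = mu[:, None] + L @ rng.standard_normal((2, n_draws))   # y ~ N(mu, Sigma)

A = np.array([[1.0, 1.0],
              [2.0, 0.0]])
b = np.array([0.5, -0.5])
w = A @ y + b[:, None]           # w = Ay + b, one column per draw

mean_w = w.mean(axis=1)          # should be close to A mu + b
var_w = np.cov(w)                # should be close to A Sigma A'
```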
Standard Multivariate Normal Distributions
If z1, . . . , zn are iid N(0, 1), then

z = [z1, . . . , zn]′
has a standard multivariate normal distribution: z ∼ N(0, I).
Multivariate Normal Distributions
Suppose z is an n × 1 standard multivariate normal random vector, i.e., z ∼ N(0, In×n).
Suppose A is an m× n matrix of constants and µ is an m× 1
vector of constants.
Then Az + µ has a multivariate normal distribution with mean µ and variance AA′:
z ∼ N(0, I) =⇒ Az + µ ∼ N(µ,AA′).
Multivariate Normal Distributions
If µ is an m × 1 vector of constants and Σ is an m × m symmetric, non-negative definite (NND) matrix of rank n, then N(µ,Σ) signifies the multivariate normal distribution with mean µ and variance Σ.
If y ∼ N(µ,Σ), then y d= Az + µ, where z ∼ N(0, In×n) and A is an
m× n matrix of rank n such that AA′ = Σ.
Linear Transformations of Multivariate Normal Distributions are Multivariate Normal
y ∼ N(µ,Σ) =⇒ y d= Az + µ, z ∼ N(0, I), AA′ = Σ
=⇒ Cy + d d= C(Az + µ) + d
=⇒ Cy + d d= CAz + Cµ+ d
=⇒ Cy + d d= Mz + u, M ≡ CA, u ≡ Cµ+ d
=⇒ Cy + d ∼ N(u,MM′).
Non-Central Chi-Squared Distributions
If y ∼ N(µ, In×n), then

w ≡ y′y = y1² + · · · + yn²

has a non-central chi-squared distribution with n degrees of freedom and non-centrality parameter µ′µ/2:

w ∼ χ²n(µ′µ/2).
(Some define the non-centrality parameter as µ′µ rather than µ′µ/2.)
Central Chi-Squared Distributions
If z ∼ N(0, In×n), then

w ≡ z′z = z1² + · · · + zn²

has a central chi-squared distribution with n degrees of freedom:

w ∼ χ²n.

A central chi-squared distribution is a non-central chi-squared distribution with non-centrality parameter 0: w ∼ χ²n(0).
Important Distributional Result about Quadratic Forms
Suppose Σ is an n × n positive definite matrix.

Suppose A is an n × n symmetric matrix of rank m such that AΣ is idempotent (i.e., AΣAΣ = AΣ).

Then y ∼ N(µ,Σ) =⇒ y′Ay ∼ χ²m(µ′Aµ/2).
Mean and Variance of Chi-Squared Distributions
If w ∼ χ²m(θ), then

E(w) = m + 2θ and Var(w) = 2m + 8θ.
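A simulation sketch of these moment formulas (the setup is my own; with n = 3 and µ′µ/2 = 2.5 below, we expect E(w) = 8 and Var(w) = 26):

```python
import numpy as np

rng = np.random.default_rng(4)
n_draws = 400_000

mu = np.array([1.0, 2.0, 0.0])       # theta = mu'mu / 2 = 2.5
m = mu.size
theta = (mu @ mu) / 2

y = mu[:, None] + rng.standard_normal((m, n_draws))
w = (y * y).sum(axis=0)              # w = y'y ~ chi^2_m(theta)

mean_w = w.mean()                    # should approximate m + 2*theta
var_w = w.var()                      # should approximate 2*m + 8*theta
```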
Non-Central t Distributions
Suppose y ∼ N(δ, 1).
Suppose w ∼ χ²m.
Suppose y and w are independent.
Then y/√(w/m) has a non-central t distribution with m degrees of freedom and non-centrality parameter δ:

y/√(w/m) ∼ tm(δ).
Central t Distributions
Suppose z ∼ N(0, 1).
Suppose w ∼ χ²m.
Suppose z and w are independent.
Then z/√(w/m) has a central t distribution with m degrees of freedom:

z/√(w/m) ∼ tm.
The distribution tm is the same as tm(0).
Non-Central F Distributions
Suppose w1 ∼ χ²m1(θ).

Suppose w2 ∼ χ²m2.
Suppose w1 and w2 are independent.
Then (w1/m1)/(w2/m2) has a non-central F distribution with m1 numerator degrees of freedom, m2 denominator degrees of freedom, and non-centrality parameter θ:

(w1/m1)/(w2/m2) ∼ Fm1,m2(θ).
Central F Distributions
Suppose w1 ∼ χ²m1.

Suppose w2 ∼ χ²m2.
Suppose w1 and w2 are independent.
Then (w1/m1)/(w2/m2) has a central F distribution with m1 numerator degrees of freedom and m2 denominator degrees of freedom:

(w1/m1)/(w2/m2) ∼ Fm1,m2 (which is the same as the Fm1,m2(0) distribution).
Relationship between t and F Distributions
If u ∼ tm(δ), then u² ∼ F1,m(δ²/2).
Some Independence (⊥) Results
Suppose y ∼ N(µ,Σ), where Σ is an n× n PD matrix.
If A1 is an n1 × n matrix of constants and A2 is an n2 × n matrix of constants, then A1ΣA′2 = 0 =⇒ A1y ⊥ A2y.

If A1 is an n1 × n matrix of constants and A2 is an n × n symmetric matrix of constants, then A1ΣA2 = 0 =⇒ A1y ⊥ y′A2y.

If A1 and A2 are n × n symmetric matrices of constants, then A1ΣA2 = 0 =⇒ y′A1y ⊥ y′A2y.
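A sketch of the first result with Σ = I and a hypothetical pair A1, A2 satisfying A1ΣA′2 = 0 (a simulation can only exhibit the zero correlation; independence then follows because A1y and A2y are jointly normal):

```python
import numpy as np

rng = np.random.default_rng(5)
n_draws = 200_000

Sigma = np.eye(2)                      # simplest PD choice: Sigma = I
A1 = np.array([[1.0, 1.0]])            # A1 Sigma A2' = 1 - 1 = 0
A2 = np.array([[1.0, -1.0]])

y = rng.standard_normal((2, n_draws))  # y ~ N(0, I)
u = (A1 @ y).ravel()                   # A1 y across draws
v = (A2 @ y).ravel()                   # A2 y across draws

corr = np.corrcoef(u, v)[0, 1]         # sample correlation, near 0
```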