An observable random vector X of dimension p has mean vector µ and covariance matrix Σ
The (orthogonal) factor model postulates that X depends linearly on unobservable random variables (latent variables) F1, F2, ..., Fm, called common factors and p additional (unobservable) sources of variation ε1, ε2, ...., εp, called errors or specific factors
Estimation for the factor model (cf. section 9.3)
Model formulation:

$$X_1 - \mu_1 = l_{11} F_1 + l_{12} F_2 + \cdots + l_{1m} F_m + \varepsilon_1$$
$$\vdots$$
$$X_p - \mu_p = l_{p1} F_1 + l_{p2} F_2 + \cdots + l_{pm} F_m + \varepsilon_p$$

The coefficient $l_{ij}$ is called the loading of the i-th variable on the j-th factor

In matrix notation the factor model takes the form:

$$X - \mu = L F + \varepsilon$$

Here $L = \{l_{ij}\}$ is the p x m matrix of factor loadings, $F = [F_1, F_2, \ldots, F_m]'$ is the m-dimensional vector of common factors, and $\varepsilon = [\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_p]'$ is the p-vector of errors (or specific factors)
Assumptions:

$$E(F) = 0, \qquad \mathrm{Cov}(F) = E(F F') = I$$
$$E(\varepsilon) = 0, \qquad \mathrm{Cov}(\varepsilon) = E(\varepsilon \varepsilon') = \Psi = \mathrm{diag}\{\psi_1, \psi_2, \ldots, \psi_p\}$$
$$\mathrm{Cov}(\varepsilon, F) = 0$$
The model implies that

$$\Sigma = \mathrm{Cov}(X) = E\{(X - \mu)(X - \mu)'\} = L L' + \Psi$$

We may write

$$\mathrm{Cov}(X, F) = E\{(X - \mu) F'\} = L$$
$$\sigma_{ii} = \mathrm{Var}(X_i) = \sum_{j=1}^{m} l_{ij}^2 + \psi_i \stackrel{\text{def}}{=} \underbrace{h_i^2}_{\text{communality}} + \underbrace{\psi_i}_{\text{specific variance}}$$

$$\mathrm{Cov}(X_i, X_k) = \sum_{j=1}^{m} l_{ij} l_{kj}$$

$$\mathrm{Cov}(X_i, F_j) = l_{ij}$$
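As a quick numerical check of these identities, the sketch below builds $\Sigma = L L' + \Psi$ from hypothetical loadings and specific variances (all values are invented for illustration) and verifies the variance decomposition:

```python
import numpy as np

# Hypothetical loadings for p = 3 variables and m = 2 factors,
# plus specific variances; any L and positive psi would do
L = np.array([[0.9, 0.1],
              [0.7, 0.5],
              [0.2, 0.8]])
psi = np.array([0.18, 0.26, 0.32])

# Covariance implied by the factor model: Sigma = L L' + Psi
Sigma = L @ L.T + np.diag(psi)

# Var(X_i) splits into communality h_i^2 = sum_j l_ij^2 plus psi_i
communality = (L**2).sum(axis=1)
assert np.allclose(np.diag(Sigma), communality + psi)

# Off-diagonal entries: Cov(X_i, X_k) = sum_j l_ij * l_kj
assert np.isclose(Sigma[0, 1], L[0] @ L[1])
```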
Let T be an m x m orthogonal matrix

The factor model $X - \mu = L F + \varepsilon$ may then be reformulated as

$$X - \mu = (L T)(T' F) + \varepsilon = L^* F^* + \varepsilon$$

where $L^* = L T$ and $F^* = T' F$
It is impossible on the basis of observations to distinguish the loadings L from the loadings L*
Thus the factor loadings are determined only up to rotation by an orthogonal matrix
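This indeterminacy is easy to verify numerically: for any orthogonal T, the rotated loadings reproduce exactly the same $L L'$. A small sketch with made-up loadings and an arbitrary rotation angle:

```python
import numpy as np

# Invented loadings for p = 3 variables and m = 2 factors
L = np.array([[0.9, 0.1],
              [0.7, 0.5],
              [0.2, 0.8]])

theta = 0.7                       # any angle gives an orthogonal T
T = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

L_star = L @ T                    # rotated loadings L* = L T

# L* L*' = L T T' L' = L L', so the implied covariance is unchanged
assert np.allclose(L_star @ L_star.T, L @ L.T)

# Rotation also preserves the communalities (row norms of L)
assert np.allclose((L_star**2).sum(axis=1), (L**2).sum(axis=1))
```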
There are different methods for estimation of the factor model
We will consider:
• estimation using principal components
• maximum likelihood estimation
For both methods the solution may be rotated by multiplication by an orthogonal matrix to simplify the interpretation of factors
We first consider estimation using principal components

Assume that Σ has eigenvalue-eigenvector pairs $(\lambda_1, e_1), (\lambda_2, e_2), \ldots, (\lambda_p, e_p)$ where $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0$

By the spectral decomposition we may write

$$\Sigma = \lambda_1 e_1 e_1' + \lambda_2 e_2 e_2' + \cdots + \lambda_p e_p e_p'$$
This may be written as

$$\Sigma = \begin{bmatrix} \sqrt{\lambda_1}\, e_1 & \sqrt{\lambda_2}\, e_2 & \cdots & \sqrt{\lambda_p}\, e_p \end{bmatrix} \begin{bmatrix} \sqrt{\lambda_1}\, e_1' \\ \sqrt{\lambda_2}\, e_2' \\ \vdots \\ \sqrt{\lambda_p}\, e_p' \end{bmatrix}$$

This is of the form $\Sigma = L L' + \Psi$ with m = p and $\Psi = 0$
The representation above is not particularly useful since we seek a representation with just a few common factors

If the last p – m eigenvalues are small, we may neglect

$$\lambda_{m+1} e_{m+1} e_{m+1}' + \cdots + \lambda_p e_p e_p'$$

in the representation of Σ

This gives:
$$\Sigma \approx \begin{bmatrix} \sqrt{\lambda_1}\, e_1 & \sqrt{\lambda_2}\, e_2 & \cdots & \sqrt{\lambda_m}\, e_m \end{bmatrix} \begin{bmatrix} \sqrt{\lambda_1}\, e_1' \\ \sqrt{\lambda_2}\, e_2' \\ \vdots \\ \sqrt{\lambda_m}\, e_m' \end{bmatrix}$$
Allowing for specific factors (errors) we obtain $\Sigma \approx L L' + \Psi$:

$$\Sigma = \begin{bmatrix} \sqrt{\lambda_1}\, e_1 & \cdots & \sqrt{\lambda_m}\, e_m \end{bmatrix} \begin{bmatrix} \sqrt{\lambda_1}\, e_1' \\ \vdots \\ \sqrt{\lambda_m}\, e_m' \end{bmatrix} + \Psi$$

where $\Psi = \mathrm{diag}\{\psi_1, \psi_2, \ldots, \psi_p\}$ is given by $\psi_i = \sigma_{ii} - \sum_{j=1}^{m} l_{ij}^2$
The estimation method using principal components (the principal factor method) is the empirical counterpart to this approximation
Estimation using principal components:

Assume that S has eigenvalue-eigenvector pairs $(\hat\lambda_1, \hat e_1), (\hat\lambda_2, \hat e_2), \ldots, (\hat\lambda_p, \hat e_p)$ where $\hat\lambda_1 \ge \hat\lambda_2 \ge \cdots \ge \hat\lambda_p > 0$

Let m < p be the number of common factors

The matrix $\tilde L = \{\tilde l_{ij}\}$ of estimated factor loadings is given by

$$\tilde L = \begin{bmatrix} \sqrt{\hat\lambda_1}\, \hat e_1 & \sqrt{\hat\lambda_2}\, \hat e_2 & \cdots & \sqrt{\hat\lambda_m}\, \hat e_m \end{bmatrix}$$
The diagonal matrix of specific variances is estimated as $\tilde\Psi = \mathrm{diag}\{\tilde\psi_1, \tilde\psi_2, \ldots, \tilde\psi_p\}$ with $\tilde\psi_i = s_{ii} - \sum_{j=1}^{m} \tilde l_{ij}^2$

The communalities are estimated as $\tilde h_i^2 = \sum_{j=1}^{m} \tilde l_{ij}^2$
Working with standardized variables, we replace S by the sample correlation matrix R
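The principal component solution amounts to a few lines of linear algebra; a minimal sketch, with an invented correlation matrix (not taken from the textbook examples):

```python
import numpy as np

def principal_factor(R, m):
    """Principal component solution of the factor model for a
    correlation (or covariance) matrix R with m common factors."""
    eigval, eigvec = np.linalg.eigh(R)       # eigh returns ascending order
    order = np.argsort(eigval)[::-1]         # largest eigenvalues first
    lam, e = eigval[order[:m]], eigvec[:, order[:m]]
    L = e * np.sqrt(lam)                     # columns sqrt(lam_j) * e_j
    psi = np.diag(R) - (L**2).sum(axis=1)    # psi_i = r_ii - h_i^2
    return L, psi

# Illustrative 3 x 3 correlation matrix
R = np.array([[1.00, 0.63, 0.45],
              [0.63, 1.00, 0.35],
              [0.45, 0.35, 1.00]])
L, psi = principal_factor(R, m=1)

# Communalities plus specific variances reproduce the diagonal of R
assert np.allclose((L**2).sum(axis=1) + psi, np.diag(R))
```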
The total sample variance is

$$s_{11} + s_{22} + \cdots + s_{pp} = \mathrm{tr}(S)$$

The contribution to this from the j-th factor is

$$\tilde l_{1j}^2 + \tilde l_{2j}^2 + \cdots + \tilde l_{pj}^2 = \left(\sqrt{\hat\lambda_j}\, \hat e_j\right)' \left(\sqrt{\hat\lambda_j}\, \hat e_j\right) = \hat\lambda_j\, \hat e_j' \hat e_j = \hat\lambda_j$$

Thus the proportion of total sample variance due to the j-th factor is

$$\frac{\hat\lambda_j}{s_{11} + s_{22} + \cdots + s_{pp}}$$

(similar to principal components)
How do we determine the number of factors (if not given a priori)?

We want the m factors to explain a fairly large proportion of the total sample variance, so we may choose m so that this is achieved (subjectively)

For factor analysis of the correlation matrix R one may let m be the number of eigenvalues larger than 1

One would also like the residual matrix

$$S - (\tilde L \tilde L' + \tilde\Psi)$$

to have small off-diagonal elements (the diagonal elements are zero)
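Both checks are mechanical to apply; a sketch with an invented 3 x 3 correlation matrix, choosing m by the eigenvalues-larger-than-1 rule and then inspecting the residual matrix:

```python
import numpy as np

# Illustrative correlation matrix (not from the textbook examples)
R = np.array([[1.00, 0.63, 0.45],
              [0.63, 1.00, 0.35],
              [0.45, 0.35, 1.00]])

# Rule of thumb for R: m = number of eigenvalues larger than 1
eigval, eigvec = np.linalg.eigh(R)
m = int((eigval > 1).sum())

# Principal component solution with that m
order = np.argsort(eigval)[::-1]
lam, e = eigval[order[:m]], eigvec[:, order[:m]]
L = e * np.sqrt(lam)
psi = np.diag(R) - (L**2).sum(axis=1)

# Residual matrix: zero on the diagonal by construction; a good fit
# should make the off-diagonal entries small too
residual = R - (L @ L.T + np.diag(psi))
assert np.allclose(np.diag(residual), 0.0)
```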
Example 9.3 (contd): In a consumer preference study a random sample of customers were asked to rate several attributes of a new product using a 7-point scale

Sample correlation matrix:

The first two eigenvalues are the only ones larger than 1, and a model with two common factors accounts for 93% of the total (standardized) sample variance
Example 9.4
Weekly rates of return for five stocks on the New York Stock Exchange, Jan 2004 through Dec 2005
Consider now estimation by maximum likelihood

We then assume that the common factors and the specific factors are normally distributed

Then X is multivariate normal with mean vector µ and covariance matrix of the form $\Sigma = L L' + \Psi$
Likelihood (based on n observations $x_1, x_2, \ldots, x_n$):

$$L(\mu, \Sigma) = \frac{1}{(2\pi)^{np/2}\, |\Sigma|^{n/2}} \exp\left\{-\frac{1}{2} \sum_{j=1}^{n} (x_j - \mu)' \Sigma^{-1} (x_j - \mu)\right\}$$

Note that the likelihood depends on L and Ψ via $\Sigma = L L' + \Psi$
Note also that the factor loadings L and $L^* = L T$ give the same value of the likelihood when T is an orthogonal matrix

In order to make L well defined (corresponding to a particular choice of T) we need to impose m(m-1)/2 restrictions (the matrix T has m² elements, and orthogonality imposes m(m+1)/2 restrictions)

It is common to impose the restriction that $L' \Psi^{-1} L$ is a diagonal matrix
Under these restrictions there exist unique maximum likelihood estimates $\hat L$, $\hat\Psi$ (and $\hat\mu = \bar x$)
The maximum likelihood estimates of the communalities are

$$\hat h_i^2 = \hat l_{i1}^2 + \hat l_{i2}^2 + \cdots + \hat l_{im}^2 \qquad (i = 1, \ldots, p)$$

The proportion of total sample variance due to the j-th factor is

$$\frac{\hat l_{1j}^2 + \hat l_{2j}^2 + \cdots + \hat l_{pj}^2}{s_{11} + s_{22} + \cdots + s_{pp}}$$
In general the correlation matrix

$$\rho = \begin{bmatrix} 1 & \rho_{12} & \cdots & \rho_{1p} \\ \rho_{21} & 1 & \cdots & \rho_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{p1} & \rho_{p2} & \cdots & 1 \end{bmatrix}$$

may be written as $\rho = V^{-1/2} \Sigma V^{-1/2}$, where

$$V^{-1/2} = \begin{bmatrix} 1/\sqrt{\sigma_{11}} & 0 & \cdots & 0 \\ 0 & 1/\sqrt{\sigma_{22}} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1/\sqrt{\sigma_{pp}} \end{bmatrix}$$

is the inverse of the standard deviation matrix
When $\Sigma = L L' + \Psi$ we have that

$$\rho = V^{-1/2} \Sigma V^{-1/2} = V^{-1/2}(L L' + \Psi)V^{-1/2} = (V^{-1/2} L)(V^{-1/2} L)' + V^{-1/2} \Psi V^{-1/2} = L_z L_z' + \Psi_z$$

where $L_z = V^{-1/2} L$ and $\Psi_z = V^{-1/2} \Psi V^{-1/2}$ (diagonal)

We obtain maximum likelihood estimates of $L_z$ and $\Psi_z$ by inserting the maximum likelihood estimates for $L$, $V^{-1/2}$ and $\Psi$ (details are given in Supplement 9A)
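The rescaling identity is easy to confirm numerically; a sketch with invented loadings and specific variances:

```python
import numpy as np

# Invented L and Psi; Sigma = L L' + Psi is the implied covariance
L = np.array([[0.9, 0.1],
              [0.7, 0.5],
              [0.2, 0.8]])
Psi = np.diag([0.30, 0.20, 0.25])
Sigma = L @ L.T + Psi

# V^{-1/2}: diagonal matrix of inverse standard deviations
Vinv_sqrt = np.diag(1 / np.sqrt(np.diag(Sigma)))
rho = Vinv_sqrt @ Sigma @ Vinv_sqrt

# Standardizing carries the factor structure over to rho:
# rho = L_z L_z' + Psi_z with L_z = V^{-1/2} L
L_z = Vinv_sqrt @ L
Psi_z = Vinv_sqrt @ Psi @ Vinv_sqrt
assert np.allclose(rho, L_z @ L_z.T + Psi_z)
```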
Example 9.5
Weekly rates of return for five stocks on the New York Stock Exchange, Jan 2004 through Dec 2005
Under the assumption that X is multivariate normal with mean vector µ and covariance matrix Σ we may test the hypothesis that a factor model with m factors holds

This corresponds to testing the null hypothesis

$$H_0: \Sigma = L L' + \Psi$$

We may use the likelihood ratio test
When we assume no structure on Σ the maximum of the likelihood becomes

$$\max_{\mu, \Sigma} L(\mu, \Sigma) = \frac{1}{(2\pi)^{np/2}\, |S_n|^{n/2}}\, e^{-np/2}$$

where $S_n = (n-1)S/n$

When $\Sigma = L L' + \Psi$ one may show that the maximum of the likelihood becomes

$$\max_{\mu,\, \Sigma = L L' + \Psi} L(\mu, \Sigma) = \frac{1}{(2\pi)^{np/2}\, |\hat\Sigma|^{n/2}}\, e^{-np/2}$$

where $\hat\Sigma = \hat L \hat L' + \hat\Psi$

The likelihood ratio then takes the form

$$\Lambda = \left(\frac{|\hat\Sigma|}{|S_n|}\right)^{-n/2}$$

For testing we may use that under $H_0$

$$-2 \log \Lambda = n \log \frac{|\hat\Sigma|}{|S_n|}$$

is approximately chi-squared distributed with p(p+1)/2 – [p(m+1) – m(m – 1)/2] degrees of freedom
A Bartlett correction may be used to improve the approximation to the chi-squared distribution, cf. J&W
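The test statistic and degrees of freedom are straightforward to compute; in the sketch below the sample covariance matrix, loadings and specific variances are invented stand-ins for $S_n$ and the maximum likelihood estimates, purely to illustrate the formulas:

```python
import numpy as np

n, p, m = 100, 4, 1

# Illustrative S_n and stand-ins for the ML estimates
S_n = np.array([[1.00, 0.50, 0.40, 0.30],
                [0.50, 1.00, 0.35, 0.25],
                [0.40, 0.35, 1.00, 0.20],
                [0.30, 0.25, 0.20, 1.00]])
L_hat = np.array([[0.80], [0.65], [0.55], [0.45]])
psi_hat = np.diag(S_n) - (L_hat**2).sum(axis=1)
Sigma_hat = L_hat @ L_hat.T + np.diag(psi_hat)

# -2 log Lambda = n log(|Sigma_hat| / |S_n|)
stat = n * np.log(np.linalg.det(Sigma_hat) / np.linalg.det(S_n))

# Degrees of freedom: p(p+1)/2 - [p(m+1) - m(m-1)/2]
df = p * (p + 1) // 2 - (p * (m + 1) - m * (m - 1) // 2)
```

Compare `stat` against a chi-squared quantile with `df` degrees of freedom to carry out the test.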
Factor rotation (cf. section 9.4)

Example 9.8: We register the scores in 6 subject areas for 220 students

Sample correlation matrix:

Maximum likelihood solution for two factors:
Plot the factor loading pairs $(\hat l_{i1}, \hat l_{i2})$; cf. Fig 9.1

Original factors:
• First factor: "general intelligence"
• Second factor: "nonmath-math"

Rotated factors (subjectively rotate -20 degrees):
• First factor: "mathematical ability"
• Second factor: "verbal ability"
The VARIMAX criterion chooses the rotation matrix T that maximizes

$$V = \frac{1}{p} \sum_{j=1}^{m} \left[ \sum_{i=1}^{p} \tilde l_{ij}^{*4} - \frac{1}{p} \left( \sum_{i=1}^{p} \tilde l_{ij}^{*2} \right)^{2} \right]$$

where the $\tilde l_{ij}^* = \hat l_{ij}^* / \hat h_i$ are the rotated loadings scaled by the square root of the communalities

After the rotation the $\tilde l_{ij}^*$ are multiplied by $\hat h_i$ to preserve the original communalities

Note that

$$V \propto \sum_{j=1}^{m} \big(\text{variance of squares of (scaled) loadings for the } j\text{-th factor}\big)$$

The criterion aims at "spreading out" the squares of the loadings on each factor as much as possible
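For m = 2 every orthogonal T is a plane rotation, so the varimax maximization can be sketched as a simple scan over the rotation angle (the loadings below are invented for illustration; production code would use an iterative algorithm):

```python
import numpy as np

def varimax_criterion(L, h):
    """V from the slides: sum over factors of the variance of the
    squared loadings, after scaling each row by sqrt(communality)."""
    Ls = L / h[:, None]
    p = L.shape[0]
    sq = Ls**2
    return float(((sq**2).sum(axis=0) / p - (sq.sum(axis=0) / p) ** 2).sum())

def rotation(a):
    return np.array([[np.cos(a), -np.sin(a)],
                     [np.sin(a),  np.cos(a)]])

# Invented unrotated loadings for p = 4 variables, m = 2 factors
L = np.array([[0.8, 0.3],
              [0.7, 0.4],
              [0.2, 0.8],
              [0.3, 0.7]])
h = np.sqrt((L**2).sum(axis=1))     # sqrt of communalities (rotation-invariant)

# Scan rotation angles and keep the one maximizing V
angles = np.linspace(0, np.pi / 2, 181)
best = max(angles, key=lambda a: varimax_criterion(L @ rotation(a), h))
L_rot = L @ rotation(best)

# Orthogonal rotation preserves the communalities
assert np.allclose((L_rot**2).sum(axis=1), (L**2).sum(axis=1))
```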
Example 9.9: In a consumer preference study a random sample of customers were asked to rate several attributes of a new product using a 7-point scale
Sample correlation matrix:

Principal component solution with two factors and varimax-rotated factors

Rotated factors:
• First factor: "nutritional"
• Second factor: "taste"
Example 9.10 (and more)
We have found a two factor principal component solution as well as a maximum likelihood solution for the stock-price data (cf. examples 9.4 and 9.5)
The loadings are quite different

Using the varimax criterion, we obtain the following rotated loadings:

          Maximum likelihood     Principal components
             F1       F2            F1       F2
morgan     0.763    0.029         0.852    0.036
citi       0.819    0.232         0.849    0.220
fargo      0.668    0.108         0.812    0.085
shell      0.112    0.994         0.126    0.912
exxon      0.109    0.675         0.078    0.910

The rotated loadings are quite similar