
Estimation for the factor model (cf. section 9.3)

An observable random vector X of dimension p has mean vector µ and covariance matrix Σ.

The (orthogonal) factor model postulates that X depends linearly on unobservable random variables (latent variables) F1, F2, ..., Fm, called common factors, and on p additional (unobservable) sources of variation ε1, ε2, ..., εp, called errors or specific factors.

Model formulation:

$$X_1 - \mu_1 = l_{11}F_1 + l_{12}F_2 + \cdots + l_{1m}F_m + \varepsilon_1$$
$$\vdots$$
$$X_p - \mu_p = l_{p1}F_1 + l_{p2}F_2 + \cdots + l_{pm}F_m + \varepsilon_p$$

The coefficient $l_{ij}$ is called the loading of the i-th variable on the j-th factor.

In matrix notation the factor model takes the form
$$X - \mu = LF + \varepsilon$$
Here $L = \{l_{ij}\}$ is the p x m matrix of factor loadings, $F = [F_1, F_2, \ldots, F_m]'$ is the m-dimensional vector of common factors, and $\varepsilon = [\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_p]'$ is the p-vector of errors (or specific factors).

Assumptions:
$$E(F) = 0, \qquad \mathrm{Cov}(F) = E(FF') = I$$
$$E(\varepsilon) = 0, \qquad \mathrm{Cov}(\varepsilon) = E(\varepsilon\varepsilon') = \Psi = \mathrm{diag}\{\psi_1, \psi_2, \ldots, \psi_p\}$$
$$\mathrm{Cov}(\varepsilon, F) = 0$$

The model implies that
$$\Sigma = \mathrm{Cov}(X) = E\{(X - \mu)(X - \mu)'\} = LL' + \Psi$$
We may also write
$$\mathrm{Cov}(X, F) = E\{(X - \mu)F'\} = L$$

In particular,
$$\sigma_{ii} = \mathrm{Var}(X_i) = \sum_{j=1}^{m} l_{ij}^2 + \psi_i \overset{\mathrm{def}}{=} h_i^2 + \psi_i$$
where $h_i^2$ is called the communality and $\psi_i$ the specific variance of the i-th variable. Moreover,
$$\mathrm{Cov}(X_i, X_k) = \sum_{j=1}^{m} l_{ij} l_{kj}, \qquad \mathrm{Cov}(X_i, F_j) = l_{ij}$$
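These covariance identities are easy to check numerically by simulating from the model. Below is a minimal sketch with made-up loadings and specific variances (chosen so that each variable has variance 1); none of the numbers come from the textbook examples:

```python
import numpy as np

rng = np.random.default_rng(0)
p, m, n = 4, 2, 200_000

# Made-up loadings and specific variances (illustrative only)
L = np.array([[0.9, 0.1],
              [0.8, 0.3],
              [0.2, 0.7],
              [0.1, 0.8]])
Psi = np.diag([0.18, 0.27, 0.47, 0.35])

F = rng.standard_normal((n, m))                   # Cov(F) = I
eps = rng.standard_normal((n, p)) @ np.sqrt(Psi)  # Cov(eps) = Psi
X = F @ L.T + eps                                 # X - mu = L F + eps (mu = 0)

# Sample covariance should be close to L L' + Psi
print(np.round(np.cov(X, rowvar=False) - (L @ L.T + Psi), 2))

# Cross-covariance Cov(X, F) should be close to L
print(np.round((X - X.mean(0)).T @ (F - F.mean(0)) / (n - 1) - L, 2))
```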

Let T be an m x m orthogonal matrix. The factor model
$$X - \mu = LF + \varepsilon$$
may then be reformulated as
$$X - \mu = L^*F^* + \varepsilon, \qquad \text{where } L^* = LT \text{ and } F^* = T'F$$
It is impossible on the basis of observations to distinguish the loadings $L$ from the loadings $L^*$. Thus the factor loadings are determined only up to rotation by an orthogonal matrix.
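A quick numerical illustration of this indeterminacy, using an arbitrary made-up loading matrix and rotation angle:

```python
import numpy as np

L = np.array([[0.9, 0.1], [0.8, 0.3], [0.2, 0.7], [0.1, 0.8]])
theta = np.deg2rad(30)                              # any angle works
T = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])     # orthogonal: T T' = I

L_star = L @ T
# L* F* = (L T)(T' F) = L F, so the implied covariance structure is unchanged:
print(np.allclose(L_star @ L_star.T, L @ L.T))      # True
```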


There are different methods for estimation of the factor model. We will consider:

• estimation using principal components
• maximum likelihood estimation

For both methods the solution may be rotated by multiplication by an orthogonal matrix to simplify the interpretation of the factors.

We first consider estimation using principal components.

Assume that Σ has eigenvalue-eigenvector pairs $(\lambda_1, e_1), (\lambda_2, e_2), \ldots, (\lambda_p, e_p)$, where $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0$. By the spectral decomposition we may write
$$\Sigma = \lambda_1 e_1 e_1' + \lambda_2 e_2 e_2' + \cdots + \lambda_p e_p e_p'
= \begin{bmatrix} \sqrt{\lambda_1}\,e_1 & \sqrt{\lambda_2}\,e_2 & \cdots & \sqrt{\lambda_p}\,e_p \end{bmatrix}
\begin{bmatrix} \sqrt{\lambda_1}\,e_1' \\ \sqrt{\lambda_2}\,e_2' \\ \vdots \\ \sqrt{\lambda_p}\,e_p' \end{bmatrix}$$
This is of the form $\Sigma = LL' + \Psi$ with m = p and $\Psi = 0$.
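This factorization is straightforward to reproduce numerically; note that np.linalg.eigh returns eigenvalues in ascending order, so they are reversed below. The covariance matrix is a made-up example:

```python
import numpy as np

Sigma = np.array([[1.00, 0.63, 0.45],
                  [0.63, 1.00, 0.35],
                  [0.45, 0.35, 1.00]])        # made-up covariance matrix

lam, E = np.linalg.eigh(Sigma)                # ascending eigenvalues
lam, E = lam[::-1], E[:, ::-1]                # lambda_1 >= lambda_2 >= lambda_3

# Sigma = sum_i lambda_i e_i e_i' = (E sqrt(diag(lam))) (E sqrt(diag(lam)))'
L_full = E * np.sqrt(lam)                     # column i is sqrt(lambda_i) e_i
print(np.allclose(L_full @ L_full.T, Sigma))  # True: here m = p and Psi = 0
```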

The representation above is not particularly useful, since we seek a representation with just a few common factors. If the last p - m eigenvalues are small, we may neglect
$$\lambda_{m+1} e_{m+1} e_{m+1}' + \cdots + \lambda_p e_p e_p'$$
in the representation of Σ. This gives:
$$\Sigma \approx \begin{bmatrix} \sqrt{\lambda_1}\,e_1 & \sqrt{\lambda_2}\,e_2 & \cdots & \sqrt{\lambda_m}\,e_m \end{bmatrix}
\begin{bmatrix} \sqrt{\lambda_1}\,e_1' \\ \sqrt{\lambda_2}\,e_2' \\ \vdots \\ \sqrt{\lambda_m}\,e_m' \end{bmatrix}$$

Allowing for specific factors (errors) we obtain
$$\Sigma \approx LL' + \Psi = \begin{bmatrix} \sqrt{\lambda_1}\,e_1 & \sqrt{\lambda_2}\,e_2 & \cdots & \sqrt{\lambda_m}\,e_m \end{bmatrix}
\begin{bmatrix} \sqrt{\lambda_1}\,e_1' \\ \sqrt{\lambda_2}\,e_2' \\ \vdots \\ \sqrt{\lambda_m}\,e_m' \end{bmatrix} + \Psi$$
where $\Psi = \mathrm{diag}\{\psi_1, \psi_2, \ldots, \psi_p\}$ is given by
$$\psi_i = \sigma_{ii} - \sum_{j=1}^{m} l_{ij}^2$$

The estimation method using principal components (the principal factor method) is the empirical counterpart to this approximation.


Estimation using principal components: Assume that S has eigenvalue-eigenvector pairs $(\hat\lambda_1, \hat e_1), (\hat\lambda_2, \hat e_2), \ldots, (\hat\lambda_p, \hat e_p)$, where $\hat\lambda_1 \ge \hat\lambda_2 \ge \cdots \ge \hat\lambda_p > 0$. Let m < p be the number of common factors.

The matrix of estimated factor loadings $\tilde L = \{\tilde l_{ij}\}$ is given by
$$\tilde L = \begin{bmatrix} \sqrt{\hat\lambda_1}\,\hat e_1 & \sqrt{\hat\lambda_2}\,\hat e_2 & \cdots & \sqrt{\hat\lambda_m}\,\hat e_m \end{bmatrix}$$
The diagonal matrix of specific variances is estimated as $\tilde\Psi = \mathrm{diag}\{\tilde\psi_1, \tilde\psi_2, \ldots, \tilde\psi_p\}$ with
$$\tilde\psi_i = s_{ii} - \sum_{j=1}^{m} \tilde l_{ij}^2$$
The communalities are estimated as
$$\tilde h_i^2 = \sum_{j=1}^{m} \tilde l_{ij}^2$$

Working with standardized variables, we replace S by the sample correlation matrix R.

The total sample variance is $s_{11} + s_{22} + \cdots + s_{pp} = \mathrm{tr}(S)$. The contribution to this from the j-th factor is
$$\tilde l_{1j}^2 + \tilde l_{2j}^2 + \cdots + \tilde l_{pj}^2 = \big(\sqrt{\hat\lambda_j}\,\hat e_j\big)'\big(\sqrt{\hat\lambda_j}\,\hat e_j\big) = \hat\lambda_j\,\hat e_j'\hat e_j = \hat\lambda_j$$
Thus the proportion of total sample variance due to the j-th factor is
$$\frac{\hat\lambda_j}{s_{11} + s_{22} + \cdots + s_{pp}}$$
(similar to principal components).

How do we determine the number of factors (if not given a priori)?

We want the m factors to explain a fairly large proportion of the total sample variance, so we may choose m so that this is achieved (subjectively). For factor analysis of the correlation matrix R one may let m be the number of eigenvalues larger than 1.

One would also like the residual matrix
$$S - (\tilde L \tilde L' + \tilde\Psi)$$
to have small off-diagonal elements (the diagonal elements are zero).
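A minimal sketch of the principal component solution in code, pulling together the formulas above: the eigenvalues-larger-than-1 rule, the loadings, specific variances, communalities, proportion of variance explained, and the residual matrix. The correlation matrix below is hypothetical, not one of the textbook examples:

```python
import numpy as np

# Hypothetical sample correlation matrix (illustrative only)
R = np.array([[1.00, 0.60, 0.50, 0.10],
              [0.60, 1.00, 0.40, 0.15],
              [0.50, 0.40, 1.00, 0.20],
              [0.10, 0.15, 0.20, 1.00]])
p = R.shape[0]

lam, E = np.linalg.eigh(R)                 # ascending order
lam, E = lam[::-1], E[:, ::-1]             # reorder so lambda_1 >= ... >= lambda_p

m = int(np.sum(lam > 1))                   # eigenvalues-larger-than-1 rule
L_tilde = E[:, :m] * np.sqrt(lam[:m])      # p x m matrix of estimated loadings

h2 = np.sum(L_tilde**2, axis=1)            # estimated communalities
psi = np.diag(R) - h2                      # estimated specific variances

# Proportion of total (standardized) sample variance per factor: lambda_j / p
print("proportion explained:", np.round(lam[:m] / p, 3))

# Residual matrix: diagonal is zero by construction; off-diagonals should be small
residual = R - (L_tilde @ L_tilde.T + np.diag(psi))
print(np.round(residual, 3))
```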

Example 9.3 (contd): In a consumer preference study a random sample of customers was asked to rate several attributes of a new product using a 7-point scale.

Sample correlation matrix: [not reproduced in this transcript]


The first two eigenvalues are the only ones larger than 1, and a model with two common factors accounts for 93% of the total (standardized) sample variance.

Example 9.4: Weekly rates of return for five stocks on the New York Stock Exchange, January 2004 through December 2005. [Data and results not reproduced in this transcript]

Consider now estimation by maximum likelihood. We then assume that the common factors and the specific factors are normally distributed. Then X is multivariate normal with mean vector µ and covariance matrix of the form
$$\Sigma = LL' + \Psi$$

Likelihood (based on n observations $x_1, x_2, \ldots, x_n$):
$$L(\mu, \Sigma) = \frac{1}{(2\pi)^{np/2}\,|\Sigma|^{n/2}} \exp\Big\{-\tfrac{1}{2}\sum_{j=1}^{n} (x_j - \mu)'\Sigma^{-1}(x_j - \mu)\Big\}$$
Note that the likelihood depends on L and Ψ via $\Sigma = LL' + \Psi$.

Note also that the factor loadings L and $L^* = LT$ give the same value of the likelihood when T is an orthogonal matrix.


It is common to impose the restriction that $L'\Psi^{-1}L$ is a diagonal matrix. In order to make L well defined (corresponding to a particular choice of T) we need to impose m(m-1)/2 restrictions (the matrix T has $m^2$ elements, and orthogonality imposes m(m+1)/2 restrictions).

Under these restrictions there exist unique maximum likelihood estimates $\hat L$, $\hat\Psi$ (and $\hat\mu = \bar x$).

The maximum likelihood estimates of the communalities are
$$\hat h_i^2 = \hat l_{i1}^2 + \hat l_{i2}^2 + \cdots + \hat l_{im}^2 \qquad (i = 1, \ldots, p)$$

Proportion of total sample variance due to the j-th factor:
$$\frac{\hat l_{1j}^2 + \hat l_{2j}^2 + \cdots + \hat l_{pj}^2}{s_{11} + s_{22} + \cdots + s_{pp}}$$
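When the raw data are available, a maximum likelihood fit can be obtained from standard software. For instance, scikit-learn's FactorAnalysis maximizes the same normal likelihood, though it identifies the loadings by a different convention than the diagonal $L'\Psi^{-1}L$ restriction above, so its loadings may differ from J&W's by an orthogonal rotation. A sketch on simulated data (not the stock-return data):

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(1)
n, p, m = 500, 5, 2

# Simulate data from an arbitrary two-factor model (illustrative only)
L_true = rng.uniform(-1, 1, size=(p, m))
X = rng.standard_normal((n, m)) @ L_true.T + 0.5 * rng.standard_normal((n, p))

fa = FactorAnalysis(n_components=m).fit(X)
L_hat = fa.components_.T                 # p x m estimated loadings
psi_hat = fa.noise_variance_             # estimated specific variances

h2 = np.sum(L_hat**2, axis=1)            # estimated communalities
print("communalities:", np.round(h2, 3))

# Proportion of total sample variance due to each factor
total_var = X.var(axis=0, ddof=1).sum()
print("proportions:", np.round(np.sum(L_hat**2, axis=0) / total_var, 3))
```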

In general the correlation matrix
$$\rho = \begin{bmatrix} 1 & \rho_{12} & \cdots & \rho_{1p} \\ \rho_{21} & 1 & \cdots & \rho_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{p1} & \rho_{p2} & \cdots & 1 \end{bmatrix}$$
may be written as $\rho = V^{-1/2}\,\Sigma\,V^{-1/2}$, where $V^{-1/2}$ is the inverse of the standard deviation matrix
$$V^{1/2} = \begin{bmatrix} \sqrt{\sigma_{11}} & 0 & \cdots & 0 \\ 0 & \sqrt{\sigma_{22}} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sqrt{\sigma_{pp}} \end{bmatrix}$$

When $\Sigma = LL' + \Psi$ we have that
$$\rho = V^{-1/2}\Sigma V^{-1/2} = V^{-1/2}(LL' + \Psi)V^{-1/2} = (V^{-1/2}L)(V^{-1/2}L)' + V^{-1/2}\Psi V^{-1/2} = L_z L_z' + \Psi_z$$
where $L_z = V^{-1/2}L$ and $\Psi_z = V^{-1/2}\Psi V^{-1/2}$ (diagonal).

We obtain maximum likelihood estimates by inserting the maximum likelihood estimates for L, $V^{-1/2}$ and Ψ (details are given in Supplement 9A).

Example 9.5: Weekly rates of return for five stocks on the New York Stock Exchange, January 2004 through December 2005. [Results not reproduced in this transcript]


Under the assumption that X is multivariate normal with mean vector µ and covariance matrix Σ, we may test the hypothesis that a factor model with m factors holds. This corresponds to testing the null hypothesis
$$H_0: \Sigma = LL' + \Psi$$
We may use the likelihood ratio test.

When we assume no structure on Σ, the maximum of the likelihood becomes
$$\max_{\mu, \Sigma} L(\mu, \Sigma) = \frac{1}{(2\pi)^{np/2}\,|S_n|^{n/2}}\, e^{-np/2}$$
where $S_n = (n-1)S/n$.

When $\Sigma = LL' + \Psi$ one may show that the maximum of the likelihood becomes
$$\max_{\mu,\,\Sigma = LL'+\Psi} L(\mu, \Sigma) = \frac{1}{(2\pi)^{np/2}\,|\hat\Sigma|^{n/2}}\, e^{-np/2}$$
where $\hat\Sigma = \hat L\hat L' + \hat\Psi$.

The likelihood ratio then takes the form
$$\Lambda = \left(\frac{|S_n|}{|\hat\Sigma|}\right)^{n/2}$$
For testing we may use that under $H_0$
$$-2\log\Lambda = n\log\frac{|\hat\Sigma|}{|S_n|}$$
is approximately chi-squared distributed with $p(p+1)/2 - [p(m+1) - m(m-1)/2]$ degrees of freedom.

A Bartlett correction may be used to improve the approximation to the chi-squared distribution, cf. J&W.
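A sketch of the likelihood ratio test (without the Bartlett correction), assuming the estimates $\hat L$ and $\hat\Psi$ and the sample covariance matrix S (divisor n-1) have already been computed; the function name is our own:

```python
import numpy as np
from scipy.stats import chi2

def lr_test_m_factors(S, L_hat, Psi_hat, n, m):
    """Likelihood ratio test of H0: Sigma = L L' + Psi with m factors."""
    p = S.shape[0]
    S_n = (n - 1) * S / n                      # MLE of Sigma with no structure
    Sigma_hat = L_hat @ L_hat.T + Psi_hat
    # -2 log Lambda = n log(|Sigma_hat| / |S_n|), via log-determinants
    stat = n * (np.linalg.slogdet(Sigma_hat)[1] - np.linalg.slogdet(S_n)[1])
    df = p * (p + 1) // 2 - (p * (m + 1) - m * (m - 1) // 2)
    return stat, df, chi2.sf(stat, df)         # large stat => reject H0
```

Small p-values argue against the m-factor model; the Bartlett-corrected version replaces the factor n by n - 1 - (2p + 4m + 5)/6, cf. J&W.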

Factor rotation (cf. section 9.4)

Example 9.8: We register the scores in 6 subject areas for 220 students.

Sample correlation matrix: [not reproduced in this transcript]

Maximum likelihood solution for two factors: [not reproduced in this transcript]


Plot the factor loading pairs $(\hat l_{i1}, \hat l_{i2})$; cf. Fig. 9.1. [Plot not reproduced; the points are labeled by subject: Ga, E, H, Ar, Al, Ge.]

Original factors:
• First factor: “general intelligence”
• Second factor: “nonmath-math”

Rotated factors (subjectively rotated -20 degrees):
• First factor: “mathematical ability”
• Second factor: “verbal ability”

The VARIMAX criterion chooses the rotation matrix T that maximizes
$$V = \frac{1}{p} \sum_{j=1}^{m} \left[ \sum_{i=1}^{p} \tilde l_{ij}^{*4} - \frac{1}{p}\left( \sum_{i=1}^{p} \tilde l_{ij}^{*2} \right)^{2} \right]$$
where the $\tilde l_{ij}^{*} = \hat l_{ij}^{*} / \hat h_i$ are the rotated loadings scaled by the square roots of the communalities. After the rotation the $\tilde l_{ij}^{*}$ are multiplied by $\hat h_i$ to preserve the original communalities.

Note that
$$V \propto \sum_{j=1}^{m} \big(\text{variance of squares of (scaled) loadings for the } j\text{-th factor}\big)$$
The criterion thus aims at “spreading out” the squares of the loadings on each factor as much as possible.
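Varimax is built into most statistical software; a minimal numpy sketch of the standard SVD-based iteration, including the scaling by the square roots of the communalities described above, might look like this:

```python
import numpy as np

def varimax(L, max_iter=100, tol=1e-8):
    """Rotate a p x m loading matrix by the varimax criterion (Kaiser-normalized)."""
    p, m = L.shape
    h = np.sqrt(np.sum(L**2, axis=1))          # square roots of communalities
    A = L / h[:, None]                         # scale rows before rotating
    T = np.eye(m)
    obj = 0.0
    for _ in range(max_iter):
        B = A @ T
        G = A.T @ (B**3 - B @ np.diag(np.sum(B**2, axis=0)) / p)
        U, s, Vt = np.linalg.svd(G)
        T = U @ Vt                             # best orthogonal T for current B
        if s.sum() < obj * (1 + tol):          # objective stopped improving
            break
        obj = s.sum()
    return (A @ T) * h[:, None], T             # rescale: communalities preserved
```

Applying this to the loadings $\tilde L$ (or $\hat L$) gives the rotated loadings; since T is orthogonal, the communalities and the fitted matrix $\tilde L\tilde L' + \tilde\Psi$ are unchanged.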

Example 9.9: In a consumer preference study a random sample of customers was asked to rate several attributes of a new product using a 7-point scale.

Sample correlation matrix: [not reproduced in this transcript]

Principal component solution with two factors and varimax-rotated factors: [not reproduced in this transcript]


Rotated factors:
• First factor: “nutritional”
• Second factor: “taste”

Example 9.10 (and more): We have found a two-factor principal component solution as well as a maximum likelihood solution for the stock-price data (cf. Examples 9.4 and 9.5). The unrotated loadings are quite different. Using the varimax criterion, we obtain the following rotated loadings:

            Maximum likelihood      Principal components
              F1        F2            F1        F2
  morgan    0.763     0.029         0.852     0.036
  citi      0.819     0.232         0.849     0.220
  fargo     0.668     0.108         0.812     0.085
  shell     0.112     0.994         0.126     0.912
  exxon     0.109     0.675         0.078     0.910

The rotated loadings are quite similar.