An observable random vector X of dimension p has mean vector µ and covariance matrix Σ
The (orthogonal) factor model postulates that X depends linearly on unobservable random variables (latent variables) F1, F2, ..., Fm, called common factors and p additional (unobservable) sources of variation ε1, ε2, ...., εp, called errors or specific factors
Estimation for the factor model (cf. section 9.3)
Model formulation:

$$X_1 - \mu_1 = l_{11} F_1 + l_{12} F_2 + \cdots + l_{1m} F_m + \varepsilon_1$$
$$\vdots$$
$$X_p - \mu_p = l_{p1} F_1 + l_{p2} F_2 + \cdots + l_{pm} F_m + \varepsilon_p$$

The coefficient $l_{ij}$ is called the loading of the i-th variable on the j-th factor

In matrix notation the factor model takes the form:

$$X - \mu = L F + \varepsilon$$

Here $L = \{l_{ij}\}$ is the p x m matrix of factor loadings, $F = [F_1, F_2, \ldots, F_m]'$ is the m-dimensional vector of common factors, and $\varepsilon = [\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_p]'$ is the p-vector of errors (or specific factors)
Assumptions:

$$E(F) = 0, \qquad \mathrm{Cov}(F) = E(F F') = I$$
$$E(\varepsilon) = 0, \qquad \mathrm{Cov}(\varepsilon) = E(\varepsilon \varepsilon') = \Psi = \mathrm{diag}\{\psi_1, \psi_2, \ldots, \psi_p\}$$
$$\mathrm{Cov}(\varepsilon, F) = 0$$
The model implies that

$$\Sigma = \mathrm{Cov}(X) = E\{(X - \mu)(X - \mu)'\} = L L' + \Psi$$

We may write

$$\mathrm{Cov}(X, F) = E\{(X - \mu) F'\} = L$$
$$\sigma_{ii} = \mathrm{Var}(X_i) = \sum_{j=1}^{m} l_{ij}^2 + \psi_i \stackrel{\text{def}}{=} \underbrace{h_i^2}_{\text{communality}} + \underbrace{\psi_i}_{\text{specific variance}}$$

$$\mathrm{Cov}(X_i, X_k) = \sum_{j=1}^{m} l_{ij} l_{kj}$$

$$\mathrm{Cov}(X_i, F_j) = l_{ij}$$
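As a quick numerical check of these identities, the sketch below builds $\Sigma = L L' + \Psi$ from hypothetical loadings and specific variances (all values are invented for illustration) and verifies the variance decomposition:

```python
import numpy as np

# Hypothetical loadings for p = 3 variables and m = 2 factors,
# plus specific variances; any L and positive psi would do
L = np.array([[0.9, 0.1],
              [0.7, 0.5],
              [0.2, 0.8]])
psi = np.array([0.18, 0.26, 0.32])

# Covariance implied by the factor model: Sigma = L L' + Psi
Sigma = L @ L.T + np.diag(psi)

# Var(X_i) splits into communality h_i^2 = sum_j l_ij^2 plus psi_i
communality = (L**2).sum(axis=1)
assert np.allclose(np.diag(Sigma), communality + psi)

# Off-diagonal entries: Cov(X_i, X_k) = sum_j l_ij * l_kj
assert np.isclose(Sigma[0, 1], L[0] @ L[1])
```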
Let T be an m x m orthogonal matrix

The factor model $X - \mu = L F + \varepsilon$ may then be reformulated as

$$X - \mu = (L T)(T' F) + \varepsilon = L^* F^* + \varepsilon$$

where $L^* = L T$ and $F^* = T' F$
It is impossible on the basis of observations to distinguish the loadings L from the loadings L*
Thus the factor loadings are determined only up to rotation by an orthogonal matrix
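This indeterminacy is easy to verify numerically: for any orthogonal T, the rotated loadings reproduce exactly the same $L L'$. A small sketch with made-up loadings and an arbitrary rotation angle:

```python
import numpy as np

# Invented loadings for p = 3 variables and m = 2 factors
L = np.array([[0.9, 0.1],
              [0.7, 0.5],
              [0.2, 0.8]])

theta = 0.7                       # any angle gives an orthogonal T
T = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

L_star = L @ T                    # rotated loadings L* = L T

# L* L*' = L T T' L' = L L', so the implied covariance is unchanged
assert np.allclose(L_star @ L_star.T, L @ L.T)

# Rotation also preserves the communalities (row norms of L)
assert np.allclose((L_star**2).sum(axis=1), (L**2).sum(axis=1))
```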
There are different methods for estimation of the factor model
We will consider:
• estimation using principal components
• maximum likelihood estimation
For both methods the solution may be rotated by multiplication by an orthogonal matrix to simplify the interpretation of factors
We first consider estimation using principal components

Assume that Σ has eigenvalue-eigenvector pairs $(\lambda_1, e_1), (\lambda_2, e_2), \ldots, (\lambda_p, e_p)$ where $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0$

By the spectral decomposition we may write

$$\Sigma = \lambda_1 e_1 e_1' + \lambda_2 e_2 e_2' + \cdots + \lambda_p e_p e_p'$$
This may be written as

$$\Sigma = \begin{bmatrix} \sqrt{\lambda_1}\, e_1 & \sqrt{\lambda_2}\, e_2 & \cdots & \sqrt{\lambda_p}\, e_p \end{bmatrix} \begin{bmatrix} \sqrt{\lambda_1}\, e_1' \\ \sqrt{\lambda_2}\, e_2' \\ \vdots \\ \sqrt{\lambda_p}\, e_p' \end{bmatrix}$$

This is of the form $\Sigma = L L' + \Psi$ with m = p and $\Psi = 0$
The representation above is not particularly useful since we seek a representation with just a few common factors

If the last p – m eigenvalues are small, we may neglect

$$\lambda_{m+1} e_{m+1} e_{m+1}' + \cdots + \lambda_p e_p e_p'$$

in the representation of Σ

This gives:
$$\Sigma \approx \begin{bmatrix} \sqrt{\lambda_1}\, e_1 & \sqrt{\lambda_2}\, e_2 & \cdots & \sqrt{\lambda_m}\, e_m \end{bmatrix} \begin{bmatrix} \sqrt{\lambda_1}\, e_1' \\ \sqrt{\lambda_2}\, e_2' \\ \vdots \\ \sqrt{\lambda_m}\, e_m' \end{bmatrix}$$
Allowing for specific factors (errors) we obtain $\Sigma \approx L L' + \Psi$:

$$\Sigma = \begin{bmatrix} \sqrt{\lambda_1}\, e_1 & \cdots & \sqrt{\lambda_m}\, e_m \end{bmatrix} \begin{bmatrix} \sqrt{\lambda_1}\, e_1' \\ \vdots \\ \sqrt{\lambda_m}\, e_m' \end{bmatrix} + \Psi$$

where $\Psi = \mathrm{diag}\{\psi_1, \psi_2, \ldots, \psi_p\}$ is given by $\psi_i = \sigma_{ii} - \sum_{j=1}^{m} l_{ij}^2$
The estimation method using principal components (the principal factor method) is the empirical counterpart to this approximation
Estimation using principal components:

Assume that S has eigenvalue-eigenvector pairs $(\hat\lambda_1, \hat e_1), (\hat\lambda_2, \hat e_2), \ldots, (\hat\lambda_p, \hat e_p)$ where $\hat\lambda_1 \ge \hat\lambda_2 \ge \cdots \ge \hat\lambda_p > 0$

Let m < p be the number of common factors

The matrix $\tilde L = \{\tilde l_{ij}\}$ of estimated factor loadings is given by

$$\tilde L = \begin{bmatrix} \sqrt{\hat\lambda_1}\, \hat e_1 & \sqrt{\hat\lambda_2}\, \hat e_2 & \cdots & \sqrt{\hat\lambda_m}\, \hat e_m \end{bmatrix}$$
The diagonal matrix of specific variances is estimated as $\tilde\Psi = \mathrm{diag}\{\tilde\psi_1, \tilde\psi_2, \ldots, \tilde\psi_p\}$ with $\tilde\psi_i = s_{ii} - \sum_{j=1}^{m} \tilde l_{ij}^2$

The communalities are estimated as $\tilde h_i^2 = \sum_{j=1}^{m} \tilde l_{ij}^2$
Working with standardized variables, we replace S by the sample correlation matrix R
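The principal component solution amounts to a few lines of linear algebra; a minimal sketch, with an invented correlation matrix (not taken from the textbook examples):

```python
import numpy as np

def principal_factor(R, m):
    """Principal component solution of the factor model for a
    correlation (or covariance) matrix R with m common factors."""
    eigval, eigvec = np.linalg.eigh(R)       # eigh returns ascending order
    order = np.argsort(eigval)[::-1]         # largest eigenvalues first
    lam, e = eigval[order[:m]], eigvec[:, order[:m]]
    L = e * np.sqrt(lam)                     # columns sqrt(lam_j) * e_j
    psi = np.diag(R) - (L**2).sum(axis=1)    # psi_i = r_ii - h_i^2
    return L, psi

# Illustrative 3 x 3 correlation matrix
R = np.array([[1.00, 0.63, 0.45],
              [0.63, 1.00, 0.35],
              [0.45, 0.35, 1.00]])
L, psi = principal_factor(R, m=1)

# Communalities plus specific variances reproduce the diagonal of R
assert np.allclose((L**2).sum(axis=1) + psi, np.diag(R))
```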
The total sample variance is

$$s_{11} + s_{22} + \cdots + s_{pp} = \mathrm{tr}(S)$$

The contribution to this from the j-th factor is

$$\tilde l_{1j}^2 + \tilde l_{2j}^2 + \cdots + \tilde l_{pj}^2 = \left(\sqrt{\hat\lambda_j}\, \hat e_j\right)' \left(\sqrt{\hat\lambda_j}\, \hat e_j\right) = \hat\lambda_j\, \hat e_j' \hat e_j = \hat\lambda_j$$

Thus the proportion of total sample variance due to the j-th factor is

$$\frac{\hat\lambda_j}{s_{11} + s_{22} + \cdots + s_{pp}}$$

(similar to principal components)
How do we determine the number of factors (if not given a priori)?

We want the m factors to explain a fairly large proportion of the total sample variance, so we may choose m so that this is achieved (subjectively)

For factor analysis of the correlation matrix R one may let m be the number of eigenvalues larger than 1

One would also like the residual matrix

$$S - (\tilde L \tilde L' + \tilde\Psi)$$

to have small off-diagonal elements (the diagonal elements are zero)
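Both checks are mechanical to apply; a sketch with an invented 3 x 3 correlation matrix, choosing m by the eigenvalues-larger-than-1 rule and then inspecting the residual matrix:

```python
import numpy as np

# Illustrative correlation matrix (not from the textbook examples)
R = np.array([[1.00, 0.63, 0.45],
              [0.63, 1.00, 0.35],
              [0.45, 0.35, 1.00]])

# Rule of thumb for R: m = number of eigenvalues larger than 1
eigval, eigvec = np.linalg.eigh(R)
m = int((eigval > 1).sum())

# Principal component solution with that m
order = np.argsort(eigval)[::-1]
lam, e = eigval[order[:m]], eigvec[:, order[:m]]
L = e * np.sqrt(lam)
psi = np.diag(R) - (L**2).sum(axis=1)

# Residual matrix: zero on the diagonal by construction; a good fit
# should make the off-diagonal entries small too
residual = R - (L @ L.T + np.diag(psi))
assert np.allclose(np.diag(residual), 0.0)
```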
Example 9.3 (contd): In a consumer preference study a random sample of customers were asked to rate several attributes of a new product using a 7-point scale

Sample correlation matrix:

The first two eigenvalues are the only ones larger than 1, and a model with two common factors accounts for 93% of the total (standardized) sample variance
Example 9.4
Weekly rates of return for five stocks on the New York Stock Exchange, Jan 2004 through Dec 2005
Consider now estimation by maximum likelihood

We then assume that the common factors and the specific factors are normally distributed

Then X is multivariate normal with mean vector µ and covariance matrix of the form $\Sigma = L L' + \Psi$
Likelihood (based on n observations $x_1, x_2, \ldots, x_n$):

$$L(\mu, \Sigma) = \frac{1}{(2\pi)^{np/2}\, |\Sigma|^{n/2}} \exp\left\{-\frac{1}{2} \sum_{j=1}^{n} (x_j - \mu)' \Sigma^{-1} (x_j - \mu)\right\}$$

Note that the likelihood depends on L and Ψ via $\Sigma = L L' + \Psi$
Note also that the factor loadings L and $L^* = L T$ give the same value of the likelihood when T is an orthogonal matrix

In order to make L well defined (corresponding to a particular choice of T) we need to impose m(m-1)/2 restrictions (the matrix T has m² elements, and orthogonality imposes m(m+1)/2 restrictions)

It is common to impose the restriction that $L' \Psi^{-1} L$ is a diagonal matrix
Under these restrictions there exist unique maximum likelihood estimates $\hat L$, $\hat\Psi$ (and $\hat\mu = \bar x$)
The maximum likelihood estimates of the communalities are

$$\hat h_i^2 = \hat l_{i1}^2 + \hat l_{i2}^2 + \cdots + \hat l_{im}^2 \qquad (i = 1, \ldots, p)$$

The proportion of total sample variance due to the j-th factor is

$$\frac{\hat l_{1j}^2 + \hat l_{2j}^2 + \cdots + \hat l_{pj}^2}{s_{11} + s_{22} + \cdots + s_{pp}}$$
In general the correlation matrix

$$\rho = \begin{bmatrix} 1 & \rho_{12} & \cdots & \rho_{1p} \\ \rho_{21} & 1 & \cdots & \rho_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{p1} & \rho_{p2} & \cdots & 1 \end{bmatrix}$$

may be written as $\rho = V^{-1/2} \Sigma V^{-1/2}$, where

$$V^{-1/2} = \begin{bmatrix} 1/\sqrt{\sigma_{11}} & 0 & \cdots & 0 \\ 0 & 1/\sqrt{\sigma_{22}} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1/\sqrt{\sigma_{pp}} \end{bmatrix}$$

is the inverse of the standard deviation matrix
When $\Sigma = L L' + \Psi$ we have that

$$\rho = V^{-1/2} \Sigma V^{-1/2} = V^{-1/2}(L L' + \Psi)V^{-1/2} = (V^{-1/2} L)(V^{-1/2} L)' + V^{-1/2} \Psi V^{-1/2} = L_z L_z' + \Psi_z$$

where $L_z = V^{-1/2} L$ and $\Psi_z = V^{-1/2} \Psi V^{-1/2}$ (diagonal)

We obtain maximum likelihood estimates of $L_z$ and $\Psi_z$ by inserting the maximum likelihood estimates for $L$, $V^{-1/2}$ and $\Psi$ (details are given in Supplement 9A)
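The rescaling identity is easy to confirm numerically; a sketch with invented loadings and specific variances:

```python
import numpy as np

# Invented L and Psi; Sigma = L L' + Psi is the implied covariance
L = np.array([[0.9, 0.1],
              [0.7, 0.5],
              [0.2, 0.8]])
Psi = np.diag([0.30, 0.20, 0.25])
Sigma = L @ L.T + Psi

# V^{-1/2}: diagonal matrix of inverse standard deviations
Vinv_sqrt = np.diag(1 / np.sqrt(np.diag(Sigma)))
rho = Vinv_sqrt @ Sigma @ Vinv_sqrt

# Standardizing carries the factor structure over to rho:
# rho = L_z L_z' + Psi_z with L_z = V^{-1/2} L
L_z = Vinv_sqrt @ L
Psi_z = Vinv_sqrt @ Psi @ Vinv_sqrt
assert np.allclose(rho, L_z @ L_z.T + Psi_z)
```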
Example 9.5
Weekly rates of return for five stocks on the New York Stock Exchange, Jan 2004 through Dec 2005
Under the assumption that X is multivariate normal with mean vector µ and covariance matrix Σ we may test the hypothesis that a factor model with m factors holds

This corresponds to testing the null hypothesis

$$H_0: \Sigma = L L' + \Psi$$

We may use the likelihood ratio test
When we assume no structure on Σ the maximum of the likelihood becomes

$$\max_{\mu, \Sigma} L(\mu, \Sigma) = \frac{1}{(2\pi)^{np/2}\, |S_n|^{n/2}}\, e^{-np/2}$$

where $S_n = (n-1)S/n$

When $\Sigma = L L' + \Psi$ one may show that the maximum of the likelihood becomes

$$\max_{\mu,\, \Sigma = L L' + \Psi} L(\mu, \Sigma) = \frac{1}{(2\pi)^{np/2}\, |\hat\Sigma|^{n/2}}\, e^{-np/2}$$

where $\hat\Sigma = \hat L \hat L' + \hat\Psi$

The likelihood ratio then takes the form

$$\Lambda = \left(\frac{|\hat\Sigma|}{|S_n|}\right)^{-n/2}$$

For testing we may use that under $H_0$

$$-2 \log \Lambda = n \log \frac{|\hat\Sigma|}{|S_n|}$$

is approximately chi-squared distributed with p(p+1)/2 – [p(m+1) – m(m – 1)/2] degrees of freedom
A Bartlett correction may be used to improve the approximation to the chi-squared distribution, cf. J&W
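The test statistic and degrees of freedom are straightforward to compute; in the sketch below the sample covariance matrix, loadings and specific variances are invented stand-ins for $S_n$ and the maximum likelihood estimates, purely to illustrate the formulas:

```python
import numpy as np

n, p, m = 100, 4, 1

# Illustrative S_n and stand-ins for the ML estimates
S_n = np.array([[1.00, 0.50, 0.40, 0.30],
                [0.50, 1.00, 0.35, 0.25],
                [0.40, 0.35, 1.00, 0.20],
                [0.30, 0.25, 0.20, 1.00]])
L_hat = np.array([[0.80], [0.65], [0.55], [0.45]])
psi_hat = np.diag(S_n) - (L_hat**2).sum(axis=1)
Sigma_hat = L_hat @ L_hat.T + np.diag(psi_hat)

# -2 log Lambda = n log(|Sigma_hat| / |S_n|)
stat = n * np.log(np.linalg.det(Sigma_hat) / np.linalg.det(S_n))

# Degrees of freedom: p(p+1)/2 - [p(m+1) - m(m-1)/2]
df = p * (p + 1) // 2 - (p * (m + 1) - m * (m - 1) // 2)
```

Compare `stat` against a chi-squared quantile with `df` degrees of freedom to carry out the test.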
Factor rotation (cf. section 9.4)

Example 9.8: We register the scores in 6 subject areas for 220 students

Sample correlation matrix:

Maximum likelihood solution for two factors:
Plot the factor loading pairs $(\hat l_{i1}, \hat l_{i2})$; cf. Fig 9.1

Original factors:
• First factor: "general intelligence"
• Second factor: "nonmath-math"

Rotated factors (subjectively rotate -20 degrees):
• First factor: "mathematical ability"
• Second factor: "verbal ability"
The VARIMAX criterion chooses the rotation matrix T that maximizes

$$V = \frac{1}{p} \sum_{j=1}^{m} \left[ \sum_{i=1}^{p} \tilde l_{ij}^{*4} - \frac{1}{p} \left( \sum_{i=1}^{p} \tilde l_{ij}^{*2} \right)^{2} \right]$$

where the $\tilde l_{ij}^* = \hat l_{ij}^* / \hat h_i$ are the rotated loadings scaled by the square root of the communalities

After the rotation the $\tilde l_{ij}^*$ are multiplied by $\hat h_i$ to preserve the original communalities

Note that

$$V \propto \sum_{j=1}^{m} \big(\text{variance of squares of (scaled) loadings for the } j\text{-th factor}\big)$$

The criterion aims at "spreading out" the squares of the loadings on each factor as much as possible
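For m = 2 every orthogonal T is a plane rotation, so the varimax maximization can be sketched as a simple scan over the rotation angle (the loadings below are invented for illustration; production code would use an iterative algorithm):

```python
import numpy as np

def varimax_criterion(L, h):
    """V from the slides: sum over factors of the variance of the
    squared loadings, after scaling each row by sqrt(communality)."""
    Ls = L / h[:, None]
    p = L.shape[0]
    sq = Ls**2
    return float(((sq**2).sum(axis=0) / p - (sq.sum(axis=0) / p) ** 2).sum())

def rotation(a):
    return np.array([[np.cos(a), -np.sin(a)],
                     [np.sin(a),  np.cos(a)]])

# Invented unrotated loadings for p = 4 variables, m = 2 factors
L = np.array([[0.8, 0.3],
              [0.7, 0.4],
              [0.2, 0.8],
              [0.3, 0.7]])
h = np.sqrt((L**2).sum(axis=1))     # sqrt of communalities (rotation-invariant)

# Scan rotation angles and keep the one maximizing V
angles = np.linspace(0, np.pi / 2, 181)
best = max(angles, key=lambda a: varimax_criterion(L @ rotation(a), h))
L_rot = L @ rotation(best)

# Orthogonal rotation preserves the communalities
assert np.allclose((L_rot**2).sum(axis=1), (L**2).sum(axis=1))
```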
Example 9.9: In a consumer preference study a random sample of customers were asked to rate several attributes of a new product using a 7-point scale
Sample correlation matrix:

Principal component solution with two factors and varimax-rotated factors

Rotated factors:
• First factor: "nutritional"
• Second factor: "taste"
Example 9.10 (and more)
We have found a two factor principal component solution as well as a maximum likelihood solution for the stock-price data (cf. examples 9.4 and 9.5)
The loadings are quite different

Using the varimax criterion, we obtain the following rotated loadings:

          Maximum likelihood     Principal components
             F1       F2            F1       F2
morgan     0.763    0.029         0.852    0.036
citi       0.819    0.232         0.849    0.220
fargo      0.668    0.108         0.812    0.085
shell      0.112    0.994         0.126    0.912
exxon      0.109    0.675         0.078    0.910

The rotated loadings are quite similar