Time series, spring 2014 - Chalmers (rootzen/timeseries)
Slides for TS ch 8, p. 238-261
Exercises: 8.4, 8.7, 8.8, 8.14
ARMA modelling and forecasting: preliminary estimation

Preliminary estimates are useful for
• order identification (requires fitting a number of competing models),
• initial parameter estimates for likelihood optimization.

ARMA(p,q) model: based on observations x1, . . ., xn from the model

    φ(B) Xt = θ(B) Zt ,   {Zt} ~ WN(0, σ²),

estimate φ = (φ1, . . ., φp) and θ = (θ1, . . ., θq), with the orders p and q assumed known.
AR(p) Processes:
Yule-Walker Estimation (moment estimates)
Burg Estimation
ARMA(p,q) Processes:
Innovations Algorithm
Hannan-Rissanen Estimates
Yule-Walker estimation for AR(p) processes

Recall from Chapter 3 that

    γ(0) − φ1γ(1) − ⋯ − φpγ(p) = σ²
    γ(k) − φ1γ(k-1) − ⋯ − φpγ(k-p) = 0,   k = 1, . . ., p,

or in matrix form,

    Γp φ = γp ,

where Γp is the covariance matrix of X1, . . ., Xp, φ = (φ1, . . ., φp)′, and γp = (γ(1), . . ., γ(p))′. The Yule-Walker estimates are found by replacing the ACVF γ(·) by its sample estimate γ̂(·).
Properties of the Yule-Walker estimates:
• The fitted model is causal.
• The sample ACVF and the fitted model ACVF agree at lags h = 0, 1, . . ., p.
• The estimates are asymptotically efficient (i.e. have the same limit behavior as the MLE):

    φ̂ ≈ N(φ, n⁻¹σ²Γp⁻¹)
One can apply the Durbin-Levinson algorithm to the sample ACVF in order to compute the Yule-Walker estimates recursively.
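The recursion is short enough to sketch directly. A minimal Python illustration, assuming NumPy; the function names sample_acvf and durbin_levinson are ours, not from any package:

```python
import numpy as np

def sample_acvf(x, max_lag):
    """Sample autocovariances gamma_hat(0), ..., gamma_hat(max_lag) (divisor n)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    return np.array([np.dot(x[:n - h], x[h:]) / n for h in range(max_lag + 1)])

def durbin_levinson(gamma, p):
    """Order-p Yule-Walker fit via the Durbin-Levinson recursion.

    Returns (phi, v): the AR(p) coefficients and the one-step prediction
    error variance v_p (the white-noise variance estimate when gamma is
    the sample ACVF).
    """
    phi = np.zeros(0)
    v = gamma[0]
    for m in range(1, p + 1):
        # phi_mm: the (sample) PACF at lag m
        a = (gamma[m] - np.dot(phi, gamma[m - 1:0:-1])) / v
        phi = np.concatenate((phi - a * phi[::-1], [a]))
        v *= 1.0 - a * a
    return phi, v
```

For example, durbin_levinson(sample_acvf(x, p), p) returns the Yule-Walker estimates φ̂ and σ̂² for an AR(p) fit to x.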
Ex (Dow-Jones Utilities Index; DOWJ.DAT): plot of D1, . . ., D78.
[Figure: time plot of the series (values roughly 110-120) and its sample ACF, lags 0-40.]
[Fig 5.2: sample ACF and PACF of the differenced series Yt = Dt − Dt-1, lags 0-30.]
Preliminary model for Dow-Jones:

    Xt = .4219 Xt-1 + Zt ,   {Zt} ~ WN(0, .1479)
        (±.094)

where Xt = Yt − .1336 and Yt = Dt − Dt-1.
Burg's algorithm

The Yule-Walker estimates (φ̂1, . . ., φ̂p) satisfy

    P̂p Xp+1 = φ̂1 Xp + ⋯ + φ̂p X1 ,

where P̂p is the prediction operator relative to the fitted Y-W model. As can be seen from the Durbin-Levinson algorithm, these coefficients are determined by the sample PACF,

    α̂(h) = φ̂hh ,   h = 1, . . ., p.

Burg's algorithm uses a different estimate of the PACF, based on minimizing the forward and backward one-step prediction errors.
Properties of Burg's estimates:
• The fitted model is causal.
• The estimates are asymptotically efficient (i.e. have the same limit behavior as the MLE).
• The estimates tend to perform better than the Y-W estimates, especially if a zero of the AR polynomial is close to the unit circle.
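As a sketch of how the error minimization works in practice, here is a bare-bones Burg recursion in Python (NumPy assumed; burg is our own illustrative function, not a library routine):

```python
import numpy as np

def burg(x, p):
    """AR(p) estimates by Burg's algorithm.

    At each order the reflection coefficient (Burg's PACF estimate)
    minimizes the sum of squared forward and backward one-step
    prediction errors; the resulting fitted model is always causal.
    """
    x = np.asarray(x, dtype=float) - np.mean(x)
    f = x[1:].copy()    # forward prediction errors
    b = x[:-1].copy()   # backward prediction errors
    phi = np.zeros(0)
    for _ in range(p):
        k = 2.0 * np.dot(f, b) / (np.dot(f, f) + np.dot(b, b))
        # Levinson-style update of the AR coefficients
        phi = np.concatenate((phi - k * phi[::-1], [k]))
        # update both error sequences, then drop the overhanging ends
        f, b = (f - k * b)[1:], (b - k * f)[:-1]
    return phi
```

Since every reflection coefficient satisfies |k| ≤ 1, the fitted polynomial always has its zeros outside the unit circle, which is why causality comes for free.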
[Figure: sample ACF and PACF of the Lake Huron data, lags 0-40.]
Model for the Lake Huron data

The sample ACF and PACF suggest an AR(2) model for the mean-corrected data Xt = Yt − 9.0041.

Burg's model:
    Xt = 1.0449 Xt-1 − .2456 Xt-2 + Zt ,   {Zt} ~ WN(0, .4705)

Y-W fitted model:
    Xt = 1.0583 Xt-1 − .2668 Xt-2 + Zt ,   {Zt} ~ WN(0, .4920)

The minimum-AICC model, using either Burg or Y-W, is p = 2. Burg's fit gives a smaller WN variance and a larger likelihood.
The innovations algorithm

The Yule-Walker estimates were found by applying the Durbin-Levinson algorithm with the ACVF replaced by the sample ACVF. The innovations estimates are obtained in the same way, with the innovations algorithm in place of Durbin-Levinson.

The fitted innovations MA(m) model is

    Xt = Zt + θ̂m1 Zt-1 + ⋯ + θ̂mm Zt-m ,   {Zt} ~ WN(0, v̂m),

where θ̂ = (θ̂m1, . . ., θ̂mm)′ and v̂m are found from the innovations algorithm with the ACVF replaced by the sample ACVF.
Innovations estimates for an MA(q) model:

    Xt = Zt + θ̂m,1 Zt-1 + ⋯ + θ̂m,q Zt-q ,   {Zt} ~ WN(0, v̂m),

where m = mn is a sequence of integers with mn → ∞ and mn = o(n^(1/3)).
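The recursion itself is compact. A sketch of the innovations algorithm applied to a vector of (sample) autocovariances, assuming NumPy, with our own function name innovations:

```python
import numpy as np

def innovations(gamma, m):
    """Innovations algorithm applied to autocovariances gamma(0), ..., gamma(m).

    Returns (theta, v): theta[n-1] holds (theta_{n,1}, ..., theta_{n,n}),
    and v[n] is the n-th one-step prediction MSE, with v[0] = gamma(0).
    """
    v = np.zeros(m + 1)
    v[0] = gamma[0]
    theta = []
    for n in range(1, m + 1):
        th = np.zeros(n)                 # th[j-1] = theta_{n,j}
        for k in range(n):               # compute theta_{n,n-k}, k = 0, ..., n-1
            s = gamma[n - k]
            for j in range(k):
                s -= theta[k - 1][k - j - 1] * th[n - j - 1] * v[j]
            th[n - k - 1] = s / v[k]
        theta.append(th)
        v[n] = gamma[0] - np.dot(th[::-1] ** 2, v[:n])
    return theta, v
```

With gamma the sample ACVF, theta[m-1][:q] gives the MA(q) estimates θ̂m,1, . . ., θ̂m,q and v[m] the white-noise variance estimate.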
Order selection for AR(p) processes

Usually there is no true AR model. The goal is to find an AR model which represents the data well, in some sense.

Two techniques:
• If {Xt} is an AR(p) process with {Zt} ~ IID(0, σ²), then φ̂mm = α̂(m) ∼ N(0, n⁻¹) for m > p, where α̂(m) is the sample PACF. Estimate p as the smallest value m such that |α̂(k)| < 1.96 n^(-1/2) for all k > m.
• Estimate p by minimizing the AICC statistic

    AICC = −2 ln L(φp) + 2(p+1)n/(n−p−2),

where L(·) denotes the Gaussian likelihood.
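The first technique can be sketched directly in code (NumPy assumed; select_ar_order is an illustrative name, not a library function). It computes the sample PACF by Durbin-Levinson and returns the largest lag whose PACF exceeds the 1.96/√n bound:

```python
import numpy as np

def select_ar_order(x, max_order):
    """Smallest m with |pacf(k)| < 1.96/sqrt(n) for all lags k > m.

    The sample PACF alpha_hat(m) is the last coefficient of the
    order-m Yule-Walker fit (Durbin-Levinson recursion).
    """
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    gamma = np.array([np.dot(x[:n - h], x[h:]) / n for h in range(max_order + 1)])
    pacf = np.zeros(max_order + 1)   # pacf[m] = alpha_hat(m); pacf[0] unused
    phi, v = np.zeros(0), gamma[0]
    for m in range(1, max_order + 1):
        a = (gamma[m] - np.dot(phi, gamma[m - 1:0:-1])) / v
        phi = np.concatenate((phi - a * phi[::-1], [a]))
        v *= 1.0 - a * a
        pacf[m] = a
    bound = 1.96 / np.sqrt(n)
    significant = np.abs(pacf[1:]) >= bound        # lags 1, ..., max_order
    p_hat = int(np.nonzero(significant)[0].max() + 1) if significant.any() else 0
    return p_hat, pacf
```

Note that under the null each lag beyond p still exceeds the bound with probability about 5%, so in practice the rule tends to overshoot occasionally; the AICC criterion is the more systematic of the two techniques.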
Order selection for MA(q) processes

• If {Xt} is an MA(q) process with {Zt} ~ IID(0, σ²), then

    ρ̂(m) ∼ N(0, n⁻¹(1 + 2ρ²(1) + ⋯ + 2ρ²(q)))

for all m > q. Estimate q as the smallest value m such that ρ̂(k) is not significantly different from 0 for all k > m.
• Examine the coefficients in the fitted innovations MA(m) model to see which are not significantly different from 0.
• Estimate q by minimizing the AICC statistic

    AICC = −2 ln L(θq) + 2(q+1)n/(n−q−2).
Dow-Jones data: Xt = Dt − Dt-1 − .1336

[Figure: sample ACF of Xt, lags 0-30.]

The ACF suggests MA(2).

MA(2) model using innovations estimates (m = 17):

    Xt = Zt + .4269 Zt-1 + .2704 Zt-2 ,   {Zt} ~ WN(0, .1470)
              (±.114)     (±.124)
MA(17) model:

Lag:                          1      2      3      4      5      6      7      8      9
MA coefficient:            .427   .270   .118   .159   .135   .157   .128  -.006   .015
Coefficient/(1.96·s.e.):  1.911  1.113   .473   .631   .533   .613   .497  -.023   .057

Lag:                         10     11     12     13     14     15     16     17
MA coefficient:           -.003   .197  -.046   .202   .129  -.021  -.258   .076
Coefficient/(1.96·s.e.):  -.006   .759  -.176   .767   .480  -.079  -.956   .276
Preliminary estimation for ARMA(p,q) processes

Step 1. Use the innovations algorithm to fit an MA(m) model

    Xt = Zt + θ̂m1 Zt-1 + ⋯ + θ̂mm Zt-m ,   {Zt} ~ WN(0, v̂m),

where m > p + q.

Step 2. Replace ψj by θ̂mj in the equations

    ψj = θj + Σ_{i=1}^{min(j,p)} φi ψj-i ,   j = 1, . . ., p+q.

Step 3. Solve the equations in Step 2 for φ and θ.
Hannan-Rissanen algorithm

Step 1. Fit a high-order AR(m) model (m > max(p, q)) using Yule-Walker estimates and compute the estimated residuals

    Ẑt = Xt − φ̂m1 Xt-1 − ⋯ − φ̂mm Xt-m ,   t = m+1, . . ., n.

Step 2. Estimate φ and θ by regressing Xt onto (Xt-1, . . ., Xt-p, Ẑt-1, . . ., Ẑt-q), i.e. by minimizing the sum of squares

    S(φ, θ) = Σ_{t=m+1}^{n} (Xt − φ1 Xt-1 − ⋯ − φp Xt-p − θ1 Ẑt-1 − ⋯ − θq Ẑt-q)².
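The two steps translate almost line-for-line into code. A sketch assuming NumPy (hannan_rissanen is our name; for simplicity Step 1 uses an OLS autoregression in place of Yule-Walker, which serves the same purpose here):

```python
import numpy as np

def hannan_rissanen(x, p, q, m=20):
    """Two-step Hannan-Rissanen estimates for a zero-mean ARMA(p, q).

    Step 1 fits a long AR(m) and forms residuals z_hat; Step 2
    regresses X_t on (X_{t-1},...,X_{t-p}, z_{t-1},...,z_{t-q}).
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Step 1: high-order autoregression (OLS stand-in for Yule-Walker)
    A = np.column_stack([x[m - j: n - j] for j in range(1, m + 1)])
    phi_m, *_ = np.linalg.lstsq(A, x[m:], rcond=None)
    z = np.zeros(n)
    z[m:] = x[m:] - A @ phi_m           # estimated residuals, t = m, ..., n-1
    # Step 2: regress X_t on lagged X's and lagged residuals
    t0 = m + max(p, q)
    cols = [x[t0 - j: n - j] for j in range(1, p + 1)]
    cols += [z[t0 - j: n - j] for j in range(1, q + 1)]
    B = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(B, x[t0:], rcond=None)
    resid = x[t0:] - B @ beta
    return beta[:p], beta[p:], np.mean(resid ** 2)   # phi, theta, sigma2
```

The appeal of the method is that both steps are ordinary linear regressions, so no nonlinear optimization is needed to get a consistent starting point for the MLE.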
Lake Huron data: ARMA(1,1) models fitted to the mean-corrected data Xt = Yt − 9.0041 using the innovations and Hannan-Rissanen estimates.

Innovations fitted model:
    Xt − .7234 Xt-1 = Zt + .3596 Zt-1 ,   {Zt} ~ WN(0, .4757)
        (±.115)            (±.099)

Hannan-Rissanen fitted model:
    Xt − .6961 Xt-1 = Zt + .3788 Zt-1 ,   {Zt} ~ WN(0, .4757)
        (±.078)            (±.147)
Asymptotic efficiency

Ex: Suppose {Xt} is the invertible MA(1) process

    Xt = Zt + θ Zt-1 .

Three estimators of θ:
1. Method of moments (solve θ/(1+θ²) = ρ̂(1)): estimator θ̂n^(1).
2. Innovations estimator: θ̂n^(2) = θ̂m,1.
3. Maximum likelihood estimator: θ̂n^(3).

The asymptotic relative efficiency of θ̂n^(1) to θ̂n^(2) is

    e(θ, θ̂n^(1), θ̂n^(2)) = σ2²(θ)/σ1²(θ),

where σ1²(θ) and σ2²(θ) are the respective asymptotic variances of the two estimators.
If the relative efficiency is < 1 (> 1), then we say the second estimator is more (less) efficient than the first. For MA(1):

    e(θ, θ̂n^(1), θ̂n^(2)) = .82 for θ = .25,  .37 for θ = .50,  .06 for θ = .75
    e(θ, θ̂n^(2), θ̂n^(3)) = .94 for θ = .25,  .75 for θ = .50,  .44 for θ = .75

Thus the MLE is more efficient than the innovations estimator, which in turn is more efficient than the method of moments.
Recursive calculation of the Gaussian likelihood

Suppose {Xt} is a causal ARMA(p,q) process. If the noise is IID N(0, σ²), then based on the observations X1, . . ., Xn, the Gaussian likelihood is given by

    L(Γn) = (2π)^(-n/2) (det Γn)^(-1/2) exp{−½ Xn′ Γn⁻¹ Xn}.

Direct calculation of det(Γn) and Γn⁻¹ can be avoided by expressing the likelihood in terms of the one-step predictors and their MSEs. To see this, note that

    Xt = X̂t + (Xt − X̂t)
       = (Xt − X̂t) + θt-1,1(Xt-1 − X̂t-1) + ⋯ + θt-1,t-1(X1 − X̂1).
Thus, in matrix form,

    Xn = Cn (Xn − X̂n),

where Cn is the lower triangular matrix

         [ 1                              ]
         [ θ1,1       1                   ]
    Cn = [ θ2,2       θ2,1       1        ]
         [  ⋮          ⋮              ⋱   ]
         [ θn-1,n-1   θn-1,n-2   ⋯     1  ]

Hence

    Γn = cov(Xn) = Cn cov(Xn − X̂n) Cn′ = Cn Dn Cn′ ,

where Dn = diag(v0, . . ., vn-1).
From Γn = Cn Dn Cn′ with Dn = diag(v0, . . ., vn-1), we find that

    det Γn = det Dn = v0 v1 ⋯ vn-1   (since det Cn = 1)

and

    Γn⁻¹ = (Cn′)⁻¹ Dn⁻¹ Cn⁻¹ .

Hence

    Xn′ Γn⁻¹ Xn = (Xn − X̂n)′ Dn⁻¹ (Xn − X̂n) = Σ_{t=1}^{n} (Xt − X̂t)²/vt-1 ,

so that the Gaussian likelihood can be calculated as

    L(Γn) = (2π)^(-n/2) (v0 ⋯ vn-1)^(-1/2) exp{−½ Σ_{t=1}^{n} (Xt − X̂t)²/vt-1}.
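This identity is easy to verify numerically: compute the likelihood from the innovations recursion and compare with the direct matrix formula. A sketch assuming NumPy, with the model ACVF supplied as a vector (the function name is ours):

```python
import numpy as np

def neg2_loglik_innovations(x, gamma):
    """-2 ln L(Gamma_n), computed from one-step predictors and their MSEs.

    gamma[h] is the model ACVF at lag h (h = 0, ..., n-1); neither
    det(Gamma_n) nor the inverse of Gamma_n is ever formed.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    v = np.zeros(n)            # v[t] = MSE of the one-step predictor of x[t]
    v[0] = gamma[0]
    theta = [None] * n         # theta[t] = (theta_{t,1}, ..., theta_{t,t})
    xhat = np.zeros(n)         # one-step predictors; xhat[0] = 0
    for t in range(1, n):
        # innovations algorithm: compute theta_{t,t-k} for k = 0, ..., t-1
        th = np.zeros(t)
        for k in range(t):
            s = gamma[t - k]
            for j in range(k):
                s -= theta[k][k - j - 1] * th[t - j - 1] * v[j]
            th[t - k - 1] = s / v[k]
        theta[t] = th
        v[t] = gamma[0] - np.dot(th[::-1] ** 2, v[:t])
        # Xhat_t = sum_j theta_{t,j} (X_{t-j} - Xhat_{t-j})
        xhat[t] = sum(th[j - 1] * (x[t - j] - xhat[t - j]) for j in range(1, t + 1))
    return n * np.log(2 * np.pi) + np.log(v).sum() + ((x - xhat) ** 2 / v).sum()
```

For a stationary ACVF the returned value agrees, up to rounding error, with n ln(2π) + ln det Γn + Xn′Γn⁻¹Xn computed directly, but the recursion needs only O(n²) work and no n×n matrix factorization.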
Maximum likelihood and least squares

Suppose {Xt} is a causal ARMA(p,q) process. The Gaussian likelihood is

    L(φ, θ, σ²) = (2πσ²)^(-n/2) (r0 ⋯ rn-1)^(-1/2) exp{−(1/2σ²) Σ_{t=1}^{n} (Xt − X̂t)²/rt-1},

where

    X̂t = P_{sp{X1, . . ., Xt-1}} Xt   and   σ² rt-1 = E(Xt − X̂t)².

The maximum likelihood estimators are found by maximizing the likelihood, or equivalently the log-likelihood. Maximizing first with respect to σ² gives

    σ̂² = n⁻¹ Σ_{t=1}^{n} (Xt − X̂t)²/rt-1 =: n⁻¹ S(φ̂, θ̂),

where φ̂ and θ̂ minimize

    ℓ(φ, θ) = ln(n⁻¹ S(φ, θ)) + n⁻¹ Σ_{t=1}^{n} ln rt-1 .
The quantity

    ℓ(φ, θ) = ln(n⁻¹ S(φ, θ)) + n⁻¹ Σ_{t=1}^{n} ln rt-1

is called the reduced (or profile) likelihood and does not depend on σ².

Alternatively, one could minimize the sum of squares

    S(φ, θ) = Σ_{t=1}^{n} (Xt − X̂t)²/rt-1 .

Such estimators are called least squares estimates.
Order selection

AICC criterion: choose p, q, φp and θq to minimize

    AICC = −2 ln L(φp, θq) + 2(p+q+1)n/(n−p−q−2).

For fixed p, q, the AICC is minimized when φp, θq are the maximum likelihood estimates.

Large-sample behavior of the MLE: for a large sample,

    (φ̂, θ̂)′ is approximately N((φ, θ)′, n⁻¹ V(φ, θ)).
Ex: Lake Huron data

Preliminary ARMA(1,1) model (fitted to the mean-corrected data Xt = Yt − 9.0041 using the innovations algorithm):

    Xt − .7234 Xt-1 = Zt + .3596 Zt-1 ,   {Zt} ~ WN(0, .4757)

Maximum likelihood ARMA(1,1) model:

    Xt − .7453 Xt-1 = Zt + .3208 Zt-1 ,   {Zt} ~ WN(0, .4750)
        (±.0771)           (±.1116)
Maximum likelihood AR(2) model:

    Xt − 1.0415 Xt-1 + .2494 Xt-2 = Zt ,   {Zt} ~ WN(0, .4790)

AICC = 213.54 (AR(2) model)
AICC = 212.77 (ARMA(1,1) model)

Based on AICC, the ARMA(1,1) model is slightly better.