
1

Time series, spring 2014: Chapter 8

2

Slides for TS ch 8, p. 238-261

Exercises: 8.4, 8.7, 8.8, 8.14

3

ARMA modelling and forecasting: preliminary estimation

Preliminary estimation is useful for

• order identification (which requires the fitting of a number of competing models);

• initial parameter estimates for likelihood optimization.

ARMA(p,q) model: based on observations x1, . . ., xn from the model

φ(B) Xt = θ(B) Zt , {Zt} ~ WN(0, σ²),

estimate φ = (φ1, . . ., φp) and θ = (θ1, . . ., θq), with the orders p and q assumed known.
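For experimenting with the estimators on these slides, data from a known ARMA model can be simulated. The sketch below (mine, not from the slides) uses statsmodels' arma_generate_sample with arbitrarily chosen coefficients.

    import numpy as np
    from statsmodels.tsa.arima_process import arma_generate_sample

    # Simulate 200 observations from phi(B) X_t = theta(B) Z_t with p = q = 1:
    # phi(z) = 1 - 0.7 z, theta(z) = 1 + 0.3 z, {Z_t} ~ WN(0, 1).
    np.random.seed(0)
    ar = np.r_[1, -0.7]   # AR polynomial (leading 1, AR coefficients negated)
    ma = np.r_[1, 0.3]    # MA polynomial
    x = arma_generate_sample(ar, ma, nsample=200)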

4

AR(p) Processes:

Yule-Walker Estimation (moment estimates)

Burg Estimation

ARMA(p,q) Processes:

Innovations Algorithm

Hannan-Rissanen Estimates

5

Yule-Walker estimation for AR(p) processes

Recall from Chapter 3 that

γ(0) − φ1 γ(1) − · · · − φp γ(p) = σ²

γ(k) − φ1 γ(k−1) − · · · − φp γ(k−p) = 0, k = 1, . . ., p,

or, in matrix form,

Γp φ = γp , σ² = γ(0) − φ′γp ,

where Γp is the covariance matrix of (X1, . . ., Xp), φ = (φ1, . . ., φp)′ and γp = (γ(1), . . ., γ(p))′. The Yule-Walker estimates are then found by replacing the ACVF γ(·) by its sample estimate γ̂(·).
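In code, the Yule-Walker fit is one linear solve. A minimal numpy/scipy sketch (my own, using the divisor-n sample ACVF; not the course's software):

    import numpy as np
    from scipy.linalg import toeplitz

    def yule_walker_ar(x, p):
        # Solve Gamma_p phi = gamma_p with the ACVF replaced by the sample
        # ACVF, then sigma2_hat = gamma_hat(0) - phi_hat' gamma_hat_p.
        x = np.asarray(x, dtype=float) - np.mean(x)
        n = len(x)
        g = np.array([x[:n-h] @ x[h:] / n for h in range(p + 1)])
        phi = np.linalg.solve(toeplitz(g[:p]), g[1:])
        sigma2 = g[0] - phi @ g[1:]
        return phi, sigma2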

6

The Yule-Walker fitted model is causal.

• The sample ACVF and the fitted-model ACVF agree at lags h = 0, 1, . . ., p.

• The estimates are asymptotically efficient (i.e. have the same limit behavior as the MLE):

φ̂ ≈ N(φ, n⁻¹ σ² Γp⁻¹)

7

One can apply the Durbin-Levinson algorithm to the sample ACVF in order to compute the Yule-Walker estimates recursively.
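A sketch of that recursion (my own implementation of the Chapter 3 formulas): at step m it produces φ̂mm, which is also the sample PACF value α̂(m) used for order selection later.

    import numpy as np

    def durbin_levinson(g, p):
        # g = (gamma_hat(0), ..., gamma_hat(p)), the sample ACVF.
        # Returns the Yule-Walker estimates phi_hat and sigma2_hat = v_p.
        g = np.asarray(g, dtype=float)
        phi, v = np.zeros(0), g[0]
        for m in range(1, p + 1):
            k = (g[m] - phi @ g[m-1:0:-1]) / v   # k = phi_mm = alpha_hat(m)
            phi = np.r_[phi - k * phi[::-1], k]
            v *= 1.0 - k**2
        return phi, v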

8

Ex (Dow-Jones Utilities Index; DOWJ.DAT): plot of D1, . . ., D78.

[Figure: time plot of the series and its sample ACF, lags 0-40.]

9

[Fig 5.2: sample ACF and PACF of the differenced series Yt = Dt − Dt−1, lags 0-30.]

10

Preliminary Model for Dow-Jones:

Xt = .4219(±.094) Xt−1 + Zt , {Zt} ~ WN(0, .1479)

where

Xt = Yt − .1336, Yt = Dt − Dt−1.
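This fit can be reproduced with statsmodels' yule_walker (method="mle" uses the divisor-n sample ACVF, matching the slides); the loading step assumes DOWJ.DAT is a plain text file of the 78 values.

    import numpy as np
    from statsmodels.regression.linear_model import yule_walker

    d = np.loadtxt("DOWJ.DAT")   # D_1, ..., D_78 (file layout assumed)
    y = np.diff(d)               # Y_t = D_t - D_{t-1}
    x = y - y.mean()             # X_t = Y_t - .1336
    phi, sigma = yule_walker(x, order=1, method="mle")
    print(phi, sigma**2)         # should be close to .4219 and .1479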

11

Burg's algorithm

The Yule-Walker estimates (φ̂1, . . ., φ̂p) satisfy

P̂p Xp+1 = φ̂1 Xp + · · · + φ̂p X1 ,

where P̂p is the prediction operator relative to the fitted Yule-Walker model. As can be seen from the Durbin-Levinson algorithm, these coefficients are determined by the sample PACF,

α̂(h) = φ̂hh , h = 0, . . ., p.

Burg's algorithm uses a different estimate of the PACF, based on minimizing the forward and backward one-step prediction errors.

12

Properties of Burg's estimates:

• The fitted model is causal.

• The estimates are asymptotically efficient (i.e. have the same limit behavior as the MLE).

• The estimates tend to perform better than the Yule-Walker estimates, especially if a zero of the AR polynomial is close to the unit circle (a quick check follows below).
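A quick check of the last point (my own experiment, not from the slides): simulate an AR(2) whose AR polynomial has zeros near the unit circle and compare the two estimators.

    import numpy as np
    from statsmodels.regression.linear_model import burg, yule_walker

    rng = np.random.default_rng(1)
    n = 200
    x = np.zeros(n)
    for t in range(2, n):   # AR(2); the roots have modulus about 0.97
        x[t] = 1.9 * x[t-1] - 0.95 * x[t-2] + rng.standard_normal()

    phi_b, s2_b = burg(x, order=2)                       # (coefficients, sigma2)
    phi_yw, s_yw = yule_walker(x, order=2, method="mle")
    print(phi_b, phi_yw)    # Burg is typically closer to (1.9, -0.95)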

13

[Figure: sample ACF and PACF of the Lake Huron data, lags 0-40.]

14

Model for the Lake Huron data

The sample ACF and PACF suggest an AR(2) model for the mean-corrected data Xt = Yt − 9.0041.

Burg's model:

Xt = 1.0449 Xt−1 − .2456 Xt−2 + Zt , {Zt} ~ WN(0, .4705)

Yule-Walker fitted model:

Xt = 1.0583 Xt−1 − .2668 Xt−2 + Zt , {Zt} ~ WN(0, .4920)

The minimum-AICC order is p = 2 whether Burg or Yule-Walker estimates are used. Burg's model gives the smaller white-noise variance and the larger likelihood.

15

The innovations algorithm

The Yule-Walker estimates were found by applying the Durbin-Levinson algorithm with the ACVF replaced by the sample ACVF. The innovations algorithm estimates are obtained in the same way: the innovations algorithm is applied with the ACVF replaced by the sample ACVF.

The fitted innovations MA(m) model is

Xt = Zt + θ̂m1 Zt−1 + · · · + θ̂mm Zt−m , {Zt} ~ WN(0, v̂m),

where θ̂ = (θ̂m1, . . ., θ̂mm)′ and v̂m are found from the innovations algorithm with the ACVF replaced by the sample ACVF.
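A sketch of the recursion applied to the sample ACVF (my implementation; O(m²), so intended only for moderate m):

    import numpy as np

    def innovations_ma(x, m):
        # Fit the innovations MA(m): returns (theta_m1, ..., theta_mm) and v_m.
        x = np.asarray(x, dtype=float) - np.mean(x)
        n = len(x)
        g = np.array([x[:n-h] @ x[h:] / n for h in range(m + 1)])  # sample ACVF
        theta = np.zeros((m + 1, m + 1))
        v = np.zeros(m + 1)
        v[0] = g[0]
        for t in range(1, m + 1):
            for k in range(t):
                s = sum(theta[k, k-j] * theta[t, t-j] * v[j] for j in range(k))
                theta[t, t-k] = (g[t-k] - s) / v[k]
            v[t] = g[0] - sum(theta[t, t-j]**2 * v[j] for j in range(t))
        return theta[m, 1:], v[m]

For the Dow-Jones MA(2) fit a few slides ahead, one would call innovations_ma(x, 17) and read off the first two coefficients.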

16

Innovations estimates for the MA(q) model:

Xt = Zt + θ̂m,1 Zt−1 + · · · + θ̂m,q Zt−q , {Zt} ~ WN(0, v̂m),

where m = mn is a sequence of integers with mn → ∞ and mn = o(n^(1/3)).

17

Order selection for AR(p) processes

Usually there is no true AR model. The goal is to find an AR model which represents the data well, in some sense.

Two techniques:

• If {Xt} is an AR(p) process with {Zt} ~ IID(0, σ²), then φ̂mm = α̂(m) ∼ N(0, n⁻¹) for m > p, where α̂(m) is the sample PACF. Estimate p as the smallest value m such that |α̂(k)| < 1.96 n^(−1/2) for all k > m (a sketch follows below).

• Estimate p by minimizing the AICC statistic

AICC = −2 ln L(φp) + 2(p+1)n/(n−p−2),

where L(·) denotes the Gaussian likelihood.
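The first technique in code (a sketch of mine, reusing the Durbin-Levinson recursion to get the sample PACF):

    import numpy as np

    def ar_order_by_pacf(x, max_lag):
        # Smallest m with |alpha_hat(k)| < 1.96 n^(-1/2) for all k > m.
        x = np.asarray(x, dtype=float) - np.mean(x)
        n = len(x)
        g = np.array([x[:n-h] @ x[h:] / n for h in range(max_lag + 1)])
        pacf, phi, v = [], np.zeros(0), g[0]
        for m in range(1, max_lag + 1):   # Durbin-Levinson
            k = (g[m] - phi @ g[m-1:0:-1]) / v
            phi = np.r_[phi - k * phi[::-1], k]
            v *= 1.0 - k**2
            pacf.append(k)
        sig = [m for m, a in enumerate(pacf, 1) if abs(a) >= 1.96 / np.sqrt(n)]
        return max(sig) if sig else 0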

18

Order selection for MA(q) processes

• If {Xt} is an MA(q) process with {Zt} ~ IID(0, σ²), then ρ̂(m) ∼ N(0, n⁻¹(1 + 2ρ²(1) + · · · + 2ρ²(q))) for all m > q. Estimate q as the smallest value m such that ρ̂(k) is not significantly different from 0 for all k > m.

• Examine the coefficient vector in the fitted innovations MA(m) model to see which coefficients are not significantly different from 0 (a sketch follows below).

• Estimate q by minimizing the AICC statistic

AICC = −2 ln L(θq) + 2(q+1)n/(n−q−2).
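For the second bullet, the innovations estimate θ̂mj has asymptotic standard error n^(−1/2)(1 + θ̂²m1 + · · · + θ̂²m,j−1)^(1/2), which gives ratio tables like the one two slides ahead. A sketch (mine, using that standard-error formula):

    import numpy as np

    def innovations_ratios(theta_m, n):
        # Ratios theta_mj / (1.96 * SE(theta_mj)); |ratio| < 1 means the
        # coefficient is not significantly different from 0 at the 5% level.
        theta_m = np.asarray(theta_m, dtype=float)
        se = np.sqrt(np.cumsum(np.r_[1.0, theta_m[:-1]**2]) / n)
        return theta_m / (1.96 * se)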

19

Dow-Jones data: Xt = Dt − Dt−1 − .1336

[Figure: sample ACF of the differenced, mean-corrected series, lags 0-30.]

ACF suggests MA(2).

MA(2) model using innovations estimates (m = 17):

Xt = Zt + .4269(±.114) Zt−1 + .2704(±.124) Zt−2 , {Zt} ~ WN(0, .1470)

20

MA(17) model:

MA coefficients θ̂17,1, . . ., θ̂17,17:

   .427   .270   .118   .159   .135   .157   .128  −.006
   .015  −.003   .197  −.046   .202   .129  −.021  −.258
   .076

Coefficient / (1.96 × standard error):

  1.911  1.113   .473   .631   .533   .613   .497  −.023
   .057  −.006   .759  −.176   .767   .480  −.079  −.956
   .276

21

Preliminary estimation for ARMA(p, q) processes

Step 1. Use the innovations algorithm to fit an MA(m) model

Xt = Zt + θ̂m1 Zt−1 + · · · + θ̂mm Zt−m , {Zt} ~ WN(0, v̂m),

where m > p + q.

Step 2. Replace ψj by θ̂mj in the equations

ψj = θj + ∑_{i=1}^{min(j,p)} φi ψj−i , j = 1, . . ., p+q.

Step 3. Solve the equations in Step 2 for φ and θ.
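A sketch of Steps 2 and 3 (my own arrangement, with the conventions ψ0 = 1 and ψj = 0 for j < 0): the equations with j = q+1, . . ., q+p have θj = 0 and give a linear system for φ, after which θ follows directly.

    import numpy as np

    def arma_from_ma(theta_m, p, q):
        # theta_m = innovations MA(m) coefficients, m > p + q.
        psi = np.r_[1.0, np.asarray(theta_m, dtype=float)]   # psi_0 = 1
        P = lambda k: psi[k] if k >= 0 else 0.0              # psi_j = 0, j < 0
        A = np.array([[P(q + j - i) for i in range(1, p + 1)]
                      for j in range(1, p + 1)])
        phi = np.linalg.solve(A, psi[q+1:q+p+1])             # j = q+1, ..., q+p
        theta = np.array([psi[j] - sum(phi[i-1] * P(j - i)
                                       for i in range(1, p + 1))
                          for j in range(1, q + 1)])         # j = 1, ..., q
        return phi, theta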

22

Hannan-Rissanen algorithm

Step 1. Fit a high-order AR(m) model (m > max(p, q)) using Yule-Walker estimates, and compute the estimated residuals

Ẑt = Xt − φ̂m1 Xt−1 − · · · − φ̂mm Xt−m , t = m+1, . . ., n.

Step 2. Estimate φ and θ by regressing Xt onto (Xt−1, . . ., Xt−p, Ẑt−1, . . ., Ẑt−q), i.e. by minimizing the sum of squares

S(φ, θ) = ∑_{t=m+1}^{n} (Xt − φ1 Xt−1 − · · · − φp Xt−p − θ1 Ẑt−1 − · · · − θq Ẑt−q)².
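Both steps in a compact sketch (mine; the long-AR order m is an ad hoc choice here, the slides only require m > max(p, q)):

    import numpy as np
    from statsmodels.regression.linear_model import yule_walker

    def hannan_rissanen(x, p, q, m=None):
        x = np.asarray(x, dtype=float) - np.mean(x)
        n = len(x)
        m = m if m is not None else max(p, q) + 5
        # Step 1: AR(m) by Yule-Walker, then residuals Zhat_t, t = m, ..., n-1.
        phi_m, _ = yule_walker(x, order=m, method="mle")
        z = np.array([x[t] - sum(phi_m[i-1] * x[t-i] for i in range(1, m + 1))
                      for t in range(m, n)])
        # Step 2: least squares of X_t on (X_{t-1..t-p}, Zhat_{t-1..t-q}).
        rows = [[x[t-i] for i in range(1, p + 1)] +
                [z[t-m-j] for j in range(1, q + 1)] for t in range(m + q, n)]
        beta, *_ = np.linalg.lstsq(np.array(rows), x[m+q:], rcond=None)
        return beta[:p], beta[p:]   # (phi_hat, theta_hat)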

23

Lake Huron data: ARMA(1,1) models fitted to the mean-corrected data Xt = Yt − 9.0041, using the innovations and Hannan-Rissanen estimates.

Innovations fitted model:

Xt − .7234(±.115) Xt−1 = Zt + .3596(±.099) Zt−1 , {Zt} ~ WN(0, .4757)

Hannan-Rissanen fitted model:

Xt − .6961(±.078) Xt−1 = Zt + .3788(±.147) Zt−1 , {Zt} ~ WN(0, .4757)

24

Asymptotic efficiency

Ex: Suppose {Xt} is the invertible MA(1) process

Xt = Zt + θ Zt−1 .

1. Method of moments (solve θ/(1 + θ²) = ρ̂(1)). Estimator: θ̂n(1).

2. Innovations estimator: θ̂n(2) = θ̂m,1.

3. Maximum likelihood estimator: θ̂n(3).

The asymptotic relative efficiency of θ̂n(1) to θ̂n(2) is

e(θ, θ̂n(1), θ̂n(2)) = σ2²(θ) / σ1²(θ),

where σ1²(θ) and σ2²(θ) are the respective asymptotic variances of the two estimators.

25

If the relative efficiency is < 1 (> 1), then we say the second estimator is more (less) efficient than the first. For the MA(1), the MLE is more efficient than the innovations estimator, which in turn is more efficient than the method of moments:

e(θ, θ̂n(1), θ̂n(2)) = .82 for θ = .25, .37 for θ = .50, .06 for θ = .75;

e(θ, θ̂n(2), θ̂n(3)) = .94 for θ = .25, .75 for θ = .50, .44 for θ = .75.
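For concreteness, the method-of-moments estimator θ̂n(1) in code (a sketch; returning ±1 when |ρ̂(1)| ≥ 1/2, where no real solution exists, is my own convention):

    import numpy as np

    def ma1_moment_estimate(x):
        # Solve rho_hat(1) = theta/(1 + theta^2) for the invertible root.
        x = np.asarray(x, dtype=float) - np.mean(x)
        r = (x[:-1] @ x[1:]) / (x @ x)   # rho_hat(1); the divisors n cancel
        if r == 0.0:
            return 0.0
        if abs(r) >= 0.5:                # equation has no real solution
            return float(np.sign(r))
        return (1.0 - np.sqrt(1.0 - 4.0 * r * r)) / (2.0 * r)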

26

Recursive calculation of the Gaussian likelihood

Suppose {Xt} is a causal ARMA(p,q) process. If the noise is IID N(0, σ²), then based on the observations X1, . . ., Xn the Gaussian likelihood is

L(Γn) = (2π)^(−n/2) (det Γn)^(−1/2) exp{−(1/2) X′n Γn⁻¹ Xn}.

Direct calculation of det(Γn) and Γn⁻¹ can be avoided by expressing the likelihood in terms of the one-step predictors and their MSEs. To see this, note that

Xt = X̂t + (Xt − X̂t)
   = (Xt − X̂t) + θt−1,1(Xt−1 − X̂t−1) + · · · + θt−1,t−1(X1 − X̂1),

where the second line uses the innovations representation of X̂t.

27

Thus, collecting Xn = (X1, . . ., Xn)′ and X̂n = (X̂1, . . ., X̂n)′,

Xn = Cn (Xn − X̂n),

where Cn is the lower triangular matrix

     [ 1          0          0          · · ·   0 ]
     [ θ1,1       1          0          · · ·   0 ]
Cn = [ θ2,2       θ2,1       1          · · ·   0 ]
     [ · · ·                                      ]
     [ θn−1,n−1   θn−1,n−2   θn−1,n−3   · · ·   1 ]

Hence

Γn = cov(Xn) = Cn cov(Xn − X̂n) C′n = Cn Dn C′n ,

where Dn = diag(v0, . . ., vn−1).

28

From

Γn = Cn Dn C′n , Dn = diag(v0, . . ., vn−1),

we find (since |Cn| = 1) that

|Γn| = |Dn| = v0 v1 · · · vn−1 and Γn⁻¹ = (C′n)⁻¹ Dn⁻¹ Cn⁻¹.

Hence

X′n Γn⁻¹ Xn = (Xn − X̂n)′ Dn⁻¹ (Xn − X̂n) = ∑_{t=1}^{n} (Xt − X̂t)² / vt−1 ,

so that the Gaussian likelihood can be calculated as

L(Γn) = (2π)^(−n/2) (v0 · · · vn−1)^(−1/2) exp{−(1/2) ∑_{t=1}^{n} (Xt − X̂t)² / vt−1}.
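A direct transcription into code (my sketch; O(n²) time and memory, so only for short series). It takes the model ACVF γ(0), . . ., γ(n−1) as input and runs the innovations recursions for θt,j, vt and X̂t:

    import numpy as np

    def gaussian_loglik(x, g):
        # ln L(Gamma_n) via one-step predictors; g[h] = gamma(h), h = 0..n-1.
        x = np.asarray(x, dtype=float)
        n = len(x)
        theta = np.zeros((n, n))
        v = np.zeros(n)
        v[0] = g[0]
        xhat = np.zeros(n)                  # one-step predictors, xhat[0] = 0
        for t in range(1, n):               # innovations algorithm
            for k in range(t):
                s = sum(theta[k, k-j] * theta[t, t-j] * v[j] for j in range(k))
                theta[t, t-k] = (g[t-k] - s) / v[k]
            v[t] = g[0] - sum(theta[t, t-j]**2 * v[j] for j in range(t))
            xhat[t] = sum(theta[t, j] * (x[t-j] - xhat[t-j])
                          for j in range(1, t + 1))
        return -0.5 * (n * np.log(2.0 * np.pi) + np.log(v).sum()
                       + ((x - xhat)**2 / v).sum())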

29

Maximum likelihood and least squares

Suppose {Xt} is a causal ARMA(p,q) process. The Gaussian likelihood is

L(φ, θ, σ²) = (2πσ²)^(−n/2) (r0 · · · rn−1)^(−1/2) exp{−(1/(2σ²)) ∑_{t=1}^{n} (Xt − X̂t)² / rt−1},

where

X̂t = P_{sp{X1, . . ., Xt−1}} Xt and rt−1 = σ⁻² E(Xt − X̂t)².

The maximum likelihood estimators are found by maximizing the likelihood, or equivalently the log-likelihood. Maximizing first with respect to σ² gives

σ̂² = n⁻¹ ∑_{t=1}^{n} (Xt − X̂t)² / rt−1 =: n⁻¹ S(φ̂, θ̂),

where φ̂ and θ̂ minimize

ℓ(φ, θ) = ln(n⁻¹ S(φ, θ)) + n⁻¹ ∑_{t=1}^{n} ln rt−1 .

30

The quantity

ℓ(φ, θ) = ln(n⁻¹ S(φ, θ)) + n⁻¹ ∑_{t=1}^{n} ln rt−1

is called the reduced (or profile) likelihood and does not depend on σ².

Alternatively, one could minimize the sum of squares

S(φ, θ) = ∑_{t=1}^{n} (Xt − X̂t)² / rt−1 .

Such estimators are called least squares estimates.

31

Order selection

AICC criterion: choose p, q, φp and θq to minimize

AICC = −2 ln L(φp, θq) + 2(p+q+1)n/(n−p−q−2).

For fixed p and q, the AICC is minimized when φp and θq are the maximum likelihood estimates.

Large-sample behavior of the MLE: for a large sample,

(φ̂, θ̂)′ is approximately N((φ, θ)′, n⁻¹ V(φ, θ)).

32

Ex: Lake Huron data.

Preliminary ARMA(1,1) model:

Xt − .7234 Xt−1 = Zt + .3596 Zt−1 , {Zt} ~ WN(0, .4757)

(fitted to the mean-corrected data Xt = Yt − 9.004 using the innovations algorithm).

Maximum likelihood ARMA(1,1) model:

Xt − .7453(±.0771) Xt−1 = Zt + .3208(±.1116) Zt−1 , {Zt} ~ WN(0, .4750)

33

Maximum Likelihood AR(2) Model:

Xt − 1.0415 Xt−1 + .2494 Xt−2 = Zt , {Zt} ~ WN(0, .4790)

AICC = 213.54 (AR(2) Model)

AICC = 212.77 (ARMA(1,1) model)

Based on AICC, the ARMA(1,1) model is slightly better.