
Model choice. Akaike’s criterion

Akaike criterion: Kullback-Leibler discrepancy

Given a family of probability densities {f(·;ψ), ψ ∈ Ψ}, the Kullback-Leibler index of f(·;ψ) relative to f(·;θ) is

∆(ψ|θ) = E_θ(−2 log f(X;ψ)) = ∫_{ℝⁿ} −2 log(f(x;ψ)) f(x;θ) dx.

The Kullback-Leibler discrepancy between f(·;ψ) and f(·;θ) is

d(ψ|θ) = ∆(ψ|θ) − ∆(θ|θ) = ∫_{ℝⁿ} −2 log(f(x;ψ)/f(x;θ)) f(x;θ) dx.

Jensen’s inequality gives E(log Y) ≤ log(E(Y)) for any positive random variable Y, since log is concave. Hence

d(ψ|θ) ≥ −2 log(∫_{ℝⁿ} (f(x;ψ)/f(x;θ)) f(x;θ) dx) = −2 log(1) = 0,

with equality only if f(x;ψ) = f(x;θ) a.e. [f(·;θ)].

24 novembre 2014 1 / 29
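As a numerical illustration (not from the slides), the following Python sketch estimates d(ψ|θ) by Monte Carlo for two univariate Gaussian densities; the Gaussian example and the function name are my own choices. The estimate is positive when ψ ≠ θ and exactly zero when ψ = θ:

```python
import math, random

def kl_discrepancy_mc(mu_psi, s_psi, mu_th, s_th, n=100_000, seed=0):
    """Monte Carlo estimate of d(psi|theta) = E_theta[-2 log(f(X;psi)/f(X;theta))]
    for univariate Gaussian densities (hypothetical illustration)."""
    rng = random.Random(seed)
    def logpdf(x, mu, s):
        return -0.5 * math.log(2 * math.pi * s * s) - (x - mu) ** 2 / (2 * s * s)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(mu_th, s_th)  # sample from the true model f(.;theta)
        total += -2.0 * (logpdf(x, mu_psi, s_psi) - logpdf(x, mu_th, s_th))
    return total / n

# d(psi|theta) >= 0, with equality when psi = theta:
print(kl_discrepancy_mc(1.0, 2.0, 0.0, 1.0))  # strictly positive
print(kl_discrepancy_mc(0.0, 1.0, 0.0, 1.0))  # zero
```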


Model choice. Akaike’s criterion

Approximating Kullback-Leibler discrepancy

Given observations X₁, …, Xₙ, we would like to minimize d(ψ|θ) among all candidate models ψ, given the true model θ.

As the true model is unknown, we estimate d(ψ|θ). Let ψ = (φ, ϑ, σ²) be the parameters of an ARMA(p,q) model and ψ̂ = (φ̂, ϑ̂, σ̂²) the MLE based on X₁, …, Xₙ. Let Y be an independent realization of the same process. Then

−2 log L_Y(φ̂, ϑ̂, σ̂²) = n log(2π) + n log(σ̂²) + log(r₀ ⋯ r_{n−1}) + S_Y(φ̂, ϑ̂)/σ̂².

Indeed, recall that for an ARMA(p,q) process

L(φ, ϑ, σ²) = (2πσ²)^{−n/2} (r₀ ⋯ r_{n−1})^{−1/2} exp{−S(φ, ϑ)/(2σ²)}, with S(φ, ϑ) = Σ_{j=1}^{n} (x_j − x̂_j)²/r_{j−1}.

r₀, …, r_{n−1} depend only on the parameters (φ, ϑ) and not on the observed data; the data enter the likelihood only through the terms (x_j − x̂_j)² in S(φ, ϑ).

24 novembre 2014 2 / 29
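A hypothetical sketch of the innovations form of this likelihood, specialized to a causal AR(1) where the r_j are available in closed form (r₀ = 1/(1 − φ²), r_j = 1 for j ≥ 1, with one-step predictors x̂₁ = 0, x̂_j = φ x_{j−1}); the AR(1) specialization and all names are assumptions, not from the slides. It checks the slide's formula against a direct evaluation of the n-dimensional Gaussian density:

```python
import math
import numpy as np

def neg2loglik_ar1(x, phi, sigma2):
    """-2 log L for a causal AR(1) via the innovations form on the slide:
    n log(2*pi*sigma2) + log(r_0 ... r_{n-1}) + S(phi)/sigma2."""
    n = len(x)
    r0 = 1.0 / (1.0 - phi ** 2)
    # S(phi) = sum_j (x_j - xhat_j)^2 / r_{j-1}
    S = x[0] ** 2 / r0 + sum((x[j] - phi * x[j - 1]) ** 2 for j in range(1, n))
    return n * math.log(2 * math.pi * sigma2) + math.log(r0) + S / sigma2

def neg2loglik_direct(x, phi, sigma2):
    """Reference value from the n-dimensional Gaussian density with
    covariance Gamma(h) = sigma2 * phi^|h| / (1 - phi^2)."""
    n = len(x)
    idx = np.arange(n)
    Gamma = sigma2 * phi ** np.abs(idx[:, None] - idx[None, :]) / (1 - phi ** 2)
    sign, logdet = np.linalg.slogdet(Gamma)
    quad = x @ np.linalg.solve(Gamma, x)
    return n * math.log(2 * math.pi) + logdet + quad

x = np.array([0.3, -0.5, 0.8, 0.1, -0.2])
print(neg2loglik_ar1(x, 0.6, 1.2))
print(neg2loglik_direct(x, 0.6, 1.2))  # agrees with the innovations form
```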


Model choice. Akaike’s criterion

Approximating Kullback-Leibler discrepancy

As before, let ψ̂ = (φ̂, ϑ̂, σ̂²) be the MLE based on X₁, …, Xₙ and Y an independent realization of the same process. Then

−2 log L_Y(φ̂, ϑ̂, σ̂²) = n log(2π) + n log(σ̂²) + log(r₀ ⋯ r_{n−1}) + S_Y(φ̂, ϑ̂)/σ̂²
 = −2 log L_X(φ̂, ϑ̂, σ̂²) + S_Y(φ̂, ϑ̂)/σ̂² − S_X(φ̂, ϑ̂)/σ̂²
 = −2 log L_X(φ̂, ϑ̂, σ̂²) + S_Y(φ̂, ϑ̂)/σ̂² − n,

using n σ̂² = S_X(φ̂, ϑ̂). Taking expectations,

E_θ(∆(ψ̂|θ)) = E_{(φ,ϑ,σ²)}(−2 log L_X(φ̂, ϑ̂, σ̂²)) + E_{(φ,ϑ,σ²)}(S_Y(φ̂, ϑ̂)/σ̂²) − n.

24 novembre 2014 3 / 29


Model choice. Akaike’s criterion

Kullback-Leibler discrepancy and AICC

Using linear approximations and the asymptotic distributions of the estimators, one arrives at

E_{(φ,ϑ,σ²)}(S_Y(φ̂, ϑ̂)) ≈ σ²(n + p + q).

Similarly, n σ̂² = S_X(φ̂, ϑ̂) for large n is distributed as σ²χ²(n − p − q − 2) and is asymptotically independent of (φ̂, ϑ̂). Hence

E_{(φ,ϑ,σ²)}(S_Y(φ̂, ϑ̂)/σ̂²) ≈ σ²(n + p + q) / (σ²(n − p − q − 2)/n) = n(n + p + q)/(n − p − q − 2).

From E_θ(∆(ψ̂|θ)) = E_{(φ,ϑ,σ²)}(−2 log L_X(φ̂, ϑ̂, σ̂²)) + E_{(φ,ϑ,σ²)}(S_Y(φ̂, ϑ̂)/σ̂²) − n it follows that

AICC = −2 log L_X(φ̂, ϑ̂, σ̂²) + 2(p + q + 1)n/(n − p − q − 2)

is an approximately unbiased estimate of E_θ(∆(ψ̂|θ)).

24 novembre 2014 4 / 29
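The two penalties can be sketched as plain functions (the names aicc/aic are my own; −2 log L is taken as an input rather than computed here). The AICC penalty exceeds the AIC penalty for every finite n and converges to it as n → ∞:

```python
def aicc(neg2loglik, n, p, q):
    """Corrected Akaike criterion from the slide:
    AICC = -2 log L + 2(p+q+1)n / (n - p - q - 2)."""
    return neg2loglik + 2 * (p + q + 1) * n / (n - p - q - 2)

def aic(neg2loglik, p, q):
    """Plain Akaike criterion: -2 log L + 2(p+q+1)."""
    return neg2loglik + 2 * (p + q + 1)

# For n = 100, p = 2, q = 1 the AICC penalty is 2*4*100/96, slightly
# above the AIC penalty 2*4 = 8; the gap vanishes as n grows.
print(aicc(0.0, 100, 2, 1), aic(0.0, 2, 1))
```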


Model choice. Akaike’s criterion

Criteria for model choice

The order is chosen by minimizing the value of AICC (corrected Akaike Information Criterion):

AICC = −2 log L_X(φ̂, ϑ̂, σ̂²) + 2(p + q + 1)n/(n − p − q − 2).

The second term can be considered a penalty for models with a large number of parameters.

For large n it is approximately the same as Akaike’s Information Criterion (AIC), −2 log L_X(φ̂, ϑ̂, σ̂²) + 2(p + q + 1), but it carries a higher penalty for finite n, and thus is somewhat less likely to overfit.

In R: AICC <- AIC(myfit, k = 2*n/(n-p-q-2))

A rule of thumb is that the fits of model 1 and model 2 are not significantly different if |AICC₁ − AICC₂| < 2 (only the difference matters, not the absolute value of AICC). Hence we may decide to choose model 1 if it is simpler than model 2 (or its residuals are closer to white noise), even if AICC₁ > AICC₂, as long as AICC₁ < AICC₂ + 2.

24 novembre 2014 5 / 29
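The R call above relies on the generic convention AIC(fit, k) = −2 log L + k·npar; a Python sketch of that convention (names are mine) shows that k = 2n/(n − p − q − 2) with npar = p + q + 1 reproduces AICC, and encodes the rule of thumb:

```python
def generic_aic(neg2loglik, npar, k=2.0):
    """The R AIC(fit, k) convention: -2 log L + k * npar."""
    return neg2loglik + k * npar

# With npar = p + q + 1 and k = 2n/(n - p - q - 2), the generic form
# reproduces AICC, which is what the R call on the slide exploits:
n, p, q = 120, 1, 1
k = 2 * n / (n - p - q - 2)
val = generic_aic(-350.0, p + q + 1, k)
ref = -350.0 + 2 * (p + q + 1) * n / (n - p - q - 2)
print(val, ref)  # equal

def not_significantly_different(aicc1, aicc2):
    """Rule of thumb from the slide: the fits are not significantly
    different if |AICC1 - AICC2| < 2."""
    return abs(aicc1 - aicc2) < 2
```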


Model choice. Akaike’s criterion

Tests on residuals

X̂_t(φ̂, ϑ̂) is the predicted value of X_t given the estimates (φ̂, ϑ̂).

Ŵ_t = (X_t − X̂_t(φ̂, ϑ̂)) / (r_{t−1}(φ̂, ϑ̂))^{1/2} are the standardized residuals.

Possible checks on the residuals:

Portmanteau tests on the ACF of Ŵ_t: Box-Pierce; Ljung-Box

Test on turning points

Rank tests

. . .

24 novembre 2014 6 / 29
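A minimal sketch of the Ljung-Box portmanteau statistic (pure Python, names mine); the reference distribution under the null is χ² with H − p − q degrees of freedom when the input consists of residuals of a fitted ARMA(p,q):

```python
import random

def sample_acf(x, max_lag):
    """Sample autocorrelations rho_hat(1..max_lag)."""
    n = len(x)
    m = sum(x) / n
    c0 = sum((v - m) ** 2 for v in x) / n
    return [sum((x[t + h] - m) * (x[t] - m) for t in range(n - h)) / n / c0
            for h in range(1, max_lag + 1)]

def ljung_box(w, H):
    """Ljung-Box statistic Q = n(n+2) * sum_{h=1}^{H} rho_hat(h)^2 / (n-h)."""
    n = len(w)
    rho = sample_acf(w, H)
    return n * (n + 2) * sum(r * r / (n - h) for h, r in enumerate(rho, 1))

rng = random.Random(1)
noise = [rng.gauss(0, 1) for _ in range(500)]
walk = [sum(noise[:t + 1]) for t in range(500)]  # strongly autocorrelated

print(ljung_box(noise, 10))  # moderate: consistent with white noise
print(ljung_box(walk, 10))   # huge: white noise clearly rejected
```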


Multivariate time-series

Autocovariance

A multivariate stochastic process {X_t ∈ ℝᵐ}, t ∈ ℤ, is weakly stationary if

E(X²_{t,i}) < ∞ ∀ t, i,  E(X_t) ≡ μ,  Cov(X_{t+h}, X_t) ≡ Γ(h).

In particular γ_{ij}(h) = Cov(X_{t+h,i}, X_{t,j}) = E((X_{t+h,i} − μ_i)(X_{t,j} − μ_j)).

Note that in general γ_{ij}(h) ≠ γ_{ji}(h), while

γ_{ij}(h) = Cov(X_{t+h,i}, X_{t,j}) = (stationarity) = Cov(X_{t,i}, X_{t−h,j}) = (symmetry) = Cov(X_{t−h,j}, X_{t,i}) = γ_{ji}(−h).

Another simple property is |γ_{ij}(h)| ≤ (γ_{ii}(0) γ_{jj}(0))^{1/2}.

The ACF is ρ_{ij}(h) = γ_{ij}(h)/(γ_{ii}(0) γ_{jj}(0))^{1/2}.

24 novembre 2014 7 / 29
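A small numpy sketch (names mine) of the sample cross-covariance matrix, which lets one verify the identity γ_{ij}(h) = γ_{ji}(−h) numerically, i.e. Γ̂(−h) = Γ̂(h)ᵗ:

```python
import numpy as np

def gamma_hat(X, h):
    """Sample cross-covariance matrix Gamma_hat(h) of a multivariate
    series; rows of X are time points. Uses the (1/n)-normalized estimator."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    if h >= 0:
        return Xc[h:].T @ Xc[:n - h] / n
    return Xc[:n + h].T @ Xc[-h:] / n

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 2))
X[:, 1] += np.roll(X[:, 0], 2)  # lagged dependence (wraps around; illustration only)

# gamma_ij(h) = gamma_ji(-h), i.e. Gamma_hat(-h) = Gamma_hat(h)^t:
print(np.allclose(gamma_hat(X, 3), gamma_hat(X, -3).T))  # True
```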


Multivariate time-series

Multivariate White-noise and MA

A multivariate stochastic process {Z_t ∈ ℝᵐ} is a white noise with covariance S, written {Z_t} ∼ WN(0, S), if {Z_t} is stationary with mean 0 and ACVF

Γ(h) = S for h = 0, Γ(h) = 0 for h ≠ 0.

{X_t ∈ ℝᵐ} is a linear process if

X_t = Σ_{k=−∞}^{+∞} C_k Z_{t−k},  {Z_t} ∼ WN(0, S),

where the C_k are matrices such that Σ_{k=−∞}^{+∞} |(C_k)_{ij}| < +∞ for all i, j = 1, …, m.

Then {X_t} is stationary and Γ_X(h) = Σ_{k=−∞}^{∞} C_{k+h} S C_kᵗ.

24 novembre 2014 8 / 29
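For a multivariate MA(1), X_t = Z_t + Θ Z_{t−1} (i.e. C₀ = I, C₁ = Θ, all other C_k zero), the ACVF formula has finitely many terms and can be checked directly; Θ and S below are arbitrary illustrative values of my own:

```python
import numpy as np

# Multivariate MA(1): X_t = Z_t + Theta Z_{t-1}, so C_0 = I, C_1 = Theta.
Theta = np.array([[0.5, 0.2], [-0.3, 0.4]])
S = np.array([[1.0, 0.3], [0.3, 2.0]])
C = {0: np.eye(2), 1: Theta}  # all other C_k are zero

def gamma_X(h):
    """Gamma_X(h) = sum_k C_{k+h} S C_k^t (finitely many nonzero terms here)."""
    return sum(C.get(k + h, np.zeros((2, 2))) @ S @ C[k].T for k in C)

print(np.allclose(gamma_X(0), S + Theta @ S @ Theta.T))  # True
print(np.allclose(gamma_X(1), Theta @ S))                # True
print(np.allclose(gamma_X(-1), gamma_X(1).T))            # True: Gamma(-h) = Gamma(h)^t
```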


Multivariate time-series

Estimation of mean

The mean μ can be estimated through the sample mean X̄_n. From the univariate theory, we know

E(X̄_n) = μ;  V((X̄_n)_i) → 0 as n → ∞ if γ_{ii}(h) → 0 as h → ∞;

n V((X̄_n)_i) → Σ_{h=−∞}^{+∞} γ_{ii}(h) if Σ_{h=−∞}^{+∞} |γ_{ii}(h)| < +∞.

Moreover (X̄_n)_i is asymptotically normal. Stronger assumptions are required for the vector X̄_n to be asymptotically normal.

Theorem. If

X_t = μ + Σ_{k=−∞}^{+∞} C_k Z_{t−k},  {Z_t} ∼ WN(0, S),

then n^{1/2}(X̄_n − μ) ⟹ N(0, Σ_{h=−∞}^{∞} Γ(h)), where Γ(h) = Σ_{k=−∞}^{∞} C_{k+h} S C_kᵗ.

24 novembre 2014 9 / 29


Multivariate time-series

Confidence intervals for the mean

In principle, from X̄_n ∼ N(μ, (1/n) Σ_{h=−∞}^{∞} Γ(h)) (asymptotically) one could build an m-dimensional confidence ellipsoid. But this is not intuitive, and the C_k and S are not known and have to be estimated. . .

Instead, build confidence intervals from (X̄_n)_i ∼ N(μ_i, (1/n) Σ_{h=−∞}^{+∞} γ_{ii}(h)).

Σ_{h=−∞}^{+∞} γ_{ii}(h) = 2π f_i(0) can be consistently estimated by

2π f̂_i(0) = Σ_{h=−r}^{r} (1 − |h|/r) γ̂_{ii}(h), where r_n → ∞ and r_n/n → 0.

Componentwise confidence intervals can then be combined. If we find u_i(α) such that P(|μ_i − (X̄_n)_i| < u_i(α)) ≥ 1 − α, then

P(|μ_i − (X̄_n)_i| < u_i(α), i = 1, …, m) ≥ 1 − Σ_{i=1}^{m} P(|μ_i − (X̄_n)_i| ≥ u_i(α)) ≥ 1 − mα.

Choosing α = 0.05/m, one has a 95%-confidence m-rectangle.

24 novembre 2014 10 / 29
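The lag-window estimate of Σ_h γ_{ii}(h) and the Bonferroni rectangle can be sketched as follows (function names are mine; the tiny alternating series is a hand-checkable example, where γ̂(0) = 1, γ̂(1) = −3/4, so the estimate with r = 2 is 1 − 3/4 = 1/4):

```python
import math
from statistics import NormalDist

def long_run_variance(x, r):
    """Bartlett-window estimate of sum_h gamma(h) = 2*pi*f(0), as on the
    slide: sum_{|h| < r} (1 - |h|/r) * gamma_hat(h)."""
    n = len(x)
    m = sum(x) / n
    def g(h):
        return sum((x[t + h] - m) * (x[t] - m) for t in range(n - h)) / n
    return g(0) + 2 * sum((1 - h / r) * g(h) for h in range(1, r))

def bonferroni_mean_rectangle(series_list, r, level=0.95):
    """Simultaneous conservative confidence intervals for the component
    means: per-component alpha = (1 - level)/m."""
    m = len(series_list)
    z = NormalDist().inv_cdf(1 - (1 - level) / (2 * m))
    rect = []
    for x in series_list:
        n = len(x)
        xbar = sum(x) / n
        half = z * math.sqrt(max(long_run_variance(x, r), 0.0) / n)
        rect.append((xbar - half, xbar + half))
    return rect

print(long_run_variance([1, -1, 1, -1], 2))  # 0.25, as computed by hand
print(bonferroni_mean_rectangle([[1, -1, 1, -1], [2, 0, 2, 0]], 2))
```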


Multivariate time-series

Estimation of ACVF (bivariate case, m = 2)

Γ̂(h) = (1/n) Σ_{t=1}^{n−h} (X_{t+h} − X̄_n)(X_t − X̄_n)ᵗ for 0 ≤ h < n,
Γ̂(h) = (1/n) Σ_{t=−h+1}^{n} (X_{t+h} − X̄_n)(X_t − X̄_n)ᵗ for −n < h < 0.

ρ̂_{ij}(h) = γ̂_{ij}(h)(γ̂_{ii}(0) γ̂_{jj}(0))^{−1/2}.

Theorem. If

X_t = μ + Σ_{k=−∞}^{+∞} C_k Z_{t−k},  {Z_t} ∼ IID(0, S),

then for all h, γ̂_{ij}(h) →^P γ_{ij}(h) and ρ̂_{ij}(h) →^P ρ_{ij}(h) as n → ∞.

24 novembre 2014 11 / 29


Multivariate time-series

An example: Southern Oscillation Index

Southern Oscillation Index (an environmental measure) compared to fish recruitment in the South Pacific (1950 to 1985).

[Figure: time-series plots of the Southern Oscillation Index (top) and Recruitment (bottom), 1950–1985.]

24 novembre 2014 12 / 29


Multivariate time-series

ACF of Southern Oscillation Index

[Figure: sample ACF/CCF matrix of the two series; panels: soi, soi & rec, rec & soi, rec.]

The bottom-left panel shows γ̂₁₂ at negative lags.

24 novembre 2014 13 / 29


Multivariate time-series

An example from Box and Jenkins

Sales (V2) with a leading indicator (V1).

[Figure: time-series plots of the leading indicator V1 (top) and sales V2 (bottom), about 150 observations.]

24 novembre 2014 14 / 29


Multivariate time-series

ACF of sales data

[Figure: sample ACF/CCF matrix of the sales data; panels: V1, V1 & V2, V2 & V1, V2.]

The data are not stationary.

24 novembre 2014 15 / 29


Multivariate time-series

Differenced sales data

[Figure: time-series plots of the differenced series, V1 (top) and V2 (bottom).]

24 novembre 2014 16 / 29


Multivariate time-series

ACF of differenced sales data

[Figure: sample ACF/CCF matrix of the differenced data; panels: V1, V1 & V2, V2 & V1, V2.]

The cross-correlation is relevant only at lags −2, −3.

24 novembre 2014 17 / 29


Multivariate time-series

Testing for independence of time-series: basis

Generally the asymptotic distribution of γ̂_{ij}(h) is complicated. But:

Theorem. Let

X_{t,1} = Σ_{j=−∞}^{∞} α_j Z_{t−j,1},  X_{t,2} = Σ_{j=−∞}^{∞} β_j Z_{t−j,2},

with {Z_{t,1}} ∼ WN(0, σ₁²), {Z_{t,2}} ∼ WN(0, σ₂²) and independent. Then

n V(γ̂₁₂(h)) → Σ_{j=−∞}^{∞} γ₁₁(j)γ₂₂(j) as n → ∞,

n^{1/2} ρ̂₁₂(h) ⟹ N(0, Σ_{j=−∞}^{∞} ρ₁₁(j)ρ₂₂(j)).

24 novembre 2014 18 / 29


Multivariate time-series

Testing for independence of time-series: an example

Suppose {X_{t,1}} and {X_{t,2}} are independent AR(1) processes with ρ_{ii}(h) = 0.8^{|h|}. Then the asymptotic variance of ρ̂₁₂(h) is

n^{−1} Σ_{h=−∞}^{∞} 0.64^{|h|} ≈ 4.556 n^{−1}.

Values of ρ̂₁₂(h) quite a bit larger than 1.96 n^{−1/2} should therefore be common even if the two series are independent. Instead, if one series is white noise, then V(ρ̂₁₂(h)) ≈ 1/n.

Hence, in testing for independence, it is often recommended to “prewhiten” one series.

24 novembre 2014 19 / 29
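The constant 4.556 is a geometric-series computation, Σ_h 0.64^{|h|} = (1 + 0.64)/(1 − 0.64); the short sketch below (my own illustration) verifies it and compares the naive white-noise band with the band appropriate for these AR(1) series:

```python
# For two independent AR(1) series with rho_ii(h) = 0.8^{|h|}, the
# asymptotic variance of rho_hat_12(h) is n^{-1} * sum_h 0.64^{|h|}
# = n^{-1} * (1 + 0.64)/(1 - 0.64) ≈ 4.556 / n.
s = sum(0.64 ** abs(h) for h in range(-200, 201))  # tail beyond |h|=200 is negligible
print(s)  # ≈ 4.556

n = 400
naive_halfwidth = 1.96 / n ** 0.5          # white-noise band +/- 1.96/sqrt(n)
correct_halfwidth = 1.96 * (s / n) ** 0.5  # band appropriate for these AR(1)s
print(correct_halfwidth / naive_halfwidth)  # the correct band is over twice as wide
```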


Multivariate time-series

‘Pre-whitening’ a time series

Instead of testing ρ̂₁₂(h) of the original series, one transforms them into white noise. If {X_{t,1}} and {X_{t,2}} are invertible ARMA processes, then

Z_{t,i} = Σ_{k=0}^{∞} π_k^{(i)} X_{t−k,i} ∼ WN(0, σ_i²), i = 1, 2,

where Σ_{k=0}^{∞} π_k^{(i)} z^k = π^{(i)}(z) = φ^{(i)}(z)/θ^{(i)}(z).

{X_{t,1}} and {X_{t,2}} are independent if and only if {Z_{t,1}} and {Z_{t,2}} are, hence one tests ρ_{Z₁,Z₂}(h).

As φ^{(i)}(z) and θ^{(i)}(z) are not known, one fits an ARMA model to each series and uses the residuals Ŵ_{t,i} in place of Z_{t,i}. It may be enough to do this for just one of the series.

24 novembre 2014 20 / 29
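A crude prewhitening sketch (my own simplification, not the slides' procedure in full): fit an AR(1) by Yule-Walker, φ̂ = γ̂(1)/γ̂(0), and use the residuals w_t = x_t − φ̂ x_{t−1} in place of the series. The residual lag-1 autocorrelation drops to near zero:

```python
import random

def acvf(x, h):
    """Sample autocovariance at lag h >= 0."""
    n = len(x)
    m = sum(x) / n
    return sum((x[t + h] - m) * (x[t] - m) for t in range(n - h)) / n

def prewhiten_ar1(x):
    """Fit AR(1) by Yule-Walker (phi_hat = gamma(1)/gamma(0)) and return
    the residuals w_t = x_t - phi_hat * x_{t-1}."""
    phi = acvf(x, 1) / acvf(x, 0)
    return [x[t] - phi * x[t - 1] for t in range(1, len(x))]

# Simulate an AR(1) with phi = 0.8 and check the residuals look like WN.
rng = random.Random(42)
x, prev = [], 0.0
for _ in range(2000):
    prev = 0.8 * prev + rng.gauss(0, 1)
    x.append(prev)

w = prewhiten_ar1(x)
rho1_x = acvf(x, 1) / acvf(x, 0)  # close to 0.8
rho1_w = acvf(w, 1) / acvf(w, 0)  # close to 0
print(rho1_x, rho1_w)
```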


Multivariate time-series

Simulated data

The 1st series is AR(1) with φ = 0.9; the 2nd series is AR(2) with φ₁ = 0.7, φ₂ = 0.27.

[Figure: time-series plots of the two simulated series, dat1 (top) and dat2 (bottom), n = 200.]

24 novembre 2014 21 / 29


Multivariate time-series

ACF of simulated data

[Figure: sample ACF/CCF matrix of the simulated series; panels: dat1, dat1 & dat2, dat2 & dat1, dat2.]

24 novembre 2014 22 / 29


Multivariate time-series

ACF of residuals

The correct model is fitted to both series by maximum likelihood.

[Figure: sample ACF/CCF matrix of the residuals; panels: fitunk1$res, fitunk1$res & fitunk2$res, fitunk2$res & fitunk1$res, fitunk2$res.]

A few cross-correlation coefficients may appear slightly significant.

24 novembre 2014 23 / 29


Multivariate time-series

Bartlett’s formula

More generally:

Theorem. If {X_t} is a bivariate Gaussian time series with Σ_{h=−∞}^{+∞} |γ_{ij}(h)| < ∞, then

lim_{n→∞} n Cov(ρ̂₁₂(h), ρ̂₁₂(k)) = Σ_{j=−∞}^{+∞} [ ρ₁₁(j)ρ₂₂(j + k − h) + ρ₁₂(j + k)ρ₂₁(j − h)
 − ρ₁₂(h)(ρ₁₁(j)ρ₁₂(j + k) + ρ₂₂(j)ρ₂₁(j − k))
 − ρ₁₂(k)(ρ₁₁(j)ρ₁₂(j + h) + ρ₂₂(j)ρ₂₁(j − h))
 + ρ₁₂(h)ρ₁₂(k)(½ρ₁₁²(j) + ρ₁₂²(j) + ½ρ₂₂²(j)) ].

24 novembre 2014 24 / 29


Multivariate time-series

Spectral density of multivariate series

If Σ_{h=−∞}^{+∞} |γ_{ij}(h)| < ∞, one can define

f(λ) = (1/2π) Σ_{h=−∞}^{∞} e^{−ihλ} Γ(h), λ ∈ [−π, π],

and one obtains

Γ(h) = ∫_{−π}^{π} e^{iλh} f(λ) dλ

and

X_t = ∫_{−π}^{π} e^{iλt} dZ(λ),

where the Z_i(·) are (complex-valued) processes with independent increments such that

∫_{λ₁}^{λ₂} f_{ij}(λ) dλ = E((Z_i(λ₂) − Z_i(λ₁)) (Z_j(λ₂) − Z_j(λ₁))*),

with * denoting complex conjugation.

24 novembre 2014 25 / 29


Multivariate time-series

Coherence of series

For a bivariate series the coherence at frequency λ is

X₁₂(λ) = f₁₂(λ)/[f₁₁(λ)f₂₂(λ)]^{1/2}

and represents the correlation between dZ₁(λ) and dZ₂(λ).

The squared coherency function |X₁₂(λ)|² satisfies 0 ≤ |X₁₂(λ)|² ≤ 1.

Remark. If X_{t,2} = Σ_{k=−∞}^{+∞} ψ_k X_{t−k,1}, then |X₁₂(λ)|² ≡ 1.

24 novembre 2014 26 / 29


Multivariate time-series

Periodogram

Define

J(ω_j) = n^{−1/2} Σ_{t=1}^{n} X_t e^{−itω_j}, ω_j = 2πj/n,

for j between −[(n − 1)/2] and [n/2].

Then I_n(ω_j) = J(ω_j) J*(ω_j), where * means transpose and complex conjugate.

I₁₂(ω_j) = (1/n) (Σ_{t=1}^{n} X_{t1} e^{−itω_j}) (Σ_{t=1}^{n} X_{t2} e^{itω_j})

is the cross-periodogram.

24 novembre 2014 27 / 29
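A univariate sanity check of this definition (my own illustration): by orthogonality of e^{−itω_j} over n consecutive Fourier frequencies (and j = 0, …, n − 1 covers the same frequencies modulo 2π as the slide's range), the periodogram ordinates sum to Σ_t x_t²:

```python
import cmath, math

def J(x, omega):
    """J(omega) = n^{-1/2} sum_{t=1}^{n} x_t e^{-i t omega}."""
    n = len(x)
    return sum(x[t] * cmath.exp(-1j * (t + 1) * omega) for t in range(n)) / math.sqrt(n)

def periodogram(x):
    """Periodogram ordinates I(omega_j) = |J(omega_j)|^2 at omega_j = 2*pi*j/n."""
    n = len(x)
    return [abs(J(x, 2 * math.pi * j / n)) ** 2 for j in range(n)]

x = [0.5, -1.2, 0.3, 2.0, -0.7]
I = periodogram(x)
# Orthogonality over the n Fourier frequencies gives sum_j I(omega_j) = sum_t x_t^2:
print(sum(I), sum(v * v for v in x))  # equal
```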


Multivariate time-series

Estimation of spectral density and coherence

Again, one estimates f(λ) by

f̂(λ) = (1/2π) Σ_{k=−m_n}^{m_n} W_n(k) I_n(g(n, λ) + 2πk/n).

If X_t = Σ_{k=−∞}^{+∞} C_k Z_{t−k}, {Z_t} ∼ IID(0, S), then

f̂_{ij}(λ) ∼ AN(f_{ij}(λ), f_{ij}(λ)² Σ_{k=−m_n}^{m_n} W_n(k)²), 0 < λ < π.

The natural estimator of |X₁₂(λ)|² is

χ̂₁₂²(λ) = |f̂₁₂(λ)|² / (f̂₁₁(λ) f̂₂₂(λ)).

24 novembre 2014 28 / 29


Multivariate time-series

An example of coherency estimation

[Figure: estimated squared coherency between SOI and recruitment, plotted against frequency (0 to 6).]

The horizontal line represents a (conservative) test of the assumption |X₁₂(λ)|² = 0. There is strong coherency at period 1 year and at periods longer than 3 years.

24 novembre 2014 29 / 29