Model choice. Akaike's criterion
(transcript of lecture slides, 24 November 2014)

Akaike criterion: Kullback-Leibler discrepancy

Given a family of probability densities {f(·; ψ), ψ ∈ Ψ}, the Kullback-Leibler index of f(·; ψ) relative to f(·; θ) is

Δ(ψ|θ) = E_θ(−2 log f(X; ψ)) = ∫_{R^n} −2 log(f(x; ψ)) f(x; θ) dx.

The Kullback-Leibler discrepancy between f(·; ψ) and f(·; θ) is

d(ψ|θ) = Δ(ψ|θ) − Δ(θ|θ) = ∫_{R^n} −2 log( f(x; ψ) / f(x; θ) ) f(x; θ) dx.

Jensen's inequality, E(log Y) ≤ log E(Y) for any positive random variable Y, gives

d(ψ|θ) ≥ −2 log( ∫_{R^n} ( f(x; ψ) / f(x; θ) ) f(x; θ) dx ) = −2 log(1) = 0,

with equality only if f(x; ψ) = f(x; θ) a.e. [f(·; θ)].
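As a quick numerical illustration (my own, not from the slides): for two univariate Gaussian densities the discrepancy d(ψ|θ) = 2 KL(f_θ ‖ f_ψ) has a closed form, which makes the nonnegativity easy to check.

```python
import math

def gauss_kl_discrepancy(mu_t, var_t, mu_p, var_p):
    """Kullback-Leibler discrepancy d(psi | theta) = 2 * KL(f_theta || f_psi)
    for univariate Gaussians: theta = (mu_t, var_t) is the true model,
    psi = (mu_p, var_p) the candidate (closed form)."""
    kl = math.log(var_p / var_t) / 2 + (var_t + (mu_t - mu_p) ** 2) / (2 * var_p) - 0.5
    return 2 * kl

# d(psi | theta) >= 0, with equality iff the two densities coincide
print(gauss_kl_discrepancy(0.0, 1.0, 0.0, 1.0))      # 0.0
print(gauss_kl_discrepancy(0.0, 1.0, 1.0, 2.0) > 0)  # True
```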
Model choice. Akaike's criterion

Approximating Kullback-Leibler discrepancy

Given observations X_1, …, X_n, we would like to minimize d(ψ|θ) among all candidate models ψ, given the true model θ. As the true model is unknown, we estimate d(ψ|θ). Let ψ = (φ, ϑ, σ²) be the parameters of an ARMA(p, q) model and ψ̂ = (φ̂, ϑ̂, σ̂²) the MLE based on X_1, …, X_n. Let Y be an independent realization of the same process. Then

−2 log L_Y(φ̂, ϑ̂, σ̂²) = n log(2π) + n log σ̂² + log(r_0 ⋯ r_{n−1}) + S_Y(φ̂, ϑ̂) / σ̂².

Indeed, recall that for an ARMA(p, q) process

L(φ, ϑ, σ²) = (2πσ²)^{−n/2} (r_0 ⋯ r_{n−1})^{−1/2} exp{ −S(φ, ϑ) / (2σ²) },   with   S(φ, ϑ) = Σ_{j=1}^{n} (x_j − x̂_j)² / r_{j−1}.

r_0, …, r_{n−1} depend only on the parameters (φ, ϑ) and not on the observed data; the data enter the likelihood only through the terms (x_j − x̂_j)² in S(φ, ϑ).
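The innovations form of −2 log L can be sanity-checked numerically. A minimal sketch (my own, not from the slides): in the degenerate white-noise case p = q = 0, the one-step predictors are 0 and r_j ≡ 1, so the formula must reduce to the ordinary i.i.d. N(0, σ²) log-likelihood.

```python
import math

def neg2_loglik_arma(x, x_pred, r, sigma2):
    """-2 log L in innovations form:
    n log(2*pi) + n log(sigma2) + sum_j log(r_j) + S / sigma2,
    with S = sum_j (x_j - xhat_j)^2 / r_{j-1}; r = [r_0, ..., r_{n-1}]."""
    n = len(x)
    s = sum((xj - xp) ** 2 / rj for xj, xp, rj in zip(x, x_pred, r))
    return (n * math.log(2 * math.pi) + n * math.log(sigma2)
            + sum(math.log(rj) for rj in r) + s / sigma2)

# White-noise case: predictors 0, r_j = 1, so -2 log L must match the
# direct N(0, sigma2) density evaluated at the data.
x = [0.3, -1.2, 0.7]
sigma2 = 2.0
direct = -2 * sum(math.log(math.exp(-xi ** 2 / (2 * sigma2))
                           / math.sqrt(2 * math.pi * sigma2)) for xi in x)
innov = neg2_loglik_arma(x, [0.0] * 3, [1.0] * 3, sigma2)
print(abs(direct - innov) < 1e-9)  # True
```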
Model choice. Akaike's criterion

Approximating Kullback-Leibler discrepancy

Continuing, since nσ̂² = S_X(φ̂, ϑ̂),

−2 log L_Y(φ̂, ϑ̂, σ̂²) = n log(2π) + n log σ̂² + log(r_0 ⋯ r_{n−1}) + S_Y(φ̂, ϑ̂) / σ̂²
                      = −2 log L_X(φ̂, ϑ̂, σ̂²) + S_Y(φ̂, ϑ̂) / σ̂² − S_X(φ̂, ϑ̂) / σ̂²
                      = −2 log L_X(φ̂, ϑ̂, σ̂²) + S_Y(φ̂, ϑ̂) / σ̂² − n,

and therefore

E_θ(Δ(ψ̂|θ)) = E_{(φ,ϑ,σ²)}(−2 log L_X(φ̂, ϑ̂, σ̂²)) + E_{(φ,ϑ,σ²)}( S_Y(φ̂, ϑ̂) / σ̂² ) − n.
Model choice. Akaike's criterion

Kullback-Leibler discrepancy and AICC

Using linear approximations and the asymptotic distributions of the estimators, one arrives at

E_{(φ,ϑ,σ²)}( S_Y(φ̂, ϑ̂) ) ≈ σ²(n + p + q).

Similarly, nσ̂² = S_X(φ̂, ϑ̂) is, for large n, distributed as σ² χ²(n − p − q − 2) and is asymptotically independent of (φ̂, ϑ̂). Hence

E_{(φ,ϑ,σ²)}( S_Y(φ̂, ϑ̂) / σ̂² ) ≈ σ²(n + p + q) / ( σ²(n − p − q − 2)/n ) = n(n + p + q) / (n − p − q − 2).

From E_θ(Δ(ψ̂|θ)) = E_{(φ,ϑ,σ²)}(−2 log L_X(φ̂, ϑ̂, σ̂²)) + E_{(φ,ϑ,σ²)}( S_Y(φ̂, ϑ̂) / σ̂² ) − n, it follows that

AICC = −2 log L_X(φ̂, ϑ̂, σ̂²) + 2(p + q + 1) n / (n − p − q − 2)

is an approximately unbiased estimate of E_θ(Δ(ψ̂|θ)).
Model choice. Akaike's criterion

Criteria for model choice

The order is chosen by minimizing the value of AICC (corrected Akaike Information Criterion):

−2 log L_X(φ̂, ϑ̂, σ̂²) + 2(p + q + 1) n / (n − p − q − 2).

The second term can be considered a penalty for models with a large number of parameters. For large n it is approximately the same as Akaike's Information Criterion (AIC), −2 log L_X(φ̂, ϑ̂, σ̂²) + 2(p + q + 1), but it carries a higher penalty for finite n and is thus somewhat less likely to overfit.

In R: AICC <- AIC(myfit, k = 2*n/(n-p-q-2))

A rule of thumb is that the fits of model 1 and model 2 are not significantly different if |AICC_1 − AICC_2| < 2 (only the difference matters, not the absolute value of AICC). Hence we may decide to choose model 1 if it is simpler than model 2 (or its residuals are closer to white noise) even if AICC_1 > AICC_2, as long as AICC_1 < AICC_2 + 2.
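The relation between the two criteria is easy to see numerically. A small self-contained sketch (the −2 log L value is made up for illustration), showing that the AICC penalty exceeds the AIC penalty for finite n and that the two converge as n grows:

```python
def aic(neg2loglik, p, q):
    """Akaike's Information Criterion for an ARMA(p, q) fit."""
    return neg2loglik + 2 * (p + q + 1)

def aicc(neg2loglik, p, q, n):
    """Corrected criterion: same fit term, penalty 2(p+q+1)n / (n-p-q-2)."""
    return neg2loglik + 2 * (p + q + 1) * n / (n - p - q - 2)

# Hypothetical fit: -2 log L = 100 for an ARMA(2,1) model
print(aic(100, 2, 1))                     # 108
print(round(aicc(100, 2, 1, 50), 3))      # 108.889  (heavier penalty, small n)
print(round(aicc(100, 2, 1, 5000), 3))    # 108.008  (approaches AIC as n grows)
```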
Model choice. Akaike's criterion

Tests on residuals

Let X̂_t(φ̂, ϑ̂) be the predicted value of X_t given the estimates (φ̂, ϑ̂), and

W_t = ( X_t − X̂_t(φ̂, ϑ̂) ) / ( r_{t−1}(φ̂, ϑ̂) )^{1/2}

the standardized residuals. Possible checks include:

Portmanteau tests on the ACF of W_t: Box-Pierce; Ljung-Box;
Test on turning points;
Rank tests;
…
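As one concrete example, the turning-point test takes only a few lines. Under the white-noise hypothesis the expected number of turning points in n observations is 2(n − 2)/3 with variance (16n − 29)/90 (standard results, not derived on the slides); the sketch below counts turning points and forms the usual normal test statistic.

```python
import math

def turning_point_stat(w):
    """Count turning points of the residual series w and return (count,
    standardized statistic); the latter is approx. N(0,1) under white noise."""
    n = len(w)
    t = sum(1 for i in range(1, n - 1)
            if (w[i] > w[i - 1] and w[i] > w[i + 1])
            or (w[i] < w[i - 1] and w[i] < w[i + 1]))
    mean = 2 * (n - 2) / 3
    var = (16 * n - 29) / 90
    return t, (t - mean) / math.sqrt(var)

# Hand-checkable example: 1,3,2,4,1 turns at 3 (peak), 2 (trough), 4 (peak)
t, z = turning_point_stat([1, 3, 2, 4, 1])
print(t)  # 3
```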
Multivariate time-series

Autocovariance

A multivariate stochastic process {X_t ∈ R^m}, t ∈ Z, is weakly stationary if

E(X²_{t,i}) < ∞ ∀ t, i,   E(X_t) ≡ μ,   Cov(X_{t+h}, X_t) ≡ Γ(h).

In particular γ_ij(h) = Cov(X_{t+h,i}, X_{t,j}) = E((X_{t+h,i} − μ_i)(X_{t,j} − μ_j)).

Note that in general γ_ij(h) ≠ γ_ji(h), while

γ_ij(h) = Cov(X_{t+h,i}, X_{t,j}) = (stationarity) = Cov(X_{t,i}, X_{t−h,j}) = (symmetry) = Cov(X_{t−h,j}, X_{t,i}) = γ_ji(−h).

Another simple property is |γ_ij(h)| ≤ (γ_ii(0) γ_jj(0))^{1/2}.

The ACF is ρ_ij(h) = γ_ij(h) / (γ_ii(0) γ_jj(0))^{1/2}.
Multivariate time-series

Multivariate white noise and MA

A multivariate stochastic process {Z_t ∈ R^m} is a white noise with covariance S, written {Z_t} ∼ WN(0, S), if {Z_t} is stationary with mean 0 and ACVF

Γ(h) = S for h = 0,   Γ(h) = 0 for h ≠ 0.

{X_t ∈ R^m} is a linear process if

X_t = Σ_{k=−∞}^{+∞} C_k Z_{t−k},   {Z_t} ∼ WN(0, S),

where the C_k are matrices such that Σ_{k=−∞}^{+∞} |(C_k)_ij| < +∞ for all i, j = 1, …, m.

{X_t} is then stationary, with Γ_X(h) = Σ_{k=−∞}^{∞} C_{k+h} S C_k^t.
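As a check on the formula Γ_X(h) = Σ_k C_{k+h} S C_k^t, here is a minimal sketch (my own example) for a bivariate MA(1), X_t = Z_t + C Z_{t−1}, where the sum has only two nonzero terms: Γ(0) = S + C S C^t and Γ(1) = C S.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def madd(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

# Bivariate MA(1): coefficients C_0 = I, C_1 = C; noise covariance S
I = [[1.0, 0.0], [0.0, 1.0]]
C = [[0.5, 0.2], [0.0, 0.3]]
S = [[1.0, 0.4], [0.4, 2.0]]

# Gamma(0) = C_0 S C_0^t + C_1 S C_1^t = S + C S C^t
gamma0 = madd(matmul(I, matmul(S, transpose(I))),
              matmul(C, matmul(S, transpose(C))))
# Gamma(1) = C_1 S C_0^t = C S  (only k = 0 survives in the sum)
gamma1 = matmul(C, matmul(S, transpose(I)))
print(gamma1)
```

Γ(0) comes out symmetric, as it must, while Γ(1) in general does not.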
Multivariate time-series

Estimation of mean

The mean μ can be estimated through X̄_n. From the univariate theory, we know

E(X̄_n) = μ,   V((X̄_n)_i) → 0 (as n → ∞) if γ_ii(h) → 0 as h → ∞,

n V((X̄_n)_i) → Σ_{h=−∞}^{+∞} γ_ii(h)   if Σ_{h=−∞}^{+∞} |γ_ii(h)| < +∞.

Moreover (X̄_n)_i is asymptotically normal. Stronger assumptions are required for the vector X̄_n to be asymptotically normal.

Theorem. If

X_t = μ + Σ_{k=−∞}^{+∞} C_k Z_{t−k},   {Z_t} ∼ WN(0, S),

then n^{1/2}(X̄_n − μ) ⇒ N( 0, Σ_{h=−∞}^{+∞} Γ(h) ), where Γ(h) = Σ_{k=−∞}^{∞} C_{k+h} S C_k^t.
Multivariate time-series

Confidence intervals for the mean

In principle, from X̄_n ∼ N( μ, (1/n) Σ_{h=−∞}^{∞} Γ(h) ) one could build an m-dimensional confidence ellipsoid. But this is not intuitive, and C_k and S are not known and have to be estimated…

Instead, build confidence intervals from (X̄_n)_i ∼ N( μ_i, (1/n) Σ_{h=−∞}^{+∞} γ_ii(h) ).

Σ_{h=−∞}^{+∞} γ_ii(h) = 2π f_i(0) can be consistently estimated by

2π f̂_i(0) = Σ_{h=−r}^{r} (1 − |h|/r) γ̂_ii(h),   where r = r_n → ∞ and r_n/n → 0.

Componentwise confidence intervals can be combined. If we find u_i(α) such that P(|μ_i − (X̄_n)_i| < u_i(α)) ≥ 1 − α, then

P( |μ_i − (X̄_n)_i| < u_i(α), i = 1, …, m ) ≥ 1 − Σ_{i=1}^{m} P( |μ_i − (X̄_n)_i| ≥ u_i(α) ) ≥ 1 − mα.

Choosing α = 0.05/m, one has a 95%-confidence m-rectangle.
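The lag-window estimate of 2π f_i(0) above can be sketched directly (illustrative data, my own; the triangular weights are the ones on the slide):

```python
def mean_acvf(x, h):
    """Sample autocovariance at lag h >= 0 (denominator n, as on the slides)."""
    n = len(x)
    xbar = sum(x) / n
    return sum((x[t + h] - xbar) * (x[t] - xbar) for t in range(n - h)) / n

def two_pi_f0(x, r):
    """Lag-window estimate of 2*pi*f(0) = sum_h gamma(h), weights (1 - |h|/r)."""
    est = mean_acvf(x, 0)
    for h in range(1, r):
        est += 2 * (1 - h / r) * mean_acvf(x, h)  # gamma(h) = gamma(-h) in 1-d
    return est

x = [0.5, -0.3, 0.9, 0.1, -0.7, 0.4, -0.2, 0.6]
# With r = 1 all weighted lags vanish, so the estimate is just gamma_hat(0)
print(two_pi_f0(x, 1) == mean_acvf(x, 0))  # True
```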
Multivariate time-series

Estimation of ACVF (bivariate case, m = 2)

Γ̂(h) = (1/n) Σ_{t=1}^{n−h} (X_{t+h} − X̄_n)(X_t − X̄_n)^t   for 0 ≤ h < n,
Γ̂(h) = (1/n) Σ_{t=−h+1}^{n} (X_{t+h} − X̄_n)(X_t − X̄_n)^t   for −n < h < 0.

ρ̂_ij(h) = γ̂_ij(h) (γ̂_ii(0) γ̂_jj(0))^{−1/2}.

Theorem. If

X_t = μ + Σ_{k=−∞}^{+∞} C_k Z_{t−k},   {Z_t} ∼ IID(0, S),

then for all h, γ̂_ij(h) → γ_ij(h) and ρ̂_ij(h) → ρ_ij(h) in probability, as n → ∞.
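A direct sketch of the estimator above for m = 2, using plain lists (my own toy data). The identity Γ̂(−h) = Γ̂(h)^t, mirroring γ_ij(h) = γ_ji(−h), holds exactly for the sample version and makes a convenient sanity check.

```python
def sample_gamma(x, h):
    """Sample cross-covariance matrix Gamma_hat(h) for a bivariate series x,
    given as a list of (x1, x2) pairs; denominator n, any -n < h < n."""
    n = len(x)
    m = [sum(col) / n for col in zip(*x)]
    rng = range(0, n - h) if h >= 0 else range(-h, n)
    g = [[0.0, 0.0], [0.0, 0.0]]
    for t in rng:
        for i in range(2):
            for j in range(2):
                g[i][j] += (x[t + h][i] - m[i]) * (x[t][j] - m[j]) / n
    return g

x = [(1.0, 2.0), (0.5, -1.0), (2.0, 0.0), (-1.0, 1.5), (0.0, -0.5)]
g1 = sample_gamma(x, 1)
gm1 = sample_gamma(x, -1)
# Gamma_hat(-h) equals the transpose of Gamma_hat(h)
print(all(abs(g1[i][j] - gm1[j][i]) < 1e-12
          for i in range(2) for j in range(2)))  # True
```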
Multivariate time-series

An example: Southern Oscillation Index

Southern Oscillation Index (an environmental measure) compared to fish recruitment in the South Pacific (1950 to 1985).

[Figure: time-series plots of the Southern Oscillation Index and of Recruitment, 1950-1985.]
Multivariate time-series

ACF of Southern Oscillation Index

[Figure: sample ACF and cross-correlations of the soi and rec series (panels: soi, soi & rec, rec & soi, rec).] The bottom-left panel shows γ̂_12 at negative lags.
Multivariate time-series

An example from Box and Jenkins

Sales (V2) with a leading indicator (V1).

[Figure: time-series plots of V1 and V2 from the sales data.]
Multivariate time-series

ACF of sales data

[Figure: sample ACF and cross-correlations of V1 and V2.] The data are not stationary.
Multivariate time-series

Differenced sales data

[Figure: time-series plots of the differenced series V1 and V2.]
Multivariate time-series

ACF of differenced sales data

[Figure: sample ACF and cross-correlations of the differenced V1 and V2.] The cross-correlation is relevant only at lags −2 and −3.
Multivariate time-series

Testing for independence of time-series: basis

In general the asymptotic distribution of γ̂_ij(h) is complicated. But:

Theorem. Let

X_{t,1} = Σ_{j=−∞}^{∞} α_j Z_{t−j,1},   X_{t,2} = Σ_{j=−∞}^{∞} β_j Z_{t−j,2},

with {Z_{t,1}} ∼ WN(0, σ₁²), {Z_{t,2}} ∼ WN(0, σ₂²) and independent of each other. Then, as n → ∞,

n V(γ̂_12(h)) → Σ_{j=−∞}^{∞} γ_11(j) γ_22(j),

n^{1/2} ρ̂_12(h) ⇒ N( 0, Σ_{j=−∞}^{∞} ρ_11(j) ρ_22(j) ).
Multivariate time-series

Testing for independence of time-series: an example

Suppose {X_{t,1}} and {X_{t,2}} are independent AR(1) processes with ρ_ii(h) = 0.8^{|h|}. Then the asymptotic variance of ρ̂_12(h) is

n^{−1} Σ_{h=−∞}^{∞} 0.64^{|h|} ≈ 4.556 n^{−1}.

Values of ρ̂_12(h) quite a bit larger than 1.96 n^{−1/2} should therefore be common even if the two series are independent. Instead, if one series is white noise, then V(ρ̂_12(h)) ≈ 1/n.

Hence, in testing for independence, it is often recommended to "prewhiten" one series.
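The constant 4.556 is just a two-sided geometric sum, Σ_h 0.64^{|h|} = (1 + 0.64)/(1 − 0.64); a quick check:

```python
# Two-sided geometric sum appearing in the asymptotic variance:
# sum over all integer h of 0.64^|h| = (1 + 0.64) / (1 - 0.64)
closed_form = (1 + 0.64) / (1 - 0.64)
partial = sum(0.64 ** abs(h) for h in range(-200, 201))  # truncated sum
print(round(closed_form, 3))                 # 4.556
print(abs(closed_form - partial) < 1e-9)     # True
```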
Multivariate time-series

'Pre-whitening' a time series

Instead of testing ρ̂_12(h) of the original series, one transforms them into white noise. If {X_{t,1}} and {X_{t,2}} are invertible ARMA processes, then

Z_{t,i} = Σ_{k=0}^{∞} π_k^{(i)} X_{t−k,i} ∼ WN(0, σ_i²),   i = 1, 2,

where Σ_{k=0}^{∞} π_k^{(i)} z^k = π^{(i)}(z) = φ^{(i)}(z) / θ^{(i)}(z).

{X_{t,1}} and {X_{t,2}} are independent if and only if {Z_{t,1}} and {Z_{t,2}} are, hence one tests ρ̂_{Z₁,Z₂}(h).

As φ^{(i)}(z) and θ^{(i)}(z) are not known, one fits an ARMA model to each series and uses the residuals Ŵ_{t,i} in place of Z_{t,i}. It may be enough to do this for just one series.
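For a pure AR(1) the prewhitening filter is simply Z_t = X_t − φ X_{t−1} (π₀ = 1, π₁ = −φ). A minimal sketch with a synthetic series (in practice φ is replaced by an estimate and one works with the fitted residuals), showing that applying the AR polynomial recovers the driving noise:

```python
import random

random.seed(0)
phi = 0.9
# Simulate an AR(1): X_t = phi * X_{t-1} + Z_t
z = [random.gauss(0, 1) for _ in range(200)]
x = []
prev = 0.0
for zt in z:
    prev = phi * prev + zt
    x.append(prev)

# Prewhiten with the (here known) AR polynomial: Z_t = X_t - phi * X_{t-1}
recovered = [x[t] - phi * x[t - 1] for t in range(1, len(x))]
print(all(abs(r - zt) < 1e-12 for r, zt in zip(recovered, z[1:])))  # True
```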
Multivariate time-series

Simulated data

The 1st series is AR(1) with φ = 0.9; the 2nd series is AR(2) with φ₁ = 0.7, φ₂ = 0.27.

[Figure: time-series plots of the two simulated series, dat1 and dat2.]
Multivariate time-series

ACF of simulated data

[Figure: sample ACF and cross-correlations of dat1 and dat2 (panels: dat1, dat1 & dat2, dat2 & dat1, dat2).]
Multivariate time-series

ACF of residuals

MLE fits the correct model to both series.

[Figure: sample ACF and cross-correlations of the residuals fitunk1$res and fitunk2$res.] A few cross-correlation coefficients may appear slightly significant.
Multivariate time-series

Bartlett's formula

More generally:

Theorem. If {X_t} is a bivariate Gaussian time series with Σ_{h=−∞}^{+∞} |γ_ij(h)| < ∞, then

lim_{n→∞} n Cov( ρ̂_12(h), ρ̂_12(k) ) = Σ_{j=−∞}^{+∞} [ ρ_11(j) ρ_22(j + k − h) + ρ_12(j + k) ρ_21(j − h)
    − ρ_12(h) ( ρ_11(j) ρ_12(j + k) + ρ_22(j) ρ_21(j − k) )
    − ρ_12(k) ( ρ_11(j) ρ_12(j + h) + ρ_22(j) ρ_21(j − h) )
    + ρ_12(h) ρ_12(k) ( ½ ρ²_11(j) + ρ²_12(j) + ½ ρ²_22(j) ) ].
Multivariate time-series

Spectral density of multivariate series

If Σ_{h=−∞}^{+∞} |γ_ij(h)| < ∞, one can define

f(λ) = (1/2π) Σ_{h=−∞}^{∞} e^{−ihλ} Γ(h),   λ ∈ [−π, π],

and one obtains

Γ(h) = ∫_{−π}^{π} e^{iλh} f(λ) dλ

and

X_t = ∫_{−π}^{π} e^{iλt} dZ(λ),

where the Z_i(·) are (complex) processes with independent increments such that

∫_{λ₁}^{λ₂} f_ij(λ) dλ = E( (Z_i(λ₂) − Z_i(λ₁)) (Z_j(λ₂) − Z_j(λ₁))* ).
Multivariate time-series

Coherence of series

For a bivariate series the coherence at frequency λ is

X_12(λ) = f_12(λ) / [ f_11(λ) f_22(λ) ]^{1/2}

and represents the correlation between dZ₁(λ) and dZ₂(λ).

The squared coherency function |X_12(λ)|² satisfies 0 ≤ |X_12(λ)|² ≤ 1.

Remark. If X_{t,2} = Σ_{k=−∞}^{+∞} ψ_k X_{t−k,1}, then |X_12(λ)|² ≡ 1.
Multivariate time-series

Periodogram

Define

J(ω_j) = n^{−1/2} Σ_{t=1}^{n} X_t e^{−itω_j},   ω_j = 2πj/n,

for j between −[(n − 1)/2] and [n/2]. Then I_n(ω_j) = J(ω_j) J*(ω_j), where * denotes transpose and complex conjugate.

I_12(ω_j) = (1/n) ( Σ_{t=1}^{n} X_{t,1} e^{−itω_j} ) ( Σ_{t=1}^{n} X_{t,2} e^{itω_j} )

is the cross-periodogram.
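A minimal sketch of these definitions (illustrative values, my own), checking that the auto-periodogram I_11 = J₁ J₁* is real and equals |J₁|²:

```python
import cmath
import math

def dft_coeff(x, omega):
    """J(omega) = n^{-1/2} * sum_t x_t * exp(-i t omega), t = 1..n."""
    n = len(x)
    return sum(xt * cmath.exp(-1j * (t + 1) * omega)
               for t, xt in enumerate(x)) / n ** 0.5

x1 = [1.0, -0.5, 0.3, 0.8]
x2 = [0.2, 0.1, -0.4, 0.6]
n = len(x1)
omega = 2 * math.pi / n  # omega_1 = 2*pi*1/n

j1, j2 = dft_coeff(x1, omega), dft_coeff(x2, omega)
i11 = j1 * j1.conjugate()   # auto-periodogram of series 1 at omega_1
i12 = j1 * j2.conjugate()   # cross-periodogram I_12(omega_1)
print(abs(i11.imag) < 1e-12, abs(i11.real - abs(j1) ** 2) < 1e-12)  # True True
```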
Multivariate time-series

Estimation of spectral density and coherence

Again, one estimates f(λ) by

f̂(λ) = (1/2π) Σ_{k=−m_n}^{m_n} W_n(k) I_n( g(n, λ) + 2πk/n ).

If X_t = Σ_{k=−∞}^{+∞} C_k Z_{t−k}, {Z_t} ∼ IID(0, S), then

f̂_ij(λ) ∼ AN( f_ij(λ), f_ij(λ) Σ_{k=−m_n}^{m_n} W²_n(k) ),   0 < λ < π.

The natural estimator of |X_12(λ)|² is

χ̂²_12(λ) = |f̂_12(λ)|² / ( f̂_11(λ) f̂_22(λ) ).
Multivariate time-series

An example of coherency estimation

[Figure: estimated squared coherency between SOI and recruitment, as a function of frequency. The horizontal line represents a (conservative) test of the hypothesis |X_12(λ)|² = 0.]

There is strong coherency at a period of 1 year and at periods longer than 3 years.