Model choice. Akaike's criterion
(transcript of lecture slides, 24 November 2014)

Akaike criterion: Kullback-Leibler discrepancy

Given a family of probability densities {f(·; ψ), ψ ∈ Ψ}, the Kullback-Leibler index of f(·; ψ) relative to f(·; θ) is

Δ(ψ|θ) = E_θ(−2 log f(X; ψ)) = ∫_{R^n} −2 log(f(x; ψ)) f(x; θ) dx.

The Kullback-Leibler discrepancy between f(·; ψ) and f(·; θ) is

d(ψ|θ) = Δ(ψ|θ) − Δ(θ|θ) = ∫_{R^n} −2 log( f(x; ψ) / f(x; θ) ) f(x; θ) dx.

Jensen's inequality, E(log Y) ≤ log E(Y) for any positive random variable Y, gives

d(ψ|θ) ≥ −2 log( ∫_{R^n} ( f(x; ψ) / f(x; θ) ) f(x; θ) dx ) = −2 log(1) = 0,

with equality only if f(x; ψ) = f(x; θ) a.e. [f(·; θ)].
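As a quick numerical illustration (my own, not from the slides): for two univariate Gaussian densities the discrepancy d(ψ|θ) = 2 KL(f_θ ‖ f_ψ) has a closed form, which makes the nonnegativity easy to check.

```python
import math

def gauss_kl_discrepancy(mu_t, var_t, mu_p, var_p):
    """Kullback-Leibler discrepancy d(psi | theta) = 2 * KL(f_theta || f_psi)
    for univariate Gaussians: theta = (mu_t, var_t) is the true model,
    psi = (mu_p, var_p) the candidate (closed form)."""
    kl = math.log(var_p / var_t) / 2 + (var_t + (mu_t - mu_p) ** 2) / (2 * var_p) - 0.5
    return 2 * kl

# d(psi | theta) >= 0, with equality iff the two densities coincide
print(gauss_kl_discrepancy(0.0, 1.0, 0.0, 1.0))      # 0.0
print(gauss_kl_discrepancy(0.0, 1.0, 1.0, 2.0) > 0)  # True
```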
Model choice. Akaike's criterion

Approximating Kullback-Leibler discrepancy

Given observations X_1, …, X_n, we would like to minimize d(ψ|θ) among all candidate models ψ, given the true model θ. As the true model is unknown, we estimate d(ψ|θ). Let ψ = (φ, ϑ, σ²) be the parameters of an ARMA(p, q) model and ψ̂ = (φ̂, ϑ̂, σ̂²) the MLE based on X_1, …, X_n. Let Y be an independent realization of the same process. Then

−2 log L_Y(φ̂, ϑ̂, σ̂²) = n log(2π) + n log σ̂² + log(r_0 ⋯ r_{n−1}) + S_Y(φ̂, ϑ̂) / σ̂².

Indeed, recall that for an ARMA(p, q) process

L(φ, ϑ, σ²) = (2πσ²)^{−n/2} (r_0 ⋯ r_{n−1})^{−1/2} exp{ −S(φ, ϑ) / (2σ²) },   with   S(φ, ϑ) = Σ_{j=1}^{n} (x_j − x̂_j)² / r_{j−1}.

r_0, …, r_{n−1} depend only on the parameters (φ, ϑ) and not on the observed data; the data enter the likelihood only through the terms (x_j − x̂_j)² in S(φ, ϑ).
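The innovations form of −2 log L can be sanity-checked numerically. A minimal sketch (my own, not from the slides): in the degenerate white-noise case p = q = 0, the one-step predictors are 0 and r_j ≡ 1, so the formula must reduce to the ordinary i.i.d. N(0, σ²) log-likelihood.

```python
import math

def neg2_loglik_arma(x, x_pred, r, sigma2):
    """-2 log L in innovations form:
    n log(2*pi) + n log(sigma2) + sum_j log(r_j) + S / sigma2,
    with S = sum_j (x_j - xhat_j)^2 / r_{j-1}; r = [r_0, ..., r_{n-1}]."""
    n = len(x)
    s = sum((xj - xp) ** 2 / rj for xj, xp, rj in zip(x, x_pred, r))
    return (n * math.log(2 * math.pi) + n * math.log(sigma2)
            + sum(math.log(rj) for rj in r) + s / sigma2)

# White-noise case: predictors 0, r_j = 1, so -2 log L must match the
# direct N(0, sigma2) density evaluated at the data.
x = [0.3, -1.2, 0.7]
sigma2 = 2.0
direct = -2 * sum(math.log(math.exp(-xi ** 2 / (2 * sigma2))
                           / math.sqrt(2 * math.pi * sigma2)) for xi in x)
innov = neg2_loglik_arma(x, [0.0] * 3, [1.0] * 3, sigma2)
print(abs(direct - innov) < 1e-9)  # True
```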
Model choice. Akaike's criterion

Approximating Kullback-Leibler discrepancy

Continuing, since nσ̂² = S_X(φ̂, ϑ̂),

−2 log L_Y(φ̂, ϑ̂, σ̂²) = n log(2π) + n log σ̂² + log(r_0 ⋯ r_{n−1}) + S_Y(φ̂, ϑ̂) / σ̂²
                      = −2 log L_X(φ̂, ϑ̂, σ̂²) + S_Y(φ̂, ϑ̂) / σ̂² − S_X(φ̂, ϑ̂) / σ̂²
                      = −2 log L_X(φ̂, ϑ̂, σ̂²) + S_Y(φ̂, ϑ̂) / σ̂² − n,

and therefore

E_θ(Δ(ψ̂|θ)) = E_{(φ,ϑ,σ²)}(−2 log L_X(φ̂, ϑ̂, σ̂²)) + E_{(φ,ϑ,σ²)}( S_Y(φ̂, ϑ̂) / σ̂² ) − n.
Model choice. Akaike's criterion

Kullback-Leibler discrepancy and AICC

Using linear approximations and the asymptotic distributions of the estimators, one arrives at

E_{(φ,ϑ,σ²)}( S_Y(φ̂, ϑ̂) ) ≈ σ²(n + p + q).

Similarly, nσ̂² = S_X(φ̂, ϑ̂) is, for large n, distributed as σ² χ²(n − p − q − 2) and is asymptotically independent of (φ̂, ϑ̂). Hence

E_{(φ,ϑ,σ²)}( S_Y(φ̂, ϑ̂) / σ̂² ) ≈ σ²(n + p + q) / ( σ²(n − p − q − 2)/n ) = n(n + p + q) / (n − p − q − 2).

From E_θ(Δ(ψ̂|θ)) = E_{(φ,ϑ,σ²)}(−2 log L_X(φ̂, ϑ̂, σ̂²)) + E_{(φ,ϑ,σ²)}( S_Y(φ̂, ϑ̂) / σ̂² ) − n, it follows that

AICC = −2 log L_X(φ̂, ϑ̂, σ̂²) + 2(p + q + 1) n / (n − p − q − 2)

is an approximately unbiased estimate of E_θ(Δ(ψ̂|θ)).
Model choice. Akaike's criterion

Criteria for model choice

The order is chosen by minimizing the value of AICC (corrected Akaike Information Criterion):

−2 log L_X(φ̂, ϑ̂, σ̂²) + 2(p + q + 1) n / (n − p − q − 2).

The second term can be considered a penalty for models with a large number of parameters. For large n it is approximately the same as Akaike's Information Criterion (AIC), −2 log L_X(φ̂, ϑ̂, σ̂²) + 2(p + q + 1), but it carries a higher penalty for finite n and is thus somewhat less likely to overfit.

In R: AICC <- AIC(myfit, k = 2*n/(n-p-q-2))

A rule of thumb is that the fits of model 1 and model 2 are not significantly different if |AICC_1 − AICC_2| < 2 (only the difference matters, not the absolute value of AICC). Hence we may decide to choose model 1 if it is simpler than model 2 (or its residuals are closer to white noise) even if AICC_1 > AICC_2, as long as AICC_1 < AICC_2 + 2.
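The relation between the two criteria is easy to see numerically. A small self-contained sketch (the −2 log L value is made up for illustration), showing that the AICC penalty exceeds the AIC penalty for finite n and that the two converge as n grows:

```python
def aic(neg2loglik, p, q):
    """Akaike's Information Criterion for an ARMA(p, q) fit."""
    return neg2loglik + 2 * (p + q + 1)

def aicc(neg2loglik, p, q, n):
    """Corrected criterion: same fit term, penalty 2(p+q+1)n / (n-p-q-2)."""
    return neg2loglik + 2 * (p + q + 1) * n / (n - p - q - 2)

# Hypothetical fit: -2 log L = 100 for an ARMA(2,1) model
print(aic(100, 2, 1))                     # 108
print(round(aicc(100, 2, 1, 50), 3))      # 108.889  (heavier penalty, small n)
print(round(aicc(100, 2, 1, 5000), 3))    # 108.008  (approaches AIC as n grows)
```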
Model choice. Akaike's criterion

Tests on residuals

Let X̂_t(φ̂, ϑ̂) be the predicted value of X_t given the estimates (φ̂, ϑ̂), and

W_t = ( X_t − X̂_t(φ̂, ϑ̂) ) / ( r_{t−1}(φ̂, ϑ̂) )^{1/2}

the standardized residuals. Possible checks include:

Portmanteau tests on the ACF of W_t: Box-Pierce; Ljung-Box;
Test on turning points;
Rank tests;
…
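As one concrete example, the turning-point test takes only a few lines. Under the white-noise hypothesis the expected number of turning points in n observations is 2(n − 2)/3 with variance (16n − 29)/90 (standard results, not derived on the slides); the sketch below counts turning points and forms the usual normal test statistic.

```python
import math

def turning_point_stat(w):
    """Count turning points of the residual series w and return (count,
    standardized statistic); the latter is approx. N(0,1) under white noise."""
    n = len(w)
    t = sum(1 for i in range(1, n - 1)
            if (w[i] > w[i - 1] and w[i] > w[i + 1])
            or (w[i] < w[i - 1] and w[i] < w[i + 1]))
    mean = 2 * (n - 2) / 3
    var = (16 * n - 29) / 90
    return t, (t - mean) / math.sqrt(var)

# Hand-checkable example: 1,3,2,4,1 turns at 3 (peak), 2 (trough), 4 (peak)
t, z = turning_point_stat([1, 3, 2, 4, 1])
print(t)  # 3
```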
Multivariate time-series

Autocovariance

A multivariate stochastic process {X_t ∈ R^m}, t ∈ Z, is weakly stationary if

E(X²_{t,i}) < ∞ ∀ t, i,   E(X_t) ≡ μ,   Cov(X_{t+h}, X_t) ≡ Γ(h).

In particular γ_ij(h) = Cov(X_{t+h,i}, X_{t,j}) = E((X_{t+h,i} − μ_i)(X_{t,j} − μ_j)).

Note that in general γ_ij(h) ≠ γ_ji(h), while

γ_ij(h) = Cov(X_{t+h,i}, X_{t,j}) = (stationarity) = Cov(X_{t,i}, X_{t−h,j}) = (symmetry) = Cov(X_{t−h,j}, X_{t,i}) = γ_ji(−h).

Another simple property is |γ_ij(h)| ≤ (γ_ii(0) γ_jj(0))^{1/2}.

The ACF is ρ_ij(h) = γ_ij(h) / (γ_ii(0) γ_jj(0))^{1/2}.
Multivariate time-series

Multivariate white noise and MA

A multivariate stochastic process {Z_t ∈ R^m} is a white noise with covariance S, written {Z_t} ∼ WN(0, S), if {Z_t} is stationary with mean 0 and ACVF

Γ(h) = S for h = 0,   Γ(h) = 0 for h ≠ 0.

{X_t ∈ R^m} is a linear process if

X_t = Σ_{k=−∞}^{+∞} C_k Z_{t−k},   {Z_t} ∼ WN(0, S),

where the C_k are matrices such that Σ_{k=−∞}^{+∞} |(C_k)_ij| < +∞ for all i, j = 1, …, m.

{X_t} is then stationary, with Γ_X(h) = Σ_{k=−∞}^{∞} C_{k+h} S C_k^t.
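As a check on the formula Γ_X(h) = Σ_k C_{k+h} S C_k^t, here is a minimal sketch (my own example) for a bivariate MA(1), X_t = Z_t + C Z_{t−1}, where the sum has only two nonzero terms: Γ(0) = S + C S C^t and Γ(1) = C S.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def madd(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

# Bivariate MA(1): coefficients C_0 = I, C_1 = C; noise covariance S
I = [[1.0, 0.0], [0.0, 1.0]]
C = [[0.5, 0.2], [0.0, 0.3]]
S = [[1.0, 0.4], [0.4, 2.0]]

# Gamma(0) = C_0 S C_0^t + C_1 S C_1^t = S + C S C^t
gamma0 = madd(matmul(I, matmul(S, transpose(I))),
              matmul(C, matmul(S, transpose(C))))
# Gamma(1) = C_1 S C_0^t = C S  (only k = 0 survives in the sum)
gamma1 = matmul(C, matmul(S, transpose(I)))
print(gamma1)
```

Γ(0) comes out symmetric, as it must, while Γ(1) in general does not.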
Multivariate time-series

Estimation of mean

The mean μ can be estimated through X̄_n. From the univariate theory, we know

E(X̄_n) = μ,   V((X̄_n)_i) → 0 (as n → ∞) if γ_ii(h) → 0 as h → ∞,

n V((X̄_n)_i) → Σ_{h=−∞}^{+∞} γ_ii(h)   if Σ_{h=−∞}^{+∞} |γ_ii(h)| < +∞.

Moreover (X̄_n)_i is asymptotically normal. Stronger assumptions are required for the vector X̄_n to be asymptotically normal.

Theorem. If

X_t = μ + Σ_{k=−∞}^{+∞} C_k Z_{t−k},   {Z_t} ∼ WN(0, S),

then n^{1/2}(X̄_n − μ) ⇒ N( 0, Σ_{h=−∞}^{+∞} Γ(h) ), where Γ(h) = Σ_{k=−∞}^{∞} C_{k+h} S C_k^t.
Multivariate time-series

Confidence intervals for the mean

In principle, from X̄_n ∼ N( μ, (1/n) Σ_{h=−∞}^{∞} Γ(h) ) one could build an m-dimensional confidence ellipsoid. But this is not intuitive, and C_k and S are not known and have to be estimated…

Instead, build confidence intervals from (X̄_n)_i ∼ N( μ_i, (1/n) Σ_{h=−∞}^{+∞} γ_ii(h) ).

Σ_{h=−∞}^{+∞} γ_ii(h) = 2π f_i(0) can be consistently estimated by

2π f̂_i(0) = Σ_{h=−r}^{r} (1 − |h|/r) γ̂_ii(h),   where r = r_n → ∞ and r_n/n → 0.

Componentwise confidence intervals can be combined. If we find u_i(α) such that P(|μ_i − (X̄_n)_i| < u_i(α)) ≥ 1 − α, then

P( |μ_i − (X̄_n)_i| < u_i(α), i = 1, …, m ) ≥ 1 − Σ_{i=1}^{m} P( |μ_i − (X̄_n)_i| ≥ u_i(α) ) ≥ 1 − mα.

Choosing α = 0.05/m, one has a 95%-confidence m-rectangle.
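The lag-window estimate of 2π f_i(0) above can be sketched directly (illustrative data, my own; the triangular weights are the ones on the slide):

```python
def mean_acvf(x, h):
    """Sample autocovariance at lag h >= 0 (denominator n, as on the slides)."""
    n = len(x)
    xbar = sum(x) / n
    return sum((x[t + h] - xbar) * (x[t] - xbar) for t in range(n - h)) / n

def two_pi_f0(x, r):
    """Lag-window estimate of 2*pi*f(0) = sum_h gamma(h), weights (1 - |h|/r)."""
    est = mean_acvf(x, 0)
    for h in range(1, r):
        est += 2 * (1 - h / r) * mean_acvf(x, h)  # gamma(h) = gamma(-h) in 1-d
    return est

x = [0.5, -0.3, 0.9, 0.1, -0.7, 0.4, -0.2, 0.6]
# With r = 1 all weighted lags vanish, so the estimate is just gamma_hat(0)
print(two_pi_f0(x, 1) == mean_acvf(x, 0))  # True
```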
Multivariate time-series

Estimation of ACVF (bivariate case, m = 2)

Γ̂(h) = (1/n) Σ_{t=1}^{n−h} (X_{t+h} − X̄_n)(X_t − X̄_n)^t   for 0 ≤ h < n,
Γ̂(h) = (1/n) Σ_{t=−h+1}^{n} (X_{t+h} − X̄_n)(X_t − X̄_n)^t   for −n < h < 0.

ρ̂_ij(h) = γ̂_ij(h) (γ̂_ii(0) γ̂_jj(0))^{−1/2}.

Theorem. If

X_t = μ + Σ_{k=−∞}^{+∞} C_k Z_{t−k},   {Z_t} ∼ IID(0, S),

then for all h, γ̂_ij(h) → γ_ij(h) and ρ̂_ij(h) → ρ_ij(h) in probability, as n → ∞.
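A direct sketch of the estimator above for m = 2, using plain lists (my own toy data). The identity Γ̂(−h) = Γ̂(h)^t, mirroring γ_ij(h) = γ_ji(−h), holds exactly for the sample version and makes a convenient sanity check.

```python
def sample_gamma(x, h):
    """Sample cross-covariance matrix Gamma_hat(h) for a bivariate series x,
    given as a list of (x1, x2) pairs; denominator n, any -n < h < n."""
    n = len(x)
    m = [sum(col) / n for col in zip(*x)]
    rng = range(0, n - h) if h >= 0 else range(-h, n)
    g = [[0.0, 0.0], [0.0, 0.0]]
    for t in rng:
        for i in range(2):
            for j in range(2):
                g[i][j] += (x[t + h][i] - m[i]) * (x[t][j] - m[j]) / n
    return g

x = [(1.0, 2.0), (0.5, -1.0), (2.0, 0.0), (-1.0, 1.5), (0.0, -0.5)]
g1 = sample_gamma(x, 1)
gm1 = sample_gamma(x, -1)
# Gamma_hat(-h) equals the transpose of Gamma_hat(h)
print(all(abs(g1[i][j] - gm1[j][i]) < 1e-12
          for i in range(2) for j in range(2)))  # True
```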
Multivariate time-series

An example: Southern Oscillation Index

Southern Oscillation Index (an environmental measure) compared to fish recruitment in the South Pacific (1950 to 1985).

[Figure: time-series plots of the Southern Oscillation Index and of Recruitment, 1950-1985.]
Multivariate time-series

ACF of Southern Oscillation Index

[Figure: sample ACF and cross-correlations of the soi and rec series (panels: soi, soi & rec, rec & soi, rec).] The bottom-left panel shows γ̂_12 at negative lags.
Multivariate time-series

An example from Box and Jenkins

Sales (V2) with a leading indicator (V1).

[Figure: time-series plots of V1 and V2 from the sales data.]
Multivariate time-series

ACF of sales data

[Figure: sample ACF and cross-correlations of V1 and V2.] The data are not stationary.
Multivariate time-series

Differenced sales data

[Figure: time-series plots of the differenced series V1 and V2.]
Multivariate time-series

ACF of differenced sales data

[Figure: sample ACF and cross-correlations of the differenced V1 and V2.] The cross-correlation is relevant only at lags −2 and −3.
Multivariate time-series

Testing for independence of time-series: basis

In general the asymptotic distribution of γ̂_ij(h) is complicated. But:

Theorem. Let

X_{t,1} = Σ_{j=−∞}^{∞} α_j Z_{t−j,1},   X_{t,2} = Σ_{j=−∞}^{∞} β_j Z_{t−j,2},

with {Z_{t,1}} ∼ WN(0, σ₁²), {Z_{t,2}} ∼ WN(0, σ₂²) and independent of each other. Then, as n → ∞,

n V(γ̂_12(h)) → Σ_{j=−∞}^{∞} γ_11(j) γ_22(j),

n^{1/2} ρ̂_12(h) ⇒ N( 0, Σ_{j=−∞}^{∞} ρ_11(j) ρ_22(j) ).
Multivariate time-series

Testing for independence of time-series: an example

Suppose {X_{t,1}} and {X_{t,2}} are independent AR(1) processes with ρ_ii(h) = 0.8^{|h|}. Then the asymptotic variance of ρ̂_12(h) is

n^{−1} Σ_{h=−∞}^{∞} 0.64^{|h|} ≈ 4.556 n^{−1}.

Values of ρ̂_12(h) quite a bit larger than 1.96 n^{−1/2} should therefore be common even if the two series are independent. Instead, if one series is white noise, then V(ρ̂_12(h)) ≈ 1/n.

Hence, in testing for independence, it is often recommended to "prewhiten" one series.
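The constant 4.556 is just a two-sided geometric sum, Σ_h 0.64^{|h|} = (1 + 0.64)/(1 − 0.64); a quick check:

```python
# Two-sided geometric sum appearing in the asymptotic variance:
# sum over all integer h of 0.64^|h| = (1 + 0.64) / (1 - 0.64)
closed_form = (1 + 0.64) / (1 - 0.64)
partial = sum(0.64 ** abs(h) for h in range(-200, 201))  # truncated sum
print(round(closed_form, 3))                 # 4.556
print(abs(closed_form - partial) < 1e-9)     # True
```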
Multivariate time-series

'Pre-whitening' a time series

Instead of testing ρ̂_12(h) of the original series, one transforms them into white noise. If {X_{t,1}} and {X_{t,2}} are invertible ARMA processes, then

Z_{t,i} = Σ_{k=0}^{∞} π_k^{(i)} X_{t−k,i} ∼ WN(0, σ_i²),   i = 1, 2,

where Σ_{k=0}^{∞} π_k^{(i)} z^k = π^{(i)}(z) = φ^{(i)}(z) / θ^{(i)}(z).

{X_{t,1}} and {X_{t,2}} are independent if and only if {Z_{t,1}} and {Z_{t,2}} are, hence one tests ρ̂_{Z₁,Z₂}(h).

As φ^{(i)}(z) and θ^{(i)}(z) are not known, one fits an ARMA model to each series and uses the residuals Ŵ_{t,i} in place of Z_{t,i}. It may be enough to do this for just one series.
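For a pure AR(1) the prewhitening filter is simply Z_t = X_t − φ X_{t−1} (π₀ = 1, π₁ = −φ). A minimal sketch with a synthetic series (in practice φ is replaced by an estimate and one works with the fitted residuals), showing that applying the AR polynomial recovers the driving noise:

```python
import random

random.seed(0)
phi = 0.9
# Simulate an AR(1): X_t = phi * X_{t-1} + Z_t
z = [random.gauss(0, 1) for _ in range(200)]
x = []
prev = 0.0
for zt in z:
    prev = phi * prev + zt
    x.append(prev)

# Prewhiten with the (here known) AR polynomial: Z_t = X_t - phi * X_{t-1}
recovered = [x[t] - phi * x[t - 1] for t in range(1, len(x))]
print(all(abs(r - zt) < 1e-12 for r, zt in zip(recovered, z[1:])))  # True
```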
Multivariate time-series

Simulated data

The 1st series is AR(1) with φ = 0.9; the 2nd series is AR(2) with φ₁ = 0.7, φ₂ = 0.27.

[Figure: time-series plots of the two simulated series, dat1 and dat2.]
Multivariate time-series

ACF of simulated data

[Figure: sample ACF and cross-correlations of dat1 and dat2 (panels: dat1, dat1 & dat2, dat2 & dat1, dat2).]
Multivariate time-series

ACF of residuals

MLE fits the correct model to both series.

[Figure: sample ACF and cross-correlations of the residuals fitunk1$res and fitunk2$res.] A few cross-correlation coefficients may appear slightly significant.
Multivariate time-series

Bartlett's formula

More generally:

Theorem. If {X_t} is a bivariate Gaussian time series with Σ_{h=−∞}^{+∞} |γ_ij(h)| < ∞, then

lim_{n→∞} n Cov( ρ̂_12(h), ρ̂_12(k) ) = Σ_{j=−∞}^{+∞} [ ρ_11(j) ρ_22(j + k − h) + ρ_12(j + k) ρ_21(j − h)
    − ρ_12(h) ( ρ_11(j) ρ_12(j + k) + ρ_22(j) ρ_21(j − k) )
    − ρ_12(k) ( ρ_11(j) ρ_12(j + h) + ρ_22(j) ρ_21(j − h) )
    + ρ_12(h) ρ_12(k) ( ½ ρ²_11(j) + ρ²_12(j) + ½ ρ²_22(j) ) ].
Multivariate time-series

Spectral density of multivariate series

If Σ_{h=−∞}^{+∞} |γ_ij(h)| < ∞, one can define

f(λ) = (1/2π) Σ_{h=−∞}^{∞} e^{−ihλ} Γ(h),   λ ∈ [−π, π],

and one obtains

Γ(h) = ∫_{−π}^{π} e^{iλh} f(λ) dλ

and

X_t = ∫_{−π}^{π} e^{iλt} dZ(λ),

where the Z_i(·) are (complex) processes with independent increments such that

∫_{λ₁}^{λ₂} f_ij(λ) dλ = E( (Z_i(λ₂) − Z_i(λ₁)) (Z_j(λ₂) − Z_j(λ₁))* ).
Multivariate time-series

Coherence of series

For a bivariate series the coherence at frequency λ is

X_12(λ) = f_12(λ) / [ f_11(λ) f_22(λ) ]^{1/2}

and represents the correlation between dZ₁(λ) and dZ₂(λ).

The squared coherency function |X_12(λ)|² satisfies 0 ≤ |X_12(λ)|² ≤ 1.

Remark. If X_{t,2} = Σ_{k=−∞}^{+∞} ψ_k X_{t−k,1}, then |X_12(λ)|² ≡ 1.
Multivariate time-series

Periodogram

Define

J(ω_j) = n^{−1/2} Σ_{t=1}^{n} X_t e^{−itω_j},   ω_j = 2πj/n,

for j between −[(n − 1)/2] and [n/2]. Then I_n(ω_j) = J(ω_j) J*(ω_j), where * denotes transpose and complex conjugate.

I_12(ω_j) = (1/n) ( Σ_{t=1}^{n} X_{t,1} e^{−itω_j} ) ( Σ_{t=1}^{n} X_{t,2} e^{itω_j} )

is the cross-periodogram.
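A minimal sketch of these definitions (illustrative values, my own), checking that the auto-periodogram I_11 = J₁ J₁* is real and equals |J₁|²:

```python
import cmath
import math

def dft_coeff(x, omega):
    """J(omega) = n^{-1/2} * sum_t x_t * exp(-i t omega), t = 1..n."""
    n = len(x)
    return sum(xt * cmath.exp(-1j * (t + 1) * omega)
               for t, xt in enumerate(x)) / n ** 0.5

x1 = [1.0, -0.5, 0.3, 0.8]
x2 = [0.2, 0.1, -0.4, 0.6]
n = len(x1)
omega = 2 * math.pi / n  # omega_1 = 2*pi*1/n

j1, j2 = dft_coeff(x1, omega), dft_coeff(x2, omega)
i11 = j1 * j1.conjugate()   # auto-periodogram of series 1 at omega_1
i12 = j1 * j2.conjugate()   # cross-periodogram I_12(omega_1)
print(abs(i11.imag) < 1e-12, abs(i11.real - abs(j1) ** 2) < 1e-12)  # True True
```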
Multivariate time-series

Estimation of spectral density and coherence

Again, one estimates f(λ) by

f̂(λ) = (1/2π) Σ_{k=−m_n}^{m_n} W_n(k) I_n( g(n, λ) + 2πk/n ).

If X_t = Σ_{k=−∞}^{+∞} C_k Z_{t−k}, {Z_t} ∼ IID(0, S), then

f̂_ij(λ) ∼ AN( f_ij(λ), f_ij(λ) Σ_{k=−m_n}^{m_n} W²_n(k) ),   0 < λ < π.

The natural estimator of |X_12(λ)|² is

χ̂²_12(λ) = |f̂_12(λ)|² / ( f̂_11(λ) f̂_22(λ) ).
Multivariate time-series

An example of coherency estimation

[Figure: estimated squared coherency between SOI and recruitment, as a function of frequency. The horizontal line represents a (conservative) test of the hypothesis |X_12(λ)|² = 0.]

There is strong coherency at a period of 1 year and at periods longer than 3 years.