Introduction to State Space Methods · 2004. 9. 13. · Introduction to State Space Methods Siem...
Introduction to State Space Methods
Siem Jan Koopman
Vrije Universiteit Amsterdam
Tinbergen Institute
Introduction to State Space Methods – p. 1
State Space Model
Linear Gaussian state space model is defined in three parts:
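→ State equation:
\[ \alpha_{t+1} = T_t \alpha_t + R_t \zeta_t, \qquad \zeta_t \sim \mathrm{NID}(0, Q_t), \]
→ Observation equation:
\[ y_t = Z_t \alpha_t + \varepsilon_t, \qquad \varepsilon_t \sim \mathrm{NID}(0, G_t), \]
→ Initial state distribution: $\alpha_1 \sim N(a_1, P_1)$.
Notice that
• $\zeta_t$ and $\varepsilon_s$ are independent for all $t, s$, and independent of $\alpha_1$;
• the observation $y_t$ can be multivariate;
• the state vector $\alpha_t$ is unobserved;
• the matrices $T_t, Z_t, R_t, Q_t, G_t$ determine the structure of the model.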
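The three parts of the definition can be made concrete by simulating from the model. The sketch below is a minimal illustration for the scalar, time-invariant case only; the function name `simulate_ssm`, its argument names and the default values are assumptions made for this example and do not come from the slides.

```python
import random

def simulate_ssm(n, T=1.0, R=1.0, Z=1.0, Q=1.0, G=1.0, a1=0.0, P1=1.0, seed=0):
    """Simulate n steps of a scalar, time-invariant linear Gaussian state space model."""
    rng = random.Random(seed)
    alpha = rng.gauss(a1, P1 ** 0.5)        # initial state alpha_1 ~ N(a1, P1)
    states, obs = [], []
    for _ in range(n):
        eps = rng.gauss(0.0, G ** 0.5)      # eps_t ~ NID(0, G)
        zeta = rng.gauss(0.0, Q ** 0.5)     # zeta_t ~ NID(0, Q)
        states.append(alpha)
        obs.append(Z * alpha + eps)         # observation equation
        alpha = T * alpha + R * zeta        # state equation
    return states, obs

states, obs = simulate_ssm(100)
```

With `G=0.0` the observations coincide with the states, which is a quick way to check that the two equations are wired up as intended.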
State Space Model
• state space model is linear and Gaussian: therefore properties and results of multivariate normal distribution apply;
• state vector $\alpha_t$ evolves as a VAR(1) process;
• system matrices usually contain unknown parameters;
• estimation has therefore two aspects:
◦ measuring the unobservable state (prediction, filtering and smoothing);
◦ estimation of unknown parameters (maximum likelihood estimation);
• state space methods offer a unified approach to a wide range of models and techniques: dynamic regression, ARIMA, UC models, latent variable models, spline-fitting and many ad-hoc filters;
• next, some well-known model specifications in state space form ...
Regression with Time Varying Coefficients
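General state space model:
\[ \alpha_{t+1} = T_t \alpha_t + R_t \zeta_t, \qquad \zeta_t \sim \mathrm{NID}(0, Q_t), \]
\[ y_t = Z_t \alpha_t + \varepsilon_t, \qquad \varepsilon_t \sim \mathrm{NID}(0, G_t). \]
Put the regressors in $Z_t$ and set $T_t = I$, $R_t = I$.
The result is a regression model with coefficient vector $\alpha_t$ following a random walk.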
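As a rough sketch of this setup, the code below runs the Kalman filter recursions for a single regressor, with $Z_t = x_t$ and $T_t = R_t = 1$. The function name `tv_regression_filter` and the toy data are assumptions for illustration. Setting the state variance `q` to zero and using a large `P1` makes the coefficient constant, so the filter should then behave like recursive least squares.

```python
def tv_regression_filter(y, x, q, g, a1, P1):
    """Filter for y_t = x_t * beta_t + eps_t with random-walk coefficient beta_t.

    Scalar case: Z_t = x_t, T_t = R_t = 1, Q_t = q, G_t = g.
    Returns the predictions a_t of beta_t and the final estimate a_{n+1}.
    """
    a, P = a1, P1
    predicted = []
    for yt, xt in zip(y, x):
        predicted.append(a)
        v = yt - xt * a            # prediction error
        F = xt * P * xt + g        # prediction error variance
        K = P * xt / F             # Kalman gain with T_t = 1
        a = a + K * v              # state update
        P = P + q - K * F * K      # variance update
    return predicted, a

# Toy data generated by y = 2 x without noise (illustrative values only).
x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]
_, beta_hat = tv_regression_filter(y, x, q=0.0, g=1.0, a1=0.0, P1=1e6)
print(beta_hat)  # close to 2
```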
ARMA in State Space Form
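Example: AR(2) model $y_{t+1} = \phi_1 y_t + \phi_2 y_{t-1} + \zeta_t$, in state space:
\[ \alpha_{t+1} = T_t \alpha_t + R_t \zeta_t, \qquad \zeta_t \sim \mathrm{NID}(0, Q_t), \]
\[ y_t = Z_t \alpha_t + \varepsilon_t, \qquad \varepsilon_t \sim \mathrm{NID}(0, G_t), \]
with $2 \times 1$ state vector $\alpha_t$ and system matrices:
\[ Z_t = \begin{bmatrix} 1 & 0 \end{bmatrix}, \qquad G_t = 0, \]
\[ T_t = \begin{bmatrix} \phi_1 & 1 \\ \phi_2 & 0 \end{bmatrix}, \qquad R_t = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \qquad Q_t = \sigma^2. \]
• $Z_t$ and $G_t = 0$ imply that $\alpha_{1t} = y_t$;
• First state equation implies $y_{t+1} = \phi_1 y_t + \alpha_{2t} + \zeta_t$ with $\zeta_t \sim \mathrm{NID}(0, \sigma^2)$;
• Second state equation implies $\alpha_{2,t+1} = \phi_2 y_t$.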
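A quick way to check this representation is to drive the direct AR(2) recursion and the two state equations with the same shocks and compare the outputs. The sketch below does that; the function names, the starting values `y0`, `y1` and the parameter values are assumptions for illustration.

```python
def ar2_direct(phi1, phi2, shocks, y0=0.0, y1=0.0):
    """y_{t+1} = phi1 * y_t + phi2 * y_{t-1} + zeta_t, started from (y0, y1)."""
    ys = [y0, y1]
    for z in shocks:
        ys.append(phi1 * ys[-1] + phi2 * ys[-2] + z)
    return ys

def ar2_state_space(phi1, phi2, shocks, y0=0.0, y1=0.0):
    """Same series from the state equations with alpha_t = (y_t, phi2 * y_{t-1})."""
    a1, a2 = y1, phi2 * y0
    ys = [y0, y1]
    for z in shocks:
        # T = [[phi1, 1], [phi2, 0]], R = [1, 0]'
        a1, a2 = phi1 * a1 + a2 + z, phi2 * a1
        ys.append(a1)              # Z = [1 0], G = 0, so y_t = alpha_{1t}
    return ys

shocks = [1.0, -0.5, 0.25, 2.0, -1.0]
direct = ar2_direct(0.5, 0.3, shocks)
via_state = ar2_state_space(0.5, 0.3, shocks)
print(max(abs(d - s) for d, s in zip(direct, via_state)))  # zero up to rounding
```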
ARMA in State Space Form
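Example: MA(1) model $y_{t+1} = \zeta_t + \theta \zeta_{t-1}$, in state space:
\[ \alpha_{t+1} = T_t \alpha_t + R_t \zeta_t, \qquad \zeta_t \sim \mathrm{NID}(0, Q_t), \]
\[ y_t = Z_t \alpha_t + \varepsilon_t, \qquad \varepsilon_t \sim \mathrm{NID}(0, G_t), \]
with $2 \times 1$ state vector $\alpha_t$ and system matrices:
\[ Z_t = \begin{bmatrix} 1 & 0 \end{bmatrix}, \qquad G_t = 0, \]
\[ T_t = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \qquad R_t = \begin{bmatrix} 1 \\ \theta \end{bmatrix}, \qquad Q_t = \sigma^2. \]
• $Z_t$ and $G_t = 0$ imply that $\alpha_{1t} = y_t$;
• First state equation implies $y_{t+1} = \alpha_{2t} + \zeta_t$ with $\zeta_t \sim \mathrm{NID}(0, \sigma^2)$;
• Second state equation implies $\alpha_{2,t+1} = \theta \zeta_t$.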
ARMA in State Space Form
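Example: ARMA(2,1) model
\[ y_t = \phi_1 y_{t-1} + \phi_2 y_{t-2} + \zeta_t + \theta \zeta_{t-1} \]
in state space form
\[ \alpha_t = \begin{bmatrix} y_t \\ \phi_2 y_{t-1} + \theta \zeta_t \end{bmatrix}, \qquad Z_t = \begin{bmatrix} 1 & 0 \end{bmatrix}, \qquad G_t = 0, \]
\[ T_t = \begin{bmatrix} \phi_1 & 1 \\ \phi_2 & 0 \end{bmatrix}, \qquad R_t = \begin{bmatrix} 1 \\ \theta \end{bmatrix}, \qquad Q_t = \sigma^2. \]
Here $\zeta_t$ is dated as the innovation in $y_t$, so with this state vector the state equation reads $\alpha_{t+1} = T_t \alpha_t + R_t \zeta_{t+1}$.
All ARIMA(p, d, q) models have a (non-unique) state space representation.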
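The ARMA(2,1) state vector can be checked numerically in the same spirit. The sketch below assumes that the shock is dated with the observation it enters, so the state is advanced with $\zeta_{t+1}$, and that all pre-sample values are zero. The function names and parameter values are illustrative assumptions.

```python
def arma21_direct(phi1, phi2, theta, shocks):
    """y_t = phi1*y_{t-1} + phi2*y_{t-2} + zeta_t + theta*zeta_{t-1}, zero pre-sample values."""
    ys = []
    for t, z in enumerate(shocks):
        y_lag1 = ys[t - 1] if t >= 1 else 0.0
        y_lag2 = ys[t - 2] if t >= 2 else 0.0
        z_lag1 = shocks[t - 1] if t >= 1 else 0.0
        ys.append(phi1 * y_lag1 + phi2 * y_lag2 + z + theta * z_lag1)
    return ys

def arma21_state_space(phi1, phi2, theta, shocks):
    """Same series from alpha_t = (y_t, phi2*y_{t-1} + theta*zeta_t)."""
    a1, a2 = 0.0, 0.0                      # zero pre-sample state
    ys = []
    for z in shocks:
        # alpha_{t+1} = T alpha_t + R zeta_{t+1}, T = [[phi1, 1], [phi2, 0]], R = [1, theta]'
        a1, a2 = phi1 * a1 + a2 + z, phi2 * a1 + theta * z
        ys.append(a1)                      # Z = [1 0], G = 0
    return ys

shocks = [1.0, -0.5, 0.25, 2.0, -1.0]
direct = arma21_direct(0.5, 0.3, 0.4, shocks)
via_state = arma21_state_space(0.5, 0.3, 0.4, shocks)
print(max(abs(d - s) for d, s in zip(direct, via_state)))  # zero up to rounding
```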
UC models in State Space Form
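State space model: $\alpha_{t+1} = T_t \alpha_t + R_t \zeta_t$, $y_t = Z_t \alpha_t + \varepsilon_t$.
LL model $\Delta\mu_{t+1} = \eta_t$ and $y_t = \mu_t + \varepsilon_t$:
\[ \alpha_t = \mu_t, \qquad T_t = 1, \qquad R_t = 1, \qquad Q_t = \sigma^2_\eta, \qquad Z_t = 1, \qquad G_t = \sigma^2_\varepsilon. \]
LLT model $\Delta\mu_{t+1} = \beta_t + \eta_t$, $\Delta\beta_{t+1} = \xi_t$ and $y_t = \mu_t + \varepsilon_t$:
\[ \alpha_t = \begin{bmatrix} \mu_t \\ \beta_t \end{bmatrix}, \qquad T_t = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}, \qquad R_t = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \qquad Q_t = \begin{bmatrix} \sigma^2_\eta & 0 \\ 0 & \sigma^2_\xi \end{bmatrix}, \]
\[ Z_t = \begin{bmatrix} 1 & 0 \end{bmatrix}, \qquad G_t = \sigma^2_\varepsilon. \]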
UC models in State Space Form
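State space model: $\alpha_{t+1} = T_t \alpha_t + R_t \zeta_t$, $y_t = Z_t \alpha_t + \varepsilon_t$.
LLT model with season: $\Delta\mu_{t+1} = \beta_t + \eta_t$, $\Delta\beta_{t+1} = \xi_t$, $S(L)\gamma_{t+1} = \omega_t$ and $y_t = \mu_t + \gamma_t + \varepsilon_t$:
\[ \alpha_t = \begin{bmatrix} \mu_t & \beta_t & \gamma_t & \gamma_{t-1} & \gamma_{t-2} \end{bmatrix}', \]
\[ T_t = \begin{bmatrix} 1 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & -1 & -1 & -1 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{bmatrix}, \qquad Q_t = \begin{bmatrix} \sigma^2_\eta & 0 & 0 \\ 0 & \sigma^2_\xi & 0 \\ 0 & 0 & \sigma^2_\omega \end{bmatrix}, \qquad R_t = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \]
\[ Z_t = \begin{bmatrix} 1 & 0 & 1 & 0 & 0 \end{bmatrix}, \qquad G_t = \sigma^2_\varepsilon. \]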
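The block structure of these matrices is easier to see when they are built programmatically. The sketch below constructs $T_t$, $R_t$ and $Z_t$ for a trend plus dummy seasonal with `s` seasons, as plain nested lists. The function name `trend_seasonal_system` is an assumption for illustration, and reading the 5 × 5 case above as `s = 4` is an inference from the three seasonal lags in the state vector.

```python
def trend_seasonal_system(s):
    """T, R, Z for a local linear trend plus dummy seasonal with s >= 2 seasons."""
    m = 2 + (s - 1)                          # state: mu, beta, and s-1 seasonal terms
    T = [[0.0] * m for _ in range(m)]
    T[0][0] = T[0][1] = T[1][1] = 1.0       # trend block [[1, 1], [0, 1]]
    for j in range(2, m):
        T[2][j] = -1.0                      # gamma_{t+1} = -(sum of last s-1 gammas) + omega_t
    for i in range(3, m):
        T[i][i - 1] = 1.0                   # shift lagged seasonal terms down one place
    R = [[0.0] * 3 for _ in range(m)]
    for i in range(3):
        R[i][i] = 1.0                       # eta, xi, omega enter the first three states
    Z = [1.0, 0.0, 1.0] + [0.0] * (m - 3)   # y_t = mu_t + gamma_t + eps_t
    return T, R, Z

T, R, Z = trend_seasonal_system(4)
for row in T:
    print(row)
```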
Kalman Filter
• The Kalman filter calculates the mean and variance of the unobserved state, given the observations.
• The state is Gaussian: the complete distribution is characterized by the mean and variance.
• The filter is a recursive algorithm; the current best estimate is updated whenever a new observation is obtained.
• To start the recursion, we need $a_1$ and $P_1$, which we assumed given.
• There are various ways to initialize when $a_1$ and $P_1$ are unknown, which we will not discuss here.
Kalman Filter
The unobserved state $\alpha_t$ can be estimated from the observations with the Kalman filter:
\[ v_t = y_t - Z_t a_t, \]
\[ F_t = Z_t P_t Z_t' + G_t, \]
\[ K_t = T_t P_t Z_t' F_t^{-1}, \]
\[ a_{t+1} = T_t a_t + K_t v_t, \]
\[ P_{t+1} = T_t P_t T_t' + R_t Q_t R_t' - K_t F_t K_t', \]
for $t = 1, \ldots, n$ and starting with given values for $a_1$ and $P_1$.
• Writing $Y_t = \{y_1, \ldots, y_t\}$,
\[ a_{t+1} = E(\alpha_{t+1} \mid Y_t), \qquad P_{t+1} = \mathrm{var}(\alpha_{t+1} \mid Y_t). \]
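The five recursions translate almost line by line into code. The sketch below covers only the scalar, time-invariant case, where transposes disappear and $F_t^{-1}$ is a division. The function name `kalman_filter` and the tuple layout of its output are assumptions for illustration.

```python
def kalman_filter(y, T, Z, R, Q, G, a1, P1):
    """Scalar Kalman filter; returns per-step (a_t, P_t, v_t, F_t, K_t) and a_{n+1}, P_{n+1}."""
    a, P = a1, P1
    steps = []
    for yt in y:
        v = yt - Z * a                         # prediction error
        F = Z * P * Z + G                      # prediction error variance
        K = T * P * Z / F                      # Kalman gain
        steps.append((a, P, v, F, K))
        a = T * a + K * v                      # state prediction for t+1
        P = T * P * T + R * Q * R - K * F * K  # its variance
    return steps, a, P

# One step of a local level model (T = Z = R = 1) with Q = G = 1, a1 = 0, P1 = 1.
steps, a_next, P_next = kalman_filter([1.0], T=1.0, Z=1.0, R=1.0, Q=1.0, G=1.0, a1=0.0, P1=1.0)
# By hand: v_1 = 1, F_1 = 2, K_1 = 0.5, a_2 = 0.5, P_2 = 1 + 1 - 0.5 = 1.5
print(steps[0], a_next, P_next)
```

Note that $P_t$, $F_t$ and $K_t$ do not depend on the data: in this scalar example they can be computed before any observation is seen.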
Kalman Filter
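State space model: $\alpha_{t+1} = T_t \alpha_t + R_t \zeta_t$, $y_t = Z_t \alpha_t + \varepsilon_t$.
• Writing $Y_t = \{y_1, \ldots, y_t\}$, define
\[ a_{t+1} = E(\alpha_{t+1} \mid Y_t), \qquad P_{t+1} = \mathrm{var}(\alpha_{t+1} \mid Y_t); \]
• The prediction error is
\[ v_t = y_t - E(y_t \mid Y_{t-1}) \]
\[ \quad = y_t - E(Z_t \alpha_t + \varepsilon_t \mid Y_{t-1}) \]
\[ \quad = y_t - Z_t E(\alpha_t \mid Y_{t-1}) \]
\[ \quad = y_t - Z_t a_t; \]
• It follows that $v_t = Z_t(\alpha_t - a_t) + \varepsilon_t$ and $E(v_t) = 0$;
• The prediction error variance is $F_t = \mathrm{var}(v_t) = Z_t P_t Z_t' + G_t$.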
Lemma
The proof of the Kalman filter uses a lemma from multivariate Normal regression theory.
Lemma. Suppose $x$, $y$ and $z$ are jointly Normally distributed vectors with $E(z) = 0$ and $\Sigma_{yz} = 0$. Then
\[ E(x \mid y, z) = E(x \mid y) + \Sigma_{xz} \Sigma_{zz}^{-1} z, \]
\[ \mathrm{var}(x \mid y, z) = \mathrm{var}(x \mid y) - \Sigma_{xz} \Sigma_{zz}^{-1} \Sigma_{xz}'. \]
Kalman Filter
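State space model: $\alpha_{t+1} = T_t \alpha_t + R_t \zeta_t$, $y_t = Z_t \alpha_t + \varepsilon_t$.
• We have $Y_t = \{Y_{t-1}, y_t\} = \{Y_{t-1}, v_t\}$ and $E(v_t y_{t-j}) = 0$ for $j = 1, \ldots, t-1$;
• Apply the lemma $E(x \mid y, z) = E(x \mid y) + \Sigma_{xz} \Sigma_{zz}^{-1} z$ with $x = \alpha_{t+1}$, $y = Y_{t-1}$ and $z = v_t = Z_t(\alpha_t - a_t) + \varepsilon_t$;
• It follows that $E(\alpha_{t+1} \mid Y_{t-1}) = T_t a_t$;
• Further, $E(\alpha_{t+1} v_t') = T_t E(\alpha_t v_t') + R_t E(\zeta_t v_t') = T_t P_t Z_t'$;
• Applying the lemma gives the state update
\[ a_{t+1} = E(\alpha_{t+1} \mid Y_{t-1}, y_t) \]
\[ \quad = T_t a_t + T_t P_t Z_t' F_t^{-1} v_t \]
\[ \quad = T_t a_t + K_t v_t, \]
with $K_t = T_t P_t Z_t' F_t^{-1}$.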
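The same lemma also yields the variance recursion. The following worked step is a sketch that uses only quantities already defined: $\Sigma_{xz} = T_t P_t Z_t'$, $\Sigma_{zz} = F_t$, and the fact that $F_t$ is symmetric.

```latex
\begin{aligned}
\mathrm{var}(\alpha_{t+1} \mid Y_{t-1}) &= T_t P_t T_t' + R_t Q_t R_t', \\
P_{t+1} = \mathrm{var}(\alpha_{t+1} \mid Y_{t-1}, v_t)
  &= T_t P_t T_t' + R_t Q_t R_t' - T_t P_t Z_t' F_t^{-1} Z_t P_t T_t' \\
  &= T_t P_t T_t' + R_t Q_t R_t' - K_t F_t K_t',
\end{aligned}
```

where the last line uses $K_t F_t K_t' = T_t P_t Z_t' F_t^{-1} F_t F_t^{-1} Z_t P_t T_t'$.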
Kalman Filter
Our best prediction of $y_t$ is $Z_t a_t$. When the actual observation arrives, calculate the prediction error $v_t = y_t - Z_t a_t$ and its variance $F_t = Z_t P_t Z_t' + G_t$. The new best estimate of the state mean is based on both the old estimate $a_t$ and the new information $v_t$:
\[ a_{t+1} = T_t a_t + K_t v_t, \]
similarly for the variance:
\[ P_{t+1} = T_t P_t T_t' + R_t Q_t R_t' - K_t F_t K_t'. \]
The Kalman gain
\[ K_t = T_t P_t Z_t' F_t^{-1} \]
is the optimal weighting matrix for the new evidence.
Kalman Filter Illustration
[Figure: four panels over 1880 to 1960: observation with filtered level a_t; state variance P_t; prediction error v_t; prediction error variance F_t.]
Smoothing
• The filter calculates the mean and variance conditional on $Y_t$;
• The Kalman smoother calculates the mean and variance conditional on the full set of observations $Y_n$;
• After the filtered estimates are calculated, the smoothing recursion starts at the last observation and runs back to the first.
\[ \hat\alpha_t = E(\alpha_t \mid Y_n), \qquad V_t = \mathrm{var}(\alpha_t \mid Y_n), \]
\[ r_t = \text{weighted sum of innovations}, \qquad N_t = \mathrm{var}(r_t), \]
\[ L_t = T_t - K_t Z_t. \]
Starting with $r_n = 0$, $N_n = 0$, the smoothing recursions (written here for a scalar state with $Z_t = 1$, as in the local level model) are given by
\[ r_{t-1} = F_t^{-1} v_t + L_t r_t, \qquad N_{t-1} = F_t^{-1} + L_t^2 N_t, \]
\[ \hat\alpha_t = a_t + P_t r_{t-1}, \qquad V_t = P_t - P_t^2 N_{t-1}. \]
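The two passes can be sketched for the local level model ($T_t = Z_t = R_t = 1$): a forward filter that stores $a_t, P_t, v_t, F_t, K_t$, followed by the backward recursions for $r_{t-1}$ and $N_{t-1}$. The function name `local_level_smoother` and the argument names `var_eta`, `var_eps` are assumptions for illustration.

```python
def local_level_smoother(y, var_eta, var_eps, a1, P1):
    """Filter then smooth a local level model; returns smoothed means and variances."""
    # Forward pass: Kalman filter with T = Z = R = 1.
    a, P = a1, P1
    stored = []
    for yt in y:
        v = yt - a
        F = P + var_eps
        K = P / F
        stored.append((a, P, v, F, K))
        a = a + K * v
        P = P + var_eta - K * F * K
    # Backward pass, starting from r_n = 0 and N_n = 0.
    r, N = 0.0, 0.0
    alpha_hat = [0.0] * len(y)
    V = [0.0] * len(y)
    for t in range(len(y) - 1, -1, -1):
        a, P, v, F, K = stored[t]
        L = 1.0 - K                    # L_t = T_t - K_t Z_t
        r = v / F + L * r              # r_{t-1}
        N = 1.0 / F + L * L * N        # N_{t-1}
        alpha_hat[t] = a + P * r       # smoothed state
        V[t] = P - P * P * N           # smoothed state variance
    return alpha_hat, V

# Single observation y_1 = 1 with prior N(0, 1) and unit noise variance:
# the posterior of alpha_1 has mean 0.5 and variance 0.5.
print(local_level_smoother([1.0], 1.0, 1.0, 0.0, 1.0))
```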
Smoothing Illustration
[Figure: four panels over 1880 to 1960: observations with smoothed state; V_t; r_t; N_t.]
Filtering and Smoothing
[Figure: observation, smoothed level and filtered level, 1870 to 1970.]
Missing Observations
Missing observations are very easy to handle in Kalman filtering:
• suppose $y_j$ is missing
• put $v_j = 0$, $K_j = 0$ and $F_j = \infty$ in the algorithm
• proceed with further calculations as normal
The filter algorithm extrapolates according to the state equation until a new observation arrives. The smoother interpolates between observations.
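The rule can be sketched for the local level model as follows. Marking a missing observation with `None` is a convention chosen for this example, as are the function and argument names. Since $F_j = \infty$ only ever enters through $K_j = 0$, the code simply skips the update terms.

```python
def filter_with_missing(y, var_eta, var_eps, a1, P1):
    """Local level Kalman filter; entries of y equal to None are treated as missing."""
    a, P = a1, P1
    a_path, P_path = [], []
    for yt in y:
        a_path.append(a)
        P_path.append(P)
        if yt is None:
            v, K, KFK = 0.0, 0.0, 0.0      # v_j = 0, K_j = 0, so K_j F_j K_j' drops out
        else:
            v = yt - a
            F = P + var_eps
            K = P / F
            KFK = K * F * K
        a = a + K * v                      # unchanged when the observation is missing
        P = P + var_eta - KFK              # grows by var_eta when missing
    return a_path, P_path

a_path, P_path = filter_with_missing([1.0, None, None], 1.0, 1.0, 0.0, 1.0)
print(a_path)  # [0.0, 0.5, 0.5]
print(P_path)  # [1.0, 1.5, 2.5]
```

The state estimate stays flat over the gap while its variance increases, which matches the extrapolation behaviour described above.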
Missing Observations
[Figure: two panels over 1870 to 1970: observation with a_t; P_t.]
Missing Observations, Filter and Smoother
[Figure: four panels over 1880 to 1960: filtered state; P_t; smoothed state; V_t.]
Forecasting
Forecasting requires no extra theory: just treat future observations as missing:
• put $v_j = 0$, $K_j = 0$ and $F_j = \infty$ for $j = n+1, \ldots, n+k$
• proceed with further calculations as normal
• forecast for $y_j$ is $Z_j a_j$
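With $v_j = 0$ and $K_j = 0$ the filter recursions collapse to $a_{j+1} = T_j a_j$ and $P_{j+1} = T_j P_j T_j' + R_j Q_j R_j'$. The sketch below iterates these for a scalar, time-invariant system, starting from the filter output $a_{n+1}$, $P_{n+1}$. The function name `forecast` and the numerical values are assumptions for illustration.

```python
def forecast(a_next, P_next, k, T, Z, R, Q):
    """k forecasts of y and the matching state variances (scalar, time-invariant system)."""
    a, P = a_next, P_next
    y_hat, P_path = [], []
    for _ in range(k):
        y_hat.append(Z * a)          # forecast for y_j is Z_j a_j
        P_path.append(P)
        a = T * a                    # no update term because K_j = 0
        P = T * P * T + R * Q * R    # uncertainty accumulates through the state equation
    return y_hat, P_path

y_hat, P_path = forecast(2.0, 1.0, 3, T=0.5, Z=1.0, R=1.0, Q=1.0)
print(y_hat)   # [2.0, 1.0, 0.5]
print(P_path)  # [1.0, 1.25, 1.3125]
```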
Forecasting
[Figure: two panels over 1870 to 2000: observation with a_t; P_t.]
Parameter Estimation
The system matrices in a state space model typically depend on a parameter vector $\psi$. The model is completely Gaussian; we estimate by Maximum Likelihood.
The loglikelihood of a time series is
\[ \log L = \sum_{t=1}^{n} \log p(y_t \mid Y_{t-1}). \]
In the state space model, $p(y_t \mid Y_{t-1})$ is a Gaussian density with mean $Z_t a_t$ and variance $F_t$. For univariate $y_t$:
\[ \log L = -\frac{n}{2} \log 2\pi - \frac{1}{2} \sum_{t=1}^{n} \left( \log F_t + F_t^{-1} v_t^2 \right), \]
with $v_t$ and $F_t$ from the Kalman filter. This is called the prediction error decomposition of the likelihood. Estimation proceeds by numerically maximising $\log L$.
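The prediction error decomposition can be sketched for the local level model by accumulating the Gaussian log-density of each $v_t$ inside the filter loop. The function names, the argument names `var_eta` and `var_eps`, and the toy data are assumptions for illustration. The grid search is only a crude stand-in for a proper numerical optimiser.

```python
import math

def local_level_loglik(y, var_eta, var_eps, a1, P1):
    """Log-likelihood of a local level model via the prediction error decomposition."""
    a, P = a1, P1
    loglik = 0.0
    for yt in y:
        v = yt - a                     # prediction error
        F = P + var_eps                # prediction error variance
        loglik += -0.5 * (math.log(2 * math.pi) + math.log(F) + v * v / F)
        K = P / F
        a = a + K * v
        P = P + var_eta - K * F * K
    return loglik

def grid_search_var_eta(y, var_eps, a1, P1, grid):
    """Return the grid value of var_eta with the highest log-likelihood."""
    return max(grid, key=lambda q: local_level_loglik(y, q, var_eps, a1, P1))

y = [1.0, 1.2, 0.7, 1.9, 2.1, 1.6]     # toy data, for illustration only
grid = [0.01, 0.1, 0.5, 1.0, 2.0]
best = grid_search_var_eta(y, var_eps=1.0, a1=0.0, P1=1.0, grid=grid)
print(best, local_level_loglik(y, best, 1.0, 0.0, 1.0))
```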
ML Estimate of Nile Data
[Figure: observation with estimated level for q = 1000, q = 0 and q = 0.0973 (ML estimate), 1870 to 1970.]
Diagnostics
• Null hypothesis: standardised residuals
\[ v_t / \sqrt{F_t} \sim \mathrm{NID}(0, 1) \]
• Apply standard tests for Normality, heteroskedasticity and serial correlation;
• A recursive algorithm is available to calculate smoothed disturbances (auxiliary residuals), which can be used to detect breaks and outliers;
• Model comparison and parameter restrictions: use likelihood based procedures (LR test, AIC, BIC).
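A first diagnostic pass can be sketched as below: standardise the prediction errors and compute a sample autocorrelation as a simple serial-correlation check. The function names and input values are assumptions for illustration; formal Normality and heteroskedasticity tests are not implemented here.

```python
import math

def standardised_residuals(v, F):
    """Scale each prediction error v_t by the square root of its variance F_t."""
    return [vt / math.sqrt(Ft) for vt, Ft in zip(v, F)]

def autocorrelation(e, lag):
    """Sample autocorrelation of the series e at the given lag."""
    n = len(e)
    mean = sum(e) / n
    c0 = sum((x - mean) ** 2 for x in e)
    ck = sum((e[t] - mean) * (e[t - lag] - mean) for t in range(lag, n))
    return ck / c0

e = standardised_residuals([2.0, -3.0, 4.0, -5.0], [4.0, 9.0, 16.0, 25.0])
print(e)                      # [1.0, -1.0, 1.0, -1.0]
print(autocorrelation(e, 1))  # -0.75
```

Under the null hypothesis the autocorrelations should be close to zero; the strongly alternating toy series above is an example of residuals that would fail such a check.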
Nile Data Residuals Diagnostics
[Figure: four panels: Residual Nile, 1880 to 1960; correlogram of Residual Nile, lags 0 to 20; density of the residuals with N(s=0.996); QQ plot against the normal.]