Time Series Analysis (SS 2019), Saarland University
Lecture 3: Linear forecasting of stationary time series

Examples

For an independent white noise $(\varepsilon_t)_{t\in\mathbb{Z}}$, we consider a stochastic process $X$ satisfying for all $t\in\mathbb{Z}$ the recursion $X_t = \phi X_{t-1} + \varepsilon_t$ for some $\phi\in\mathbb{R}$. Then we have in case of $|\phi| < 1$:

1. $X_t = \varepsilon_t + \phi\,\varepsilon_{t-1} + \phi^2\varepsilon_{t-2} + \dots$,
2. $\sigma(X_{t-1}, X_{t-2}, \dots) = \sigma(\varepsilon_{t-1}, \varepsilon_{t-2}, \dots)$,
3. $E(X_t \mid X_{t-1}, X_{t-2}, \dots) = E(X_t \mid X_{t-1}) = \phi X_{t-1}$,
4. $E(X_t - E(X_t \mid X_{t-1}, X_{t-2}, \dots))^2 = E(X_t - E(X_t \mid X_{t-1}))^2 = \sigma_\varepsilon^2$.

For $|\phi| > 1$, we have:

1. $X_t = -\frac{\varepsilon_{t+1}}{\phi} - \frac{\varepsilon_{t+2}}{\phi^2} - \dots$,
2. $\sigma(X_{t-1}, X_{t-2}, \dots) = \sigma(X_{t-1}, \varepsilon_{t-1}, \varepsilon_{t-2}, \dots)$,
3. $E(X_t \mid X_{t-1}, X_{t-2}, \dots) = E(X_t \mid X_{t-1})$,
4. If $\varepsilon$ is Gaussian white noise, we have $E(X_t \mid X_{t-1}) = \frac{1}{\phi} X_{t-1}$ (cf. the Wold decomposition example below).
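Both regimes are easy to check by simulation. A minimal NumPy sketch (not from the slides; it assumes Gaussian noise, so that the conditional expectation coincides with the best linear predictor, whose slope is what the regression below estimates):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000
eps = rng.standard_normal(n + 200)

# |phi| < 1: causal case, regression slope of X_t on X_{t-1} is ~ phi
phi = 0.7
x = np.zeros(n + 200)
for t in range(1, n + 200):
    x[t] = phi * x[t - 1] + eps[t]
x = x[200:]                                          # discard burn-in
print(np.cov(x[1:], x[:-1])[0, 1] / x[:-1].var())    # ~ 0.7

# |phi| > 1: the stationary solution is built from FUTURE noise,
# X_t = -eps_{t+1}/phi - eps_{t+2}/phi^2 - ... (truncated here)
phi = 2.0
j = np.arange(1, 60)
w = 1.0 / phi ** j
x = np.array([-(eps[t + j] * w).sum() for t in range(n)])
print(np.cov(x[1:], x[:-1])[0, 1] / x[:-1].var())    # ~ 1/phi = 0.5
```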


Linear forecasting of stationary time series

starting point: weakly stationary time series $X$ with known mean $\mu := E(X_t)$ and autocovariance function $\gamma_X$; we are looking for an 'optimal' linear combination

$$\hat{X}_{n+h} := a_0 + a_1 X_n + \dots + a_n X_1,$$

to forecast $X_{n+h}$ ($h \in \mathbb{N}$) when given $X_1, \dots, X_n$, where 'optimal' stands for minimizing the mean squared forecast error

$$E(X_{n+h} - \hat{X}_{n+h})^2.$$


More general problem: for a random vector $W = (W_n, \dots, W_1)'$ with covariance matrix $\Gamma$ and a random variable $Y$ with finite variance, we want to find a linear combination

$$\hat{Y} := a_0 + a_1 W_n + \dots + a_n W_1$$

with minimal mean squared error $E(Y - \hat{Y})^2$.

Theorem
In the above situation, we have:

1. $\hat{Y} = E(Y) + a'(W - E(W))$, with $a$ any solution of $\Gamma a = \mathrm{Cov}(W, Y)$ (such an $a$ always exists).
2. $E(Y - \hat{Y})^2 = \mathrm{Var}(Y - \hat{Y}) = \mathrm{Var}(Y) - a'\,\mathrm{Cov}(W, Y) = \mathrm{Var}(Y) - a'\Gamma a$.
3. $\mathrm{Cov}(W, Y - \hat{Y}) = 0$.
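The theorem translates directly into a few lines of linear algebra. A minimal NumPy sketch (illustration only; the data-generating model and the use of sample covariances are assumptions of the example, not part of the slides) solves $\Gamma a = \mathrm{Cov}(W, Y)$ with a least-squares solver, which also covers a singular $\Gamma$ where the solution is not unique, and checks properties 2 and 3:

```python
import numpy as np

rng = np.random.default_rng(1)
m = 50_000
W = rng.standard_normal((m, 3))    # rows: samples of W = (W_n, ..., W_1)'
Y = 2.0 + W @ np.array([0.5, -1.0, 0.3]) + rng.standard_normal(m)

Gamma = np.cov(W, rowvar=False)                               # Cov matrix of W
c = np.array([np.cov(W[:, i], Y)[0, 1] for i in range(3)])    # Cov(W, Y)
a, *_ = np.linalg.lstsq(Gamma, c, rcond=None)                 # any solution works

Yhat = Y.mean() + (W - W.mean(axis=0)) @ a    # Yhat = E(Y) + a'(W - E(W))
resid = Y - Yhat
print([round(np.cov(W[:, i], resid)[0, 1], 4) for i in range(3)])  # ~ [0, 0, 0]
print(resid.var(), Y.var() - a @ c)           # both ~ Var(Y) - a' Cov(W, Y)
```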


Definition (Linear prediction)

Given a random vector $W = (W_n, \dots, W_1)'$ with covariance matrix $\Gamma$, a random variable $Y$ with finite variance, and $a$ any solution of $\Gamma a = \mathrm{Cov}(W, Y)$, we call $\hat{Y} := E(Y) + a'(W - E(W))$ the linear prediction of $Y$ given $W$, in symbols: $P(Y|W)$.

Theorem
The linear prediction has the following properties:

1. $E(Y - P(Y|W)) = 0$,
2. $P(\alpha_1 Y_1 + \alpha_2 Y_2 \,|\, W) = \alpha_1 P(Y_1|W) + \alpha_2 P(Y_2|W)$ for all $\alpha_1, \alpha_2 \in \mathbb{R}$,
3. $P(\sum_{i=1}^n \alpha_i W_i + \beta \,|\, W) = \sum_{i=1}^n \alpha_i W_i + \beta$ for all $\alpha_1, \dots, \alpha_n, \beta \in \mathbb{R}$,
4. $P(Y|W) = E(Y)$, when $\mathrm{Cov}(W, Y) = 0$.


Remarks

Obviously, $P(Y|W)$ is $\sigma(W_1, \dots, W_n)$-measurable, which implies

1. $E(Y - E(Y|W_1, \dots, W_n))^2 \le E(Y - P(Y|W))^2$,
2. If $E(Y|W_1, \dots, W_n)$ is linear in $W_1, \dots, W_n$, we have $P(Y|W) = E(Y|W_1, \dots, W_n)$.

If $W$ is a (univariate) random variable with positive variance, we have

$$P(Y|W) = \beta_0 + \beta_1 W$$

and

$$E(Y - P(Y|W))^2 = \mathrm{Var}(Y) - \frac{\mathrm{Cov}(W, Y)^2}{\mathrm{Var}(W)}$$

with

$$\beta_1 = \frac{\mathrm{Cov}(W, Y)}{\mathrm{Var}(W)} \quad\text{and}\quad \beta_0 = E(Y) - \beta_1 E(W).$$


Examples

For a white noise $(\varepsilon_t)_{t\in\mathbb{Z}}$, we investigate the stationary process $X_t = \varepsilon_t + \theta\varepsilon_{t-1}$ for $|\theta| < 1$.

1. $P(X_t|X_{t-1}) = \frac{\theta}{1+\theta^2}\, X_{t-1}$,
2. $E(X_t - P(X_t|X_{t-1}))^2 = \sigma_\varepsilon^2 \left(1 + \frac{\theta^4}{1+\theta^2}\right)$,
3. $P(X_t|X_{t-1}, X_{t-2}) = \frac{\theta(1+\theta^2)}{1+\theta^2+\theta^4}\, X_{t-1} - \frac{\theta^2}{1+\theta^2+\theta^4}\, X_{t-2}$,
4. $E(X_t - P(X_t|X_{t-1}, X_{t-2}))^2 = \sigma_\varepsilon^2 \left(1 + \frac{\theta^6}{1+\theta^2+\theta^4}\right)$,
5. $P(X_{t+h}|X_t, X_{t-1}) = 0$ for all $h \ge 2$.

For a random walk $(X_t)_{t\in\mathbb{N}_0}$ with drift $\alpha_0$ and initial value $x \in \mathbb{R}$ related to an independent white noise $\varepsilon$, we have for $t \ge 1$:

1. $P(X_t|X_{t-1}) = X_{t-1} + \alpha_0$,
2. $E(X_t - P(X_t|X_{t-1}))^2 = \sigma_\varepsilon^2$.
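For a concrete $\theta$, item 3 can be checked by solving the $2\times 2$ system $R_2 a = (\rho_X(1), \rho_X(2))'$ directly. A small sketch (assuming $\theta = 1/2$; only autocorrelations enter, so the noise variance drops out):

```python
import numpy as np

theta = 0.5
rho1 = theta / (1 + theta**2)        # MA(1): rho_X(1); rho_X(2) = 0
R2 = np.array([[1.0, rho1], [rho1, 1.0]])
a = np.linalg.solve(R2, np.array([rho1, 0.0]))
print(a)                             # [ 0.47619048 -0.19047619]

d = 1 + theta**2 + theta**4          # closed form from item 3 above
print(theta * (1 + theta**2) / d, -theta**2 / d)   # the same two numbers
```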


Linear forecasting of time series

We apply the linear prediction operator $P$ to calculate the linear forecast of $X_{n+h}$ given $X_1, \dots, X_n$, when the mean $\mu := E(X_t)$ and the autocovariance function $\gamma_X$ are known:

$$P(X_{n+h}|X_n, \dots, X_1) = \mu + \sum_{i=1}^n a_i (X_{n+1-i} - \mu), \quad\text{with } a = (a_1, \dots, a_n)'$$

any solution of $\Gamma_n a = \gamma_n(h)$, with

$$\Gamma_n = [\gamma_X(|i-j|)]_{i,j=1}^n = \begin{pmatrix} \gamma_X(0) & \gamma_X(1) & \dots & \gamma_X(n-1) \\ \gamma_X(1) & \gamma_X(0) & \dots & \gamma_X(n-2) \\ \vdots & \ddots & \ddots & \vdots \\ \gamma_X(n-1) & \gamma_X(n-2) & \dots & \gamma_X(0) \end{pmatrix}$$

and $\gamma_n(h) = (\gamma_X(h), \gamma_X(h+1), \dots, \gamma_X(h+n-1))'$.
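The formula above is a complete recipe, so it can be wrapped in a small helper. A sketch (the function name and interface are illustrative, not from the slides; `lstsq` is used because any solution of $\Gamma_n a = \gamma_n(h)$ works):

```python
import numpy as np

def linear_forecast(x, mu, gamma, h):
    """P(X_{n+h} | X_n, ..., X_1) for observations x = (x_1, ..., x_n),
    mean mu and autocovariances gamma[k] = gamma_X(k), k = 0, ..., n+h-1."""
    n = len(x)
    idx = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    Gamma_n = gamma[idx]                       # Toeplitz matrix [gamma(|i-j|)]
    gamma_nh = gamma[h:h + n]                  # (gamma(h), ..., gamma(h+n-1))'
    a, *_ = np.linalg.lstsq(Gamma_n, gamma_nh, rcond=None)
    return mu + a @ (x[::-1] - mu)             # a_i multiplies X_{n+1-i} - mu

# usage: MA(1) with theta = 1/2 and unit noise variance,
# gamma = (1.25, 0.5, 0, 0); one-step forecast from (x_1, x_2, x_3)
gamma = np.array([1.25, 0.5, 0.0, 0.0])
print(linear_forecast(np.array([0.3, -0.1, 0.8]), 0.0, gamma, 1))   # ~ 0.4471
```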


Remarks I

The equations comprising the system $\Gamma_n a = \gamma_n(1)$ are called Yule-Walker equations.

Dividing the Yule-Walker equations by $\gamma_X(0)$, we arrive at a system of equations making use only of the autocorrelations $\rho_X(h)$ with $h = 1, \dots, n$: $R_n a = \rho_n$, with

$$R_n = \begin{pmatrix} 1 & \rho_X(1) & \dots & \rho_X(n-1) \\ \rho_X(1) & 1 & \dots & \rho_X(n-2) \\ \vdots & \ddots & \ddots & \vdots \\ \rho_X(n-1) & \rho_X(n-2) & \dots & 1 \end{pmatrix}$$

and $\rho_n = (\rho_X(1), \rho_X(2), \dots, \rho_X(n))'$.


Remarks II

$\gamma_X(0) > 0$ and asymptotic uncorrelatedness (i.e. $\gamma_X(h) \to 0$ for $h \to \infty$) are sufficient conditions for $\Gamma_n$ to be non-singular for all $n \in \mathbb{N}$, and therefore also for the existence of a unique solution to the Yule-Walker equations.

In the following, for $x = (x_1, \dots, x_n)'$, $x^{(r)}$ denotes the reversed vector $x^{(r)} := (x_n, \dots, x_1)'$.

Due to the special structure of $R_n$ ($R_n$ is a symmetric Toeplitz matrix), we have the following lemma:

$$R_n x^{(r)} = (R_n x)^{(r)} \quad\text{for all } x \in \mathbb{R}^n.$$


Levinson-Durbin recursion

Theorem (Levinson-Durbin recursion)

For a weakly stationary process $X$ with acf $\rho_X$, we denote $v_n := E(X_{n+1} - P(X_{n+1}|X_1, \dots, X_n))^2$ and $\tilde{v}_n := \frac{v_n}{\gamma_X(0)}$. We then have, if $v_n > 0$:

1. If $a_n$ solves the equation $R_n a_n = \rho_n$, then $a_{n+1} := (a_{n+1,1}, \dots, a_{n+1,n+1})'$ with

$$a_{n+1,n+1} := \frac{\gamma_X(n+1) - a_n' \gamma_n(1)^{(r)}}{v_n} = \frac{\rho_X(n+1) - a_n' \rho_n^{(r)}}{\tilde{v}_n}$$

and $(a_{n+1,1}, \dots, a_{n+1,n})' = a_n - a_{n+1,n+1}\, a_n^{(r)}$ solves the equation $R_{n+1} a_{n+1} = \rho_{n+1}$.

2. $v_{n+1} = v_n(1 - a_{n+1,n+1}^2)$ and $\tilde{v}_{n+1} = \tilde{v}_n(1 - a_{n+1,n+1}^2)$.
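The recursion transcribes almost line by line into code. A sketch (indexing follows the theorem: the input is $\rho_X(1), \rho_X(2), \dots$; `a[n-1]` holds the solution $a_n$ of $R_n a_n = \rho_n$, and `vt[n]` the normalized mse $\tilde{v}_n$):

```python
import numpy as np

def levinson_durbin(rho):
    """rho[k] = rho_X(k+1). Returns the coefficient vectors a_1, a_2, ...
    and the normalized mean squared errors vtilde_0, vtilde_1, ..."""
    rho = np.asarray(rho, dtype=float)
    a = [rho[:1].copy()]                 # a_1 = rho_X(1) solves R_1 a = rho_1
    vt = [1.0, 1.0 - rho[0] ** 2]        # vtilde_0 = 1
    for n in range(1, len(rho)):
        an = a[-1]
        # reflection coefficient a_{n+1,n+1} = (rho(n+1) - a_n' rho_n^(r)) / vtilde_n
        k = (rho[n] - an @ rho[:n][::-1]) / vt[-1]
        a.append(np.concatenate([an - k * an[::-1], [k]]))
        vt.append(vt[-1] * (1.0 - k ** 2))
    return a, vt
```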


Remarks

1. The condition $v_n > 0$ appearing in the preceding theorem is equivalent to $\Gamma_{n+1}$ and $R_{n+1}$ being non-singular.

2. Therefore, under the condition stated in the theorem, $a_{n+1}$ is the unique solution to the Yule-Walker equations used to compute $P(X_{n+2}|X_1, \dots, X_{n+1})$.

Examples

1. For the weakly stationary process $X_t = \varepsilon_t + \frac{1}{2}\varepsilon_{t-1}$ with a white noise $(\varepsilon_t)_{t\in\mathbb{Z}}$, we have (verified numerically in the sketch after this list):

a) $P(X_4|X_3, X_2, X_1) = \frac{42}{85} X_3 - \frac{4}{17} X_2 + \frac{8}{85} X_1$,
b) $E(X_4 - P(X_4|X_3, X_2, X_1))^2 = \frac{341}{340}\, \sigma_\varepsilon^2$.


2. For the weakly stationary process $X_t = A\cos(\theta t) + B\sin(\theta t)$, with uncorrelated random variables $A$ and $B$ with vanishing mean and unit variance, we have:

a) $P(X_2|X_1) = \cos(\theta)\, X_1$,
b) $E(X_2 - P(X_2|X_1))^2 = \sin^2(\theta)$,
c) $P(X_3|X_2, X_1) = 2\cos(\theta)\, X_2 - X_1$,
d) $E(X_3 - P(X_3|X_2, X_1))^2 = 0$.

3. For a white noise $\varepsilon$ and a random variable $Y$ with mean $\mu_Y$ and positive variance $\sigma_Y^2$, uncorrelated with all $\varepsilon_t$, we define a weakly stationary process $X_t$ by $X_t := Y + \varepsilon_t$. Then we have for all $n \in \mathbb{N}$:

a) $P(X_{n+1}|X_n, \dots, X_1) = \frac{\sigma_\varepsilon^2}{n\sigma_Y^2 + \sigma_\varepsilon^2}\,\mu_Y + \frac{\sigma_Y^2}{\sigma_Y^2 + \sigma_\varepsilon^2/n} \cdot \frac{X_n + \dots + X_1}{n}$,
b) $E(X_{n+1} - P(X_{n+1}|X_n, \dots, X_1))^2 = \sigma_\varepsilon^2 \left(1 + \frac{\sigma_Y^2}{\sigma_\varepsilon^2 + n\sigma_Y^2}\right)$.
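The fractions in example 1 can be reproduced with the `levinson_durbin` sketch from the Levinson-Durbin slide above: with $\theta = \frac12$ we have $\rho_X(1) = 0.4$, $\rho_X(2) = \rho_X(3) = 0$ and $\gamma_X(0) = 1.25\,\sigma_\varepsilon^2$.

```python
a, vt = levinson_durbin([0.4, 0.0, 0.0])
print(a[-1])          # [ 0.49411765 -0.23529412  0.09411765] = (42/85, -4/17, 8/85)
print(1.25 * vt[-1])  # 1.00294... = 341/340, in units of sigma_eps^2
```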


Partial correlation

For three random variables $Y_1$, $Y_2$ and $Y_3$, it may happen that $Y_3$ is highly correlated with both $Y_1$ and $Y_2$. In that case $Y_1$ and $Y_2$ may also be highly correlated, but this correlation may stem mostly from the impact of $Y_3$ on $Y_1$ and $Y_2$. Partial correlation is used to measure the association between $Y_1$ and $Y_2$ once this impact has been removed.

Definition (Partial correlation)

We take as given two univariate random variables $Y_1$, $Y_2$ and a (possibly multivariate) random variable $Y_3$. We then call the correlation of $\tilde{Y}_1 := Y_1 - P(Y_1|Y_3)$ and $\tilde{Y}_2 := Y_2 - P(Y_2|Y_3)$ the partial correlation of $Y_1$ and $Y_2$ given $Y_3$, in symbols:

$$\mathrm{Corr}(Y_1, Y_2|Y_3) := \mathrm{Corr}(Y_1 - P(Y_1|Y_3),\, Y_2 - P(Y_2|Y_3)).$$


Examples

For a white noise $(\varepsilon_t)_{t\in\mathbb{Z}}$, we consider the stochastic process $X$ with $X_t = \phi X_{t-1} + \varepsilon_t$ for all $t \in \mathbb{Z}$ and some $\phi \in \mathbb{R}$ with $|\phi| < 1$. We then have

$$\mathrm{Corr}(X_0, X_2|X_1) = 0.$$

For the weakly stationary process $X_t = \varepsilon_t + \theta\varepsilon_{t-1}$ with an independent white noise $(\varepsilon_t)_{t\in\mathbb{Z}}$ and $|\theta| < 1$, we have

$$\mathrm{Corr}(X_0, X_2|X_1) = -\frac{\theta^2}{1 + \theta^2 + \theta^4}.$$
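The first claim can be checked by simulation. A sketch (assumptions: Gaussian noise for the simulated path, and the residual $Y_i - P(Y_i|Y_3)$ computed from sample moments):

```python
import numpy as np

rng = np.random.default_rng(2)
n, phi = 200_000, 0.7
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + rng.standard_normal()
x0, x1, x2 = x[:-2], x[1:-1], x[2:]        # aligned copies of X_0, X_1, X_2

def residual(y, w):
    """y - P(y | w) for univariate w, using sample moments."""
    beta1 = np.cov(w, y, bias=True)[0, 1] / w.var()
    return y - y.mean() - beta1 * (w - w.mean())

print(np.corrcoef(residual(x0, x1), residual(x2, x1))[0, 1])   # ~ 0 for AR(1)
```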


Partial autocorrelation function (PACF)

Definition (Partial autocorrelation function, PACF)

For a weakly stationary process $(X_t)_{t\in\mathbb{Z}}$ and $h \in \mathbb{N} \setminus \{1\}$, the partial correlation of $X_0$ and $X_h$ given $X_1, \dots, X_{h-1}$ is called the partial autocorrelation $\alpha_X(h)$ of lag $h$. For $h = 1$, one defines $\alpha_X(1) := \rho_X(1)$.
The function $\alpha_X: \mathbb{N} \to \mathbb{R}$, $h \mapsto \alpha_X(h)$ is called the partial autocorrelation function (PACF) of $X$.

Theorem
For a weakly stationary process $X$ with PACF $\alpha_X$,

$$\alpha_X(n+1) = a_{n+1,n+1},$$

with $a_{n+1,n+1}$ denoting the coefficients obtained from the Levinson-Durbin recursion.
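Combined with this theorem, the `levinson_durbin` sketch from above yields the PACF directly: $\alpha_X(n+1)$ is the last entry of $a_{n+1}$. For the MA(1) process this reproduces the partial correlation $-\theta^2/(1+\theta^2+\theta^4)$ computed two slides earlier:

```python
import numpy as np

theta = 0.5
rho = np.zeros(5)
rho[0] = theta / (1 + theta**2)      # MA(1): rho_X(1) = 0.4, higher lags vanish
a, _ = levinson_durbin(rho)
pacf = [an[-1] for an in a]          # alpha_X(1), ..., alpha_X(5)
print(pacf[1])                                   # -0.19047...
print(-theta**2 / (1 + theta**2 + theta**4))     # the same value
```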


Definition (h-step forecast, forecast error)

1. For a weakly stationary process $X = (X_t)_{t\in\mathbb{Z}}$, we call

a) $P(X_{t+h}|X_t, X_{t-1}, \dots, X_{t+1-n})$ for $h \in \mathbb{N}$ and $n \in \mathbb{N}$ the h-step forecast of order $n$,
b) $\Delta_n(h) := E(X_{t+h} - P(X_{t+h}|X_t, X_{t-1}, \dots, X_{t+1-n}))^2$ the mean squared (forecast) error (mse) of the h-step forecast of order $n$,
c) $P(X_{t+h}|X_t, X_{t-1}, \dots) := \lim_{n\to\infty} P(X_{t+h}|X_t, X_{t-1}, \dots, X_{t+1-n})$ the h-step forecast (based on an infinite past),
d) $\Delta(h) := \lim_{n\to\infty} \Delta_n(h) = E(X_{t+h} - P(X_{t+h}|X_t, X_{t-1}, \dots))^2$ the mse of the h-step forecast.

2. If $\Delta(1) = 0$, the process $X$ is called deterministic, singular or exactly linearly predictable.

3. If $\lim_{h\to\infty} \Delta(h) = \mathrm{Var}(X_0)$, we call the process $X$ purely non-deterministic.


Remarks I

We always have $\Delta(h) \le \Delta_{n+1}(h) \le \Delta_n(h)$ for all $n, h \in \mathbb{N}$.

The limit appearing in the definition of the h-step forecast, $P(X_{t+h}|X_t, X_{t-1}, \dots) := \lim_{n\to\infty} P(X_{t+h}|X_t, X_{t-1}, \dots, X_{t+1-n})$, is to be understood as mse-convergence, i.e.

$$\lim_{n\to\infty} E\big(P(X_{t+h}|X_t, X_{t-1}, \dots) - P(X_{t+h}|X_t, X_{t-1}, \dots, X_{t+1-n})\big)^2 = 0.$$

A weakly stationary process is deterministic if and only if $\Delta(h) = 0$ for all $h \in \mathbb{N}$.


Remarks II

As $X_{t+h}$ can always be forecast by $E(X_{t+h})$ with an mse of $\mathrm{Var}(X_{t+h}) = \mathrm{Var}(X_0)$, we have $\Delta_n(h) \le \mathrm{Var}(X_0)$ and $\Delta(h) \le \mathrm{Var}(X_0)$. The defining condition for purely non-deterministic processes, $\lim_{h\to\infty} \Delta(h) = \mathrm{Var}(X_0)$, therefore corresponds to the intuition that the process' past does not significantly contribute to the forecasting of the process' values far in the future.


Examples

1. The weakly stationary process $X_t = A\cos(\theta t) + B\sin(\theta t)$ with uncorrelated random variables $A$ and $B$ with zero mean and unit variance is deterministic.

2. The MA(1) process $X_t := \varepsilon_t + \theta\varepsilon_{t-1}$ is purely non-deterministic, as $P(X_{t+h}|X_t, X_{t-1}, \dots, X_{t+1-n}) = 0$ for all $h \ge 2$ and $n \in \mathbb{N}$.

3. The AR(1) process $X_t = \phi X_{t-1} + \varepsilon_t$ with $|\phi| \ne 1$ is purely non-deterministic.

4. The process $X_t := Y + \varepsilon_t$ with a white noise $\varepsilon$ and a random variable $Y$ with positive variance, uncorrelated with all $\varepsilon_t$, is neither deterministic nor purely non-deterministic.


Theorem (Wold decomposition)

If $X = (X_t)_{t\in\mathbb{Z}}$ is weakly stationary but not deterministic, the following holds true:

1. $Z_t := X_t - P(X_t|X_{t-1}, X_{t-2}, \dots)$ is a white noise with variance $\Delta(1)$ and $Z_t = P(Z_t|X_t, X_{t-1}, \dots)$ for all $t \in \mathbb{Z}$.

2. For $\psi_j := \frac{E(X_t Z_{t-j})}{\Delta(1)}$ ($j \in \mathbb{N}_0$), we have $\sum_{j=0}^{\infty} \psi_j^2 < \infty$. The process $U_t := \sum_{j=0}^{\infty} \psi_j Z_{t-j}$ is weakly stationary and purely non-deterministic.

3. $V_t := X_t - U_t = X_t - \sum_{j=0}^{\infty} \psi_j Z_{t-j}$ is weakly stationary and deterministic. We have $V_t = P(V_t|X_s, X_{s-1}, \dots)$ for all $s, t \in \mathbb{Z}$.

4. $U$ and $V$ are uncorrelated, i.e. $\mathrm{Cov}(U_s, V_t) = 0$ for all $s, t \in \mathbb{Z}$.


Remarks

1. The Wold decomposition $X_t = U_t + V_t$ with uncorrelated, purely non-deterministic $U$ and deterministic $V$ is unique.

2. The purely non-deterministic component $U$ has an MA($\infty$)-representation as a weighted average of the white noise $Z$: $U_t = \sum_{j=0}^{\infty} \psi_j Z_{t-j}$.

3. $Z_t = P(Z_t|X_t, X_{t-1}, \dots)$ means that $Z_t$ is known when the (infinite) past of $X$ up to time $t$ is known.

4. $V_t = P(V_t|X_s, X_{s-1}, \dots)$ means that $V_t$ is known as soon as the (infinite) past of $X$ up to some point $s$ (possibly preceding $t$!) is known.

5. Because of
a) $\mathrm{Cov}(Z_t, X_{t-j}) = 0$ for all $t \in \mathbb{Z}$, $j \in \mathbb{N}$, and
b) $\mathrm{Cov}(Z_t, P(X_t|X_{t-1}, X_{t-2}, \dots)) = 0$ for all $t \in \mathbb{Z}$,
$Z_t$ is called the innovation.


Examples

1. For a white noise $\varepsilon$ and a random variable $Y$ with mean $\mu_Y$ and positive variance $\sigma_Y^2$, uncorrelated with all $\varepsilon_t$, we define a weakly stationary process $X_t$ by $X_t := Y + \varepsilon_t$. Then we have:

a) $P(X_t|X_{t-1}, X_{t-2}, \dots) = Y$, $Z_t = \varepsilon_t$,
b) $\psi_j = \begin{cases} 1, & j = 0 \\ 0, & j > 0 \end{cases}$, $U_t = \varepsilon_t$, $V_t = Y$.

2. For the AR(1)-process $X_t = \phi X_{t-1} + \varepsilon_t$ with $|\phi| > 1$ and a white noise $\varepsilon$, we have:

a) $P(X_t|X_{t-1}, X_{t-2}, \dots) = \frac{1}{\phi} X_{t-1}$,
b) $Z_t = X_t - \frac{1}{\phi} X_{t-1}$, $\Delta(1) = \frac{\sigma_\varepsilon^2}{\phi^2}$,
c) $\psi_j = \frac{1}{\phi^j}$, $U_t = X_t$, $V_t = 0$.
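Example 2 can be made concrete by simulation. A sketch (assumptions: $\phi = 2$, unit noise variance, and the stationary solution truncated after 60 future noise terms): build $X_t = -\sum_{j\ge 1} \varepsilon_{t+j}/\phi^j$, then check that $Z_t = X_t - \frac{1}{\phi}X_{t-1}$ is white noise with variance $\sigma_\varepsilon^2/\phi^2$.

```python
import numpy as np

rng = np.random.default_rng(3)
phi, n, trunc = 2.0, 50_000, 60
eps = rng.standard_normal(n + trunc)
j = np.arange(1, trunc)
w = 1.0 / phi ** j
x = np.array([-(eps[t + j] * w).sum() for t in range(n)])   # future-noise sum

z = x[1:] - x[:-1] / phi                     # candidate innovations Z_t
print(z.var())                               # ~ 1/phi^2 = 0.25 = Delta(1)
print(np.corrcoef(z[1:], z[:-1])[0, 1])      # ~ 0: Z is (lag-1) uncorrelated
```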


Definition (Causality, Invertibility)

1. We call a weakly stationary process $X$ causal w.r.t. a white noise $\varepsilon$ if there exist real numbers $(\psi_j)_{j\in\mathbb{N}_0}$ with $\sum_{j=0}^{\infty} |\psi_j| < \infty$ and $X_t = \sum_{j=0}^{\infty} \psi_j \varepsilon_{t-j}$ for all $t \in \mathbb{Z}$.

2. We call a weakly stationary process $X$ invertible w.r.t. a white noise $\varepsilon$ if there exist real numbers $(\psi_j)_{j\in\mathbb{N}_0}$ with $\sum_{j=0}^{\infty} |\psi_j| < \infty$ and $\varepsilon_t = \sum_{j=0}^{\infty} \psi_j X_{t-j}$ for all $t \in \mathbb{Z}$.


Examples

1. For the AR(1)-process $X_t = \phi X_{t-1} + \varepsilon_t$ with $|\phi| \ne 1$ and a white noise $\varepsilon$, we have:

a) $X$ is invertible w.r.t. $\varepsilon$.
b) If $|\phi| < 1$, $X$ is causal w.r.t. $\varepsilon$.
c) If $|\phi| > 1$, $X$ is not causal w.r.t. $\varepsilon$. In this case, $X$ is causal w.r.t. the white noise $Z_t := X_t - \frac{1}{\phi} X_{t-1}$.

2. For the MA(1)-process $X_t := \varepsilon_t + \theta\varepsilon_{t-1}$ with $\theta \in \mathbb{R}$ and a white noise $\varepsilon$, we have:

a) $X$ is causal w.r.t. $\varepsilon$.
b) If $|\theta| < 1$, $X$ is invertible w.r.t. $\varepsilon$ (see the sketch after this list).
c) If $|\theta| > 1$, $X$ is not invertible w.r.t. $\varepsilon$. In this case, $X$ is invertible w.r.t. the white noise $Z_t := X_t - \frac{1}{\theta} X_{t-1} + \frac{1}{\theta^2} X_{t-2} \mp \dots$.
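Invertibility in case 2b can be illustrated numerically: for $|\theta| < 1$, $\varepsilon_t = \sum_{j=0}^{\infty} (-\theta)^j X_{t-j}$, and truncating the series recovers the noise up to a geometrically small error. A sketch (assuming $\theta = 1/2$ and Gaussian noise):

```python
import numpy as np

rng = np.random.default_rng(4)
theta, n, trunc = 0.5, 10_000, 40
eps = rng.standard_normal(n + trunc)
x = eps[1:] + theta * eps[:-1]          # x[t] = eps_{t+1} + theta * eps_t

j = np.arange(trunc)
w = (-theta) ** j                       # weights of the inverted representation
rec = np.array([(w * x[t - j]).sum() for t in range(trunc, n)])
print(np.abs(rec - eps[trunc + 1:n + 1]).max())   # ~ 1e-12 (truncation error)
```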