Transcript of pluto.huji.ac.il/~msby/time-files/ARMA_2010.pdf

  • Moving Average (MA) Models

    General representation of MA models

    $Y_t = \mu + \varepsilon_t - \theta_1\varepsilon_{t-1} - \dots - \theta_q\varepsilon_{t-q}$ ↔ $Y_t \sim MA(q)$

    • Can be viewed as a (finite) approximation for the infinite sum of random innovations (Wold Theorem, see later).

    • The negative signs before the coefficients are only a notational convention and carry no substantive meaning. A positive coefficient for $\varepsilon_{t-k}$ simply requires $\theta_k < 0$, $k = 1, 2, \dots$

    Properties

    $E(Y_t) = \mu$;  $Var(Y_t) = \bigl(1 + \sum_{j=1}^{q}\theta_j^2\bigr)\sigma^2$

    $\gamma_k = (-\theta_k + \theta_1\theta_{k+1} + \dots + \theta_{q-k}\theta_q)\,\sigma^2$ for $k \le q$

    $\gamma_k = 0$ for $k > q$ ⇒ cut-off after lag $q$.

    Example: In a rotating panel survey, if units are in the sample for q+1 successive time points (like in Labour Force Surveys), the model holding for the sample estimates is MA(q).

    • MA models are often obtained after transforming the original series. For example, $Y_t = a + bt + \varepsilon_t \Rightarrow Y_t - Y_{t-1} = b + \varepsilon_t - \varepsilon_{t-1} \sim MA(1)$.
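    The cut-off in the autocovariances is easy to see empirically. The following is a minimal simulation sketch (not part of the original slides; the MA(2) parameter values are hypothetical) that generates an MA(2) series and checks that the sample autocorrelations are essentially zero after lag 2.

```python
import numpy as np

# Minimal sketch (assumed parameters): simulate Y_t = mu + e_t - th1*e_{t-1} - th2*e_{t-2}
# and verify that the sample autocorrelations cut off after lag q = 2.
rng = np.random.default_rng(0)
mu, th1, th2 = 10.0, 0.6, -0.3
n = 20_000

eps = rng.normal(size=n + 2)
y = mu + eps[2:] - th1 * eps[1:-1] - th2 * eps[:-2]

def sample_acf(z, nlags):
    z = z - z.mean()
    c0 = z @ z / len(z)
    return np.array([z[:-k] @ z[k:] / len(z) / c0 for k in range(1, nlags + 1)])

print(np.round(sample_acf(y, 5), 3))   # lags 1-2 clearly nonzero, lags 3-5 near 0
```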

  • 2

    Example and alternative representation

    Example: MA(1) model, $Y_t - \mu = \varepsilon_t - \theta\varepsilon_{t-1}$

    $Var(Y_t) = (1+\theta^2)\sigma^2$,  $Cov(Y_t, Y_{t-1}) = -\theta\sigma^2$

    ⇒ $Corr(Y_t, Y_{t-1}) = \dfrac{-\theta}{1+\theta^2}$ and $|Corr(Y_t, Y_{t-1})| \le \dfrac{1}{2}$.

    • Important for model identification. When $\theta < 0$, the series is 'smooth'. When $\theta > 0$, the series is 'jumpy'.
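    A quick numeric check of the bound (not from the slides): the lag-1 autocorrelation $-\theta/(1+\theta^2)$ never exceeds 1/2 in absolute value, with the extremes attained at $\theta = \mp 1$.

```python
import numpy as np

# Evaluate rho1(theta) = -theta/(1 + theta**2) on a grid and confirm |rho1| <= 1/2.
theta = np.linspace(-5, 5, 2001)
rho1 = -theta / (1 + theta**2)
print(round(np.abs(rho1).max(), 6))      # 0.5
print(theta[np.argmax(np.abs(rho1))])    # attained at theta = -1 (and +1)
```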

    For a general MA(q) process,

    $\max \rho_k = \begin{cases} \cos[\pi/(N+1)] & \text{if } k \text{ divides } (q+1) \\ \cos[\pi/(N+2)] & \text{otherwise,} \end{cases}$  $1 \le k \le q$,

    where $N$ is the largest integer not exceeding $(q+1)/k$.

    Back-shift operator: $BX_t = X_{t-1}$ ⇒ $B^2X_t = B(BX_t) = X_{t-2}$, $\dots$, $B^kX_t = X_{t-k}$.

    Alternative model representation:

    $Y_t - \mu = (1 - \theta_1 B - \theta_2 B^2 - \dots - \theta_q B^q)\,\varepsilon_t = \Theta(B)\varepsilon_t$

    $\Theta(B)$ is a polynomial in $B$ of order $q$.

    • MA processes are always stationary.

  • 3

    Invertibility

    Question: Suppose that the series of interest follows an MA(q) process. Is the model identifiable in the sense that the model coefficients are determined uniquely by the autocorrelations of the process? Answer: NO.

    Example: the MA(1) processes $Y_t - \mu = \varepsilon_t + 0.5\varepsilon_{t-1}$ and $Y_t - \mu = \varepsilon_t + 2\varepsilon_{t-1}$ both have $\rho_1 = 0.4$.

    Let $X_t = (Y_t - \mu)$. Then for any MA(1) process,

    $X_t = \varepsilon_t - \theta\varepsilon_{t-1} = \varepsilon_t - \theta(X_{t-1} + \theta\varepsilon_{t-2}) = \varepsilon_t - \theta X_{t-1} - \theta^2\varepsilon_{t-2} = \dots = \varepsilon_t - \sum_{i=1}^{\infty}\theta^i X_{t-i}$

    • For $|\theta| > 1$ we assign more and more weight to remote observations, which makes little sense!

    • In this example we will therefore choose $\theta = -0.5$ (the first of the two processes above).
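    A small numeric check (not from the slides): in the $Y_t - \mu = \varepsilon_t - \theta\varepsilon_{t-1}$ parametrization the two processes above correspond to $\theta = -0.5$ and $\theta = -2$, and both give $\rho_1 = 0.4$, so $\rho_1$ alone cannot identify $\theta$; only $|\theta| < 1$ yields a sensible (invertible) representation.

```python
# rho1 = -theta/(1 + theta**2) takes the same value for theta and 1/theta.
for theta in (-0.5, -2.0):
    print(theta, -theta / (1 + theta**2))   # both print 0.4
```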

  • 4

    Invertibility (cont.)

    Definition: An MA process is invertible if it can be written as an infinite autoregressive process,

    $\tilde Y_t = A_1\tilde Y_{t-1} + A_2\tilde Y_{t-2} + \dots + \varepsilon_t$,

    with coefficients $A_j$ that converge to zero ($\sum_{j=1}^{\infty}|A_j| < \infty$). ($\tilde Y_t = Y_t - \mu$.)

    Theorem: The MA(q) process is invertible if and only if the roots of the polynomial equation $\Theta(B) = 0$ are larger than 1 in absolute value, $|B_k| > 1$ (outside the unit circle when the root $B_k$ is a complex number).

    Example: For the MA(1) process $\tilde Y_t = (1 - \theta B)\varepsilon_t$,

    $\Theta(B) = 1 - \theta B = 0 \;\Leftrightarrow\; B = 1/\theta$;  $|B| > 1 \;\Leftrightarrow\; |\theta| < 1$.
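    In practice the roots can be checked numerically. The sketch below (not from the slides) tests invertibility of an MA(q) model by computing the roots of $\Theta(B)$ with numpy and checking that they all lie outside the unit circle; the $\theta$ values are illustrative.

```python
import numpy as np

def is_invertible(theta):
    """Check whether all roots of Theta(B) = 1 - theta_1*B - ... - theta_q*B^q
    lie outside the unit circle (np.roots expects the highest-degree coefficient first)."""
    coeffs = np.r_[-np.asarray(theta, dtype=float)[::-1], 1.0]
    return bool(np.all(np.abs(np.roots(coeffs)) > 1))

print(is_invertible([0.5]))        # True  (root B = 2)
print(is_invertible([2.0]))        # False (root B = 0.5)
print(is_invertible([0.6, -0.3]))  # True  (complex roots outside the unit circle)
```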

  • 5

    Invertibility, general case

    $\tilde Y_t = \Theta(B)\varepsilon_t$, and we ask whether we can 'divide', i.e., write $\varepsilon_t = [1/\Theta(B)]\tilde Y_t$ with convergent coefficients.

    Write $\Theta(B) = (1 - H_1 B)\times\dots\times(1 - H_q B)$, where the $(1/H_i)$ are the roots of $\Theta(B) = 0$. If $|1/H_i| > 1$ for all $i$, then $|H_i| < 1$ for all $i$, each factor can be expanded as $1/(1 - H_i B) = \sum_{j\ge 0} H_i^j B^j$ with convergent coefficients, and the division is possible.

  • 6

    Autoregressive (AR) Models

    General Representation of AR Models

    $X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + \dots + \phi_p X_{t-p} + \varepsilon_t$;  $X_t = (Y_t - \mu)$

    • This looks like an ordinary regression model, but the explanatory variables are past observations and hence are dependent on each other and on the residuals of other equations. Nonetheless, if the series is stationary (see next slide), the usual least squares estimators are consistent for the $\phi$-coefficients and are best asymptotically normal (BAN). Also, the usual mean residual sum of squares is a consistent estimator of $\sigma^2 = Var(\varepsilon_t)$. (A sketch of this estimation approach appears below.)
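    A minimal sketch of this least squares approach (not from the slides; the AR(2) coefficients are hypothetical): regress $X_t$ on its own lags and compare the estimates with the true values.

```python
import numpy as np

rng = np.random.default_rng(1)
phi = np.array([0.6, -0.3])          # assumed true AR(2) coefficients
n, p = 5_000, len(phi)

# simulate the AR(p) series
x = np.zeros(n)
eps = rng.normal(size=n)
for t in range(p, n):
    x[t] = phi @ x[t - p:t][::-1] + eps[t]

# least squares regression of X_t on X_{t-1}, ..., X_{t-p}
X = np.column_stack([x[p - j - 1:n - j - 1] for j in range(p)])
y = x[p:]
phi_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
sigma2_hat = np.mean((y - X @ phi_hat) ** 2)
print(np.round(phi_hat, 3), round(sigma2_hat, 3))   # close to [0.6, -0.3] and 1.0
```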

    Polynomial representation

    $\Phi(B)X_t = \varepsilon_t$;  $\Phi(B) = 1 - \phi_1 B - \dots - \phi_p B^p$  →  $Y_t \sim AR(p)$

    • Autoregressive models are always invertible.

  • 7

    Condition for stationarity

    An AR(p) process is stationary if and only if the roots of the polynomial equation $\Phi(B) = 0$ are larger than 1 in absolute value, $|B_k| > 1$ (the root $B_k$ is outside the unit circle when $B_k$ is complex).

    Example: For an AR(1) process $(1 - \phi B)X_t = \varepsilon_t$,

    $\Phi(B) = 1 - \phi B = 0 \;\Leftrightarrow\; B = 1/\phi$;  $|B| > 1 \;\Leftrightarrow\; |\phi| < 1$.

  • 8

    Condition for stationarity (cont.)

    • When at least one of the roots of $\Phi(B) = 0$ is inside the unit circle, the series "explodes". Example: $X_t = 3X_{t-1} + \varepsilon_t$ (the variance grows geometrically).

    • When $d$ roots equal 1 and the rest are outside the unit circle, we may write $\Phi(B) = \Phi_1(B)(1-B)^d$, where the roots of $\Phi_1(B) = 0$ are all outside the unit circle. Hence,

    $\Phi(B)X_t = \Phi_1(B)(1-B)^d X_t = \Phi_1(B)U_t$

    and $U_t = (1-B)^d X_t$ is stationary.

    • This result shows why very often stationarity is achieved by differencing.

    Example: Random walk, $X_t = X_{t-1} + \varepsilon_t$, so $(1 - B)X_t = \varepsilon_t$ ⇒ white noise and hence stationary.
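    A minimal simulation sketch (not from the slides): a random walk is nonstationary, but its first difference behaves like white noise, with sample autocorrelations near zero at all lags.

```python
import numpy as np

rng = np.random.default_rng(2)
eps = rng.normal(size=10_000)
x = np.cumsum(eps)          # random walk: X_t = X_{t-1} + eps_t
u = np.diff(x)              # (1 - B)X_t = eps_t

def sample_acf(z, nlags):
    z = z - z.mean()
    c0 = z @ z / len(z)
    return np.array([z[:-k] @ z[k:] / len(z) / c0 for k in range(1, nlags + 1)])

print(np.round(sample_acf(u, 5), 3))   # all close to 0
```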

  • 9

    Variance and correlations of AR(P) processes

    Let $X_t = Y_t - \mu$ and suppose

    $X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + \dots + \phi_p X_{t-p} + \varepsilon_t$

    ⇓

    $X_t^2 = \phi_1 X_t X_{t-1} + \phi_2 X_t X_{t-2} + \dots + \phi_p X_t X_{t-p} + X_t\varepsilon_t$.

    Taking expectations on both sides and noting that $E(X_t) = E(Y_t - \mu) = 0$ yields

    $\gamma_0 = \phi_1\gamma_1 + \phi_2\gamma_2 + \dots + \phi_p\gamma_p + \sigma_\varepsilon^2$;  [$E(X_t\varepsilon_t) = E(\varepsilon_t^2)$]

    $\gamma_k = \gamma_0\rho_k$ ⇒ $\gamma_0(1 - \phi_1\rho_1 - \phi_2\rho_2 - \dots - \phi_p\rho_p) = \sigma_\varepsilon^2$,

    or $\gamma_0 = \sigma_\varepsilon^2/(1 - \phi_1\rho_1 - \phi_2\rho_2 - \dots - \phi_p\rho_p)$.

    Similarly, $X_t X_{t-k} = \phi_1 X_{t-1}X_{t-k} + \phi_2 X_{t-2}X_{t-k} + \dots + \phi_p X_{t-p}X_{t-k} + \varepsilon_t X_{t-k}$. Taking expectations,

    $\gamma_k = \phi_1\gamma_{k-1} + \phi_2\gamma_{k-2} + \dots + \phi_p\gamma_{k-p}$

    ⇓ Yule-Walker (Y-W): $\rho_k = \phi_1\rho_{k-1} + \phi_2\rho_{k-2} + \dots + \phi_p\rho_{k-p}$, i.e., $\Phi(B)\rho_k = 0$  ($\rho_{-k} = \rho_k$)

  • 10

    Examples

    For the AR(1) process $X_t = \phi X_{t-1} + \varepsilon_t$,

    $\rho_1 = \rho_0\phi = \phi$, …, $\rho_k = \rho_{k-1}\phi = \dots = \phi^k$

    The autocorrelations decay to zero geometrically.

    For AR(2), $X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + \varepsilon_t$,

    $\rho_1 = \phi_1\rho_0 + \phi_2\rho_{-1} = \phi_1 + \phi_2\rho_1$

    $\rho_2 = \phi_1\rho_1 + \phi_2\rho_0 = \phi_1\rho_1 + \phi_2$

    ⇓

    $\rho_1 = \phi_1/(1 - \phi_2)$;  $\rho_2 = \phi_2 + [\phi_1^2/(1 - \phi_2)]$.

    Once we know 1ρ and 2ρ the other correlations are obtained from the Y-W equations,

    $\rho_3 = \phi_1\rho_2 + \phi_2\rho_1$, $\rho_4 = \phi_1\rho_3 + \phi_2\rho_2$, …
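    A short sketch (hypothetical $\phi_1$, $\phi_2$): starting from $\rho_0 = 1$ and $\rho_1 = \phi_1/(1-\phi_2)$, the remaining autocorrelations of an AR(2) follow directly from the Y-W recursion.

```python
import numpy as np

phi1, phi2 = 0.5, 0.3          # assumed stationary AR(2) coefficients
rho = np.empty(11)
rho[0] = 1.0
rho[1] = phi1 / (1 - phi2)
for k in range(2, 11):         # Yule-Walker recursion rho_k = phi1*rho_{k-1} + phi2*rho_{k-2}
    rho[k] = phi1 * rho[k - 1] + phi2 * rho[k - 2]
print(np.round(rho[1:], 4))
```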

  • 11

    General solution of Y-W equations

    For the general AR(p) process, if all the roots are distinct,

    $\rho_k = \sum_{r=1}^{p} A_r G_r^k$,

    where $\{(1/G_r),\ r = 1,\dots,p\}$ are the solutions of $\Phi(B) = 0$ and the $A_r$'s are constants.

    Importance of general solution

    For a stationary process $|1/G_r| > 1$ for all $r$, and hence $|G_r| < 1$ and $\rho_k \to 0$ as $k \to \infty$.

    Necessary condition for stationarity. (In practice we plot the correlogram of the estimated correlations and check decay to zero.)

    Some of the roots $\{(1/G_r),\ r = 1,\dots,p\}$ can be complex, in which case the decay is sinusoidal.

    The coefficients $A_r$ can be computed by solving the $p$ equations

    $\rho_k = \sum_{r=1}^{p} A_r G_r^k$,  $k = 0, 1, \dots, p-1$,

    with the $A_r$'s as the unknowns. The first equation ($k = 0$, $\rho_0 = 1$) is $\sum_{r=1}^{p} A_r = 1$.

    Example:

    $Y_t = Y_{t-1} - 0.24Y_{t-2} + \varepsilon_t$ ⇒ $G_1^{-1} = 1/0.6$, $G_2^{-1} = 1/0.4$;

    $\rho_1 = \phi_1/(1 - \phi_2) = 0.8065$ ⇒ $A_1 = 2.03$, $A_2 = -1.03$.
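    A numeric check of this example (values as stated on the slide): the general solution $\rho_k = 2.03(0.6)^k - 1.03(0.4)^k$ should reproduce, up to the rounding of $A_1, A_2$, the Y-W recursion $\rho_k = \rho_{k-1} - 0.24\rho_{k-2}$.

```python
import numpy as np

phi1, phi2 = 1.0, -0.24
G1, G2, A1, A2 = 0.6, 0.4, 2.03, -1.03

rho = [1.0, phi1 / (1 - phi2)]                   # rho_0, rho_1 = 0.8065
for k in range(2, 8):                            # Yule-Walker recursion
    rho.append(phi1 * rho[-1] + phi2 * rho[-2])

general = [A1 * G1**k + A2 * G2**k for k in range(8)]
print(np.round(rho, 4))
print(np.round(general, 4))                      # agrees up to rounding
```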

  • 12

    Estimation of AR(p) coefficients by use of autocorrelation estimates

    Y-W equations:

    $\rho_1 = \phi_1 + \rho_1\phi_2 + \dots + \rho_{p-1}\phi_p$

    $\rho_2 = \rho_1\phi_1 + \phi_2 + \dots + \rho_{p-2}\phi_p$    (Note: $\rho_{-k} = \rho_k$)

    ⋮

    $\rho_p = \rho_{p-1}\phi_1 + \rho_{p-2}\phi_2 + \dots + \phi_p$

    In matrix form,

    $\begin{bmatrix} 1 & \rho_1 & \cdots & \rho_{p-1} \\ \rho_1 & 1 & \cdots & \rho_{p-2} \\ \vdots & \vdots & & \vdots \\ \rho_{p-1} & \rho_{p-2} & \cdots & 1 \end{bmatrix} \begin{bmatrix} \phi_1 \\ \phi_2 \\ \vdots \\ \phi_p \end{bmatrix} = \begin{bmatrix} \rho_1 \\ \rho_2 \\ \vdots \\ \rho_p \end{bmatrix}$

    i.e., $\Omega\,\tilde\Phi = \tilde\rho$ ⇒ $\tilde\Phi = \Omega^{-1}\tilde\rho$, and $\hat{\tilde\Phi} = \hat\Omega^{-1}\hat{\tilde\rho}$.

    Question: Why is Ω invertible?

    Example: AR(2);

    $\phi_1 + \rho_1\phi_2 = \rho_1$,  $\rho_1\phi_1 + \phi_2 = \rho_2$

    ⇒ $\phi_1 = \dfrac{\rho_1(1 - \rho_2)}{1 - \rho_1^2}$;  $\phi_2 = \dfrac{\rho_2 - \rho_1^2}{1 - \rho_1^2}$.
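    A minimal sketch (not from the slides; the autocorrelations are hypothetical): solve the AR(2) Y-W system numerically and compare with the closed-form expressions above.

```python
import numpy as np

rho1, rho2 = 0.6, 0.4
omega = np.array([[1.0, rho1],
                  [rho1, 1.0]])                       # Omega for p = 2
phi = np.linalg.solve(omega, np.array([rho1, rho2]))  # [phi1, phi2]

phi1 = rho1 * (1 - rho2) / (1 - rho1**2)              # closed form from the slide
phi2 = (rho2 - rho1**2) / (1 - rho1**2)
print(np.round(phi, 4), round(phi1, 4), round(phi2, 4))   # same values
```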

  • 13

    Partial autocorrelations

    An MA model can be identified by checking for a cut-off in the correlogram. [A cut-off at lag (q+1) suggests an MA(q) process.] For AR processes the autocorrelations only decay to zero and it is hard to identify the order $p$.

    Definition: Let $\phi_{kj}$ denote the $j$-th coefficient when fitting an AR(k) model to the series, i.e.,

    $X_t = \phi_{k1}X_{t-1} + \phi_{k2}X_{t-2} + \dots + \phi_{kk}X_{t-k} + \xi_t$.

    $\phi_{kk}$ = partial correlation of lag $k$.

    $\begin{bmatrix} 1 & \rho_1 & \cdots & \rho_{k-1} \\ \rho_1 & 1 & \cdots & \rho_{k-2} \\ \vdots & \vdots & & \vdots \\ \rho_{k-1} & \rho_{k-2} & \cdots & 1 \end{bmatrix} \begin{bmatrix} \phi_{k1} \\ \phi_{k2} \\ \vdots \\ \phi_{kk} \end{bmatrix} = \begin{bmatrix} \rho_1 \\ \rho_2 \\ \vdots \\ \rho_k \end{bmatrix}$

  • 14

    Computation of partial autocorrelations (cont.)

    $\Omega_{(k)}\tilde\Phi_{(k)} = \tilde\rho_{(k)}$  ⇒  $\phi_{kk} = \dfrac{Det[\Omega_{(k)}^{*}]}{Det[\Omega_{(k)}]}$;

    $\Omega_{(k)}^{*}$ is obtained from $\Omega_{(k)}$ by replacing the last column by $\tilde\rho_{(k)}$.

    • The partial autocorrelations can be estimated by replacing $\rho_k$ by $\hat\rho_k$, but there are better methods.

    • The fitting of an AR(k) model does not imply that this is the correct model. Thus, for any stationary series, $\phi_{11} = \rho_1$,  $\phi_{22} = \dfrac{\rho_2 - \rho_1^2}{1 - \rho_1^2}$.

    The partial autocorrelations impose restrictions on the autocorrelations. For example, if $\rho_1 = 0.8$ then $\rho_2 \ge 0.28$, since $|\phi_{22}| \le 1 \Rightarrow 2\rho_1^2 - 1 \le \rho_2 \le 1$.
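    A sketch of the computation (not from the slides): $\phi_{kk}$ can be obtained by solving the order-$k$ Y-W system for $k = 1, 2, \dots$ and keeping the last coefficient. For an AR(1) with $\phi = 0.7$ (an assumed value) this gives $\phi_{11} = 0.7$ and $\phi_{kk} = 0$ for $k \ge 2$.

```python
import numpy as np

def pacf_from_acf(rho):
    """Partial autocorrelations phi_kk from autocorrelations rho_1, rho_2, ..."""
    rho = np.asarray(rho, dtype=float)
    out = []
    for k in range(1, len(rho) + 1):
        omega = np.array([[rho[abs(i - j) - 1] if i != j else 1.0
                           for j in range(k)] for i in range(k)])
        out.append(np.linalg.solve(omega, rho[:k])[-1])   # last coefficient = phi_kk
    return np.array(out)

rho = 0.7 ** np.arange(1, 6)             # autocorrelations of an AR(1) with phi = 0.7
print(np.round(pacf_from_acf(rho), 4))   # [0.7, 0, 0, 0, 0]
```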

  • 15

    Use of Partial Correlations for the Identification of AR models

    If the true model is AR(p), $\phi_{pp} = \phi_p$ whereas $\phi_{p+1,p+1} = \phi_{p+2,p+2} = \dots = 0$ ⇒ cut-off after lag $p$.

    Some computer programs like SAS and SPSS also compute what are known as Inverse Autocorrelations, but the use of the partial correlations for the identification of AR processes is preferable.

  • 16

    Interpretation of Partial Correlations

    $\phi_{kk}$ measures the 'net correlation' between $X_t$ and $X_{t-k}$ that is not explained by $X_{t-1}, \dots, X_{t-k+1}$:

    $\phi_{kk} = Corr(X_t, X_{t-k} \mid X_{t-1}, \dots, X_{t-k+1})$.

    Partial correlations in MA processes

    Consider $X_t = Y_t - \mu = \varepsilon_t - \theta\varepsilon_{t-1}$ [MA(1)],

    $\rho_1 = -\theta/(1 + \theta^2)$,  $\rho_k = 0$, $k = 2, 3, \dots$

    Hence,

    $\phi_{11} = \rho_1$,  $\phi_{22} = \dfrac{\rho_2 - \rho_1^2}{1 - \rho_1^2} = \dfrac{-\rho_1^2}{1 - \rho_1^2} = \dfrac{-\theta^2}{1 + \theta^2 + \theta^4}$

    In general,

    $\phi_{kk} = \dfrac{-\theta^k(1 - \theta^2)}{1 - \theta^{2(k+1)}}$  ⇒  $|\phi_{kk}| < |\theta|^k$.

    By the invertibility condition $|\theta| < 1$, so the partial autocorrelations decay to zero and do not cut off.
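    A check of this closed form (not from the slides; $\theta = 0.6$ is an assumed value): compare $\phi_{kk}$ computed from the Y-W systems with the formula $-\theta^k(1-\theta^2)/(1-\theta^{2(k+1)})$.

```python
import numpy as np

theta = 0.6
rho = np.zeros(6)
rho[0] = -theta / (1 + theta**2)       # rho_1; rho_k = 0 for k >= 2

def pacf_from_acf(rho):
    rho = np.asarray(rho, dtype=float)
    out = []
    for k in range(1, len(rho) + 1):
        omega = np.array([[rho[abs(i - j) - 1] if i != j else 1.0
                           for j in range(k)] for i in range(k)])
        out.append(np.linalg.solve(omega, rho[:k])[-1])
    return np.array(out)

closed = [-theta**k * (1 - theta**2) / (1 - theta**(2 * (k + 1))) for k in range(1, 7)]
print(np.round(pacf_from_acf(rho), 4))
print(np.round(closed, 4))             # the two rows agree
```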

  • 17

    Ergodicity

    In view of the autocorrelations between the observations, the question arising is whether the estimators $\hat\mu$ and $\hat\gamma_k$ obtained from a single series are consistent for the corresponding parameters. There are certain ergodic theorems that indeed guarantee the consistency of the estimators. For example, a sufficient condition for $\bar Y$ to be consistent for $\mu$ is that $\rho_k \to 0$ as $k \to \infty$, i.e., values sufficiently far apart are almost uncorrelated.

  • 18

    Wold Theorem

    Every weakly stationary series that has no deterministic component can be represented as an infinite linear combination of random innovations (white noise),

    $Y_t - \mu = \varepsilon_t + \psi_1\varepsilon_{t-1} + \psi_2\varepsilon_{t-2} + \dots = \sum_{j=0}^{\infty}\psi_j\varepsilon_{t-j}$  $(\psi_0 = 1)$,

    where the $\varepsilon_t$ are independent, $E(\varepsilon_t) = 0$, $Var(\varepsilon_t) = \sigma^2$ and $\sum_{j=0}^{\infty}\psi_j^2 < \infty$.

  • 19

    ARIMA Models

    Introduction

    By the Wold theorem, every stationary series that does not contain a deterministic component can be represented as an MA(∞) process with convergent coefficients. This representation can be approximated by a finite MA or AR process, but it may require including many terms and hence the estimation of many parameters (small number of degrees of freedom).

    The number of parameters can often be reduced very drastically by including in the model both AR and MA terms (parsimonious representation).

  • 20

    ARIMA models

    $X_t = \phi_1 X_{t-1} + \dots + \phi_p X_{t-p} + \varepsilon_t - \theta_1\varepsilon_{t-1} - \dots - \theta_q\varepsilon_{t-q}$

    $\Phi(B)X_t = \Theta(B)\varepsilon_t$;  ARMA(p,q).

    $\Phi(B) = 1 - \phi_1 B - \dots - \phi_p B^p$;  $\Theta(B) = 1 - \theta_1 B - \dots - \theta_q B^q$.

    If $X_t = (1 - B)^d Z_t$ then we have ARIMA(p,d,q).

    • $p > q$, $p = q$ or $p < q$ are all possible. For nonseasonal models, usually $p, q \le 2$.

    • The model can be viewed as an autoregressive model with correlated residuals, $e_t = \varepsilon_t - \theta_1\varepsilon_{t-1} - \dots - \theta_q\varepsilon_{t-q}$, where the correlations are determined by the MA coefficients.

  • 21

    Example, ARMA(1,1)

    $X_t = \phi X_{t-1} + \varepsilon_t - \theta\varepsilon_{t-1}$ ⇒ $X_{t-1} = \phi X_{t-2} + \varepsilon_{t-1} - \theta\varepsilon_{t-2}$

    ⇓

    $X_t = \phi^2 X_{t-2} + \varepsilon_t + (\phi - \theta)\varepsilon_{t-1} - \phi\theta\varepsilon_{t-2}$

    $= \phi^3 X_{t-3} + \varepsilon_t + (\phi - \theta)\varepsilon_{t-1} + \phi(\phi - \theta)\varepsilon_{t-2} - \phi^2\theta\varepsilon_{t-3} = \dots$

    $= \varepsilon_t + (\phi - \theta)\varepsilon_{t-1} + \phi(\phi - \theta)\varepsilon_{t-2} + \phi^2(\phi - \theta)\varepsilon_{t-3} + \dots$

    For stationarity we need $\sum_{i=0}^{\infty}\phi^i(\phi - \theta) < \infty$, i.e., $|\phi| < 1$.
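    A cross-check of these MA(∞) weights (not from the slides; $\phi$, $\theta$ are assumed values): expanding $(1-\theta B)/(1-\phi B)$ as a power series should give $\psi_0 = 1$ and $\psi_j = \phi^{j-1}(\phi - \theta)$ for $j \ge 1$.

```python
import numpy as np

phi, theta = 0.8, 0.3
n = 8

geom = phi ** np.arange(n)                         # series expansion of 1/(1 - phi*B)
psi_series = np.convolve([1.0, -theta], geom)[:n]  # (1 - theta*B) / (1 - phi*B)

psi_slide = np.r_[1.0, (phi - theta) * phi ** np.arange(n - 1)]
print(np.allclose(psi_series, psi_slide))          # True
```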

  • 22

    Conditions for Invertibility and Stationarity, General case

    Model: $\Phi(B)X_t = \Theta(B)\varepsilon_t$. The MA part is always stationary; the AR part is always invertible. Hence:

    The model is stationary if and only if the roots of $\Phi(B) = 0$ are outside the unit circle.

    The model is invertible if and only if the roots of $\Theta(B) = 0$ are outside the unit circle.

  • 23

    Another justification for use of ARMA models

    Many of the series of interest are sums of series. For example, total unemployment can be broken down by sex, age, industry classification etc.

    Theorem

    $X_{1t} \sim ARMA(p_1, q_1)$ and $X_{2t} \sim ARMA(p_2, q_2)$ ⇒ $X_{1t} + X_{2t} \sim ARMA(p, q)$,

    where $p \le p_1 + p_2$, $q \le \max(p_1 + q_2, p_2 + q_1)$.

    Example 1: $X_{1t} \sim MA(q_1)$ and $X_{2t} \sim MA(q_2)$ ⇒ $X_{1t} + X_{2t} \sim MA(q)$, $q \le \max(q_1, q_2)$.

    Example 2: $X_{1t} \sim AR(p_1)$ and $X_{2t} \sim AR(p_2)$ ⇒ $X_{1t} + X_{2t} \sim ARMA(p, q)$, $p \le p_1 + p_2$, $q \le \max(p_1, p_2)$.

    The sum of two MA processes is again MA, but the sum of two AR processes is not necessarily AR.

  • 24

    Example 2 (Cont.)

    Suppose that $(1 - \alpha B)X_t = \varepsilon_t$, $(1 - \phi B)Y_t = \eta_t$; $Var(\eta_t) = Var(\varepsilon_t) = \sigma^2$, and let $Z_t = X_t + Y_t$. Then,

    $(1 - \alpha B)(1 - \phi B)Z_t = (1 - \alpha B)(1 - \phi B)X_t + (1 - \alpha B)(1 - \phi B)Y_t$
    $= (1 - \phi B)\varepsilon_t + (1 - \alpha B)\eta_t = \varepsilon_t - \phi\varepsilon_{t-1} + \eta_t - \alpha\eta_{t-1}$.

    Denote $(1 - \alpha B)(1 - \phi B)Z_t = Q_t$. Then $Var(Q_t) = (2 + \alpha^2 + \phi^2)\sigma^2$,

    $Cov(Q_t, Q_{t-1}) = -(\alpha + \phi)\sigma^2$;  $Cov(Q_t, Q_{t-k}) = 0$, $k > 1$.

    This is ARMA(2,1). If $\alpha = -\phi$, then it is AR(2).
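    A simulation sketch of this result (not from the slides; $\alpha = 0.5$, $\phi = 0.7$ are assumed values): build $Q_t = Z_t - (\alpha+\phi)Z_{t-1} + \alpha\phi Z_{t-2}$ from two simulated independent AR(1) series and compare its sample moments with the formulas above.

```python
import numpy as np

rng = np.random.default_rng(3)
alpha, phi, n = 0.5, 0.7, 200_000

eps, eta = rng.normal(size=n), rng.normal(size=n)
x = np.zeros(n)
y = np.zeros(n)
for t in range(1, n):
    x[t] = alpha * x[t - 1] + eps[t]
    y[t] = phi * y[t - 1] + eta[t]
z = x + y

q = z[2:] - (alpha + phi) * z[1:-1] + alpha * phi * z[:-2]
q = q - q.mean()
print(round(q.var(), 2), 2 + alpha**2 + phi**2)            # ~2.74 vs 2.74
print(round(np.mean(q[1:] * q[:-1]), 2), -(alpha + phi))   # ~-1.20 vs -1.2
```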

  • 25

    Autocorrelations of ARMA processes

    $X_t = \phi_1 X_{t-1} + \dots + \phi_p X_{t-p} + \varepsilon_t - \theta_1\varepsilon_{t-1} - \dots - \theta_q\varepsilon_{t-q}$

    By multiplying both sides by $X_{t-k}$ and taking expectations it is easy to see that for $k > q$,

    $\rho_k = \phi_1\rho_{k-1} + \dots + \phi_p\rho_{k-p}$,

    the same as in an AR(p) process. The difference is in the behavior of the first $q$ correlations (in fact, in the first $q - p$ correlations).

    Example: ARMA(1,1), $X_t = \phi X_{t-1} + \varepsilon_t - \theta\varepsilon_{t-1}$. Stationary and invertible if $|\phi| < 1$ and $|\theta| < 1$.

  • 26

    Autocorrelations of ARMA(1,1) (cont.)

    We just found in (4) that $E(\varepsilon_{t-1}X_t) = (\phi - \theta)\sigma_\varepsilon^2$. Substituting in (1) yields

    $\gamma_0 = \phi\gamma_1 + \sigma_\varepsilon^2 - \theta(\phi - \theta)\sigma_\varepsilon^2 = \phi^2\gamma_0 - \phi\theta\sigma_\varepsilon^2 + (1 - \phi\theta + \theta^2)\sigma_\varepsilon^2$

    by (2). Hence,

    $\gamma_0 = \dfrac{1 + \theta^2 - 2\phi\theta}{1 - \phi^2}\,\sigma_\varepsilon^2$;  $\gamma_1 = \dfrac{(1 - \phi\theta)(\phi - \theta)}{1 - \phi^2}\,\sigma_\varepsilon^2$

    $\rho_1 = \dfrac{(1 - \phi\theta)(\phi - \theta)}{1 + \theta^2 - 2\phi\theta}$;  $\rho_k = \phi\rho_{k-1}$, $k = 2, 3, \dots$

    • The denominator of $\rho_1$ and the factor $(1 - \phi\theta)$ are always $> 0$, and so $sign(\rho_1) = sign(\phi - \theta)$.
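    A simulation check (not from the slides; $\phi = 0.7$, $\theta = 0.4$ are assumed values) comparing the $\rho_1$ formula with the sample lag-1 autocorrelation of a simulated ARMA(1,1) series.

```python
import numpy as np

rng = np.random.default_rng(4)
phi, theta, n = 0.7, 0.4, 200_000

eps = rng.normal(size=n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + eps[t] - theta * eps[t - 1]

rho1_theory = (1 - phi * theta) * (phi - theta) / (1 + theta**2 - 2 * phi * theta)
xc = x - x.mean()
rho1_sample = (xc[1:] @ xc[:-1]) / (xc @ xc)
print(round(rho1_theory, 3), round(rho1_sample, 3))   # close to each other
```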

    Partial correlations in ARMA models

    By dividing both sides of the model equation by $\Theta(B)$, the model can be expressed as AR(∞) and hence the partial correlations decay to zero. They behave as in an MA(q) process for $k > p$ (in fact already for $k > p - q$).

  • 27

    Checking for stationarity, transformations

    For $k > q - p$,

    $\rho_k = A_1 G_1^k + \dots + A_p G_p^k$,

    where the $G_i^{-1}$ are the solutions of $\Phi(B) = 0$ (same as in an AR(p) process, but only from $k > q - p$). Thus, if the process is stationary, $|G_i^{-1}| > 1$ for all $i$, and if none of the roots is close to 1 the autocorrelations decay fast to zero from $k > q - p$.

    As in AR models, if $d$ roots are 1, differencing the series $d$ times yields a stationary series.

    • Slow decay of the autocorrelations indicates the need for differencing. To see this, suppose that $G_1 = (1 - \delta)$ where $\delta$ is a small positive number, and the other $G_k$'s are much smaller. Then

    $\rho_k \cong A_1 G_1^k = A_1(1 - \delta)^k \cong A_1(1 - k\delta)$,

    so that the decay is linear with a small increment. In situations like this it is advisable to difference even if the root is $> 1$.
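    A small illustration (not from the slides): for a dominant root close to the unit circle, e.g. $G_1 = 1 - \delta$ with $\delta = 0.01$, the autocorrelations $G_1^k$ are practically linear in $k$ over moderate lags.

```python
import numpy as np

delta = 0.01
k = np.arange(1, 31)
print(np.round((1 - delta) ** k, 3))   # geometric decay...
print(np.round(1 - k * delta, 3))      # ...almost indistinguishable from linear decay here
```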

  • 28

    Interpretation of differencing

    Suppose that the series $Y_t$ follows the model

    $Y_t = \alpha_0 + \alpha_1 t + \dots + \alpha_d t^d + Z_t$,

    where $Z_t$ satisfies $\Phi(B)(1-B)^d Z_t = \Theta(B)\varepsilon_t$, i.e., ARIMA(p,d,q). Then

    $(1-B)^d Y_t = d!\,\alpha_d + (1-B)^d Z_t$,

    and

    $\Phi(B)(1-B)^d Y_t = const. + \Phi(B)(1-B)^d Z_t = const. + \Theta(B)\varepsilon_t$.  [$\Phi(B)\,const. = const.$]

    Thus, after $d$ differences we get a stationary series with a constant mean. Taking one more difference eliminates the constant, but this is not advisable because it inflates the variance and the resulting model could be noninvertible.

    • The opposite also holds: the existence of a nonzero mean after $d$ differences signifies a deterministic polynomial trend of order $d$.

  • 29

    Interpretation of differencing (cont.)

    Example of the effect of 'over-differencing':

    $Y_t = c + \varepsilon_t$ ⇒ $W_t = Y_t - Y_{t-1} = \varepsilon_t - \varepsilon_{t-1}$;  $E(W_t) = 0$, $Var(W_t) = 2\sigma_\varepsilon^2$

    The variance is doubled and $W_t$ is noninvertible.
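    A simulation sketch (not from the slides): differencing the already stationary series $Y_t = c + \varepsilon_t$ doubles the variance and produces a noninvertible MA(1) with lag-1 autocorrelation $-1/2$.

```python
import numpy as np

rng = np.random.default_rng(5)
y = 3.0 + rng.normal(size=100_000)    # Y_t = c + eps_t with c = 3 (assumed)
w = np.diff(y)                        # W_t = eps_t - eps_{t-1}

wc = w - w.mean()
print(round(y.var(), 2), round(w.var(), 2))        # ~1.0 vs ~2.0
print(round((wc[1:] @ wc[:-1]) / (wc @ wc), 2))    # ~ -0.5
```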

    Log transformation

    The use of the log transformation is commonly justified as a 'variance stabilizer'. This can be seen as follows. Suppose that the original series is the product of a trend level $L_t$, a seasonal effect $S_t$, and an irregular component $I_t$, i.e., $Y_t = L_t \times S_t \times I_t$ (multiplicative decomposition). Clearly, even if $I_t$ has a constant variance, the variance of $Y_t$ depends on $L_t \times S_t$.

  • 30

    Log transformation (cont.)

    $Y_t = L_t \times S_t \times I_t$. Taking logs of both sides yields

    $\log Y_t = \log L_t + \log S_t + \log I_t$,

    and if the log of each component is a stationary ARMA process, the sum is also an ARMA process.

    We may still need differencing as well. For example, if $L_t = \exp(\alpha + \beta t)$ (exponential trend), then $\log Y_t = \alpha + \beta t + \log S_t + \log I_t$ and we need one difference ($d = 1$). If $\log S_t$ is also nonstationary, we may also need a seasonal difference (see below).