Transcript of pluto.huji.ac.il/~msby/time-files/ARMA_2010.pdf

  • Moving Average (MA) Models

    General representation of MA models

    $Y_t = \mu + \varepsilon_t - \theta_1\varepsilon_{t-1} - \dots - \theta_q\varepsilon_{t-q}$ ↔ $Y_t \sim MA(q)$

    • Can be viewed as a (finite) approximation for the infinite sum of random innovations (Wold Theorem, see later).

    • The negative signs before the coefficients are only a notational convention and carry no substantive meaning. A positive coefficient for $\varepsilon_{t-k}$ simply requires $\theta_k < 0$, $k = 1, 2, \dots$

    Properties

    $E(Y_t) = \mu$;  $Var(Y_t) = \bigl(1 + \sum_{j=1}^{q}\theta_j^2\bigr)\sigma^2$

    $\gamma_k = (-\theta_k + \theta_1\theta_{k+1} + \dots + \theta_{q-k}\theta_q)\,\sigma^2$ for $k \le q$

    $\gamma_k = 0$ for $k > q$ ⇒ cut-off after lag $q$.

    Example: In a rotating panel survey, if units are in the sample for q+1 successive time points (like in Labour Force Surveys), the model holding for the sample estimates is MA(q).

    • MA models are often obtained after transforming the original series. For example, $Y_t = a + bt + \varepsilon_t \Rightarrow Y_t - Y_{t-1} = b + \varepsilon_t - \varepsilon_{t-1} \sim MA(1)$.
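    The cut-off in the autocovariances is easy to see empirically. The following is a minimal simulation sketch (not part of the original slides; the MA(2) parameter values are hypothetical) that generates an MA(2) series and checks that the sample autocorrelations are essentially zero after lag 2.

```python
import numpy as np

# Minimal sketch (assumed parameters): simulate Y_t = mu + e_t - th1*e_{t-1} - th2*e_{t-2}
# and verify that the sample autocorrelations cut off after lag q = 2.
rng = np.random.default_rng(0)
mu, th1, th2 = 10.0, 0.6, -0.3
n = 20_000

eps = rng.normal(size=n + 2)
y = mu + eps[2:] - th1 * eps[1:-1] - th2 * eps[:-2]

def sample_acf(z, nlags):
    z = z - z.mean()
    c0 = z @ z / len(z)
    return np.array([z[:-k] @ z[k:] / len(z) / c0 for k in range(1, nlags + 1)])

print(np.round(sample_acf(y, 5), 3))   # lags 1-2 clearly nonzero, lags 3-5 near 0
```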

  • 2

    Example and alternative representation

    Example: MA(1) model, $Y_t - \mu = \varepsilon_t - \theta\varepsilon_{t-1}$

    $Var(Y_t) = (1+\theta^2)\sigma^2$,  $Cov(Y_t, Y_{t-1}) = -\theta\sigma^2$

    ⇒ $Corr(Y_t, Y_{t-1}) = \dfrac{-\theta}{1+\theta^2}$ and $|Corr(Y_t, Y_{t-1})| \le \dfrac{1}{2}$.

    • Important for model identification. When $\theta < 0$, the series is 'smooth'. When $\theta > 0$, the series is 'jumpy'.
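    A quick numeric check of the bound (not from the slides): the lag-1 autocorrelation $-\theta/(1+\theta^2)$ never exceeds 1/2 in absolute value, with the extremes attained at $\theta = \mp 1$.

```python
import numpy as np

# Evaluate rho1(theta) = -theta/(1 + theta**2) on a grid and confirm |rho1| <= 1/2.
theta = np.linspace(-5, 5, 2001)
rho1 = -theta / (1 + theta**2)
print(round(np.abs(rho1).max(), 6))      # 0.5
print(theta[np.argmax(np.abs(rho1))])    # attained at theta = -1 (and +1)
```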

    For a general MA(q) process,

    $\max \rho_k = \begin{cases} \cos[\pi/(N+1)] & \text{if } k \text{ divides } (q+1) \\ \cos[\pi/(N+2)] & \text{otherwise,} \end{cases}$  $1 \le k \le q$,

    where $N$ is the largest integer not exceeding $(q+1)/k$.

    Back-shift operator: $BX_t = X_{t-1}$ ⇒ $B^2X_t = B(BX_t) = X_{t-2}$, $\dots$, $B^kX_t = X_{t-k}$.

    Alternative model representation:

    $Y_t - \mu = (1 - \theta_1 B - \theta_2 B^2 - \dots - \theta_q B^q)\,\varepsilon_t = \Theta(B)\varepsilon_t$

    $\Theta(B)$ is a polynomial in $B$ of order $q$.

    • MA processes are always stationary.

  • 3

    Invertibility

    Question: Suppose that the series of interest follows an MA(q) process. Is the model identifiable in the sense that the model coefficients are determined uniquely by the autocorrelations of the process? Answer: NO.

    Example: the MA(1) processes $Y_t - \mu = \varepsilon_t + 0.5\varepsilon_{t-1}$ and $Y_t - \mu = \varepsilon_t + 2\varepsilon_{t-1}$ both have $\rho_1 = 0.4$.

    Let $X_t = (Y_t - \mu)$. Then for any MA(1) process,

    $X_t = \varepsilon_t - \theta\varepsilon_{t-1} = \varepsilon_t - \theta(X_{t-1} + \theta\varepsilon_{t-2}) = \varepsilon_t - \theta X_{t-1} - \theta^2\varepsilon_{t-2} = \dots = \varepsilon_t - \sum_{i=1}^{\infty}\theta^i X_{t-i}$

    • For $|\theta| > 1$ we assign more and more weight to remote observations, which makes little sense!

    • In this example we will therefore choose $\theta = -0.5$ (the first of the two processes above).
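    A small numeric check (not from the slides): in the $Y_t - \mu = \varepsilon_t - \theta\varepsilon_{t-1}$ parametrization the two processes above correspond to $\theta = -0.5$ and $\theta = -2$, and both give $\rho_1 = 0.4$, so $\rho_1$ alone cannot identify $\theta$; only $|\theta| < 1$ yields a sensible (invertible) representation.

```python
# rho1 = -theta/(1 + theta**2) takes the same value for theta and 1/theta.
for theta in (-0.5, -2.0):
    print(theta, -theta / (1 + theta**2))   # both print 0.4
```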

  • 4

    Invertibility (cont.)

    Definition: An MA process is invertible if it can be written as an infinite autoregressive process,

    $\tilde Y_t = A_1\tilde Y_{t-1} + A_2\tilde Y_{t-2} + \dots + \varepsilon_t$,

    with coefficients $A_j$ that converge to zero ($\sum_{j=1}^{\infty}|A_j| < \infty$). ($\tilde Y_t = Y_t - \mu$.)

    Theorem: The MA(q) process is invertible if and only if the roots of the polynomial equation $\Theta(B) = 0$ are larger than 1 in absolute value, $|B_k| > 1$ (outside the unit circle when the root $B_k$ is a complex number).

    Example: For the MA(1) process $\tilde Y_t = (1 - \theta B)\varepsilon_t$,

    $\Theta(B) = 1 - \theta B = 0 \;\Leftrightarrow\; B = 1/\theta$;  $|B| > 1 \;\Leftrightarrow\; |\theta| < 1$.
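    In practice the roots can be checked numerically. The sketch below (not from the slides) tests invertibility of an MA(q) model by computing the roots of $\Theta(B)$ with numpy and checking that they all lie outside the unit circle; the $\theta$ values are illustrative.

```python
import numpy as np

def is_invertible(theta):
    """Check whether all roots of Theta(B) = 1 - theta_1*B - ... - theta_q*B^q
    lie outside the unit circle (np.roots expects the highest-degree coefficient first)."""
    coeffs = np.r_[-np.asarray(theta, dtype=float)[::-1], 1.0]
    return bool(np.all(np.abs(np.roots(coeffs)) > 1))

print(is_invertible([0.5]))        # True  (root B = 2)
print(is_invertible([2.0]))        # False (root B = 0.5)
print(is_invertible([0.6, -0.3]))  # True  (complex roots outside the unit circle)
```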

  • 5

    Invertibility, general case

    $\tilde Y_t = \Theta(B)\varepsilon_t$, and we ask whether we can 'divide', i.e., write $\varepsilon_t = [1/\Theta(B)]\tilde Y_t$ with convergent coefficients.

    Write $\Theta(B) = (1 - H_1 B)\times\dots\times(1 - H_q B)$, where the $(1/H_i)$ are the roots of $\Theta(B) = 0$. If $|1/H_i| > 1$ for all $i$, then $|H_i| < 1$ for all $i$, each factor can be expanded as $1/(1 - H_i B) = \sum_{j\ge 0} H_i^j B^j$ with convergent coefficients, and the division is possible.

  • 6

    Autoregressive (AR) Models

    General Representation of AR Models

    $X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + \dots + \phi_p X_{t-p} + \varepsilon_t$;  $X_t = (Y_t - \mu)$

    • This looks like an ordinary regression model, but the explanatory variables are past observations and hence are dependent on each other and on the residuals of other equations. Nonetheless, if the series is stationary (see next slide), the usual least squares estimators are consistent for the $\phi$-coefficients and are best asymptotically normal (BAN). Also, the usual mean residual sum of squares is a consistent estimator of $\sigma^2 = Var(\varepsilon_t)$. (A sketch of this estimation approach appears below.)
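    A minimal sketch of this least squares approach (not from the slides; the AR(2) coefficients are hypothetical): regress $X_t$ on its own lags and compare the estimates with the true values.

```python
import numpy as np

rng = np.random.default_rng(1)
phi = np.array([0.6, -0.3])          # assumed true AR(2) coefficients
n, p = 5_000, len(phi)

# simulate the AR(p) series
x = np.zeros(n)
eps = rng.normal(size=n)
for t in range(p, n):
    x[t] = phi @ x[t - p:t][::-1] + eps[t]

# least squares regression of X_t on X_{t-1}, ..., X_{t-p}
X = np.column_stack([x[p - j - 1:n - j - 1] for j in range(p)])
y = x[p:]
phi_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
sigma2_hat = np.mean((y - X @ phi_hat) ** 2)
print(np.round(phi_hat, 3), round(sigma2_hat, 3))   # close to [0.6, -0.3] and 1.0
```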

    Polynomial representation

    $\Phi(B)X_t = \varepsilon_t$;  $\Phi(B) = 1 - \phi_1 B - \dots - \phi_p B^p$  →  $Y_t \sim AR(p)$

    • Autoregressive models are always invertible.

  • 7

    Condition for stationarity

    An AR(p) process is stationary if and only if the roots of the polynomial equation $\Phi(B) = 0$ are larger than 1 in absolute value, $|B_k| > 1$ (the root $B_k$ is outside the unit circle when $B_k$ is complex).

    Example: For an AR(1) process $(1 - \phi B)X_t = \varepsilon_t$,

    $\Phi(B) = 1 - \phi B = 0 \;\Leftrightarrow\; B = 1/\phi$;  $|B| > 1 \;\Leftrightarrow\; |\phi| < 1$.

  • 8

    Condition for stationarity (cont.)

    • When at least one of the roots of $\Phi(B) = 0$ is inside the unit circle, the series "explodes". Example: $X_t = 3X_{t-1} + \varepsilon_t$ (the variance grows geometrically).

    • When $d$ roots equal 1 and the rest are outside the unit circle, we may write $\Phi(B) = \Phi_1(B)(1-B)^d$, where the roots of $\Phi_1(B) = 0$ are all outside the unit circle. Hence,

    $\Phi(B)X_t = \Phi_1(B)(1-B)^d X_t = \Phi_1(B)U_t$

    and $U_t = (1-B)^d X_t$ is stationary.

    • This result shows why very often stationarity is achieved by differencing.

    Example: Random walk, $X_t = X_{t-1} + \varepsilon_t$, so $(1 - B)X_t = \varepsilon_t$ ⇒ white noise and hence stationary.
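    A minimal simulation sketch (not from the slides): a random walk is nonstationary, but its first difference behaves like white noise, with sample autocorrelations near zero at all lags.

```python
import numpy as np

rng = np.random.default_rng(2)
eps = rng.normal(size=10_000)
x = np.cumsum(eps)          # random walk: X_t = X_{t-1} + eps_t
u = np.diff(x)              # (1 - B)X_t = eps_t

def sample_acf(z, nlags):
    z = z - z.mean()
    c0 = z @ z / len(z)
    return np.array([z[:-k] @ z[k:] / len(z) / c0 for k in range(1, nlags + 1)])

print(np.round(sample_acf(u, 5), 3))   # all close to 0
```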

  • 9

    Variance and correlations of AR(P) processes

    Let $X_t = Y_t - \mu$ and suppose

    $X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + \dots + \phi_p X_{t-p} + \varepsilon_t$

    ⇓

    $X_t^2 = \phi_1 X_t X_{t-1} + \phi_2 X_t X_{t-2} + \dots + \phi_p X_t X_{t-p} + X_t\varepsilon_t$.

    Taking expectations on both sides and noting that $E(X_t) = E(Y_t - \mu) = 0$ yields

    $\gamma_0 = \phi_1\gamma_1 + \phi_2\gamma_2 + \dots + \phi_p\gamma_p + \sigma_\varepsilon^2$;  [$E(X_t\varepsilon_t) = E(\varepsilon_t^2)$]

    $\gamma_k = \gamma_0\rho_k$ ⇒ $\gamma_0(1 - \phi_1\rho_1 - \phi_2\rho_2 - \dots - \phi_p\rho_p) = \sigma_\varepsilon^2$,

    or $\gamma_0 = \sigma_\varepsilon^2/(1 - \phi_1\rho_1 - \phi_2\rho_2 - \dots - \phi_p\rho_p)$.

    Similarly, $X_t X_{t-k} = \phi_1 X_{t-1}X_{t-k} + \phi_2 X_{t-2}X_{t-k} + \dots + \phi_p X_{t-p}X_{t-k} + \varepsilon_t X_{t-k}$. Taking expectations,

    $\gamma_k = \phi_1\gamma_{k-1} + \phi_2\gamma_{k-2} + \dots + \phi_p\gamma_{k-p}$

    ⇓ Yule-Walker (Y-W): $\rho_k = \phi_1\rho_{k-1} + \phi_2\rho_{k-2} + \dots + \phi_p\rho_{k-p}$, i.e., $\Phi(B)\rho_k = 0$  ($\rho_{-k} = \rho_k$)

  • 10

    Examples

    For the AR(1) process $X_t = \phi X_{t-1} + \varepsilon_t$,

    $\rho_1 = \rho_0\phi = \phi$, …, $\rho_k = \rho_{k-1}\phi = \dots = \phi^k$

    The autocorrelations decay to zero geometrically.

    For AR(2), $X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + \varepsilon_t$,

    $\rho_1 = \phi_1\rho_0 + \phi_2\rho_{-1} = \phi_1 + \phi_2\rho_1$

    $\rho_2 = \phi_1\rho_1 + \phi_2\rho_0 = \phi_1\rho_1 + \phi_2$

    ⇓

    $\rho_1 = \phi_1/(1 - \phi_2)$;  $\rho_2 = \phi_2 + [\phi_1^2/(1 - \phi_2)]$.

    Once we know 1ρ and 2ρ the other correlations are obtained from the Y-W equations,

    $\rho_3 = \phi_1\rho_2 + \phi_2\rho_1$, $\rho_4 = \phi_1\rho_3 + \phi_2\rho_2$, …
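    A short sketch (hypothetical $\phi_1$, $\phi_2$): starting from $\rho_0 = 1$ and $\rho_1 = \phi_1/(1-\phi_2)$, the remaining autocorrelations of an AR(2) follow directly from the Y-W recursion.

```python
import numpy as np

phi1, phi2 = 0.5, 0.3          # assumed stationary AR(2) coefficients
rho = np.empty(11)
rho[0] = 1.0
rho[1] = phi1 / (1 - phi2)
for k in range(2, 11):         # Yule-Walker recursion rho_k = phi1*rho_{k-1} + phi2*rho_{k-2}
    rho[k] = phi1 * rho[k - 1] + phi2 * rho[k - 2]
print(np.round(rho[1:], 4))
```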

  • 11

    General solution of Y-W equations

    For the general AR(p) process, if all the roots are distinct,

    $\rho_k = \sum_{r=1}^{p} A_r G_r^k$,

    where $\{(1/G_r),\ r = 1,\dots,p\}$ are the solutions of $\Phi(B) = 0$ and the $A_r$'s are constants.

    Importance of general solution

    For a stationary process $|1/G_r| > 1$ for all $r$, and hence $|G_r| < 1$ and $\rho_k \to 0$ as $k \to \infty$.

    Necessary condition for stationarity. (In practice we plot the correlogram of the estimated correlations and check decay to zero.)

    Some of the roots $\{(1/G_r),\ r = 1,\dots,p\}$ can be complex, in which case the decay is sinusoidal.

    The coefficients $A_r$ can be computed by solving the $p$ equations

    $\rho_k = \sum_{r=1}^{p} A_r G_r^k$,  $k = 0, 1, \dots, p-1$,

    with the $A_r$'s as the unknowns. The first equation ($k = 0$, $\rho_0 = 1$) is $\sum_{r=1}^{p} A_r = 1$.

    Example:

    $Y_t = Y_{t-1} - 0.24Y_{t-2} + \varepsilon_t$ ⇒ $G_1^{-1} = 1/0.6$, $G_2^{-1} = 1/0.4$;

    $\rho_1 = \phi_1/(1 - \phi_2) = 0.8065$ ⇒ $A_1 = 2.03$, $A_2 = -1.03$.
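    A numeric check of this example (values as stated on the slide): the general solution $\rho_k = 2.03(0.6)^k - 1.03(0.4)^k$ should reproduce, up to the rounding of $A_1, A_2$, the Y-W recursion $\rho_k = \rho_{k-1} - 0.24\rho_{k-2}$.

```python
import numpy as np

phi1, phi2 = 1.0, -0.24
G1, G2, A1, A2 = 0.6, 0.4, 2.03, -1.03

rho = [1.0, phi1 / (1 - phi2)]                   # rho_0, rho_1 = 0.8065
for k in range(2, 8):                            # Yule-Walker recursion
    rho.append(phi1 * rho[-1] + phi2 * rho[-2])

general = [A1 * G1**k + A2 * G2**k for k in range(8)]
print(np.round(rho, 4))
print(np.round(general, 4))                      # agrees up to rounding
```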

  • 12

    Estimation of AR(p) coefficients by use of autocorrelation estimates

    Y-W equations:

    $\rho_1 = \phi_1 + \rho_1\phi_2 + \dots + \rho_{p-1}\phi_p$

    $\rho_2 = \rho_1\phi_1 + \phi_2 + \dots + \rho_{p-2}\phi_p$    (Note: $\rho_{-k} = \rho_k$)

    ⋮

    $\rho_p = \rho_{p-1}\phi_1 + \rho_{p-2}\phi_2 + \dots + \phi_p$

    In matrix form,

    $\begin{bmatrix} 1 & \rho_1 & \cdots & \rho_{p-1} \\ \rho_1 & 1 & \cdots & \rho_{p-2} \\ \vdots & \vdots & & \vdots \\ \rho_{p-1} & \rho_{p-2} & \cdots & 1 \end{bmatrix} \begin{bmatrix} \phi_1 \\ \phi_2 \\ \vdots \\ \phi_p \end{bmatrix} = \begin{bmatrix} \rho_1 \\ \rho_2 \\ \vdots \\ \rho_p \end{bmatrix}$

    i.e., $\Omega\,\tilde\Phi = \tilde\rho$ ⇒ $\tilde\Phi = \Omega^{-1}\tilde\rho$, and $\hat{\tilde\Phi} = \hat\Omega^{-1}\hat{\tilde\rho}$.

    Question: Why is Ω invertible?

    Example: AR(2);

    $\phi_1 + \rho_1\phi_2 = \rho_1$,  $\rho_1\phi_1 + \phi_2 = \rho_2$

    ⇒ $\phi_1 = \dfrac{\rho_1(1 - \rho_2)}{1 - \rho_1^2}$;  $\phi_2 = \dfrac{\rho_2 - \rho_1^2}{1 - \rho_1^2}$.
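    A minimal sketch (not from the slides; the autocorrelations are hypothetical): solve the AR(2) Y-W system numerically and compare with the closed-form expressions above.

```python
import numpy as np

rho1, rho2 = 0.6, 0.4
omega = np.array([[1.0, rho1],
                  [rho1, 1.0]])                       # Omega for p = 2
phi = np.linalg.solve(omega, np.array([rho1, rho2]))  # [phi1, phi2]

phi1 = rho1 * (1 - rho2) / (1 - rho1**2)              # closed form from the slide
phi2 = (rho2 - rho1**2) / (1 - rho1**2)
print(np.round(phi, 4), round(phi1, 4), round(phi2, 4))   # same values
```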

  • 13

    Partial autocorrelations

    An MA model can be identified by checking for a cut-off in the correlogram. [A cut-off at lag (q+1) suggests an MA(q) process.] For AR processes the autocorrelations only decay to zero and it is hard to identify the order $p$.

    Definition: Let $\phi_{kj}$ denote the $j$-th coefficient when fitting an AR(k) model to the series, i.e.,

    $X_t = \phi_{k1}X_{t-1} + \phi_{k2}X_{t-2} + \dots + \phi_{kk}X_{t-k} + \xi_t$.

    $\phi_{kk}$ = partial correlation of lag $k$.

    $\begin{bmatrix} 1 & \rho_1 & \cdots & \rho_{k-1} \\ \rho_1 & 1 & \cdots & \rho_{k-2} \\ \vdots & \vdots & & \vdots \\ \rho_{k-1} & \rho_{k-2} & \cdots & 1 \end{bmatrix} \begin{bmatrix} \phi_{k1} \\ \phi_{k2} \\ \vdots \\ \phi_{kk} \end{bmatrix} = \begin{bmatrix} \rho_1 \\ \rho_2 \\ \vdots \\ \rho_k \end{bmatrix}$

  • 14

    Computation of partial autocorrelations (cont.)

    $\Omega_{(k)}\tilde\Phi_{(k)} = \tilde\rho_{(k)}$  ⇒  $\phi_{kk} = \dfrac{Det[\Omega_{(k)}^{*}]}{Det[\Omega_{(k)}]}$;

    $\Omega_{(k)}^{*}$ is obtained from $\Omega_{(k)}$ by replacing the last column by $\tilde\rho_{(k)}$.

    • The partial autocorrelations can be estimated by replacing $\rho_k$ by $\hat\rho_k$, but there are better methods.

    • The fitting of an AR(k) model does not imply that this is the correct model. Thus, for any stationary series, $\phi_{11} = \rho_1$,  $\phi_{22} = \dfrac{\rho_2 - \rho_1^2}{1 - \rho_1^2}$.

    The partial autocorrelations impose restrictions on the autocorrelations. For example, if $\rho_1 = 0.8$ then $\rho_2 \ge 0.28$, since $|\phi_{22}| \le 1 \Rightarrow 2\rho_1^2 - 1 \le \rho_2 \le 1$.
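    A sketch of the computation (not from the slides): $\phi_{kk}$ can be obtained by solving the order-$k$ Y-W system for $k = 1, 2, \dots$ and keeping the last coefficient. For an AR(1) with $\phi = 0.7$ (an assumed value) this gives $\phi_{11} = 0.7$ and $\phi_{kk} = 0$ for $k \ge 2$.

```python
import numpy as np

def pacf_from_acf(rho):
    """Partial autocorrelations phi_kk from autocorrelations rho_1, rho_2, ..."""
    rho = np.asarray(rho, dtype=float)
    out = []
    for k in range(1, len(rho) + 1):
        omega = np.array([[rho[abs(i - j) - 1] if i != j else 1.0
                           for j in range(k)] for i in range(k)])
        out.append(np.linalg.solve(omega, rho[:k])[-1])   # last coefficient = phi_kk
    return np.array(out)

rho = 0.7 ** np.arange(1, 6)             # autocorrelations of an AR(1) with phi = 0.7
print(np.round(pacf_from_acf(rho), 4))   # [0.7, 0, 0, 0, 0]
```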

  • 15

    Use of Partial Correlations for the Identification of AR models

    If the true model is AR(p), $\phi_{pp} = \phi_p$ whereas $\phi_{p+1,p+1} = \phi_{p+2,p+2} = \dots = 0$ ⇒ cut-off after lag $p$.

    Some computer programs like SAS and SPSS also compute what are known as Inverse Autocorrelations, but the use of the partial correlations for the identification of AR processes is preferable.

  • 16

    Interpretation of Partial Correlations

    $\phi_{kk}$ measures the 'net correlation' between $X_t$ and $X_{t-k}$ that is not explained by $X_{t-1}, \dots, X_{t-k+1}$:

    $\phi_{kk} = Corr(X_t, X_{t-k} \mid X_{t-1}, \dots, X_{t-k+1})$.

    Partial correlations in MA processes

    Consider $X_t = Y_t - \mu = \varepsilon_t - \theta\varepsilon_{t-1}$ [MA(1)],

    $\rho_1 = -\theta/(1 + \theta^2)$,  $\rho_k = 0$, $k = 2, 3, \dots$

    Hence,

    $\phi_{11} = \rho_1$,  $\phi_{22} = \dfrac{\rho_2 - \rho_1^2}{1 - \rho_1^2} = \dfrac{-\rho_1^2}{1 - \rho_1^2} = \dfrac{-\theta^2}{1 + \theta^2 + \theta^4}$

    In general,

    $\phi_{kk} = \dfrac{-\theta^k(1 - \theta^2)}{1 - \theta^{2(k+1)}}$  ⇒  $|\phi_{kk}| < |\theta|^k$.

    By the invertibility condition $|\theta| < 1$, so the partial autocorrelations decay to zero and do not cut off.
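    A check of this closed form (not from the slides; $\theta = 0.6$ is an assumed value): compare $\phi_{kk}$ computed from the Y-W systems with the formula $-\theta^k(1-\theta^2)/(1-\theta^{2(k+1)})$.

```python
import numpy as np

theta = 0.6
rho = np.zeros(6)
rho[0] = -theta / (1 + theta**2)       # rho_1; rho_k = 0 for k >= 2

def pacf_from_acf(rho):
    rho = np.asarray(rho, dtype=float)
    out = []
    for k in range(1, len(rho) + 1):
        omega = np.array([[rho[abs(i - j) - 1] if i != j else 1.0
                           for j in range(k)] for i in range(k)])
        out.append(np.linalg.solve(omega, rho[:k])[-1])
    return np.array(out)

closed = [-theta**k * (1 - theta**2) / (1 - theta**(2 * (k + 1))) for k in range(1, 7)]
print(np.round(pacf_from_acf(rho), 4))
print(np.round(closed, 4))             # the two rows agree
```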

  • 17

    Ergodicity

    In view of the autocorrelations between the observations, the question arising is whether the estimators $\hat\mu$ and $\hat\gamma_k$ obtained from a single series are consistent for the corresponding parameters. There are certain ergodic theorems that indeed guarantee the consistency of the estimators. For example, a sufficient condition for $\bar Y$ to be consistent for $\mu$ is that $\rho_k \to 0$ as $k \to \infty$, i.e., values sufficiently far apart are almost uncorrelated.

  • 18

    Wold Theorem

    Every weakly stationary series that has no deterministic component can be represented as an infinite linear combination of random innovations (white noise),

    $Y_t - \mu = \varepsilon_t + \psi_1\varepsilon_{t-1} + \psi_2\varepsilon_{t-2} + \dots = \sum_{j=0}^{\infty}\psi_j\varepsilon_{t-j}$  $(\psi_0 = 1)$,

    where the $\varepsilon_t$ are independent, $E(\varepsilon_t) = 0$, $Var(\varepsilon_t) = \sigma^2$ and $\sum_{j=0}^{\infty}\psi_j^2 < \infty$.

  • 19

    ARIMA Models

    Introduction

    By the Wold theorem, every stationary series that does not contain a deterministic component can be represented as an MA(∞) process with convergent coefficients. This representation can be approximated by a finite MA or AR process, but it may require including many terms and hence the estimation of many parameters (small number of degrees of freedom).

    The number of parameters can often be reduced very drastically by including in the model both AR and MA terms (parsimonious representation).

  • 20

    ARIMA models

    $X_t = \phi_1 X_{t-1} + \dots + \phi_p X_{t-p} + \varepsilon_t - \theta_1\varepsilon_{t-1} - \dots - \theta_q\varepsilon_{t-q}$

    $\Phi(B)X_t = \Theta(B)\varepsilon_t$;  ARMA(p,q).

    $\Phi(B) = 1 - \phi_1 B - \dots - \phi_p B^p$;  $\Theta(B) = 1 - \theta_1 B - \dots - \theta_q B^q$.

    If $X_t = (1 - B)^d Z_t$ then we have ARIMA(p,d,q).

    • $p > q$, $p = q$ or $p < q$ are all possible. For nonseasonal models, usually $p, q \le 2$.

    • The model can be viewed as an autoregressive model with correlated residuals, $e_t = \varepsilon_t - \theta_1\varepsilon_{t-1} - \dots - \theta_q\varepsilon_{t-q}$, where the correlations are determined by the MA coefficients.

  • 21

    Example, ARMA(1,1)

    $X_t = \phi X_{t-1} + \varepsilon_t - \theta\varepsilon_{t-1}$ ⇒ $X_{t-1} = \phi X_{t-2} + \varepsilon_{t-1} - \theta\varepsilon_{t-2}$

    ⇓

    $X_t = \phi^2 X_{t-2} + \varepsilon_t + (\phi - \theta)\varepsilon_{t-1} - \phi\theta\varepsilon_{t-2}$

    $= \phi^3 X_{t-3} + \varepsilon_t + (\phi - \theta)\varepsilon_{t-1} + \phi(\phi - \theta)\varepsilon_{t-2} - \phi^2\theta\varepsilon_{t-3} = \dots$

    $= \varepsilon_t + (\phi - \theta)\varepsilon_{t-1} + \phi(\phi - \theta)\varepsilon_{t-2} + \phi^2(\phi - \theta)\varepsilon_{t-3} + \dots$

    For stationarity we need $\sum_{i=0}^{\infty}\phi^i(\phi - \theta) < \infty$, i.e., $|\phi| < 1$.
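    A cross-check of these MA(∞) weights (not from the slides; $\phi$, $\theta$ are assumed values): expanding $(1-\theta B)/(1-\phi B)$ as a power series should give $\psi_0 = 1$ and $\psi_j = \phi^{j-1}(\phi - \theta)$ for $j \ge 1$.

```python
import numpy as np

phi, theta = 0.8, 0.3
n = 8

geom = phi ** np.arange(n)                         # series expansion of 1/(1 - phi*B)
psi_series = np.convolve([1.0, -theta], geom)[:n]  # (1 - theta*B) / (1 - phi*B)

psi_slide = np.r_[1.0, (phi - theta) * phi ** np.arange(n - 1)]
print(np.allclose(psi_series, psi_slide))          # True
```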

  • 22

    Conditions for Invertibility and Stationarity, General case

    Model: $\Phi(B)X_t = \Theta(B)\varepsilon_t$. The MA part is always stationary; the AR part is always invertible. Hence:

    The model is stationary if and only if the roots of $\Phi(B) = 0$ are outside the unit circle.

    The model is invertible if and only if the roots of $\Theta(B) = 0$ are outside the unit circle.

  • 23

    Another justification for use of ARMA models

    Many of the series of interest are sums of series. For example, total unemployment can be broken down by sex, age, industry classification etc.

    Theorem

    $X_{1t} \sim ARMA(p_1, q_1)$ and $X_{2t} \sim ARMA(p_2, q_2)$ ⇒ $X_{1t} + X_{2t} \sim ARMA(p, q)$,

    where $p \le p_1 + p_2$, $q \le \max(p_1 + q_2, p_2 + q_1)$.

    Example 1: $X_{1t} \sim MA(q_1)$ and $X_{2t} \sim MA(q_2)$ ⇒ $X_{1t} + X_{2t} \sim MA(q)$, $q \le \max(q_1, q_2)$.

    Example 2: $X_{1t} \sim AR(p_1)$ and $X_{2t} \sim AR(p_2)$ ⇒ $X_{1t} + X_{2t} \sim ARMA(p, q)$, $p \le p_1 + p_2$, $q \le \max(p_1, p_2)$.

    The sum of two MA processes is again MA, but the sum of two AR processes is not necessarily AR.

  • 24

    Example 2 (Cont.)

    Suppose that $(1 - \alpha B)X_t = \varepsilon_t$, $(1 - \phi B)Y_t = \eta_t$; $Var(\eta_t) = Var(\varepsilon_t) = \sigma^2$, and let $Z_t = X_t + Y_t$. Then,

    $(1 - \alpha B)(1 - \phi B)Z_t = (1 - \alpha B)(1 - \phi B)X_t + (1 - \alpha B)(1 - \phi B)Y_t$
    $= (1 - \phi B)\varepsilon_t + (1 - \alpha B)\eta_t = \varepsilon_t - \phi\varepsilon_{t-1} + \eta_t - \alpha\eta_{t-1}$.

    Denote $(1 - \alpha B)(1 - \phi B)Z_t = Q_t$. Then $Var(Q_t) = (2 + \alpha^2 + \phi^2)\sigma^2$,

    $Cov(Q_t, Q_{t-1}) = -(\alpha + \phi)\sigma^2$;  $Cov(Q_t, Q_{t-k}) = 0$, $k > 1$.

    This is ARMA(2,1). If $\alpha = -\phi$, then it is AR(2).
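    A simulation sketch of this result (not from the slides; $\alpha = 0.5$, $\phi = 0.7$ are assumed values): build $Q_t = Z_t - (\alpha+\phi)Z_{t-1} + \alpha\phi Z_{t-2}$ from two simulated independent AR(1) series and compare its sample moments with the formulas above.

```python
import numpy as np

rng = np.random.default_rng(3)
alpha, phi, n = 0.5, 0.7, 200_000

eps, eta = rng.normal(size=n), rng.normal(size=n)
x = np.zeros(n)
y = np.zeros(n)
for t in range(1, n):
    x[t] = alpha * x[t - 1] + eps[t]
    y[t] = phi * y[t - 1] + eta[t]
z = x + y

q = z[2:] - (alpha + phi) * z[1:-1] + alpha * phi * z[:-2]
q = q - q.mean()
print(round(q.var(), 2), 2 + alpha**2 + phi**2)            # ~2.74 vs 2.74
print(round(np.mean(q[1:] * q[:-1]), 2), -(alpha + phi))   # ~-1.20 vs -1.2
```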

  • 25

    Autocorrelations of ARMA processes

    $X_t = \phi_1 X_{t-1} + \dots + \phi_p X_{t-p} + \varepsilon_t - \theta_1\varepsilon_{t-1} - \dots - \theta_q\varepsilon_{t-q}$

    By multiplying both sides by $X_{t-k}$ and taking expectations it is easy to see that for $k > q$,

    $\rho_k = \phi_1\rho_{k-1} + \dots + \phi_p\rho_{k-p}$,

    the same as in an AR(p) process. The difference is in the behavior of the first $q$ correlations (in fact, in the first $q - p$ correlations).

    Example: ARMA(1,1), $X_t = \phi X_{t-1} + \varepsilon_t - \theta\varepsilon_{t-1}$. Stationary and invertible if $|\phi| < 1$ and $|\theta| < 1$.

  • 26

    Autocorrelations of ARMA(1,1) (cont.)

    We just found in (4) that $E(\varepsilon_{t-1}X_t) = (\phi - \theta)\sigma_\varepsilon^2$. Substituting in (1) yields

    $\gamma_0 = \phi\gamma_1 + \sigma_\varepsilon^2 - \theta(\phi - \theta)\sigma_\varepsilon^2 = \phi^2\gamma_0 - \phi\theta\sigma_\varepsilon^2 + (1 - \phi\theta + \theta^2)\sigma_\varepsilon^2$

    by (2). Hence,

    $\gamma_0 = \dfrac{1 + \theta^2 - 2\phi\theta}{1 - \phi^2}\,\sigma_\varepsilon^2$;  $\gamma_1 = \dfrac{(1 - \phi\theta)(\phi - \theta)}{1 - \phi^2}\,\sigma_\varepsilon^2$

    $\rho_1 = \dfrac{(1 - \phi\theta)(\phi - \theta)}{1 + \theta^2 - 2\phi\theta}$;  $\rho_k = \phi\rho_{k-1}$, $k = 2, 3, \dots$

    • The denominator of $\rho_1$ and the factor $(1 - \phi\theta)$ are always $> 0$, and so $sign(\rho_1) = sign(\phi - \theta)$.
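    A simulation check (not from the slides; $\phi = 0.7$, $\theta = 0.4$ are assumed values) comparing the $\rho_1$ formula with the sample lag-1 autocorrelation of a simulated ARMA(1,1) series.

```python
import numpy as np

rng = np.random.default_rng(4)
phi, theta, n = 0.7, 0.4, 200_000

eps = rng.normal(size=n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + eps[t] - theta * eps[t - 1]

rho1_theory = (1 - phi * theta) * (phi - theta) / (1 + theta**2 - 2 * phi * theta)
xc = x - x.mean()
rho1_sample = (xc[1:] @ xc[:-1]) / (xc @ xc)
print(round(rho1_theory, 3), round(rho1_sample, 3))   # close to each other
```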

    Partial correlations in ARMA models

    By dividing both sides of the model equation by $\Theta(B)$, the model can be expressed as AR(∞) and hence the partial correlations decay to zero. They behave as in an MA(q) process for $k > p$ (in fact already for $k > p - q$).

  • 27

    Checking for stationarity, transformations

    For $k > q - p$,

    $\rho_k = A_1 G_1^k + \dots + A_p G_p^k$,

    where the $G_i^{-1}$ are the solutions of $\Phi(B) = 0$ (same as in an AR(p) process, but only from $k > q - p$). Thus, if the process is stationary, $|G_i^{-1}| > 1$ for all $i$, and if none of the roots is close to 1 the autocorrelations decay fast to zero from $k > q - p$.

    As in AR models, if $d$ roots are 1, differencing the series $d$ times yields a stationary series.

    • Slow decay of the autocorrelations indicates the need for differencing. To see this, suppose that $G_1 = (1 - \delta)$ where $\delta$ is a small positive number, and the other $G_k$'s are much smaller. Then

    $\rho_k \cong A_1 G_1^k = A_1(1 - \delta)^k \cong A_1(1 - k\delta)$,

    so that the decay is linear with a small increment. In situations like this it is advisable to difference even if the root is $> 1$.
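    A small illustration (not from the slides): for a dominant root close to the unit circle, e.g. $G_1 = 1 - \delta$ with $\delta = 0.01$, the autocorrelations $G_1^k$ are practically linear in $k$ over moderate lags.

```python
import numpy as np

delta = 0.01
k = np.arange(1, 31)
print(np.round((1 - delta) ** k, 3))   # geometric decay...
print(np.round(1 - k * delta, 3))      # ...almost indistinguishable from linear decay here
```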

  • 28

    Interpretation of differencing

    Suppose that the series $Y_t$ follows the model

    $Y_t = \alpha_0 + \alpha_1 t + \dots + \alpha_d t^d + Z_t$,

    where $Z_t$ satisfies $\Phi(B)(1-B)^d Z_t = \Theta(B)\varepsilon_t$, i.e., ARIMA(p,d,q). Then

    $(1-B)^d Y_t = d!\,\alpha_d + (1-B)^d Z_t$,

    and

    $\Phi(B)(1-B)^d Y_t = const. + \Phi(B)(1-B)^d Z_t = const. + \Theta(B)\varepsilon_t$.  [$\Phi(B)\,const. = const.$]

    Thus, after $d$ differences we get a stationary series with a constant mean. Taking one more difference eliminates the constant, but this is not advisable because it inflates the variance and the resulting model could be noninvertible.

    • The opposite also holds: the existence of a nonzero mean after $d$ differences signifies a deterministic polynomial trend of order $d$.

  • 29

    Interpretation of differencing (cont.)

    Example of the effect of 'over-differencing':

    $Y_t = c + \varepsilon_t$ ⇒ $W_t = Y_t - Y_{t-1} = \varepsilon_t - \varepsilon_{t-1}$;  $E(W_t) = 0$, $Var(W_t) = 2\sigma_\varepsilon^2$

    The variance is doubled and $W_t$ is noninvertible.
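    A simulation sketch (not from the slides): differencing the already stationary series $Y_t = c + \varepsilon_t$ doubles the variance and produces a noninvertible MA(1) with lag-1 autocorrelation $-1/2$.

```python
import numpy as np

rng = np.random.default_rng(5)
y = 3.0 + rng.normal(size=100_000)    # Y_t = c + eps_t with c = 3 (assumed)
w = np.diff(y)                        # W_t = eps_t - eps_{t-1}

wc = w - w.mean()
print(round(y.var(), 2), round(w.var(), 2))        # ~1.0 vs ~2.0
print(round((wc[1:] @ wc[:-1]) / (wc @ wc), 2))    # ~ -0.5
```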

    Log transformation

    The use of the log transformation is commonly justified as a 'variance stabilizer'. This can be seen as follows. Suppose that the original series is the product of a trend level $L_t$, a seasonal effect $S_t$, and an irregular component $I_t$, i.e., $Y_t = L_t \times S_t \times I_t$ (multiplicative decomposition). Clearly, even if $I_t$ has a constant variance, the variance of $Y_t$ depends on $L_t \times S_t$.

  • 30

    Log transformation (cont.)

    $Y_t = L_t \times S_t \times I_t$. Taking logs of both sides yields

    $\log Y_t = \log L_t + \log S_t + \log I_t$,

    and if the log of each component is a stationary ARMA process, the sum is also an ARMA process.

    We may still need differencing as well. For example, if $L_t = \exp(\alpha + \beta t)$ (exponential trend), then $\log Y_t = \alpha + \beta t + \log S_t + \log I_t$ and we need one difference ($d = 1$). If $\log S_t$ is also nonstationary, we may also need a seasonal difference (see below).