Y =+− −−μ εθε θε - huji.ac.ilpluto.huji.ac.il/~msby/time-files/ARMA_2010.pdf · 3...
-
Moving Average (MA) Models
General representation of MA models
$Y_t = \mu + \varepsilon_t - \theta_1\varepsilon_{t-1} - \dots - \theta_q\varepsilon_{t-q}$ ↔ $Y_t \sim MA(q)$
• Can be viewed as a (finite) approximation for the infinite sum of random innovations (Wold Theorem, see later).
• The negative signs before the coefficients are only a convention. A positive coefficient for $\varepsilon_{t-k}$ requires $\theta_k < 0$, $k = 1, 2, \dots$
Properties
$E(Y_t) = \mu$ ; $Var(Y_t) = (1 + \sum_{j=1}^{q}\theta_j^2)\sigma^2$
$\gamma_k = (-\theta_k + \theta_1\theta_{k+1} + \dots + \theta_{q-k}\theta_q)\sigma^2$ for $1 \le k \le q$
$\gamma_k = 0$ for $k > q$ ⇒ Cut-off after lag q.
Example: In a rotating panel survey, if units are in the sample for q+1 successive time points (like in Labour Force Surveys), the model holding for the sample estimates is MA(q).
• MA models are often obtained after transforming the original series. For example,
$Y_t = a + bt + \varepsilon_t \Rightarrow Y_t - Y_{t-1} = b + \varepsilon_t - \varepsilon_{t-1} \sim MA(1)$.
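The MA(q) moments listed under "Properties" can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the function name `ma_autocovariances` and the example coefficients are illustrative and not part of the original slides.

```python
import numpy as np

def ma_autocovariances(theta, sigma2=1.0):
    """Autocovariances gamma_0..gamma_q of
    Y_t = mu + e_t - theta_1 e_{t-1} - ... - theta_q e_{t-q}."""
    # Weights on e_t, e_{t-1}, ..., e_{t-q}: (1, -theta_1, ..., -theta_q).
    c = np.concatenate(([1.0], -np.asarray(theta, dtype=float)))
    q = len(c) - 1
    # gamma_k = sigma^2 * sum_j c_j * c_{j+k}; it is zero beyond lag q.
    return np.array([sigma2 * np.dot(c[: q + 1 - k], c[k:]) for k in range(q + 1)])

# Hypothetical MA(2) with theta_1 = 0.5, theta_2 = -0.3.
gamma = ma_autocovariances([0.5, -0.3])
rho = gamma / gamma[0]  # autocorrelations for lags 0, 1, 2
print(gamma, rho)
```

For these values the lag-1 term equals $(-\theta_1 + \theta_1\theta_2)\sigma^2 = -0.65$, in agreement with the formula for $\gamma_k$.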
-
2
Example and alternative representation
Example: MA(1) model, $Y_t - \mu = \varepsilon_t - \theta\varepsilon_{t-1}$
$Var(Y_t) = (1+\theta^2)\sigma^2$ , $Cov(Y_t, Y_{t-1}) = -\theta\sigma^2$
$\Rightarrow Corr(Y_t, Y_{t-1}) = -\theta/(1+\theta^2)$ and $|Corr(Y_t, Y_{t-1})| \le 1/2$.
• Important for model identification. When $\theta < 0$, the series is ‘smooth’. When $\theta > 0$, the series is ‘jumpy’.
For a general MA(q) process and $1 \le k \le q$,
$\max \rho_k = \cos[\pi/(N+1)]$ if $k$ divides $(q+1)$, and $\max \rho_k = \cos[\pi/(N+2)]$ otherwise,
where N is the largest integer not exceeding $(q+1)/k$.
Back shift operator: $BX_t = X_{t-1}$
$\Rightarrow B^2X_t = B(BX_t) = X_{t-2}, \dots, B^kX_t = X_{t-k}$.
Alternative model representation
$Y_t - \mu = (1 - \theta_1B - \theta_2B^2 - \dots - \theta_qB^q)\varepsilon_t = \Theta(B)\varepsilon_t$
$\Theta(B)$ is a polynomial in B of order q.
• MA processes are always stationary.
-
3
Invertibility
Question: Suppose that the series of interest follows an MA(q) process. Is the model identifiable in the sense that the model coefficients are determined uniquely by the autocorrelations of the process? Answer: NO.
Example: the MA(1) processes,
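$Y_t - \mu = \varepsilon_t + 0.5\varepsilon_{t-1}$ and $Y_t - \mu = \varepsilon_t + 2\varepsilon_{t-1}$, both have $\rho_1 = 0.4$.
Let $X_t = (Y_t - \mu)$. Then for any MA(1) process,
$X_t = \varepsilon_t - \theta\varepsilon_{t-1} = \varepsilon_t - \theta(X_{t-1} + \theta\varepsilon_{t-2})$
$= \varepsilon_t - \theta X_{t-1} - \theta^2\varepsilon_{t-2} = \dots = \varepsilon_t - \sum_{i=1}^{\infty}\theta^iX_{t-i}$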
• For $|\theta| > 1$ we assign more and more weight to remote observations, which makes little sense!
• In this example we will choose $\theta = -0.5$.
-
4
Invertibility (cont.)
Definition: An MA process is invertible if it can be written as an infinite Autoregressive process,
$\tilde{Y}_t = A_1\tilde{Y}_{t-1} + A_2\tilde{Y}_{t-2} + \dots + \varepsilon_t$ with coefficients $A_j$ that converge to zero ($\sum_{j=1}^{\infty}|A_j| < \infty$), where $\tilde{Y}_t = Y_t - \mu$.
Theorem: The MA(q) process is invertible if and only if the roots of the polynomial equation $\Theta(B) = 0$ are larger than 1 in absolute value, $|B_k| > 1$. (Outside the unit circle when the root $B_k$ is a complex number).
Example: For the MA(1) process $\tilde{Y}_t = (1 - \theta B)\varepsilon_t$,
$\Theta(B) = 0 \Leftrightarrow 1 - \theta B = 0 \Leftrightarrow B = (1/\theta)$ ;
$|B| > 1 \Leftrightarrow |\theta| < 1$.
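The root condition of the theorem can be checked numerically. The sketch below assumes NumPy; the helper names `ma_is_invertible` and `ma1_rho1` are illustrative. It revisits the two MA(1) processes with $\theta = -0.5$ and $\theta = -2$, which share $\rho_1 = 0.4$ but differ in invertibility.

```python
import numpy as np

def ma_is_invertible(theta):
    """True if all roots of Theta(B) = 1 - theta_1 B - ... - theta_q B^q
    lie strictly outside the unit circle."""
    theta = np.asarray(theta, dtype=float)
    # numpy.roots expects coefficients from the highest power down to the constant.
    coeffs = np.concatenate((-theta[::-1], [1.0]))
    return bool(np.all(np.abs(np.roots(coeffs)) > 1.0))

def ma1_rho1(theta):
    """Lag-1 autocorrelation of the MA(1) process Y_t - mu = e_t - theta e_{t-1}."""
    return -theta / (1.0 + theta**2)

for theta in (-0.5, -2.0):
    print(theta, ma1_rho1(theta), ma_is_invertible([theta]))
```

Only $\theta = -0.5$ passes the check, since its root $B = 1/\theta = -2$ lies outside the unit circle.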
-
5
Invertibility, general case
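$\tilde{Y}_t = \Theta(B)\varepsilon_t$ and we ask whether we can divide, $\varepsilon_t = [1/\Theta(B)]\tilde{Y}_t$, with convergent coefficients.
Write $\Theta(B) = (1 - H_1B) \times \dots \times (1 - H_qB)$, where $(1/H_i)$ are the roots of $\Theta(B) = 0$. If $|1/H_i| > 1$ for all i, then $|H_i| < 1$ for all i, and each factor $1/(1 - H_iB) = \sum_{j=0}^{\infty}H_i^jB^j$ has coefficients that converge to zero.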
-
6
Autoregressive (AR) Models
General Representation of AR Models
$X_t = \phi_1X_{t-1} + \phi_2X_{t-2} + \dots + \phi_pX_{t-p} + \varepsilon_t$ ; ($X_t = Y_t - \mu$)
• Looks like an ordinary regression model, but the explanatory variables are past observations and hence are correlated with each other and with the residuals of other equations. Nonetheless, if the series is stationary (see next slide), the usual Least Squares estimators are consistent for the $\phi$-coefficients and are best asymptotically normal (BAN). Also, the usual mean residual sum of squares is a consistent estimator of $\sigma^2 = Var(\varepsilon_t)$.
Polynomial representation
$\Phi(B)X_t = \varepsilon_t$ ; $\Phi(B) = 1 - \phi_1B - \dots - \phi_pB^p$ → $Y_t \sim AR(p)$
• Autoregressive models are always invertible.
-
7
Condition for stationarity
An AR(p) process is stationary if and only if the roots of the polynomial equation $\Phi(B) = 0$ are larger than 1 in absolute value, $|B_k| > 1$ (the root $B_k$ is outside the unit circle when $B_k$ is complex).
Example: For an AR(1) process $(1 - \phi B)X_t = \varepsilon_t$,
$\Phi(B) = 0 \Leftrightarrow 1 - \phi B = 0 \Leftrightarrow B = (1/\phi)$ ;
$|B| > 1 \Leftrightarrow |\phi| < 1$.
-
8
Condition for stationarity (cont.)
• When at least one of the roots of $\Phi(B) = 0$ is inside the unit circle, the series “explodes”.
Example: $X_t = 3X_{t-1} + \varepsilon_t$ (variance grows geometrically).
• When d roots equal 1 and the rest are outside the unit circle, we may write $\Phi(B) = \Phi_1(B)(1-B)^d$, where the roots of $\Phi_1(B) = 0$ are all outside the unit circle. Hence,
$\Phi(B)X_t = \Phi_1(B)(1-B)^dX_t = \Phi_1(B)U_t$ and $U_t = (1-B)^dX_t$ is stationary.
• This result shows why very often stationarity is achieved by differencing.
Example: Random walk, $X_t = X_{t-1} + \varepsilon_t$ and $(1-B)X_t = \varepsilon_t$ ⇒ white noise and hence stationary.
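The three cases discussed here (all roots outside the unit circle, a root inside, roots equal to 1) can be told apart by computing the roots of $\Phi(B)$. This is a minimal sketch assuming NumPy; the function names and the tolerance `tol` are illustrative choices, not part of the slides.

```python
import numpy as np

def ar_roots(phi):
    """Roots of Phi(B) = 1 - phi_1 B - ... - phi_p B^p."""
    phi = np.asarray(phi, dtype=float)
    # Coefficients ordered from the highest power of B down to the constant.
    return np.roots(np.concatenate((-phi[::-1], [1.0])))

def classify_ar(phi, tol=1e-8):
    """Classify an AR(p) model by the moduli of the roots of Phi(B) = 0."""
    moduli = np.abs(ar_roots(phi))
    if np.all(moduli > 1.0 + tol):
        return "stationary"
    if np.any(moduli < 1.0 - tol):
        return "explosive"
    return "unit root"  # some roots on the unit circle, none inside

print(classify_ar([0.5]))  # AR(1) with phi = 0.5
print(classify_ar([3.0]))  # X_t = 3 X_{t-1} + e_t
print(classify_ar([1.0]))  # random walk
```

In the unit-root case, differencing once per unit root is what yields a stationary series.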
-
9
Variance and correlations of AR(P) processes
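Let $X_t = Y_t - \mu$ and suppose,
$X_t = \phi_1X_{t-1} + \phi_2X_{t-2} + \dots + \phi_pX_{t-p} + \varepsilon_t$
⇓
$X_tX_t = \phi_1X_tX_{t-1} + \phi_2X_tX_{t-2} + \dots + \phi_pX_tX_{t-p} + X_t\varepsilon_t$.
Taking expectations on both sides and noting that $E(X_t) = E(Y_t - \mu) = 0$ yields,
$\gamma_0 = \phi_1\gamma_1 + \phi_2\gamma_2 + \dots + \phi_p\gamma_p + \sigma_\varepsilon^2$ ; [$E(X_t\varepsilon_t) = E(\varepsilon_t^2)$]
$\gamma_k = \gamma_0\rho_k$ ⇒ $\gamma_0(1 - \phi_1\rho_1 - \phi_2\rho_2 - \dots - \phi_p\rho_p) = \sigma_\varepsilon^2$,
or $\gamma_0 = \sigma_\varepsilon^2/(1 - \phi_1\rho_1 - \phi_2\rho_2 - \dots - \phi_p\rho_p)$.
$X_tX_{t-k} = \phi_1X_{t-1}X_{t-k} + \phi_2X_{t-2}X_{t-k} + \dots + \phi_pX_{t-p}X_{t-k} + \varepsilon_tX_{t-k}$. Taking expectations (for $k > 0$),
$\gamma_k = \phi_1\gamma_{k-1} + \phi_2\gamma_{k-2} + \dots + \phi_p\gamma_{k-p}$
⇓ Yule-Walker (Y-W): $\rho_k = \phi_1\rho_{k-1} + \phi_2\rho_{k-2} + \dots + \phi_p\rho_{k-p}$
$\Phi(B)\rho_k = 0$ ($\rho_{-k} = \rho_k$)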
-
10
Examples
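For the AR(1) process $X_t = \phi X_{t-1} + \varepsilon_t$,
$\rho_1 = \phi\rho_0 = \phi$, …, $\rho_k = \phi\rho_{k-1} = \dots = \phi^k$
The Autocorrelations decay to zero geometrically.
For AR(2), $X_t = \phi_1X_{t-1} + \phi_2X_{t-2} + \varepsilon_t$,
$\rho_1 = \phi_1\rho_0 + \phi_2\rho_{-1} = \phi_1 + \phi_2\rho_1$
$\rho_2 = \phi_1\rho_1 + \phi_2\rho_0 = \phi_1\rho_1 + \phi_2$
⇓
$\rho_1 = \phi_1/(1 - \phi_2)$ ; $\rho_2 = \phi_2 + [\phi_1^2/(1 - \phi_2)]$.
Once we know $\rho_1$ and $\rho_2$ the other correlations are obtained from the Y-W equations,
$\rho_3 = \phi_1\rho_2 + \phi_2\rho_1$ , $\rho_4 = \phi_1\rho_3 + \phi_2\rho_2$ , …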
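The AR(2) recursion above translates directly into code. The following sketch assumes NumPy; the function name `ar2_acf` and the coefficient values are hypothetical and chosen only for illustration.

```python
import numpy as np

def ar2_acf(phi1, phi2, max_lag):
    """Autocorrelations rho_0..rho_max_lag of a stationary AR(2) process,
    computed from the Yule-Walker recursion."""
    rho = np.empty(max_lag + 1)
    rho[0] = 1.0
    if max_lag >= 1:
        rho[1] = phi1 / (1.0 - phi2)  # starting value from the lag-1 equation
    for k in range(2, max_lag + 1):
        rho[k] = phi1 * rho[k - 1] + phi2 * rho[k - 2]
    return rho

# Hypothetical stationary AR(2) with phi_1 = 0.5, phi_2 = 0.3.
print(ar2_acf(0.5, 0.3, 6))
```

The second entry of the output can be compared with the closed form $\rho_2 = \phi_2 + \phi_1^2/(1-\phi_2)$.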
-
11
General solution of Y-W equations
For the general AR(p) process, if all the roots are distinct, $\rho_k = \sum_{r=1}^{p}A_rG_r^k$ ; $\{(1/G_r), r = 1 \dots p\}$ are the solutions of $\Phi(B) = 0$ and the $A_r$'s are constants.
Importance of general solution
For a stationary process $|1/G_r| > 1$ for all r and hence $|G_r| < 1$ and $\rho_k \to 0$ as $k \to \infty$.
Necessary condition for stationarity. (In practice we plot the correlogram of the estimated correlations and check decay to zero.)
Some of the roots $\{(1/G_r), r = 1 \dots p\}$ can be complex, in which case the decay is sinusoidal.
The coefficients $A_r$ can be computed by solving the p equations,
$\rho_k = \sum_{r=1}^{p}A_rG_r^k$ , $k = 0, \dots, p-1$,
with the $A_r$'s as the unknowns. The first equation ($k = 0$) is,
$\sum_{r=1}^{p}A_r = 1$.
Example:
$Y_t = Y_{t-1} - 0.24Y_{t-2} + \varepsilon_t$ ⇒ $G_1^{-1} = 1/0.4$ , $G_2^{-1} = 1/0.6$ ;
$\rho_1 = \phi_1/(1 - \phi_2) = 0.8065$ ⇒ $A_1 = -1.03$ , $A_2 = 2.03$.
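The constants of the general solution can be obtained by solving a small linear system. This is a sketch for the AR(2) case only, assuming NumPy and distinct roots; the function name `ar2_general_solution` is illustrative.

```python
import numpy as np

def ar2_general_solution(phi1, phi2):
    """Return (G, A) such that rho_k = A[0]*G[0]**k + A[1]*G[1]**k for an AR(2)."""
    # G_r are the reciprocals of the roots of Phi(B) = 0,
    # i.e. the roots of z^2 - phi_1 z - phi_2 = 0.
    G = np.roots([1.0, -phi1, -phi2])
    rho1 = phi1 / (1.0 - phi2)
    # Equations for k = 0 and k = 1: A_1 + A_2 = 1, A_1 G_1 + A_2 G_2 = rho_1.
    M = np.vstack((np.ones(2), G))
    A = np.linalg.solve(M, np.array([1.0, rho1]))
    return G, A

G, A = ar2_general_solution(1.0, -0.24)
rho2 = np.sum(A * G**2)  # should agree with phi_1 * rho_1 + phi_2
print(G, A, rho2)
```

For $\phi_1 = 1$, $\phi_2 = -0.24$ the weight attached to $G = 0.6$ comes out as about 2.03 and the weight attached to $G = 0.4$ as about $-1.03$; the two weights sum to 1.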
-
12
Estimation of AR(p) coefficients by use of autocorrelation estimates
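Y-W equations:
$\rho_1 = \phi_1 + \rho_1\phi_2 + \dots + \rho_{p-1}\phi_p$
$\rho_2 = \rho_1\phi_1 + \phi_2 + \dots + \rho_{p-2}\phi_p$          Note: $\rho_k = \rho_{-k}$
. . .
$\rho_p = \rho_{p-1}\phi_1 + \rho_{p-2}\phi_2 + \dots + \phi_p$
In matrix form,
$$\begin{bmatrix} 1 & \rho_1 & \dots & \rho_{p-1} \\ \rho_1 & 1 & \dots & \rho_{p-2} \\ \vdots & \vdots & & \vdots \\ \rho_{p-1} & \rho_{p-2} & \dots & 1 \end{bmatrix} \begin{bmatrix} \phi_1 \\ \phi_2 \\ \vdots \\ \phi_p \end{bmatrix} = \begin{bmatrix} \rho_1 \\ \rho_2 \\ \vdots \\ \rho_p \end{bmatrix}$$
$\Omega \times \tilde{\Phi} = \tilde{\rho}$ ⇒ $\tilde{\Phi} = \Omega^{-1}\tilde{\rho}$ and $\hat{\tilde{\Phi}} = \hat{\Omega}^{-1}\hat{\tilde{\rho}}$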
Question: Why is Ω invertible?
Example:
AR(2); $\phi_1 + \rho_1\phi_2 = \rho_1$ , $\rho_1\phi_1 + \phi_2 = \rho_2$
⇒ $\phi_1 = \frac{\rho_1(1 - \rho_2)}{1 - \rho_1^2}$ ; $\phi_2 = \frac{\rho_2 - \rho_1^2}{1 - \rho_1^2}$
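The matrix form of the Y-W equations can be solved with a standard linear solver. This is a minimal sketch assuming NumPy; the function name `yule_walker_from_acf` is illustrative, and the input is any vector of (true or estimated) autocorrelations $\rho_1, \dots, \rho_p$.

```python
import numpy as np

def yule_walker_from_acf(rho):
    """Solve Omega * Phi = rho for Phi, given rho = (rho_1, ..., rho_p)."""
    rho = np.asarray(rho, dtype=float)
    p = len(rho)
    r = np.concatenate(([1.0], rho[:-1]))  # rho_0, ..., rho_{p-1}
    # Omega has entries rho_{|i-j|} (a symmetric Toeplitz matrix).
    idx = np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
    omega = r[idx]
    return np.linalg.solve(omega, rho)

# Autocorrelations of a hypothetical AR(2) with phi_1 = 0.5, phi_2 = 0.3.
rho1 = 0.5 / 0.7
rho2 = 0.3 + 0.25 / 0.7
print(yule_walker_from_acf([rho1, rho2]))  # recovers approximately (0.5, 0.3)
```

With estimated autocorrelations in place of the true ones, the same call gives the Yule-Walker estimates of the AR coefficients.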
-
13
Partial autocorrelations
An MA model can be identified by checking for a cut-off in the Correlogram. [A cut-off at lag (q+1) suggests an MA(q) process]. For AR processes the autocorrelations only decay to zero and it is hard to identify the order p.
Definition: Let $\phi_{kj}$ denote the j-th coefficient when fitting an AR(k) model to the series, i.e.,
$X_t = \phi_{k1}X_{t-1} + \phi_{k2}X_{t-2} + \dots + \phi_{kk}X_{t-k} + \xi_t$.
$\phi_{kk}$ = Partial Correlation of lag k.
$$\begin{bmatrix} 1 & \rho_1 & \dots & \rho_{k-1} \\ \rho_1 & 1 & \dots & \rho_{k-2} \\ \vdots & \vdots & & \vdots \\ \rho_{k-1} & \rho_{k-2} & \dots & 1 \end{bmatrix} \begin{bmatrix} \phi_{k1} \\ \phi_{k2} \\ \vdots \\ \phi_{kk} \end{bmatrix} = \begin{bmatrix} \rho_1 \\ \rho_2 \\ \vdots \\ \rho_k \end{bmatrix}$$
-
14
Computation of partial autocorrelations (cont.)
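$\Omega_{(k)} \times \tilde{\Phi}_{(k)} = \tilde{\rho}_{(k)}$ ⇒ $\phi_{kk} = \frac{Det[\Omega^*_{(k)}]}{Det[\Omega_{(k)}]}$ ;
$\Omega^*_{(k)}$ is obtained from $\Omega_{(k)}$ by replacing the last column by $\tilde{\rho}_{(k)}$.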
• The partial autocorrelations can be estimated by replacing $\rho_k$ by $\hat{\rho}_k$, but there are better methods.
• The fitting of an AR(k) model does not imply that this is the correct model. Thus, for any stationary series, $\phi_{11} = \rho_1$ , $\phi_{22} = \frac{\rho_2 - \rho_1^2}{1 - \rho_1^2}$ , …
The partial autocorrelations impose restrictions on the autocorrelations. For example, if $\rho_1 = 0.8$ ⇒ $\rho_2 \ge 0.28$, since $|\phi_{22}| \le 1 \Rightarrow 2\rho_1^2 - 1 \le \rho_2 \le 1$.
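The definition of $\phi_{kk}$ as the last coefficient of a fitted AR(k) model can be sketched as follows. This assumes NumPy and solves the k-th order Y-W system directly for each k, which is simple but not the most efficient approach; the function name `pacf_from_acf` is illustrative.

```python
import numpy as np

def pacf_from_acf(rho, max_lag):
    """Partial autocorrelations phi_kk, k = 1..max_lag,
    given rho[0] = 1, rho[1], ..., rho[max_lag]."""
    rho = np.asarray(rho, dtype=float)
    out = []
    for k in range(1, max_lag + 1):
        idx = np.abs(np.subtract.outer(np.arange(k), np.arange(k)))
        omega_k = rho[idx]  # entries rho_{|i-j|}
        phi_k = np.linalg.solve(omega_k, rho[1 : k + 1])
        out.append(phi_k[-1])  # last coefficient is phi_kk
    return np.array(out)

# AR(1) with phi = 0.6: rho_k = 0.6**k, so the partial correlations cut off after lag 1.
print(pacf_from_acf(0.6 ** np.arange(5), 4))
# Boundary case rho_1 = 0.8, rho_2 = 0.28 gives phi_22 = -1.
print(pacf_from_acf([1.0, 0.8, 0.28], 2))
```

The second call illustrates the restriction above: with $\rho_1 = 0.8$, the value $\rho_2 = 0.28$ sits exactly at the lower bound.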
-
15
Use of Partial Correlations for the Identification of AR models
If the true model is AR(p), $\phi_{pp} = \phi_p$ whereas $\phi_{p+1,p+1} = \phi_{p+2,p+2} = \dots = 0$ ⇒ cut off after lag p.
Some computer programs like SAS and SPSS also compute what is known as Inverse Autocorrelations, but the use of the Partial correlations for the identification of AR processes is preferable.
-
16
Interpretation of Partial Correlations
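$\phi_{kk}$ measures the “net correlation” between $X_t$ and $X_{t-k}$, not explained by $X_{t-1}, \dots, X_{t-k+1}$.
$\phi_{kk} = Corr(X_t, X_{t-k} \mid X_{t-1}, \dots, X_{t-k+1})$.
Partial correlations in MA processes
Consider, $X_t = Y_t - \mu = \varepsilon_t - \theta\varepsilon_{t-1}$ [MA(1)],
$\rho_1 = -\theta/(1 + \theta^2)$ , $\rho_k = 0$ , $k = 2, 3, \dots$
Hence,
$\phi_{11} = \rho_1$ , $\phi_{22} = \frac{\rho_2 - \rho_1^2}{1 - \rho_1^2} = \frac{-\rho_1^2}{1 - \rho_1^2} = \frac{-\theta^2}{1 + \theta^2 + \theta^4}$
In general, $\phi_{kk} = \frac{-\theta^k(1 - \theta^2)}{1 - \theta^{2(k+1)}}$ ⇒ $|\phi_{kk}| < |\theta|^k$.
By the Invertibility condition, $|\theta| < 1$, so that $\phi_{kk} \to 0$ as $k \to \infty$.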
-
17
Ergodicity
In view of the autocorrelations between the observations, the question arising is whether the estimators $\hat{\mu}$ and $\hat{\gamma}_k$ obtained from a single series are consistent for the corresponding parameters. There are certain ergodic theorems that indeed guarantee the consistency of the estimators. For example, a sufficient condition for $\bar{Y}$ to be consistent for $\mu$ is that $\rho_k \to 0$ as $k \to \infty$, i.e., values sufficiently far apart are almost uncorrelated.
-
18
Wold Theorem
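Every weakly stationary series that has no deterministic component can be represented as an infinite linear combination of random innovations (white noise),
$Y_t - \mu = \varepsilon_t + \psi_1\varepsilon_{t-1} + \psi_2\varepsilon_{t-2} + \dots = \sum_{j=0}^{\infty}\psi_j\varepsilon_{t-j}$ ($\psi_0 = 1$), where the $\varepsilon_t$ are independent, $E(\varepsilon_t) = 0$ , $Var(\varepsilon_t) = \sigma^2$ and $\sum_{j=0}^{\infty}\psi_j^2 < \infty$.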
-
19
ARIMA Models
Introduction
By Wold theorem every stationary series that does not contain a deterministic component can be represented as an $MA(\infty)$ process with convergent coefficients. This representation can be approximated by a finite MA or AR process but it may require including many terms and hence the estimation of many parameters (small number of degrees of freedom).
The number of parameters can often be reduced very drastically by including in the model both AR and MA terms (Parsimonious representation).
-
20
ARIMA models
$X_t = \phi_1X_{t-1} + \dots + \phi_pX_{t-p} + \varepsilon_t - \theta_1\varepsilon_{t-1} - \dots - \theta_q\varepsilon_{t-q}$
$\Phi(B)X_t = \Theta(B)\varepsilon_t$ ; ARMA(p,q).
$\Phi(B) = 1 - \phi_1B - \dots - \phi_pB^p$ ;
$\Theta(B) = 1 - \theta_1B - \dots - \theta_qB^q$.
If $X_t = (1-B)^dZ_t$ then we have ARIMA(p,d,q).
• $p > q$ , $p = q$ or $p < q$. For nonseasonal models, usually $p, q \le 2$.
• The model can be viewed as an Autoregressive model with correlated residuals, $e_t = \varepsilon_t - \theta_1\varepsilon_{t-1} - \dots - \theta_q\varepsilon_{t-q}$, where the correlations are determined by the MA coefficients.
-
21
Example, ARMA(1,1)
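$X_t = \phi X_{t-1} + \varepsilon_t - \theta\varepsilon_{t-1}$ ⇒ $X_{t-1} = \phi X_{t-2} + \varepsilon_{t-1} - \theta\varepsilon_{t-2}$
⇓
$X_t = \phi^2X_{t-2} + \varepsilon_t + (\phi - \theta)\varepsilon_{t-1} - \phi\theta\varepsilon_{t-2}$
$= \phi^3X_{t-3} + \varepsilon_t + (\phi - \theta)\varepsilon_{t-1} + \phi(\phi - \theta)\varepsilon_{t-2} - \phi^2\theta\varepsilon_{t-3} = \dots$
$= \varepsilon_t + (\phi - \theta)\varepsilon_{t-1} + \phi(\phi - \theta)\varepsilon_{t-2} + \phi^2(\phi - \theta)\varepsilon_{t-3} + \dots$
For Stationarity, $\sum_{i=0}^{\infty}|\phi^i(\phi - \theta)| < \infty$ or $|\phi| < 1$.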
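The repeated substitution above yields the weights $\psi_0 = 1$, $\psi_j = \phi^{j-1}(\phi - \theta)$ for $j \ge 1$. As a sketch (assuming NumPy, with illustrative function names and parameter values), the same weights can be read off as the response of the ARMA(1,1) recursion to a single unit innovation.

```python
import numpy as np

def arma11_impulse_response(phi, theta, n):
    """Response of X_t = phi X_{t-1} + e_t - theta e_{t-1} to a unit innovation at t = 0."""
    eps = np.zeros(n)
    eps[0] = 1.0
    x = np.zeros(n)
    for t in range(n):
        x[t] = eps[t]
        if t >= 1:
            x[t] += phi * x[t - 1] - theta * eps[t - 1]
    return x

def arma11_psi_closed_form(phi, theta, n):
    """psi_0 = 1 and psi_j = phi**(j-1) * (phi - theta) for j >= 1."""
    j = np.arange(1, n)
    return np.concatenate(([1.0], phi ** (j - 1) * (phi - theta)))

# Hypothetical values phi = 0.8, theta = 0.3.
print(arma11_impulse_response(0.8, 0.3, 6))
print(arma11_psi_closed_form(0.8, 0.3, 6))
```

Both calls return the same sequence, which decays geometrically because $|\phi| < 1$.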
-
22
Conditions for Invertibility and Stationarity, General case
Model: $\Phi(B)X_t = \Theta(B)\varepsilon_t$. The MA part is always stationary, and the AR part is always invertible. Hence:
The model is stationary if and only if the roots of $\Phi(B) = 0$ are outside the unit circle.
The model is invertible if and only if the roots of $\Theta(B) = 0$ are outside the unit circle.
-
23
Another justification for use of ARMA models
Many of the series of interest are sums of series. For example, total unemployment can be broken down by sex, age, industry classification etc.
Theorem
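If $X_{1t} \sim ARMA(p_1, q_1)$ and $X_{2t} \sim ARMA(p_2, q_2)$ are independent series ⇒ $X_{1t} + X_{2t} \sim ARMA(p, q)$,
where $p \le p_1 + p_2$ , $q \le \max(p_1 + q_2, p_2 + q_1)$.
Example 1: $X_{1t} \sim MA(q_1)$ , $X_{2t} \sim MA(q_2)$ ⇒ $X_{1t} + X_{2t} \sim MA(q)$ , $q \le \max(q_1, q_2)$.
Example 2: $X_{1t} \sim AR(p_1)$ , $X_{2t} \sim AR(p_2)$ ⇒ $X_{1t} + X_{2t} \sim ARMA(p, q)$ , $p \le p_1 + p_2$ , $q \le \max(p_1, p_2)$.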
The sum of two MA processes is again MA, but the sum of two AR processes is not necessarily AR.
-
24
Example 2 (Cont.)
Suppose that $(1 - \alpha B)X_t = \varepsilon_t$ , $(1 - \phi B)Y_t = \eta_t$ ; $Var(\eta_t) = Var(\varepsilon_t) = \sigma^2$, and let $Z_t = X_t + Y_t$. Then,
$(1 - \alpha B)(1 - \phi B)Z_t = (1 - \alpha B)(1 - \phi B)X_t + (1 - \alpha B)(1 - \phi B)Y_t$
$= (1 - \phi B)\varepsilon_t + (1 - \alpha B)\eta_t = \varepsilon_t - \phi\varepsilon_{t-1} + \eta_t - \alpha\eta_{t-1}$.
Denote, $(1 - \alpha B)(1 - \phi B)Z_t = Q_t$.
$Var(Q_t) = (2 + \alpha^2 + \phi^2)\sigma^2$ , $Cov(Q_t, Q_{t-1}) = -(\alpha + \phi)\sigma^2$ ; $Cov(Q_t, Q_{t-k}) = 0$ , $k > 1$.
This is ARMA(2,1). If $\alpha = -\phi$, then it is AR(2).
-
25
Autocorrelations of ARMA processes
$X_t = \phi_1X_{t-1} + \dots + \phi_pX_{t-p} + \varepsilon_t - \theta_1\varepsilon_{t-1} - \dots - \theta_q\varepsilon_{t-q}$
By multiplying both sides by $X_{t-k}$ and taking expectations it is easy to see that,
For $k > q$ ⇒ $\rho_k = \phi_1\rho_{k-1} + \dots + \phi_p\rho_{k-p}$,
Same as in AR(p) process. The difference is in the behavior of the first q correlations (in fact in first q-p correlations).
Example: ARMA(1,1), $X_t = \phi X_{t-1} + \varepsilon_t - \theta\varepsilon_{t-1}$.
Stationary and invertible if $|\phi| < 1$ , $|\theta| < 1$.
-
26
Autocorrelations of ARMA(1,1) (cont.)
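We just found in (4) that $E(X_t\varepsilon_{t-1}) = (\phi - \theta)\sigma_\varepsilon^2$.
Substituting in (1) yields,
$\gamma_0 = \phi\gamma_1 + \sigma_\varepsilon^2 - \theta(\phi - \theta)\sigma_\varepsilon^2 = \phi^2\gamma_0 - \phi\theta\sigma_\varepsilon^2 + (1 - \phi\theta + \theta^2)\sigma_\varepsilon^2$
by (2), i.e. $\gamma_1 = \phi\gamma_0 - \theta\sigma_\varepsilon^2$. Hence,
$\gamma_0 = \frac{1 + \theta^2 - 2\phi\theta}{1 - \phi^2}\sigma_\varepsilon^2$ ; $\gamma_1 = \frac{(1 - \phi\theta)(\phi - \theta)}{1 - \phi^2}\sigma_\varepsilon^2$
⇓
$\rho_1 = \frac{(1 - \phi\theta)(\phi - \theta)}{1 + \theta^2 - 2\phi\theta}$ ; $\rho_k = \phi\rho_{k-1}$ , $k = 2, 3, \dots$
• The denominator of $\rho_1$ and $(1 - \phi\theta)$ are always > 0 and so sign($\rho_1$) = sign($\phi - \theta$).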
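The closed-form ARMA(1,1) autocorrelations can be cross-checked against a direct computation from the $MA(\infty)$ weights $\psi_0 = 1$, $\psi_j = \phi^{j-1}(\phi - \theta)$. The sketch below assumes NumPy; the function names, the truncation length `n_terms` and the parameter values are illustrative.

```python
import numpy as np

def arma11_acf(phi, theta, max_lag):
    """Closed-form autocorrelations of X_t = phi X_{t-1} + e_t - theta e_{t-1}."""
    rho = np.empty(max_lag + 1)
    rho[0] = 1.0
    if max_lag >= 1:
        rho[1] = (1 - phi * theta) * (phi - theta) / (1 + theta**2 - 2 * phi * theta)
    for k in range(2, max_lag + 1):
        rho[k] = phi * rho[k - 1]
    return rho

def arma11_acf_from_psi(phi, theta, max_lag, n_terms=2000):
    """Same autocorrelations from truncated MA(infinity) weights."""
    j = np.arange(1, n_terms)
    psi = np.concatenate(([1.0], phi ** (j - 1) * (phi - theta)))
    # gamma_k is proportional to sum_j psi_j * psi_{j+k}.
    gamma = np.array([np.dot(psi[: n_terms - k], psi[k:]) for k in range(max_lag + 1)])
    return gamma / gamma[0]

print(arma11_acf(0.8, 0.3, 4))
print(arma11_acf_from_psi(0.8, 0.3, 4))
```

Here $\phi > \theta$, so $\rho_1$ is positive, in line with the sign rule above.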
Partial correlations in ARMA models
By dividing both sides of the model equations by $\Theta(B)$, the model can be expressed as $AR(\infty)$ and hence the partial correlations decay to zero. They behave as in MA(q) for k>p (already for k>p-q).
-
27
Checking for stationarity, transformations
For k>q-p, $\rho_k = A_1G_1^k + \dots + A_pG_p^k$,
where the $G_i^{-1}$ are the solutions of $\Phi(B) = 0$ (same as in AR(p), but only from k>q-p). Thus, if the process is stationary $|G_i^{-1}| > 1$ for all i and if none of the roots is close to 1, the autocorrelations decay fast to zero from k>q-p.
As in AR models, if d roots are 1, differencing the series d times yields a stationary series.
• Slow decay of the Autocorrelations indicates the need for differencing. To see this, suppose that $G_1 = (1 - \delta)$ where $\delta$ is a small positive number, and the other $G_k$’s are much smaller. Then, $\rho_k \cong A_1G_1^k = A_1(1 - \delta)^k \cong A_1(1 - k\delta)$ so that the decay is linear with a small increment. In situations like this it is advisable to difference even if the root is >1.
-
28
Interpretation of differencing
Suppose that the series $Y_t$ follows the model,
$Y_t = \alpha_0 + \alpha_1t + \dots + \alpha_dt^d + Z_t$, where $Z_t$ satisfies,
$\Phi(B)(1-B)^dZ_t = \Theta(B)\varepsilon_t$. ARIMA(p,d,q). Then,
$(1-B)^dY_t = d!\,\alpha_d + (1-B)^dZ_t$, and
$\Phi(B)(1-B)^dY_t = const. + \Phi(B)(1-B)^dZ_t = const. + \Theta(B)\varepsilon_t$ [$\Phi(B)\,const. = const.$]
Thus, after d differences we get a stationary series
with a constant mean. Taking one more difference
eliminates the constant but this is not advisable
because it inflates the variance and the resulting
model could be noninvertible.
• The opposite also holds; the existence of a
nonzero mean after d differences signifies a
deterministic polynomial trend of order d.
-
29
Interpretation of differencing (cont.)
Example of the effect of ‘over differencing’:
$Y_t = c + \varepsilon_t \Rightarrow W_t = Y_t - Y_{t-1} = \varepsilon_t - \varepsilon_{t-1}$ ;
$E(W_t) = 0$ , $Var(W_t) = 2\sigma_\varepsilon^2$
The variance is doubled and $W_t$ is noninvertible.
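The effect of over-differencing can be illustrated by simulation. The sketch below assumes NumPy and Gaussian innovations; the function name, the constant `c`, the sample size and the seed are arbitrary illustrative choices.

```python
import numpy as np

def overdifference_demo(c=10.0, sigma=1.0, n=200_000, seed=0):
    """Simulate Y_t = c + e_t and its first difference W_t = e_t - e_{t-1}."""
    rng = np.random.default_rng(seed)
    y = c + sigma * rng.standard_normal(n)  # already stationary, no differencing needed
    w = np.diff(y)                          # over-differenced series
    rho1 = np.corrcoef(w[1:], w[:-1])[0, 1]  # lag-1 autocorrelation of W_t
    root = np.roots([-1.0, 1.0])[0]          # root of Theta(B) = 1 - B
    return y.var(), w.var(), rho1, abs(root)

var_y, var_w, rho1_w, root_modulus = overdifference_demo()
print(var_y, var_w, rho1_w, root_modulus)
```

The sample variance of $W_t$ is close to twice that of $Y_t$, the lag-1 autocorrelation is close to $-1/2$ (the MA(1) bound), and the root of $\Theta(B) = 1 - B$ lies on the unit circle, which is why $W_t$ is noninvertible.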
Log transformation
The use of the log transformation is commonly justified as a “variance stabilizer”. This can be seen as follows: Suppose that the original series is the product of a trend level, $L_t$, a Seasonal effect, $S_t$, and an irregular component, $I_t$, i.e.,
$Y_t = L_t \times S_t \times I_t$ (multiplicative decomposition).
Clearly, even if $I_t$ has a constant variance, the variance of $Y_t$ depends on $L_t \times S_t$.
-
30
Log transformation (cont.)
$Y_t = L_t \times S_t \times I_t$. Taking logs of both sides yields,
$\log Y_t = \log L_t + \log S_t + \log I_t$, and if the log of each component is a stationary ARMA process, the sum is also an ARMA process.
We may still need differencing as well. For example, if $L_t = \exp(\alpha + \beta t)$ (exponential trend),
$\log Y_t = \alpha + \beta t + \log S_t + \log I_t$ and we need one difference (d=1). If $\log S_t$ is also nonstationary, we may also need a seasonal difference (see below).