
THE BASICS OF ARMA MODELS

Stationarity

A time series in discrete time is a sequence $\{x_t\}_{t=-\infty}^{\infty}$ of random variables defined on a common probability space. We say that $\{x_t\}$ is strictly stationary if the joint distributions do not change with time, i.e., if the distribution of $(x_{t_1}, \ldots, x_{t_k})$ is the same as the distribution of $(x_{t_1+\tau}, \ldots, x_{t_k+\tau})$ for any integers $t_1, \ldots, t_k$, and any integer $\tau$.

The time series is weakly stationary (covariance stationary) if $E[x_t] = \mu$ exists and is constant for all $t$, $\mathrm{var}[x_t] = \sigma_X^2 < \infty$, and the autocovariance $E[(x_t - \mu)(x_{t-r} - \mu)] = c_r$ depends only on the lag, $r$.

Strict stationarity, together with the assumption of finite first and second moments, implies weak stationarity. For Gaussian series (i.e., all joint distributions are Gaussian), weak stationarity is equivalent to strict stationarity. This is because the multivariate Gaussian distribution is uniquely determined by the first two moments, i.e., the mean and covariance. For convenience, and without any real loss of generality, we will take the constant mean value to be zero unless otherwise stated. In this case, we have $c_r = E[x_t x_{t-r}]$, and the autocorrelation at lag $r$ is $\rho_r = E[x_t x_{t-r}] / \sqrt{\mathrm{var}\, x_t \,\mathrm{var}\, x_{t-r}} = c_r / c_0$. Given a time series data set $x_1, \ldots, x_n$, we estimate $c_r$ by $\hat{c}_r = \frac{1}{n} \sum_{t=r+1}^{n} x_t x_{t-r}$ (define $\hat{c}_{-r} = \hat{c}_r$), and we estimate $\rho_r$ by $\hat{\rho}_r = \hat{c}_r / \hat{c}_0$.
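As a concrete illustration (not part of the original handout), here is a minimal Python sketch of these two estimators, assuming the series has already been centered so the zero-mean convention applies:

    import numpy as np

    def sample_autocovariance(x, r):
        """Estimate c_r = (1/n) * sum_{t=r+1}^{n} x_t * x_{t-r} (zero-mean convention)."""
        x = np.asarray(x, dtype=float)
        n = len(x)
        r = abs(r)                    # c_{-r} = c_r by definition
        return np.sum(x[r:] * x[:n - r]) / n

    def sample_autocorrelation(x, r):
        """Estimate rho_r = c_r / c_0."""
        return sample_autocovariance(x, r) / sample_autocovariance(x, 0)

Note that the divisor is $n$ rather than $n - r$; this biased form is the standard choice because it keeps the estimated autocovariance sequence positive semidefinite.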

White Noise Processes

A process $\{\varepsilon_t\}$ is said to be white noise if it is weakly stationary, and if the $\varepsilon_t$ are mutually uncorrelated. In the case of zero mean, we have $E[\varepsilon_t] = 0$, $E[\varepsilon_t^2] = \sigma_\varepsilon^2 < \infty$, and $E[\varepsilon_t \varepsilon_u] = 0$ if $t \neq u$. The white noise process is a basic building block for more complicated time series models. Note that we have not assumed that the $\varepsilon_t$ are independent. Remember that independence implies uncorrelatedness, but that it is possible to have uncorrelated random variables which are not independent. In the Gaussian case, however, independence and uncorrelatedness are equivalent properties.
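To make the distinction concrete, consider a hypothetical example (not from the handout): with $z_t$ i.i.d. standard normal, the process $\varepsilon_t = z_t z_{t-1}$ is white noise, since $E[\varepsilon_t \varepsilon_{t-r}] = 0$ for all $r \geq 1$, yet $\varepsilon_t$ and $\varepsilon_{t-1}$ share the factor $z_{t-1}$ and so are not independent. A quick simulation sketch:

    import numpy as np

    rng = np.random.default_rng(0)
    z = rng.standard_normal(100_001)
    eps = z[1:] * z[:-1]          # eps_t = z_t * z_{t-1}: uncorrelated, not independent

    def rho_hat(x, r):
        x = x - x.mean()
        return np.sum(x[r:] * x[:-r]) / np.sum(x * x)

    print(rho_hat(eps, 1))        # near 0: the process is white noise
    print(rho_hat(eps**2, 1))     # clearly nonzero: eps_t and eps_{t-1} are dependent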

The white noise process is not linearly forecastable, in the sense that the best linear forecast of $\varepsilon_{t+1}$ based on $\varepsilon_t, \varepsilon_{t-1}, \ldots$ is simply the series mean (zero), and does not depend on the present and past observations.

Autoregressive Processes

We say that the process $\{x_t\}$ is autoregressive of order $p$ ($AR(p)$) if there exist constants $a_1, \ldots, a_p$ such that

$$x_t = \sum_{k=1}^{p} a_k x_{t-k} + \varepsilon_t,$$

where $\{\varepsilon_t\}$ is zero-mean white noise, and $\varepsilon_t$ is uncorrelated with $x_{t-1}, x_{t-2}, \ldots$. The $AR(p)$ process exists and is weakly stationary if and only if all the roots of the polynomial $P(z) = 1 - a_1 z - \cdots - a_p z^p$ lie outside the unit circle in the complex plane. An example of an autoregressive process which is not stationary is the random walk, $x_t = x_{t-1} + \varepsilon_t$, which is an $AR(1)$ with $a_1 = 1$. We can verify that the random walk is not stationary by noting that the root of $P(z) = 1 - z$ is $z = 1$, which is on the unit circle.
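This root condition is easy to check numerically. The sketch below (an illustration, not part of the handout) forms $P(z) = 1 - a_1 z - \cdots - a_p z^p$ and tests whether every root lies outside the unit circle:

    import numpy as np

    def is_stationary(a):
        """Check whether P(z) = 1 - a_1 z - ... - a_p z^p has all roots
        outside the unit circle."""
        coeffs = np.r_[1.0, -np.asarray(a, dtype=float)]    # [1, -a_1, ..., -a_p]
        roots = np.polynomial.polynomial.polyroots(coeffs)  # roots of P(z)
        return bool(np.all(np.abs(roots) > 1.0))

    print(is_stationary([0.5]))   # True: AR(1) with a_1 = 0.5, root z = 2
    print(is_stationary([1.0]))   # False: random walk, root z = 1 on the unit circle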

For a general $AR(p)$, if we multiply $x_t$ by $x_{t+r}$ ($r > 0$) and take expectations, we obtain

$$c_r = E\Big[x_t \Big(\sum_{k=1}^{p} a_k x_{t+r-k} + \varepsilon_{t+r}\Big)\Big] = \sum_{k=1}^{p} a_k c_{r-k}.$$

The equations $c_r = \sum_{k=1}^{p} a_k c_{r-k}$ are called the Yule-Walker Equations.
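Taking $r = 1, \ldots, p$ gives $p$ linear equations in $a_1, \ldots, a_p$, so the coefficients can be estimated by plugging in the sample autocovariances and solving the system. A minimal sketch under the handout's zero-mean convention (illustrative; plain numpy rather than any particular package routine):

    import numpy as np

    def yule_walker(x, p):
        """Estimate AR(p) coefficients by solving c_r = sum_k a_k c_{r-k}, r = 1..p,
        with the sample autocovariances plugged in (zero-mean convention)."""
        x = np.asarray(x, dtype=float)
        n = len(x)
        c = np.array([np.sum(x[r:] * x[:n - r]) / n for r in range(p + 1)])
        R = np.array([[c[abs(r - k)] for k in range(1, p + 1)]
                      for r in range(1, p + 1)])    # R[r-1, k-1] = c_{r-k}
        return np.linalg.solve(R, c[1:])            # solves for (a_1, ..., a_p)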

Moving Average Models

We say that $\{x_t\}$ is a moving average of order $q$ ($MA(q)$) if there exist constants $b_1, \ldots, b_q$ such that

$$x_t = \sum_{k=0}^{q} b_k \varepsilon_{t-k},$$

where $b_0 = 1$. The $MA(q)$ process has a finite memory, in the sense that observations spaced more than $q$ time units apart are uncorrelated. In other words, the autocovariance function of the $MA(q)$ cuts off beyond lag $q$. This contrasts with the autocovariance function of the $AR(p)$, which decays exponentially, but does not cut off.
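The cutoff is easy to see in simulation. The sketch below (illustrative; the coefficients are arbitrary) generates an MA(2) series and prints sample autocorrelations, which should be visibly nonzero through lag 2 and near zero beyond it:

    import numpy as np

    rng = np.random.default_rng(1)
    eps = rng.standard_normal(50_003)
    # MA(2) with b_0 = 1 and (hypothetical) b_1 = 0.6, b_2 = -0.3
    x = eps[2:] + 0.6 * eps[1:-1] - 0.3 * eps[:-2]

    n = len(x)
    c = [np.sum(x[r:] * x[:n - r]) / n for r in range(6)]
    for r in range(1, 6):
        print(f"rho_{r} = {c[r] / c[0]:+.3f}")   # lags 1-2 nonzero, lags 3+ near 0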


Autoregressive Moving Average Models

We say that $\{x_t\}$ is an autoregressive moving average process of order $p, q$ ($ARMA(p, q)$) if there exist constants $a_1, \ldots, a_p, b_1, \ldots, b_q$ such that

$$x_t = \sum_{j=1}^{p} a_j x_{t-j} + \sum_{j=1}^{q} b_j \varepsilon_{t-j} + \varepsilon_t,$$

where $\{\varepsilon_t\}$ is zero-mean white noise, and $\varepsilon_t$ is uncorrelated with $x_{t-1}, x_{t-2}, \ldots$. The $ARMA(p, q)$ process exists and is weakly stationary if and only if all the roots of the polynomial $P(z)$ defined earlier are outside the unit circle. The process is said to be invertible if all the roots of the polynomial $Q(z) = 1 + b_1 z + \cdots + b_q z^q$ lie outside the unit circle. A time series is invertible if and only if it has an infinite-order autoregressive ($AR(\infty)$) representation of the form

$$x_t = \sum_{j=1}^{\infty} \pi_j x_{t-j} + \varepsilon_t,$$

where the $\pi_j$ are constants with $\sum \pi_j^2 < \infty$.
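Since the AR($\infty$) form satisfies $1 - \sum_j \pi_j z^j = P(z)/Q(z)$, the weights $\pi_j$ can be obtained by a power-series expansion of that ratio. A minimal sketch of the recursion (an illustration, assuming the model is invertible; not from the handout):

    import numpy as np

    def ar_inf_weights(a, b, m):
        """First m weights pi_1, ..., pi_m of the AR(infinity) representation:
        expand phi(z) = P(z) / Q(z) term by term, then pi_j = -phi_j."""
        p_coef = np.r_[1.0, -np.asarray(a, dtype=float)]   # P(z) = 1 - a_1 z - ...
        q_coef = np.r_[1.0, np.asarray(b, dtype=float)]    # Q(z) = 1 + b_1 z + ...
        phi = np.zeros(m + 1)
        phi[0] = 1.0
        for j in range(1, m + 1):
            pj = p_coef[j] if j < len(p_coef) else 0.0
            phi[j] = pj - sum(q_coef[i] * phi[j - i]
                              for i in range(1, min(j, len(q_coef) - 1) + 1))
        return -phi[1:]

    # Check against the MA(1) with b_1 = 0.5, where pi_j = -(-0.5)^j
    print(ar_inf_weights([], [0.5], 5))   # [0.5, -0.25, 0.125, -0.0625, 0.03125]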

Linear Forecasting of ARMA Processes

In the linear prediction problem, we want to forecast $x_{t+h}$ based on a linear combination of $x_t, x_{t-1}, \ldots$, where $h > 0$ is the lead time. It can be shown that the best linear forecast, i.e., the one which makes the mean squared error of prediction as small as possible, denoted by $\hat{x}_{t+h}$, is determined by two characteristics: (1) $\hat{x}_{t+h}$ can be expressed as a linear combination of $x_t, x_{t-1}, \ldots$, and (2) the prediction error $x_{t+h} - \hat{x}_{t+h}$ is uncorrelated with all linear combinations of $x_t, x_{t-1}, \ldots$. Thus, for example, the best one-step linear forecast in the $AR(p)$ process is $\hat{x}_{t+1} = \sum_{k=1}^{p} a_k x_{t+1-k}$. The best one-step linear forecast in the $MA(q)$ model is $\hat{x}_{t+1} = \sum_{k=1}^{q} b_k \varepsilon_{t+1-k}$, which, in order to be useful in practice, must be expressed as a linear combination of the observations $x_t, x_{t-1}, \ldots$, by inverting the process to its $AR(\infty)$ representation.
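For the AR case the one-step forecast is immediate to compute. A small illustration with hypothetical coefficients (not from the handout):

    import numpy as np

    def ar_one_step(a, x):
        """Best one-step linear forecast x_hat_{t+1} = sum_k a_k x_{t+1-k} for an
        AR(p), given a = (a_1, ..., a_p) and observations x ending at time t."""
        a = np.asarray(a, dtype=float)
        p = len(a)
        recent = np.asarray(x, dtype=float)[-p:][::-1]   # (x_t, x_{t-1}, ..., x_{t-p+1})
        return float(a @ recent)

    # AR(2) with hypothetical a_1 = 0.7, a_2 = -0.2, forecasting from the last two values
    print(ar_one_step([0.7, -0.2], [0.1, -0.4, 0.9, 1.2]))   # 0.7*1.2 - 0.2*0.9 = 0.66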


Optimal Forecasting

It is not necessarily true that the best linear forecast is the best possible forecast. It may be that some nonlinear function of the observations gives a smaller mean squared prediction error than the best linear forecast. It can be shown that the optimal forecast, in terms of mean squared error, is the conditional expectation, $E[x_{t+h} \mid x_t, x_{t-1}, \ldots]$. This is a function of the present and past observations. If the $\{x_t\}$ are Gaussian, then this function will be linear. Thus, in the Gaussian case, the optimal predictor is the same as the best linear predictor. In fact, for any zero-mean process with a one-sided $MA(\infty)$ representation, $x_t = \sum_{k=0}^{\infty} b_k \varepsilon_{t-k}$, if the conditional expectation of $\varepsilon_{t+1}$ given the available information is zero, it will be true that the optimal predictor is the same as the best linear predictor. In general, however, the optimal predictor may be nonlinear.
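As a hypothetical illustration of the gap (this model is not in the handout), consider the nonlinear recursion $x_{t+1} = \sin(3 x_t) + \varepsilon_{t+1}$. Here the conditional expectation is $\sin(3 x_t)$, and a simulation shows it beats the best linear one-step forecast in mean squared error:

    import numpy as np

    rng = np.random.default_rng(2)
    n = 100_000
    x = np.zeros(n)
    for t in range(n - 1):
        x[t + 1] = np.sin(3.0 * x[t]) + 0.5 * rng.standard_normal()

    past, future = x[:-1], x[1:]
    a = np.sum(past * future) / np.sum(past * past)  # best linear one-step coefficient
    mse_linear = np.mean((future - a * past) ** 2)
    mse_optimal = np.mean((future - np.sin(3.0 * past)) ** 2)  # conditional mean
    print(mse_linear, mse_optimal)   # the nonlinear conditional mean wins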