
Multivariate Time Series: VAR(p) Processes and

Models

A VAR(p) model, for p > 0, is

Xt = φ0 + Φ1Xt−1 + · · · + ΦpXt−p + At,

where Xt, φ0, and Xt−i are k-vectors, Φ1, . . . , Φp are k×k matrices,

with Φp ≠ 0, and {At} is a sequence of serially uncorrelated

k-vectors with 0 mean and constant positive definite variance-

covariance matrix Σ.

We also can write this using the back-shift operator as

(I − Φ1B − · · · − ΦpBp)Xt = φ0 + At,

or

Φ(B)Xt = φ0 + At,

1

Companion Matrix

We can sometimes get a better understanding of a k-dimensional

VAR(p) process by writing it as a kp-dimensional VAR(1).

It is

Yt = Φ∗Yt−1 + Bt,

where Yt = (Xt−p+1, . . . , Xt−1, Xt), Bt = (0, . . . , 0, At) (taking φ0 = 0 for simplicity), and

Φ∗ =  [  0     I     0    · · ·   0   ]
      [  0     0     I    · · ·   0   ]
      [  :     :     :            :   ]
      [  0     0     0    · · ·   I   ]
      [  Φp  Φp−1  Φp−2   · · ·   Φ1  ].

This is sometimes called the companion matrix.

The key fact here is that stationarity can be assessed by looking at the eigenvalues of Φ∗: the process is stationary if every eigenvalue of Φ∗ is less than 1 in modulus.
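As a small illustration (a sketch only; the coefficient matrices Φ1 and Φ2 below are made up, not taken from the text), the companion matrix of a bivariate VAR(2) can be assembled and its eigenvalue moduli checked in R:

## Sketch: stationarity check for a k = 2, p = 2 VAR via the companion matrix.
## Phi1 and Phi2 are made-up example coefficients, not from the text.
k <- 2; p <- 2
Phi1 <- matrix(c(0.5, 0.1, 0.2, 0.3), nrow = k)
Phi2 <- matrix(c(0.1, 0.0, 0.0, 0.1), nrow = k)

## Companion matrix: identity blocks on the superdiagonal,
## (Phi_p, ..., Phi_1) in the last block row, as displayed above.
PhiStar <- rbind(cbind(matrix(0, k * (p - 1), k), diag(k * (p - 1))),
                 cbind(Phi2, Phi1))

## Stationary if every eigenvalue of PhiStar has modulus less than 1.
max(Mod(eigen(PhiStar)$values))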

2

Number of Terms in Time Series Models

The two common general types of time series models incorporate

past history either through linear combinations of past observa-

tions (AR) or of previous errors (shocks) in the system (MA).

To use either type of model, we need to decide on the order.

For an AR model, we do that by using a sequence of partial

models.

Recall: We consider the models

Rt ≈ φ0,1 + φ1,1Rt−1

Rt ≈ φ0,2 + φ1,2Rt−1 + φ2,2Rt−2
...
Rt ≈ φ0,p + φ1,pRt−1 + · · · + φp,pRt−p

The coefficients φi,i constitute the partial autocorrelation func-

tion (PACF). (The “argument” for the function is the index.)

3

The Partial Autocorrelation Function (PACF) in

AR Models

The PACF is useful for an AR model because we can “partial

out” the dependence.

Consider AR(1): Rt = φRt−1 + At.

We have γ(2) = φ2γ(0) for Rt and Rt−2.

Can we construct quantities at lag 2 whose covariance is 0?

Consider Rt − φRt−1 and Rt−2 − φRt−1. The covariance is 0.

This is the idea behind the PACF; for Rt and Rt+h, regress each

on the Rk’s between them.

The important result is that the PACF in an AR(p) model is

0 beyond lag p; that is φp+1,p+1 = 0; hence, we can use it to

identify p.

4

The Partial Autocorrelation Function (PACF) in

AR Models

The question is how to use the sample PACF.

Often, since we are using it to build a model anyway, we just use simple graphs to decide at what order the sample PACF has died off.

More formally, if the errors in the AR(p) model are iid with

mean 0, then the sample PACFs beyond p are asymptotically

iid N(0,1/n).
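As a quick illustration (a sketch with simulated data, not an example from the text), the sample PACF of a simulated AR(2) series should fall inside the approximate ±2/√n bounds beyond lag 2:

## Sketch: sample PACF of a simulated AR(2); pacf() draws approximate
## +/- 2/sqrt(n) bounds, so values beyond lag 2 should look like noise.
set.seed(1)
n <- 500
r <- arima.sim(model = list(ar = c(0.6, 0.3)), n = n)
pacf(r)                     # graphical inspection
pacf(r, plot = FALSE)$acf   # the numerical values phi_{i,i}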

5

Number of Terms in a VAR(p) Model

We use similar ideas to determine the p in a VAR(p) model.

Instead of scalar partial correlations, however, we have partial

covariance matrices.

It’s a little harder even to get started.

We take a parametric approach using a multivariate normal dis-

tribution.

Residuals from a partial true model have a PDF of the form

f(r) = (2π)−k/2 |Σ|−1/2 exp( −(r − µr)TΣ−1(r − µr)/2 ).

6

Sequential Tests for Φj = 0 in a VAR(p) Model

We consider a sequence of VAR models,

Xt = φ0 + Φ1Xt−1 + At

Xt = φ0 + Φ1Xt−1 + Φ2Xt−2 + At
...
Xt = φ0 + Φ1Xt−1 + · · · + ΦiXt−i + At
...

where Xt, φ0, and Xt−i are k-vectors, Φ1, . . . , Φi are k×k matrices,

with Φi ≠ 0, and {At} is a sequence of serially uncorrelated

k-vectors with 0 mean and constant positive definite variance-

covariance matrix Σ.

We test sequentially that Φh = 0, using likelihood ratio tests.

The likelihood ratio leads to two similar tests, “Wald tests” and

“score tests” (also called “Rao tests” and “Lagrange multiplier

tests”). In a Wald test, we use the MLE under the unrestricted (larger) model; in a score test, we use the MLE under the hypothesized (restricted) model.

7

Sequential Tests for Φj = 0 in a VAR(p) Model

We’ll use a Wald test, using given data x1, . . . , xn.

That means to test a model with i ≥ 1 terms versus a model

with i − 1 terms, the log of the likelihood ratio only involves

− log( |Σ̂i| / |Σ̂i−1| ),

where the Σ̂’s are the MLEs of the variance-covariance of the

errors in the model with the appropriate number of terms.

When i = 1, Σ̂0 is just the sample variance-covariance matrix of the x’s.

With proper normalizing factors shown in equation (8.18) on

page 406 (derived by Tiao and Box), the log likelihood ratio

has an asymptotic chi-squared distribution with k2 degrees of

freedom under the null hypothesis. (This asymptotic distribution

holds under what I call the Le Cam regularity conditions – see

Gentle (2013), page 169. These are satisfied if our likelihood is

correct in the first place!)

8

Sequential Tests for Φj = 0 in a VAR(p) Model

We test

H0 : Φ1 = 0 versus H1 : Φ1 ≠ 0.

What next?

In practice, whether or not we reject, we may try H0 : Φ2 = 0,

but usually we don’t – we proceed to the next model only if we

reject the preceding hypothesis.

I am not sure whether there is an R function that does these

tests directly, but the output of VAR in the vars package can easily

be used to compute the statistic.
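A sketch of how the output of VAR might be used for this (x is assumed to be an n × k data matrix, not supplied here; the small-sample normalizing factor of equation (8.18) is omitted, so this uses the plain −n log ratio, which has the same asymptotic chi-squared distribution with k2 degrees of freedom):

library(vars)

## Sketch: sequential LR-type statistics for the order of a VAR.
## x is assumed to be an n x k matrix of observations (not supplied here).
pmax <- 5
k <- ncol(x); n <- nrow(x)
det_sigma <- numeric(pmax + 1)
det_sigma[1] <- det(cov(x))   # Sigma-hat_0: sample variance-covariance of x
for (i in 1:pmax) {
  fit <- VAR(x, p = i, type = "const")
  e <- resid(fit)
  det_sigma[i + 1] <- det(crossprod(e) / nrow(e))
}
lr <- -n * log(det_sigma[-1] / det_sigma[-(pmax + 1)])
data.frame(p = 1:pmax, statistic = lr,
           p.value = pchisq(lr, df = k^2, lower.tail = FALSE))

Alternatively, VARselect in the vars package compares information criteria (AIC, HQ, SC, FPE) across candidate orders.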

9

The ARCH Effect in a VAR(p) Model

An extension to the VAR model allows for the volatility to vary

as in an ARCH or GARCH model.

The R function arch.test in the vars package computes a multivariate ARCH-LM test for ARCH effects in the residuals of a fitted VAR; serial.test computes portmanteau tests for remaining serial correlation in the residuals.
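A minimal usage sketch (assuming fit is the object returned by vars::VAR in the earlier sketch):

## Sketch: diagnostic tests on the residuals of a fitted VAR.
arch.test(fit, lags.multi = 5)                          # multivariate ARCH-LM test
serial.test(fit, lags.pt = 16, type = "PT.asymptotic")  # portmanteau test for serial correlation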

10

Forecasting with a VAR(p) Model

Forecasting with a VAR(p) model is similar to the same thing in

a univariate model.

Given xt, . . . , xt−p+1, the 1-step-ahead forecast at time t is

Xt(1) = φ0 + Φ1Xt + · · · + ΦpXt−p+1,

and the forecast error is At+1.

Substituting, we get the 2-step-ahead forecast at time t as

Xt(2) = φ0 + Φ1Xt(1) + Φ2Xt + · · · + ΦpXt−p+2,

and the forecast error is At+2 + Φ1At+1.
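In R, forecasts of this form can be produced with predict applied to a fitted VAR (a sketch, again assuming fit from vars::VAR):

## Sketch: multistep forecasts from a fitted VAR.
fc <- predict(fit, n.ahead = 2)   # 1- and 2-step-ahead forecasts
fc$fcst                           # point forecasts with prediction intervals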

11

Impulse Response Function

We can also express a causal VAR(p) as an infinite moving-average model, just as we did with a univariate model:

Xt = θ0 + At + Ψ1At−1 + Ψ2At−2 + · · ·

The coefficient matrices in such an infinite MA model are called

impulse response functions.

What’s “causal”?
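A sketch of computing these quantities from a fitted VAR (assuming fit from vars::VAR as before):

## Sketch: MA-representation coefficients and impulse responses.
Phi(fit, nstep = 10)     # the MA coefficient matrices Psi_j above
irf(fit, n.ahead = 10)   # impulse response analysis with bootstrap bands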

12

Vector Moving-Average or VMA(q) Models and

VARMA(p, q) Models

The vector moving-average or VMA(q) model is the obvious ex-

tension of the univariate MA model.

We can write it as

Xt = θ0 + At − Θ1At−1 − · · · − ΘqAt−q

or

Xt = θ0 + Θ(B)At.

The differences between a VMA and an MA are similar to the

differences between a VAR and an AR.

Also, just as we combine an AR model and an MA model, we

combine a VAR and a VMA to get a vector ARMA or VARMA(p, q)

model.

13

Marginal Models of Components of VMA(q)

Models

The marginal models of a VMA(q) model are just MA(q) models.

We see this because the cross-correlation matrix of Xt vanishes

after lag q, and so we can write Xit as

Xit = θi0 + Bi,t + θi,1Bi,t−1 + · · · + θi,qBi,t−q,

where {Bi,t} is a sequence of uncorrelated random variables with 0 mean and constant variance.

14

Marginal Models of Components of VAR(p)

Models

One approach to studying the marginal components of VAR(p)

models is by use of the structural equations. These are formed

by diagonalizing the variance-covariance matrix of At, as we

discussed last week for a VAR(p) model. This approach shows

the concurrent relationships of one component to all the others.

Another approach is to obtain explicit representations of all of

the component series as AR models.

We can do this if we can diagonalize the AR polynomial coeffi-

cient matrix in a VAR(p) model.

15

Marginal Models of Components and

Diagonalizing Matrices

Some technical notes are in order here. A nonnegative definite matrix can always be diagonalized by a Cholesky decomposition, but not all square matrices can be diagonalized. A matrix that can be diagonalized is called a regular matrix. (See Gentle, 2007, pages 116 and following, for conditions and general discussion of the problem.)

One general method of diagonalizing a regular matrix A is to use the matrix V whose columns are linearly independent eigenvectors of A, with C the diagonal matrix whose elements are the corresponding eigenvalues of A, so that V −1AV = C. This requires both premultiplication and postmultiplication of A, and if the matrix is not of full rank, requires some rearrangement of the rows and columns.
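A small numerical sketch of this diagonalization in R (A is a made-up example matrix with distinct eigenvalues, so V is invertible):

## Sketch: diagonalization via the eigendecomposition.
A <- matrix(c(2, 1, 1, 2), nrow = 2)   # made-up example matrix
e <- eigen(A)
V <- e$vectors                # columns are (linearly independent) eigenvectors
C <- solve(V) %*% A %*% V     # V^{-1} A V is (numerically) diagonal
round(C, 12)                  # diagonal elements are the eigenvalues 3 and 1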

16

Marginal Models of Components

We’ll just do the example in the text for the VAR(1) case for

k = 2.

The bivariate VAR(1) model is

[ 1 − Φ11B     −Φ12B    ] [ X1t ]   [ A1,t ]
[ −Φ21B      1 − Φ22B   ] [ X2t ] = [ A2,t ]

We premultiply both sides by

[ 1 − Φ22B      Φ12B    ]
[ Φ21B       1 − Φ11B   ].

This gives us the marginal models, in which each AR component

has a coefficient of (1 − Φ11B)(1 − Φ22B) − Φ12Φ21B2.

Note, however, that we have AR(2) models on the left side and

we have MA(1) models on the right side; that is, a bivariate

VAR(1) model became two marginal ARMA(2,1) models.

17

Marginal Models of Components

This idea generalizes (with a lot of tedious algebra).

A k-variate VAR(p) model yields k ARMA(kp, (k − 1)p) models.

The VMA(q) part of the original VARMA may add up to q ad-

ditional MA components. In general, however, the number of MA

components is min((k − 1)p, q).

We next consider some other ways that decomposing a VAR(p)

model can lead to new insights about the process in some cases.

This is the case where we have cointegration.

18

Unit-Root Nonstationarity

Many economic time series exhibit either (apparent) random walk

behavior,

Pt = Pt−1 + At,

or random walk with a drift behavior,

Pt = µ + Pt−1 + At,

where {At} is iid with variance σ2A.

Either of these processes has unit-root nonstationarity.

These processes can be made stationary by differencing; that is,

the series is “integrated”.

We speak of an integrated series of order d, denoted I(d), if

d differences result in a stationary process.

Notice the effects of the nonstationarity.

19

Simple Random Walk Process

In the simple random walk process, the k-step ahead forecast is

P̂t(k) = E(Pt+k|pt, pt−1, . . . , p0) = pt.

It is not mean reverting.

The forecast error is

et(k) = at+k + · · · + at+1

Its variance is

V(et(k)) = kσ2A.

The forecast has no value.

20

Random Walk Process with Drift

In the random walk process with drift, the k-step-ahead forecast is

P̂t(k) = E(Pt+k|pt, pt−1, . . . , p0) = kµ + pt.

It is not mean reverting.

The variance of Pt conditional on p0 is tσ2A, which grows without bound.

I should mention one more type of nonstationary process. It is

the trend-stationary process,

Pt = α0 + α1t + At.

Notice that this process is not stationary because of its mean;

its variance, however, is time invariant.

This process can be made stationary by detrending, that is, by

subtracting the trend α0 + α1t.

21

Spurious Regressions

First, consider two trend-stationary processes,

Yt = α0 + α1t + At

and

Xt = δ0 + δ1t + Bt,

that have nothing to do with each other (i.e., everything is “in-

dependent”).

Now, consider the regression of Yt on Xt:

Yt = β0 + β1Xt + εt
   = β0 + β1(δ0 + δ1t + Bt) + εt
   = γ0 + (β1δ1)t + ε∗t,

where γ0 = β0 + β1δ0 and ε∗t = β1Bt + εt.

The regression test will probably be significant.

This results from the trends.

It is “spurious”, however. Everybody knows this.

22

Spurious Regressions

Next, consider two random walks,

Yt = yt−1 + At

and

Xt = xt−1 + Bt,

that have nothing to do with each other (i.e., everything is “inde-

pendent”). For simplicity, assume that At and Bt are iid N(0,1).

Now, consider the regression of Yt on Xt (without intercept):

Yt = βXt + εt.

We see that

β = Cov(Yt, Xt)/V(Xt)

and

εt ∼ N(0, t).

23

Spurious Regressions of Random Walks

Granger and Newbold, in a very famous Monte Carlo study in

1974, found that the standard t test of H0 : β = 0 rejected 76%

of the time.

This example is very different from the spurious regressions of

one trend-stationary series on another.

The problem here is unit-root nonstationarity.
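A small Monte Carlo sketch in the spirit of the Granger–Newbold study (simulated data; the exact rejection rate will vary with the seed and sample size):

## Sketch: regress one independent random walk on another and record how
## often the usual t test on the slope rejects at the 5% level.
set.seed(1)
nrep <- 1000; n <- 100
reject <- logical(nrep)
for (r in 1:nrep) {
  y <- cumsum(rnorm(n))   # independent random walks
  x <- cumsum(rnorm(n))
  p <- summary(lm(y ~ x - 1))$coefficients["x", "Pr(>|t|)"]
  reject[r] <- p < 0.05
}
mean(reject)   # far above the nominal 0.05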

24

Spurious Regressions of Random Walks:

A Technical Aside

Consider a regression model of the form Yt = βxt + εt, with the

usual assumption of 0 correlations between all εt and V(εt) = σ2.

What about the relationship between xt and εt?

The asymptotic properties (relating to normality) will hold if xt

and εt are independent.

This is OK if xt is a constant.

What about if xt is a random variable?

This happens all the time in financial applications.

In these applications, however, we cannot assume that xt and εt are independent.

Can we find a weaker condition?

25

Spurious Regressions of Random Walks:

A Technical Aside (continued)

A weaker sufficient condition is called the martingale difference

assumption:

E(εt | xt, εt−1, xt−1, . . . , ε1, x1) = 0, for all t

and

limt→∞ E(ε2t | xt, εt−1, xt−1, . . . , ε1, x1) = σ2, almost surely.

The punchline is that the second condition is not satisfied in the

regression of one random walk on another.

The problem is that n−2∑x2t has a nondegenerate limiting dis-

tribution.

26

Unit-Root Nonstationarity and Cointegration

The spurious regression problem (as well as other issues) makes

consideration of unit-root nonstationarity in multivariate time

series important.

Now let’s consider unit-root nonstationarity in the context of a

VARMA.

There are different kinds of situations. In some cases the com-

ponent time series may not have any relationships to each other

(although spurious regressions may exist).

In some interesting cases, however, even though the component

series are unit-root nonstationary, a linear combination of some

of them is stationary.

This phenomenon is called cointegration.

27

Unit-Root Nonstationarity and Cointegration

The example in the text (p 428) is a good simple one to illustrate the idea.

We have the bivariate ARMA(1,1) model

Xt − ΦXt−1 = At − ΘAt−1

[ X1t ]   [  0.50  −1.00 ] [ X1,t−1 ]   [ A1,t ]   [  0.20  −0.40 ] [ A1,t−1 ]
[ X2t ] − [ −0.25   0.50 ] [ X2,t−1 ] = [ A2,t ] − [ −0.10   0.20 ] [ A2,t−1 ]

We first determine the eigenvalues of the AR matrix.

> phi <- matrix(c(0.50, -0.25, -1.00, 0.50), nrow = 2)
> eigen(phi)$values
[1]  1.000000e+00 -5.421011e-20

We note that the AR coefficient matrix is singular (one eigenvalue is essentially 0). Also, we see that the other eigenvalue is 1. (Eigenvalues of 0 and 1 are a necessary condition for an idempotent matrix, but they are not sufficient. We note in this case, however, that the coefficient matrix is indeed idempotent.)

As illustrated on the previous slides, we write the model in the form that uses the backshift operator, and then we obtain the marginal components by premultiplication by

[ 1 − 0.50B     −1.00B    ]
[ −0.25B      1 − 0.50B   ].

28

Unit-Root Nonstationarity and Cointegration

This premultiplication yields the coefficient matrix on the left as

[ 1 − B      0     ]
[ 0        1 − B   ];

hence, we see that each component is unit-root nonstationary.

Now we seek a linear combination of the component time series

that is stationary.

Following Tsay, we transform the system as in equation (8.32).

29

Unit-Root Nonstationarity and Cointegration

By premultiplying by the generalized inverse of the coefficient

matrix,

[ 1.0   −2.0 ]
[ 0.5    1.0 ],

we get equation (8.32), which has two linear combinations of

X1t and X2t,

[ Y1t ]   [ 1.0   0 ] [ Y1,t−1 ]   [ B1,t ]   [ 0.4   0 ] [ B1,t−1 ]
[ Y2t ] − [ 0     0 ] [ Y2,t−1 ] = [ B2,t ] − [ 0     0 ] [ B2,t−1 ]

The two linear combinations of X1t and X2t, that is, Y1t and Y2t,

are uncoupled.

Their concurrent correlation is the correlation between B1t and

B2t (which is not 0).

Y1t is unit-root nonstationary, but Y2t is stationary.
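As a quick numerical check (a sketch using the matrices of this example), we can verify in R that this transformation uncouples both the AR and the MA parts:

## Sketch: the rows of L are the transformation vectors (1, -2) and (0.5, 1).
Phi   <- matrix(c(0.50, -0.25, -1.00, 0.50), nrow = 2)
Theta <- matrix(c(0.20, -0.10, -0.40, 0.20), nrow = 2)
L     <- matrix(c(1.0, 0.5, -2.0, 1.0), nrow = 2)
L %*% Phi %*% solve(L)     # diag(1, 0): Y1 has a unit root, Y2 does not
L %*% Theta %*% solve(L)   # diag(0.4, 0)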

30

Cointegration

Y1t = X1t − 2X2t is called the common trend of X1t and X2t.

In Y2t = 0.5X1t + 1X2t = bT(X1t, X2t), the vector b = (0.5,1.0),

which yields a stationary process, is called the cointegration

vector.

In general, cointegration of order m exists within a multivariate

time series whenever all of the component series are unit-root

nonstationary, but there exist m > 0 linearly independent cointe-

gration vectors.

A financial interpretation of a cointegrated multivariate time se-

ries is that the components have some common threads that

result in linear combinations that have long-run equilibrium even

though the individual components are nonstationary and have

variances diverging to ∞.

31

Error Corrections

Unit-root nonstationarity problems can often be overcome by

differencing.

For the multivariate ARMA(p, q) process {Xt} that is cointe-

grated of order m, we seek some meaningful representation of

∆Xt = Xt − Xt−1.

In a cointegrated time series, we represent the differenced time

series as

∆Xt = CBTXt−1 + Φ∗1∆Xt−1 + · · · + Φ∗p−1∆Xt−p+1 + At − Θ1At−1 − · · · − ΘqAt−q,

where C and B are k × m full-rank matrices, the columns of B are the cointegrating vectors, and, for j = 1, . . . , p − 1,

Φ∗j = −(Φj+1 + · · · + Φp).

In this representation, BTXt−1 is stationary.

32

Error Correction Model (ECM) for a VAR(p)

Process

Let {Xt} be an I(0) or I(1) VAR(p) process.

Following the form of representation for the cointegrated ARMA(p, q)

process on the previous slide, we write the model in the form

∆Xt = µt + ΠXt−1 + Φ∗1∆Xt−1 + · · · + Φ∗p−1∆Xt−p+1 + At,

where Π = CBT.

The term ΠXt−1 is called the error correction term, and the model

is called the error correction model or ECM.

The rank of Π determines the extent of cointegration.

If rank(Π) = 0, there is no cointegration, and the differenced process ∆Xt is actually a VAR(p − 1).

33

If rank(Π) = k, that is, the matrix is full rank, there is no cointegration, and the process is just a (stationary) VAR(p).

If 0 < rank(Π) = m < k, there is cointegration of order m.

Johansen’s Test

To test for cointegration in a nominal VAR(p) process is essen-

tially to test the rank of Π.

There is a likelihood ratio test for this, called Johansen’s test.

It is in the R function ca.jo in the urca package.
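A usage sketch (x is assumed to be an n × k matrix of I(1) series; the lag order K and the deterministic specification are choices the analyst must make):

library(urca)

## Sketch: Johansen's test for the cointegrating rank.
jo <- ca.jo(x, type = "trace", ecdet = "const", K = 2)
summary(jo)           # trace statistics and critical values for each rank m
cajorls(jo, r = 1)    # restricted ECM estimates if rank m = 1 is not rejected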

34

Cointegrated Financial Time Series

The only way to receive returns uniformly above a risk-adjusted

rate is by arbitrage.

In a fair and stable market, there is no arbitrage.

Whenever cointegrated time series exist, there is the possibility that, at a given time, the individual series deviate from their long-run relationship and so do not reflect true value.

An example is pairs trading.

35