
Lecture 8: Univariate Processes with Unit Roots∗

1 Introduction

1.1 Stationary AR(1) vs. Random Walk

In this lecture, we will discuss a very important type of process: unit root processes. For an AR(1) process

x_t = φ x_{t−1} + u_t (1)

to be stationary, we require that |φ| < 1. Similarly, in an AR(p) process, we require that all the roots of

1 − φ_1 z − · · · − φ_p z^p = 0

lie outside the unit circle. If one of the roots turns out to be one, then the process is called a unit root process. In the AR(1) case we have φ = 1, so

x_t = x_{t−1} + u_t. (2)

It turns out that the two processes (|φ| < 1 versus φ = 1) behave in very different ways. For simplicity, we assume that the innovations u_t follow an i.i.d. Gaussian distribution with mean zero and variance σ².

First, I plot the two graphs in Figure 1. In the left graph, I set φ = 0.9, and in the right graph, I draw the random walk process, φ = 1. From the left graph, we see that x_t moves around zero and never leaves the [−6, 6] region. There seems to be some force that pulls the process back to its mean (zero). In the right graph, however, there is no fixed mean; x_t moves 'freely' and in this case rises to about 72. If we repeatedly generate these two processes, we would see that the φ = 0.9 paths look pretty much the same, while the random walk paths differ greatly from each other. For instance, in a second simulation the random walk might go down to −80.
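For concreteness, here is a minimal simulation sketch of the two processes, assuming NumPy and Matplotlib are available (the seed and sample size are arbitrary choices):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n = 500
u = rng.standard_normal(n)            # i.i.d. N(0, 1) innovations

# Stationary AR(1): x_t = 0.9 x_{t-1} + u_t
x_stat = np.zeros(n)
for t in range(1, n):
    x_stat[t] = 0.9 * x_stat[t - 1] + u[t]

# Random walk (phi = 1): x_t is the cumulative sum of the innovations
x_rw = np.cumsum(u)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(x_stat)
ax1.set_title('phi = 0.9')
ax2.plot(x_rw)
ax2.set_title('phi = 1 (random walk)')
plt.show()
```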

The discussion above is a graphical illustration. Second, consider some moments of the process x_t when |φ| < 1 and when φ = 1. When |φ| < 1, we have

E(x_t) = 0 and E(x_t²) = σ²/(1 − φ²);

when φ = 1, we no longer have constant unconditional moments; the first two conditional moments are

E(x_t | F_{t−1}) = x_{t−1} and Var(x_t | F_{t−1}) = σ².

∗Copyright 2002-2006 by Ling Hu.


[Figure 1: Simulated Autoregressive Processes with Coefficient φ = 0.9 (left) and φ = 1 (right)]

When we do k-period-ahead forecasting with |φ| < 1,

E(x_{t+k} | F_t) = φ^k x_t.

Since |φ| < 1, φ^k → 0 as k → ∞, so E(x_{t+k} | F_t) → 0 = E(x_t). As the forecasting horizon increases, the current value of x_t matters less and less, since the conditional expectation converges to the unconditional expectation.

The variance of the forecast is

Var(x_{t+k} | F_t) = (1 + φ² + · · · + φ^{2(k−1)}) σ² = \frac{1 − φ^{2k}}{1 − φ²} σ²,

which converges to σ²/(1 − φ²) as k → ∞. Next, consider the case φ = 1:

E(x_{t+k} | F_t) = x_t,

which means that the current value does matter (actually, it is the only thing that matters) even as k → ∞!

The variance of the forecast is

Var(x_{t+k} | F_t) = kσ² → ∞ as k → ∞.

If we let x_0 = 1 and σ² = 1, we can draw the forecasts of x_{t+k} for φ = 0.9 and φ = 1, as in Figure 2.

The upper graph in Figure 2 plots the forecast of x_k when φ = 0.9. The expectation of x_k conditional on x_0 = 1 drops to zero as k increases, and the forecast standard error converges quickly to the unconditional standard error. The lower graph in Figure 2 plots the forecast of x_k when φ = 1. Obviously, the forecast interval diverges as k increases.
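A minimal sketch of the forecast calculations behind Figure 2, assuming NumPy (the 20-step horizon is an arbitrary choice):

```python
import numpy as np

x0, sigma2 = 1.0, 1.0
k = np.arange(1, 21)

# phi = 0.9: the forecast mean decays to zero, the forecast variance is bounded
phi = 0.9
mean_stat = phi ** k * x0
se_stat = np.sqrt((1 - phi ** (2 * k)) / (1 - phi ** 2) * sigma2)

# phi = 1: the forecast mean stays at x0, the forecast variance grows linearly
mean_rw = np.full(k.shape, x0)
se_rw = np.sqrt(k * sigma2)

for kk, m, s in zip(k, mean_rw, se_rw):
    print(f"k={kk:2d}: random-walk forecast {m:.1f} +/- {2 * s:.2f}")
```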


[Figure 2: Forecasting of x_k at time zero when φ = 0.9 (upper) and φ = 1 (lower), with x_0 = 1, σ² = 1]

A third way to compare a stationary and a nonstationary autoregressive process is to compare their impulse-response functions. We can invert (1) to

x_t = u_t + φ u_{t−1} + φ² u_{t−2} + · · · + φ^{t−1} u_1 = \sum_{k=0}^{t−1} φ^k u_{t−k}.

So the effect of a shock u_t on x_{t+h} is φ^h, which dies out as h increases. In the unit root case,

x_t = \sum_{k=1}^{t} u_k,

and the effect of u_t on x_{t+h} is one, independent of h. So if a process is a random walk, the effects of all shocks on the level of {x_t} are permanent; the impulse-response function is flat at one.

Finally, we can compare the asymptotic distribution of the coefficient estimator of a stationary and a nonstationary autoregressive process.

For an AR(1) process, x_t = φ x_{t−1} + u_t where |φ| < 1 and u_t ∼ i.i.d. N(0, σ²), we have shown in lecture note 6 that the MLE of φ is asymptotically normal,

n^{1/2}(φ_n − φ_0) ∼ N\Big(0, \; σ²\big(n^{−1}\sum_{t=1}^{n} x_t²\big)^{−1}\Big).

However, if φ_0 = 1, then (n^{−1}\sum_{t=1}^{n} x_t²)^{−1} goes to zero as n → ∞. This implies that if φ_0 = 1, then φ_n converges at an order higher than n^{1/2}.


Above we have only considered the AR(1) process. In a general AR(p) process, if there is one unit root, then the process is a nonstationary unit root process. Consider an AR(2) example: let λ_1 = 1 and λ_2 = 0.5, so that

(1− L)(1− 0.5L)xt = εt, εt ∼ i.i.d.(0, σ2).

Then

(1 − L) x_t = (1 − 0.5L)^{−1} ε_t = θ(L) ε_t ≡ u_t.

So the first difference of x_t, ∆x_t = (1 − L) x_t, is a stationary process, and

xt = xt−1 + ut,

is a unit root process with serially correlated errors.

1.2 Stochastic Trend vs. Deterministic Trend

In a unit root process,

x_t = x_{t−1} + u_t,

where u_t is a stationary process, x_t is said to be integrated of order one, denoted I(1). An I(1) process is also said to be difference stationary, in contrast to the trend stationary processes discussed in the previous lecture. If we need to difference twice to get a stationary process, then the process is said to be integrated of order two, denoted I(2), and so forth. A stationary process can be denoted I(0). If a process is a stationary ARMA(p, q) after taking kth differences, then the original process is called ARIMA(p, k, q). The 'I' here denotes integrated.

Recall that we learned about the spectrum in lecture 3. The implication of the 'integration' and 'difference' operators for spectral analysis is that when you integrate (sum), you filter out the high-frequency components and what remains are the low frequencies, which is a feature of unit root processes. Recall that the spectrum of a stationary AR(1) process is

S(ω) = \frac{1}{2π}(1 + φ² − 2φ\cos ω)^{−1}.

When φ → 1, we have S(ω) = 1/[4π(1 − \cos ω)]. Then as ω → 0, S(ω) → ∞, so processes with a stochastic trend have an infinite spectrum at the origin. Recall that S(ω) decomposes the variance of a process into components contributed by each frequency, so the variance of a unit root process is largely contributed by low frequencies. On the other hand, when we difference, we filter out the low frequencies and what remains are the high frequencies.
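To see the rate of divergence at the origin, use the expansion \cos ω ≈ 1 − ω²/2 for small ω, so that

S(ω) ≈ \frac{1}{4π(ω²/2)} = \frac{1}{2πω²},

which diverges at rate ω^{−2} as ω → 0.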

In the previous lecture, we discussed processes with a deterministic trend. We can compare a process with a deterministic trend (DT) and a process with a stochastic trend (ST) from two perspectives. First, when we do k-period-ahead forecasting, as k → ∞ the forecast error for DT converges to the variance of its stationary component, which is bounded, while as we saw in the previous section, the forecast error for ST diverges as k → ∞. Second, the impulse-response function for DT is the same as in the stationary case: the effect of a shock dies out quickly. The impulse-response function for ST is flat at one: the effects of all shocks on the level are permanent.

However, note that in Figure 1 we plot a simulated random walk, yet part of its path looks as if it has an upward time trend. This turns out to be quite a general problem: over a short time period, it is very hard to judge whether a process has a stochastic trend or a deterministic trend.


2 Brownian Motion and Functional Central Limit Theorem

2.1 Brownian Motion

To derive statistical inference for a unit root process, we need to make use of a very important stochastic process: Brownian motion (also called a Wiener process). To understand Brownian motion, consider a random walk

yt = yt−1 + ut, y0 = 0, ut ∼ i.i.d.N(0, 1). (3)

We can then write

y_t = \sum_{s=1}^{t} u_s ∼ N(0, t),

and the change in the value of y between dates s and t,

y_t − y_s = u_{s+1} + u_{s+2} + · · · + u_t = \sum_{i=s+1}^{t} u_i ∼ N(0, t − s),

which is independent of the change between dates r and q for s < t < r < q.

Next, consider the change y_t − y_{t−1} = u_t ∼ i.i.d. N(0, 1). If we view u_t as the sum of two independent Gaussian variables,

u_t = ε_{1t} + ε_{2t}, \quad ε_{it} ∼ i.i.d. N(0, 1/2),

then we can associate ε_{1t} with the change between y_{t−1} and the value of y at some interim point (say, y_{t−1/2}), and ε_{2t} with the change between y_{t−1/2} and y_t:

y_{t−1/2} − y_{t−1} = ε_{1t}, \quad y_t − y_{t−1/2} = ε_{2t}. (4)

Sampled at integer dates t = 1, 2, . . ., the process of (4) has the same properties as (3), since

yt − yt−1 = ε1t + ε2t ∼ i.i.d.N(0, 1).

In addition, the process in (4) is also defined at non-integer dates, and it retains the property, for both integer and non-integer dates, that y_t − y_s ∼ N(0, t − s), with y_t − y_s independent of the change over any other nonoverlapping interval. Using the same reasoning, we can partition the change between t − 1 and t into n separate subperiods:

y_t − y_{t−1} = \sum_{i=1}^{n} ε_{it}, \quad ε_{it} ∼ i.i.d. N(0, 1/n).

When n → ∞, the limit process is known as Brownian motion. The value of this process at date t is denoted W(t). A realization of this continuous-time process can be viewed as a stochastic function W(·). In particular, we will be interested in Brownian motion over the interval t ∈ [0, 1].

Definition 1 (Brownian Motion) A standard Brownian motion W(t), t ∈ [0, 1], is a continuous-time stochastic process such that


(a) W (0) = 0

(b) For any times 0 < s < t < 1, W(t) − W(s) ∼ N(0, t − s). And the differences W(t_2) − W(t_1) and W(t_4) − W(t_3), for any 0 ≤ t_1 < t_2 < t_3 < t_4 ≤ 1, are independent.

(c) W(t) is continuous in time t with probability 1.

So given a standard Brownian motion W(t), we have E(W(t)) = 0, Var(W(t)) = t, and Cov(W(t), W(s)) = min(t, s). Other Brownian motions can be generated from a standard Brownian motion. For example, the process Z(t) = σ·W(t) has independent increments and Z(t) ∼ N(0, σ²t). Such a process is described as Brownian motion with variance σ².
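As a quick check of the covariance formula, take s ≤ t and write W(t) = W(s) + [W(t) − W(s)]; by the independence of increments,

Cov(W(t), W(s)) = Var(W(s)) + Cov(W(t) − W(s), W(s)) = s + 0 = min(t, s).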

An important feature of Brownian motion is that although it is continuous in t, it is not differentiable using standard calculus. The direction of change at t is likely to be completely different from that at t + ∆, no matter how small ∆ is. Even if some parts of a realization of Brownian motion look 'smooth', when we view them under a 'microscope' we will see many zig-zags.

There are several concepts of smoothness of a function, and continuity is the weakest one. Differentiability is another concept of smoothness. When the domain of the function is an interval, we have yet another smoothness condition. A function f : [a, b] 7→ R is of bounded variation if ∃ M < ∞ such that for every partition of [a, b] by a finite collection of points a = x_0 < x_1 < x_2 < · · · < x_n = b,

\sum_{i=1}^{n} |f(x_i) − f(x_{i−1})| ≤ M.

Brownian motion is not of bounded variation.

Later in this lecture, we will also see an integral of Brownian motion, \int_0^1 W(r)\,dr, and a stochastic integral, \int_0^1 W(r)\,dW(r), for r ∈ [0, 1]. First, note that W(r) is a Gaussian process, hence \int_0^1 W(r)\,dr is also Gaussian. It is easy to see that E[\int_0^1 W(r)\,dr] = 0. To compute the variance, let s ≤ r and write

E\Big[\int_0^1 W(r)\,dr\Big]² = 2\int_0^1\int_0^r E[W(r)W(s)]\,ds\,dr = 2\int_0^1\int_0^r s\,ds\,dr = \int_0^1 r²\,dr = \frac{1}{3}.

Therefore, \int_0^1 W(r)\,dr ∼ N(0, 1/3). As another exercise, consider the distribution of W(1) − \int_0^1 W(r)\,dr. Again, it is Gaussian with zero mean. To compute its variance,

E\Big[W(1) − \int_0^1 W(r)\,dr\Big]² = 1 + \frac{1}{3} − 2E\Big[W(1)\int_0^1 W(r)\,dr\Big] = \frac{4}{3} − 2\int_0^1 r\,dr = \frac{1}{3}.
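The N(0, 1/3) limit can be checked with a small Monte Carlo sketch, assuming NumPy (path length and replication count are arbitrary), approximating the integral by a Riemann sum over a simulated Brownian path:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 1000, 5000
vals = np.empty(reps)
for i in range(reps):
    # Discretized Brownian path: increments are N(0, 1/n)
    w = np.cumsum(rng.standard_normal(n)) / np.sqrt(n)
    vals[i] = w.mean()   # Riemann-sum approximation of the integral of W over [0, 1]

print(vals.var())        # should be close to 1/3
```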

To study the stochastic integral, we need a fundamental theorem in stochastic calculus:


Definition 2 (Ito's Lemma) Let X(r) be a process given by

dX(r) = u\,dr + v\,dW(r).

Let g(r, x) be a twice continuously differentiable real function. Let

Y(r) = g(r, X(r)),

then

dY(r) = \frac{∂g}{∂r}(r, x)\,dr + \frac{∂g}{∂x}(r, x)\,dX(r) + \frac{1}{2}\frac{∂²g}{∂x²}(r, x)·(dX(r))²,

where (dX(r))² = (dX(r))·(dX(r)) is computed according to the rules

dr·dr = dr·dW(r) = dW(r)·dr = 0, \quad dW(r)·dW(r) = dr.

Now, choose X(r) = σW(r) and g(r, x) = \frac{1}{2}x². Then

Y(r) = g(r, X(r)) = \frac{1}{2}X(r)² = \frac{1}{2}σ²W(r)².

By Ito’s lemma,

dY(r) = \frac{∂g}{∂r}\,dr + \frac{∂g}{∂x}\,dX(r) + \frac{1}{2}\frac{∂²g}{∂x²}·(dX(r))² = σ²W(r)\,dW(r) + \frac{1}{2}(σ\,dW(r))² = σ²W(r)\,dW(r) + \frac{1}{2}σ²\,dr,

where the first derivative of g with respect to x gives X(r) = σW(r), and dX(r) = σ\,dW(r). Hence,

d\Big(\frac{1}{2}σ²W(r)²\Big) = σ²W(r)\,dW(r) + \frac{1}{2}σ²\,dr.

Integrating from 0 to 1,

\frac{1}{2}σ²W(1)² = σ²\int_0^1 W(r)\,dW(r) + \frac{1}{2}σ²,

therefore,

σ²\int_0^1 W(r)\,dW(r) = \frac{1}{2}σ²(W(1)² − 1).
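As a numerical sanity check (a sketch assuming NumPy), the left-point (Ito) Riemann sum \sum_t W(t/n)[W((t+1)/n) − W(t/n)] approximates \int_0^1 W(r)\,dW(r) and should be close to (1/2)(W(1)² − 1) path by path:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
dw = rng.standard_normal(n) / np.sqrt(n)      # Brownian increments over [0, 1]
w = np.concatenate(([0.0], np.cumsum(dw)))    # W(0) = 0, ..., W(1)

ito_sum = np.sum(w[:-1] * dw)                 # left-point (Ito) Riemann sum
print(ito_sum, 0.5 * (w[-1] ** 2 - 1))        # the two numbers should be close
```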

2.2 Functional Central Limit Theorem (FCLT)

2.2.1 Introduction

Recall that the central limit theorem (CLT) tells us that the sample mean of a stationary process is asymptotically normal and centered at the population mean. However, if x_t is a random walk, there is no such thing as a 'population mean'. Therefore, to draw inference for processes with a unit root, we need a new tool called the functional central limit theorem (FCLT), a central limit theorem defined on function spaces. The FCLT is as important to unit root limit theory as the CLT is to stationary time series limit theory.

As usual, let n denote the sample size, and let r = t/n, so r ∈ [0, 1]. We use the symbol [nr] to denote the largest integer less than or equal to nr.


Consider a process u_t ∼ i.i.d.(0, σ²) with sample mean \bar{u}_n; the CLT tells us that n^{1/2}\bar{u}_n → N(0, σ²). Now suppose that, given a sample of size n, we calculate the mean of the first half of the sample and throw out the rest of the observations:

\bar{u}_{[n/2]} = \frac{1}{[n/2]}\sum_{t=1}^{[n/2]} u_t.

This estimator also satisfies the CLT: \sqrt{[n/2]}\,\bar{u}_{[n/2]} → N(0, σ²).

Moreover, this estimator would be independent of an estimator that uses only the second half of the sample. More generally, let's construct a new random function X_n(r) for r ∈ [0, 1]:

X_n(r) = (1/n)\sum_{t=1}^{[nr]} u_t,

or

X_n(r) =
  0                             for r ∈ [0, 1/n),
  u_1/n                         for r ∈ [1/n, 2/n),
  (u_1 + u_2)/n                 for r ∈ [2/n, 3/n),
  . . .
  (u_1 + u_2 + · · · + u_n)/n   for r = 1.

It is easy to see that n^{1/2}X_n(1) = n^{1/2}\bar{u}_n, and the CLT tells us that it converges to N(0, σ²). But what does X_n(r) converge to as n → ∞? Write

n^{1/2}X_n(r) = \frac{1}{\sqrt{n}}\sum_{t=1}^{[nr]} u_t = \frac{\sqrt{[nr]}}{\sqrt{n}} · \frac{1}{\sqrt{[nr]}}\sum_{t=1}^{[nr]} u_t.

By the CLT we have [nr]^{−1/2}\sum_{t=1}^{[nr]} u_t → N(0, σ²), while ([nr]/n)^{1/2} → r^{1/2}; therefore

n^{1/2}X_n(r)/σ → N(0, r). (5)

Next, if we consider the behavior of a sample mean based on observations [nr_1] through [nr_2] for r_2 > r_1, a similar approach shows that it is also asymptotically normal,

n^{1/2}(X_n(r_2) − X_n(r_1))/σ → N(0, r_2 − r_1),

and it is independent of the estimator in (5) for r < r_1. Therefore, the sequence of stochastic functions {\sqrt{n}X_n(·)/σ}_{n=1}^{∞} has an asymptotic probability law:

n^{1/2}X_n(·)/σ → W(·). (6)

Note that here X_n(·) is a function, while in (5) X_n(r) is a random variable. The asymptotic result (6) is known as the functional central limit theorem (FCLT). Later on, we may also write n^{1/2}X_n(r) → W(r), but note that this does not mean the variable X_n(r) converges to a variable with an N(0, r) distribution; rather, the function converges to a stochastic function: standard Brownian motion.

Evaluated at r = 1, the function X_n(r) is just the sample mean. Thus when the function in (6) is evaluated at r = 1, we get the conventional CLT:

\sqrt{n}X_n(1)/σ = \frac{1}{σ\sqrt{n}}\sum_{t=1}^{n} u_t → W(1) ∼ N(0, 1). (7)

2.2.2 Convergence of a random function

In lecture 4, we discussed various modes of convergence and the continuous mapping theorem for random variables. Now, let's define convergence for a random function, such as the X_n(r) defined earlier.

We first define convergence in distribution for a random function. Let S(·) represent a continuous-time stochastic process, with S(r) representing its value at some date r for r ∈ [0, 1]. Also suppose that, for any given realization, S(·) is a continuous function of r with probability 1. For {S_n(·)}_{n=1}^{∞} a sequence of such continuous functions, we say that S_n(·) →_d S(·) if all of the following hold:

(a) For any finite collection of k particular dates 0 ≤ r_1 < r_2 < · · · < r_k ≤ 1, the sequence of k-dimensional random vectors {y_n}_{n=1}^{∞} converges in distribution to the vector y, where

y_n ≡ (S_n(r_1), S_n(r_2), . . . , S_n(r_k))′, \quad y ≡ (S(r_1), S(r_2), . . . , S(r_k))′;

(b) For each ε > 0, the probability that S_n(r_1) differs from S_n(r_2) by more than ε, for any dates r_1 and r_2 within δ of each other, goes to zero uniformly in n as δ → 0.

(c) P (|Sn(0)| > λ) → 0 uniformly in n as λ →∞.

Next, we extend convergence in probability to random functions. Let {S_n(·)}_{n=1}^{∞} and {V_n(·)}_{n=1}^{∞} denote sequences of random continuous functions with S_n : [0, 1] 7→ R and V_n : [0, 1] 7→ R. Define Y_n as

Y_n = \sup_{r∈[0,1]} |S_n(r) − V_n(r)|.

Then Y_n is a sequence of random variables. If Y_n →_p 0 (this is the usual convergence in probability for a random variable), then we say that

S_n(·) →_p V_n(·).

In other words, we define convergence in probability of a random function in terms of convergence of the upper bound of its distance from the limit function. Further, if V_n(·) →_p S_n(·) and S_n(·) →_d S(·), where S(·) is a continuous function, then V_n(·) →_d S(·).

Example 1 Let u_t be a strictly stationary time series with finite fourth moment, and let S_n(r) = n^{−1/2}u_{[nr]}. Then S_n(·) →_p 0. Proof:

P\Big(\sup_{r∈[0,1]} |S_n(r)| > δ\Big) = P\big\{[|n^{−1/2}u_1| > δ] \text{ or } [|n^{−1/2}u_2| > δ], . . . , \text{ or } [|n^{−1/2}u_n| > δ]\big\}
≤ nP(|n^{−1/2}u_t| > δ)
≤ \frac{nE(n^{−1/2}u_t)^4}{δ^4} = \frac{E(u_t^4)}{nδ^4} → 0.

So we have S_n(·) →_p 0.

In Lecture 4, we also reviewed the continuous mapping theorem (CMT): if x_n → x and g(·) is a continuous function, then g(x_n) → g(x). We have a similar result for the FCLT: if S_n(·) → S(·) and g(·) is a continuous functional, then g(S_n(·)) → g(S(·)). For example, \sqrt{n}X_n(·)/σ →_d W(·) implies that

\sqrt{n}X_n(·) →_d σW(·) ∼ N(0, σ²r). (8)

As another example, let

S_n(r) ≡ [\sqrt{n}X_n(r)]². (9)

Since \sqrt{n}X_n(·) →_d σW(·), it follows that

S_n(·) →_d σ²[W(·)]². (10)

2.3 Applications to Unit Root Processes

The simplest case to illustrate how to use the FCLT to compute asymptotics is a random walk y_t with y_0 = 0,

y_t = y_{t−1} + u_t = \sum_{i=1}^{t} u_i, \quad u_t ∼ i.i.d. N(0, σ²).

Define X_n(·) as

X_n(r) =
  0          for r ∈ [0, 1/n),
  y_1/n      for r ∈ [1/n, 2/n),
  y_2/n      for r ∈ [2/n, 3/n),
  . . .
  y_n/n      for r = 1.   (11)

If we integrate X_n(r) over r ∈ [0, 1], we have

\int_0^1 X_n(r)\,dr = y_1/n² + y_2/n² + · · · + y_{n−1}/n² = n^{−2}\sum_{t=1}^{n} y_{t−1}.

Multiplying both sides by \sqrt{n}:

\int_0^1 \sqrt{n}X_n(r)\,dr = n^{−3/2}\sum_{t=1}^{n} y_{t−1}.


From (8) we know that \sqrt{n}X_n(·) →_d σW(·); by the CMT,

\int_0^1 \sqrt{n}X_n(r)\,dr →_d σ\int_0^1 W(r)\,dr.

Therefore, we obtain the limit for n^{−3/2}\sum_{t=1}^{n} y_{t−1}:

n^{−3/2}\sum_{t=1}^{n} y_{t−1} → σ\int_0^1 W(r)\,dr. (12)

Thus, when y_t is a driftless random walk, its sample mean \frac{1}{n}\sum_{t=1}^{n} y_t diverges, but n^{−3/2}\sum_{t=1}^{n} y_t converges. An alternative way to find the limit distribution of n^{−3/2}\sum_{t=1}^{n} y_{t−1} is the following:

n^{−3/2}\sum_{t=1}^{n} y_{t−1} = n^{−3/2}[u_1 + (u_1 + u_2) + · · · + (u_1 + u_2 + · · · + u_{n−1})]
= n^{−3/2}[(n − 1)u_1 + (n − 2)u_2 + · · · + u_{n−1}]
= n^{−3/2}\sum_{t=1}^{n} (n − t)u_t
= n^{−1/2}\sum_{t=1}^{n} u_t − n^{−3/2}\sum_{t=1}^{n} t u_t,

while from the previous lecture, we know that

\begin{bmatrix} n^{−1/2}\sum_{t=1}^{n} u_t \\ n^{−3/2}\sum_{t=1}^{n} t u_t \end{bmatrix} →_d N\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \; σ²\begin{bmatrix} 1 & 1/2 \\ 1/2 & 1/3 \end{bmatrix}\right). (13)

Therefore n^{−3/2}\sum_{t=1}^{n} y_{t−1} is asymptotically Gaussian with mean zero and variance equal to σ²[1 − 2(1/2) + 1/3] = σ²/3. From this expression we also have

n−3/2n∑

t=1

tut = n−1/2n∑

t=1

ut − n−3/2n∑

t=1

yt−1

→ σW (1)− σ

∫ 1

0W (r)dr (14)

Using similar methods we can compute the asymptotic distribution of the sum of squares of a random walk. Define

S_n(r) = n[X_n(r)]²,

which can be written as

S_n(r) =
  0           for r ∈ [0, 1/n),
  y_1²/n      for r ∈ [1/n, 2/n),
  y_2²/n      for r ∈ [2/n, 3/n),
  . . .
  y_n²/n      for r = 1.   (15)


Again we compute the integral:

\int_0^1 S_n(r)\,dr = y_1²/n² + y_2²/n² + · · · + y_{n−1}²/n².

Since S_n(·) → σ²W(·)², by the CMT,

n^{−2}\sum_{t=1}^{n} y_{t−1}² → σ²\int_0^1 [W(r)]²\,dr. (16)

If we make use of n^{−3/2}\sum_{t=1}^{n} y_{t−1} →_d σ\int_0^1 W(r)\,dr and r = t/n, we also have

n^{−5/2}\sum_{t=1}^{n} t y_{t−1} = n^{−3/2}\sum_{t=1}^{n} (t/n)\,y_{t−1} →_d σ\int_0^1 rW(r)\,dr. (17)

Similarly, for r = t/n, we use (16) to get

n^{−3}\sum_{t=1}^{n} t y_{t−1}² = n^{−2}\sum_{t=1}^{n} (t/n)\,y_{t−1}² →_d σ²\int_0^1 r[W(r)]²\,dr. (18)

Another useful result is

n^{−1}\sum_{t=1}^{n} y_{t−1}u_t → (1/2)σ²[W(1)² − 1].

Proof: first,

y_t² = (y_{t−1} + u_t)² = y_{t−1}² + 2y_{t−1}u_t + u_t²,

so

n^{−1}\sum_{t=1}^{n} y_{t−1}u_t = n^{−1}(1/2)\sum_{t=1}^{n} (y_t² − y_{t−1}²) − n^{−1}(1/2)\sum_{t=1}^{n} u_t² = n^{−1}(1/2)y_n² − n^{−1}(1/2)\sum_{t=1}^{n} u_t².

By (6), we have n^{−1/2}y_n → σW(1); by the CMT,

n^{−1}(1/2)y_n² → (1/2)σ²W(1)².

By the LLN,

n^{−1}(1/2)\sum_{t=1}^{n} u_t² → (1/2)σ².

Therefore,

n^{−1}\sum_{t=1}^{n} y_{t−1}u_t → (1/2)σ²[W(1)² − 1]. (19)


3 Unit Root Tests

3.1 Unit Root Tests with i.i.d. Errors

The asymptotics of a random walk with i.i.d. shocks are summarized in the following proposition. The number in brackets shows where each result is first introduced and proved.

Proposition 1 Suppose that ξt follows a random walk without drift,

ξt = ξt−1 + ut, ξ0 = 0, ut ∼ i.i.d(0, σ2).

Then

(a) n^{−1/2}\sum_{t=1}^{n} u_t →_d σW(1) [7];

(b) n^{−1}\sum_{t=1}^{n} ξ_{t−1}u_t →_d (1/2)σ²[W(1)² − 1] [19];

(c) n^{−3/2}\sum_{t=1}^{n} t u_t →_d σW(1) − σ\int_0^1 W(r)\,dr [14];

(d) n^{−3/2}\sum_{t=1}^{n} ξ_{t−1} →_d σ\int_0^1 W(r)\,dr [12];

(e) n^{−2}\sum_{t=1}^{n} ξ_{t−1}² →_d σ²\int_0^1 W(r)²\,dr [16];

(f) n^{−5/2}\sum_{t=1}^{n} t ξ_{t−1} →_d σ\int_0^1 rW(r)\,dr [17];

(g) n^{−3}\sum_{t=1}^{n} t ξ_{t−1}² →_d σ²\int_0^1 rW(r)²\,dr [18];

(h) n^{−(v+1)}\sum_{t=1}^{n} t^v → 1/(v + 1) for v = 0, 1, 2, . . . [lecture 7].

Note that all these W(·) are the same Brownian motion, so the results above are correlated. If we are not interested in their correlations, we can find simpler expressions. For example, (a) is just N(0, σ²), (b) is (1/2)σ²[χ²(1) − 1], and (c) and (d) are N(0, σ²/3).

In general, the correspondence between the finite-sample quantities and their limits works like \sum_{t=1}^{n} → \int_0^1, (t/n) → r, (1/n) → dr, n^{−1/2}u_t → dW, etc. Take (h) as an example with v = 2. From the previous lecture we know that n^{−3}\sum_{t=1}^{n} t² → 1/3. Using the correspondence here, we have

n^{−3}\sum_{t=1}^{n} t² = n^{−1}\sum_{t=1}^{n} (t/n)² → \int_0^1 r²\,dr = 1/3.

3.1.1 Case 1

Suppose that the data generating process (DGP) is a random walk, and we estimate the parameter ρ by OLS in the regression

yt = ρyt−1 + ut, ut ∼ i.i.d(0, σ2), (20)


where ρ = 1, and we are interested in the asymptotic distribution of the OLS estimate ρ_n:

ρ_n = \frac{\sum_{t=1}^{n} y_{t−1}y_t}{\sum_{t=1}^{n} y_{t−1}²} = \frac{\sum_{t=1}^{n} y_{t−1}(y_{t−1} + u_t)}{\sum_{t=1}^{n} y_{t−1}²} = 1 + \frac{\sum_{t=1}^{n} y_{t−1}u_t}{\sum_{t=1}^{n} y_{t−1}²}.

Then

n(ρ_n − 1) = \frac{n^{−1}\sum_{t=1}^{n} y_{t−1}u_t}{n^{−2}\sum_{t=1}^{n} y_{t−1}²}.

By (19) (result (b)), (16) (result (e)), and the CMT, we have

n(ρ_n − 1) → \frac{W(1)² − 1}{2\int_0^1 W(r)²\,dr}. (21)

First, we note that (ρ_n − 1) converges at order n, instead of n^{1/2} as in the case |ρ| < 1. Therefore, when the true coefficient is unity, ρ_n is superconsistent. Second, since W(1) ∼ N(0, 1), W(1)² ∼ χ²(1). The probability that a χ²(1) variable is less than one is 0.68, so with probability 0.68 the statistic n(ρ_n − 1) will be negative, which implies that its limit distribution is skewed to the left. Recall that in the AR(1) regression with |ρ| < 1, the estimate ρ_n is downward biased in finite samples, but the limit distribution of \sqrt{n}(ρ_n − ρ) is still symmetric around zero. When the true value of ρ is unity, even the limit distribution of n(ρ_n − 1) is asymmetric, with negative values twice as likely as positive values.

In practice, critical values for the random variable in (21) are found by computing the exact finite-sample distribution of n(ρ_n − 1), assuming u_t is Gaussian. The critical values can then be tabulated by Monte Carlo or by numerical approximation.
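A minimal Monte Carlo tabulation sketch, assuming NumPy (the sample size and number of replications are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 500, 10_000
stats = np.empty(reps)
for i in range(reps):
    y = np.cumsum(rng.standard_normal(n + 1))    # random walk under the null
    ylag, ynow = y[:-1], y[1:]
    rho = (ylag @ ynow) / (ylag @ ylag)          # OLS in y_t = rho*y_{t-1} + u_t
    stats[i] = n * (rho - 1.0)

# Left-tail percentiles approximate the DF rho-test critical values
print(np.percentile(stats, [1, 5, 10]))
```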

There are two commonly used approaches to test the hypothesis that ρ_0 = 1: the Dickey-Fuller ρ-test and the Dickey-Fuller t-test. The DF ρ-test computes the statistic n(ρ_n − 1) and compares it with the critical values from the distribution in (21). The advantage of this approach is that we do not need to compute the standard deviation of ρ_n. Alternatively, we can use the DF t-test, which is based on the usual t statistic,

t_n = \frac{ρ_n − 1}{σ_ρ}, (22)

where σ_ρ is the standard deviation of the OLS coefficient estimate,

σ_ρ² = \frac{s_n²}{\sum_{t=1}^{n} y_{t−1}²}, (23)

and

s_n² = \frac{1}{n}\sum_{t=1}^{n} (y_t − ρ_n y_{t−1})².


Plugging (23) into (22), we have

t_n = \frac{n^{−1}\sum_{t=1}^{n} y_{t−1}u_t}{\big(n^{−2}\sum_{t=1}^{n} y_{t−1}²\big)^{1/2}(s_n²)^{1/2}}.

If ρ_n → ρ = 1, which is true for the OLS estimator in the present problem, then s_n² → σ² by the LLN. By (19) and (16), we have the limit of t_n:

t_n → \frac{(1/2)σ²[W(1)² − 1]}{\big[σ²\int_0^1 W(r)²\,dr\big]^{1/2}[σ²]^{1/2}} = \frac{W(1)² − 1}{2\big(\int_0^1 W(r)²\,dr\big)^{1/2}}. (24)

For the same reason as in (21), this t-statistic is asymmetric and skewed to the left.
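In practice these tests are implemented in standard software; for example, statsmodels provides adfuller with tabulated critical values. A hedged usage sketch (note that the option string for the no-constant case has varied across statsmodels versions, e.g. 'nc' in older releases and 'n' in newer ones):

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(0)
y = np.cumsum(rng.standard_normal(500))   # a random walk, so H0 is true

# regression='n': no constant or trend (case 1); use 'c' to include a constant
stat, pvalue, usedlag, nobs, crit, icbest = adfuller(y, regression='n', autolag='AIC')
print(stat, pvalue, crit)
```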

3.1.2 Case 2

The DGP is still a random walk as in case 1 (20),

yt = yt−1 + ut, ut ∼ i.i.d(0, σ2),

but we include a constant term in the regression

yt = α + ρyt−1 + ut.

The OLS estimates of the coefficients are

\begin{bmatrix} α_n \\ ρ_n \end{bmatrix} = \begin{bmatrix} n & \sum y_{t−1} \\ \sum y_{t−1} & \sum y_{t−1}² \end{bmatrix}^{−1} \begin{bmatrix} \sum y_t \\ \sum y_{t−1}y_t \end{bmatrix}.

Under the null hypothesis H_0 : α = 0, ρ = 1, the deviations of the estimate vector from the hypothesized values are

\begin{bmatrix} α_n \\ ρ_n − 1 \end{bmatrix} = \begin{bmatrix} n & \sum y_{t−1} \\ \sum y_{t−1} & \sum y_{t−1}² \end{bmatrix}^{−1} \begin{bmatrix} \sum u_t \\ \sum y_{t−1}u_t \end{bmatrix}. (25)

Recall that in a regression with a constant and a time trend, the estimates have different convergence rates. The situation is similar in this case. The orders in probability of the terms are

\begin{bmatrix} α_n \\ ρ_n − 1 \end{bmatrix} = \begin{bmatrix} O_p(n) & O_p(n^{3/2}) \\ O_p(n^{3/2}) & O_p(n²) \end{bmatrix}^{−1} \begin{bmatrix} O_p(n^{1/2}) \\ O_p(n) \end{bmatrix}. (26)

As we did before, we need a rescaling matrix

H_n = \begin{bmatrix} n^{1/2} & 0 \\ 0 & n \end{bmatrix}.

Premultiplying (25) by H_n, we have

\begin{bmatrix} n^{1/2}α_n \\ n(ρ_n − 1) \end{bmatrix} = \begin{bmatrix} 1 & n^{−3/2}\sum y_{t−1} \\ n^{−3/2}\sum y_{t−1} & n^{−2}\sum y_{t−1}² \end{bmatrix}^{−1} \begin{bmatrix} n^{−1/2}\sum u_t \\ n^{−1}\sum y_{t−1}u_t \end{bmatrix}. (27)


By results (d) and (e) we have

\begin{bmatrix} 1 & n^{−3/2}\sum y_{t−1} \\ n^{−3/2}\sum y_{t−1} & n^{−2}\sum y_{t−1}² \end{bmatrix} → \begin{bmatrix} 1 & σ\int W(r)dr \\ σ\int W(r)dr & σ²\int W(r)²dr \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & σ \end{bmatrix} \begin{bmatrix} 1 & \int W(r)dr \\ \int W(r)dr & \int W(r)²dr \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & σ \end{bmatrix},

and by results (a) and (b) we have

\begin{bmatrix} n^{−1/2}\sum u_t \\ n^{−1}\sum y_{t−1}u_t \end{bmatrix} → \begin{bmatrix} σW(1) \\ (1/2)σ²[W(1)² − 1] \end{bmatrix} = σ \begin{bmatrix} 1 & 0 \\ 0 & σ \end{bmatrix} \begin{bmatrix} W(1) \\ (1/2)[W(1)² − 1] \end{bmatrix}.

Therefore,

\begin{bmatrix} n^{1/2}α_n \\ n(ρ_n − 1) \end{bmatrix} →_d \begin{bmatrix} σ & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & \int W(r)dr \\ \int W(r)dr & \int W(r)²dr \end{bmatrix}^{−1} \begin{bmatrix} W(1) \\ (1/2)[W(1)² − 1] \end{bmatrix}
= ∆^{−1} \begin{bmatrix} σ & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} \int W(r)²dr & −\int W(r)dr \\ −\int W(r)dr & 1 \end{bmatrix} \begin{bmatrix} W(1) \\ (1/2)[W(1)² − 1] \end{bmatrix}
= ∆^{−1} \begin{bmatrix} σ & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} W(1)\int W(r)²dr − (1/2)[W(1)² − 1]\int W(r)dr \\ (1/2)[W(1)² − 1] − W(1)\int W(r)dr \end{bmatrix},

where

∆ ≡ \int W(r)²dr − \Big[\int W(r)dr\Big]².

So the DF ρ statistic to test the null hypothesis ρ = 1 has the limit distribution

n(ρ_n − 1) →_d \frac{(1/2)[W(1)² − 1] − W(1)\int W(r)dr}{\int W(r)²dr − \big[\int W(r)dr\big]²}. (28)

As in case 1, we can also use a t-test,

t_n = \frac{ρ_n − 1}{σ_{ρ_n}},

which converges to

\frac{(1/2)[W(1)² − 1] − W(1)\int W(r)dr}{\big\{\int W(r)²dr − \big[\int W(r)dr\big]²\big\}^{1/2}}.

The details can be found on pages 493–494 of Hamilton.

3.1.3 Case 3

Now, suppose that the true process is a random walk with drift:

yt = α + yt−1 + ut, ut ∼ i.i.d(0, σ2).

Without loss of generality, we can set y_0 = 0. We again estimate a linear regression with a constant,

yt = α + ρyt−1 + ut.


Define

ξ_t ≡ u_1 + u_2 + · · · + u_t;

then y_t = αt + ξ_t and

\sum_{t=1}^{n} y_{t−1} = α\sum_{t=1}^{n} (t−1) + \sum_{t=1}^{n} ξ_{t−1}.

Notice that these two terms have different divergence rates. We know that \sum_{t=1}^{n} t = n(n+1)/2 = O(n²), while \sum_{t=1}^{n} ξ_{t−1} = O_p(n^{3/2}), since n^{−3/2}\sum_{t=1}^{n} ξ_{t−1} converges to a normal distribution with finite variance (result (d)). Therefore, normalizing by the fastest divergence rate,

n^{−2}\sum_{t=1}^{n} y_{t−1} = αn^{−2}\sum_{t=1}^{n} (t−1) + n^{−1/2}·n^{−3/2}\sum_{t=1}^{n} ξ_{t−1} →_p α/2. (29)

Similarly, \sum_{t=1}^{n} y_{t−1}² and \sum_{t=1}^{n} y_{t−1}u_t also have terms with different divergence rates:

\sum_{t=1}^{n} y_{t−1}² = \sum_{t=1}^{n} [α(t−1) + ξ_{t−1}]² = α²\sum_{t=1}^{n} (t−1)² + \sum_{t=1}^{n} ξ_{t−1}² + 2α\sum_{t=1}^{n} (t−1)ξ_{t−1},

where \sum_{t=1}^{n} (t−1)² = O(n³) (result (h)), \sum_{t=1}^{n} ξ_{t−1}² = O_p(n²) (result (e)), and \sum_{t=1}^{n} (t−1)ξ_{t−1} = O_p(n^{5/2}) (result (f)). Normalizing the sequence by the inverse of the fastest divergence rate, n³,

n^{−3}\sum_{t=1}^{n} y_{t−1}² → α²/3. (30)

Finally,

\sum_{t=1}^{n} y_{t−1}u_t = \sum_{t=1}^{n} [α(t−1) + ξ_{t−1}]u_t = α\sum_{t=1}^{n} (t−1)u_t + \sum_{t=1}^{n} ξ_{t−1}u_t,

where \sum_{t=1}^{n} (t−1)u_t = O_p(n^{3/2}) (result (c)) and \sum_{t=1}^{n} ξ_{t−1}u_t = O_p(n) (result (b)). Again, normalizing by the fastest divergence rate,

n^{−3/2}\sum_{t=1}^{n} y_{t−1}u_t = n^{−3/2}α\sum_{t=1}^{n} (t−1)u_t + o_p(1). (31)

Corresponding to the different rates, to derive a nondegenerate limit distribution for the estimates we again need a scaling matrix. In this case,

H_n = \begin{bmatrix} n^{1/2} & 0 \\ 0 & n^{3/2} \end{bmatrix}.


Premultiplying the OLS estimator vector (in deviations from the true values) by H_n, we get

\begin{bmatrix} n^{1/2}(α_n − α) \\ n^{3/2}(ρ_n − 1) \end{bmatrix} = \begin{bmatrix} 1 & n^{−2}\sum y_{t−1} \\ n^{−2}\sum y_{t−1} & n^{−3}\sum y_{t−1}² \end{bmatrix}^{−1} \begin{bmatrix} n^{−1/2}\sum u_t \\ n^{−3/2}\sum y_{t−1}u_t \end{bmatrix}.

From (29) and (30), the first term

\begin{bmatrix} 1 & n^{−2}\sum y_{t−1} \\ n^{−2}\sum y_{t−1} & n^{−3}\sum y_{t−1}² \end{bmatrix} →_p \begin{bmatrix} 1 & α/2 \\ α/2 & α²/3 \end{bmatrix} ≡ Q.

From (13) and (31), we have

\begin{bmatrix} n^{−1/2}\sum u_t \\ n^{−3/2}\sum y_{t−1}u_t \end{bmatrix} = \begin{bmatrix} n^{−1/2}\sum u_t \\ n^{−3/2}α\sum_{t=1}^{n} (t−1)u_t \end{bmatrix} + o_p(1) →_d N\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \; σ²\begin{bmatrix} 1 & α/2 \\ α/2 & α²/3 \end{bmatrix}\right) = N(0, σ²Q).

Therefore we have the following limit distribution for the OLS estimates:

\begin{bmatrix} n^{1/2}(α_n − α) \\ n^{3/2}(ρ_n − 1) \end{bmatrix} →_d N(0, Q^{−1}·σ²Q·Q^{−1}) = N(0, σ²Q^{−1}). (32)

So in case 3 both estimated coefficients are asymptotically Gaussian, and the asymptotic distribution is the same as that of α and δ in the regression with deterministic trends. This is because here y_t has two components, a deterministic time trend and a random walk, and the time trend dominates the random walk.

3.1.4 Case 4

Finally, we consider the case where the true process is a random walk with or without drift,

yt = α + yt−1 + ut, ut ∼ i.i.d.(0, σ2),

where α may or may not be zero, and we run the following regression

yt = α + ρyt−1 + δt + ut. (33)

Without loss of generality, we assume that y_0 = 0. Note that when α ≠ 0, the drift also generates a time trend; hence there will be an asymptotic collinearity problem between y_{t−1} and t. We therefore rewrite the regression as

y_t = (1 − ρ)α + ρ[y_{t−1} − α(t−1)] + (δ + ρα)t + u_t = α* + ρ*ξ_{t−1} + δ*t + u_t,

where α* = (1 − ρ)α, ρ* = ρ, δ* = (δ + ρα), and ξ_t = y_t − αt. With this transformation, under the null hypothesis ρ = 1, δ = 0, ξ_t is a random walk:

ξ_t = u_1 + u_2 + · · · + u_t.


Therefore, with this transformation, we regress y_t on a constant, a driftless random walk, and a deterministic time trend. The OLS estimates in this regression are

\begin{bmatrix} α*_n \\ ρ*_n \\ δ*_n \end{bmatrix} = \begin{bmatrix} n & \sum ξ_{t−1} & \sum t \\ \sum ξ_{t−1} & \sum ξ_{t−1}² & \sum ξ_{t−1}t \\ \sum t & \sum ξ_{t−1}t & \sum t² \end{bmatrix}^{−1} \begin{bmatrix} \sum y_t \\ \sum ξ_{t−1}y_t \\ \sum t y_t \end{bmatrix}.

The hypothesis is that α = c (any constant), ρ = 1, and δ = 0. Correspondingly, in the transformed system α* = 0, ρ* = 1, and δ* = c. The deviations of the estimates from these true values are given by

\begin{bmatrix} α*_n \\ ρ*_n − 1 \\ δ*_n − c \end{bmatrix} = \begin{bmatrix} n & \sum ξ_{t−1} & \sum t \\ \sum ξ_{t−1} & \sum ξ_{t−1}² & \sum ξ_{t−1}t \\ \sum t & \sum ξ_{t−1}t & \sum t² \end{bmatrix}^{−1} \begin{bmatrix} \sum u_t \\ \sum ξ_{t−1}u_t \\ \sum t u_t \end{bmatrix}. (34)

Note that these three estimates have different convergence rates (we are already familiar with them!): α*_n is n^{1/2}-convergent, ρ*_n is n-convergent, and δ*_n is n^{3/2}-convergent. Therefore we need a rescaling matrix

H_n = \begin{bmatrix} n^{1/2} & 0 & 0 \\ 0 & n & 0 \\ 0 & 0 & n^{3/2} \end{bmatrix}.

Premultiplying (34) by H_n, we have

\begin{bmatrix} n^{1/2}α*_n \\ n(ρ*_n − 1) \\ n^{3/2}(δ*_n − c) \end{bmatrix} = \begin{bmatrix} 1 & n^{−3/2}\sum ξ_{t−1} & n^{−2}\sum t \\ n^{−3/2}\sum ξ_{t−1} & n^{−2}\sum ξ_{t−1}² & n^{−5/2}\sum ξ_{t−1}t \\ n^{−2}\sum t & n^{−5/2}\sum ξ_{t−1}t & n^{−3}\sum t² \end{bmatrix}^{−1} \begin{bmatrix} n^{−1/2}\sum u_t \\ n^{−1}\sum ξ_{t−1}u_t \\ n^{−3/2}\sum t u_t \end{bmatrix}.

The limit distribution of each term in the above equation can be found in the proposition. Plugging them in, we get

\begin{bmatrix} n^{1/2}α*_n \\ n(ρ*_n − 1) \\ n^{3/2}(δ*_n − c) \end{bmatrix} →_d \begin{bmatrix} σ & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & σ \end{bmatrix} \begin{bmatrix} 1 & \int W(r)dr & 1/2 \\ \int W(r)dr & \int W(r)²dr & \int rW(r)dr \\ 1/2 & \int rW(r)dr & 1/3 \end{bmatrix}^{−1} \begin{bmatrix} W(1) \\ (1/2)[W(1)² − 1] \\ W(1) − \int W(r)dr \end{bmatrix}. (35)

The DF unit root ρ-test in this case is given by the middle row of (35). Note that it does not depend on either σ or α. The DF t-test can be derived in a similar way (see page 500 in Hamilton).

3.2 Unit Root Tests with Serially Correlated Errors

3.2.1 BN Decomposition and Phillips-Solo Device

Beveridge and Nelson (1981) proposed that any time series that displays some degree of nonstationarity can be decomposed into two additive parts: a stationary (also called cyclical or transitory) part and a nonstationary (also called long-run or permanent) part. Let

u_t = C(L)ε_t = \sum_{j=0}^{∞} c_j ε_{t−j}, (36)


where (a) ε_t ∼ WN(0, σ²) and (b) \sum_{j=0}^{∞} j·|c_j| < ∞. The BN decomposition tells us that we can rewrite the lag polynomial as

C(L) = C(1) + (L − 1)\tilde{C}(L),

where C(1) = \sum_{j=0}^{∞} c_j, \tilde{C}(L) = \sum_{j=0}^{∞} \tilde{c}_j L^j, and \tilde{c}_j = \sum_{k=j+1}^{∞} c_k. Since we assume that \sum_{j=0}^{∞} j·|c_j| < ∞, we have \sum_{j=0}^{∞} |\tilde{c}_j| < ∞. Phillips and Solo (1992) verified that under conditions (a), (b), and C(1) ≠ 0, u_t can be represented in the form

u_t = (C(1) + (L − 1)\tilde{C}(L))ε_t = C(1)ε_t − \tilde{C}(L)(ε_t − ε_{t−1}).

Then for a random walk process with innovations u_t, we can write

y_t = y_{t−1} + u_t
= y_0 + \sum_{j=1}^{t} u_j
= y_0 + C(1)\sum_{j=1}^{t} ε_j − \tilde{C}(L)\sum_{j=1}^{t} (ε_j − ε_{j−1})
= y_0 + C(1)\sum_{j=1}^{t} ε_j − \tilde{C}(L)ε_t + \tilde{C}(L)ε_0
= y_0 + η_0 − η_t + C(1)\sum_{j=1}^{t} ε_j,

where η_0 = \tilde{C}(L)ε_0 is the initial condition, η_t = \tilde{C}(L)ε_t = \sum_{j=0}^{∞} \tilde{c}_j ε_{t−j} is a stationary process (note that \tilde{c}_j is absolutely summable), and C(1)\sum_{j=1}^{t} ε_j is a nonstationary random walk process.

Setting y_0 = 0, we can rewrite y_t as

y_t = \sum_{s=1}^{t} u_s = C(1)\sum_{s=1}^{t} ε_s + η_0 − η_t.

Note that ξ_t = \sum_{s=1}^{t} ε_s is a random walk with serially uncorrelated errors, and we have n^{−1/2}ξ_{[nr]} → σW(r), while η_0 − η_t is bounded in probability; hence we would expect that n^{−1/2}y_{[nr]} = C(1)n^{−1/2}ξ_{[nr]} + o_p(1) → λW(r). The following proposition summarizes some important limit theory for unit root processes with serially correlated errors.

Proposition 2 Let u_t = C(L)ε_t = \sum_{j=0}^{∞} c_j ε_{t−j}, where \sum_{j=0}^{∞} j·|c_j| < ∞ and ε_t ∼ i.i.d.(0, σ², µ_4), i.e. i.i.d. with finite fourth moment µ_4. Define

γ_h = E(u_t u_{t−h}) = σ²\sum_{j=0}^{∞} c_j c_{j+h},

λ = σ\sum_{j=0}^{∞} c_j = σC(1),

ξ_t = u_1 + u_2 + · · · + u_t, \quad ξ_0 = 0.

In this notation, λ² is known as the long-run variance of u_t, which is in general different from the variance of u_t, which is γ_0.


(a) n^{−1/2}\sum_{t=1}^{n} u_t →_d λW(1);

(b) n^{−1/2}\sum_{t=1}^{n} u_{t−j}ε_t →_d N(0, σ²γ_0) for j = 1, 2, . . .;

(c) n^{−1}\sum_{t=1}^{n} u_t u_{t−j} → γ_j for j = 1, 2, . . .;

(d) n^{−1}\sum_{t=1}^{n} ξ_{t−1}ε_t →_d (1/2)σλ[W(1)² − 1];

(e) n^{−1}\sum_{t=1}^{n} ξ_{t−1}u_{t−h} →_d (1/2)[λ²W(1)² − γ_0] for h = 0, and →_d (1/2)[λ²W(1)² − γ_0] + \sum_{j=0}^{h−1} γ_j for h = 1, 2, . . .;

(f) n^{−3/2}\sum_{t=1}^{n} ξ_{t−1} →_d λ\int_0^1 W(r)\,dr;

(g) n^{−3/2}\sum_{t=1}^{n} t u_{t−j} →_d λ[W(1) − \int_0^1 W(r)\,dr];

(h) n^{−2}\sum_{t=1}^{n} ξ_{t−1}² →_d λ²\int_0^1 W(r)²\,dr;

(i) n^{−5/2}\sum_{t=1}^{n} t ξ_{t−1} →_d λ\int_0^1 rW(r)\,dr;

(j) n^{−3}\sum_{t=1}^{n} t ξ_{t−1}² →_d λ²\int_0^1 rW(r)²\,dr;

(k) n^{−(v+1)}\sum_{t=1}^{n} t^v → 1/(v + 1) for v = 0, 1, 2, . . ..

The proofs of all these results can be found in the appendix of Chapter 17 of Hamilton. In class, we will discuss (a), (e), and (f) as examples. First, to prove (a),

n^{−1/2}\sum_{t=1}^{[nr]} u_t = n^{−1/2}C(1)\sum_{t=1}^{[nr]} ε_t + n^{−1/2}(η_0 − η_{[nr]}).

By (6) and the CMT we have

n^{−1/2}C(1)\sum_{t=1}^{[nr]} ε_t → σC(1)W(r).

Since η_0 is the initial condition and η_{[nr]} is a zero-mean stationary process, both are bounded in probability, so

n^{−1/2}(η_0 − η_{[nr]}) → 0.

Therefore, we obtain the limit

n^{−1/2}\sum_{t=1}^{[nr]} u_t → σC(1)W(r) = λW(r),

and when r = 1,

n^{−1/2}\sum_{t=1}^{n} u_t → σC(1)W(1) = λW(1). (37)


Second, to prove result (e),

n^{−1}\sum_{t=1}^{n} ξ_{t−1}u_{t−j} → (1/2)[λ²W(1)² − γ_0] for j = 0, and (1/2)[λ²W(1)² − γ_0] + \sum_{i=0}^{j−1} γ_i for j > 0, (38)

first let j = 0; we have

n^{−1}\sum_{t=1}^{n} ξ_{t−1}u_t = n^{−1}(1/2)ξ_n² − n^{−1}(1/2)\sum_{t=1}^{n} u_t².

We know that n^{−1/2}ξ_n → λW(1). By the CMT,

(1/2)n^{−1}ξ_n² → (1/2)λ²W(1)².

By result (c), we have n^{−1}\sum_{t=1}^{n} u_t² → γ_0. Therefore,

n^{−1}\sum_{t=1}^{n} ξ_{t−1}u_t → (1/2)[λ²W(1)² − γ_0].

Next, let j = 1. Note that

ξ_{t−1}u_{t−1} = (ξ_{t−2} + u_{t−1})u_{t−1} = ξ_{t−2}u_{t−1} + u_{t−1}²,

and the first term has the same limit as n^{−1}\sum_{t=1}^{n} ξ_{t−1}u_t; therefore,

n^{−1}\sum_{t=1}^{n} ξ_{t−1}u_{t−1} → (1/2)[λ²W(1)² − γ_0] + γ_0.

The cases j = 2, 3, . . . are similar. Third, consider result (f),

n−3/2n∑

t=1

ξt−1 → λ

∫ 1

0W (r)dr. (39)

Define

Sn(r) =

0 for r ∈ [0, 1/n)n−1/2ξt for r ∈ [t/n, (t + 1)/n)n−1/2ξn for r = 1

(40)

then we haveSn(r) → λW (r).

By CMT, ∫ 1

0Sn(r)dr → λ

∫ 1

0W (r)dr,

and we have ∫ 1

0Sn(r)dr = n−3/2

n∑t=1

ξt.
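The long-run variance λ² that appears throughout Proposition 2 must be estimated in practice; a common choice is a Newey-West (Bartlett-kernel) estimator. A minimal sketch, assuming NumPy (the function name long_run_variance and the truncation lag q are my own illustrative choices):

```python
import numpy as np

def long_run_variance(u, q=10):
    """Newey-West (Bartlett) estimate of lambda^2, the long-run variance of u."""
    u = np.asarray(u) - np.mean(u)
    n = len(u)
    lrv = u @ u / n                                   # gamma_0
    for h in range(1, q + 1):
        gamma_h = u[h:] @ u[:-h] / n                  # sample autocovariance
        lrv += 2.0 * (1.0 - h / (q + 1)) * gamma_h    # Bartlett weight
    return lrv
```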


3.2.2 Phillips-Perron Tests for Unit Roots

We will discuss case 2 only; the other cases can be derived similarly. Let the true DGP be a random walk with serially correlated errors,

y_t = α + ρy_{t−1} + u_t, \quad u_t = C(L)ε_t,

where C(L) and ε_t satisfy the conditions in Proposition 2. When |ρ| < 1, the OLS estimate of ρ is not consistent when the errors are serially correlated. However, when ρ = 1, the OLS estimate ρ_n → 1. Therefore, Phillips and Perron (1988) proposed estimating the regression by OLS and then correcting the estimate for serial correlation.

Under the null hypothesis H_0 : α = 0, ρ = 1, the deviations of the OLS estimate vector from the hypothesized values satisfy

\begin{bmatrix} n^{1/2}α_n \\ n(ρ_n − 1) \end{bmatrix} = \begin{bmatrix} 1 & n^{−3/2}\sum y_{t−1} \\ n^{−3/2}\sum y_{t−1} & n^{−2}\sum y_{t−1}² \end{bmatrix}^{−1} \begin{bmatrix} n^{−1/2}\sum u_t \\ n^{−1}\sum y_{t−1}u_t \end{bmatrix}. (41)

Using results (f) and (h) in Proposition 2,

\begin{bmatrix} 1 & n^{−3/2}\sum y_{t−1} \\ n^{−3/2}\sum y_{t−1} & n^{−2}\sum y_{t−1}² \end{bmatrix}^{−1} → \begin{bmatrix} 1 & λ\int W(r)dr \\ λ\int W(r)dr & λ²\int W(r)²dr \end{bmatrix}^{−1} = \begin{bmatrix} 1 & 0 \\ 0 & λ \end{bmatrix}^{−1} \begin{bmatrix} 1 & \int W(r)dr \\ \int W(r)dr & \int W(r)²dr \end{bmatrix}^{−1} \begin{bmatrix} 1 & 0 \\ 0 & λ \end{bmatrix}^{−1},

and using results (a) and (e) in Proposition 2,

\begin{bmatrix} n^{−1/2}\sum u_t \\ n^{−1}\sum y_{t−1}u_t \end{bmatrix} →_d \begin{bmatrix} λW(1) \\ (1/2)[λ²W(1)² − γ_0] \end{bmatrix} = \begin{bmatrix} λW(1) \\ (1/2)λ²[W(1)² − 1] \end{bmatrix} + \begin{bmatrix} 0 \\ (1/2)(λ² − γ_0) \end{bmatrix} = λ\begin{bmatrix} 1 & 0 \\ 0 & λ \end{bmatrix} \begin{bmatrix} W(1) \\ (1/2)[W(1)² − 1] \end{bmatrix} + \begin{bmatrix} 0 \\ (1/2)(λ² − γ_0) \end{bmatrix}.

Substituting these two results into (41),

\begin{bmatrix} n^{1/2}α_n \\ n(ρ_n − 1) \end{bmatrix} → \begin{bmatrix} λ & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & \int W(r)dr \\ \int W(r)dr & \int W(r)²dr \end{bmatrix}^{−1} \begin{bmatrix} W(1) \\ (1/2)[W(1)² − 1] \end{bmatrix} + \begin{bmatrix} 1 & 0 \\ 0 & λ^{−1} \end{bmatrix} \begin{bmatrix} 1 & \int W(r)dr \\ \int W(r)dr & \int W(r)²dr \end{bmatrix}^{−1} \begin{bmatrix} 0 \\ (1/2)(λ² − γ_0)/λ \end{bmatrix}.

To test ρ = 1,

n(ρ_n − 1) → \begin{bmatrix} 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & \int W(r)dr \\ \int W(r)dr & \int W(r)²dr \end{bmatrix}^{−1} \begin{bmatrix} W(1) \\ (1/2)[W(1)² − 1] \end{bmatrix} + \frac{λ² − γ_0}{2λ²} \begin{bmatrix} 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & \int W(r)dr \\ \int W(r)dr & \int W(r)²dr \end{bmatrix}^{−1} \begin{bmatrix} 0 \\ 1 \end{bmatrix}
= \frac{(1/2)[W(1)² − 1] − W(1)\int W(r)dr}{\int W(r)²dr − \big[\int W(r)dr\big]²} + \frac{(1/2)(λ² − γ_0)}{λ²\big\{\int W(r)²dr − \big[\int W(r)dr\big]²\big\}}.


The first term is the asymptotic distribution of n(ρ_n − 1) as if u_t were i.i.d., as in the previous subsection (28). The second term is a correction for serial correlation. When u_t is serially uncorrelated, C(1) = 1 and λ² = γ_0 = σ², so this term disappears. The asymptotics for the t-statistic can be derived in a similar way.
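A rough sketch of the resulting corrected ρ statistic, assuming NumPy and the long_run_variance helper sketched above (the exact demeaning and small-sample conventions differ across textbooks and software, so treat this as illustrative):

```python
import numpy as np

def pp_rho_stat(y, q=10):
    """Sketch of the PP rho statistic for case 2 (regression with a constant)."""
    ylag, ynow = y[:-1], y[1:]
    n = len(ylag)
    X = np.column_stack([np.ones(n), ylag])
    beta = np.linalg.lstsq(X, ynow, rcond=None)[0]    # (alpha_n, rho_n)
    u = ynow - X @ beta                               # OLS residuals
    gamma0 = u @ u / n
    lam2 = long_run_variance(u, q)                    # Newey-West estimate of lambda^2
    ydm = ylag - ylag.mean()                          # demean: a constant is included
    correction = 0.5 * (lam2 - gamma0) / (ydm @ ydm / n ** 2)
    return n * (beta[1] - 1.0) - correction
```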

3.2.3 Augmented Dickey-Fuller Tests for Unit Roots

An alternative unit root test with serially correlated errors is the augmented Dickey-Fuller (ADF) test. Recall the AR(2) example used earlier in this lecture,

(1 − φ_1L − φ_2L²)y_t = ε_t,

with one unit root and the other root λ_2 satisfying |λ_2| < 1. Then we can rewrite it as

y_t = y_{t−1} + u_t, \quad u_t = (1 − λ_2L)^{−1}ε_t = θ(L)ε_t.

So this is a unit root process with serially correlated errors. To correct for the serial correlation, define

ρ = φ1 + φ2, κ = −φ2.

Then we have the equivalent polynomial

(1 − ρL) − κL(1 − L) = 1 − ρL − κL + κL² = 1 − (φ_1 + φ_2 − φ_2)L − φ_2L² = 1 − φ_1L − φ_2L².

Therefore, the original AR(2) process can be written as

[(1 − ρL) − κL(1 − L)]y_t = ε_t,

or

y_t = ρy_{t−1} + κ∆y_{t−1} + ε_t. (42)

This approach generalizes to the AR(p) process, where we define

ρ = φ_1 + φ_2 + · · · + φ_p,

and

κ_j = −[φ_{j+1} + φ_{j+2} + · · · + φ_p] for j = 1, 2, . . . , p − 1.

Note that when the process contains a unit root, meaning one root of

1 − φ_1z − φ_2z² − · · · − φ_pz^p = 0

is unity, we have

1 − φ_1 − φ_2 − · · · − φ_p = 0,

which implies that ρ = 1. Therefore, testing whether a process contains a unit root is equivalent to testing whether ρ = 1 in (42). Furthermore, (42) is a regression with serially uncorrelated errors. For simplicity,


in the following discussion we work with an AR(2) process, and again we consider only case 2. Our regression is

y_t = κ∆y_{t−1} + α + ρy_{t−1} + ε_t ≡ x_t′β + ε_t,

where x_t = (∆y_{t−1}, 1, y_{t−1})′ and β = (κ, α, ρ)′. The deviation of the OLS estimate from the true β is

β_n − β = \Big[\sum_{t=1}^{n} x_t x_t′\Big]^{−1} \Big[\sum_{t=1}^{n} x_t ε_t\Big].

Let u_t = y_t − y_{t−1}. Then

\sum_{t=1}^{n} x_t x_t′ = \begin{bmatrix} \sum u_{t−1}² & \sum u_{t−1} & \sum u_{t−1}y_{t−1} \\ \sum u_{t−1} & n & \sum y_{t−1} \\ \sum y_{t−1}u_{t−1} & \sum y_{t−1} & \sum y_{t−1}² \end{bmatrix}, \quad \sum_{t=1}^{n} x_t ε_t = \begin{bmatrix} \sum u_{t−1}ε_t \\ \sum ε_t \\ \sum y_{t−1}ε_t \end{bmatrix}.

u_t is stationary, so its coefficient is n^{1/2}-convergent. The scaling matrix is

H_n = \begin{bmatrix} \sqrt{n} & 0 & 0 \\ 0 & \sqrt{n} & 0 \\ 0 & 0 & n \end{bmatrix}.

Premultiplying the coefficient vector by H_n,

H_n(β_n − β) = \Big\{H_n^{−1}\Big[\sum_{t=1}^{n} x_t x_t′\Big]H_n^{−1}\Big\}^{−1} H_n^{−1}\Big[\sum_{t=1}^{n} x_t ε_t\Big]. (43)

Define γ_j = E(u_t u_{t−j}) and λ = σC(1) = σ/(1 − κ), where σ² = E(ε_t²).

H_n^{−1}\Big[\sum_{t=1}^{n} x_t x_t′\Big]H_n^{−1} →_d \begin{bmatrix} γ_0 & 0 & 0 \\ 0 & 1 & λ\int W(r)dr \\ 0 & λ\int W(r)dr & λ²\int W(r)²dr \end{bmatrix} ≡ \begin{bmatrix} V & 0 \\ 0 & Q \end{bmatrix}.

Here V = γ_0, while it would be a matrix with elements γ_j for a general AR(p) model, and

Q = \begin{bmatrix} 1 & λ\int W(r)dr \\ λ\int W(r)dr & λ²\int W(r)²dr \end{bmatrix}.

Next, consider the second term in (43):

H_n^{−1}\Big[\sum_{t=1}^{n} x_t ε_t\Big] = \begin{bmatrix} n^{−1/2}\sum u_{t−1}ε_t \\ n^{−1/2}\sum ε_t \\ n^{−1}\sum y_{t−1}ε_t \end{bmatrix}.

Applying the usual CLT to the first element,

n^{−1/2}\sum u_{t−1}ε_t →_d h_1 ∼ N(0, σ²V).


Applying results (a) and (d) of Proposition 2 to the other two terms,

\begin{bmatrix} n^{−1/2}\sum ε_t \\ n^{−1}\sum y_{t−1}ε_t \end{bmatrix} →_d h_2 ≡ \begin{bmatrix} σW(1) \\ (1/2)σλ[W(1)² − 1] \end{bmatrix}.

Substituting the above results into (43), we get

H_n(β_n − β) →_d \begin{bmatrix} V & 0 \\ 0 & Q \end{bmatrix}^{−1} \begin{bmatrix} h_1 \\ h_2 \end{bmatrix} = \begin{bmatrix} V^{−1}h_1 \\ Q^{−1}h_2 \end{bmatrix}. (44)

Since the limit distribution is block diagonal, we can discuss the coefficients on the stationary components and the nonstationary components separately. For the stationary component,

\sqrt{n}(κ_n − κ) →_d V^{−1}h_1 ∼ N(0, σ²V^{−1}).

In this AR(2) problem, the variance is simply σ²/γ_0. The limit distribution of the constant and the I(1) component is

\begin{bmatrix} n^{1/2}α_n \\ n(ρ_n − 1) \end{bmatrix} →_d Q^{−1}h_2 = \begin{bmatrix} σ & 0 \\ 0 & σ/λ \end{bmatrix} \begin{bmatrix} 1 & \int W(r)dr \\ \int W(r)dr & \int W(r)²dr \end{bmatrix}^{−1} \begin{bmatrix} W(1) \\ (1/2)[W(1)² − 1] \end{bmatrix}.

This implies that n·(λ/σ)·(ρ_n − 1) has the same distribution as in (28). Since λ = σC(1), λ/σ = C(1) = 1/(1 − κ). Therefore, the ADF ρ-test is

\frac{n(ρ_n − 1)}{1 − κ_n} →_d \frac{(1/2)[W(1)² − 1] − W(1)\int W(r)dr}{\int W(r)²dr − \big[\int W(r)dr\big]²}. (45)

For the general AR(p) process, simply replace (1 − κ_n) with (1 − κ_{1,n} − · · · − κ_{p−1,n}). The ADF t-test can be found in Hamilton's book.
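A compact sketch of the ADF ρ-test in this AR(2) case, assuming NumPy (an illustration of regression (42) with a constant, without lag selection):

```python
import numpy as np

def adf_rho_stat(y):
    """ADF rho statistic from y_t = alpha + rho*y_{t-1} + kappa*dy_{t-1} + e_t."""
    dy = np.diff(y)
    n = len(y) - 2                                        # usable observations
    X = np.column_stack([np.ones(n), y[1:-1], dy[:-1]])   # 1, y_{t-1}, dy_{t-1}
    alpha, rho, kappa = np.linalg.lstsq(X, y[2:], rcond=None)[0]
    return n * (rho - 1.0) / (1.0 - kappa)                # as in equation (45)
```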
