Markov-Switching Models

Mei-Yuan Chen

Department of Finance

National Chung Hsing University

Feb. 25, 2013

©Mei-Yuan Chen. The LaTeX source files are mkv-sw.tex.


1 INTRODUCTION

Markov-switching models are useful because of the potential they offer for capturing occasional but recurrent regime shifts in a simple dynamic econometric model.

2 Serially Uncorrelated Data and Switching

A two-state model is considered:

y_t = x_t'β_{s_t} + e_t,  t = 1, 2, ..., T,  e_t ~ N(0, σ²_{s_t}),
β_{s_t} = β_0(1 − s_t) + β_1 s_t,
σ²_{s_t} = σ²_0(1 − s_t) + σ²_1 s_t,
s_t = 0 or 1.

Then, the log-likelihood function is

ln L = Σ_{t=1}^{T} ln f(y_t|s_t),    (1)

where

f(y_t|s_t) = 1/(√(2π) σ_{s_t}) exp( −(y_t − x_t'β_{s_t})² / (2σ²_{s_t}) ).    (2)

Problem: s_t, t = 1, ..., T, are not observable, so the MLE is not directly applicable for estimation.

Since

f(y_t, s_t|Ψ_{t−1}) = f(y_t|s_t, Ψ_{t−1}) P(s_t|Ψ_{t−1}),

we then have

f(y_t|Ψ_{t−1}) = Σ_{s_t=0}^{1} f(y_t, s_t|Ψ_{t−1})
             = Σ_{s_t=0}^{1} f(y_t|s_t, Ψ_{t−1}) P(s_t|Ψ_{t−1})
             = 1/(√(2π) σ_0) exp( −(y_t − x_t'β_0)² / (2σ²_0) ) × P(s_t = 0|Ψ_{t−1})
               + 1/(√(2π) σ_1) exp( −(y_t − x_t'β_1)² / (2σ²_1) ) × P(s_t = 1|Ψ_{t−1}).


Thus,

ln L = Σ_{t=1}^{T} ln f(y_t|Ψ_{t−1})
     = Σ_{t=1}^{T} ln { Σ_{s_t=0}^{1} f(y_t, s_t|Ψ_{t−1}) }
     = Σ_{t=1}^{T} ln { Σ_{s_t=0}^{1} f(y_t|s_t, Ψ_{t−1}) P(s_t|Ψ_{t−1}) }.    (3)
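As a concrete illustration, the mixture form in (3) for the independent-switching case can be evaluated directly. The following is a minimal sketch (function and variable names are hypothetical, not from the original notes), with p = P(s_t = 0):

```python
import numpy as np

def mixture_loglik(y, X, beta0, beta1, sigma0, sigma1, p):
    """Evaluate ln L = sum_t ln{ p*f(y_t|s_t=0) + (1-p)*f(y_t|s_t=1) },
    the two-component normal mixture implied by independent switching."""
    f0 = np.exp(-(y - X @ beta0)**2 / (2 * sigma0**2)) / (np.sqrt(2 * np.pi) * sigma0)
    f1 = np.exp(-(y - X @ beta1)**2 / (2 * sigma1**2)) / (np.sqrt(2 * np.pi) * sigma1)
    return np.sum(np.log(p * f0 + (1 - p) * f1))
```

When the two regimes coincide (β_0 = β_1, σ_0 = σ_1), the mixture collapses to a single normal density, which gives a quick sanity check.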

2.0.1 The Case of Independent Switching

When s_t evolves independently of its own past values and of any other exogenous or predetermined variables, we can specify the probabilities very simply as

P(s_t = 0) = p = exp(p_0) / (1 + exp(p_0)),
P(s_t = 1) = 1 − p = 1 / (1 + exp(p_0)).

When s_t evolves independently of its own past values, it may still depend upon some exogenous or predetermined variables, say z_{t−1}. Then

P(s_t = 0|Ψ_{t−1}) = p_t = exp(p_0 + z_{t−1}'γ) / (1 + exp(p_0 + z_{t−1}'γ)),
P(s_t = 1|Ψ_{t−1}) = 1 − p_t = 1 / (1 + exp(p_0 + z_{t−1}'γ)).

Finally, the parameters to be estimated are β_0, β_1, σ²_0, σ²_1, and p_0 (or p_0 and γ when z_{t−1} enters the switching probabilities).

2.0.2 The Case of Markov Switching

In this case the evolution of s_t depends upon s_{t−1}, s_{t−2}, ..., s_{t−r}, and the process s_t is called an r-th order Markov-switching process. For simplicity, we consider the simplest case of a two-state, first-order Markov-switching process for s_t, with transition probabilities

P(s_t = 0|s_{t−1} = 0) = q = exp(q_0) / (1 + exp(q_0)),
P(s_t = 1|s_{t−1} = 1) = p = exp(p_0) / (1 + exp(p_0)).

Dealing with the problem of unobserved s_t is exactly the same as before.

Calculation of P(s_t = j|Ψ_{t−1}), j = 0, 1:


Step 1: Given P(s_{t−1} = i|Ψ_{t−1}), i = 0, 1,

P(s_t = j|Ψ_{t−1}) = Σ_{i=0}^{1} P(s_t = j, s_{t−1} = i|Ψ_{t−1})
                  = Σ_{i=0}^{1} P(s_t = j|s_{t−1} = i, Ψ_{t−1}) × P(s_{t−1} = i|Ψ_{t−1})
                  = P(s_t = j|s_{t−1} = 0, Ψ_{t−1}) × P(s_{t−1} = 0|Ψ_{t−1})
                    + P(s_t = j|s_{t−1} = 1, Ψ_{t−1}) × P(s_{t−1} = 1|Ψ_{t−1}).

Step 2: Once y_t is observed at the end of time t, the probability terms are updated as

P(s_t = j|Ψ_t) = P(s_t = j|y_t, Ψ_{t−1})
              = f(y_t, s_t = j|Ψ_{t−1}) / f(y_t|Ψ_{t−1})
              = f(y_t, s_t = j|Ψ_{t−1}) / Σ_{i=0}^{1} f(y_t, s_t = i|Ψ_{t−1})
              = f(y_t|s_t = j, Ψ_{t−1}) P(s_t = j|Ψ_{t−1}) / Σ_{i=0}^{1} f(y_t|s_t = i, Ψ_{t−1}) P(s_t = i|Ψ_{t−1}).

In detail, given P(s_0 = 0|Ψ_0) = π_0 and P(s_0 = 1|Ψ_0) = π_1, we calculate at time t = 1,

P(s_1 = 0|Ψ_0) = Σ_{i=0}^{1} P(s_1 = 0, s_0 = i|Ψ_0)
              = Σ_{i=0}^{1} P(s_1 = 0|s_0 = i, Ψ_0) × P(s_0 = i|Ψ_0)
              = P(s_1 = 0|s_0 = 0, Ψ_0) × P(s_0 = 0|Ψ_0) + P(s_1 = 0|s_0 = 1, Ψ_0) × P(s_0 = 1|Ψ_0)
              = q × π_0 + (1 − p) × π_1.    (4)

Similarly,

P(s_1 = 1|Ψ_0) = Σ_{i=0}^{1} P(s_1 = 1, s_0 = i|Ψ_0)
              = Σ_{i=0}^{1} P(s_1 = 1|s_0 = i, Ψ_0) × P(s_0 = i|Ψ_0)
              = P(s_1 = 1|s_0 = 0, Ψ_0) × P(s_0 = 0|Ψ_0) + P(s_1 = 1|s_0 = 1, Ψ_0) × P(s_0 = 1|Ψ_0)
              = (1 − q) × π_0 + p × π_1.    (5)

Next, the densities are

f(y_1|s_1 = 0, Ψ_0) = 1/(√(2π) σ_0) exp( −(y_1 − x_1'β_0)² / (2σ²_0) ),    (6)
f(y_1|s_1 = 1, Ψ_0) = 1/(√(2π) σ_1) exp( −(y_1 − x_1'β_1)² / (2σ²_1) ).    (7)


The log-likelihood becomes ln L = Σ_{t=1}^{T} ln f(y_t|Ψ_{t−1}), where

ln f(y_t|Ψ_{t−1}) = ln [ Σ_{i=0}^{1} f(y_t|s_t = i, Ψ_{t−1}) P(s_t = i|Ψ_{t−1}) ].

Thus,

f(y_1|Ψ_0) = Σ_{i=0}^{1} f(y_1, s_1 = i|Ψ_0)
           = Σ_{i=0}^{1} f(y_1|s_1 = i, Ψ_0) P(s_1 = i|Ψ_0)
           = f(y_1|s_1 = 0, Ψ_0) P(s_1 = 0|Ψ_0) + f(y_1|s_1 = 1, Ψ_0) P(s_1 = 1|Ψ_0)
           = (6) × (4) + (7) × (5).    (8)

After having y_1, by using

IP(A|B, C) = IP(A ∩ B|C) / IP(B|C),

the probability terms are updated as

P(s_1 = 0|Ψ_1) = P(s_1 = 0|y_1, Ψ_0)
              = f(s_1 = 0, y_1|Ψ_0) / f(y_1|Ψ_0)
              = f(s_1 = 0, y_1|Ψ_0) / Σ_{i=0}^{1} f(y_1, s_1 = i|Ψ_0)
              = f(y_1|s_1 = 0, Ψ_0) × P(s_1 = 0|Ψ_0) / Σ_{i=0}^{1} f(y_1|s_1 = i, Ψ_0) P(s_1 = i|Ψ_0)
              = (6) × (4) / [(6) × (4) + (7) × (5)]
              = (6) × (4) / (8).    (9)

Similarly,

P(s_1 = 1|Ψ_1) = P(s_1 = 1|y_1, Ψ_0)
              = f(s_1 = 1, y_1|Ψ_0) / f(y_1|Ψ_0)
              = f(y_1|s_1 = 1, Ψ_0) × P(s_1 = 1|Ψ_0) / Σ_{i=0}^{1} f(y_1|s_1 = i, Ψ_0) P(s_1 = i|Ψ_0)
              = (7) × (5) / [(6) × (4) + (7) × (5)]
              = (7) × (5) / (8).    (10)


For time t = 2,

P(s_2 = 0|Ψ_1) = Σ_{i=0}^{1} P(s_2 = 0, s_1 = i|Ψ_1)
              = Σ_{i=0}^{1} P(s_2 = 0|s_1 = i, Ψ_1) × P(s_1 = i|Ψ_1)
              = P(s_2 = 0|s_1 = 0, Ψ_1) × P(s_1 = 0|Ψ_1) + P(s_2 = 0|s_1 = 1, Ψ_1) × P(s_1 = 1|Ψ_1)
              = q × (9) + (1 − p) × (10).    (11)

Similarly,

P(s_2 = 1|Ψ_1) = Σ_{i=0}^{1} P(s_2 = 1, s_1 = i|Ψ_1)
              = Σ_{i=0}^{1} P(s_2 = 1|s_1 = i, Ψ_1) × P(s_1 = i|Ψ_1)
              = P(s_2 = 1|s_1 = 0, Ψ_1) × P(s_1 = 0|Ψ_1) + P(s_2 = 1|s_1 = 1, Ψ_1) × P(s_1 = 1|Ψ_1)
              = (1 − q) × (9) + p × (10).    (12)

Next, the densities are

f(y_2|s_2 = 0, Ψ_1) = 1/(√(2π) σ_0) exp( −(y_2 − x_2'β_0)² / (2σ²_0) ),    (13)
f(y_2|s_2 = 1, Ψ_1) = 1/(√(2π) σ_1) exp( −(y_2 − x_2'β_1)² / (2σ²_1) ).    (14)

Besides,

f(y_2|Ψ_1) = Σ_{i=0}^{1} f(y_2, s_2 = i|Ψ_1)
           = Σ_{i=0}^{1} f(y_2|s_2 = i, Ψ_1) P(s_2 = i|Ψ_1)
           = f(y_2|s_2 = 0, Ψ_1) P(s_2 = 0|Ψ_1) + f(y_2|s_2 = 1, Ψ_1) P(s_2 = 1|Ψ_1)
           = (13) × (11) + (14) × (12).

After having y_2, by using

IP(A|B, C) = IP(A ∩ B|C) / IP(B|C),

the probability terms are updated as

P(s_2 = 0|Ψ_2) = P(s_2 = 0|y_2, Ψ_1)
              = f(s_2 = 0, y_2|Ψ_1) / f(y_2|Ψ_1)
              = f(s_2 = 0, y_2|Ψ_1) / Σ_{i=0}^{1} f(y_2, s_2 = i|Ψ_1)
              = f(y_2|s_2 = 0, Ψ_1) × P(s_2 = 0|Ψ_1) / Σ_{i=0}^{1} f(y_2|s_2 = i, Ψ_1) P(s_2 = i|Ψ_1)
              = (13) × (11) / [(13) × (11) + (14) × (12)].    (15)

Similarly,

P(s_2 = 1|Ψ_2) = P(s_2 = 1|y_2, Ψ_1)
              = f(s_2 = 1, y_2|Ψ_1) / f(y_2|Ψ_1)
              = f(y_2|s_2 = 1, Ψ_1) × P(s_2 = 1|Ψ_1) / Σ_{i=0}^{1} f(y_2|s_2 = i, Ψ_1) P(s_2 = i|Ψ_1)
              = (14) × (12) / [(13) × (11) + (14) × (12)].    (16)

For time t = 3,

P(s_3 = 0|Ψ_2) = Σ_{i=0}^{1} P(s_3 = 0, s_2 = i|Ψ_2)
              = Σ_{i=0}^{1} P(s_3 = 0|s_2 = i, Ψ_2) × P(s_2 = i|Ψ_2)
              = P(s_3 = 0|s_2 = 0, Ψ_2) × P(s_2 = 0|Ψ_2) + P(s_3 = 0|s_2 = 1, Ψ_2) × P(s_2 = 1|Ψ_2)
              = q × (15) + (1 − p) × (16).    (17)

Similarly,

P(s_3 = 1|Ψ_2) = Σ_{i=0}^{1} P(s_3 = 1, s_2 = i|Ψ_2)
              = Σ_{i=0}^{1} P(s_3 = 1|s_2 = i, Ψ_2) × P(s_2 = i|Ψ_2)
              = P(s_3 = 1|s_2 = 0, Ψ_2) × P(s_2 = 0|Ψ_2) + P(s_3 = 1|s_2 = 1, Ψ_2) × P(s_2 = 1|Ψ_2)
              = (1 − q) × (15) + p × (16).    (18)

Next, the densities are

f(y_3|s_3 = 0, Ψ_2) = 1/(√(2π) σ_0) exp( −(y_3 − x_3'β_0)² / (2σ²_0) ),    (19)
f(y_3|s_3 = 1, Ψ_2) = 1/(√(2π) σ_1) exp( −(y_3 − x_3'β_1)² / (2σ²_1) ).    (20)

Besides,

f(y_3|Ψ_2) = Σ_{i=0}^{1} f(y_3, s_3 = i|Ψ_2)
           = Σ_{i=0}^{1} f(y_3|s_3 = i, Ψ_2) P(s_3 = i|Ψ_2)
           = f(y_3|s_3 = 0, Ψ_2) P(s_3 = 0|Ψ_2) + f(y_3|s_3 = 1, Ψ_2) P(s_3 = 1|Ψ_2)
           = (19) × (17) + (20) × (18).

After having y_3, by using

IP(A|B, C) = IP(A ∩ B|C) / IP(B|C),

the probability terms are updated as

P(s_3 = 0|Ψ_3) = P(s_3 = 0|y_3, Ψ_2)
              = f(s_3 = 0, y_3|Ψ_2) / f(y_3|Ψ_2)
              = f(s_3 = 0, y_3|Ψ_2) / Σ_{i=0}^{1} f(y_3, s_3 = i|Ψ_2)
              = f(y_3|s_3 = 0, Ψ_2) × P(s_3 = 0|Ψ_2) / Σ_{i=0}^{1} f(y_3|s_3 = i, Ψ_2) P(s_3 = i|Ψ_2)
              = (19) × (17) / [(19) × (17) + (20) × (18)].    (21)

Similarly,

P(s_3 = 1|Ψ_3) = P(s_3 = 1|y_3, Ψ_2)
              = f(s_3 = 1, y_3|Ψ_2) / f(y_3|Ψ_2)
              = f(y_3|s_3 = 1, Ψ_2) × P(s_3 = 1|Ψ_2) / Σ_{i=0}^{1} f(y_3|s_3 = i, Ψ_2) P(s_3 = i|Ψ_2)
              = (20) × (18) / [(19) × (17) + (20) × (18)].    (22)

Repeating the above steps through t = T, the log-likelihood can be constructed, and the MLE of β_0, β_1, σ_0, σ_1, p_0, q_0, π_0 and π_1 can be obtained.
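The recursion just described (prediction, likelihood evaluation, Bayes updating, for t = 1, ..., T) can be sketched compactly. The following is a minimal illustration with hypothetical array conventions (β stacked as a (k, 2) array with columns β_0 and β_1), not the original course code:

```python
import numpy as np

def hamilton_filter(y, X, beta, sigma, q, p, pi0):
    """Two-state filter. beta: (k,2) array, columns are beta_0, beta_1;
    sigma: length-2 array (sigma_0, sigma_1); q = P(s_t=0|s_{t-1}=0),
    p = P(s_t=1|s_{t-1}=1); pi0 = P(s_0=0|Psi_0). Returns the filtered
    probabilities P(s_t=0|Psi_t) and the log-likelihood."""
    T = len(y)
    filt = np.empty(T)
    loglik = 0.0
    prob0 = pi0                               # P(s_{t-1}=0 | Psi_{t-1})
    for t in range(T):
        # Step 1: prediction, as in (4)-(5)
        pred0 = q * prob0 + (1 - p) * (1 - prob0)
        pred1 = (1 - q) * prob0 + p * (1 - prob0)
        # state-conditional densities, as in (6)-(7)
        dens = np.exp(-(y[t] - X[t] @ beta)**2 / (2 * sigma**2)) \
               / (np.sqrt(2 * np.pi) * sigma)
        # marginal density f(y_t | Psi_{t-1}), as in (8)
        fy = dens[0] * pred0 + dens[1] * pred1
        loglik += np.log(fy)
        # Step 2: Bayes update, as in (9)-(10)
        prob0 = dens[0] * pred0 / fy
        filt[t] = prob0
    return filt, loglik
```

A useful check: when the two regimes are identical the update step is inert, and the filtered probability simply converges to the steady-state value (1 − p)/(2 − p − q).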


2.1 Serially Correlated Data and Markov Switching

In general, an autoregressive model of order k with first-order, M-state Markov-switching mean and variance may be written as

φ(L)(y_t − μ_{s_t}) = e_t,  e_t ~ N(0, σ²_{s_t}),
P(s_t = j|s_{t−1} = i) = p_ij,  i, j = 1, 2, ..., M,  Σ_{j=1}^{M} p_ij = 1,
μ_{s_t} = μ_1 s_1t + μ_2 s_2t + · · · + μ_M s_Mt,
σ²_{s_t} = σ²_1 s_1t + σ²_2 s_2t + · · · + σ²_M s_Mt,

where s_mt = 1 if s_t = m and s_mt = 0 otherwise.

For simplicity, an AR(1) model is considered:

(y_t − μ_{s_t}) = φ_1(y_{t−1} − μ_{s_{t−1}}) + e_t,  e_t ~ i.i.d. N(0, σ²_{s_t}).

The conditional density of y_t is

f(y_t|Ψ_{t−1}, s_t, s_{t−1}) = 1/(√(2π) σ_{s_t}) exp( −[(y_t − μ_{s_t}) − φ_1(y_{t−1} − μ_{s_{t−1}})]² / (2σ²_{s_t}) ),

which depends on s_t and s_{t−1}, but both are unobservable. To solve this problem, the same procedure as above is applied.

Step 1: Derive the joint density of y_t, s_t and s_{t−1}, conditional on past information Ψ_{t−1}:

f(y_t, s_t, s_{t−1}|Ψ_{t−1}) = f(y_t|s_t, s_{t−1}, Ψ_{t−1}) P(s_t, s_{t−1}|Ψ_{t−1}).

Step 2: To get f(y_t|Ψ_{t−1}), integrate s_t and s_{t−1} out of the joint density by summing it over all possible values of s_t and s_{t−1}:

f(y_t|Ψ_{t−1}) = Σ_{j=1}^{M} Σ_{i=1}^{M} f(y_t, s_t = j, s_{t−1} = i|Ψ_{t−1})
              = Σ_{j=1}^{M} Σ_{i=1}^{M} f(y_t|s_t = j, s_{t−1} = i, Ψ_{t−1}) P(s_t = j, s_{t−1} = i|Ψ_{t−1}).

Then, the log-likelihood function can be written as

ln L = Σ_{t=1}^{T} ln f(y_t|Ψ_{t−1})
     = Σ_{t=1}^{T} ln [ Σ_{j=1}^{M} Σ_{i=1}^{M} f(y_t|s_t = j, s_{t−1} = i, Ψ_{t−1}) P(s_t = j, s_{t−1} = i|Ψ_{t−1}) ].    (23)

To calculate P (st = j, st−1 = i|Ψt−1), i, j = 1, . . . ,M , we have the following filtering

and smoothing procedures.


2.1.1 Filtering

Step 1: Given P(s_{t−1} = i|Ψ_{t−1}), i = 1, ..., M, at the beginning of time t (the t-th iteration), the weighting terms P(s_t = j, s_{t−1} = i|Ψ_{t−1}), i, j = 1, ..., M, are calculated as

P(s_t = j, s_{t−1} = i|Ψ_{t−1}) = P(s_t = j|s_{t−1} = i) × P(s_{t−1} = i|Ψ_{t−1})
                               = p_ij × P(s_{t−1} = i|Ψ_{t−1}).

Step 2: Once y_t is observed at the end of time t, then

P(s_t = j, s_{t−1} = i|Ψ_t) = P(s_t = j, s_{t−1} = i|Ψ_{t−1}, y_t)
  = f(y_t, s_t = j, s_{t−1} = i|Ψ_{t−1}) / f(y_t|Ψ_{t−1})
  = f(y_t|s_t = j, s_{t−1} = i, Ψ_{t−1}) × P(s_t = j, s_{t−1} = i|Ψ_{t−1}) / Σ_{j=1}^{M} Σ_{i=1}^{M} f(y_t|s_t = j, s_{t−1} = i, Ψ_{t−1}) × P(s_t = j, s_{t−1} = i|Ψ_{t−1}),

with

P(s_t = j|Ψ_t) = Σ_{i=1}^{M} P(s_t = j, s_{t−1} = i|Ψ_t).

Iterating the above two steps for t = 1, 2, ..., T, we obtain P(s_t = j, s_{t−1} = i|Ψ_t). In more detail, let M = 2 and start at time t = 0 from the steady-state probabilities P(s_0 = 0|Ψ_0) = π_0 = (1 − p_11)/(2 − p_00 − p_11) and P(s_0 = 1|Ψ_0) = π_1 = 1 − π_0 = (1 − p_00)/(2 − p_00 − p_11). When t = 1,

P(s_1 = 0, s_0 = 0|Ψ_0) = p_00 P(s_0 = 0|Ψ_0) = p_00 × π_0,    (24)
P(s_1 = 1, s_0 = 0|Ψ_0) = p_01 P(s_0 = 0|Ψ_0) = p_01 × π_0,    (25)
P(s_1 = 0, s_0 = 1|Ψ_0) = p_10 P(s_0 = 1|Ψ_0) = p_10 × π_1,    (26)
P(s_1 = 1, s_0 = 1|Ψ_0) = p_11 P(s_0 = 1|Ψ_0) = p_11 × π_1,    (27)

then, given

f(y_1|s_1 = 0, s_0 = 0, Ψ_0) = 1/(√(2π) σ_0) exp( −[(y_1 − μ_0) − φ_1(y_0 − μ_0)]² / (2σ²_0) ),    (28)
f(y_1|s_1 = 1, s_0 = 0, Ψ_0) = 1/(√(2π) σ_1) exp( −[(y_1 − μ_1) − φ_1(y_0 − μ_0)]² / (2σ²_1) ),    (29)
f(y_1|s_1 = 0, s_0 = 1, Ψ_0) = 1/(√(2π) σ_0) exp( −[(y_1 − μ_0) − φ_1(y_0 − μ_1)]² / (2σ²_0) ),    (30)
f(y_1|s_1 = 1, s_0 = 1, Ψ_0) = 1/(√(2π) σ_1) exp( −[(y_1 − μ_1) − φ_1(y_0 − μ_1)]² / (2σ²_1) ),    (31)

we have

f(y_1|Ψ_0) = Σ_{j=0}^{1} Σ_{i=0}^{1} f(y_1, s_1 = j, s_0 = i|Ψ_0)
           = Σ_{j=0}^{1} Σ_{i=0}^{1} f(y_1|s_1 = j, s_0 = i, Ψ_0) × P(s_1 = j, s_0 = i|Ψ_0)
           = (28) × (24) + (29) × (25) + (30) × (26) + (31) × (27).    (32)

Then, after observing y_1,

P(s_1 = 0, s_0 = 0|Ψ_0, y_1) = f(y_1|s_1 = 0, s_0 = 0, Ψ_0) × P(s_1 = 0, s_0 = 0|Ψ_0) / Σ_{j=0}^{1} Σ_{i=0}^{1} f(y_1|s_1 = j, s_0 = i, Ψ_0) × P(s_1 = j, s_0 = i|Ψ_0)
  = (28) × (24) / (32),    (33)

P(s_1 = 1, s_0 = 0|Ψ_0, y_1) = f(y_1|s_1 = 1, s_0 = 0, Ψ_0) × P(s_1 = 1, s_0 = 0|Ψ_0) / f(y_1|Ψ_0)
  = (29) × (25) / (32),    (34)

P(s_1 = 0, s_0 = 1|Ψ_0, y_1) = f(y_1|s_1 = 0, s_0 = 1, Ψ_0) × P(s_1 = 0, s_0 = 1|Ψ_0) / f(y_1|Ψ_0)
  = (30) × (26) / (32),    (35)

P(s_1 = 1, s_0 = 1|Ψ_0, y_1) = f(y_1|s_1 = 1, s_0 = 1, Ψ_0) × P(s_1 = 1, s_0 = 1|Ψ_0) / f(y_1|Ψ_0)
  = (31) × (27) / (32).    (36)


And then updating:

P(s_1 = 0|Ψ_1) = P(s_1 = 0, s_0 = 0|Ψ_1) + P(s_1 = 0, s_0 = 1|Ψ_1) = (33) + (35),    (37)
P(s_1 = 1|Ψ_1) = P(s_1 = 1, s_0 = 0|Ψ_1) + P(s_1 = 1, s_0 = 1|Ψ_1) = (34) + (36).    (38)

At time t = 2,

P(s_2 = 0, s_1 = 0|Ψ_1) = p_00 P(s_1 = 0|Ψ_1) = p_00 × (37),    (39)
P(s_2 = 1, s_1 = 0|Ψ_1) = p_01 P(s_1 = 0|Ψ_1) = p_01 × (37),    (40)
P(s_2 = 0, s_1 = 1|Ψ_1) = p_10 P(s_1 = 1|Ψ_1) = p_10 × (38),    (41)
P(s_2 = 1, s_1 = 1|Ψ_1) = p_11 P(s_1 = 1|Ψ_1) = p_11 × (38),    (42)

then, given

f(y_2|s_2 = 0, s_1 = 0, Ψ_1) = 1/(√(2π) σ_0) exp( −[(y_2 − μ_0) − φ_1(y_1 − μ_0)]² / (2σ²_0) ),    (43)
f(y_2|s_2 = 1, s_1 = 0, Ψ_1) = 1/(√(2π) σ_1) exp( −[(y_2 − μ_1) − φ_1(y_1 − μ_0)]² / (2σ²_1) ),    (44)
f(y_2|s_2 = 0, s_1 = 1, Ψ_1) = 1/(√(2π) σ_0) exp( −[(y_2 − μ_0) − φ_1(y_1 − μ_1)]² / (2σ²_0) ),    (45)
f(y_2|s_2 = 1, s_1 = 1, Ψ_1) = 1/(√(2π) σ_1) exp( −[(y_2 − μ_1) − φ_1(y_1 − μ_1)]² / (2σ²_1) ),    (46)

we have

f(y_2|Ψ_1) = Σ_{j=0}^{1} Σ_{i=0}^{1} f(y_2, s_2 = j, s_1 = i|Ψ_1)
           = Σ_{j=0}^{1} Σ_{i=0}^{1} f(y_2|s_2 = j, s_1 = i, Ψ_1) × P(s_2 = j, s_1 = i|Ψ_1)
           = (43) × (39) + (44) × (40) + (45) × (41) + (46) × (42).    (47)


Then, after observing y_2,

P(s_2 = 0, s_1 = 0|Ψ_1, y_2) = f(y_2|s_2 = 0, s_1 = 0, Ψ_1) × P(s_2 = 0, s_1 = 0|Ψ_1) / Σ_{j=0}^{1} Σ_{i=0}^{1} f(y_2|s_2 = j, s_1 = i, Ψ_1) × P(s_2 = j, s_1 = i|Ψ_1)
  = (43) × (39) / (47),    (48)

P(s_2 = 1, s_1 = 0|Ψ_1, y_2) = f(y_2|s_2 = 1, s_1 = 0, Ψ_1) × P(s_2 = 1, s_1 = 0|Ψ_1) / f(y_2|Ψ_1)
  = (44) × (40) / (47),    (49)

P(s_2 = 0, s_1 = 1|Ψ_1, y_2) = f(y_2|s_2 = 0, s_1 = 1, Ψ_1) × P(s_2 = 0, s_1 = 1|Ψ_1) / f(y_2|Ψ_1)
  = (45) × (41) / (47),    (50)

P(s_2 = 1, s_1 = 1|Ψ_1, y_2) = f(y_2|s_2 = 1, s_1 = 1, Ψ_1) × P(s_2 = 1, s_1 = 1|Ψ_1) / f(y_2|Ψ_1)
  = (46) × (42) / (47).    (51)

And then updating:

P(s_2 = 0|Ψ_2) = P(s_2 = 0, s_1 = 0|Ψ_2) + P(s_2 = 0, s_1 = 1|Ψ_2) = (48) + (50),    (52)
P(s_2 = 1|Ψ_2) = P(s_2 = 1, s_1 = 0|Ψ_2) + P(s_2 = 1, s_1 = 1|Ψ_2) = (49) + (51).    (53)

Iterating the above procedure to time T, the log-likelihood is obtained as

ln L = Σ_{t=1}^{T} ln [ Σ_{j=1}^{M} Σ_{i=1}^{M} f(y_t|s_t = j, s_{t−1} = i, Ψ_{t−1}) P(s_t = j, s_{t−1} = i|Ψ_{t−1}) ].
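The joint-state recursion for the AR(1) case can be sketched as follows. This is a minimal illustration with hypothetical function and argument names (μ, σ and φ stand for the regime means, standard deviations and the AR coefficient), not the original course code:

```python
import numpy as np

def ms_ar1_filter(y, y0, mu, sigma, phi, P, pi):
    """Joint-state filter for a two-state MS-AR(1) model.
    mu, sigma: length-2 arrays of regime means and std deviations;
    phi: AR(1) coefficient; P[i, j] = P(s_t=j | s_{t-1}=i);
    pi: length-2 initial probabilities for s_0; y0 is the presample value.
    Returns the filtered P(s_t=j | Psi_t) and the log-likelihood."""
    T = len(y)
    prob = np.asarray(pi, dtype=float)   # P(s_{t-1}=i | Psi_{t-1})
    ylag = y0
    filt = np.empty((T, 2))
    loglik = 0.0
    for t in range(T):
        # joint prediction, as in (24)-(27) and (39)-(42)
        joint = prob[:, None] * P        # joint[i, j] = P(s_t=j, s_{t-1}=i | Psi_{t-1})
        # state-pair residuals and densities, as in (28)-(31)
        resid = (y[t] - mu[None, :]) - phi * (ylag - mu[:, None])
        dens = np.exp(-resid**2 / (2 * sigma[None, :]**2)) \
               / (np.sqrt(2 * np.pi) * sigma[None, :])
        # marginal density, as in (32)/(47), and Bayes update, as in (33)-(36)
        post = dens * joint
        fy = post.sum()
        loglik += np.log(fy)
        post /= fy
        # collapse over s_{t-1}, as in (37)-(38)
        prob = post.sum(axis=0)
        filt[t] = prob
        ylag = y[t]
    return filt, loglik
```

When the two regimes coincide, the data carry no information about the state and the filtered probabilities simply converge to the stationary distribution of the chain, which gives a convenient check.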

2.2 Issues Related to Markov-Switching Models

2.2.1 Kim’s Smoothing Algorithm

Consider the joint probability that s_t = j and s_{t+1} = k based on full information:

P(s_t = j, s_{t+1} = k|Ψ_T) = P(s_{t+1} = k|Ψ_T) × P(s_t = j|s_{t+1} = k, Ψ_T)
  = P(s_{t+1} = k|Ψ_T) × P(s_t = j|s_{t+1} = k, Ψ_t)
  = P(s_{t+1} = k|Ψ_T) × P(s_t = j, s_{t+1} = k|Ψ_t) / P(s_{t+1} = k|Ψ_t)
  = P(s_{t+1} = k|Ψ_T) × P(s_t = j|Ψ_t) × P(s_{t+1} = k|s_t = j, Ψ_t) / P(s_{t+1} = k|Ψ_t),

where the second equality comes from the fact that

P(s_t = j|s_{t+1} = k, Ψ_T) = P(s_t = j|s_{t+1} = k, h_{t+1,T}, Ψ_t)
  = P(s_t = j, h_{t+1,T}|s_{t+1} = k, Ψ_t) / P(h_{t+1,T}|s_{t+1} = k, Ψ_t)
  = P(s_t = j|s_{t+1} = k, Ψ_t) × P(h_{t+1,T}|s_t = j, s_{t+1} = k, Ψ_t) / P(h_{t+1,T}|s_{t+1} = k, Ψ_t)
  = P(s_t = j|s_{t+1} = k, Ψ_t),

with h_{t+1,T} denoting the information arriving between t+1 and T; the last equality uses the fact that, conditional on s_{t+1}, this future information is independent of s_t. Besides,

P(s_t = j|Ψ_T) = Σ_{k=1}^{M} P(s_t = j, s_{t+1} = k|Ψ_T).

Given P(s_T|Ψ_T) from the last iteration of the basic filter in the previous section, say P(s_T = 0|Ψ_T) = π'_0 and P(s_T = 1|Ψ_T) = 1 − π'_0 for the two-state model, we have at time t = T − 1,

P(s_{T−1} = 0, s_T = 0|Ψ_T) = P(s_T = 0|Ψ_T) × P(s_{T−1} = 0|Ψ_{T−1}) × P(s_T = 0|s_{T−1} = 0) / P(s_T = 0|Ψ_{T−1}),    (54)
P(s_{T−1} = 0, s_T = 1|Ψ_T) = P(s_T = 1|Ψ_T) × P(s_{T−1} = 0|Ψ_{T−1}) × P(s_T = 1|s_{T−1} = 0) / P(s_T = 1|Ψ_{T−1}),    (55)
P(s_{T−1} = 1, s_T = 0|Ψ_T) = P(s_T = 0|Ψ_T) × P(s_{T−1} = 1|Ψ_{T−1}) × P(s_T = 0|s_{T−1} = 1) / P(s_T = 0|Ψ_{T−1}),    (56)
P(s_{T−1} = 1, s_T = 1|Ψ_T) = P(s_T = 1|Ψ_T) × P(s_{T−1} = 1|Ψ_{T−1}) × P(s_T = 1|s_{T−1} = 1) / P(s_T = 1|Ψ_{T−1}).    (57)

Combining (54) and (55), we have

P(s_{T−1} = 0|Ψ_T) = (54) + (55),    (58)

and combining (56) and (57), we have

P(s_{T−1} = 1|Ψ_T) = (56) + (57).    (59)


At time T − 2,

P(s_{T−2} = 0, s_{T−1} = 0|Ψ_T) = P(s_{T−1} = 0|Ψ_T) × P(s_{T−2} = 0|Ψ_{T−2}) × P(s_{T−1} = 0|s_{T−2} = 0) / P(s_{T−1} = 0|Ψ_{T−2}),    (60)
P(s_{T−2} = 0, s_{T−1} = 1|Ψ_T) = P(s_{T−1} = 1|Ψ_T) × P(s_{T−2} = 0|Ψ_{T−2}) × P(s_{T−1} = 1|s_{T−2} = 0) / P(s_{T−1} = 1|Ψ_{T−2}),    (61)
P(s_{T−2} = 1, s_{T−1} = 0|Ψ_T) = P(s_{T−1} = 0|Ψ_T) × P(s_{T−2} = 1|Ψ_{T−2}) × P(s_{T−1} = 0|s_{T−2} = 1) / P(s_{T−1} = 0|Ψ_{T−2}),    (62)
P(s_{T−2} = 1, s_{T−1} = 1|Ψ_T) = P(s_{T−1} = 1|Ψ_T) × P(s_{T−2} = 1|Ψ_{T−2}) × P(s_{T−1} = 1|s_{T−2} = 1) / P(s_{T−1} = 1|Ψ_{T−2}).    (63)

Combining (60) and (61), we have

P(s_{T−2} = 0|Ψ_T) = (60) + (61),    (64)

and combining (62) and (63), we have

P(s_{T−2} = 1|Ψ_T) = (62) + (63).    (65)

Iterating the above steps through t = T − 2, T − 3, ..., 1, the smoothed probabilities can be calculated given the estimated parameters.
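Kim's backward recursion can be sketched in a few lines. The following is a minimal illustration with a hypothetical array layout, assuming the filtered probabilities P(s_t = j|Ψ_t) and the one-step-ahead predictions P(s_t = j|Ψ_{t−1}) were stored during the forward filter:

```python
import numpy as np

def kim_smoother(filt, pred, P):
    """filt[t, j] = P(s_t=j | Psi_t); pred[t, j] = P(s_t=j | Psi_{t-1});
    P[j, k]      = P(s_{t+1}=k | s_t=j).
    Returns smooth[t, j] = P(s_t=j | Psi_T) via the backward recursion
    P(s_t=j|Psi_T) = P(s_t=j|Psi_t)
                     * sum_k P(s_{t+1}=k|Psi_T) P(s_{t+1}=k|s_t=j) / P(s_{t+1}=k|Psi_t)."""
    T, M = filt.shape
    smooth = np.empty_like(filt)
    smooth[-1] = filt[-1]                 # at t = T, filtered = smoothed
    for t in range(T - 2, -1, -1):
        for j in range(M):
            smooth[t, j] = filt[t, j] * np.sum(smooth[t + 1] * P[j] / pred[t + 1])
    return smooth
```

Provided pred[t+1] is consistent with filt[t] and P (i.e. pred[t+1] = filt[t] @ P), each row of the output sums to one.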

2.2.2 Derivation of Steady-State Probabilities Used to Start Filter

Let the transition probabilities of a first-order, M-state Markov-switching process s_t be collected in

P* = [ P_11  P_21  · · ·  P_M1
       P_12  P_22  · · ·  P_M2
        ...   ...          ...
       P_1M  P_2M  · · ·  P_MM ],    Σ_{j=1}^{M} P_ij = 1, ∀ i,

so that i_M' P* = i_M' with i_M = [1 1 · · · 1]'. Let π_t be the (M × 1) vector of steady-state probabilities,

π_t = [ P(s_t = 1), P(s_t = 2), ..., P(s_t = M) ]' = [ π_1t, π_2t, ..., π_Mt ]',    i_M' π_t = 1.


By the definition of steady-state probabilities, π_{t+1} = P* π_t and π_{t+1} = π_t; thus

π_t = P* π_t  ⇒  (I_M − P*) π_t = 0_{M×1}.

Combining this with i_M' π_t = 1,

[ I_M − P*
  i_M'     ] π_t = [ 0_{M×1}
                     1       ],   or   A π_t = [ 0_{M×1}
                                                 1       ].

Thus,

π_t = (A'A)^{−1} A' [ 0_{M×1}
                      1       ].

That is, π_t is the last column of the matrix (A'A)^{−1}A'.
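The computation of π_t as the last column of (A'A)^{−1}A' is easy to check numerically. A small sketch with an illustrative two-state P* (columns summing to one; the numbers are hypothetical):

```python
import numpy as np

# illustrative transition matrix: Pstar[j, i] = P(s_t = j | s_{t-1} = i)
Pstar = np.array([[0.9, 0.2],
                  [0.1, 0.8]])
M = Pstar.shape[0]
A = np.vstack([np.eye(M) - Pstar, np.ones((1, M))])   # A = [I_M - P*; i_M']
pi = (np.linalg.inv(A.T @ A) @ A.T)[:, -1]            # last column of (A'A)^{-1}A'
```

For this P*, solving π = P*π together with i'π = 1 by hand gives π = (2/3, 1/3)', which the least-squares formula reproduces.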

2.2.3 Expected Duration of a Regime in a Markov-Switching Model

Denote by D the duration of state j:

D = 1 if s_t = j and s_{t+1} ≠ j;  P(D = 1) = 1 − P_jj,
D = 2 if s_t = s_{t+1} = j and s_{t+2} ≠ j;  P(D = 2) = P_jj(1 − P_jj),
D = 3 if s_t = s_{t+1} = s_{t+2} = j and s_{t+3} ≠ j;  P(D = 3) = P²_jj(1 − P_jj),
D = 4 if s_t = s_{t+1} = s_{t+2} = s_{t+3} = j and s_{t+4} ≠ j;  P(D = 4) = P³_jj(1 − P_jj),

and so on. Then, the expected duration of regime j can be derived as

IE(D) = Σ_{d=1}^{∞} d P(D = d)
      = 1 × (1 − P_jj) + 2 × P_jj(1 − P_jj) + 3 × P²_jj(1 − P_jj) + · · ·
      = (1 − P_jj)(1 + 2P_jj + 3P²_jj + 4P³_jj + · · ·)
      = 1 / (1 − P_jj),

since the series in parentheses sums to 1/(1 − P_jj)².
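The geometric-series derivation above can be verified numerically. A quick sketch with an illustrative staying probability P_jj = 0.95 (the value is hypothetical):

```python
# truncated evaluation of IE(D) = sum_d d * Pjj**(d-1) * (1 - Pjj)
Pjj = 0.95
ED = sum(d * Pjj**(d - 1) * (1 - Pjj) for d in range(1, 100000))
# closed form: 1/(1 - Pjj) = 20 time periods
```

So a regime with staying probability 0.95 lasts 20 periods on average; the expected durations reported by the R output in Section 5 are computed the same way from the estimated transition probabilities.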

3 Alternative View on Estimation of Regime-Switching Models

Let {s_t}_{t=1}^{T} be the sample path of a first-order, two-state Markov process with time-invariant transition probability matrix

P* = [ p_00  p_10
       p_01  p_11 ].


It is obvious that the transition probabilities are time-invariant.

Let {y_t}_{t=1}^{T} be the sample path of a time series that depends on {s_t}_{t=1}^{T} as follows:

(y_t|s_t = i; α_i) ~ N(μ_i, σ²_i),

where α_i = (μ_i, σ²_i)', i = 0, 1. Thus, the density of y_t conditional upon s_t is

f(y_t|s_t = i; α_i) = 1/(√(2π) σ_i) exp( −(y_t − μ_i)² / (2σ²_i) ),  i = 0, 1.    (66)

It will be convenient to stack the two sets of parameters governing the densities into a (4 × 1) vector, α = (α_0', α_1')'.

As we shall see, a quantity of particular interest in the likelihood function is P(s_1), which denotes P(S_1 = s_1). There are two cases to consider, stationary and nonstationary. In the stationary case, P(s_1) is simply the long-run (steady-state) probability of S_1 = s_1 implied by θ. In the nonstationary case, the long-run probability does not exist, and so P(S_1 = s_1) must be treated as an additional parameter to be estimated. It turns out, as we show subsequently, that P(S_1 = 1) is all that is needed to construct the first likelihood term. We shall call this quantity ρ in both the stationary and nonstationary cases, remembering that in the stationary case ρ is not an additional parameter to be estimated, while in the nonstationary case it is.

Let θ = (α', ρ)' be the (5 × 1) vector of all model parameters. The complete-data likelihood is then

f(y^T, s^T|θ) = f(y_1, s_1|θ) Π_{t=2}^{T} f(y_t, s_t|y^{t−1}, s^{t−1}; θ)    (67)
             = f(y_1|s_1; α) P(s_1) Π_{t=2}^{T} f(y_t|s_t, y^{t−1}, s^{t−1}; θ) P(s_t|y^{t−1}, s^{t−1}; θ)
             = f(y_1|s_1; α) P(s_1) Π_{t=2}^{T} f(y_t|s_t; α) P(s_t|s_{t−1}; θ);

here f denotes any density, and a superscript on a variable denotes its history from t = 1 up to that date, e.g. y^T = (y_1, ..., y_T).

It will prove convenient to write the complete-data likelihood in terms of indicator functions:

f(y^T, s^T|θ) = [ I(s_1 = 1) f(y_1|s_1 = 1; α_1) ρ + I(s_1 = 0) f(y_1|s_1 = 0; α_0)(1 − ρ) ]
  × Π_{t=2}^{T} { I(s_t = 1, s_{t−1} = 1) f(y_t|s_t = 1; α_1) p_11
                + I(s_t = 0, s_{t−1} = 1) f(y_t|s_t = 0; α_0)(1 − p_11)
                + I(s_t = 1, s_{t−1} = 0) f(y_t|s_t = 1; α_1)(1 − p_00)
                + I(s_t = 0, s_{t−1} = 0) f(y_t|s_t = 0; α_0) p_00 }.

Conversion to log form yields

log f(y^T, s^T|θ) = I(s_1 = 1)[log f(y_1|s_1 = 1; α_1) + log ρ]
  + I(s_1 = 0)[log f(y_1|s_1 = 0; α_0) + log(1 − ρ)]
  + Σ_{t=2}^{T} { I(s_t = 1) log f(y_t|s_t = 1; α_1) + I(s_t = 0) log f(y_t|s_t = 0; α_0)
                + I(s_t = 1, s_{t−1} = 1) log(p_11) + I(s_t = 0, s_{t−1} = 1) log(1 − p_11)
                + I(s_t = 1, s_{t−1} = 0) log(1 − p_00) + I(s_t = 0, s_{t−1} = 0) log(p_00) }.

The complete-data log likelihood cannot be constructed in practice, because the complete data are not observed. Conceptually, the fact that the states are unobserved is inconsequential, because the incomplete-data log likelihood may be obtained by summing over all possible state sequences,

log f(y^T|θ) = log Σ_{s_1=0}^{1} Σ_{s_2=0}^{1} · · · Σ_{s_T=0}^{1} f(y^T, s^T|θ),    (68)

and then maximized with respect to θ. In practice, however, construction and numerical maximization of the incomplete-data log likelihood in this way is computationally intractable, as {s_t}_{t=1}^{T} may be realized in 2^T ways. Therefore, following Hamilton's (1990) suggestion for the case of constant transition probabilities, we propose an EM algorithm for maximization of the incomplete-data likelihood.

4 Model Estimation: the EM Algorithm

The steps of the EM algorithm are:

(1) Pick θ^(0).

(2) Compute, for all t:
    P(s_t = 1|y^T; θ^(0)),  P(s_t = 0|y^T; θ^(0)),
    P(s_t = 1, s_{t−1} = 1|y^T; θ^(0)),  P(s_t = 0, s_{t−1} = 1|y^T; θ^(0)),
    P(s_t = 1, s_{t−1} = 0|y^T; θ^(0)),  P(s_t = 0, s_{t−1} = 0|y^T; θ^(0)).

(3) Set θ^(1) = argmax_θ E[log f(y^T, s^T|θ)], where the expectation uses the probabilities from step (2).

(4) Iterate to convergence.

Step (1) simply assigns an initial guess to the parameter vector, θ^(0), in order to start the EM algorithm. Step (2) is the 'E' (expectation) part of the algorithm, which produces smoothed state probabilities conditional upon θ^(0), while step (3) is the 'M' (maximization) part, which produces an updated parameter estimate, θ^(1), conditional upon the smoothed state probabilities obtained in step (2). The convergence criterion adopted in (4) may be based upon various standard criteria, such as the change in the log likelihood from one iteration to the next, the value of the gradient vector, or ‖θ^(j) − θ^(j−1)‖ for various norms ‖·‖.

4.1 The expectation step

As in Hamilton (1990), this amounts to substitution of smoothed state probabilities for the indicator functions in the complete-data log likelihood:

E[log f(y^T, s^T|θ)] = P(s_1 = 1|y^T; θ^(j−1)) [log f(y_1|s_1 = 1; α_1) + log ρ]    (69)
  + P(s_1 = 0|y^T; θ^(j−1)) [log f(y_1|s_1 = 0; α_0) + log(1 − ρ)]
  + Σ_{t=2}^{T} { P(s_t = 1|y^T; θ^(j−1)) log f(y_t|s_t = 1; α_1)
                + P(s_t = 0|y^T; θ^(j−1)) log f(y_t|s_t = 0; α_0)
                + P(s_t = 1, s_{t−1} = 1|y^T; θ^(j−1)) log(p_11)
                + P(s_t = 0, s_{t−1} = 1|y^T; θ^(j−1)) log(1 − p_11)
                + P(s_t = 1, s_{t−1} = 0|y^T; θ^(j−1)) log(1 − p_00)
                + P(s_t = 0, s_{t−1} = 0|y^T; θ^(j−1)) log(p_00) },

where the smoothed state probabilities are obtained from the optimal smoother, conditional upon the current 'best guess' θ^(j−1).

Given θ^(j−1) and y^T, the algorithm for calculating the smoothed state probabilities for iteration j is as follows:

1. Calculate the sequence of conditional densities of y_t and the transition probabilities given by the transition matrix.


2. Calculate filtered joint state probabilities (a (T − 1) × 4 matrix) by iterating on steps 2a-2d below for t = 2, ..., T:

(a) Calculate the joint conditional distribution of (y_t, s_t, s_{t−1}) given y^{t−1} (four numbers). For t = 2, the joint conditional distribution is given by

f(y_2, s_2, s_1|y_1; θ^(j−1)) = f(y_2|s_2; α^(j−1)) P(s_2|s_1; θ^(j−1)) P(s_1).    (70)

For subsequent t, the joint conditional distribution is

f(y_t, s_t, s_{t−1}|y^{t−1}; θ^(j−1)) = Σ_{s_{t−2}=0}^{1} f(y_t|s_t; α^(j−1)) P(s_t|s_{t−1}; θ^(j−1)) P(s_{t−1}, s_{t−2}|y^{t−1}; θ^(j−1)),    (71)

where the conditional density f(y_t|s_t; α^(j−1)) and the transition probabilities P(s_t|s_{t−1}; θ^(j−1)) are given by step 1, and P(s_{t−1}, s_{t−2}|y^{t−1}; θ^(j−1)) is the filtered probability resulting from the execution of step 2 for the previous t value.

(b) Calculate the conditional likelihood of y_t (one number):

f(y_t|y^{t−1}; θ^(j−1)) = Σ_{s_t=0}^{1} Σ_{s_{t−1}=0}^{1} f(y_t, s_t, s_{t−1}|y^{t−1}; θ^(j−1)).    (72)

(c) Calculate the time-t filtered state probabilities (four numbers):

P(s_t, s_{t−1}|y^t; θ^(j−1)) = f(y_t, s_t, s_{t−1}|y^{t−1}; θ^(j−1)) / f(y_t|y^{t−1}; θ^(j−1)),    (73)

where the numerator is the joint conditional distribution of (y_t, s_t, s_{t−1}) from step 2a and the denominator is the conditional likelihood of y_t from step 2b above.

(d) These four filtered probabilities are used as input for step 2a to calculate the filtered probabilities for the next time period, and steps 2a-2d are repeated (T − 2) times.

3. Calculate the smoothed joint state probabilities as follows (a (T − 1) × 6 matrix):

(a) For t = 2 and a given valuation of (s_t, s_{t−1}), sequentially calculate the joint probability of (s_τ, s_{τ−1}, s_t, s_{t−1}) given y^τ, for τ = t + 2, t + 3, ..., T:

P(s_τ, s_{τ−1}, s_t, s_{t−1}|y^τ; θ^(j−1))
  = Σ_{s_{τ−2}=0}^{1} f(y_τ|s_τ; α^(j−1)) P(s_τ|s_{τ−1}; θ^(j−1)) P(s_{τ−1}, s_{τ−2}, s_t, s_{t−1}|y^{τ−1}; θ^(j−1)) / f(y_τ|y^{τ−1}; θ^(j−1)),    (74)


where the first two terms in the numerator are given by step 1, the third by the previous step 3a computation, and the denominator by step 2b. When τ = t + 2, the third term in the numerator is initialized with the following expression:

P(s_{t+1}, s_t, s_{t−1}|y^{t+1}; θ^(j−1)) = f(y_{t+1}|s_{t+1}; α^(j−1)) P(s_{t+1}|s_t; θ^(j−1)) P(s_t, s_{t−1}|y^t; θ^(j−1)) / f(y_{t+1}|y^t; θ^(j−1)).    (75)

For each τ value we produce a (4 × 1) vector of probabilities corresponding to the four possible valuations of (s_τ, s_{τ−1}); the vector obtained at τ = T is used in step 3b below.

(b) Upon reaching τ = T, the smoothed joint state probability for time t and the chosen valuation of (s_t, s_{t−1}) is calculated as

P(s_t, s_{t−1}|y^T; θ^(j−1)) = Σ_{s_T=0}^{1} Σ_{s_{T−1}=0}^{1} P(s_T, s_{T−1}, s_t, s_{t−1}|y^T; θ^(j−1)).    (76)

(c) Steps 3a and 3b are repeated for all possible time-t valuations of (s_t, s_{t−1}), until a smoothed probability has been calculated for each of the four possible valuations. At this point we have a (1 × 4) vector of smoothed joint state probabilities for (s_t, s_{t−1}).

(d) Steps 3a-3c are repeated for t = 3, 4, ..., T, yielding a total of (T − 1) × 4 smoothed joint state probabilities.

4. Smoothed marginal state probabilities are found by summing over the smoothed joint state probabilities. For example,

P(s_t = 1|y^T; θ^(j−1)) = P(s_t = 1, s_{t−1} = 1|y^T; θ^(j−1)) + P(s_t = 1, s_{t−1} = 0|y^T; θ^(j−1)).

These (T − 1) × 6 smoothed state probabilities are used as input for the maximization step, which we now describe.

4.2 The maximization step

Given the smoothed state probabilities, the expected complete-data log likelihood, given by (69), is maximized directly with respect to the model parameters.

∂E[log f(y^T, s^T|θ)] / ∂μ_i^(j) = Σ_{t=1}^{T} P(s_t = i|y^T; θ^(j−1)) (y_t − μ_i^(j)) = 0,    (77)

μ_i^(j) = Σ_{t=1}^{T} y_t P(s_t = i|y^T; θ^(j−1)) / Σ_{t=1}^{T} P(s_t = i|y^T; θ^(j−1)),  i = 0, 1.

∂E[log f(y^T, s^T|θ)] / ∂σ_i^(j) = Σ_{t=1}^{T} P(s_t = i|y^T; θ^(j−1)) [ −(σ²_i)^(j) + (y_t − μ_i^(j))² ] = 0,    (78)

(σ²_i)^(j) = Σ_{t=1}^{T} (y_t − μ_i^(j))² P(s_t = i|y^T; θ^(j−1)) / Σ_{t=1}^{T} P(s_t = i|y^T; θ^(j−1)),  i = 0, 1.

ρ^(j) = P(s_1 = 1|y^T; θ^(j−1)).    (79)

∂E[log f(y^T, s^T|θ)] / ∂p_11^(j) = Σ_{t=2}^{T} [ P(s_t = 1, s_{t−1} = 1|y^T; θ^(j−1)) × (1/p_11^(j)) + P(s_t = 0, s_{t−1} = 1|y^T; θ^(j−1)) × (−1/(1 − p_11^(j))) ] = 0,    (80)

so that

(1 − p_11^(j)) / p_11^(j) = Σ_{t=2}^{T} P(s_t = 0, s_{t−1} = 1|y^T; θ^(j−1)) / Σ_{t=2}^{T} P(s_t = 1, s_{t−1} = 1|y^T; θ^(j−1)),

p_11^(j) = Σ_{t=2}^{T} P(s_t = 1, s_{t−1} = 1|y^T; θ^(j−1)) / Σ_{t=2}^{T} { P(s_t = 1, s_{t−1} = 1|y^T; θ^(j−1)) + P(s_t = 0, s_{t−1} = 1|y^T; θ^(j−1)) }
        = Σ_{t=2}^{T} P(s_t = 1, s_{t−1} = 1|y^T; θ^(j−1)) / Σ_{t=2}^{T} P(s_{t−1} = 1|y^T; θ^(j−1)).

Similarly,

∂E[log f(y^T, s^T|θ)] / ∂p_00^(j) = Σ_{t=2}^{T} [ P(s_t = 1, s_{t−1} = 0|y^T; θ^(j−1)) × (−1/(1 − p_00^(j))) + P(s_t = 0, s_{t−1} = 0|y^T; θ^(j−1)) × (1/p_00^(j)) ] = 0,

p_00^(j) = Σ_{t=2}^{T} P(s_t = 0, s_{t−1} = 0|y^T; θ^(j−1)) / Σ_{t=2}^{T} P(s_{t−1} = 0|y^T; θ^(j−1)).
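Given the smoothed probabilities, the closed-form updates (77)-(80) are simple weighted averages. The following is a minimal sketch with hypothetical array layouts: smooth[t, i] stores P(s_t = i|y^T; θ^(j−1)), and joint[t, i, j] stores P(s_{t+1} = j, s_t = i|y^T; θ^(j−1)):

```python
import numpy as np

def m_step(y, smooth, joint):
    """smooth: (T,2) with smooth[t,i] = P(s_t=i | y^T);
    joint: (T-1,2,2) with joint[t,i,j] = P(s_{t+1}=j, s_t=i | y^T).
    Returns the updated mu, sigma^2, rho, p00, p11 from (77)-(80)."""
    w = smooth                                                     # weights
    mu = (w * y[:, None]).sum(axis=0) / w.sum(axis=0)              # (77)
    sig2 = (w * (y[:, None] - mu)**2).sum(axis=0) / w.sum(axis=0)  # (78)
    rho = smooth[0, 1]                                             # (79)
    p00 = joint[:, 0, 0].sum() / smooth[:-1, 0].sum()              # (80)-type update
    p11 = joint[:, 1, 1].sum() / smooth[:-1, 1].sum()              # (80)
    return mu, sig2, rho, p00, p11
```

In the degenerate case where the smoothed probabilities are 0/1 indicators of a known state path, the updates reduce to regime-wise sample moments and transition frequencies, which provides a handy sanity check.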


5 R code for Markov-switching Models

5.1 fMarkovSwitching

> data(indep)

> data(dep)

>

> S<-c(1,0,0) # where to switch (in this case only the first indep)

> k<-2 # number of states

> distIn<-"t" #distribution assumption

>

> myModel<-MS_Regress_Fit(dep,indep,S,k) # fitting the model

> print(myModel) # printing output

***** Numerical Optimization for MS Model Converged *****

Final log Likelihood: 2486.471

Number of parameters: 8

Distribution Assumption -> Normal

***** Final Parameters *****

---> Non Switching Parameters <---

Non Switching Parameter at Indep Column 2

Value: 0.4797

Std error: 0.0269 (0.00)

Non Switching Parameter at Indep Column 3

Value: 0.1564

Std error: 0.0333 (0.00)

---> Switching Parameters <---

State 1

Model Standard Deviation: 0.0288

Std Error: 0.0014 (0.00)

State 2


Model Standard Deviation: 0.0163

Std Error: 0.0005 (0.00)

Switching Parameters for Indep Column 1

State 1

Value: 0.0006

Std error: 0.0017 (0.74)

State 2

Value: 0.0003

Std error: 0.0007 (0.65)

---> Transition Probabilities Matrix <---

0.98 0.01

0.02 0.99

---> Expected Duration of Regimes <---

Expected duration of Regime #1: 50.56 time periods

Expected duration of Regime #2: 94.08 time periods


6 Likelihood Ratio Test for Markov-switching Models

Given the log likelihood function as in (81),

ln L = Σ_{t=1}^{T} ln f(y_t|Ψ_{t−1})
     = Σ_{t=1}^{T} ln [ Σ_{j=1}^{M} Σ_{i=1}^{M} f(y_t|s_t = j, s_{t−1} = i, Ψ_{t−1}) P(s_t = j, s_{t−1} = i|Ψ_{t−1}) ],    (81)

rewrite the log likelihood as L_T(θ|S_T), where θ is the unknown parameter vector under investigation and S_T is the sample path of the state variable. That is,

L_T(θ|S_T) = Σ_{t=1}^{T} l_t(θ|S_t).


The null and alternative hypotheses are

H_0: θ = θ_0  vs.  H_a: θ ≠ θ_0.

Define the likelihood ratio (LR) function

LR_T(θ) = L_T(θ|S_T) − L_T(θ_0|S_T) = Σ_{t=1}^{T} l_t(θ|S_t) − Σ_{t=1}^{T} l_t(θ_0|S_t).

Since the likelihood ratio surface is simply a level shift of the likelihood surface, the maximum likelihood estimator is given by the parameter values which maximize the likelihood ratio surface. Therefore, the likelihood ratio test statistic for H_0 against H_a is given by the supremum of the likelihood ratio surface:

LR_T = sup_θ LR_T(θ).

The LR surface can be decomposed into its mean and the deviation from the mean:

LR_T(θ) = R_T(θ) + Q_T(θ),    (82)

where R_T(θ) = E[LR_T(θ)] is the mean and

Q_T(θ) = Σ_{t=1}^{T} q_t(θ),  with  q_t(θ) = [l_t(θ) − l_t(θ_0)] − E[l_t(θ) − l_t(θ_0)],

is the deviation from the mean. It is clear that the function R_T(θ) is maximized precisely at the true parameter vector (which is θ_0 under the null), so R_T(θ) is nonpositive, and strictly negative for θ ≠ θ_0. When properly standardized,

(1/√T) Q_T(θ) = (1/√T) Σ_{t=1}^{T} q_t(θ) ⇒ Q(θ),    (83)

where Q(θ) is a mean-zero Gaussian process with covariance function

K(θ_1, θ_2) = E[q_t(θ_1) q_t(θ_2)'].    (84)

The function K(·,·) describes the covariances between Q(θ) at different values of θ.

The decomposition (82) admits the asymptotic approximation

(1/√T) LR_T(θ) = (1/√T) R_T(θ) + (1/√T) Q_T(θ) = (1/√T) R_T(θ) + Q(θ) + o_p(1).


Since R_T(θ) is nonpositive, it is obvious that

(1/√T) LR_T(θ) ≤ (1/√T) Q_T(θ) ⇒ Q(θ).    (85)

Therefore, since LR_T = sup_θ LR_T(θ),

P{ (1/√T) LR_T ≥ x } ≤ P{ sup_θ (1/√T) Q_T(θ) ≥ x } → P{ sup_θ Q(θ) ≥ x }.    (86)

Taking V(θ) = K(θ, θ) and its sample analogue,

V_T(θ) = Σ_{t=1}^{T} [ l_t(θ) − l_t(θ_0) − (1/T) LR_T(θ) ]²
       = Σ_{t=1}^{T} [ l_t(θ) − l_t(θ_0) ]² − (1/T) LR_T(θ)²,

the standardized statistic satisfies

Q*_T(θ) = Q_T(θ) / V_T(θ)^{1/2} ⇒ Q(θ) / V(θ)^{1/2} = Q*(θ) ~ N(0, 1).

Define the standardized likelihood ratio statistic

LR*_T = sup_θ LR*_T(θ) = sup_θ LR_T(θ) / V_T(θ)^{1/2}.

It is clear that the statistic has the bound

P(LR*_T ≥ c) ≤ P( sup_θ Q*_T(θ) ≥ c ) → P( sup_θ Q*(θ) ≥ c ) = F*(c).

For cases allowing for nuisance parameters, the regression model is supposed to have the log-likelihood

L_T(θ, γ, λ) = Σ_{t=1}^{T} l_t(θ, γ, λ),

with nuisance parameter vectors γ ∈ Γ, λ ∈ Λ. The hypotheses are

H_0: θ = 0  vs.  H_1: θ ≠ 0.

Assume that λ is fully identified, but γ is not identified under H_0. To apply the likelihood ratio test introduced previously, the parameter vector λ has to be eliminated. Define the sequence of parameter estimates

λ̂_T(θ, γ) = argmax_λ L_T(θ, γ, λ),    (87)

which are the maximum likelihood estimates of λ for fixed values of θ and γ. The concentrated likelihood function is then

L̂_T(θ, γ) = L_T(θ, γ, λ̂_T(θ, γ)).


The likelihood ratio process, its large-sample counterpart, its expectation, and the centred versions are

L̂R_T(θ, γ) = L̂_T(θ, γ) − L̂_T(0, γ),
LR_T(θ, γ) = L_T(θ, γ) − L_T(0, γ),
R_T(θ, γ) = E[LR_T(θ, γ)],
Q̂_T(θ, γ) = L̂R_T(θ, γ) − R_T(θ, γ),
Q_T(θ, γ) = LR_T(θ, γ) − R_T(θ, γ).

Also assume that an empirical process central limit theorem holds:

(1/√T) Q_T(θ, γ) ⇒ Q(θ, γ),    (88)

where Q(θ, γ) is a Gaussian process with covariance function

K((θ_1, γ_1), (θ_2, γ_2)) = lim_{T→∞} (1/T) E[Q_T(θ_1, γ_1) Q_T(θ_2, γ_2)].

Set V(θ, γ) = K((θ, γ), (θ, γ)) to be the associated variance function.

As R_T(θ, γ) ≤ 0 under the null hypothesis,

(1/√T) L̂R_T(θ, γ) ≤ (1/√T) Q̂_T(θ, γ) = (1/√T) Q_T(θ, γ) + o_p(1) ⇒ Q(θ, γ).

Construct the sample variance

V_T(θ, γ) = Σ_{t=1}^{T} q_t((θ, γ), λ̂_T(θ, γ))²,

where

q_t((θ, γ), λ̂_T(θ, γ)) = l_t((θ, γ), λ̂_T(θ, γ)) − l_t((0, γ), λ̂_T(0, γ)) − (1/T) L̂R_T(θ, γ).

The standardized LR function is defined as

L̂R*_T(θ, γ) = L̂R_T(θ, γ) / V_T(θ, γ)^{1/2},

yielding the standardized LR statistic

L̂R*_T = sup_{(θ,γ)} L̂R*_T(θ, γ).

Define the standardized stochastic processes

Q̂*_T(θ, γ) = Q̂_T(θ, γ) / V_T(θ, γ)^{1/2},   Q*_T(θ, γ) = Q_T(θ, γ) / V_T(θ, γ)^{1/2},


and assume that Q*_T(θ, γ) satisfies an empirical process law:

Q*_T(θ, γ) ⇒ Q*(θ, γ),

where Q*(θ, γ) = Q(θ, γ)/V(θ, γ)^{1/2} is a Gaussian process with covariance function

K*((θ_1, γ_1), (θ_2, γ_2)) = K((θ_1, γ_1), (θ_2, γ_2)) / [ V(θ_1, γ_1)^{1/2} V(θ_2, γ_2)^{1/2} ].

Theorem 1 in Hansen (1992) states that, given that T^{−1}V_T(θ, γ) converges in probability uniformly to V(θ, γ),

P{ L̂R*_T ≥ c } ≤ P{ sup_{(θ,γ)} Q̂*_T(θ, γ) ≥ c } → P{ sup Q* ≥ c }.

This theorem provides a bound for the standardized LR statistic in terms of the distribution of the random variable sup Q*.
