
Lecture 12 – Heteroscedasticity

• Use the GLS estimator with an estimate of Ω:

1. Ω is parameterized by a few estimable parameters, Ω = Ω(θ). Example: Harvey's heteroscedastic model.

2. Iterative estimation procedure:

(a) Use OLS residuals to estimate the variance function.

(b) Use the estimated Ω in GLS – Feasible GLS, or FGLS.

• True GLS estimator

bGLS = (X'Ω⁻¹X)⁻¹ X'Ω⁻¹y (converges in probability to β.)

• We seek a vector which converges to the same limit as this does. Call it bFGLS, based on (X'Ω̂⁻¹X)⁻¹ X'Ω̂⁻¹y.

Two-Step Estimation of the GR Model: Review

• Feasible GLS is based on finding an estimator which has the same properties as the true GLS.

Example: Var[εi] = σ² Exp(δzi). True GLS: Regress yi/[σ Exp((1/2)δzi)] on xi/[σ Exp((1/2)δzi)]. FGLS: With a consistent estimator of [σ, δ], say [s, c], we do the same computation with our estimates.

Note: If plim [s, c] = [σ, δ], then FGLS is as good as true GLS.

• Remark: To achieve full efficiency, we do not need an efficient estimate of the parameters in Ω, only a consistent one.
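A minimal R sketch of this two-step FGLS for the example above (simulated data; the names T, x, z, y are illustrative, not from the lecture code):

# Two-step FGLS sketch for Var[e_i] = sigma^2 * exp(delta * z_i)
T <- 500
x <- rnorm(T); z <- runif(T)
y <- 1 + 2*x + rnorm(T, sd = sqrt(exp(1.5*z)))   # true delta = 1.5
e <- resid(lm(y ~ x))                            # Step (a): OLS residuals
d <- coef(lm(log(e^2) ~ z))[2]                   # consistent (not efficient) estimate of delta
fgls <- lm(y ~ x, weights = 1/exp(d*z))          # Step (b): weights proportional to 1/variance
summary(fgls)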

Two-Step Estimation of the GR Model: FGLS

[Figure: conditional densities f(y|x) at x1, x2, x3, with spread increasing around the regression line E(y|x) = b0 + b1x]

Heteroscedasticity

• Assumption (A3) is violated in a particular way: ε has unequal variances, but εi and εj are still uncorrelated with each other. Some observations (lower variance) are more informative than others (higher variance).

• Now, we have the CLM regression with hetero- (different) scedastic (variance) disturbances:
(A1) DGP: y = Xβ + ε is correctly specified.
(A2) E[ε|X] = 0
(A3') Var[εi] = σ² ωi, ωi > 0. (CLM ⇒ ωi = 1, for all i.)
(A4) X has full column rank – rank(X) = k, where T ≥ k.

• Popular normalization: (1/T) Σi ωi = 1. (A scaling, absorbed into σ².)

• A characterization of the heteroscedasticity: well-defined estimators and methods for testing hypotheses will be obtainable if the heteroscedasticity is "well behaved," in the sense that

ωi / Σi ωi → 0 as T → ∞, i.e., no single observation becomes dominant.

(1/T) Σi ωi → some stable constant. (Not a plim!)


GR Model and Testing

• Implications for conventional OLS and hypothesis testing:

1. b is still unbiased.

2. Consistent? We need the more general proof. Not difficult.

3. If plim b = β, then plim s² = σ² (with the normalization).

4. Under usual assumptions, we have asymptotic normality.

• Two main problems with OLS estimation under heteroscedasticity:

(1) The usual standard errors are not correct. (They are biased!)

(2) OLS is not BLUE.

• Since the standard errors are biased, we cannot use the usual t-, F-, or LM statistics for drawing inferences. This is a serious issue.

Heteroscedasticity: Inference Based on OLS

• Q: But, what happens if we still use s²(X'X)⁻¹?

A: It depends on X'ΩX – X'X. If they are nearly the same, the OLS covariance matrix will give OK inferences.

But, when will X'ΩX – X'X be nearly the same? The answer is based on a property of weighted averages. Suppose ωi is randomly drawn from a distribution with E[ωi] = 1. Then,

(1/T) Σi ωi xi² →p E[x²] -- just like (1/T) Σi xi².

• Remark: For heteroscedasticity to be a significant issue for estimation and inference by OLS, the weights ωi must be correlated with xi and/or xi². The higher the correlation, the more important heteroscedasticity becomes (b is more inefficient).


• There are several theoretical reasons why the ωi may be related to xi and/or xi²:

1. Following error-learning models, as people learn, their errors of behavior become smaller over time. Then, σi² is expected to decrease.

2. As data collecting techniques improve, σi² is likely to decrease. Companies with sophisticated data processing techniques are likely to commit fewer errors in forecasting customers' orders.

3. As incomes grow, people have more discretionary income and, thus, more choice about how to spend their income. Hence, σi² is likely to increase with income.

4. Similarly, companies with larger profits are expected to show greater variability in their dividend/buyback policies than companies with lower profits.

Finding Heteroscedasticity

• Heteroscedasticity can also be the result of model misspecification.

• It can arise as a result of the presence of outliers (either very small or very large). The inclusion/exclusion of an outlier, especially if T is small, can affect the results of regressions.

• Violations of (A1) (the model is correctly specified) can produce heteroscedasticity, due to variables omitted from the model.

• Skewness in the distribution of one or more regressors included in the model can induce heteroscedasticity. Examples are economic variables such as income, wealth, and education.

• David Hendry notes that heteroscedasticity can also arise because of

– (1) incorrect data transformation (e.g., ratio or first difference transformations).

– (2) incorrect functional form (e.g., linear vs log–linear models).

Finding Heteroscedasticity


• Heteroscedasticity is usually modeled using one of the following specifications:

- H1: σt² is a function of past εt² and past σt² (GARCH model).
- H2: σt² increases monotonically with one (or several) exogenous variable(s) (x1, ..., xT).
- H3: σt² increases monotonically with E(yt).
- H4: σt² is the same within p subsets of the data but differs across the subsets (grouped heteroscedasticity). This specification allows for structural breaks.

• These are the usual alternative hypotheses in heteroscedasticity tests.

Finding Heteroscedasticity


• Visual test: a plot of residuals against the dependent variable (or another variable) will often produce a fan shape.

[Figure: residual plot with a fan shape – the spread of the residuals grows with the fitted values]


Testing for Heteroscedasticity

• Usual strategy when heteroscedasticity is suspected: use OLS along with the White estimator. This will give us consistent inferences.

• Q: Why do we want to test for heteroscedasticity?

A: OLS is no longer efficient. There is an estimator with lower asymptotic variance (the GLS/FGLS estimator).

• We want to test: H0: E(ε²|x1, x2, ..., xk) = E(ε²) = σ²

• The key is whether E[εi²] = σi² is related to xi and/or xi². Suppose we suspect a particular independent variable, say X1, is driving σi².

• Then, a simple test: check the RSS for large values of X1 against the RSS for small values of X1. This is the Goldfeld-Quandt test.

• The Goldfeld-Quandt test

- Step 1. Arrange the data from small to large values of the independent variable suspected of causing heteroscedasticity, Xj.

- Step 2. Run two separate regressions, one for small values of Xj and one for large values of Xj, omitting d middle observations (≈ 20%). Get the RSS for each regression: RSS1 for small values of Xj and RSS2 for large Xj’s.

- Step 3. Calculate the F ratio

GQ = RSS2/RSS1 ~ Fdf,df, with df = [(T – d) – 2(k+1)]/2 (when (A5) normality holds).

If (A5) does not hold, GQ is asymptotically χ².
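A small R sketch of these three steps (simulated data; the suspected driver is xj and d is set to roughly 20% of T):

# Goldfeld-Quandt sketch: sort by X_j, omit the middle ~20%, compare the two RSS
T <- 200; k <- 1
xj <- sort(runif(T))                          # Step 1: data ordered by X_j
y  <- 1 + 2*xj + rnorm(T, sd = xj)            # variance grows with X_j
d  <- round(0.2*T); n1 <- (T - d)/2
fit_lo <- lm(y[1:n1] ~ xj[1:n1])              # Step 2: small-X_j regression
fit_hi <- lm(y[(T-n1+1):T] ~ xj[(T-n1+1):T])  #         large-X_j regression
RSS1 <- sum(resid(fit_lo)^2); RSS2 <- sum(resid(fit_hi)^2)
df <- n1 - (k+1)
GQ <- RSS2/RSS1                               # Step 3: F ratio
1 - pf(GQ, df, df)                            # p-value under H0 (normality)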


• The Goldfeld-Quandt test

Note: When we suspect more than one variable of driving the σi²'s, this test is not very useful.

• But the GQ test is popular for testing for structural breaks (two regimes) in variance. For these tests, we rewrite Step 3 to allow for different sizes of the sub-samples 1 and 2.

- Step 3. Calculate the F-test ratio

GQ = [RSS2/ (T2 – k)]/[RSS1/ (T1 – k)]

Testing for Heteroscedasticity: LR Test

• The Likelihood Ratio Test

Let's define the likelihood function, assuming normality, for a general case where we have g different variances. We have two models:

(R) Restricted under H0: σi² = σ² for all i. From this model, we calculate ln LR.

(U) Unrestricted. From this model, we calculate ln LU.

ln L = –(T/2) ln(2π) – ½ Σi [Ti ln σi² + (1/σi²)(yi – Xiβ)'(yi – Xiβ)]

Under H0 (σi² = σ²), the maximized log likelihood is

ln LR = –(T/2)[ln(2π) + 1] – (T/2) ln σ̂²,  σ̂² = e'e/T

Under the unrestricted model, with group-specific estimates σ̂i² = (yi – Xib)'(yi – Xib)/Ti:

ln LU = –(T/2)[ln(2π) + 1] – ½ Σi Ti ln σ̂i²


• Now, we can compute the Likelihood Ratio (LR) statistic:

LR = 2(ln LU – ln LR) = T ln σ̂² – Σi Ti ln σ̂i²

Under the usual regularity conditions, LR is approximately distributed as χ²(g–1).

• Using specific functions for σi², this test has been used by Rutemiller and Bowers (1968) and in Harvey's (1976) groupwise heteroscedasticity paper.

• Score LM tests

• We want to develop tests of H0: E(ε²|x1, x2, ..., xk) = σ² against an H1 with a general functional form.

• Recall the central issue is whether E[εi²] = σi² is related to xi and/or xi². Then, a simple strategy is to use the OLS residuals to estimate the disturbances and look for relationships between ei² and xi and/or xi².

• Suppose that the relationship between ε² and X is linear:

ε² = Xα + v

Then, we test: H0: α = 0 against H1: α ≠ 0.

• We can base the test on how the squared OLS residuals e² correlate with X.


• Popular heteroscedasticity LM tests:

- Breusch and Pagan (1979)’s LM test (BP).

- White (1980)’s general test.

• Both tests are based on OLS residuals. That is, calculated under H0: No heteroscedasticity.

• The BP test is an LM test, based on the score of the log likelihood function, calculated under normality. It is a general test designed to detect any linear form of heteroscedasticity.

• The White test is an asymptotic Wald-type test; normality is not needed. It allows for nonlinearities by using squares and cross products of all the x's in the auxiliary regression.

Testing for Heteroscedasticity: BP Test

• Let's start with a general form of heteroscedasticity:

σi² = h(α0 + zi,1 α1 + .... + zi,m αm) = h(zi'α)

• We want to test: H0: E(εi²|z1, z2, ..., zk) = h(α0) = σ²,
or H0: α1 = α2 = .... = αm = 0 (m restrictions).

• Assume normality. That is, the log likelihood function is:

log L = constant – ½ Σi log σi² – ½ Σi εi²/σi²

Then, construct an LM test:

LM = S(θR)' I(θR)⁻¹ S(θR),  θ = (β, α)

S(θ) = ∂log L/∂θ = [Σi σi⁻² xi εi ; –½ Σi (∂h/∂α) zi σi⁻² + ½ Σi σi⁻⁴ εi² (∂h/∂α) zi]

I(θ) = E[–∂²log L/∂θ∂θ']

• Given block diagonality, we can rewrite the LM test under H0 as:

LM = S(α0, 0)' [I22 – I21 I11⁻¹ I12]⁻¹ S(α0, 0)


• With block diagonality, under H0:

LM = S(α0, 0)' [I22 – I21 I11⁻¹ I12]⁻¹ S(α0, 0)

S(α0, 0) = –½ Σi (∂h/∂α|α0,0) zi σ̂R⁻² + ½ Σi σ̂R⁻⁴ ei² (∂h/∂α|α0,0) zi
= ½ σ̂R⁻² (∂h/∂α|α0,0) Σi zi (ei²/σ̂R² – 1)
= ½ σ̂R⁻² (∂h/∂α|α0,0) Σi zi ωi,  where ωi = ei²/σ̂R² – 1 = gi – 1

I22(α0, 0) = ½ [σ̂R⁻² (∂h/∂α|α0,0)]² Σi zi zi'
I21(α0, 0) = 0
σ̂R² = (1/T) Σi ei²  (the MLE of σ² under H0)

Then,

LM = ½ (Σi zi ωi)' [Σi zi zi']⁻¹ (Σi zi ωi) = ½ ω'Z (Z'Z)⁻¹ Z'ω ~ χ²m

Note: Recall R² = [y'X(X'X)⁻¹X'y – T ȳ²]/[y'y – T ȳ²] = ESS/TSS.
Also note that under H0: E[ωi] = 0 and, under normality, Var[ωi] = 2.

• LM = ½ ω'Z (Z'Z)⁻¹ Z'ω = ½ ESS

ESS = explained sum of squares in the regression of ωi (= ei²/σ̂R² – 1) against zi.

• Under the usual regularity conditions, and under H0,

√T (α̂ML – α) →d N(0, 2σ⁴ (Z'Z/T)⁻¹)

Then,

LM-BP = (2σ̂R⁴)⁻¹ ESSe →d χ²m

ESSe = ESS in the regression of ei² (= gi σ̂R²) against zi.

Since σ̂R⁴ →p σ⁴, LM-BP →d χ²m.

Note: Recall R² = [y'X(X'X)⁻¹X'y – T ȳ²]/[y'y – T ȳ²]. Under H0, E[ωi] = 0 and Var[ωi] = 2 (normality); the LM test can then be shown to be asymptotically equivalent to T R² from the regression of ωi on zi (think of ω̄ = 0 and Σi ωi²/T → 2 above).


• The LM test is asymptotically equivalent to a T R² test, where R² is calculated from a regression of ei²/σ̂R² on the variables Z.

• Usual calculation of the Breusch-Pagan test:

- Step 1. (Auxiliary Regression). Run the regression of ei² on all the explanatory variables, zi. In our example,

ei² = α0 + zi,1 α1 + .... + zi,m αm + vi

- Step 2. Keep the R² from this regression, say R²e2. Calculate either

(a) F = (R²e2/m)/[(1 – R²e2)/(T – (m+1))], which follows an F(m, T–(m+1)), or
(b) LM = T R²e2, which follows a χ²m.
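A compact R sketch of Steps 1-2 (simulated data; z is the suspected driver of the variance):

# Breusch-Pagan sketch: regress squared OLS residuals on z, use LM = T R^2
T <- 300
z <- runif(T); x <- rnorm(T)
y <- 1 + 2*x + rnorm(T, sd = 1 + 2*z)        # variance rises with z
e2 <- resid(lm(y ~ x))^2                     # Step 1: squared OLS residuals
r2 <- summary(lm(e2 ~ z))$r.squared          # Step 2: keep R^2
lm_t <- T*r2                                 # LM = T R^2
1 - pchisq(lm_t, 1)                          # p-value, m = 1 here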


• Variations:

(1) Glejser (1969) test. Use absolute values |ei| instead of ei² to estimate the varying second moment. Following our previous example,

|ei| = α0 + zi,1 α1 + .... + zi,m αm + vi

(2) Harvey-Godfrey (1978) test. Use ln(ei²). Then, the implied model for σi² is an exponential model:

ln(ei²) = α0 + zi,1 α1 + .... + zi,m αm + vi

Note: The implied model is σi² = exp{α0 + zi,1 α1 + .... + zi,m αm + vi}.


• Variations:

(3) Koenker's (1981) studentized LM test. A problem with the LM statistic is that it crucially depends on the assumption that ε is normal. Koenker (1981) proposed studentizing LM-BP:

LM-S = (2σ̂R⁴) LM-BP / [Σi (ei² – σ̂R²)²/T] →d χ²m

Testing for Heteroscedasticity: White Test

• Based on the difference between the OLS and the true OLS variances:

X'ΩX – σ²X'X = Σi (E[εi²] – σ²) xi xi'

• Empirical counterpart: (1/T) Σi (ei² – s²) xi xi'

• We can express each of the m distinct elements of this matrix as:

(1/T) Σi (ei² – s²) ψli,  ψi: Kolmogorov-Gabor polynomial
ψi = (ψ1i, ψ2i, ..., ψmi)',  ψli = xqi xpi, p ≥ q, p,q = 1,2,...,k,  l = 1,2,...,m,  m = k(k+1)/2

• White heteroscedasticity test:

W = [(1/T) Σi (ei² – s²) ψi]' DT⁻¹ [(1/T) Σi (ei² – s²) ψi] →d χ²m

where DT = Var[(1/T) Σi (ei² – s²) ψi]

Note: W is asymptotically equivalent to a T R² test, where R² is calculated from a regression of ei²/σ̂² on the ψi's.


• Usual calculation of the White test

- Step 1. (Auxiliary Regression). Regress e² on all the explanatory variables (Xj), their squares (Xj²), and all their cross products. For example, when the model contains k = 2 explanatory variables, the test is based on:

ei² = β0 + β1 x1,i + β2 x2,i + β3 x1,i² + β4 x2,i² + β5 x1,i x2,i + vi

Let m be the number of regressors in the auxiliary regression. Keep its R², say R²e2.

- Step 2. Compute the statistic LM = T R²e2, which follows a χ²m.

Testing for Heteroscedasticity: Examples

• White test for the 3-factor Fama-French model for IBM returns (T = 320): IBMRet – rf = β0 + β1 (MktRet – rf) + β2 SMB + β3 HML + ε

> b <- solve(t(x)%*% x)%*% t(x)%*%y #OLS regression

> e <- y - x%*%b

> e2 <- e^2

> xx2 <- cbind(x1^2,x2^2,x3^2,x1*x2,x1*x3,x2*x3)

> fit2 <- lm(e2~xx2)

> r2_e2 <- summary(fit2)$r.squared

> r2_e2

[1] 0.0166492

> lm_t <- T*r2_e2

> lm_t

[1] 5.327744

> ncol(xx2)

[1] 6

LM-White Test: 5.33 ⟹ cannot reject H0 at the 5% level (χ²[6] 5% critical value ≈ 11.07).


• Suppose we suspect that squared HML (x3), a measure of Book-to-Market variance, drives the heteroscedasticity. We do an LM-BP test:

> xx1 <- x3^2

> fit2 <- lm(e2~xx1)

> r2_e2 <- summary(fit2)$r.squared

> r2_e2

[1] 0.009285253

> lm_t <- T*r2_e2

> lm_t

[1] 2.971281

> 1-pchisq(lm_t,1)

[1] 0.08475472

LM-BP Test: 2.97 ⟹ cannot reject H0 at the 5% level (χ²[1] 5% critical value ≈ 3.84), with a p-value = .085.

Testing for Heteroscedasticity: Remarks

• Drawbacks of the Breusch-Pagan test:

- It has been shown to be sensitive to violations of the normality assumption.

- Three other popular LM tests, the Glejser test, the Harvey-Godfrey test, and the Park test, are also sensitive to such violations.

• Drawbacks of the White test

- If a model has several regressors, the test can consume a lot of df’s.

- In cases where the White test statistic is statistically significant, the cause may not be heteroscedasticity but model specification errors.

- It is general. It does not give us a clue about how to model heteroscedasticity to do FGLS. The BP test points us in a direction.


• Drawbacks of the White test (continuation)

- In simulations, it does not perform well relative to others, especially, for time-varying heteroscedasticity, typical of financial time series.

- The White test does not depend on normality, but Koenker's test is also not very sensitive to normality. In simulations, Koenker's test seems to have more power. See Lyon and Tsai (1996) for a Monte Carlo study of the heteroscedasticity tests presented here.


• General problems with heteroscedasticity tests:

- The tests rely on the first four assumptions of the CLM being true.

- In particular, (A2) violations. That is, if the zero conditional mean assumption fails, a test for heteroscedasticity may reject the null hypothesis even if Var(y|X) is constant.

- This is true if our functional form is specified incorrectly (omitted variables or specifying a log instead of a level). Recall David Hendry’s comment.

• Knowing the true source (functional form) of heteroscedasticity may be difficult. A practical solution is to avoid modeling heteroscedasticity altogether and use OLS along with the White heteroscedasticity-robust standard errors.
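A short R sketch of that practical solution, computing the White sandwich covariance directly (simulated data, not the lecture's dataset):

# OLS with White heteroscedasticity-robust standard errors
T <- 200
x <- cbind(1, rnorm(T))                      # regressors, incl. constant
y <- x %*% c(1, 2) + rnorm(T, sd = abs(x[,2]))
b <- solve(t(x) %*% x) %*% t(x) %*% y        # OLS
e <- as.numeric(y - x %*% b)
XX_inv <- solve(t(x) %*% x)
meat <- t(x) %*% (x * e^2)                   # sum_i e_i^2 x_i x_i'
V_white <- XX_inv %*% meat %*% XX_inv        # sandwich: (X'X)^-1 meat (X'X)^-1
sqrt(diag(V_white))                          # White standard errors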

Testing for Heteroscedasticity: Remarks


Estimation: WLS form of GLS

• While it is always possible to estimate robust standard errors for OLS estimates, if we know the specific form of the heteroscedasticity, we can obtain more efficient estimates than OLS: GLS.

• GLS basic idea: efficient estimation through transforming the model into one that has homoskedastic errors; with variance-based weights, this is called WLS.

• Suppose the heteroskedasticity can be modeled as:

Var(ε|x) = σ² h(x)

• The key is to figure out what h(x) looks like. Suppose that we know hi. For example, hi(x) = xi². (Make sure hi is always positive.)

• Then, use 1/√hi = 1/√(xi²) to transform the model:

Var(εi/√hi | x) = σ²

• Thus, if we divide the whole equation by √hi, we get a (transformed) model where the error is homoskedastic.

• Assuming weights are known, we have a two-step GLS estimation:

- Step 1: Use OLS, then the residuals to estimate the weights.

- Step 2: Weighted least squares using the estimated weights.

• Greene has a proof, based on our asymptotic theory, of the asymptotic equivalence of the second step to true GLS.


• More typical is the situation where we do not know the form of the heteroskedasticity. In this case, we need to estimate h(xi).

• Typically, we start by assuming a fairly flexible model, such as

Var(ε|x) = σ² h(x) = σ² exp(Xδ)  -- make sure Var(εi|x) > 0.

Since we don't know δ, it must be estimated. Our assumption implies that

ε² = σ² exp(Xδ) v,  with E(v|X) = 1.

Then, if E(v) = 1,

ln(ε²) = α + Xδ + u  (*)

where E(u) = 0 and u is independent of X. Since e is an estimate of ε, we can estimate (*) by OLS.

Estimation: FGLS

• Now, an estimate of h is obtained as ĥ = exp(ĝ), where ĝ is the fitted value from (*); the inverse of ĥ is our weight. Now, we can do GLS as usual.

• Summary of FGLS

(1) Run the original OLS model, save the residuals, e. Get ln(e2).

(2) Regress ln(e2) on all of the independent variables. Get fitted values, ĝ.

(3) Do WLS using 1/sqrt[exp(ĝ)] as the weight.

(4) Iterate to gain efficiency.

• Remark: We are using WLS just for efficiency – OLS is still unbiased and consistent. Sandwich estimator gives us consistent inferences.
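A sketch of this recipe in R, including step (4) (simulated data; h(x) = exp(xδ) as in the text):

# Iterated FGLS: repeat steps (1)-(3) until the coefficients settle
T <- 400
x <- rnorm(T)
y <- 1 + 2*x + rnorm(T, sd = exp(0.5*x))
fit <- lm(y ~ x)                             # (1) OLS
for (it in 1:5) {                            # (4) iterate
  g   <- fitted(lm(log(resid(fit)^2) ~ x))   # (2) regress ln(e^2) on x, keep fitted g-hat
  fit <- lm(y ~ x, weights = 1/exp(g))       # (3) WLS; weight 1/exp(g) = dividing data by sqrt(exp(g))
}
coef(fit)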

Estimation: MLE

• ML estimates all the parameters simultaneously. To construct the likelihood, we assume a distribution for ε. Under normality (A5):

log L = –(T/2) log(2π) – ½ Σi log σi² – ½ Σi (yi – xi'β)²/σi²

• Suppose σi² = exp(α0 + zi,1 α1 + .... + zi,m αm) = exp(zi'α).

• Then, the first derivatives of the log likelihood w.r.t. θ = (β, α) are:

∂log L/∂β = Σi xi εi/σi²
∂log L/∂α = ½ Σi zi (εi²/σi² – 1)

• Setting the f.o.c. ∂log L/∂θ = 0 gives a non-linear system of equations.


• We take second derivatives to calculate the information matrix:

–E[∂²log L/∂β∂β'] = Σi xi xi'/σi² = X'Σ⁻¹X
–E[∂²log L/∂α∂α'] = ½ Σi zi zi' = ½ Z'Z
E[∂²log L/∂β∂α'] = 0 (block diagonality)

• Then,

I(θ) = E[–∂²log L/∂θ∂θ'] = [ X'Σ⁻¹X , 0 ; 0 , ½ Z'Z ]

• We can estimate the model using Newton's method:

θj+1 = θj – Hj⁻¹ gj,  gj = ∂log L/∂θ evaluated at θj


• We estimate the model using Newton's method:

θj+1 = θj – Hj⁻¹ gj,  gj = ∂log Lj/∂θ

Since H is block diagonal (and –H ≈ I(θ)), the updates are:

βj+1 = βj + (X'Σj⁻¹X)⁻¹ X'Σj⁻¹ εj
αj+1 = αj + (½ Z'Z)⁻¹ [½ Σi zi (εi²/σi² – 1)] = αj + (Z'Z)⁻¹ Z'v,

where vi = εi²/σi² – 1, evaluated at iteration j.

Convergence is achieved when gj = ∂log Lj/∂θ is close to zero.

• We have an iterative algorithm ⇒ iterative FGLS = MLE!

Heteroscedasticity: Log Transformations

• A log transformation of the data can eliminate (or reduce) a certain type of heteroscedasticity.

- Assume E[Zt] = μt and Var[Zt] = δ μt² (variance proportional to the squared mean).

• We log-transform the data: log(Zt). Then, we use the delta method to approximate the variance of the transformed variable. Recall the delta method:

Var[f(X)] ≈ [f'(μ)]² Var[X]

• Then, the variance of log(Zt) is roughly constant:

Var[log(Zt)] ≈ (1/μt²) Var[Zt] = δ


• Until the early 1980s, econometrics had focused almost solely on modeling the means of series, i.e., their actual values:

yt = Et[yt|x] + εt,  εt ~ D(0, σ²)

Suppose we have an AR(1) process:

yt = α + β yt-1 + εt.

Then, the conditional mean, conditioning on information at time t–1, is:

Et-1[yt] = α + β yt-1

Note: The unconditional mean and variance are:

E[yt] = α/(1 – β) = constant
Var[yt] = σ²/(1 – β²) = constant

The conditional mean is time varying; the unconditional mean is not!

Key distinction: Conditional vs. unconditional moments.

ARCH Models

• Similar idea for the variance.

Unconditional variance: Var[yt] = E[(yt – E[yt])²] = σ²/(1 – β²)

Conditional variance: Vart-1[yt] = Et-1[(yt – Et-1[yt])²] = Et-1[εt²]

Remark: Again, conditional moments are time varying; unconditional moments are not!


• The unconditional variance measures the overall uncertainty. In the AR(1) example, time t–1 information plays no role: Var[yt] = σ²/(1 – β²).

• The conditional variance, Vart-1[yt], is a better measure of uncertainty at time t–1. It is a function of information at time t–1.

[Figure: time path of yt; the conditional variance at t–1 is narrower than the unconditional variance]

- Thick tails - Mandelbrot (1963): leptokurtic (thicker than Normal)

- Volatility clustering - Mandelbrot (1963): “large changes tend to be followed by large changes of either sign.”

- Leverage Effects – Black (1976), Christie (1982): Tendency for changes in stock prices to be negatively correlated with changes in volatility.

- Non-trading Effects, Weekend Effects – Fama (1965), French and Roll (1986): When a market is closed, information accumulates at a different rate than when it is open; for example, the weekend effect, where stock price volatility on Monday is not three times the volatility on Friday.

ARCH Models: Stylized Facts of Asset Returns


- Expected events – Cornell (1978), Patell and Wolfson (1979), etc.: Volatility is high at regular times, such as news announcements or other expected events, or even at certain times of day; for example, returns are less volatile in the early afternoon.

- Volatility and serial correlation – LeBaron (1992): Inverse relationship between the two.

- Co-movements in volatility – Ramchand and Susmel (1998): Volatility is positively correlated across markets/assets.

• We need a model that accommodates all these facts.

ARCH Models: Stylized Facts of Asset Returns

Figure: Descriptive Statistics and Distribution for Monthly S&P500 Returns

Statistic           Value      (p-value)
Mean (%)            0.626332   (0.0004)
Standard Dev (%)    4.37721
Skewness           -0.43764
Excess Kurtosis     2.29395
Jarque-Bera         145.72     (<0.0001)
AR(1)               0.0258     (0.5249)

• Easy to check leptokurtosis (Stylized Fact #1).

ARCH Models: Stylized Facts of Asset Returns


• Easy to check Volatility Clustering (Stylized Fact #2)

ARCH Models: Stylized Facts of Asset Returns

[Figure: Monthly S&P500 Returns (1964:1 - 2014:9)]

ARCH Models: Engle (1982)

• We start with assumptions (A1) to (A5), but with a specific (A3'):

yt = xt'γ + εt,  εt|Ψt-1 ~ N(0, σt²)
(A3') Var[εt|Ψt-1] = σt² = ω + α1 εt-1² + α2 εt-2² + .... + αq εt-q² = ω + α(L) εt²

Define νt = εt² – σt². Then,

εt² = ω + α(L) εt² + νt

• This is an AR(q) model for squared innovations. That is, we have an ARCH model: Auto-Regressive Conditional Heteroskedasticity

This model estimates the unobservable (latent) variance.

Note: Since we are dealing with a variance, we usually impose ω > 0 and αi > 0 for all i.

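A tiny R simulation of an ARCH(1) process, illustrating the clustering and fat tails just described (parameter values are illustrative only):

# Simulate ARCH(1): sigma_t^2 = omega + alpha1*eps_{t-1}^2
set.seed(42)
T <- 10000; omega <- 0.2; alpha1 <- 0.5      # implied sigma^2 = 0.2/(1-0.5) = 0.4
eps <- numeric(T); sig2 <- numeric(T)
sig2[1] <- omega/(1 - alpha1)                # start at the unconditional variance
eps[1]  <- sqrt(sig2[1])*rnorm(1)
for (t in 2:T) {
  sig2[t] <- omega + alpha1*eps[t-1]^2       # conditional variance
  eps[t]  <- sqrt(sig2[t])*rnorm(1)          # eps_t = sigma_t * z_t, z_t ~ N(0,1)
}
var(eps)                                     # close to 0.4
mean(eps^4)/mean(eps^2)^2                    # kurtosis near 3(1-a^2)/(1-3a^2) = 9 > 3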


ARCH Models: Engle (1982)

yt = xt'γ + εt,  εt|Ψt-1 ~ N(0, σt²),  σt² = ω + α1 εt-1²

• The unconditional variance is determined by:

σ² = E[εt²] = ω + Σi αi E[εt-i²] = ω + σ² Σi αi

That is,

σ² = ω/(1 – Σi αi)

To obtain a positive σ², we impose another restriction: (1 – Σi αi) > 0.

• Example: ARCH(1): σ² = ω/(1 – α1)

• We need to impose the restrictions: α1 > 0 and 1 – α1 > 0.

• Even though the errors may be serially uncorrelated, they are not independent: there will be volatility clustering and fat tails. Let's define the standardized errors:

zt = εt/σt

• They have conditional mean zero and a time-invariant conditional variance equal to 1. That is, zt ~ D(0,1). If zt is assumed to follow an N(0,1), with a finite fourth moment, then (using Jensen's inequality):

E[εt⁴] = E[zt⁴ σt⁴] = E[zt⁴] E[σt⁴] = 3 E[σt⁴] ≥ 3 (E[σt²])² = 3 (E[εt²])²
⇒ E[εt⁴]/(E[εt²])² ≥ 3.

• For an ARCH(1), the fourth moment is:

E[εt⁴]/(E[εt²])² = 3 (1 – α1²)/(1 – 3α1²) > 3,  if 3α1² < 1.


• A more convenient, but less intuitive, presentation of the ARCH(1) model:

εt = υt √(ω + α1 εt-1²)

where υt is i.i.d. with mean 0 and Var[υt] = 1. Since υt is i.i.d., the conditional variance is:

Et-1[εt²] = Et-1[υt² (ω + α1 εt-1²)] = (ω + α1 εt-1²) Et-1[υt²] = ω + α1 εt-1² = σt²

• It turns out that σt² is a very persistent process. Such a process can be captured with an ARCH(q), where q is large. This is not efficient.

• In practice, q is often large. A more parsimonious representation is the Generalized ARCH model, or GARCH(q,p):

σt² = ω + Σi=1..q αi εt-i² + Σj=1..p βj σt-j² = ω + α(L) εt² + β(L) σt²

GARCH: Bollerslev (1986)

• Define νt = εt² – σt². Substituting into the GARCH(q,p):

[1 – α(L) – β(L)] εt² = ω + [1 – β(L)] νt

which is an ARMA(max(p,q), p) model for the squared innovations.

• Popular GARCH model: the GARCH(1,1):

σt² = ω + α1 εt-1² + β1 σt-1²

with unconditional variance: Var[εt] = σ² = ω/(1 – α1 – β1)

⇒ Restrictions: ω > 0, α1 > 0, β1 > 0; (1 – α1 – β1) > 0.


• Technical details: this is covariance stationary if all the roots of α(L) + β(L) = 1 lie outside the unit circle. For the GARCH(1,1), this amounts to α1 + β1 < 1.

• Bollerslev (1986) showed that if 3α1² + 2α1β1 + β1² < 1, the second and fourth (unconditional) moments of εt exist:

E[εt²] = ω/(1 – α1 – β1)
E[εt⁴] = 3ω² (1 + α1 + β1) / [(1 – α1 – β1)(1 – β1² – 2α1β1 – 3α1²)],
if (1 – β1² – 2α1β1 – 3α1²) > 0.

GARCH: Forecasting and Persistence

• Consider the forecast in a GARCH(1,1) model:

σt+1² = ω + α1 εt² + β1 σt² = ω + (α1 + β1) σt² + α1 σt² (zt² – 1)

Taking expectations at time t:

Et[σt+1²] = ω + (α1 + β1) σt²

Then, by repeated substitution:

Et[σt+j²] = ω Σi=0..j-1 (α1 + β1)^i + (α1 + β1)^j σt²

As j → ∞, the forecast reverts to the unconditional variance: ω/(1 – α1 – β1).

• When α1 + β1 = 1, today's volatility affects future forecasts forever:

Et[σt+j²] = σt² + jω
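A short R sketch of this forecast recursion (omega, alpha1, beta1 and today's variance are illustrative values, not estimates):

# j-step-ahead GARCH(1,1) variance forecasts E_t[sigma^2_{t+j}]
omega <- 0.1; alpha1 <- 0.10; beta1 <- 0.85
sig2_t <- 4.0                                # today's conditional variance
ab <- alpha1 + beta1
j  <- 1:60
fcst <- omega*(1 - ab^j)/(1 - ab) + ab^j*sig2_t  # sum_{i=0}^{j-1} ab^i = (1-ab^j)/(1-ab)
omega/(1 - ab)                               # unconditional variance = 2
tail(fcst, 1)                                # forecasts decay from 4 toward 2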


• In the GARCH-X model, exogenous variables are added to the conditional variance equation. Consider the GARCH(1,1)-X model:

σt² = ω + α1 εt-1² + β1 σt-1² + δ f(Xt-1, λ)

where f(Xt-1, λ) is strictly positive for all t. Usually, Xt is an observed economic variable or indicator, say a liquidity index, and f(.) is a non-linear transformation, which is non-negative.

Examples: Glosten et al. (1993) and Engle and Patton (2001) use 3-mo T-bill rates for modeling stock return volatility. Hagiwara and Herce (1999) use interest rate differentials between countries to model FX return volatility. The US Congressional Budget Office uses inflation in an ARCH(1) model for interest rate spreads.

GARCH-X

• Recall the technical detail: the standard GARCH model,

σt² = ω + α(L) εt² + β(L) σt²,

is covariance stationary if α(1) + β(1) < 1.

• But strict stationarity does not require such a stringent restriction (that is, that the unconditional variance does not depend on t).

• In the GARCH(1,1) model, if α1 + β1 = 1, we have the Integrated GARCH (IGARCH) model.

• In the IGARCH model, the autoregressive polynomial in the ARMA representation has a unit root: a shock to the conditional variance is “persistent.”

IGARCH


• That is, today's variance remains important for future forecasts of all horizons:

Et[σt+j²] = σt² + jω

• Nelson (1990) establishes that, since this satisfies the requirement for strict stationarity, it is a well-defined model.

• In practice, it is often found that α1 + β1 is close to 1.

• It is often argued that IGARCH is a product of omitted variables, for example, structural breaks. See Lamoreux and Lastrapes (1989), Hamilton and Susmel (1994), and Mikosch and Starica (2004).

• Shepard and Sheppard (2010) argue for a GARCH-X explanation.

GARCH: Variations

• EGARCH model – Nelson (Econometrica, 1991). It models an exponential function for the time-varying variance; variance forecasts are generated from the recursion:

log(σt²) = ω + Σi=1..q [αi (|zt-i| – E|zt-i|) + γi zt-i] + Σj=1..p βj log(σt-j²)

• GJR-GARCH model – Glosten, Jagannathan and Runkle (JF, 1993):

σt² = ω + Σi=1..q (αi + γi* It-i) εt-i² + Σj=1..p βj σt-j²

where It-i = 1 if εt-i < 0; 0 otherwise.

• Remark: Both models capture sign (asymmetric) effects in volatility: negative news (εt-i < 0) increases the conditional volatility (the leverage effect).


• Non-linear ARCH (NARCH) model – Higgins and Bera (1992) and Hentschel (1995).

These models apply a Box-Cox-type transformation to the conditional variance:

σt^γ = ω + Σi=1..q αi |εt-i|^γ + Σj=1..p βj σt-j^γ

Note: The power transformation lets volatility respond non-linearly to the size of the shocks; versions that shift the absolute-value term also capture leverage-type (asymmetric) effects.

Special case: γ = 2 (standard GARCH model).

• Threshold ARCH (TARCH) – Rabemananjara and Zakoian (JAE, 1993).

Large events have an effect, but there is no effect from small events. Indicator functions switch the ARCH coefficients according to the sign of the lagged shocks:

σt² = ω + Σi=1..q [αi⁺ I(εt-i > 0) + αi⁻ I(εt-i ≤ 0)] εt-i² + Σj=1..p βj σt-j²

There are two variances:

σt² = ω + Σi αi⁺ εt-i² + Σj βj σt-j²,  if εt-i > 0
σt² = ω + Σi αi⁻ εt-i² + Σj βj σt-j²,  if εt-i ≤ 0

Many other versions are possible by adding minor asymmetries or non-linearities in a variety of ways.


• Switching ARCH (SWARCH) – Hamilton and Susmel (JE, 1994).

Intuition: σt² depends on the state of the economy, the regime. It is based on Hamilton's (1989) time-series models with changes of regime. The key is to select a parsimonious representation; the ARCH terms are rescaled by a regime-dependent factor γ(st):

σt² = γ(st) [ω + Σi=1..q αi εt-i²/γ(st-i)]

For a SWARCH(1) with 2 states (1 and 2), we have 4 possible σt², one for each regime pair (st, st-1) ∈ {(1,1), (1,2), (2,1), (2,2)}:

σt² = γ(st) [ω + α1 εt-1²/γ(st-1)]

ARCH Estimation: MLE

• All of these models can be estimated by maximum likelihood. First, we need to construct the sample likelihood.

• Since we are dealing with dependent observations, we use the conditioning trick to get the joint distribution:

f(y1, y2, ..., yT; θ) = f(y1|x1; θ) f(y2|y1, x1, x2; θ) .... f(yT|yT-1, ..., y1, x1, ..., xT; θ)

Taking logs:

log L = Σt=1..T log f(yt | yt-1, ..., y1, X; θ)

We maximize this function w.r.t. the k mean parameters (γ) and the m variance parameters (ω, α, β).


• Note that δL/δγ = 0 (the k f.o.c.'s) will give us GLS.

• Denote δL/δθ = S(yt, θ) = 0 (S(.) is the score vector).

- We have a ((k+m) x (k+m)) system. But it is a non-linear system. We will need to use numerical optimization. Gauss-Newton or BHHH (which approximates H by the sum of outer products of the scores S(yt, θ)) can be easily implemented.

- Given the AR structure, we will need to make assumptions about σ0 (and ε0, ε-1, ..., ε-p if we assume an AR(p) process for the mean).

- Alternatively, we can take σ0 (and ε0, ε-1, ..., ε-p) as parameters to be estimated (this can be computationally more intensive, and estimation can lose power).

• If the conditional density is well specified and θ0 belongs to Ω, then

√T (θ̂MLE – θ0) →d N(0, A0⁻¹),  A0 = lim (1/T) Σt E[–∂S(yt, θ0)/∂θ']

• Common practice in empirical studies: assume the necessary regularity conditions are satisfied.

• Under the correct specification assumption, A0 = B0, where

B0 = lim (1/T) Σt E[S(yt, θ0) S(yt, θ0)']

• The estimator B0 has a computational advantage over A0: only first derivatives are needed. But A0 = B0 only if the distribution is correctly specified. This is very difficult to know in practice.

• We estimate A0 and B0 by replacing θ0 with its estimated MLE value.


ARCH Estimation: MLE – Example

Example: ARCH(1) model.

• Assuming normality, we maximize with respect to θ = (ω, α1, γ) the function:

log L = Σt log f(yt|Ψt-1; θ) = –(T/2) log(2π) – ½ Σt log(σt²) – ½ Σt εt²/σt²

with εt = yt – xt'γ and σt² = ω + α1 εt-1².

• Taking derivatives with respect to θ = (ω, α1, γ), where γ are the k mean parameters:

∂L/∂ω = ½ Σt (1/σt²)(εt²/σt² – 1)
∂L/∂α1 = ½ Σt (εt-1²/σt²)(εt²/σt² – 1)
∂L/∂γ = Σt xt εt/σt²

• Then, we set the f.o.c. δL/δθ = 0. For the mean parameters:

Σt xt εt(γMLE)/σt²(ωMLE, α1,MLE, γMLE) = 0

• We have a (k+2) system. It is a non-linear system. We will use numerical optimization; usually, Gauss-Newton or BHHH.

• Again, note that δL/δγ = 0 (the k f.o.c.'s) will give us GLS.


ARCH Estimation: MLE – Example (in R)

• Log likelihood of the AR(1)-GARCH(1,1) model:

log_lik_garch11 <- function(theta, data) {
  mu <- theta[1]; rho1 <- theta[2]                 # mean parameters
  alpha0 <- abs(theta[3]); alpha1 <- abs(theta[4]); beta1 <- abs(theta[5])
  chk0 <- (1 - alpha1 - beta1)
  r <- ts(data)
  n <- length(r)
  u <- ts(vector(length = n))
  for (t in 2:n) u[t] <- r[t] - mu - rho1*r[t-1]   # this setup allows for ARMA in mean
  h <- ts(vector(length = n))
  h[1] <- alpha0/chk0                              # initial value for h[t]: unconditional variance
  if (chk0 == 0) h[1] <- .000001                   # check to avoid dividing by 0
  for (t in 2:n) {
    h[t] <- abs(alpha0 + alpha1*(u[t-1]^2) + beta1*h[t-1])
    if (h[t] == 0) h[t] <- .00001                  # check to avoid log(0)
  }
  return(-sum(-0.5*log(2*pi) - 0.5*log(abs(h[2:n])) - 0.5*(u[2:n]^2)/abs(h[2:n])))
}

• To maximize the likelihood we use optim (nlm can also be used):

dat_xy <- read.csv("http://www.bauer.uh.edu/rsusmel/phd/chfusd_17.csv", head=TRUE, sep=",")
summary(dat_xy)
names(dat_xy)

z <- dat_xy$CHFUSD                     # CHF/USD 1971-2017 monthly data

theta0 <- c(0.05, 0.1, 0.1, 0.25, 0.7) # initial values
ml_2 <- optim(theta0, log_lik_garch11, data=z, method="BFGS", hessian=TRUE)

ml_2$par                               # estimated parameters

I_Var_m2 <- ml_2$hessian
eigen(I_Var_m2)                        # check if Hessian is pd
sqrt(diag(solve(I_Var_m2)))            # parameter SEs

chf_usd <- ts(z, frequency=12, start=c(1971,1))
plot.ts(chf_usd)                       # time series plot of the data


> ml_2$par
[1] -0.17496654  0.25333506  0.82101387  0.06514787  0.82392626

> I_Var_m2 <- ml_2$hessian
> eigen(I_Var_m2)   # check if Hessian is pd
eigen() decomposition
$values
[1] 18103.991574  592.516649  504.252999  80.455651  4.248681

$vectors
              [,1]        [,2]        [,3]         [,4]         [,5]
[1,]  0.0003057871  0.03980084 -0.02946857  0.998757150  0.005617731
[2,] -0.0007327615 -0.69827499  0.71413675  0.048926415 -0.005138999
[3,] -0.0974743965  0.09280099  0.08323696  0.004341412 -0.987390236
[4,] -0.6800704864 -0.52322506 -0.51289888  0.006067750 -0.025250547
[5,] -0.7266376298  0.47796595  0.46813102 -0.005890306  0.156092801

> sqrt(diag(solve(I_Var_m2)))   # parameter SEs
[1] 0.11140088 0.04324649 0.47905926 0.03405586 0.08114472

AR(1)-GARCH(1,1): Example – USD/CHF

• We estimate an AR(1)-GARCH(1,1) for changes in St = USD/CHF:

st = [log(St) – log(St-1)] x 100 = a0 + a1 st-1 + εt,  εt|Ψt-1 ~ N(0, σt²)
σt² = α0 + α1 εt-1² + β1 σt-1²

• T: 562 (January 1971 - September 2017, monthly).

The estimated model for st is given by:

st = -0.175 + 0.253 st-1
     (0.111)  (0.043)

σt² = 0.821 + 0.065 εt-1² + 0.824 σt-1²
      (0.479) (0.034)      (0.081)

Note: α1 + β1 = .889 < 1. (Persistent.)


ARCH Estimation: MLE – Regularity Conditions

Note: The appeal of MLE is the optimal properties of the resulting estimators under ideal conditions.

• Crowder (1976) gives one set of sufficient regularity conditions for the MLE in models with dependent observations to be consistent and asymptotically normally distributed.

• Verifying these regularity conditions is very difficult for general ARCH models; proofs exist for special cases like the GARCH(1,1).

For example, for the GARCH(1,1) model: if E[ln(α1 zt² + β1)] < 0, the model is strictly stationary and ergodic. See Nelson (1990) and Lumsdaine (1996).


• Block-diagonality. In many applications of ARCH, the parameters can be partitioned into mean parameters, θ1, and variance parameters, θ2.

Then, δμt(θ)/δθ2 = 0 and, although δσt(θ)/δθ1 ≠ 0, the information matrix is block-diagonal (under general symmetric distributions for zt and for particular ARCH specifications).

Not a bad result:
- Regression can be consistently done with OLS.
- Asymptotically efficient estimates for the ARCH parameters can be obtained on the basis of the OLS residuals.

ARCH Estimation: MLE – Regularity Conditions

• But, block diagonality cannot buy everything:
- Conventional OLS standard errors could be terrible.
- When testing for serial correlation in the presence of ARCH, the conventional Bartlett standard error, T^(-1/2), could seriously underestimate the true standard errors.

ARCH Estimation: MLE – Remarks


• The assumption of conditional normality is difficult to justify in many empirical applications. But, it is convenient.

• The MLE based on the normal density may be given a quasi-maximum likelihood (QMLE) interpretation.

• If the conditional mean and variance functions are correctly specified, the normal quasi-score evaluated at θ0 has a martingale difference property:

E[δL/δθ|θ0] = E[S(yt, θ0)] = 0

Since this equation holds for any value of the true parameters, the QMLE, say θQMLE, is Fisher-consistent, i.e., E[S(yT, yT-1, ..., y1; θ)] = 0 for any θ ∈ Ω.

ARCH Estimation: QMLE

• The asymptotic distribution for the QMLE takes the form:

√T (θ̂QMLE – θ0) →d N(0, A0⁻¹ B0 A0⁻¹)

The covariance matrix (A0⁻¹ B0 A0⁻¹) is called "robust": robust to departures from normality.

• Bollerslev and Wooldridge (1992) study the finite sample distribution of the QMLE and the Wald statistics based on the robust covariance matrix estimator:

For symmetric departures from conditional normality, the QMLE is generally close to the exact MLE.

For non-symmetric conditional distributions both the asymptotic and the finite sample loss in efficiency may be large.


• The basic GARCH model allows a certain amount of leptokurtosis. It is often insufficient to explain real-world data.

Solution: assume a distribution other than the normal, which helps to allow for fat tails in the distribution.

• t Distribution – Bollerslev (1987). The t distribution has a degrees-of-freedom parameter v which allows greater kurtosis. The t log-likelihood contribution is:

lt = ln[ Γ((v+1)/2) Γ(v/2)⁻¹ ((v–2)π)^(-1/2) (1 + zt²/(v–2))^(-(v+1)/2) ] – ½ ln(σt²)

where Γ is the gamma function and v is the degrees of freedom. As v → ∞, this tends to the normal distribution.

• GED Distribution - Nelson (1991)

ARCH Estimation: Non-Normality

ARCH Estimation: GMM

• Suppose we have an ARCH(q). We need moment conditions:

(1) E[m1] = E[xt (yt – xt'γ)] = 0
(2) E[m2] = E[εt² – (ω + α1 εt-1² + .... + αq εt-q²)] = 0
(3) E[m3] = E[εt² – ω/(1 – α1 – .... – αq)] = 0

Note: (1) refers to the conditional mean, (2) to the conditional variance, and (3) to the unconditional variance.

• GMM objective function:

Q(θ; y, X) = Ê[m(θ; X, y)]' W Ê[m(θ; X, y)]

where

Ê[m(θ; X, y)] = [Ê[m1]', Ê[m2]', Ê[m3]']'


• γ has k free parameters; α has q free parameters. Then, we have r = k + q + 1 parameters.

m(θ; X, y) has r = k + q + 1 equations.

Dimensions: Q is 1x1; E[m(θ; X, y)] is rx1; W is rxr.

• The problem is over-identified: with more equations than parameters, we cannot solve E[m(θ; X, y)] = 0 exactly.

• Choose a weighting matrix W for the objective function and minimize using numerical optimization.

• Optimal weighting matrix: W = [E[m(θ; X, y)] E[m(θ; X, y)]']⁻¹. Then,

Var(θ̂) = (1/T)[D'WD]⁻¹

where D = δE[m(θ; X, y)]/δθ' (expressions evaluated at θ̂GMM).

ARCH Estimation: GMM

• Standard BP test, with auxiliary regression given by:

et² = α0 + α1 et-1² + .... + αq et-q² + vt

H0: α1 = α2 = ... = αq = 0 (No ARCH). It is not possible to test for GARCH terms this way, since the same lagged squared residuals would appear on both sides.

Then, the LM test is (T – q) R² →d χ²q – Engle's (1982) test.

• In ARCH models, testing proceeds as usual: LR, Wald, and LM tests.

Reliable inference from the LM, Wald and LR test statistics generally does require moderately large sample sizes of at least two hundred or more observations.

ARCH Estimation: Testing



• Issues:

- Non-negativity constraints must be imposed. θ0 is often on the boundary of Ω. (Two-sided tests may be conservative.)

- Lack of identification of certain parameters under H0 creates a singularity of the information matrix under H0. For example, under H0: α1 = 0 (No ARCH), in the GARCH(1,1), ω and β1 are not jointly identified. See Davies (1977).

• Ignoring ARCH

- You suspect yt has an AR structure: yt = γ0 + γ1 yt-1 + εt

Hamilton (2008) finds that OLS t-tests with no correction for ARCH spuriously reject H0: γ1 = 0 with arbitrarily high probability for sufficiently large T. White's (1980) SEs help. NW SEs help less.

ARCH Estimation: Testing

Figure. From Hamilton (2008). Fraction of samples in which OLS t-test leads to rejection of H0: γ1=0 as a function of T for regression with Gaussian errors (solid line) and Student’s t errors (dashed line). Note: H0 is actually true and test has nominal size of 5%.


• ARCH test for the 3-factor Fama-French model for IBM returns (T = 320), with one lag:

IBMRet – rf = β0 + β1 (MktRet – rf) + β2 SMB + β3 HML + ε

> b <- solve(t(x)%*% x)%*% t(x)%*%y #OLS regression

> e <- y - x%*%b

> e2 <- e^2

> xx1 <- e2[1:(T-1)]   # lagged squared residuals

> fit2 <- lm(e2[2:T]~xx1)

> r2_e2 <- summary(fit2)$r.squared

> r2_e2

[1] 0.2656472

> lm_t <- (T-1)*r2_e2

> lm_t

[1] 84.74147

LM-ARCH Test: 84.74 ⟹ reject H0 at the 5% level (χ²[1] 5% critical value ≈ 3.84), the usual result for financial time series.

Testing for Heteroscedasticity: ARCH

• Questions:
1) Lots of ARCH models. Which one to use?
2) Choice of p and q. How many lags to use?

• Hansen and Lunde (2004) compared lots of ARCH models:
- It turns out that the GARCH(1,1) is a great starting model.
- Add a leverage effect for financial series and it's even better.
- A t-distribution is also a good addition.

ARCH: Which Model to Use


Realized Volatility (RV) Models

• French, Schwert and Stambaugh (1987) use higher-frequency (daily) returns to estimate the variance as:

st² = (1/k) Σi=1..k rt-i²

where the r's are daily realized returns and st² estimates the monthly variance.

• This is a model-free measure, i.e., there is no need for ARCH-family specifications. It is very popular with intra-daily data, called high-frequency (HF) data. The measure is called realized volatility (RV).

• Very popular for calculating intra-day or daily volatility. For example, based on TAQ data, say, 1' or 10' realized returns (rt,j is the jth interval return on day t), we can calculate the daily variance, or RVt:

RVt = Σj=1..M rt,j²,  t = 1, 2, ...., T

That is, RV is defined as the sum of squared intraday returns.
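A minimal R sketch of the RV computation (simulated 1-minute returns; in practice r would come from TAQ data):

# Daily realized variance: RV_t = sum_j r_{t,j}^2
set.seed(7)
T <- 5; M <- 390                                 # 390 one-minute returns per trading day
r <- matrix(rnorm(T*M, sd = 0.0005), nrow = T)   # r[t, j]: jth intraday return on day t
RV <- rowSums(r^2)                               # one realized variance per day
sqrt(RV)                                         # daily realized volatilities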

• We can use time series models –say an ARIMA- for RVt to forecast daily volatility.

• RV is affected by microstructure effects: bid-ask bounce, infrequent trading, calendar effects, etc. For example, the bid-ask bounce induces serial correlation in intra-day returns, which biases RVt. (Big problem!)

- Proposed solution: filter the intra-day returns using MA or AR models before constructing RV measures.

Realized Volatility (RV) Models

• Under some conditions (bounded kurtosis and autocorrelation of squared returns less than 1), RVt is consistent and converges in mean square.
• Realized volatility is a measure. It has a distribution.
• For returns, the distribution of RV is non-normal (as expected). It tends to be skewed right and leptokurtic. For log returns, the distribution is approximately normal.
• Daily returns standardized by RV measures are nearly Gaussian.
• RV is highly persistent.
• The key problem is the choice of sampling frequency (or number of observations per day).
- Bandi and Russell (2003) propose a data-based method for choosing the frequency that minimizes the MSE of the measurement error.
- Simulations and empirical examples suggest optimal sampling is around 1-3 minutes for equity returns.

Realized Volatility (RV) Models - Properties


RV Models - Variation

• Another method: an AR model for volatility, using absolute residuals:

|ε̂t| = ω + β |ε̂t-1| + ut

The ε̂t are estimated from a first-step procedure, i.e., a regression. Asymmetric/leverage effects can also be introduced.

OLS estimation is possible. Make sure that the variance estimates are positive.

Other Models - Parkinson's (1980) Estimator

• The Parkinson (1980) estimator:

st² = Σt [ln(Ht) – ln(Lt)]² / (4 ln(2) T)

where Ht is the highest price and Lt is the lowest price.

• There is an RV counterpart, using HF data: the Realized Range (RR):

RRt = Σj [100 x (ln(Ht,j) – ln(Lt,j))]² / (4 ln(2))

where Ht,j and Lt,j are the highest and lowest prices in the jth interval.

• These "range" estimators are very good and very efficient.

Reference: Christensen and Podolskij (2005).
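A quick R sketch of the Parkinson estimator (simulated highs and lows; real applications would use OHLC data):

# Parkinson (1980): s^2 = sum_t [ln(H_t) - ln(L_t)]^2 / (4 ln(2) T)
set.seed(1)
T <- 250
lo <- 100*exp(cumsum(rnorm(T, sd = 0.01)))   # illustrative daily lows
hi <- lo*exp(abs(rnorm(T, sd = 0.01)))       # highs above the lows by construction
s2 <- sum((log(hi) - log(lo))^2)/(4*log(2)*T)
sqrt(s2)                                     # Parkinson volatility estimate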


Stochastic Volatility (SV/SVOL) Models

• Now, instead of a known volatility at time t, as in ARCH models, we allow for a stochastic shock to σt, υt:

σt = σt-1^β exp(υt),  υt ~ N(0, συ²)

Or, using logs:

log σt = β log σt-1 + υt,  υt ~ N(0, συ²)

• The difference with ARCH models: the shocks that govern the volatility are not necessarily the εt's.

• Usually, the standard model centers log volatility around ω:

log σt = ω + β (log σt-1 – ω) + υt

Then, E[log(σt)] = ω and Var[log(σt)] = κ² ≡ συ²/(1 – β²).

⇒ Unconditional distribution: log(σt) ~ N(ω, κ²)

• We have 3 SVOL parameters to estimate: φ = (ω, β, συ).

• Estimation:
- GMM: using moments, like the sample variance and kurtosis of returns. Complicated; see Andersen and Sorensen (1996).
- Bayesian: using MCMC methods (mainly, Gibbs sampling). The modern approach.
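A tiny R simulation of the centered SVOL model above (parameter values are illustrative):

# Simulate SVOL: log sig_t = omega + beta*(log sig_{t-1} - omega) + v_t
set.seed(123)
T <- 2000; omega <- -0.5; beta <- 0.95; sig_v <- 0.2
lsig <- numeric(T)
lsig[1] <- omega                             # start at the unconditional mean
for (t in 2:T) lsig[t] <- omega + beta*(lsig[t-1] - omega) + rnorm(1, sd = sig_v)
r <- exp(lsig)*rnorm(T)                      # returns r_t = sigma_t * z_t
mean(r^4)/mean(r^2)^2                        # sample kurtosis > 3 (fat tails)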


• Like ARCH models, SV models produce returns with kurtosis > 3 (and, also, positive autocorrelations between squared excess returns):

Var[rt] = E[(rt – E[rt])²] = E[σt² zt²] = E[σt²] E[zt²] = E[σt²]
= exp(2ω + 2κ²)  (property of the log-normal)

kurt[rt] = E[(rt – E[rt])⁴] / (E[(rt – E[rt])²])² = E[σt⁴] E[zt⁴] / (E[σt²])² (E[zt²])²
= 3 exp(4ω + 8κ²) / exp(4ω + 4κ²) = 3 exp(4κ²) > 3!


Stochastic volatility (SV/SVOL) models

• The Bayesain approach takes advantage of the idea of hierarchical structure:- f(y|ht) (distribution of the data given the volatilities)- f(ht|φ) (distribution of the volatilities given the parameters)- f(φ) (distribution of the parameters)

Algorithm: MCMC (JPR (1994).)Augment the parameter space to include ht. Using a proper prior for f(ht,φ) MCMC methods provides inference about the joint posterior f(ht,φ|y). We’ll go over this topic in Lecture 17.

Classic references: Jacquier, E., Poulson, N., Rossi, P. (1994), “Bayesian analysis of stochastic volatility models,” Journal of Business and Economic Statistics. (Estimation). Heston, S.L. (1993), “A closed-form solution for options with stochastic volatility with applications to bond and currency options,” Review of Financial Studies. (Theory)