
Transcript
  • Financial Econometrics Econ 40357: Regression review, time-series regression

    Some Necessary Matrix Algebra (sorry, can’t avoid this)

    N.C. Mark

    University of Notre Dame and NBER

    2020

    1 / 32

  • Regression review

    A time series is a sequence of observations over time. Let T be the sample size. We write the sequence as

    $\{y_t\}_{t=1}^{T}$

    We use notation such as this

    $\mu_y = E(y_t), \qquad \sigma_y^2 = \operatorname{Var}(y_t)$

    2 / 32

  • Regression in population

    We have in mind a joint distribution between two time series, $y_t$ and $x_t$. This is our model. In finance, we are less concerned with exogeneity, instrumental variables, and establishing cause and effect. We are more concerned with understanding reduced-form correlations: the statistical dependence across time series, the dependence of observations across time, and the cross-moments of the joint distribution.

    3 / 32

  • Regression in population

    Write the population regression as

    $$y_t = \overbrace{\alpha + \beta x_t}^{E(y_t \mid x_t)} + e_t \tag{1}$$

    The systematic part of the regression is also called the projection; $e_t$ is the projection error. Assume the error is i.i.d. but not necessarily normal (what does this mean?):

    $e_t \sim \text{i.i.d.}(0, \sigma_e^2)$

    Think of the fitted part of the regression as the conditional expectation. The conditional expectation is the best predictor. Prediction means the same thing as forecast.

    We use regression for things like computing betas, which measure the exposure of an asset to risk factors.

    4 / 32

  • Regression in population

    1. Take the expectation of $y_t = \alpha + \beta x_t + e_t$:

    $E(y_t) = \alpha + \beta E(x_t) + E(e_t)$

    Using the shorthand notation, and since $E(e_t) = 0$,

    $\mu_y = \alpha + \beta \mu_x$

    Rearrange to get

    $\alpha = \mu_y - \beta \mu_x$

    2. Define $\tilde{y}_t \equiv y_t - \mu_y$ and $\tilde{x}_t \equiv x_t - \mu_x$. The ‘tilde’ represents the variable expressed as a deviation from its mean. Substitute this expression for $\alpha$ back into the regression (eq. 1). Doing so gives the regression in deviations-from-means form:

    $$\tilde{y}_t = \beta \tilde{x}_t + e_t \tag{2}$$

    5 / 32

  • Regression in population

    Multiply both sides of eq. (2) by $\tilde{x}_t$, then take expectations on both sides:

    $$\tilde{y}_t \tilde{x}_t = \beta \tilde{x}_t \tilde{x}_t + e_t \tilde{x}_t$$

    $$E(\tilde{y}_t \tilde{x}_t) = \beta E(\tilde{x}_t \tilde{x}_t) + E(e_t \tilde{x}_t)$$

    Solve for $\beta$:

    $$\beta = \frac{E(\tilde{y}_t \tilde{x}_t)}{E(\tilde{x}_t^2)} = \underbrace{\frac{\operatorname{Cov}(y_t, x_t)}{\operatorname{Var}(x_t)}}_{\text{Interpretation}} = \underbrace{\frac{\sigma_{y,x}}{\sigma_x^2}}_{\text{Notation}} = \underbrace{\frac{\sigma_{y,x}}{\sigma_y \sigma_x}\frac{\sigma_y}{\sigma_x}}_{\text{Algebra}} = \underbrace{\rho_{y,x}\frac{\sigma_y}{\sigma_x}}_{\text{Interpretation}} \tag{3}$$
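    To make eq. (3) concrete, here is a minimal Python sketch (the simulated data and all names are illustrative, not from the slides) verifying that the regression slope equals Cov(y, x)/Var(x):

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    T = 1_000
    x = rng.normal(size=T)               # the regressor, e.g. a market factor
    e = rng.normal(scale=0.5, size=T)    # i.i.d. projection error
    y = 0.1 + 1.2 * x + e                # population alpha = 0.1, beta = 1.2

    # beta as the ratio of the cross-moment to the variance, eq. (3)
    beta_moment = np.cov(y, x, ddof=0)[0, 1] / np.var(x)

    # equivalently, correlation times the ratio of standard deviations
    beta_corr = np.corrcoef(y, x)[0, 1] * y.std() / x.std()

    print(beta_moment, beta_corr)        # both approximately 1.2
    ```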

    6 / 32

  • Regression in population

    Take the conditional expectation of both sides of the original regression, conditional on $x_t$:

    $$E(y_t \mid x_t) = \alpha + \beta x_t$$

    There’s a theorem that says the conditional expectation is the best predictor of $y_t$ conditional on $x_t$. The best! Since the regression is the conditional expectation, this motivates using it as the forecast function.

    7 / 32

  • Estimation of α and β by least squares

    Eviews will do this work for you. Least squares estimates are the sample counterparts to the population parameters. What does this mean? In deviations-from-means form, the solution to the least squares problem is

    $$\hat{\beta} = \frac{\sum_{t=1}^{T} \tilde{x}_t \tilde{y}_t}{\sum_{t=1}^{T} \tilde{x}_t^2} = \frac{\frac{1}{T}\sum_{t=1}^{T} \tilde{x}_t \tilde{y}_t}{\frac{1}{T}\sum_{t=1}^{T} \tilde{x}_t^2}$$

    $$\tilde{y}_t = \hat{\beta}\tilde{x}_t + \hat{e}_t$$

    $\hat{e}_t$ is a residual, not an error. $\hat{\beta}$ is a random variable. To see this, make substitutions:

    $$\hat{\beta} = \frac{\sum \tilde{x}_t(\beta\tilde{x}_t + e_t)}{\sum \tilde{x}_t^2} = \beta + \frac{\sum \tilde{x}_t e_t}{\sum \tilde{x}_t^2}$$

    $\hat{\beta}$ is a linear combination of the $e_t$’s, which are random variables. Therefore $\hat{\beta}$ is a random variable, with a distribution.
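    A quick way to see that $\hat{\beta}$ has a distribution is to simulate many samples and look at the spread of the estimates across them. A minimal sketch (illustrative parameter values):

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    T, beta = 200, 1.2
    betahats = []
    for _ in range(5_000):
        x = rng.normal(size=T)
        e = rng.normal(size=T)               # i.i.d. errors
        y = beta * x + e
        xt, yt = x - x.mean(), y - y.mean()  # deviations from means
        betahats.append((xt @ yt) / (xt @ xt))

    # beta-hat varies from sample to sample: it has a distribution
    print(np.mean(betahats), np.std(betahats))
    ```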

    8 / 32

  • Inference in Time Series Regression is a bit Different

    What is statistical inference? We want the sample standard deviation of $\hat{\beta}$. It is called the standard error of $\hat{\beta}$, because $\hat{\beta}$ is a statistic. What is a statistic? $\hat{\beta}$ divided by its standard error is the t-ratio.

    We are mainly interested in t-ratios, not in F-statistics. Nobody’s opinion was ever changed by a significant F-statistic paired with insignificant t-ratios. What is statistical significance?

    In your previous econometrics class, you assumed the $x_t$ are exogenous. This allows you to treat them as constants, and therefore randomness in $\hat{\beta}$ is induced only by $e_t$. In time series, we can’t do that. (If $y_t$ is Amazon’s return and $x_t$ is the market return, how can we say the market is exogenous? Or if $x_t = y_{t-1}$, how can we treat $x_t$ as constant? No way!)

    9 / 32

  • Inference in Time Series Regression

    Let’s pretend the $\tilde{x}_t$ are exogenous. Even more inappropriately, let’s pretend they are non-stochastic constants.

    $$\operatorname{Var}\left(\frac{\sum_{t=1}^{T}\tilde{x}_t e_t}{\sum_{t=1}^{T}\tilde{x}_t^2}\right) = \frac{1}{\left(\sum_{t=1}^{T}\tilde{x}_t^2\right)^2}\left(\tilde{x}_1^2\sigma_e^2 + \tilde{x}_2^2\sigma_e^2 + \cdots + \tilde{x}_T^2\sigma_e^2\right) = \frac{\sum\tilde{x}_t^2}{\left(\sum_{t=1}^{T}\tilde{x}_t^2\right)^2}\,\sigma_e^2 = \frac{\sigma_e^2}{\sum\tilde{x}_t^2}$$

    The standard deviation of the term is

    $$\operatorname{sd}\left(\frac{\sum_{t=1}^{T}\tilde{x}_t e_t}{\sum_{t=1}^{T}\tilde{x}_t^2}\right) = \frac{\sigma_e}{\sqrt{\sum\tilde{x}_t^2}}$$

    The standard error of the term, and hence of $\hat{\beta}$, is

    $$\operatorname{se}(\hat{\beta}) = \frac{\hat{\sigma}_e}{\sqrt{\sum\tilde{x}_t^2}}$$

    10 / 32

  • where we estimate $\hat{\sigma}_e$ with the sample standard deviation of the regression residuals $\hat{e}_t$. This particular formula is true only when the errors are i.i.d., and for large (infinite) sample sizes. Why do we use it? Because we can find the answer for large samples, and we hope that it is a good approximation to the exact true (but unknown) distribution.
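    As a sanity check, the formula can be computed by hand and compared with a packaged OLS routine. A minimal sketch, assuming Python’s statsmodels library (the data are simulated and illustrative):

    ```python
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    T = 500
    x = rng.normal(size=T)
    y = 0.5 + 1.2 * x + rng.normal(size=T)

    xt, yt = x - x.mean(), y - y.mean()
    betahat = (xt @ yt) / (xt @ xt)
    resid = yt - betahat * xt
    sigmahat_e = np.sqrt((resid @ resid) / T)  # sample std. dev. of residuals
    se_betahat = sigmahat_e / np.sqrt(xt @ xt)

    fit = sm.OLS(y, sm.add_constant(x)).fit()
    # close to the packaged answer; they differ only in the
    # degrees-of-freedom divisor used for sigma-hat
    print(se_betahat, fit.bse[1])
    ```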

    11 / 32

  • Inference in Time Series Regression

    So time-series econometricians do a thing called asymptotic theory. They ask how the numerator and denominator,

    $$\text{numerator:}\quad \frac{1}{\sqrt{T}}\sum_{t=1}^{T}\tilde{x}_t e_t \to N\!\left(0, \sigma_e^2 Q\right)$$

    $$\text{denominator:}\quad \frac{1}{T}\sum_{t=1}^{T}\tilde{x}_t^2 \to Q$$

    behave as $T \to \infty$. It is a very complicated business that involves a lot of high-level math. Fortunately, at the end of the day, what comes out of all this is the same thing you learned in your first econometrics class!

    12 / 32

  • Least squares estimation of α and β

    That is, we pretend we have an infinite sample size ($T = \infty$), in which case

    $$t = \frac{\hat{\beta} - \beta}{\operatorname{se}(\hat{\beta})} \sim N(0,1), \qquad \operatorname{se}(\hat{\beta}) = \frac{\hat{\sigma}_e}{\sqrt{\sum\tilde{x}_t^2}} \tag{4}$$

    $$\hat{\sigma}_e^2 = \frac{1}{T}\sum \hat{e}_t^2 \tag{5}$$

    The difference is we don’t consult the t-table or worry about degrees of freedom. We consult the standard normal table.

    The strategy: The exact t-distribution is unknown (why?), so we use the asymptotic distribution (why?) and hope it is a good approximation to the unknown distribution. Finally, we are also interested in $R^2$, the measure of goodness of fit.

    $$R^2 = \frac{SSR}{SST} = \frac{\hat{\beta}^2\sum\tilde{x}_t^2}{\sum\tilde{y}_t^2} = 1 - \frac{\sum\hat{e}_t^2}{\sum\tilde{y}_t^2} = 1 - \frac{SSE}{SST}$$
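    A quick numeric check that the two expressions for $R^2$ agree (simulated, illustrative data):

    ```python
    import numpy as np

    rng = np.random.default_rng(3)
    T = 300
    x = rng.normal(size=T)
    y = 1.0 + 0.8 * x + rng.normal(size=T)

    xt, yt = x - x.mean(), y - y.mean()
    betahat = (xt @ yt) / (xt @ xt)
    ehat = yt - betahat * xt

    ssr = betahat**2 * (xt @ xt)     # explained (regression) sum of squares
    sse = ehat @ ehat                # residual sum of squares
    sst = yt @ yt                    # total sum of squares
    print(ssr / sst, 1 - sse / sst)  # identical up to rounding
    ```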

    13 / 32

  • Story behind the t-test

    In the 1890s, William Gosset was studying chemical properties of barley with small samples, for the Guinness company (yes, that Guinness). He showed his results to the great statistician Karl Pearson at University College London, who mentored him.

    Gosset published his work in the journal Biometrika, using the pseudonym Student, because he would have gotten in trouble at Guinness if he had used his real name.

    14 / 32

  • t-test review: two-sided test

    15 / 32

  • t-test review: one-sided test

    16 / 32

  • Some matrix algebra

    Scalar: a single number

    Matrix: a two-dimensional array.

    $$A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ a_{31} & a_{32} \end{pmatrix}$$

    is a $(3 \times 2)$ matrix; that is, 3 rows and 2 columns. We say the number of rows, then the number of columns. $a_{11}$ is the $(1,1)$ element of $A$, and is a scalar. The subscripts of the elements tell us which row and column they are from.

    Vector: a one-dimensional array. If we take the first column of $A$ and call it

    $$A_1 = \begin{pmatrix} a_{11} \\ a_{21} \\ a_{31} \end{pmatrix},$$

    it is a $(3 \times 1)$ column vector. If we take the second row of $A$ and call it $A_2 = \begin{pmatrix} a_{21} & a_{22} \end{pmatrix}$, it is a $(1 \times 2)$ row vector.

    17 / 32

  • Square matrix: An $m \times n$ matrix is square if $m = n$.

    $$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}$$

    is a square matrix. The diagonal of the matrix $A$ consists of the elements $a_{11}, a_{22}, a_{33}$. It only makes sense to talk about the diagonal of square matrices.

    Symmetric matrix: For a square matrix, if the elements $a_{ij} = a_{ji}$ for $i \neq j$, then the matrix is symmetric (notice the correspondence of the matching off-diagonal entries below).

    $$A = \begin{pmatrix} 2 & 3 & 4 \\ 3 & 10 & 6 \\ 4 & 6 & 11 \end{pmatrix}$$

    18 / 32

  • Transpose of a matrix: The $i$-th row becomes the $i$-th column. The transpose of an $(m \times n)$ matrix is $(n \times m)$.

    $A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \end{pmatrix}$, then $A' = A^T = \begin{pmatrix} a_{11} \\ a_{12} \\ a_{13} \end{pmatrix}$.

    $A = \begin{pmatrix} a_{11} \\ a_{12} \\ a_{13} \end{pmatrix}$, then $A' = A^T = \begin{pmatrix} a_{11} & a_{12} & a_{13} \end{pmatrix}$.

    $A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ a_{31} & a_{32} \end{pmatrix}$, then $A' = A^T = \begin{pmatrix} a_{11} & a_{21} & a_{31} \\ a_{12} & a_{22} & a_{32} \end{pmatrix}$.

    19 / 32

  • Zero matrix: All the entries are 0.

    $$A = \begin{pmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 0 \end{pmatrix}$$

    is a zero matrix.

    Identity matrix: A square matrix with 1s on the diagonal elements and 0s on the off-diagonal elements is called the identity matrix.

    $$I = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$

    is a $(3 \times 3)$ identity matrix. We always call an identity matrix $I$.

    20 / 32

  • Matrix addition and subtraction

    To add two matrices or to subtract one from the other, they must have the same dimensions. We do element-by-element addition or subtraction.

    $$A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}, \qquad B = \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix}$$

    Addition:

    $$C = A + B = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} + \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix} = \begin{pmatrix} a_{11}+b_{11} & a_{12}+b_{12} \\ a_{21}+b_{21} & a_{22}+b_{22} \end{pmatrix}$$

    Subtraction:

    $$C = A - B = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} - \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix} = \begin{pmatrix} a_{11}-b_{11} & a_{12}-b_{12} \\ a_{21}-b_{21} & a_{22}-b_{22} \end{pmatrix}$$

    21 / 32

  • Scalar multiplication: Let $A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}$, and let $c$ be a scalar.

    $$cA = c\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} = \begin{pmatrix} ca_{11} & ca_{12} \\ ca_{21} & ca_{22} \end{pmatrix} = Ac$$

    Multiplying a matrix by a scalar means you multiply every element by that scalar.

    22 / 32

  • Matrix multiplication

    If $A$ is $(m \times n)$ and $B$ is $(n \times k)$, they can be multiplied as $AB$, because the number of columns of $A$ matches the number of rows of $B$. But you cannot form $BA$, because the number of columns of $B$ doesn’t match the number of rows of $A$. The result of multiplying an $(m \times n)$ matrix by an $(n \times k)$ matrix is $(m \times k)$.

    Let $A$ be $(1 \times 2)$ and $B$ be $(2 \times 1)$. $A$ is a row vector and $B$ is a column vector. Multiplication, $C = AB$, is element-by-element multiplication, then summing the result.

    $$A = \begin{pmatrix} a_{11} & a_{12} \end{pmatrix}, \qquad B = \begin{pmatrix} b_{11} \\ b_{21} \end{pmatrix}$$

    $$C = AB = \begin{pmatrix} a_{11} & a_{12} \end{pmatrix}\begin{pmatrix} b_{11} \\ b_{21} \end{pmatrix} = a_{11}b_{11} + a_{12}b_{21} \quad \text{(a scalar)}$$

    23 / 32

  • Next, let’s do it with actual matrices. Let

    $$A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ a_{31} & a_{32} \end{pmatrix}, \qquad B = \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix}$$

    $C = AB$ is formed by $c_{ij} = \sum_k a_{ik} b_{kj}$. The $(i,j)$ element of $C$ is formed by multiplying row $i$ of $A$ by column $j$ of $B$.

    $$C = AB = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ a_{31} & a_{32} \end{pmatrix}\begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix} = \begin{pmatrix} a_{11}b_{11} + a_{12}b_{21} & a_{11}b_{12} + a_{12}b_{22} \\ a_{21}b_{11} + a_{22}b_{21} & a_{21}b_{12} + a_{22}b_{22} \\ a_{31}b_{11} + a_{32}b_{21} & a_{31}b_{12} + a_{32}b_{22} \end{pmatrix}$$

    Note: Even if $A$ and $B$ are both square matrices, the order matters: $AB \neq BA$ in general.
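    Since the slides leave these computations to the computer, here is a minimal numpy sketch of the two points above (the matrices are arbitrary examples):

    ```python
    import numpy as np

    A = np.array([[1, 2],
                  [3, 4],
                  [5, 6]])      # (3 x 2)
    B = np.array([[7, 8],
                  [9, 10]])     # (2 x 2)

    C = A @ B                   # (3 x 2)(2 x 2) -> (3 x 2)
    print(C.shape)              # (3, 2)

    # order matters, even for square matrices: AB != BA in general
    S = np.array([[0, 1],
                  [1, 0]])
    print(B @ S)
    print(S @ B)                # a different matrix
    ```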

    24 / 32

  • Matrix Inverse

    Determinant of a $(2 \times 2)$ matrix: subtract the product of the off-diagonal elements from the product of the diagonal elements. Let $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$. Then $|A| = \det(A) = ad - bc$.

    Note: You can only take the determinant of a square matrix. Calculating the determinant by hand for anything bigger than a $(2 \times 2)$ is beyond the scope of this class. But that’s okay, because we’ll be doing it by computer.

    Inverse of a square matrix: a matrix that, when multiplied by the original matrix, gives the identity matrix. If $A^{-1}A = AA^{-1} = I$, then $A^{-1}$ is the inverse of $A$. To get the inverse of the $(2 \times 2)$ matrix $A$ (defined above), switch the positions of the diagonal elements, multiply the off-diagonal elements by $-1$, then divide everything by the determinant of $A$:

    $$A^{-1} = \frac{1}{ad - bc}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}$$

    Let’s check:

    $$\frac{1}{ad - bc}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}\begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} \frac{ad - bc}{ad - bc} & 0 \\ 0 & \frac{ad - bc}{ad - bc} \end{pmatrix} = I$$

    Again, computing the inverse of anything bigger than a $(2 \times 2)$ matrix by hand is beyond the scope of this class. We just ask the computer to do it.
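    Asking the computer to do it, a minimal numpy sketch:

    ```python
    import numpy as np

    A = np.array([[2.0, 1.0],
                  [1.0, 3.0]])

    print(np.linalg.det(A))     # 2*3 - 1*1 = 5
    Ainv = np.linalg.inv(A)
    print(Ainv @ A)             # the identity, up to floating-point error
    ```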

    25 / 32

  • Regression in Matrix Form

    Begin with

    $$y_t = \alpha + \beta x_t + e_t$$

    Stack the dependent-variable observations in a column vector, and do the same for the independent variables: a constant (a vector of 1s) and $x_t$.

    $$\underbrace{\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_T \end{pmatrix}}_{y} = \underbrace{\begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_T \end{pmatrix}}_{X} \underbrace{\begin{pmatrix} \alpha \\ \beta \end{pmatrix}}_{b} + \underbrace{\begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_T \end{pmatrix}}_{e}$$

    $$y = Xb + e$$

    26 / 32

  • Multiply through by $X'$:

    $$X'y = X'Xb + X'e$$

    $$X'Xb = X'(y - e)$$

    $$b = (X'X)^{-1}X'y - (X'X)^{-1}X'e$$

    Least squares forces the residuals $\hat{e}$ to be uncorrelated with the regressors: $X'\hat{e} = 0$. Hence, in matrix form, the least squares formula is

    $$\hat{b} = (X'X)^{-1}X'y$$
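    A minimal numpy sketch of the matrix formula (simulated data; in practice np.linalg.lstsq or a regression package is preferred for numerical stability):

    ```python
    import numpy as np

    rng = np.random.default_rng(4)
    T = 100
    x = rng.normal(size=T)
    y = 0.5 + 1.2 * x + rng.normal(size=T)

    X = np.column_stack([np.ones(T), x])     # constant and regressor
    bhat = np.linalg.inv(X.T @ X) @ X.T @ y  # (X'X)^(-1) X'y
    ehat = y - X @ bhat                      # residuals

    print(bhat)                              # approximately [0.5, 1.2]
    print(X.T @ ehat)                        # ~0: residuals orthogonal to X
    ```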

    27 / 32

  • Newey-West

    Motivate by running the market regression and looking at the error terms. See if there is volatility clustering. The market model (no excess-return adjustment) for Disney, 8/20/2000 through 8/23/2019:

    $$r_{i,t} = \alpha + \beta r_{m,t} + e_{i,t}$$

    Dependent Variable: RET_DIS
    Method: Least Squares
    Date: 08/26/19  Time: 13:14
    Sample (adjusted): 8/29/2000 8/23/2019
    Included observations: 4603 after adjustments

    Variable   Coefficient   Std. Error   t-Statistic   Prob.
    C          0.000         0.000        1.090         0.276
    RET_MKT    1.074         0.016        67.363        0.000

    R-squared            0.497       Mean dependent var      0.000
    Adjusted R-squared   0.496       S.D. dependent var      0.018
    S.E. of regression   0.013       Akaike info criterion  -5.876
    Sum squared resid    0.755       Schwarz criterion      -5.874
    Log likelihood       13526.560   Hannan-Quinn criter.   -5.875
    F-statistic          4537.714    Durbin-Watson stat      2.119
    Prob(F-statistic)    0.000

    28 / 32

  • Plot residuals

    29 / 32

  • Newey-West

    In basic econometrics, the reasoning was

    $$\operatorname{Var}(\hat{\beta}) = \operatorname{Var}\left(\frac{\sum\tilde{x}_t e_t}{\sum\tilde{x}_t^2}\right) = \frac{\operatorname{Var}(e_t)\sum\tilde{x}_t^2}{\left(\sum\tilde{x}_t^2\right)^2} = \frac{\sigma_e^2}{\sum\tilde{x}_t^2} = \sigma_e^2\left(X'X\right)^{-1}$$

    The numerator simplifies because, under independence, the variance of a sum is the sum of the variances. But now $\operatorname{Var}(e_t)$ isn’t constant (due to conditional heteroskedasticity) and the $e_t$ aren’t independent (due to serial correlation). The variance after the first equals sign now has a bunch of covariance terms.

    30 / 32

  • Newey-West

    The Newey-West covariance estimator does the correct calculations, taking these complicating factors into account. Instead of $\operatorname{Var}(\hat{\beta}) = \sigma_e^2(X'X)^{-1}$, it is

    $$\operatorname{Var}(\hat{\beta}) = \left(X'X\right)^{-1} S_T \left(X'X\right)^{-1} \tag{6}$$

    $$S_T = S_0 + \frac{1}{T}\sum_{\ell=1}^{p} w(\ell) \sum_{t=\ell+1}^{T} e_t e_{t-\ell}\left(x_t x_{t-\ell}' + x_{t-\ell} x_t'\right) \tag{7}$$

    $$w(\ell) = 1 - \frac{\ell}{p+1} \tag{8}$$

    Don’t worry. There is an option in Eviews to do this computation and automatically get Newey-West t-ratios.

    Rule: In time-series regression, always use Newey-West.
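    Outside of Eviews, the same computation is widely available; for example, a minimal sketch assuming Python’s statsmodels (simulated, illustrative data):

    ```python
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(5)
    T = 1_000
    r_mkt = rng.normal(scale=0.01, size=T)
    r_dis = 0.0001 + 1.07 * r_mkt + rng.normal(scale=0.013, size=T)

    ols = sm.OLS(r_dis, sm.add_constant(r_mkt))

    # HAC (Newey-West) covariance with Bartlett-kernel weights, 10 lags
    fit = ols.fit(cov_type="HAC", cov_kwds={"maxlags": 10})
    print(fit.summary())   # reported t-ratios use Newey-West standard errors
    ```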

    31 / 32

  • Newey-West

    Method: Least Squares
    Date: 08/26/19  Time: 14:06
    Sample (adjusted): 8/29/2000 8/23/2019
    HAC standard errors & covariance
    (Bartlett kernel, Newey-West fixed bandwidth = 10.00)

    Variable   Coefficient   Std. Error   t-Statistic   Prob.
    C          0.000         0.000        1.206         0.227
    RET_MKT    1.0739        0.0208       51.473        0.000

    R-squared            0.496       Mean dependent var       0.000
    Adjusted R-squared   0.496       S.D. dependent var       0.018
    S.E. of regression   0.0128      Akaike info criterion   -5.876
    Sum squared resid    0.755       Schwarz criterion       -5.873
    Log likelihood       13526.56    Hannan-Quinn criter.    -5.875
    F-statistic          4537.714    Durbin-Watson stat       2.118
    Prob(F-statistic)    0.000       Wald F-statistic         2649.517
                                     Prob(Wald F-statistic)   0.000

    32 / 32