Date posted: 26-Dec-2019
Category: Documents


### Transcript of Best Linear Prediction - UCSB's Department of econ.ucsb.edu/~doug/241b/Lectures/03 Best Linear......

• Best Linear Prediction (Econometrics II)

Douglas G. Steigerwald

UC Santa Barbara

D. Steigerwald (UCSB) Linear Prediction 1 / 29

• Overview

Reference: B. Hansen, Econometrics, Sections 2.18-2.19, 2.24, 2.33

How do we approximate E(y|x)?

- if x is continuous, E(y|x) is generally unknown
  - use the linear approximation xᵀβ
    - β is the linear predictor (or projection) coefficient (β_lpc)
    - β is not ∇ₓE(y|x)
  - β is identified if E(xxᵀ) is invertible
- the linear prediction error u is uncorrelated with x by construction


• Approximate the CEF

- conditional mean E(y|x)
  - the "best" predictor (under mean squared prediction error)
  - functional form generally unknown
    - unless x is discrete (and low dimension)
- approximate E(y|x) with xᵀβ
  - a linear approximation, thus a linear predictor
- two criteria for choosing β:
  1. select β to form the "best" linear predictor of y: P(y|x)
  2. select β to form the "best" linear approximation to E(y|x)
- criteria 1 and 2 yield identical β
  - either criterion could be used to define β
  - we use criterion 1 and refer to xᵀβ as the best linear predictor


• Best Linear Predictor Coefficient

1. select β to minimize the mean-square prediction error

   S(β) = E(y − xᵀβ)²

   β := β_lpc satisfies E(xxᵀ) β_lpc = E(xy)

2. select β to minimize the mean-square approximation error

   d(β) = Eₓ[E(y|x) − xᵀβ]²

   the solution satisfies E(xxᵀ) β_lac = E(xy)

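The moment equation E(xxᵀ)β_lpc = E(xy) maps directly to sample analogues. A minimal numpy sketch, using simulated data; the design and the nonlinear CEF exp(x) are hypothetical choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# simulated data (hypothetical): y depends nonlinearly on x plus noise,
# so E(y|x) is not linear and beta_lpc is only the best linear approximation
x1 = rng.uniform(0, 2, n)
X = np.column_stack([np.ones(n), x1])        # include an intercept
y = np.exp(x1) + rng.normal(0, 1, n)

# sample analogues of E(xx^T) and E(xy)
Qxx = X.T @ X / n
Qxy = X.T @ y / n

beta_lpc = np.linalg.solve(Qxx, Qxy)         # solves E(xx^T) beta = E(xy)
u = y - X @ beta_lpc                         # linear prediction error

print(beta_lpc)
print(X.T @ u / n)   # ~0 by construction: E(xu) = 0
```

The last line illustrates the orthogonality property from the slides: the sample analogue of E(xu) is zero up to floating-point error.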

• Identification

Identification (general)
- θ and θ′ are separately identified iff P_θ ≠ P_θ′ implies θ ≠ θ′

Identification (best linear predictor)
- β and β′ are separately identified iff [E(xxᵀ)]⁻¹E(xy) under P_β does not equal [E(xxᵀ)]⁻¹E(xy) under P_β′
- i.e. there is a unique solution to β_lpc = [E(xxᵀ)]⁻¹E(xy)
- i.e. E(xxᵀ) is invertible


• Identification 2

Can we uniquely determine β_lpc?

E(xxᵀ) β_lpc = E(xy)

- if E(xxᵀ) is invertible
  - there is a unique value of β_lpc that solves the equation
    - β_lpc is identified, as there is a unique solution
- if E(xxᵀ) is not invertible
  - there are multiple values of β_lpc that solve the equation
    - β_lpc is not identified, as there is not a unique solution
    - mathematically, β_lpc = [E(xxᵀ)]⁻ E(xy), where [·]⁻ is a generalized inverse
  - all solutions yield an equivalent best linear predictor xᵀβ_lpc
    - the best linear predictor is identified

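The generalized-inverse claim can be checked numerically: with a redundant column, many β solve the moment equations, but all deliver the same predictor xᵀβ. A sketch with a deliberately singular design (the data are simulated purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# hypothetical design with a redundant column: x2 = 2 * x1,
# so E(xx^T) is singular and beta_lpc is not unique
x1 = rng.normal(size=n)
X = np.column_stack([x1, 2 * x1])
y = x1 + rng.normal(size=n)

Qxx = X.T @ X / n
Qxy = X.T @ y / n

# minimum-norm solution via the Moore-Penrose generalized inverse
beta_pinv = np.linalg.pinv(Qxx) @ Qxy

# another solution to Qxx @ beta = Qxy: shift along the null space of Qxx
null_dir = np.array([2.0, -1.0])          # Qxx @ null_dir = 0
beta_alt = beta_pinv + 3.0 * null_dir

# both solve the moment equations, and both give the same predictor x^T beta
print(np.allclose(Qxx @ beta_pinv, Qxy), np.allclose(Qxx @ beta_alt, Qxy))
print(np.allclose(X @ beta_pinv, X @ beta_alt))
```

The coefficient is not identified, but the fitted values agree exactly, which is the sense in which the best linear predictor itself remains identified.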

• Invertibility

Required assumption: E(xxᵀ) is positive definite

- for any non-zero α ∈ ℝᵏ:

  αᵀE(xxᵀ)α = E(αᵀxxᵀα) = E(αᵀx)² ≥ 0

  so E(xxᵀ) is positive semi-definite by construction

- positive semi-definite matrices are invertible iff they are positive definite
- if we assume E(xxᵀ) is positive definite, then
  - E(αᵀx)² > 0
  - there is no non-zero α for which αᵀx = 0
    - implies there are no redundant variables in x
    - i.e. all columns are linearly independent

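A practical check of the positive-definiteness assumption is the smallest eigenvalue of the sample analogue of E(xxᵀ). A sketch; the designs and the tolerance are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000

x1 = rng.normal(size=n)
x2 = rng.normal(size=n)

# full-rank design vs. one with a redundant column (x3 = x1 + x2)
X_ok = np.column_stack([np.ones(n), x1, x2])
X_bad = np.column_stack([np.ones(n), x1, x2, x1 + x2])

def is_positive_definite(X, tol=1e-10):
    """Check whether the sample analogue of E(xx^T) is positive definite."""
    Qxx = X.T @ X / len(X)
    return np.linalg.eigvalsh(Qxx).min() > tol   # symmetric, so eigvalsh

print(is_positive_definite(X_ok))    # True: no redundant columns
print(is_positive_definite(X_bad))   # False: alpha = (0, 1, 1, -1) gives alpha'x = 0
```

When the check fails, the slide's α with αᵀx = 0 identifies exactly which linear combination of columns is redundant.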

• Best Linear Predictor: Error

best linear predictor (linear projection): P(y|x) = xᵀβ_lpc

decomposition:

y = xᵀβ_lpc + u,  with u = e + [E(y|x) − xᵀβ_lpc]

where e = y − E(y|x) is the CEF error

the choice of β_lpc implies E(xu) = 0:

E(xu) = E[x(y − xᵀβ_lpc)] = E(xy) − E(xxᵀ)[E(xxᵀ)]⁻¹E(xy) = 0

- the error from projection onto x is orthogonal to x

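The error decomposition can be verified observation by observation when the CEF is known. A sketch using a simulated model with E(y|x) = x², a hypothetical choice so that the CEF error and the approximation error can be computed separately:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000

# simulated model with a known CEF: E(y|x) = x1^2
x1 = rng.uniform(-1, 1, n)
cef = x1 ** 2
e = rng.normal(0, 1, n)          # CEF error: e = y - E(y|x)
y = cef + e

X = np.column_stack([np.ones(n), x1])
beta = np.linalg.solve(X.T @ X / n, X.T @ y / n)

u = y - X @ beta                 # linear prediction error
approx_err = cef - X @ beta      # approximation error: E(y|x) - x'beta

# decomposition u = e + (E(y|x) - x'beta) holds observation by observation
print(np.allclose(u, e + approx_err))
# orthogonality E(xu) = 0 holds in the sample analogue
print(np.max(np.abs(X.T @ u / n)))
```

Note that u is uncorrelated with x even though the linear predictor misses the curvature; the curvature ends up in the approximation-error component.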

• Best Linear Predictor: Error Variance

The variance of u equals the variance of the error from a linear projection.

variance of u:
- Eu² = E(y − xᵀβ)² = Ey² − E(yxᵀ)β
  - because E(xᵀβ)² = E(yxᵀ)β

variance of the projection error:
- the projection error has length ‖u‖ = ‖y − xᵀβ‖, where ‖z‖ = (Ez²)^(1/2)
- because Ey² = ‖y‖², it follows that Eu² = ‖u‖²


• Best Linear Predictor: Covariate-Error Correlation

E(xu) = 0 is a set of k equations: E(xⱼu) = 0 for j = 1, ..., k

- if x includes an intercept, Eu = 0
- because Cov(xⱼ, u) = E(xⱼu) − Exⱼ · Eu
  - covariates are uncorrelated with u by construction
- for r ≥ 2, if E|y|^r < ∞ and E‖x‖^r < ∞, then E|u|^r < ∞
  - if y and x have finite second moments, then the variance of u exists
  - note: E|y|^r < ∞ implies E|y|^s < ∞ for all s ≤ r (Liapunov's inequality)


• Linear Projection Model

the linear projection model is

y = xᵀβ + u,  E(xu) = 0,  β = [E(xxᵀ)]⁻¹E(xy)

- xᵀβ is the best linear predictor
  - not necessarily the conditional mean E(y|x)
- β is the linear projection coefficient
  - not the conditional mean coefficient if E(y|x) ≠ xᵀβ
  - not a causal (structural) effect if:
    - E(y|x) ≠ xᵀβ, or
    - E(y|x) = xᵀβ but ∇ₓe ≠ 0


• How Does the Linear Projection Differ from the CEF? Example 1

CEF of log(wage) as a function of x (black and female indicators)

- discrete covariates with a small number of values, so the CEF can be computed directly

E(log(wage)|x) = −.20 black − .24 female + .10 inter + 3.06

- inter = black × female
  - 20% male race gap (black males 20% below white males)
  - 10% female race gap

linear projection of log(wage) on x (black and female indicators)

P(log(wage)|x) = −.15 black − .23 female + 3.06

- 15% race gap
  - the average race gap across males and females
  - ignores the role of gender in the race gap, even though gender is included

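The averaging behind a coefficient like −.15 can be reproduced mechanically: project the four cell means on (1, black, female) with population weights. The cell means and shares below are hypothetical, chosen only so that the male and female race gaps are 20% and 10%:

```python
import numpy as np

# hypothetical cell means and population shares for the four groups;
# the CEF is exactly the cell means, while the projection without the
# interaction term averages the race gap across genders
groups = {  # (black, female): (mean log wage, population share)
    (0, 0): (3.06, 0.40),
    (1, 0): (2.86, 0.10),   # 20% male race gap
    (0, 1): (2.82, 0.40),
    (1, 1): (2.72, 0.10),   # 10% female race gap
}

rows, y, w = [], [], []
for (b, f), (mean_lw, share) in groups.items():
    rows.append([1.0, b, f])         # project on (1, black, female) only
    y.append(mean_lw)
    w.append(share)
X, y, w = np.array(rows), np.array(y), np.array(w)

# weighted least squares over the four cells = population linear projection
Qxx = (X * w[:, None]).T @ X
Qxy = (X * w[:, None]).T @ y
beta = np.linalg.solve(Qxx, Qxy)
print(beta)
```

With these shares the projection gives a black coefficient of −0.15, the equally weighted average of the −0.20 and −0.10 race gaps, mirroring the slide's point that the projection ignores how the race gap varies with gender.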

• How Does the Linear Projection Differ from the CEF? Example 2

CEF of white-male log(wage) as a function of years of education (ed)

- discrete covariate with multiple values
  - could use categorical variables to compute the CEF
    - a large number of values leads to cumbersome estimation
- approximate the CEF with linear projections


• Approximate CEF of Wage as a Function of Education: Approximation 1

linear projection of log(wage) on x = ed

P(log(wage)|x) = 0.11 ed + 1.50

- 11% increase in mean wages for every year of education
- works well for ed ≥ 9; under-predicts if education is lower

• Approximate CEF of Wage as a Function of Education: Approximation 2 (Linear Spline)

linear projection of log(wage) on x = (ed, spline)

P(log(wage)|x) = 0.02 ed + 0.10 spline + 2.30

- spline = (ed − 9) × 1(ed > 9)
  - 2% increase in mean wages for each year of education below 9
  - 12% increase in mean wages for each year of education above 9

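The spline regressor can be sketched in code. The helper name is hypothetical; the knot at 9 years and the coefficients follow the slide, and this is a minimal illustration rather than an estimation routine:

```python
import numpy as np

def spline_features(ed, knot=9):
    """Regressors for a linear spline in education with one knot.

    Returns columns (1, ed, (ed - knot) * 1{ed > knot}); the slope below
    the knot is the coefficient on ed, and the slope above the knot is
    the sum of the two slope coefficients.
    """
    ed = np.asarray(ed, dtype=float)
    spline = np.where(ed > knot, ed - knot, 0.0)
    return np.column_stack([np.ones_like(ed), ed, spline])

# slide's coefficients: intercept 2.30, ed 0.02, spline 0.10, so predicted
# log wage rises 2% per year below the knot and 12% per year above it
beta = np.array([2.30, 0.02, 0.10])
X = spline_features([8, 9, 10, 16])
print(X @ beta)
```

The predictor is continuous at the knot but its slope jumps from 0.02 to 0.12, which is exactly the kink the spline is designed to capture.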

• How Does the Linear Projection Differ from the CEF? Example 3

CEF of white-male (with 12 years of education) log(wage) as a function of years of experience (ex)

- discrete covariate with a large number of values
  - approximate the CEF with linear projections

linear projection of log(wage) on x = ex

P(log(wage)|x) = 0.011 ex + 2.50

- over-predicts wage for the young and the old

• Approximate CEF of Wage as a Function of Experience: Approximation 2 (Quadratic Projection)

linear projection of log(wage) on x = (ex, ex²)

P(log(wage)|x) = 0.046 ex − 0.001 ex² + 2.30

- ∇P = 0.046 − 0.002 ex
  - captures the strong downturn in mean wage for older workers

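The marginal effect implied by the quadratic projection is its derivative in ex. A small sketch using the slide's coefficients:

```python
import numpy as np

# slide's quadratic projection: P = 0.046 ex - 0.001 ex^2 + 2.30;
# the derivative b1 + 2*b2*ex is the marginal effect of experience
b1, b2 = 0.046, -0.001

def marginal_effect(ex):
    return b1 + 2 * b2 * np.asarray(ex, dtype=float)

print(marginal_effect([0, 10, 30]))   # declining returns to experience
print(-b1 / (2 * b2))                 # turning point: 23 years of experience
```

The marginal effect turns negative after 23 years of experience, which is the downturn in mean wages for older workers that the linear projection of Example 3 misses.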

• Properties of the Linear Projection Model

Assumption 1
- Ey² < ∞
- E‖x‖² < ∞
- Qxx = E(xxᵀ) is positive definite

Theorem: under Assumption 1,

1. E(xxᵀ) and E(xy) exist with finite elements
2. the linear projection coefficient exists, is unique, and equals β = [E(xxᵀ)]⁻¹E(xy)
3. P(y|x) = xᵀ[E(xxᵀ)]⁻¹E(xy)
4. for u = y − xᵀβ, E(xu) = 0 and E(u²) < ∞
5. if x contains a constant, Eu = 0
6. if E|y|^r < ∞ and E‖x‖^r < ∞ for r ≥ 2, then E|u|^r < ∞


• Review

How do we approximate E(y|x)? With xᵀβ.

How do you interpret β? As the linear projection coefficient, which is not generally equal to ∇ₓE(y|x).

What is required for identification of β? E(xxᵀ) must be invertible.

What is the correlation between x and u? Zero, by construction.