Download - Best Linear Prediction - UCSB's Department of …econ.ucsb.edu/~doug/241b/Lectures/03 Best Linear...Best Linear Prediction Econometrics II Douglas G. Steigerwald UC Santa Barbara D.


Best Linear Prediction
Econometrics II

Douglas G. Steigerwald

UC Santa Barbara

D. Steigerwald (UCSB) Linear Prediction 1 / 29


Overview
Reference: B. Hansen, Econometrics, Chapter 2.18-2.19, 2.24, 2.33

How to approximate $E(y \mid x)$?

if $x$ is continuous, $E(y \mid x)$ is generally unknown
  - linear approximation $x^T\beta$
    - $\beta$ is the linear predictor (or projection) coefficient ($\beta_{lpc}$)
    - $\beta$ is not $\nabla_x E(y \mid x)$
  - $\beta$ is identified if $E(xx^T)$ is invertible

linear prediction error $u$ is uncorrelated with $x$ by construction


Approximate the CEF

conditional mean $E(y \mid x)$
  - "best" predictor (mean squared prediction error)
  - functional form generally unknown
    - unless $x$ discrete (and low dimension)

approximate $E(y \mid x)$ with $x^T\beta$
  - linear approximation, thus a linear predictor

1. select $\beta$ to form "best" linear predictor of $y$: $P(y \mid x)$
2. select $\beta$ to form "best" linear approximation to $E(y \mid x)$

1 and 2 yield identical $\beta$
  - either criterion could be used to define $\beta$
  - we use 1 and refer to $x^T\beta$ as the best linear predictor


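Best Linear Predictor Coefficient

1. select $\beta$ to minimize mean-square prediction error

$$S(\beta) = E\left(y - x^T\beta\right)^2$$

$\beta := \beta_{lpc}$ satisfies

$$E(xx^T)\,\beta_{lpc} = E(xy)$$

(see Best Linear Predictor Coefficient Solution)

2. select $\beta$ to minimize mean-square approximation error

$$d(\beta) = E_x\left(E(y \mid x) - x^T\beta\right)^2$$

solution satisfies

$$E(xx^T)\,\beta_{lac} = E(xy)$$

(see Best Linear Approximation Coefficient Solution)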
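
The equivalence of the two criteria can be checked numerically. The following is a minimal sketch, assuming a hypothetical data-generating process that is not part of the slides: $x_1$ is uniform on $(0, 2)$, the conditional mean is $m(x) = x_1^2$, and sample averages stand in for the population moments $E(xx^T)$, $E(xy)$ and $E(x\,m(x))$. Under this assumed design the population projection coefficients are $-2/3$ (intercept) and $2$ (slope).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical design (an assumption for illustration): nonlinear CEF m(x) = x1**2
x1 = rng.uniform(0.0, 2.0, size=n)
m = x1 ** 2                               # conditional mean E(y | x)
y = m + rng.normal(size=n)                # y = m(x) + e with E(e | x) = 0
X = np.column_stack([np.ones(n), x1])     # x = (1, x1)^T

Qxx = X.T @ X / n                         # sample analog of E(x x^T)
Qxy = X.T @ y / n                         # sample analog of E(x y)
Qxm = X.T @ m / n                         # sample analog of E(x m(x))

beta_lpc = np.linalg.solve(Qxx, Qxy)      # criterion 1: best linear predictor of y
beta_lac = np.linalg.solve(Qxx, Qxm)      # criterion 2: best linear approximation to m(x)

print("beta_lpc:", beta_lpc)
print("beta_lac:", beta_lac)
```

The two coefficient vectors differ only by sampling noise, because $E(x\,m(x)) = E(xy)$ in the population.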


Identification

Identification (General)
  - $\theta$ and $\theta'$ are separately identified iff $\theta \neq \theta' \Rightarrow P_\theta \neq P_{\theta'}$

(see Identification - Background)

Identification (Best Linear Predictor)
  - $\beta$ and $\beta'$ are separately identified iff $[E(xx^T)]^{-1} E(xy)$ from $P_\beta$ does not equal $[E(xx^T)]^{-1} E(xy)$ from $P_{\beta'}$
  - i.e. there is a unique solution to $\beta_{lpc} = [E(xx^T)]^{-1} E(xy)$
  - i.e. $E(xx^T)$ is invertible


Identification 2

Can we uniquely determine $\beta_{lpc}$?

$$E(xx^T)\,\beta_{lpc} = E(xy)$$

if $E(xx^T)$ is invertible
  - there is a unique value of $\beta_{lpc}$ that solves the equation
    - $\beta_{lpc}$ is identified as there is a unique solution

if $E(xx^T)$ is not invertible
  - there are multiple values of $\beta_{lpc}$ that solve the equation
    - $\beta_{lpc}$ is not identified as there is not a unique solution
    - mathematically $\beta_{lpc} = [E(xx^T)]^{-} E(xy)$ (see Generalized Inverse)
  - all solutions yield an equivalent best linear predictor $x^T\beta_{lpc}$
    - best linear predictor is identified
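
The non-invertible case can be illustrated with a small simulation. This is a sketch under assumptions that are not in the slides: a hypothetical design with an intercept, a dummy `d`, and the redundant regressor `1 - d`, with sample moments standing in for population moments. The tolerances passed to `matrix_rank` and `pinv` are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000

# Hypothetical design with a redundant regressor: column 3 = column 1 - column 2
d = rng.integers(0, 2, size=n).astype(float)
X = np.column_stack([np.ones(n), d, 1.0 - d])
y = 1.0 + 2.0 * d + rng.normal(size=n)

Qxx = X.T @ X / n                               # sample analog of E(x x^T), singular here
Qxy = X.T @ y / n                               # sample analog of E(x y)

rank = np.linalg.matrix_rank(Qxx, tol=1e-10)    # rank 2 < 3, so Qxx is not invertible

# One solution of Qxx b = Qxy via the Moore-Penrose generalized inverse
beta_a = np.linalg.pinv(Qxx, rcond=1e-10) @ Qxy

# A second solution: shift along the null direction (1, -1, -1), since 1 - d - (1 - d) = 0
null_direction = np.array([1.0, -1.0, -1.0])
beta_b = beta_a + 3.0 * null_direction

pred_a = X @ beta_a
pred_b = X @ beta_b
print("rank:", rank)
print("coefficients differ:", not np.allclose(beta_a, beta_b))
print("predictors agree:", np.allclose(pred_a, pred_b))
```

Both coefficient vectors solve the moment equation, yet they produce the same fitted values, which is the sense in which the best linear predictor is identified even when the coefficient is not.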


Invertibility

Required assumption: $E(xx^T)$ is positive definite

for any non-zero $\alpha \in \mathbb{R}^k$:

$$\alpha^T E(xx^T)\,\alpha = E\left(\alpha^T x x^T \alpha\right) = E\left(\alpha^T x\right)^2 \geq 0$$

so $E(xx^T)$ is positive semi-definite by construction

positive semi-definite matrices are invertible if and only if they are positive definite

if we assume $E(xx^T)$ is positive definite, then
  - $E\left(\alpha^T x\right)^2 > 0$
  - there is no non-zero $\alpha$ for which $\alpha^T x = 0$
    - implies there are no redundant variables in $x$
    - i.e. all columns are linearly independent


Best Linear Predictor: Error

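best linear predictor (linear projection)

$$P(y \mid x) = x^T\beta_{lpc}$$

decomposition

$$y = x^T\beta_{lpc} + u, \qquad u = e + \left(E(y \mid x) - x^T\beta_{lpc}\right)$$

where $e = y - E(y \mid x)$ is the CEF error

choice of $\beta_{lpc}$ implies $E(xu) = 0$
  - $E(xu) = E\left(x\left(y - x^T\beta_{lpc}\right)\right) = E(xy) - E(xx^T)\,[E(xx^T)]^{-1} E(xy) = 0$
  - error from projection onto $x$ is orthogonal to $x$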
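
The decomposition and the orthogonality condition can be verified on simulated data. This is a minimal sketch under an assumed design (standard normal $x_1$, conditional mean $\exp(x_1/2)$), neither of which comes from the slides; sample averages replace expectations.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000

# Hypothetical design (assumption): nonlinear CEF m(x) = exp(x1 / 2)
x1 = rng.normal(size=n)
m = np.exp(x1 / 2.0)
e = rng.normal(size=n)                    # CEF error, E(e | x) = 0
y = m + e
X = np.column_stack([np.ones(n), x1])

beta = np.linalg.solve(X.T @ X / n, X.T @ y / n)   # sample analog of beta_lpc
u = y - X @ beta                                   # projection error
approx_err = m - X @ beta                          # E(y | x) - x^T beta

moment = X.T @ u / n                   # sample analog of E(x u): zero by construction
second_order = np.mean(u * x1 ** 2)    # sample analog of E(u * x1^2): need not be zero

print("E(xu) analog:", moment)
print("E(u x1^2) analog:", second_order)
```

The first moment vanishes to machine precision, while `second_order` does not, which is consistent with $u$ being uncorrelated with $x$ without being mean independent of $x$.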


Best Linear Predictor: Error Variance

Variance of $u$ equals the variance of the error from a linear projection

Variance of $u$
  - $E u^2 = E\left(y - x^T\beta\right)^2 = E y^2 - E(yx^T)\,\beta$
    - because $E\left(x^T\beta\right)^2 = E(yx^T)\,\beta$

Variance of projection error
  - projection error is defined as $\|u\| = \|y\| - \|x^T\beta\|$
  - because $y^2 = \|y\|^2$, $\mathrm{Var}(u) = \mathrm{Var}(\|u\|)$
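
The intermediate steps behind the expression for $E u^2$ can be written out as follows, using only the moment equation $E(xx^T)\,\beta = E(xy)$ that defines the projection coefficient:

```latex
\begin{align*}
E u^2 &= E\left(y - x^T\beta\right)^2 \\
      &= E y^2 - 2\,E(yx^T)\,\beta + \beta^T E(xx^T)\,\beta \\
      &= E y^2 - 2\,E(yx^T)\,\beta + \beta^T E(xy)
         && \text{since } E(xx^T)\,\beta = E(xy) \\
      &= E y^2 - E(yx^T)\,\beta
         && \text{since } \beta^T E(xy) = E(yx^T)\,\beta .
\end{align*}
```

The third line is also where $E\left(x^T\beta\right)^2 = \beta^T E(xx^T)\,\beta = E(yx^T)\,\beta$ comes from.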


Best Linear Predictor: Covariate Error Correlation

$E(xu) = 0$ is a set of $k$ equations, as

$$E(x_j u) = 0$$

  - if $x$ includes an intercept, $E u = 0$

because $\mathrm{Cov}(x_j, u) = E(x_j u) - E x_j \cdot E u$
  - covariates are uncorrelated with $u$ by construction

for $r \geq 2$, if $E|y|^r < \infty$ and $E\|x\|^r < \infty$ then $E|u|^r < \infty$
  - if $y$ and $x$ have finite second moments then the variance of $u$ exists
  - note: $E|y|^r < \infty \Rightarrow E|y|^s < \infty$ for all $s \leq r$ (Liapunov's Inequality)


Linear Projection Model

linear projection model is

$$y = x^T\beta + u, \qquad E(xu) = 0, \qquad \beta = [E(xx^T)]^{-1} E(xy)$$

$x^T\beta$ is the best linear predictor
  - not necessarily the conditional mean $E(y \mid x)$

$\beta$ is the linear prediction coefficient
  - not the conditional mean coefficient if $E(y \mid x) \neq x^T\beta$
  - not a causal (structural) effect if:
    - $E(y \mid x) \neq x^T\beta$
    - $E(y \mid x) = x^T\beta$ but $\nabla_x e \neq 0$


How Does the Linear Projection Differ from the CEF?
Example 1

CEF of log(wage) as a function of $x$ (black and female indicators)

discrete covariates, small number of values, compute CEF

$$E(\log(\text{wage}) \mid x) = -.20\,\text{black} - .24\,\text{female} + .10\,\text{inter} + 3.06$$

  - $\text{inter} = \text{black} \cdot \text{female}$
    - 20% male race gap (black males 20% below white males)
    - 10% female race gap

Linear Projection of log(wage) on $x$ (black and female indicators)

$$P(\log(\text{wage}) \mid x) = -.15\,\text{black} - .23\,\text{female} + 3.06$$

  - 15% race gap
    - average race gap across males and females
    - ignores the role of gender in race gap, even though gender is included
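
The gaps quoted in this example follow directly from the two displayed equations. A short sketch of the arithmetic, where the function names `cef` and `projection` are illustrative labels rather than anything defined in the lecture:

```python
def cef(black: int, female: int) -> float:
    """CEF of log(wage) from the slide, including the interaction term."""
    return -0.20 * black - 0.24 * female + 0.10 * black * female + 3.06


def projection(black: int, female: int) -> float:
    """Linear projection of log(wage) on the two indicators (no interaction)."""
    return -0.15 * black - 0.23 * female + 3.06


# Race gap implied by the CEF, separately by gender
male_gap = cef(1, 0) - cef(0, 0)          # coefficient on black
female_gap = cef(1, 1) - cef(0, 1)        # coefficient on black plus interaction

# Race gap implied by the projection: the same for both genders
proj_male_gap = projection(1, 0) - projection(0, 0)
proj_female_gap = projection(1, 1) - projection(0, 1)

print(round(male_gap, 2), round(female_gap, 2))
print(round(proj_male_gap, 2), round(proj_female_gap, 2))
```

The CEF gives gaps of -0.20 for males and -0.10 for females, while the projection reports a single gap of -0.15 for both groups.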


How Does the Linear Projection Differ from the CEF?
Example 2

CEF of white male log(wage) as a function of years of education (ed)

discrete covariate with multiple values
  - could use categorical variables to compute CEF
    - large number of values leads to cumbersome estimation

approximate CEF with linear projections


Approximate CEF of Wage as a Function of Education
Approximation 1

Linear Projection of log(wage) on $x = ed$

$$P(\log(\text{wage}) \mid x) = 0.11\,ed + 1.50$$

  - 11% increase in mean wages for every year of education

works well for $ed \geq 9$, under-predicts if education is lower


Approximate CEF of Wage as a Function of Education
Approximation 2: Linear Spline

Linear Projection of log(wage) on $x = (ed, spline)$

$$P(\log(\text{wage}) \mid x) = 0.02\,ed + 0.10\,spline + 2.30$$

  - $spline = (ed - 9) \cdot 1(ed > 9)$
    - 2% increase in mean wages for each year of education below 9
    - 12% increase in mean wages for each year of education above 9
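
To see how the spline term changes the slope at the knot, the projection can be evaluated directly. This sketch assumes the knot indicator is $1(ed > 9)$, which is what the two slopes reported on the slide imply; the function name is illustrative.

```python
def spline_projection(ed: float) -> float:
    """Linear spline projection of log(wage) on education with a knot at 9 years."""
    spline = max(ed - 9.0, 0.0)            # (ed - 9) * 1(ed > 9)
    return 0.02 * ed + 0.10 * spline + 2.30


# Slope of the projection on each side of the knot
slope_below = spline_projection(5) - spline_projection(4)      # one extra year below 9
slope_above = spline_projection(13) - spline_projection(12)    # one extra year above 9

# The two segments join at the knot, so the projection is continuous there
value_at_knot = spline_projection(9)

print(round(slope_below, 2), round(slope_above, 2), round(value_at_knot, 2))
```

Below the knot the slope is the coefficient on `ed` alone (0.02); above it the slope is the sum of the two coefficients (0.12).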


How Does the Linear Projection Differ from the CEF?
Example 3

CEF of white male (with 12 years of education) log(wage) as a function of years of experience (ex)

discrete covariate with large number of values
  - approximate CEF with linear projections

Linear Projection of log(wage) on $x = ex$
  - $P(\log(\text{wage}) \mid x) = 0.011\,ex + 2.50$

over-predicts wage for young and old


Approximate CEF of Wage as a Function of Experience
Approximation 2: Quadratic Projection

Linear Projection of log(wage) on $x = (ex, ex^2)$

$$P(\log(\text{wage}) \mid x) = 0.046\,ex - 0.001\,ex^2 + 2.30$$

  - $\nabla P = .046 - .001 \cdot ex$
    - captures strong downturn in mean wage for older workers
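
A quick comparison of the linear and quadratic projections shows where they diverge. This sketch plugs in the rounded coefficients exactly as displayed on the slides, so the values are only indicative; the evaluation points 0, 20 and 40 years are arbitrary choices made for illustration.

```python
def linear_projection(ex: float) -> float:
    """Projection of log(wage) on experience only (rounded slide coefficients)."""
    return 0.011 * ex + 2.50


def quadratic_projection(ex: float) -> float:
    """Projection of log(wage) on experience and its square (rounded slide coefficients)."""
    return 0.046 * ex - 0.001 * ex ** 2 + 2.30


# Difference (linear minus quadratic) at low, middle and high experience
gaps = {ex: linear_projection(ex) - quadratic_projection(ex) for ex in (0, 20, 40)}

for ex, gap in gaps.items():
    print(ex, round(gap, 2))
```

The linear projection lies above the quadratic one at both ends of the experience range and below it in the middle, in line with the remark that the linear fit over-predicts for young and old workers.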


Properties of the Linear Projection Model

Assumption 1
  - $E y^2 < \infty$, $E\|x\|^2 < \infty$, $Q_{xx} = E(xx^T)$ is positive definite

Theorem: Under Assumption 1

1. $E(xx^T)$ and $E(xy)$ exist with finite elements
2. The linear projection coefficient exists, is unique, and equals $\beta = [E(xx^T)]^{-1} E(xy)$
3. $P(y \mid x) = x^T [E(xx^T)]^{-1} E(xy)$
4. For $u = y - x^T\beta$, $E(xu) = 0$ and $E(u^2) < \infty$
5. If $x$ contains a constant, $E u = 0$
6. If $E|y|^r < \infty$ and $E\|x\|^r < \infty$ for $r \geq 2$, then $E|u|^r < \infty$

Proof


Review

How do we approximate $E(y \mid x)$?
  - $x^T\beta$

How do you interpret $\beta$?
  - the linear projection coefficient, which is not generally equal to $\nabla_x E(y \mid x)$

What is required for identification of $\beta$?
  - $E(xx^T)$ is invertible

What is the correlation between $x$ and $u$?
  - 0 by construction!


Best Linear Predictor Coefficient Solution

$\beta_{lpc}$ is the value of $\beta$ that minimizes

$$S(\beta) = E y^2 - 2\beta^T E(xy) + \beta^T E(xx^T)\,\beta$$

(see Vector Calculus)

  - first derivative: $-2E(xy) + 2E(xx^T)\,\beta$

solution (linear projection coefficient)

$$E(xx^T)\,\beta_{lpc} = E(xy)$$

required assumption
  - $E y^2 < \infty$, $E\|x\|^2 < \infty$ (see Euclidean Length)

Return to Best Linear Predictor Coefficient


Best Linear Approximation Coefficient Solution

let $m(x) := E(y \mid x)$

$\beta_{lac}$ is the value of $\beta$ that minimizes

$$d(\beta) = \int_{\mathbb{R}^k} \left(m(x) - x^T\beta\right)^2 f_x(x)\,dx$$

$$d(\beta) = E\,m(x)^2 - 2\beta^T E(x\,m(x)) + \beta^T E(xx^T)\,\beta$$

  - first derivative: $-2E(x\,m(x)) + 2E(xx^T)\,\beta$
  - $E(x\,m(x)) = E(x\,E(y \mid x)) = E(E(xy \mid x)) = E(xy)$

solution (linear approximation coefficient)

$$E(xx^T)\,\beta_{lac} = E(xy)$$

Return to Best Linear Predictor Coefficient


Vector Calculus

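vector derivative: inner product
  - $(2 \times 1)$ vectors: $B$ and $C$
  - $B^T C = B_1 C_1 + B_2 C_2$
  - $\dfrac{\partial B^T C}{\partial B} = \begin{bmatrix} \dfrac{\partial B^T C}{\partial B_1} \\ \dfrac{\partial B^T C}{\partial B_2} \end{bmatrix} = \begin{bmatrix} C_1 \\ C_2 \end{bmatrix} = C$

vector derivative: quadratic form
  - $(2 \times 2)$ matrix: $D$
  - $B^T D B = B_1^2 D_{11} + B_1 B_2 D_{12} + B_1 B_2 D_{21} + B_2^2 D_{22}$
  - $\dfrac{\partial B^T D B}{\partial B} = \begin{bmatrix} (D_{11} + D_{11}) B_1 + (D_{12} + D_{21}) B_2 \\ (D_{21} + D_{12}) B_1 + (D_{22} + D_{22}) B_2 \end{bmatrix} = \left(D + D^T\right) B$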
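
The quadratic-form rule can be checked against a finite-difference gradient. A minimal sketch, where the matrix `D`, the vector `B` and the helper `numerical_gradient` are illustrative choices; `D` is deliberately non-symmetric so that $D + D^T$ differs from $2D$.

```python
import numpy as np


def quadratic_form(B: np.ndarray, D: np.ndarray) -> float:
    """Evaluate B^T D B."""
    return float(B @ D @ B)


def numerical_gradient(f, B: np.ndarray, h: float = 1e-6) -> np.ndarray:
    """Central-difference approximation to the gradient of f at B."""
    grad = np.zeros_like(B)
    for i in range(B.size):
        step = np.zeros_like(B)
        step[i] = h
        grad[i] = (f(B + step) - f(B - step)) / (2.0 * h)
    return grad


D = np.array([[2.0, 1.0],
              [-3.0, 4.0]])               # non-symmetric on purpose
B = np.array([0.5, -1.5])

analytic = (D + D.T) @ B                  # rule from the slide
numeric = numerical_gradient(lambda b: quadratic_form(b, D), B)

print("analytic:", analytic)
print("numeric: ", numeric)
```

When $D$ is symmetric, as $E(xx^T)$ is, the rule reduces to $2DB$, which is the form used in the first derivative of $S(\beta)$.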

Return to Solution


Euclidean Length

Pythagorean Theorem
  - $a^2 + b^2 = c^2$ so the length of the hypotenuse is $c = \left(a^2 + b^2\right)^{1/2}$

$c$ is the length of a vector of dimension 2, so for $x$ a vector of dimension $n$
  - the Euclidean length (norm) is $\|x\| = \left(x_1^2 + x_2^2 + \cdots + x_n^2\right)^{1/2}$

therefore
  - $E\|x\|^2 = E\left(x_1^2 + x_2^2 + \cdots + x_n^2\right)$
  - $B^T D B = B_1^2 D_{11} + B_1 B_2 D_{12} + B_1 B_2 D_{21} + B_2^2 D_{22}$

Return Again to Solution


Identification - Background

identification is important in structural econometric modeling
  - $F$: distribution of observed data (for example $(y, x)$)
  - $\mathcal{F}$: a collection of distributions $F$
  - $\theta$: a parameter of interest (for example $E y$)
    - identification means that a parameter is uniquely determined by the distribution of the observed variables

Definition
A parameter $\theta \in \mathbb{R}$ is identified on $\mathcal{F}$ if for all $F \in \mathcal{F}$ there is a uniquely determined value of $\theta$.

equivalently, $\theta$ is identified if we can write out a mapping $\theta = g(F)$ on the set $\mathcal{F}$
  - restriction to $\mathcal{F}$ is important
    - most parameters are identified only on a strict subset of the space of all distributions


Identification - Moments of Observed Data

consider identification of the mean $\mu = E y$
  - $\mu$ is uniquely determined if $E|y| < \infty$
    - $\mu$ is identified for the set $\mathcal{F} = \left\{F : \int_{-\infty}^{\infty} |y|\,dF(y) < \infty\right\}$

identification of the conditional mean

Theorem: If $E|y| < \infty$, the conditional mean $m(x) = E(y \mid x)$ is identified almost everywhere.

generally, moments of observed data are identified as long as we exclude degenerate cases


Identification - More Complicated Models

consider the context of censoring
  - $y$ is a random variable with distribution $F$
  - we observe $y^*$ defined by the censoring rule

$$y^* = \begin{cases} y & \text{if } y \leq \tau \\ \tau & \text{if } y > \tau \end{cases}$$

    - applies to income surveys, where incomes above the top code are recorded as equal to the top code ("top coded" data)

observed variable $y^*$ has distribution

$$F^*(u) = \begin{cases} F(u) & \text{if } u < \tau \\ 1 & \text{if } u \geq \tau \end{cases}$$

we are interested in the features of $F$ not the censored distribution $F^*$
  - we cannot calculate $\mu = E y$ from $F^*$ except in the trivial case where there is no censoring, $P(y \geq \tau) = 0$
    - $\mu$ is not generically identified from $F^*$


Assumptions to Restore Identification

parametric identification
  - assume a parametric distribution ($y \sim N(\mu, \sigma^2)$)
    - so $\mathcal{F}$ is the set of normal distributions
    - can show that $(\mu, \sigma^2)$ are identified for all $F \in \mathcal{F}$
  - not ideal: identification achieved only through use of an arbitrary and unverifiable parametric assumption

nonparametric identification
  - quantiles $q_\alpha$ of $F$, for $\alpha \leq P(y \leq \tau)$, are identified
    - if 20% of the distribution is censored, can identify all quantiles for $\alpha \in (0, 0.8)$

study of identification focuses attention on what can be learned from the data distributions available
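
The contrast between the mean and the lower quantiles under top-coding can be seen in a small simulation. This is a sketch under assumptions not in the slides: a lognormal latent variable, and a top code `tau` set at the sample 80th percentile so that 20% of observations are censored.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical latent variable (assumption): lognormal "income"
y = rng.lognormal(mean=0.0, sigma=1.0, size=100_000)

tau = np.quantile(y, 0.8)                 # top code: 20% of the sample lies above it
y_star = np.minimum(y, tau)               # censoring rule: y* = y if y <= tau, else tau

# The mean of the censored data understates the mean of the latent variable
mean_latent = y.mean()
mean_censored = y_star.mean()

# Quantiles below the censoring point are unaffected by censoring
alphas = [0.1, 0.25, 0.5, 0.75]
q_latent = np.quantile(y, alphas)
q_censored = np.quantile(y_star, alphas)

print("means:", mean_latent, mean_censored)
print("quantiles match:", np.allclose(q_latent, q_censored))
```

The censored sample reproduces every quantile below the top code but not the mean, which mirrors the nonparametric identification result for $\alpha \in (0, 0.8)$.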

Return to General Identification


Generalized Inverse

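for any matrix $A$
  - $A^{-}$ (Moore-Penrose generalized inverse) exists and is unique

$A^{-}$ satisfies
  - $A A^{-} A = A$
  - $A^{-} A A^{-} = A^{-}$
  - $A A^{-}$ and $A^{-} A$ are symmetric

example, if $A_{11}^{-1}$ exists and $A = \begin{bmatrix} A_{11} & 0 \\ 0 & 0 \end{bmatrix}$ then $A^{-} = \begin{bmatrix} A_{11}^{-1} & 0 \\ 0 & 0 \end{bmatrix}$

Return to Identification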
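
The defining properties and the block example can be checked with NumPy's `numpy.linalg.pinv`, which computes the Moore-Penrose inverse. In this sketch the particular block `A11` is an arbitrary invertible matrix chosen for illustration, and the cutoff `rcond=1e-10` is an illustrative tolerance for treating singular values as zero.

```python
import numpy as np

# Illustrative invertible block (an arbitrary choice)
A11 = np.array([[2.0, 1.0],
                [1.0, 3.0]])

# A = [[A11, 0], [0, 0]] is singular
A = np.zeros((3, 3))
A[:2, :2] = A11

A_minus = np.linalg.pinv(A, rcond=1e-10)  # Moore-Penrose generalized inverse

# Closed form from the slide: [[A11^{-1}, 0], [0, 0]]
expected = np.zeros((3, 3))
expected[:2, :2] = np.linalg.inv(A11)

print("A A^- A = A:      ", np.allclose(A @ A_minus @ A, A))
print("A^- A A^- = A^-:  ", np.allclose(A_minus @ A @ A_minus, A_minus))
print("A A^- symmetric:  ", np.allclose(A @ A_minus, (A @ A_minus).T))
print("A^- A symmetric:  ", np.allclose(A_minus @ A, (A_minus @ A).T))
print("matches block form:", np.allclose(A_minus, expected))
```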


Proof of Theorem 1

1. $\left\|E(xx^T)\right\| \leq E\left\|xx^T\right\|$ (Expectation Inequality)

   $E\left\|xx^T\right\| = E\|x\|^2 < \infty$ (Assumption 1)

Return to Properties of the LPM
