
  • Best Linear Prediction
    Econometrics II

    Douglas G. Steigerwald

    UC Santa Barbara

  • Overview
    Reference: B. Hansen, Econometrics, Sections 2.18-2.19, 2.24, 2.33

    How do we approximate E(y|x)?

    - if x is continuous, E(y|x) is generally unknown
      - use the linear approximation x^T β
        - β is the linear predictor (or projection) coefficient (β_lpc)
        - β is not ∇_x E(y|x)
      - β is identified if E(xx^T) is invertible
    - the linear prediction error u is uncorrelated with x by construction

  • Approximate the CEF

    conditional mean E(y|x)
    - the "best" predictor (minimum mean-squared prediction error)
    - functional form generally unknown
      - unless x is discrete (and low-dimensional)

    approximate E(y|x) with x^T β
    - a linear approximation, hence a linear predictor

    1. select β to form the "best" linear predictor of y: P(y|x)
    2. select β to form the "best" linear approximation to E(y|x)

    1 and 2 yield identical β (see the sketch below)
    - either criterion could be used to define β
    - we use 1 and refer to x^T β as the best linear predictor
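    Why the two criteria agree: by iterated expectations, E(xy) = E(x E(y|x)), so both minimizations share the same normal equations. Below is a minimal numpy sketch of this point, assuming a discrete covariate and a made-up nonlinear CEF so population moments can be computed exactly (the design is illustrative, not from the slides):

    ```python
    import numpy as np

    # Discrete x with a known distribution, so population moments are exact.
    xs = np.array([0.0, 1.0, 2.0, 3.0])
    p = np.full(4, 0.25)                   # P(x1 = xs[i])
    m = np.exp(xs)                         # assumed nonlinear CEF E(y|x1)

    X = np.column_stack([np.ones(4), xs])  # regressors (1, x1)
    Qxx = X.T @ (p[:, None] * X)           # E(x x^T)

    # Criterion 1 needs E(xy); by iterated expectations E(xy) = E(x E(y|x)),
    # which is exactly the moment that criterion 2 uses.
    Qxy = X.T @ (p * m)

    beta = np.linalg.solve(Qxx, Qxy)
    print(beta)                            # identical beta under either criterion
    ```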

  • Best Linear Predictor Coefficient

    1. select β to minimize the mean-square prediction error

       S(β) = E(y - x^T β)^2

       the minimizer β := β_lpc satisfies E(xx^T) β_lpc = E(xy)

    2. select β to minimize the mean-square approximation error

       d(β) = E_x [E(y|x) - x^T β]^2

       the solution satisfies E(xx^T) β_lac = E(xy)

    a numerical sketch of the solution follows below
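    To make the normal equations concrete, here is a minimal numpy sketch (the uniform design and exponential CEF are illustrative assumptions, not from the slides) that forms the sample analogues of E(xx^T) and E(xy) and solves for β_lpc:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000

    # Simulated covariate with a nonlinear CEF, so the linear predictor
    # is an approximation to E(y|x), not the CEF itself.
    x1 = rng.uniform(0, 2, n)
    y = np.exp(x1) + rng.normal(size=n)    # E(y|x1) = exp(x1)
    X = np.column_stack([np.ones(n), x1])  # include an intercept

    # Sample analogues of E(xx^T) and E(xy)
    Qxx = X.T @ X / n
    Qxy = X.T @ y / n

    # beta_lpc solves E(xx^T) beta = E(xy)
    beta_lpc = np.linalg.solve(Qxx, Qxy)
    print(beta_lpc)
    ```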

  • Identification

    Identification (general)
    - θ and θ' are separately identified iff θ ≠ θ' implies P_θ ≠ P_θ'

    Identification (best linear predictor)
    - β and β' are separately identified iff (E(xx^T))^{-1} E(xy) from P_β does not equal (E(xx^T))^{-1} E(xy) from P_β'
    - i.e. there is a unique solution to β_lpc = (E(xx^T))^{-1} E(xy)
    - i.e. E(xx^T) is invertible

  • Identification 2

    Can we uniquely determine β_lpc?

    E(xx^T) β_lpc = E(xy)

    if E(xx^T) is invertible
    - there is a unique value of β_lpc that solves the equation
      - β_lpc is identified, as there is a unique solution

    if E(xx^T) is not invertible
    - there are multiple values of β_lpc that solve the equation
      - β_lpc is not identified, as there is not a unique solution
      - mathematically, β_lpc = (E(xx^T))^- E(xy), where (·)^- is a generalized inverse
    - all solutions yield an equivalent best linear predictor x^T β_lpc (see the sketch below)
      - the best linear predictor itself is identified
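    A sketch of the non-invertible case, under an assumed design with an exactly redundant regressor: the coefficient is not unique (a generalized inverse picks one of many solutions), yet every solution yields the same fitted best linear predictor.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    n = 10_000
    x1 = rng.normal(size=n)
    x2 = 2 * x1                                # redundant: exact collinearity
    y = 1 + x1 + rng.normal(size=n)

    X = np.column_stack([np.ones(n), x1, x2])
    Qxx = X.T @ X / n
    print(np.linalg.matrix_rank(Qxx))          # 2 < 3: moment matrix not invertible

    # One solution, via the Moore-Penrose generalized inverse
    beta_ginv = np.linalg.pinv(Qxx) @ (X.T @ y / n)

    # Another solution: drop the redundant column and pad with a zero
    beta_drop = np.zeros(3)
    beta_drop[:2] = np.linalg.solve(X[:, :2].T @ X[:, :2], X[:, :2].T @ y)

    # Coefficients differ, but the fitted predictors coincide
    print(beta_ginv, beta_drop)
    print(np.allclose(X @ beta_ginv, X @ beta_drop))   # True
    ```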

  • Invertibility

    required assumption: E(xx^T) is positive definite

    for any non-zero α ∈ R^k:

    α^T E(xx^T) α = E(α^T x x^T α) = E(α^T x)^2 ≥ 0

    so E(xx^T) is positive semi-definite by construction

    positive semi-definite matrices are invertible iff they are positive definite

    if we assume E(xx^T) is positive definite, then
    - E(α^T x)^2 > 0 for all non-zero α
    - there is no non-zero α for which α^T x = 0
      - implies there are no redundant variables in x
      - i.e. all columns are linearly independent

    a numerical check follows below
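    A quick numerical check of the argument, on an assumed simulated design: the quadratic form α^T E(xx^T) α equals the mean of the squared scalar α^T x, and positive definiteness shows up as strictly positive eigenvalues of the sample moment matrix.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    n = 50_000
    X = np.column_stack([np.ones(n), rng.normal(size=n), rng.uniform(size=n)])
    Qxx = X.T @ X / n

    # The quadratic form equals the mean of the squared scalar a^T x ...
    a = rng.normal(size=3)
    print(np.isclose(a @ Qxx @ a, np.mean((X @ a) ** 2)))   # True: PSD by construction

    # ... and positive definiteness <=> all eigenvalues strictly positive
    print(np.linalg.eigvalsh(Qxx).min() > 0)                # True: no redundant columns
    ```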

  • Best Linear Predictor: Error

    best linear predictor (linear projection): P(y|x) = x^T β_lpc

    decomposition

    y = x^T β_lpc + u,  with u = e + [E(y|x) - x^T β_lpc]

    where e = y - E(y|x) is the CEF error

    the choice of β_lpc implies E(xu) = 0
    - E(xu) = E[x(y - x^T β_lpc)] = E(xy) - E(xx^T) (E(xx^T))^{-1} E(xy) = 0
    - the error from the projection onto x is orthogonal to x (verified numerically below)
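    A numerical verification of E(xu) = 0, again on an assumed simulated design with a nonlinear CEF; the sample orthogonality holds to floating-point precision because the fitted coefficient solves the sample normal equations exactly:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    n = 200_000
    x1 = rng.uniform(0, 2, n)
    y = np.exp(x1) + rng.normal(size=n)      # nonlinear CEF, so u != e in general
    X = np.column_stack([np.ones(n), x1])

    beta = np.linalg.solve(X.T @ X, X.T @ y) # sample projection coefficient
    u = y - X @ beta                         # projection error

    print(X.T @ u / n)                       # ~ [0, 0]: error orthogonal to x
    ```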

  • Best Linear Predictor: Error Variance

    the variance of u equals the variance of the error from a linear projection

    variance of u
    - E(u^2) = E(y - x^T β)^2 = E(y^2) - E(yx^T) β
      - because E(x^T β)^2 = E(yx^T) β

    variance of the projection error
    - under the L2 norm ‖a‖ = [E(a^2)]^{1/2}, the projection error satisfies ‖u‖ = ‖y - x^T β‖
    - because E(y^2) = ‖y‖^2, the two variance calculations coincide

    the sketch below checks the identity E(u^2) = E(y^2) - E(yx^T) β
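    A check of the variance identity, with an assumed simulated design; because the sample coefficient satisfies the sample normal equations, the identity holds exactly in the sample analogues:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000
    x1 = rng.normal(size=n)
    y = 1 + x1 + 0.5 * x1**2 + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x1])

    beta = np.linalg.solve(X.T @ X, X.T @ y)
    u = y - X @ beta

    lhs = np.mean(u**2)                      # E(u^2)
    rhs = np.mean(y**2) - (y @ X / n) @ beta # E(y^2) - E(y x^T) beta
    print(np.isclose(lhs, rhs))              # True
    ```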

  • Best Linear Predictor: Covariate-Error Correlation

    E(xu) = 0 is a set of k equations, one per covariate: E(x_j u) = 0
    - if x includes an intercept, E(u) = 0

    because Cov(x_j, u) = E(x_j u) - E(x_j) E(u)
    - the covariates are uncorrelated with u by construction

    for r ≥ 2, if E|y|^r < ∞ and E‖x‖^r < ∞, then E|u|^r < ∞
    - if y and x have finite second moments, then the variance of u exists
    - note: E|y|^r < ∞ implies E|y|^s < ∞ for all s ≤ r (Liapunov's inequality)

  • Linear Projection Model

    the linear projection model is

    y = x^T β + u,  E(xu) = 0,  β = (E(xx^T))^{-1} E(xy)

    x^T β is the best linear predictor
    - not necessarily the conditional mean E(y|x)

    β is the linear prediction coefficient
    - not the conditional mean coefficient if E(y|x) ≠ x^T β
    - not a causal (structural) effect if:
      - E(y|x) ≠ x^T β, or
      - E(y|x) = x^T β but ∇_x e ≠ 0

  • How Does the Linear Projection Differ from the CEF? Example 1

    CEF of log(wage) as a function of x (black and female indicators)

    discrete covariates with a small number of values, so the CEF can be computed directly

    E(log(wage)|x) = -0.20 black - 0.24 female + 0.10 inter + 3.06

    - inter = black × female
      - 20% male race gap (black males earn 20% less than white males)
      - 10% female race gap

    linear projection of log(wage) on x (black and female indicators)

    P(log(wage)|x) = -0.15 black - 0.23 female + 3.06

    - 15% race gap
      - the average race gap across males and females
      - ignores the role of gender in the race gap, even though gender is included

    the sketch below reproduces these mechanics with simulated data
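    A saturated regression on the indicators and their interaction recovers the CEF, while dropping the interaction gives the linear projection. The data-generating coefficients below are borrowed from the slide's CEF, but the sample (and the group shares) are my own assumptions, so the projection coefficients only approximate the slide's values.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    n = 200_000
    black = rng.binomial(1, 0.3, n)
    female = rng.binomial(1, 0.5, n)

    # Simulated log wages with the slide's CEF coefficients plus noise
    logw = (3.06 - 0.20*black - 0.24*female + 0.10*black*female
            + rng.normal(scale=0.5, size=n))

    # Saturated regression: recovers the CEF (discrete, low-dimensional x)
    X_cef = np.column_stack([np.ones(n), black, female, black*female])
    print(np.linalg.solve(X_cef.T @ X_cef, X_cef.T @ logw))

    # Projection without the interaction: averages the race gap over genders,
    # e.g. coefficient on black ~ -0.20 + 0.10 * P(female) = -0.15 here
    X_lp = np.column_stack([np.ones(n), black, female])
    print(np.linalg.solve(X_lp.T @ X_lp, X_lp.T @ logw))
    ```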

  • How Does the Linear Projection Differ from the CEF? Example 2

    CEF of white male log(wage) as a function of years of education (ed)

    a discrete covariate with multiple values
    - could use categorical variables to compute the CEF
      - a large number of values leads to cumbersome estimation

    instead, approximate the CEF with linear projections

  • Approximate CEF of Wage as a Function of Education: Approximation 1

    linear projection of log(wage) on x = ed

    P(log(wage)|x) = 0.11 ed + 1.50

    - an 11% increase in mean wages for every year of education

    works well for ed ≥ 9; under-predicts when education is lower

  • Approximate CEF of Wage as a Function of Education: Approximation 2, Linear Spline

    linear projection of log(wage) on x = (ed, spline)

    P(log(wage)|x) = 0.02 ed + 0.10 spline + 2.30

    - spline = (ed - 9) × 1(ed > 9)
      - 2% increase in mean wages for each year of education below 9
      - 12% increase in mean wages for each year of education above 9

    the sketch below shows the spline construction
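    A sketch of the spline construction. The data are simulated with the slide's fitted coefficients as the true mean profile (an assumption made so the recovered fit is easy to read):

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000
    ed = rng.integers(4, 21, n).astype(float)

    # Spline regressor: (ed - 9) beyond the knot at 9 years, else 0
    spline = np.where(ed > 9, ed - 9, 0.0)

    # Kinked mean profile: slope 0.02 below the knot, 0.02 + 0.10 above
    logw = 2.30 + 0.02*ed + 0.10*spline + rng.normal(scale=0.4, size=n)

    X = np.column_stack([np.ones(n), ed, spline])
    b = np.linalg.solve(X.T @ X, X.T @ logw)
    print(b)                 # ~ [2.30, 0.02, 0.10]; slope above knot: b[1] + b[2]
    ```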

  • How Does the Linear Projection Differ from the CEF? Example 3

    CEF of white male (with 12 years of education) log(wage) as a function of years of experience (ex)

    a discrete covariate with a large number of values
    - approximate the CEF with linear projections

    linear projection of log(wage) on x = ex
    - P(log(wage)|x) = 0.011 ex + 2.50

    over-predicts wages for the youngest and oldest workers

  • Approximate CEF of Wage as a Function of Experience: Approximation 2, Quadratic Projection

    linear projection of log(wage) on x = (ex, ex^2)

    P(log(wage)|x) = 0.046 ex - 0.001 ex^2 + 2.30

    - the projection is still linear in β; ex^2 simply enters as an additional regressor
    - ∇_ex P = 0.046 - 0.002 ex
      - captures the strong downturn in mean wages for older workers (sketch below)
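    A sketch of the quadratic projection, simulated with the slide's fitted coefficients as the true profile; the marginal effect 0.046 - 0.002 ex declines with experience and turns negative near ex = 23:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000
    ex = rng.integers(0, 46, n).astype(float)

    # Concave experience profile with the slide's coefficients
    logw = 2.30 + 0.046*ex - 0.001*ex**2 + rng.normal(scale=0.4, size=n)

    # Quadratic in ex, but still linear in beta
    X = np.column_stack([np.ones(n), ex, ex**2])
    b = np.linalg.solve(X.T @ X, X.T @ logw)

    # Marginal effect: dP/d(ex) = b[1] + 2*b[2]*ex
    for e in (5, 20, 35):
        print(e, b[1] + 2*b[2]*e)
    ```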

  • Properties of the Linear Projection Model

    Assumption 1
    - E(y^2) < ∞
    - E‖x‖^2 < ∞
    - Q_xx = E(xx^T) is positive definite

    Theorem: under Assumption 1,
    1. E(xx^T) and E(xy) exist with finite elements
    2. the linear projection coefficient exists, is unique, and equals β = (E(xx^T))^{-1} E(xy)
    3. P(y|x) = x^T (E(xx^T))^{-1} E(xy)
    4. for u = y - x^T β, E(xu) = 0 and E(u^2) < ∞
    5. if x contains a constant, E(u) = 0
    6. if E|y|^r < ∞ and E‖x‖^r < ∞ for r ≥ 2, then E|u|^r < ∞

  • Review

    How do we approximate E(y|x)?
    - with x^T β

    How do you interpret β?
    - as the linear projection coefficient, which is not generally equal to ∇_x E(y|x)

    What is required for identification of β?
    - E(xx^T) must be invertible

    What is the correlation between x and u?
    - zero, by construction