Best Linear Prediction
Econometrics II
Douglas G. Steigerwald
UC Santa Barbara
D. Steigerwald (UCSB) Linear Prediction 1 / 29
Overview
Reference: B. Hansen, Econometrics, Chapter 2.18-2.19, 2.24, 2.33
How do we approximate $E(y \mid x)$?
if x is continuous, $E(y \mid x)$ is generally unknown
  - linear approximation $x^T\beta$
    - $\beta$ is the linear predictor (or projection) coefficient ($\beta_{lpc}$)
    - $\beta$ is not $\nabla_x E(y \mid x)$
  - $\beta$ is identified if $E(xx^T)$ is invertible
linear prediction error u is uncorrelated with x by construction
Approximate the CEF
conditional mean $E(y \mid x)$
  - "best" predictor (in mean squared prediction error)
  - functional form generally unknown
    - unless x is discrete (and low dimensional)
approximate $E(y \mid x)$ with $x^T\beta$
  - linear approximation, thus a linear predictor
1. select $\beta$ to form the "best" linear predictor of y: $P(y \mid x)$
2. select $\beta$ to form the "best" linear approximation to $E(y \mid x)$
1 and 2 yield identical $\beta$
  - either criterion could be used to define $\beta$
  - we use 1 and refer to $x^T\beta$ as the best linear predictor
Best Linear Predictor Coefficient
1. select $\beta$ to minimize the mean-square prediction error
   $S(\beta) = E\left[(y - x^T\beta)^2\right]$
   $\beta := \beta_{lpc}$ satisfies
   $E(xx^T)\,\beta_{lpc} = E(xy)$ (see Solution)
2. select $\beta$ to minimize the mean-square approximation error
   $d(\beta) = E_x\left[(E(y \mid x) - x^T\beta)^2\right]$
   the solution satisfies
   $E(xx^T)\,\beta_{lac} = E(xy)$ (see Solution)
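A small numerical sketch of the claim that criteria 1 and 2 yield the same coefficient. It assumes a hypothetical discrete population (the probabilities and values are invented for illustration, not taken from any data in these slides) with x = (1, z) and a CEF that is nonlinear in z; the helper names `solve2`, `cond_mean`, and `moments` are illustrative.

```python
# Hypothetical discrete population, invented for illustration: (probability, z, y).
# The regressor vector is x = (1, z); the CEF m(z) = E(y|z) is nonlinear in z.
POP = [
    (0.10, 0.0, 1.0), (0.10, 0.0, 3.0),   # m(0) = 2
    (0.15, 1.0, 2.0), (0.15, 1.0, 4.0),   # m(1) = 3
    (0.25, 2.0, 5.0), (0.25, 2.0, 9.0),   # m(2) = 7
]


def solve2(A, b):
    """Solve the 2x2 system A beta = b by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - b[0] * A[1][0]) / det]


def cond_mean(pop):
    """m(z) = E(y | z) computed from the joint probabilities."""
    mass, total = {}, {}
    for p, z, y in pop:
        mass[z] = mass.get(z, 0.0) + p
        total[z] = total.get(z, 0.0) + p * y
    return {z: total[z] / mass[z] for z in mass}


def moments(pop, target):
    """Return E(x x^T) and E(x * target(z, y)) for x = (1, z)."""
    Q = [[0.0, 0.0], [0.0, 0.0]]
    q = [0.0, 0.0]
    for p, z, y in pop:
        x = (1.0, z)
        t = target(z, y)
        for i in range(2):
            q[i] += p * x[i] * t
            for j in range(2):
                Q[i][j] += p * x[i] * x[j]
    return Q, q


m = cond_mean(POP)
Q, Exy = moments(POP, lambda z, y: y)      # criterion 1: predict y
_, Exm = moments(POP, lambda z, y: m[z])   # criterion 2: approximate m(z)
beta_lpc = solve2(Q, Exy)
beta_lac = solve2(Q, Exm)
print("beta_lpc =", beta_lpc)
print("beta_lac =", beta_lac)
```

Because the population is discrete, every expectation is an exact sum, so the two coefficient vectors agree up to floating-point rounding even though m(z) is not linear in z.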
Identification
Identification (General)
  - $\theta$ and $\theta'$ are separately identified iff $\theta \neq \theta'$ implies $P_\theta \neq P_{\theta'}$
(see Identification - Background)
Identification (Best Linear Predictor)
  - $\beta$ and $\beta'$ are separately identified iff $\left(E(xx^T)\right)^{-1}E(xy)$ computed from $P_\beta$ does not equal $\left(E(xx^T)\right)^{-1}E(xy)$ computed from $P_{\beta'}$
  - i.e. there is a unique solution to $\beta_{lpc} = \left(E(xx^T)\right)^{-1}E(xy)$
  - i.e. $E(xx^T)$ is invertible
Identification 2
Can we uniquely determine $\beta_{lpc}$?
$E(xx^T)\,\beta_{lpc} = E(xy)$
if $E(xx^T)$ is invertible
  - there is a unique value of $\beta_{lpc}$ that solves the equation
    - $\beta_{lpc}$ is identified, as there is a unique solution
if $E(xx^T)$ is not invertible
  - there are multiple values of $\beta_{lpc}$ that solve the equation
    - $\beta_{lpc}$ is not identified, as there is not a unique solution
    - mathematically, one solution is $\beta_{lpc} = \left(E(xx^T)\right)^{-}E(xy)$ (see Generalized Inverse)
  - all solutions yield an equivalent best linear predictor $x^T\beta_{lpc}$
    - the best linear predictor is identified
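To illustrate the non-invertible case, here is a sketch with a hypothetical population in which x = (d, 2d) contains a redundant column. Several different values of β solve E(xx^T)β = E(xy), yet all of them give the same predictions x^Tβ. The population values and the helper names are invented for illustration; the minimum-norm candidate is the solution that the Moore-Penrose inverse selects.

```python
# Hypothetical population, invented for illustration: (probability, d, y).
POP = [(0.2, 1.0, 1.5), (0.5, 2.0, 2.5), (0.3, 3.0, 4.0)]


def features(d):
    """x = (d, 2d): the second regressor is a redundant multiple of the first."""
    return (d, 2.0 * d)


# Population moments E(x x^T) and E(x y).
Q = [[0.0, 0.0], [0.0, 0.0]]
Exy = [0.0, 0.0]
for p, d, y in POP:
    x = features(d)
    for i in range(2):
        Exy[i] += p * x[i] * y
        for j in range(2):
            Q[i][j] += p * x[i] * x[j]

det = Q[0][0] * Q[1][1] - Q[0][1] * Q[1][0]   # zero: E(x x^T) is singular

# Every beta with beta_1 + 2 * beta_2 = c solves the normal equations.
c = Exy[0] / Q[0][0]
candidates = [
    (c, 0.0),
    (0.0, c / 2.0),
    (c / 5.0, 2.0 * c / 5.0),   # minimum-norm solution (the Moore-Penrose choice)
]


def solves_normal_equations(beta, tol=1e-9):
    return all(abs(Q[i][0] * beta[0] + Q[i][1] * beta[1] - Exy[i]) < tol for i in range(2))


def predict(beta, d):
    x = features(d)
    return x[0] * beta[0] + x[1] * beta[1]


for beta in candidates:
    preds = [round(predict(beta, d), 10) for _, d, _ in POP]
    print(beta, solves_normal_equations(beta), preds)
```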
Invertibility
Required assumption: $E(xx^T)$ is positive definite
for any non-zero $\alpha \in \mathbb{R}^k$:
$\alpha^T E(xx^T)\,\alpha = E\left(\alpha^T x x^T \alpha\right) = E\left[(\alpha^T x)^2\right] \geq 0$
so $E(xx^T)$ is positive semi-definite by construction
positive semi-definite matrices are invertible iff they are positive definite
if we assume $E(xx^T)$ is positive definite, then
  - $E\left[(\alpha^T x)^2\right] > 0$
  - there is no non-zero $\alpha$ for which $\alpha^T x = 0$ (with probability one)
    - implies there are no redundant variables in x
    - i.e. all columns are linearly independent
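One way to check positive definiteness numerically is to attempt a Cholesky factorization, which succeeds exactly when a symmetric matrix is positive definite (a swapped-in numerical technique, not something the slides use). The sketch assumes a hypothetical distribution for z; adding the regressor 1 + z creates a redundancy, and the α shown makes α^T x = 0 for every z. The pivot tolerance `tol` is an arbitrary choice.

```python
import math

# Hypothetical distribution of z, invented for illustration: (probability, z).
POP = [(0.2, 0.0), (0.3, 1.0), (0.5, 2.0)]


def second_moment(features):
    """E(x x^T) for x = features(z)."""
    k = len(features(0.0))
    Q = [[0.0] * k for _ in range(k)]
    for p, z in POP:
        x = features(z)
        for i in range(k):
            for j in range(k):
                Q[i][j] += p * x[i] * x[j]
    return Q


def is_positive_definite(Q, tol=1e-10):
    """Attempt a Cholesky factorization; a non-positive pivot means Q is not positive definite."""
    n = len(Q)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = Q[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= tol:
                    return False
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return True


Q_ok = second_moment(lambda z: (1.0, z))             # no redundant regressor
Q_bad = second_moment(lambda z: (1.0, z, 1.0 + z))   # third column = first + second

alpha = (1.0, 1.0, -1.0)   # alpha^T x = 1 + z - (1 + z) = 0 for every z
quad = sum(alpha[i] * Q_bad[i][j] * alpha[j] for i in range(3) for j in range(3))

print(is_positive_definite(Q_ok), is_positive_definite(Q_bad), quad)
```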
Best Linear Predictor: Error
best linear predictor (linear projection)
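$P(y \mid x) = x^T\beta_{lpc}$
decomposition
$y = x^T\beta_{lpc} + u, \qquad u = e + \left(E(y \mid x) - x^T\beta_{lpc}\right)$, where $e = y - E(y \mid x)$ is the CEF error
choice of $\beta_{lpc}$ implies $E(xu) = 0$
  - $E(xu) = E\left[x\left(y - x^T\beta_{lpc}\right)\right] = E(xy) - E(xx^T)\left(E(xx^T)\right)^{-1}E(xy) = 0$
  - error from projection onto x is orthogonal to x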
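A numerical sketch of the orthogonality result, assuming a hypothetical discrete population with x = (1, z) and a nonlinear CEF (values invented for illustration). It also illustrates the decomposition of u: E(xu) = 0 holds, but E(u|z) is not zero, because the term E(y|x) - x^T β_lpc does not vanish when the CEF is nonlinear.

```python
# Hypothetical population, invented for illustration: (probability, z, y); x = (1, z).
# The CEF is nonlinear: E(y|z=0) = 1, E(y|z=1) = 1, E(y|z=2) = 5.
POP = [(0.2, 0.0, 0.0), (0.2, 0.0, 2.0), (0.3, 1.0, 1.0), (0.3, 2.0, 5.0)]


def E(f):
    """Population expectation of f(z, y)."""
    return sum(p * f(z, y) for p, z, y in POP)


# beta_lpc for x = (1, z): slope = Cov(z, y)/Var(z), intercept = Ey - slope * Ez.
Ez, Ey = E(lambda z, y: z), E(lambda z, y: y)
slope = (E(lambda z, y: z * y) - Ez * Ey) / (E(lambda z, y: z * z) - Ez ** 2)
intercept = Ey - slope * Ez


def u(z, y):
    """Projection error u = y - x^T beta_lpc."""
    return y - intercept - slope * z


Exu = [E(u), E(lambda z, y: z * u(z, y))]   # E(x u) for x = (1, z)

# E(u | z) = E(y | z) - x^T beta_lpc, which need not be zero.
E_u_given_z = {}
for z0 in sorted({z for _, z, _ in POP}):
    mass = sum(p for p, z, _ in POP if z == z0)
    E_u_given_z[z0] = sum(p * u(z, y) for p, z, y in POP if z == z0) / mass

print("E(xu) =", Exu)
print("E(u|z) =", E_u_given_z)
```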
Best Linear Predictor: Error Variance
Variance of u equals the variance of the error from a linear projection
Variance of u
  - $Eu^2 = E\left[(y - x^T\beta)^2\right] = Ey^2 - E(yx^T)\beta$
    - because $E\left[(x^T\beta)^2\right] = E(yx^T)\beta$
Variance of projection error
  - projection error is defined as $\|u\| = \|y\| - x^T\beta$
  - because $y^2 = \|y\|^2$, $\mathrm{Var}(u) = \mathrm{Var}(\|u\|)$
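A sketch checking the error-variance formula Eu^2 = Ey^2 - E(yx^T)β and the identity E[(x^Tβ)^2] = E(yx^T)β at the projection coefficient, assuming a hypothetical population (values invented for illustration) with x = (1, z).

```python
# Hypothetical population, invented for illustration: (probability, z, y); x = (1, z).
POP = [(0.2, 0.0, 1.0), (0.3, 1.0, 2.5), (0.1, 1.0, 0.5), (0.4, 2.0, 4.0)]


def E(f):
    """Population expectation of f(z, y)."""
    return sum(p * f(z, y) for p, z, y in POP)


# Solve E(x x^T) beta = E(x y) for x = (1, z) by Cramer's rule.
Q = [[E(lambda z, y: 1.0), E(lambda z, y: z)],
     [E(lambda z, y: z), E(lambda z, y: z * z)]]
Exy = [E(lambda z, y: y), E(lambda z, y: z * y)]
det = Q[0][0] * Q[1][1] - Q[0][1] * Q[1][0]
beta = [(Exy[0] * Q[1][1] - Q[0][1] * Exy[1]) / det,
        (Q[0][0] * Exy[1] - Exy[0] * Q[1][0]) / det]


def fit(z):
    """Best linear predictor x^T beta."""
    return beta[0] + beta[1] * z


Eyx_beta = Exy[0] * beta[0] + Exy[1] * beta[1]   # E(y x^T) beta
Eu2_direct = E(lambda z, y: (y - fit(z)) ** 2)   # E u^2 computed directly
Eu2_formula = E(lambda z, y: y * y) - Eyx_beta   # Ey^2 - E(y x^T) beta
E_fit2 = E(lambda z, y: fit(z) ** 2)             # E[(x^T beta)^2]
print(Eu2_direct, Eu2_formula)
print(E_fit2, Eyx_beta)
```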
Best Linear Predictor: Covariate Error Correlation
$E(xu) = 0$ is a set of k equations: $E(x_j u) = 0$ for $j = 1, \dots, k$
  - if x includes an intercept, $Eu = 0$
  - then, because $\mathrm{Cov}(x_j, u) = E(x_j u) - E(x_j)E(u) = 0$, covariates are uncorrelated with u by construction
for $r \geq 2$, if $E|y|^r < \infty$ and $E\|x\|^r < \infty$, then $E|u|^r < \infty$
  - if y and x have finite second moments, then the variance of u exists
  - note: $E|y|^r < \infty \Rightarrow E|y|^s < \infty$ for all $s \leq r$ (Liapunov's inequality)
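The role of the intercept can be seen in a short sketch with a hypothetical population (invented values). Projecting y on z alone still gives E(zu) = 0, but Eu need not be zero; adding an intercept forces Eu = 0, and then Cov(z, u) = 0. The closed form slope = Cov(z, y)/Var(z), intercept = Ey - slope * Ez is the solution of the normal equations for x = (1, z).

```python
# Hypothetical population, invented for illustration: (probability, z, y).
POP = [(0.3, 1.0, 3.0), (0.3, 2.0, 4.0), (0.4, 3.0, 7.0)]


def E(f):
    """Population expectation of f(z, y)."""
    return sum(p * f(z, y) for p, z, y in POP)


Ez, Ey = E(lambda z, y: z), E(lambda z, y: y)
Ezz, Ezy = E(lambda z, y: z * z), E(lambda z, y: z * y)

# (a) x = z only (no intercept): beta = E(zy) / E(z^2).
b = Ezy / Ezz
Eu_no_int = E(lambda z, y: y - b * z)
Ezu_no_int = E(lambda z, y: z * (y - b * z))

# (b) x = (1, z): slope = Cov(z, y) / Var(z), intercept = Ey - slope * Ez.
slope = (Ezy - Ez * Ey) / (Ezz - Ez ** 2)
intercept = Ey - slope * Ez


def u(z, y):
    """Projection error when x includes an intercept."""
    return y - intercept - slope * z


Eu = E(u)
cov_zu = E(lambda z, y: z * u(z, y)) - Ez * Eu

print("no intercept:   E(zu) =", Ezu_no_int, " Eu =", Eu_no_int)
print("with intercept: Eu =", Eu, " Cov(z, u) =", cov_zu)
```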
Linear Projection Model
linear projection model is
$y = x^T\beta + u, \qquad E(xu) = 0, \qquad \beta = \left(E(xx^T)\right)^{-1}E(xy)$
$x^T\beta$ is the best linear predictor
  - not necessarily the conditional mean $E(y \mid x)$
$\beta$ is the linear prediction coefficient
  - not the conditional mean coefficient if $E(y \mid x) \neq x^T\beta$
  - not a causal (structural) effect if:
    - $E(y \mid x) \neq x^T\beta$, or
    - $E(y \mid x) = x^T\beta$ but $\nabla_x e \neq 0$
How Does the Linear Projection Differ from the CEF? Example 1
CEF of log(wage) as a function of x (black and female indicators)
discrete covariates with a small number of values, so the CEF can be computed
$E(\log(wage) \mid x) = -.20\,black - .24\,female + .10\,inter + 3.06$
  - $inter = black \times female$
    - 20% male race gap (black males 20% below white males)
    - 10% female race gap
Linear Projection of log(wage) on x (black and female indicators)
$P(\log(wage) \mid x) = -.15\,black - .23\,female + 3.06$
  - 15% race gap
    - average race gap across males and females
    - ignores the role of gender in the race gap, even though gender is included
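A sketch of why the projection coefficient on black lands between the two CEF race gaps (-.20 for males, -.10 for females). The cell means come from the CEF on this slide, but the cell probabilities are hypothetical (invented, not the data behind the slide), so the output will not reproduce -.15 exactly. The sketch projects the CEF itself, which gives the same coefficients as projecting log(wage) because E(x E(y|x)) = E(xy).

```python
def cef(black, female):
    """CEF from Example 1: -.20 black - .24 female + .10 black*female + 3.06."""
    return 3.06 - 0.20 * black - 0.24 * female + 0.10 * black * female


# Hypothetical cell probabilities P(black, female), invented, NOT the data behind the slide.
PROB = {(0, 0): 0.44, (1, 0): 0.06, (0, 1): 0.43, (1, 1): 0.07}


def solve(A, b):
    """Solve A beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):
        beta[r] = (M[r][n] - sum(M[r][c] * beta[c] for c in range(r + 1, n))) / M[r][r]
    return beta


# Project the CEF on x = (1, black, female): no interaction term in the projection.
Q = [[0.0] * 3 for _ in range(3)]
Exm = [0.0] * 3
for (black, female), p in PROB.items():
    x = (1.0, float(black), float(female))
    for i in range(3):
        Exm[i] += p * x[i] * cef(black, female)
        for j in range(3):
            Q[i][j] += p * x[i] * x[j]

const, b_black, b_female = solve(Q, Exm)
print(f"projection: {b_black:.3f} black + {b_female:.3f} female + {const:.3f}")
```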
How Does the Linear Projection Differ from the CEF? Example 2
CEF of white male log(wage) as a function of years of education (ed)
discrete covariate with multiple values
  - could use categorical variables to compute the CEF
    - a large number of values leads to cumbersome estimation
approximate the CEF with linear projections
Approximate CEF of Wage as a Function of Education
Approximation 1
Linear Projection of log(wage) on x = ed
$P(\log(wage) \mid x) = 0.11\,ed + 1.50$
  - 11% increase in mean wages for every year of education
works well for $ed \geq 9$, underpredicts if education is lower
Approximate CEF of Wage as a Function of Education
Approximation 2: Linear Spline
Linear Projection of log(wage) on x = (ed, spline)
$P(\log(wage) \mid x) = 0.02\,ed + 0.10\,spline + 2.30$
  - $spline = (ed - 9) \times \mathbb{1}(ed > 9)$
    - 2% increase in mean wages for each year of education below 9
    - 12% increase in mean wages for each year of education above 9
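The spline regressor and the implied slopes can be computed directly from the fitted coefficients. This sketch assumes the knot is at ed = 9 with the indicator 1(ed > 9), so spline = max(ed - 9, 0); the function names are illustrative.

```python
KNOT = 9.0   # assumed knot location, taken from the slide's description


def spline_term(ed):
    """(ed - 9) * 1(ed > 9), which equals max(ed - 9, 0)."""
    return max(ed - KNOT, 0.0)


def fitted(ed):
    """Fitted spline from the slide: 0.02 ed + 0.10 spline + 2.30."""
    return 0.02 * ed + 0.10 * spline_term(ed) + 2.30


slope_below = fitted(8) - fitted(7)    # one more year of education below the knot
slope_above = fitted(12) - fitted(11)  # one more year of education above the knot
print(round(slope_below, 4), round(slope_above, 4))   # prints 0.02 0.12
```

The slope above the knot is the sum of the two coefficients, 0.02 + 0.10, which is where the 12% figure comes from.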
How Does the Linear Projection Differ from the CEF? Example 3
CEF of log(wage) for white males with 12 years of education, as a function of years of experience (ex)
discrete covariate with a large number of values
  - approximate the CEF with linear projections
Linear Projection of log(wage) on x = ex
  - $P(\log(wage) \mid x) = 0.011\,ex + 2.50$
overpredicts wage for young and old workers
Approximate CEF of Wage as a Function of Experience
Approximation 2: Quadratic Projection
Linear Projection of log(wage) on $x = (ex, ex^2)$
$P(\log(wage) \mid x) = 0.046\,ex - 0.001\,ex^2 + 2.30$
  - $\nabla_{ex} P = .046 - .002\,ex$
    - captures strong downturn in mean wage for older workers
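The marginal effect of experience in the quadratic projection is the derivative 0.046 - 2(0.001)ex, which turns negative beyond ex = 0.046/0.002 = 23. A short sketch using the rounded coefficients from the slide, so the turning point is only approximate:

```python
B0, B1, B2 = 2.30, 0.046, -0.001   # rounded coefficients from the slide


def fitted(ex):
    """Quadratic projection: 0.046 ex - 0.001 ex^2 + 2.30."""
    return B1 * ex + B2 * ex ** 2 + B0


def marginal(ex):
    """Derivative of the fitted value with respect to ex: B1 + 2 * B2 * ex."""
    return B1 + 2 * B2 * ex


peak = -B1 / (2 * B2)   # experience level where the fitted profile turns down
for ex in (0, 10, 23, 30, 40):
    print(ex, round(fitted(ex), 3), round(marginal(ex), 3))
print("peak at ex =", round(peak, 1))
```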
Properties of the Linear Projection Model
Assumption 1
  - $Ey^2 < \infty$, $E\|x\|^2 < \infty$, and $Q_{xx} = E(xx^T)$ is positive definite
Theorem: Under Assumption 1
1. $E(xx^T)$ and $E(xy)$ exist with finite elements
2. The linear projection coefficient exists, is unique, and equals
   $\beta = \left(E(xx^T)\right)^{-1}E(xy)$
3. $P(y \mid x) = x^T\left(E(xx^T)\right)^{-1}E(xy)$
4. For $u = y - x^T\beta$, $E(xu) = 0$ and $E(u^2) < \infty$
5. If x contains a constant, $Eu = 0$
6. If $E|y|^r < \infty$ and $E\|x\|^r < \infty$ for $r \geq 2$, then $E|u|^r < \infty$
Proof
Review
How do we approximate $E(y \mid x)$?
$x^T\beta$
How do you interpret $\beta$?
the linear projection coefficient, which is not generally equal to $\nabla_x E(y \mid x)$
What is required for identification of $\beta$?
$E(xx^T)$ is invertible
What is the correlation between x and u?
0 by construction!
Best Linear Predictor Coefficient Solution
$\beta_{lpc}$ is the value of $\beta$ that minimizes
$S(\beta) = Ey^2 - 2\beta^T E(xy) + \beta^T E(xx^T)\beta$ (see Vector Calculus)
  - first derivative: $-2E(xy) + 2E(xx^T)\beta$; setting it to zero gives
solution (linear projection coefficient)
$E(xx^T)\,\beta_{lpc} = E(xy)$
required assumption
  - $Ey^2 < \infty$, $E\|x\|^2 < \infty$ (see Euclidean Length)
Return to Best Linear Predictor Coe¢ cient
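A numerical sanity check of this derivation, assuming a hypothetical population (invented values) with x = (1, z): the expanded form of S(β) matches the direct expectation, the analytic first derivative matches central finite differences, and the derivative is zero at β_lpc. The trial value `b` is arbitrary.

```python
# Hypothetical population, invented for illustration: (probability, z, y); x = (1, z).
POP = [(0.25, 0.0, 1.0), (0.25, 1.0, 1.5), (0.25, 2.0, 3.5), (0.25, 3.0, 3.0)]


def E(f):
    """Population expectation of f(z, y)."""
    return sum(p * f(z, y) for p, z, y in POP)


Q = [[E(lambda z, y: 1.0), E(lambda z, y: z)],
     [E(lambda z, y: z), E(lambda z, y: z * z)]]     # E(x x^T)
Exy = [E(lambda z, y: y), E(lambda z, y: z * y)]    # E(x y)
Ey2 = E(lambda z, y: y * y)


def S(beta):
    """Mean-square prediction error E[(y - x^T beta)^2], computed directly."""
    return E(lambda z, y: (y - beta[0] - beta[1] * z) ** 2)


def S_expanded(beta):
    """Ey^2 - 2 beta^T E(xy) + beta^T E(x x^T) beta."""
    quad = sum(beta[i] * Q[i][j] * beta[j] for i in range(2) for j in range(2))
    return Ey2 - 2 * (beta[0] * Exy[0] + beta[1] * Exy[1]) + quad


def gradient(beta):
    """Analytic first derivative: -2 E(xy) + 2 E(x x^T) beta."""
    return [-2 * Exy[i] + 2 * (Q[i][0] * beta[0] + Q[i][1] * beta[1]) for i in range(2)]


def numerical_gradient(beta, h=1e-6):
    """Central finite differences of S."""
    g = []
    for i in range(2):
        up, dn = list(beta), list(beta)
        up[i] += h
        dn[i] -= h
        g.append((S(up) - S(dn)) / (2 * h))
    return g


det = Q[0][0] * Q[1][1] - Q[0][1] * Q[1][0]
beta_lpc = [(Exy[0] * Q[1][1] - Q[0][1] * Exy[1]) / det,
            (Q[0][0] * Exy[1] - Exy[0] * Q[1][0]) / det]
b = [0.3, -1.2]   # arbitrary trial value
print(S(b), S_expanded(b))
print(gradient(b), numerical_gradient(b))
print(gradient(beta_lpc))
```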
Best Linear Approximation Coefficient Solution
let $m(x) := E(y \mid x)$; $\beta_{lac}$ is the value of $\beta$ that minimizes
$d(\beta) = \int_{\mathbb{R}^k} \left(m(x) - x^T\beta\right)^2 f_x(x)\,dx$
$d(\beta) = E\left[m(x)^2\right] - 2\beta^T E(x\,m(x)) + \beta^T E(xx^T)\beta$
  - first derivative: $-2E(x\,m(x)) + 2E(xx^T)\beta$
  - $E(x\,m(x)) = E\left(x\,E(y \mid x)\right) = E\left(E(xy \mid x)\right) = E(xy)$
solution (linear approximation coefficient)
$E(xx^T)\,\beta_{lac} = E(xy)$
Return to Best Linear Predictor Coe¢ cient
Vector Calculus
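vector derivative: inner product
  - $(2 \times 1)$ vectors: B and C
  - $B^T C = B_1 C_1 + B_2 C_2$
  - $\dfrac{\partial B^T C}{\partial B} = \begin{pmatrix} \partial B^T C / \partial B_1 \\ \partial B^T C / \partial B_2 \end{pmatrix} = \begin{pmatrix} C_1 \\ C_2 \end{pmatrix} = C$
vector derivative: quadratic form
  - $(2 \times 2)$ matrix: D
  - $B^T D B = B_1^2 D_{11} + B_1 B_2 D_{12} + B_1 B_2 D_{21} + B_2^2 D_{22}$
  - $\dfrac{\partial B^T D B}{\partial B} = \begin{pmatrix} (D_{11} + D_{11})B_1 + (D_{12} + D_{21})B_2 \\ (D_{21} + D_{12})B_1 + (D_{22} + D_{22})B_2 \end{pmatrix} = \left(D + D^T\right)B$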
Return to Solution
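The two derivative rules can be checked by central finite differences. The sketch uses arbitrary illustrative values for B and C and a deliberately non-symmetric D, so that (D + D^T)B differs from 2DB.

```python
D = [[1.0, 2.0], [-3.0, 4.0]]   # deliberately non-symmetric, illustrative values
C = [0.7, -2.0]
B = [0.5, -1.5]


def inner(B):
    """B^T C."""
    return B[0] * C[0] + B[1] * C[1]


def quad_form(B):
    """B^T D B."""
    return sum(B[i] * D[i][j] * B[j] for i in range(2) for j in range(2))


def numeric_grad(f, B, h=1e-6):
    """Central finite-difference gradient of f at B."""
    g = []
    for i in range(2):
        up, dn = list(B), list(B)
        up[i] += h
        dn[i] -= h
        g.append((f(up) - f(dn)) / (2 * h))
    return g


analytic_quad = [sum((D[i][j] + D[j][i]) * B[j] for j in range(2)) for i in range(2)]   # (D + D^T) B
print(numeric_grad(inner, B), C)
print(numeric_grad(quad_form, B), analytic_quad)
```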
Euclidean Length
Pythagorean Theorem
  - $a^2 + b^2 = c^2$, so the length of the hypotenuse is $c = \left(a^2 + b^2\right)^{1/2}$
c is the length of the 2-dimensional vector (a, b), so for x a vector of dimension n
  - the Euclidean length (norm) is $\|x\| = \left(x_1^2 + x_2^2 + \cdots + x_n^2\right)^{1/2}$
therefore
  - $E\|x\|^2 = E\left(x_1^2 + x_2^2 + \cdots + x_n^2\right)$
Return Again to Solution
Identi�cation - Background
identification is important in structural econometric modeling
  - F: distribution of the observed data (for example, of (y, x))
  - $\mathcal{F}$: a collection of distributions F
  - $\theta$: a parameter of interest (for example, Ey)
    - identification means that a parameter is uniquely determined by the distribution of the observed variables
Definition: A parameter $\theta \in \mathbb{R}$ is identified on $\mathcal{F}$ if for all $F \in \mathcal{F}$ there is a uniquely determined value of $\theta$.
equivalently, $\theta$ is identified if we can write out a mapping $\theta = g(F)$ on the set $\mathcal{F}$
  - restriction to $\mathcal{F}$ is important
    - most parameters are identified only on a strict subset of the space of all distributions
Identi�cation - Moments of Observed Data
consider identification of the mean $\mu = Ey$
  - $\mu$ is uniquely determined if $E|y| < \infty$
    - $\mu$ is identified for the set $\mathcal{F} = \left\{F : \int_{-\infty}^{\infty} |y|\, dF(y) < \infty\right\}$
identification of the conditional mean
Theorem: If $E|y| < \infty$, the conditional mean $m(x) = E(y \mid x)$ is identified almost everywhere.
generally, moments of observed data are identified as long as we exclude degenerate cases
Identification - More Complicated Models
consider the context of censoring
  - y is a random variable with distribution F
  - we observe $y^*$ defined by the censoring rule
$y^* = \begin{cases} y & \text{if } y \leq \tau \\ \tau & \text{if } y > \tau \end{cases}$
    - applies to income surveys, where incomes above the top code are recorded as equal to the top code ("top coded" data)
observed variable $y^*$ has distribution
$F^*(u) = \begin{cases} F(u) & \text{if } u < \tau \\ 1 & \text{if } u \geq \tau \end{cases}$
we are interested in the features of F, not of the censored distribution $F^*$
  - we cannot calculate $\mu = Ey$ from $F^*$ except in the trivial case where there is no censoring, $P(y \geq \tau) = 0$
    - $\mu$ is not generically identified from $F^*$
Assumptions to Restore Identi�cation
parametric identification
  - assume a parametric distribution ($y \sim N(\mu, \sigma^2)$)
    - so $\mathcal{F}$ is the set of normal distributions
    - can show that $(\mu, \sigma^2)$ are identified for all $F \in \mathcal{F}$
  - not ideal: identification is achieved only through use of an arbitrary and unverifiable parametric assumption
nonparametric identification
  - quantiles $q_\alpha$ of F, for $\alpha \leq P(y \leq \tau)$, are identified
    - if 20% of the distribution is censored, can identify all quantiles for $\alpha \in (0, 0.8)$
study of identi�cation focuses attention on what can be learned fromthe data distributions available
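A sketch of the censoring example with a hypothetical distribution: y uniform on {1, ..., 10} and τ = 8, so 20% of the mass is top coded. Quantiles with α ≤ P(y ≤ τ) = 0.8 are the same for y and y*, while the mean and the upper quantiles are not. `Fraction` keeps the quantile arithmetic exact; the helper `quantile` is illustrative.

```python
import math
from fractions import Fraction

TAU = 8
y = list(range(1, 11))              # hypothetical: y uniform on {1, ..., 10}
y_star = [min(v, TAU) for v in y]   # censoring rule: values above TAU are recorded as TAU


def quantile(values, alpha):
    """q_alpha = min{u : F(u) >= alpha} for the empirical distribution of values, 0 < alpha <= 1."""
    s = sorted(values)
    k = math.ceil(alpha * len(s))   # smallest k with k / n >= alpha
    return s[k - 1]


p_uncensored = Fraction(sum(v <= TAU for v in y), len(y))   # P(y <= tau)
mean_y = Fraction(sum(y), len(y))
mean_y_star = Fraction(sum(y_star), len(y_star))
alphas = [Fraction(1, 10), Fraction(1, 4), Fraction(1, 2), Fraction(3, 4), Fraction(4, 5), Fraction(9, 10)]
for a in alphas:
    status = "recoverable from F*" if a <= p_uncensored else "not recoverable"
    print(a, quantile(y, a), quantile(y_star, a), status)
print("mean of y:", mean_y, " mean of y*:", mean_y_star)
```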
Return to General Identi�cation
Generalized Inverse
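for any matrix A
  - $A^-$ (Moore-Penrose generalized inverse) exists and is unique
$A^-$ satisfies
  - $AA^-A = A$
  - $A^-AA^- = A^-$
  - $AA^-$ and $A^-A$ are symmetric
example: if $A_{11}^{-1}$ exists and $A = \begin{pmatrix} A_{11} & 0 \\ 0 & 0 \end{pmatrix}$, then $A^- = \begin{pmatrix} A_{11}^{-1} & 0 \\ 0 & 0 \end{pmatrix}$
Return to Identification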
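The four defining conditions of the generalized inverse can be checked for the block example. The sketch picks a hypothetical non-symmetric block A_11 = [[2, 1], [0, 1]] inside a 3x3 matrix; the helper functions are illustrative.

```python
def matmul(A, B):
    """Matrix product A B for lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def close(A, B, tol=1e-12):
    return all(abs(a - b) < tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))


# A = [[A11, 0], [0, 0]] with hypothetical A11 = [[2, 1], [0, 1]].
A = [[2.0, 1.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 0.0]]
# Candidate generalized inverse: A11^{-1} = [[0.5, -0.5], [0, 1]] in the same block.
A_minus = [[0.5, -0.5, 0.0],
           [0.0, 1.0, 0.0],
           [0.0, 0.0, 0.0]]

AAm = matmul(A, A_minus)
AmA = matmul(A_minus, A)
conditions = {
    "A A^- A = A": close(matmul(AAm, A), A),
    "A^- A A^- = A^-": close(matmul(AmA, A_minus), A_minus),
    "A A^- is symmetric": close(AAm, transpose(AAm)),
    "A^- A is symmetric": close(AmA, transpose(AmA)),
}
for name, ok in conditions.items():
    print(name, ok)
```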
Proof of Theorem 1
1. $\left\|E(xx^T)\right\| \leq E\left\|xx^T\right\|$ (Expectation Inequality)
   $E\left\|xx^T\right\| = E\|x\|^2 < \infty$ (Assumption 1)
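The step E‖xx^T‖ = E‖x‖^2 uses the fact that ‖xx^T‖ = ‖x‖^2 for every realization of x. A quick check, assuming the matrix norm is the Frobenius norm (the slide does not say which norm is used; the identity also holds for the spectral norm); the vector below is an arbitrary example.

```python
import math

x = [3.0, 4.0, 12.0]                           # an arbitrary realization of x
outer = [[xi * xj for xj in x] for xi in x]    # x x^T
frobenius = math.sqrt(sum(v * v for row in outer for v in row))
norm_sq = sum(v * v for v in x)                # ||x||^2
print(frobenius, norm_sq)   # prints 169.0 169.0
```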
Return to Properties of the LPM