9. Stochastic Regressors (Violation of Assumption #C1)
Assumption #C1:
• None of the elements of the (N × [K + 1]) matrix X is a random variable
Remarks: [I]
• In practice, Assumption #C1 is often violated
234
Remarks: [II]
• Validity of Assumption #C1 has convenient consequences:
  The OLS estimator is given by
  β̂ = (X′X)⁻¹X′y = (X′X)⁻¹X′(Xβ + u) = β + (X′X)⁻¹X′u
  −→ since X is deterministic, it follows that
  β̂ is unbiased (because E(β̂) = β + (X′X)⁻¹X′E(u) = β)
  Cov(β̂) = σ²(X′X)⁻¹
235
Remarks: [III]
• β̂ is BLUE
• β̂ ∼ N(β, σ²(X′X)⁻¹) (essential for hypothesis testing)
Now:
• What happens if X is stochastic? (violation of Assumption #C1)
236
Conceivable consequences:
• OLS estimators can be
  unbiased, efficient and consistent
  biased, but consistent
  biased and inconsistent
• Distribution of the OLS estimator
  β̂ = (X′X)⁻¹X′y = β + (X′X)⁻¹X′u
  depends on the distributions of u and X
  −→ statistical inference becomes more difficult
Obviously:
• If #C1 is violated, we often can only state asymptotic properties of the estimators (i.e. properties for large sample sizes, N → ∞)
237
Concepts of asymptotics: [I]
• Convergence in probability (plim) (cf. Advanced Statistics, Chapter 5, Slide 212)
• plim calculus (sums, products, Slutsky theorem) (cf. Advanced Statistics, Theorem 5.9, Slide 215)
• (Weak) consistency of an estimator:
  plim β̂N = β,
  i.e. for every ε > 0 we have
  lim_{N→∞} P(|β̂N − β| ≥ ε) = 0
  (cf. Advanced Statistics, Definition 5.18, Slide 234)
238
Concepts of asymptotics: [II]
• Sufficient condition for consistency:
  lim_{N→∞} E(β̂N) = β (asymptotic unbiasedness)
  lim_{N→∞} Var(β̂N) = 0
(cf. Advanced Statistics, Chapter 5, Slide 239)
Remark:
• The asymptotic concepts of ’plim’ and ’consistency’ carry over from univariate random variables to random vectors
239
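The consistency definition above can be checked by simulation. A minimal sketch (assuming Python with NumPy is available; the estimator here is simply the sample mean of an i.i.d. draw, and all distributional choices are illustrative): it estimates P(|β̂N − β| ≥ ε) by Monte Carlo for growing N and shows it shrinking toward zero.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, eps = 2.0, 0.1   # true parameter and the epsilon from the definition

def exceed_prob(N, reps=500):
    # Monte Carlo estimate of P(|mean_N - mu| >= eps) for an i.i.d. sample mean
    means = rng.exponential(mu, size=(reps, N)).mean(axis=1)
    return np.mean(np.abs(means - mu) >= eps)

probs = [exceed_prob(N) for N in (10, 100, 1_000, 10_000)]
print(probs)  # decreasing toward 0 as N grows
```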
Examples of stochastic regressors: [I]
• Use of yi−1 as an exogenous variable:
  yi = α + β1·xi + β2·yi−1 + ui
  (typical time-series specification)
• Random errors in the data of the exogenous variables
  occur frequently when we cannot observe the relevant exogenous variable x∗i directly and use the proxy variable xi instead:
  xi = x∗i + vi
  (vi is a random error term)
240
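The measurement-error case can be made concrete numerically. A sketch under assumed parameter values (β = 2, Var(x∗i) = Var(vi) = 1; everything here is chosen for illustration): the OLS slope computed from the proxy xi settles near β·Var(x∗)/(Var(x∗) + Var(v)) = 1 rather than near β.

```python
import numpy as np

rng = np.random.default_rng(1)
N, alpha, beta = 200_000, 1.0, 2.0

x_star = rng.normal(0.0, 1.0, N)          # unobserved true regressor, Var = 1
y = alpha + beta * x_star + rng.normal(size=N)
x = x_star + rng.normal(size=N)           # observed proxy x_i = x*_i + v_i, Var(v) = 1

# OLS slope of y on the error-ridden proxy x
b = np.cov(x, y, bias=True)[0, 1] / np.var(x)
print(b)  # close to beta * 1/(1+1) = 1.0, not 2.0 (attenuation bias)
```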
Examples of stochastic regressors: [II]
• ’Simultaneity’ or ’endogenous regressors’
  consider the Keynesian consumption function
  Ci = α + β·Yi + ui
  Yi is endogenously determined by the identity
  Yi = Ci + Ii
  (closed economy, no governmental activity, Ii = investment)
241
Result preview:
• Appropriateness of the OLS estimator depends on the stochastic dependence between the matrix X (i.e. the exogenous variables) and the error-term vector u
Distinction of 3 dependency structures: [I]
• Case #1: (stochastic independence)
  The error terms ui and the exogenous variables xkj are stochastically independent for all i, j = 1, . . . , N and k = 1, . . . , K
  −→ under additional conditions the OLS estimator is unbiased, efficient and consistent
242
Distinction of 3 dependency structures: [II]
• Case #2: (Contemporary uncorrelatedness)
  The error terms ui and the exogenous variables xki are uncorrelated for all i = 1, . . . , N and k = 1, . . . , K, i.e.
  Cov(xki, ui) = 0
  −→ under additional conditions the OLS estimator is biased, but consistent
• Case #3: (Contemporary correlatedness)
  The error terms ui and the exogenous variables xki are (contemporarily) correlated, i.e. for at least one i we have
  Cov(xki, ui) ≠ 0
  −→ the OLS estimator is biased and inconsistent
243
Technical device:
• Conditional expectation E(X|Y = y) (cf. Advanced Statistics, Slide 136)
Calculus:
• Conditional expectation is defined as
  E(X|Y = y) ≡ ∫_{−∞}^{+∞} x·fX|Y=y(x) dx
  (ordinary expectation of a conditional distribution)
  −→ rules for unconditional expectations remain valid (e.g. E(a·X + b|Y = y) = a·E(X|Y = y) + b)
244
Two specific rules:
• If X and Y are stochastically independent, then
  E(X|Y = y) = E(X)
• If the conditioning value y has not yet been realized, then E(X|Y) constitutes a random variable and its unconditional expected value is given by
  E[E(X|Y)] = E(X)
245
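The rule E[E(X|Y)] = E(X) can be verified numerically. A small sketch with an assumed toy distribution (Y uniform on {1, 2, 3} and X|Y = y ∼ N(y, 1), so E(X) = E(Y) = 2; the setup is purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000
y = rng.integers(1, 4, n)      # Y uniform on {1, 2, 3}
x = rng.normal(y, 1.0)         # X | Y = y  ~  N(y, 1), so E(X | Y = y) = y

# E[E(X|Y)] = sum over v of E(X | Y = v) * P(Y = v), compared with E(X)
lhs = sum(x[y == v].mean() * np.mean(y == v) for v in (1, 2, 3))
print(lhs, x.mean())  # both close to 2
```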
9.1 Consequences
Setting: [I]
• We consider the linear regression model
y = Xβ + u
• X contains stochastic regressors
• Moreover,
  E(u) = 0N×1
  Cov(u) = E(uu′) = σ²IN
246
Setting: [II]
• The OLS estimator is
  β̂ = (X′X)⁻¹X′y = (X′X)⁻¹X′(Xβ + u) = β + (X′X)⁻¹X′u
• It follows that
  E(β̂) = β + E[(X′X)⁻¹X′u]
• We now analyze the Cases #1 – #3
247
Case #1:
• The error-term vector u is stochastically independent of all exogenous variables in X
Expectation of the OLS estimator:
• The stochastic independence of u and X implies
  E(β̂) = β + E[(X′X)⁻¹X′u]
       = β + E[(X′X)⁻¹X′]·E(u)
       = β
  −→ the OLS estimator is unbiased
248
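This unbiasedness under full independence can be checked by Monte Carlo: redraw both X and u in every replication (with u independent of X) and average the OLS estimates. A sketch with assumed parameter values:

```python
import numpy as np

rng = np.random.default_rng(3)
beta = np.array([1.0, 2.0, -0.5])      # true (intercept, beta1, beta2)
N, reps = 50, 5_000

est = np.empty((reps, 3))
for r in range(reps):
    X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])  # stochastic regressors
    u = rng.normal(size=N)                                      # drawn independently of X
    est[r] = np.linalg.lstsq(X, X @ beta + u, rcond=None)[0]

print(est.mean(axis=0))  # close to (1.0, 2.0, -0.5): unbiased despite random X
```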
Covariance matrix of the OLS estimator: [I]
• First,
  Cov(β̂) = E{[β̂ − E(β̂)]·[β̂ − E(β̂)]′}
         = E[(β̂ − β)(β̂ − β)′]
         = E{(X′X)⁻¹X′u·[(X′X)⁻¹X′u]′}
         = E[(X′X)⁻¹X′uu′X(X′X)⁻¹]
249
Covariance matrix of the OLS estimator: [II]
• Using the rule E[E(X|Y)] = E(X) from Slide 245 with
  X ≡ (X′X)⁻¹X′uu′X(X′X)⁻¹ and Y = X,
  we obtain
  Cov(β̂) = E{E[(X′X)⁻¹X′uu′X(X′X)⁻¹ | X]}
• The inner expectation is conditional upon X and thus
  Cov(β̂) = E{(X′X)⁻¹X′·E[uu′|X]·X(X′X)⁻¹}
250
Covariance matrix of the OLS estimator: [III]
• Owing to the stochastic independence of u and X it follows that
  E(uu′|X) = E(uu′) = σ²IN
• Thus,
  Cov(β̂) = E[(X′X)⁻¹X′σ²INX(X′X)⁻¹] = σ²·E[(X′X)⁻¹]
Now:
• Unbiased estimation of Cov(β̂)
251
First step:
• Unbiased estimation of the error-term variance σ²
Proposition:
• The (ordinary) estimator
  σ̂² = û′û/(N − K − 1)
  is an unbiased estimator of σ²
252
Proof:
• We have
  E(σ̂²) = E[E(σ̂²|X)]
        = E{E[û′û/(N − K − 1) | X]}
        = E{[1/(N − K − 1)]·E(û′û|X)}
        = [1/(N − K − 1)]·E(û′û)
        = σ²·(N − K − 1)/(N − K − 1) = σ²
253
Remark:
• Since the matrix X is stochastic and independent of u, it follows that
  E(û′û) = E(û′û|X) = σ²·(N − K − 1)
• Proof: Econometrics I
Overall:
• An unbiased estimator of the covariance matrix Cov(β̂) is given by
  Ĉov(β̂) = σ̂²(X′X)⁻¹,
254
because
  E[Ĉov(β̂)] = E[σ̂²(X′X)⁻¹]
            = E{E[σ̂²(X′X)⁻¹ | X]}
            = E{(X′X)⁻¹·E(σ̂²|X)}
            = E{(X′X)⁻¹·[1/(N − K − 1)]·E(û′û|X)}
            = E{(X′X)⁻¹·[1/(N − K − 1)]·σ²·(N − K − 1)}
            = σ²·E[(X′X)⁻¹]
            = Cov(β̂)
255
Summary:
• The ordinary estimators σ̂² and Ĉov(β̂) are unbiased estimators
Remaining question:
• Is the OLS estimator consistent?
256
Asymptotic properties of the matrix X
• First,
  plim(β̂) = plim[(X′X)⁻¹X′y]
          = plim[β + (X′X)⁻¹X′u]
          = plim(β) + plim[(X′X)⁻¹X′u]
          = β + plim[(X′X/N)⁻¹X′u/N]
• The matrix X′X/N is given by
  X′X/N =
  [ 1            Σ x1i/N        · · ·   Σ xKi/N
    Σ x1i/N      Σ x²1i/N       · · ·   Σ x1i·xKi/N
    ⋮            ⋮                      ⋮
    Σ xKi/N      Σ xKi·x1i/N    · · ·   Σ x²Ki/N ]
  (all sums running over i = 1, . . . , N)
257
Further assumptions on the matrix X′X/N:
• The expectations of all elements in X′X/N converge towards a finite limit as N → ∞, and we collect these finite limits in the matrix Q:
  lim_{N→∞} E(X′X/N) = Q
  (additional Assumption #1)
• Q is a positive definite (K + 1) × (K + 1) matrix
• The variances of all elements in X′X/N converge towards zero as N → ∞:
  lim_{N→∞} Var[(1/N)·Σ_{i=1}^N xki·xli] = 0 for all k, l = 1, . . . , K
  (additional Assumption #2)
258
Implications:
• The additional assumptions provide a sufficient condition for
  plim(X′X/N) = Q
  (cf. Slide 239)
• Due to Assumption #C2 the inverse (X′X/N)⁻¹ exists and, using the Slutsky theorem, we obtain
  plim[(X′X/N)⁻¹] = [plim(X′X/N)]⁻¹ = Q⁻¹
259
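The additional assumptions can be made concrete in a simulation: for an assumed regressor distribution xi ∼ N(1, 4), the limit matrix is Q = [[1, 1], [1, 5]] (since E(xi) = 1 and E(x²i) = 4 + 1 = 5), and X′X/N settles down on Q as N grows. A sketch (the distribution and N are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(4)

def xtx_over_n(N):
    x = rng.normal(1.0, 2.0, N)              # E(x) = 1, E(x^2) = 4 + 1 = 5
    X = np.column_stack([np.ones(N), x])
    return X.T @ X / N

Q = np.array([[1.0, 1.0], [1.0, 5.0]])
print(np.round(xtx_over_n(1_000_000), 2))                   # close to Q
print(np.round(np.linalg.inv(xtx_over_n(1_000_000)), 2))    # close to Q^{-1}
```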
Asymptotic properties of the error terms
• The homoskedastic variance of the ui’s can differ for distinct sampling sizes N
• Example:
  For N = 100 we may have σ² = σ²N = 10 for i = 1, . . . , N
  For N = 1000 we may have σ² = σ²N = 100 for i = 1, . . . , N
• Here, we assume a finite error-term variance as N → ∞:
  lim_{N→∞} σ² = lim_{N→∞} σ²N = σ²u < ∞
260
Implications: [I]
(a)
  lim_{N→∞} E(u′u/N) = lim_{N→∞} (1/N)·E[Σ_{i=1}^N u²i]
                     = lim_{N→∞} (1/N)·Σ_{i=1}^N E(u²i)
                     = lim_{N→∞} (1/N)·Σ_{i=1}^N Var(ui)
                     = lim_{N→∞} (1/N)·N·σ² = lim_{N→∞} σ² = σ²u
261
Implications: [II]
(b)
  lim_{N→∞} Var(u′u/N) = lim_{N→∞} (1/N²)·Var[Σ_{i=1}^N u²i]
                       = lim_{N→∞} (1/N²)·Σ_{i=1}^N Var(u²i)   [with Var(u²i) ≡ θ² < ∞]
                       = lim_{N→∞} (1/N²)·N·θ² = lim_{N→∞} θ²/N = 0
• From (a) and (b) we have
  plim(u′u/N) = σ²u
262
Summary:
• From Slide 257 we know that
  plim(β̂) = β + plim[(X′X/N)⁻¹X′u/N]
• From the additional assumptions on X′X/N we have
  plim[(X′X/N)⁻¹] = Q⁻¹
• From the additional assumptions on the error-term variance we have
  plim(u′u/N) = σ²u
263
Consistency of the OLS estimator
• To prove the consistency of the OLS estimator β̂ it remains to show that
  plim[X′u/N] = 0(K+1)×1
• It is sufficient to prove the following conditions:
  lim_{N→∞} E(X′u/N) = 0(K+1)×1
  lim_{N→∞} Cov(X′u/N) = 0(K+1)×(K+1)
(cf. Slide 239)
264
Expectation condition:
• The independence of X and u implies
  E(X′u) = E(X′)·E(u) = 0(K+1)×1
• It follows that
  E(X′u/N) = 0(K+1)×1 for all N
  and thus
  lim_{N→∞} E(X′u/N) = lim_{N→∞} 0(K+1)×1 = 0(K+1)×1
265
Variance condition: [I]
• First,
  Cov(X′u) = E{[X′u − E(X′u)]·[X′u − E(X′u)]′}
           = E[(X′u)(X′u)′]
           = E[X′uu′X]
           = E[E(X′uu′X|X)]
           = E[X′·E(uu′|X)·X]
           = E[X′σ²INX]
           = σ²·E[X′X]
266
Variance condition: [II]
• It follows that
  Cov(X′u/N) = (1/N²)·σ²·E[X′X] = (1/N)·σ²·E[X′X/N]
• Due to the assumption lim_{N→∞} E(X′X/N) = Q from Slide 258, it follows that
  lim_{N→∞} Cov(X′u/N) = lim_{N→∞}(1/N) · lim_{N→∞} σ² · lim_{N→∞} E[X′X/N]
                       = 0·σ²u·Q = 0(K+1)×(K+1)
267
Overall summary: [I]
• From Slide 263 we know that
  plim[(X′X/N)⁻¹] = Q⁻¹
• We have shown that
  lim_{N→∞} E(X′u/N) = 0(K+1)×1
  lim_{N→∞} Cov(X′u/N) = 0(K+1)×(K+1)
• It follows that
  plim[X′u/N] = 0(K+1)×1
268
Overall summary: [II]
• Thus, we have
  plim(β̂) = β + plim[(X′X/N)⁻¹X′u/N]
          = β + plim[(X′X/N)⁻¹]·plim[X′u/N]
          = β + Q⁻¹·0(K+1)×1 = β

Theorem 9.1: (Unbiasedness and consistency of OLS estimators)
If the error terms in u and all regressors in X are stochastically independent, then the OLS estimator β̂ = (X′X)⁻¹X′y is unbiased. If the additional assumptions on the matrix X′X/N and the error-term variance also hold, then the OLS estimator is also consistent.
269
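Theorem 9.1's consistency claim can be visualized by letting N grow within one simulated design. A sketch (the distributions and parameter values are assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
beta = np.array([1.0, 2.0])

def ols(N):
    X = np.column_stack([np.ones(N), rng.normal(size=N)])  # stochastic, independent of u
    y = X @ beta + rng.normal(size=N)
    return np.linalg.lstsq(X, y, rcond=None)[0]

for N in (50, 5_000, 500_000):
    print(N, ols(N))   # estimates concentrate around (1, 2) as N grows
```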
Consistent estimation of the error-term variance
Now:
• Consistency proof for the estimator
  σ̂² = û′û/(N − K − 1)

Proof: [I]
• Consider
  plim(σ̂²) = plim[û′û/(N − K − 1)] = plim[N/(N − K − 1)·(û′û/N)]
270
Proof: [II]
• For û′û we have the following representation:
  û′û = u′[IN − X(X′X)⁻¹X′]u = u′u − u′X(X′X)⁻¹X′u
  (cf. Econometrics I)
• It follows that
  û′û/N = u′u/N − (u′X/N)·(X′X/N)⁻¹·(X′u/N)
• Now, compute the plims and apply the Slutsky theorem
271
Proof: [III]
• We know that
  plim(u′u/N) = σ²u  (cf. Slide 262)
  plim[(X′X/N)⁻¹] = Q⁻¹  (cf. Slide 263)
  plim[X′u/N] = 0(K+1)×1  (cf. Slide 268)
  plim[u′X/N] = 01×(K+1)  (Slutsky theorem)
• Thus,
  plim(û′û/N) = σ²u − 01×(K+1)·Q⁻¹·0(K+1)×1 = σ²u
272
Proof: [IV]
• Overall we have
  plim(σ̂²) = plim[û′û/(N − K − 1)]
           = plim[N/(N − K − 1)·(û′û/N)]
           = plim[N/(N − K − 1)]·plim[û′û/N]
           = 1·σ²u = σ²u

Theorem 9.2: (Unbiasedness and consistency of σ̂²)
If the error terms in u and all regressors in X are stochastically independent, then the estimator σ̂² = û′û/(N − K − 1) is unbiased. If the additional assumptions on the matrix X′X/N and the error-term variance also hold, then σ̂² is also consistent.
273
Validity of hypothesis tests
Important question:
• What is the distribution of the OLS estimator
  β̂ = (X′X)⁻¹X′y = β + (X′X)⁻¹X′u
Problem:
• Under Assumption #B4 we have
  u ∼ N(0N×1, σ²IN)
• However, the distribution of X is unknown
  −→ unknown distribution of β̂ for finite sampling sizes
274
Asymptotic result: [I]
• Application of the central limit theorem provides
  β̂ ∼appr N(β, σ²u·Q⁻¹/N)
• A consistent estimator of σ²u·Q⁻¹ is given by
  σ̂²(X′X/N)⁻¹
  since
  plim[σ̂²(X′X/N)⁻¹] = plim[σ̂²]·plim[(X′X/N)⁻¹] = σ²u·Q⁻¹
  (cf. Slides 263, 272)
275
Asymptotic result: [II]
• For large N an appropriate estimator of Cov(β̂) = σ²u·Q⁻¹/N is given by
  σ̂²(X′X/N)⁻¹/N = σ̂²(X′X)⁻¹
  −→ use this estimator in the test statistics of the t- and F-tests

Theorem 9.3: (Approximative validity of hypothesis tests)
If the error terms in u and all regressors in X are stochastically independent and if the additional assumptions on the matrix X′X/N and the error-term variance σ²u also hold, then the hypothesis tests (t- and F-tests) are approximatively valid for sufficiently large sampling sizes N.
276
Case #2:
• The error-term vector u and the exogenous variables in X are contemporarily uncorrelated
Expectation of the OLS estimator:
• The term
  E(β̂) = β + E[(X′X)⁻¹X′u]
  cannot be simplified any further
  −→ the OLS estimator is biased
277
Consistency of the OLS estimator: [I]
• We retain the additional assumptions on
  the matrix X′X/N (cf. Slide 258)
  the error-term variance (cf. Slide 260)
• Again, we have
  plim(β̂) = β + plim[(X′X/N)⁻¹X′u/N]
  (cf. Slide 257)
278
Consistency of the OLS estimator: [II]
• Via the additional assumption on X′X/N and the Slutsky theorem, we have
  plim[(X′X/N)⁻¹] = Q⁻¹
• It remains to show that
  plim(X′u/N) = 0(K+1)×1
  (cf. Slide 268)
• First, we have
  X′u/N = [Σ ui/N,  Σ x1i·ui/N,  . . . ,  Σ xKi·ui/N]′
279
Consistency of the OLS estimator: [III]
• Due to contemporary uncorrelatedness we have
  E(Σ ui/N) = (1/N)·Σ E(ui) = 0
  E(Σ x1i·ui/N) = (1/N)·Σ E(x1i·ui) = (1/N)·Σ E(x1i)·E(ui) = 0
  ...
  E(Σ xKi·ui/N) = (1/N)·Σ E(xKi·ui) = (1/N)·Σ E(xKi)·E(ui) = 0
  and thus
  E(X′u/N) = 0(K+1)×1
280
Consistency of the OLS estimator: [IV]
• It follows that
  lim_{N→∞} E(X′u/N) = 0(K+1)×1
• Furthermore, it can be shown that
  lim_{N→∞} Cov(X′u/N) = 0(K+1)×(K+1)
  and thus
  plim(X′u/N) = 0(K+1)×1
281
Consistency of the OLS estimator: [V]
• Overall, we have
  plim(β̂) = β + plim[(X′X/N)⁻¹X′u/N]
          = β + plim[(X′X/N)⁻¹]·plim[X′u/N]
          = β + Q⁻¹·0(K+1)×1 = β
  −→ the OLS estimator β̂ is consistent

Theorem 9.4: (Consistency of the OLS estimator)
If the error terms in u and the regressors in X are contemporarily uncorrelated, then under the additional assumptions on the matrix X′X/N and the error-term variance the OLS estimator β̂ = (X′X)⁻¹X′y is biased, but consistent.
282
Validity of hypothesis tests:
• In analogy to Case #1 we also find that
  the estimator σ̂² = û′û/(N − K − 1) of σ² is consistent
  for sufficiently large N we have
  β̂ ∼appr N(β, σ²u·Q⁻¹/N)

Theorem 9.5: (Approximative validity of hypothesis tests)
If the error terms in u and all regressors in X are contemporarily uncorrelated and if the additional assumptions on the matrix X′X/N and the error-term variance σ²u also hold, then the hypothesis tests (t- and F-tests) are approximatively valid for sufficiently large sampling sizes N.
283
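Case #2 is exactly the lagged-dependent-variable model from Slide 240: yi−1 is correlated with past errors but contemporarily uncorrelated with ui. A simulation sketch (AR(1) with assumed β = 0.8, no intercept, normal errors; all choices illustrative) shows the OLS estimate biased downward in small samples yet converging to β:

```python
import numpy as np

rng = np.random.default_rng(6)
beta = 0.8

def ar1_ols(N):
    y = np.zeros(N + 1)
    for t in range(1, N + 1):
        y[t] = beta * y[t - 1] + rng.normal()   # y_i = beta * y_{i-1} + u_i
    return (y[:-1] @ y[1:]) / (y[:-1] @ y[:-1]) # OLS slope on the lagged value

small = np.mean([ar1_ols(20) for _ in range(4_000)])    # mean estimate at N = 20
large = np.mean([ar1_ols(2_000) for _ in range(50)])    # mean estimate at N = 2000
print(small, large)  # small-sample mean clearly below 0.8; large-N mean close to 0.8
```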
Case #3:
• The error terms in u and the regressors in X are contem-porarily correlated
In this case:
• Even if all additional assumptions are satisfied, the OLS estimators are biased and inconsistent
• We can no longer derive any approximative distribution of β̂
284
Theorem 9.6:
If the error terms in u and the regressors in X are contemporarily correlated, then the OLS estimators are biased and inconsistent (even if the additional assumptions are satisfied). The hypothesis tests (t- and F-tests) are entirely unreliable (even for large sampling sizes).
285
9.2 Instrumental Variables Regression
Obviously:
• For Cases #1 and #2
  OLS estimators are consistent
  hypothesis tests are approximatively valid
  −→ OLS estimation is feasible
• Case #3 is problematic, because
  OLS estimators are biased and inconsistent
  hypothesis tests are highly unreliable
  −→ application of alternative estimation procedures
286
Remark:
• Inconsistency for Case #3 results from
  plim(X′u/N) ≠ 0(K+1)×1
  implying
  plim(β̂) = β + Q⁻¹·plim[X′u/N] ≠ β
  (cf. Slide 282)
• Even if only a single regressor xki is contemporarily correlated with ui, the OLS estimators of all K + 1 parameters become inconsistent
287
Alternative estimation procedure:
• Instrumental-variables regression (IV estimation)

Idea behind IV estimation:
• Suppose the regressor xki is contemporarily correlated with ui
• We look for a variable zi (instrumental variable) that (a) is correlated with the problematic variable xki, but (b) is not contemporarily correlated with the error term ui, i.e.
  Cov(zi, xki) ≠ 0 for all i = 1, . . . , N
  Cov(zi, ui) = 0 for all i = 1, . . . , N
• We base the estimation of β on both zi and xki
288
Remarks:
• Two types of IV estimation
  classical IV estimation
  generalized IV estimation (two-stage least squares estimation, 2SLS)
• The 2SLS estimator is a generalized OLS estimator (GLS estimator, cf. Sections 6 and 7)
289
1. Classical IV estimation:
Procedure: [I]
• Consider the regression model
y = Xβ + u
• Identify those x-variables in the X matrix that are contemporarily correlated with u
• Replace the problematic x-variables by relevant instrumental variables (z-variables, instruments) (for instrument relevance, see Slides 292–293)
290
Procedure: [II]
• Collect the unproblematic x-variables and all z-variables (instruments) in the N × (K + 1) matrix Z (note: X and Z are both N × (K + 1) matrices)
• Estimate the unknown parameter vector β by
  β̂CIV = (Z′X)⁻¹Z′y

Definition 9.7: (Classical IV estimator)
The estimator
  β̂CIV = (Z′X)⁻¹Z′y
is called the classical instrumental-variables estimator.
291
Remark:
• The instruments need to satisfy some conditions in order to guarantee ’good’ statistical properties of the IV estimator
Conditions for valid instruments: [I]
1. We postulate validity of the same asymptotic assumptions for the Z-matrix as those postulated for the X-matrix on Slide 258:
  lim_{N→∞} E(Z′Z/N) = QZZ
  lim_{N→∞} Var[(1/N)·Σ_{i=1}^N zki·zli] = 0 for all k, l = 1, . . . , K
  It follows that
  plim(Z′Z/N) = QZZ
292
Conditions for valid instruments: [II]
2. Z and u have to be independent or at least contemporarily uncorrelated
  −→ from lim_{N→∞} E(Z′Z/N) = QZZ and lim_{N→∞} σ² = σ²u it follows that
  plim(Z′u/N) = 0(K+1)×1
3. The instruments in Z and the regressors in X need to be correlated:
  plim(Z′X/N) = QZX
  (stable correlation for N → ∞)
293
Theorem 9.8: (Properties of the classical IV estimator) [I]
If the instrumental variables satisfy the three conditions stated on Slides 292–293, then the classical IV estimator
  β̂CIV = (Z′X)⁻¹Z′y
has the following properties:
(a) We have
  plim(β̂CIV) = β,
  i.e. the classical IV estimator is consistent.
(b) For sufficiently large N the IV estimator’s asymptotic covariance matrix is given by
  Cov(β̂CIV) = σ²u·(Q′ZX·Q⁻¹ZZ·QZX)⁻¹/N.
294
Theorem 9.8: (Properties of the classical IV estimator) [II]
(c) A consistent estimator of σ²u is given by
  σ̂²u = û′CIV·ûCIV/(N − K − 1),
  where ûCIV = y − Xβ̂CIV.
(d) In practice, the asymptotic covariance matrix from part (b) is estimated by
  Ĉov(β̂CIV) = σ̂²u·(Z′X)⁻¹(Z′Z)(X′Z)⁻¹.
Proof:
(a): see class
(b)–(d): cf. Von Auer (2007), p. 471; Vogelvang (2005), pp. 201–203
295
Remarks:
• In practice, finding appropriate instrumental variables often appears to be a non-trivial task
• It is important that the instruments are contemporarily uncorrelated with the error terms, but highly correlated with the x-variables
• The lower the correlation between the x- and the z-variables, the larger the elements in Cov(β̂CIV)
  −→ high standard errors
296
Example: [I]
• Keynesian (mini) macro model
• For i = 1, . . . , N we have
  Ci = α + β·Yi + ui        (1)
  Yi = Ci + Ii              (2)
  Cov(Ii, ui) = 0           (3)
  (C = consumption, Y = income, I = investment)
• Inserting Eq. (1) into Eq. (2), we obtain
  Yi = α/(1 − β) + [1/(1 − β)]·Ii + [1/(1 − β)]·ui
  −→ Yi depends on ui
297
Example: [II]
• Computation of the contemporary covariance:
  per definition, we have
  Cov(Yi, ui) = E{[Yi − E(Yi)]·[ui − E(ui)]}
  furthermore,
  E(Yi) = α/(1 − β) + [1/(1 − β)]·Ii
  and thus
  Yi − E(Yi) = [1/(1 − β)]·ui
298
Example: [III]
consequently,
  Cov(Yi, ui) = E{[Yi − E(Yi)]·[ui − E(ui)]}
             = E{[1/(1 − β)]·ui·ui}
             = [1/(1 − β)]·E(u²i)
             = σ²/(1 − β) ≠ 0
  −→ OLS estimators of α and β are inconsistent
299
Example: [IV]
• Consistent estimators via the IV approach
• Use Ii as an instrument for Yi
• The X- and Z-matrices are given by
  X = [ 1  Y1        Z = [ 1  I1
        1  Y2              1  I2
        ⋮   ⋮              ⋮   ⋮
        1  YN ],           1  IN ]
300
Example: [V]
• Since Cov(Ii, ui) = 0 for all i = 1, . . . , N, Z is contemporarily uncorrelated with u
  −→ the classical IV estimator
  β̂CIV = (Z′X)⁻¹Z′y
  is consistent, i.e.
  plim(β̂CIV) = [α, β]′
301
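The whole example can be replayed numerically. A sketch with assumed values α = 2, β = 0.6 and Ii ∼ N(5, 1): the OLS slope converges to β + Cov(Yi, ui)/Var(Yi) = 0.8, while the classical IV estimator with instrument Ii recovers β.

```python
import numpy as np

rng = np.random.default_rng(7)
alpha, beta, N = 2.0, 0.6, 500_000

I = rng.normal(5.0, 1.0, N)              # exogenous investment, Cov(I_i, u_i) = 0
u = rng.normal(size=N)
Y = (alpha + I + u) / (1.0 - beta)       # reduced form: Y_i depends on u_i
C = alpha + beta * Y + u

X = np.column_stack([np.ones(N), Y])     # regressors of the structural equation
Z = np.column_stack([np.ones(N), I])     # instrument matrix

b_ols = np.linalg.solve(X.T @ X, X.T @ C)            # OLS
b_civ = np.linalg.solve(Z.T @ X, Z.T @ C)            # classical IV: (Z'X)^{-1} Z'y
print(b_ols, b_civ)  # OLS slope biased upward (around 0.8); IV close to (2.0, 0.6)
```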
Problem with the classical IV estimator:
• Estimation results depend on the explicit choice of the instruments (i.e. distinct instruments yield different estimates)
Resort:
• Generalized IV estimation (2SLS)
302
2. Generalized IV estimation (2SLS)
Setting:
• There are more instruments available than there are x-variables that are contemporarily correlated with u
Idea:
• Use a linear combination of the instruments to estimate β
To this end:
• Collect all unproblematic x-variables and all z-variables (instruments) in the N × (p + 1) matrix Z (here: p ≥ K)
303
Basic idea behind IV estimation:
• Look for the ’optimal linear combination’ of the instruments (via appropriate OLS estimation; Stage #1)
• Use it to estimate β (via appropriate OLS estimation; Stage #2)
Remark:
• This historical two-stage procedure can be summarized in a single estimation formula
Here:
• Derivation of the estimation formula as a GLS estimator
304
Derivation: [I]
• Consider the linear regression model
y = Xβ + u,
where
u ∼ N(0N×1, σ2IN)
• Left-hand side multiplication with Z′ yields
Z′y = Z′Xβ + Z′u
(transformed model)
305
Derivation: [II]
• The covariance matrix of Z′u is given by
  Cov(Z′u) = Z′Cov(u)Z = Z′σ²INZ = σ²(Z′Z) = σ²Ω
• Setting X∗ ≡ Z′X, we obtain the GLS estimator of the transformed model as (cf. Definition 6.2, Slide 123)
  β̂GIV = [X∗′Ω⁻¹X∗]⁻¹X∗′Ω⁻¹Z′y
        = [(Z′X)′(Z′Z)⁻¹Z′X]⁻¹(Z′X)′(Z′Z)⁻¹Z′y
        = [X′Z(Z′Z)⁻¹Z′X]⁻¹X′Z(Z′Z)⁻¹Z′y
306
Definition 9.9: (Generalized IV estimator)
The estimator
  β̂GIV = [X′Z(Z′Z)⁻¹Z′X]⁻¹X′Z(Z′Z)⁻¹Z′y
is called the generalized instrumental-variables estimator.

Two special cases: [I]
1. If p = K, the matrices X′Z and Z′X are quadratic and regular, and it follows that
  β̂GIV = (Z′X)⁻¹(Z′Z)(X′Z)⁻¹(X′Z)(Z′Z)⁻¹Z′y = (Z′X)⁻¹Z′y = β̂CIV
  (generalized IV estimator = classical IV estimator)
307
Two special cases: [II]
2. If there are no stochastic regressors (i.e. if Assumption #C1 is valid), we set Z = X and it follows that
  β̂GIV = (X′X)⁻¹X′y = β̂
  (generalized IV estimator = OLS estimator)
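Both special cases can be confirmed numerically from the β̂GIV formula. A sketch (the data-generating values and the instrument construction are assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(8)
N = 1_000
X = np.column_stack([np.ones(N), rng.normal(size=N)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=N)
Z = np.column_stack([np.ones(N), 0.5 * X[:, 1] + rng.normal(size=N)])  # p = K

def giv(X, Z, y):
    # beta_GIV = [X'Z (Z'Z)^{-1} Z'X]^{-1} X'Z (Z'Z)^{-1} Z'y
    A = X.T @ Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
    c = X.T @ Z @ np.linalg.solve(Z.T @ Z, Z.T @ y)
    return np.linalg.solve(A, c)

b_ols = np.linalg.solve(X.T @ X, X.T @ y)
b_civ = np.linalg.solve(Z.T @ X, Z.T @ y)
print(np.allclose(giv(X, X, y), b_ols))  # special case 2: Z = X gives OLS
print(np.allclose(giv(X, Z, y), b_civ))  # special case 1: p = K gives classical IV
```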
Finally:
• Asymptotic properties of the generalized IV estimator
308
Theorem 9.10: (Properties of the generalized IV estimator) [I]
If the three valid-instruments conditions from Slides 292–293 are satisfied, then the generalized IV estimator
  β̂GIV = [X′Z(Z′Z)⁻¹Z′X]⁻¹X′Z(Z′Z)⁻¹Z′y
has the following properties:
(a) We have
  plim(β̂GIV) = β,
  i.e. the generalized IV estimator is consistent.
(b) For sufficiently large N its asymptotic covariance matrix is given by
  Cov(β̂GIV) = σ²u·(Q′ZX·Q⁻¹ZZ·QZX)⁻¹/N.
309
Theorem 9.10: (Properties of the generalized IV estimator) [II]
(c) A consistent estimator of σ²u is given by
  σ̂²u = û′GIV·ûGIV/(N − K − 1),
  where ûGIV = y − Xβ̂GIV.
(d) In practice, the asymptotic covariance matrix from (b) is estimated by
  Ĉov(β̂GIV) = σ̂²u·[X′Z(Z′Z)⁻¹Z′X]⁻¹.
Proof:
(a): see class
(b)–(d): cf. Von Auer (2007), p. 471; Vogelvang (2005), p. 204
310
Summary:
• If a sufficient number of valid instruments is available, then the generalized IV estimator
  β̂GIV = [X′Z(Z′Z)⁻¹Z′X]⁻¹X′Z(Z′Z)⁻¹Z′y
  is consistent and its asymptotic standard errors can be estimated with good precision
Further questions:
1. What are inappropriate (weak) instruments?
2. What are the consequences of using weak instruments?
311
Ad #1:
• Weak instruments satisfy the properties stated on Slides 292–293
• However, they explain only a small portion of the variation in the problematic x-variables
Ad #2:
• The generalized IV estimator β̂GIV is strongly biased
• The standard errors are strongly biased
  −→ the GIV estimator β̂GIV is entirely unreliable
312
Remarks:
• There are statistical tests for weak instruments (cf. Stock & Watson, 2003, Chapter 10)
• In the case of weak instruments, alternative estimation procedures can be used (limited-information maximum-likelihood techniques) (cf. Greene, 2008, Chapter 13)
313
9.3 Diagnostics
Summary:
• IV estimation becomes necessary in the case of contemporary correlation between the error terms in u and the x-variables (cf. Case #3, Slide 285)
• In the case of no contemporary correlation, we prefer OLS estimation to IV estimation (cf. Cases #1 and #2, Slides 248–283)
314
Question:
• Is there contemporary correlation in the data set?
−→ Hausman’s specification test (see Hausman, 1978)
Idea behind the test: [I]
• Contemporary uncorrelatedness implies
plim(X′u/N) = 0(K+1)×1
(cf. Slides 279–281)
315
Idea behind the test: [II]
• Consider the statistical testing problem:
  H0: plim(X′u/N) = 0 versus H1: plim(X′u/N) ≠ 0
• Under H0, both the OLS estimator
  β̂ = (X′X)⁻¹X′y
  and the generalized IV estimator
  β̂GIV = [X′Z(Z′Z)⁻¹Z′X]⁻¹X′Z(Z′Z)⁻¹Z′y
  are consistent
  −→ the two estimators should not differ substantially from each other
316
Idea behind the test: [III]
• Use the test statistic
  H = (β̂GIV − β̂)′·[Ĉov(β̂GIV) − Ĉov(β̂)]⁻¹·(β̂GIV − β̂)
  with a common σ̂²u in both covariance matrices
• Distribution of H under H0:
  H ∼asymp. χ²K
  (chi-square distribution with K degrees of freedom)
  −→ reject H0 at the α-level if H > χ²K;1−α
317
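A sketch of the Hausman statistic on the simultaneity example from Slide 297 (assumed values α = 2, β = 0.6; the common σ̂²u is taken from the IV residuals). Because the intercept column sits in both X and Z, the covariance-difference matrix is singular, so a pseudoinverse is used here, with K = 1 effective degree of freedom:

```python
import numpy as np

rng = np.random.default_rng(9)
alpha, beta, N = 2.0, 0.6, 20_000

I = rng.normal(5.0, 1.0, N)                  # instrument: exogenous investment
u = rng.normal(size=N)
Y = (alpha + I + u) / (1.0 - beta)           # endogenous regressor
C = alpha + beta * Y + u

X = np.column_stack([np.ones(N), Y])
Z = np.column_stack([np.ones(N), I])

b_ols = np.linalg.solve(X.T @ X, X.T @ C)
b_iv = np.linalg.solve(Z.T @ X, Z.T @ C)

s2 = np.sum((C - X @ b_iv) ** 2) / (N - 2)   # common error-variance estimate
cov_ols = s2 * np.linalg.inv(X.T @ X)
cov_iv = s2 * np.linalg.inv(Z.T @ X) @ (Z.T @ Z) @ np.linalg.inv(X.T @ Z)

d = b_iv - b_ols
H = d @ np.linalg.pinv(cov_iv - cov_ols, rcond=1e-10) @ d
print(H > 3.84)  # H far exceeds the 5% chi-square(1) critical value: reject H0
```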