Econometrics I
Department of Economics Stanford University
November, 2016
Part II
Topics
• Point Estimation
• Interval Estimation
• Hypothesis Testing
• Sufficiency and Data Reduction (maybe)
Different Approaches
• Frequentist: There exists a true parameter value θ0.
• Bayesian: θ is a random variable. Prior+Data =⇒ Posterior.
• Fiducial Inference: No prior. Data =⇒ Posterior. Or Bayesian with a uniform (diffuse) prior.
Data, sample of size n
• I.I.D. sampling (sampling with replacement).
• The study of different sampling schemes is a science in itself.
• Usual Notations: X1, . . . ,Xn, Y1, . . . ,Yn, Z1, . . . ,Zn.
• Vector notation: Xn = (X1, . . . , Xn), and similarly Yn, Zn.
• Parameter θ: a function(al) of the distribution.
µ(FX(·)) = ∫ x fX(x) dx = ∫ x dFX(x).
• Estimator: a function of the data:
θ̂ = φn(Xn) = φn(X1, X2, . . . , Xn)
• Strictly speaking, a sequence of functions of the data, since it is a different function for a different n. For example:
θ̂ = X̄n = (X1 + X2 + · · · + Xn) / n.
• Estimate: a realized value of the estimator.
• Is θ̂ = 1/2 an estimator or an estimate?
• Empirical Distribution Function (EDF):
F̂X(x) = (1/n) ∑_{i=1}^n 1(Xi ≤ x)
• Analog principle: replace true population value (CDF) with estimated sample value (EDF):
X̄n = µ̂ = ∫ x dF̂X(x)
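As a numerical illustration of the EDF and the analog principle, here is a minimal sketch (assuming NumPy; the `edf` helper and the exponential sample are illustrative choices, not from the slides):

```python
import numpy as np

def edf(sample, x):
    """Empirical distribution function: fraction of observations <= x."""
    return np.mean(sample <= x)

rng = np.random.default_rng(0)
X = rng.exponential(scale=2.0, size=1000)  # any distribution would do

# The EDF puts mass 1/n on each observation, so integrating x against
# dF_hat is exactly the sample mean: the plug-in estimate of E[X].
mu_hat = np.mean(X)
print(edf(X, 2.0))  # estimate of P(X <= 2)
print(mu_hat)       # analog-principle estimate of the mean (true value: 2)
```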
• Properties of Estimators:
  • Finite sample properties: unbiasedness, mean square error, finite sample distribution.
  • Asymptotic properties: consistency, asymptotic distribution.
• Unbiasedness:
E_θ[θ̂] = ∫ · · · ∫ θ̂(X1, . . . , Xn) f(X1, . . . , Xn | θ) dX1 · · · dXn = θ.
• MSE (a function of θ, or of θ0):
MSE = E(θ̂ − θ0)² = E_θ(θ̂ − θ)²
MSE = Var(θ̂) + (E θ̂ − θ0)²  (variance plus squared bias; see the expansion below).
• More general loss functions: E_θ ℓ(θ̂ − θ).
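The variance-bias decomposition above follows by adding and subtracting E θ̂ inside the square:

E(θ̂ − θ0)² = E[(θ̂ − E θ̂) + (E θ̂ − θ0)]² = E(θ̂ − E θ̂)² + (E θ̂ − θ0)² = Var(θ̂) + Bias(θ̂)²,

since the cross term 2(E θ̂ − θ0) E(θ̂ − E θ̂) equals zero.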
• Suppose X1, X2 ∼ i.i.d. Bernoulli(p). Consider three estimators:
p̂1 = (X1 + X2)/2,  p̂2 = X1,  p̂3 = 1/2.
p̂1 and p̂2 are unbiased; p̂3 is biased.
• MSE:
MSE(p̂1) = Var(p̂1) = p(1 − p)/2
MSE(p̂2) = Var(p̂2) = p(1 − p)
MSE(p̂3) = Bias(p̂3)² = (1/2 − p)².
• p̂2 is inadmissible: p̂1 is better.
• θ̂ is admissible if there is no estimator that is better (in the MSE sense) than θ̂ for some p and is at least as good as θ̂ for all p.
• p̂1 and p̂3 are admissible: one cannot choose between them because p is unknown. A typical Bayesian estimator is a weighted average of the two, p̂4 = w p̂1 + (1 − w) p̂3.
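A quick Monte Carlo check of these MSE formulas (a sketch assuming NumPy; the specific p and replication count are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
p, reps = 0.3, 200_000
X = rng.binomial(1, p, size=(reps, 2))  # many draws of (X1, X2)

p1 = X.mean(axis=1)                     # (X1 + X2) / 2
p2 = X[:, 0].astype(float)              # X1
p3 = np.full(reps, 0.5)                 # the constant 1/2

for name, est, theory in [("p1", p1, p * (1 - p) / 2),
                          ("p2", p2, p * (1 - p)),
                          ("p3", p3, (0.5 - p) ** 2)]:
    print(name, np.mean((est - p) ** 2), "theory:", theory)
```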
• X̄n is the best linear unbiased estimator (BLUE).
• Linearity: θ̂ = ∑_{i=1}^n ωi Xi.
• Unbiasedness: E θ̂ = θ ⇐⇒ ∑_{i=1}^n ωi = 1.
MSE(θ̂) = ∑_{i=1}^n ωi² σ² = Var(θ̂)
(ω̂1, ω̂2, . . . , ω̂n) = arg min_{ω1,...,ωn} ∑_{i=1}^n ωi²  such that  ∑_{i=1}^n ωi = 1.
Solution:
ω̂1 = ω̂2 = · · · = ω̂n = 1/n.
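This solution follows from a one-line Lagrangian argument. Form

L(ω, λ) = ∑_{i=1}^n ωi² − λ (∑_{i=1}^n ωi − 1).

The first order conditions 2ωi = λ force all weights to be equal, and the constraint ∑ ωi = 1 then gives ωi = 1/n.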
• X̄n is BLUE (Gauss-Markov)
• What can be better than X̄n?
  • A nonlinear unbiased estimator
  • A linear biased estimator
  • A nonlinear biased estimator
• Example 1: Xi ∼ i.i.d. Uniform(0, θ), µ = EX = ∫ x dFX(x) = θ/2. Want to estimate µ.
• Two candidates:
µ̂1 = X̄n,  µ̂2 = ((n + 1)/(2n)) Zn,  where Zn = max(X1, . . . , Xn).
• Since Zn < θ, we bias-correct by multiplying by (n + 1)/n.
MSE(µ̂1) = Var(X̄n) = σ²/n = θ²/(12n)
MSE(µ̂2) = Var(µ̂2) + Bias(µ̂2)²
F_{Zn}(z) = P(max(X1, . . . , Xn) ≤ z) = ∏_{i=1}^n P(Xi ≤ z) = (z/θ)^n
f_{Zn}(z) = (∂/∂z) F_{Zn}(z) = n z^{n−1}/θ^n.
• Moments of Zn:
E Zn = ∫_0^θ z · n z^{n−1}/θ^n dz = (n/(n + 1)) θ
E Zn² = ∫_0^θ z² · n z^{n−1}/θ^n dz = (n/(n + 2)) θ²
Var(Zn) = E Zn² − (E Zn)² = (n/(n + 2)) θ² − ((n/(n + 1)) θ)² = n θ²/((n + 2)(n + 1)²)
Var(µ̂2) = ((n + 1)²/(4n²)) Var(Zn) = θ²/(4n(n + 2))
Bias(µ̂2) = E µ̂2 − θ/2 = 0
MSE(µ̂1) − MSE(µ̂2) = θ²/(12n) − θ²/(4n(n + 2)) = θ² (1/(12n) − 1/(4n(n + 2))) > 0 if n > 1.
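A Monte Carlo check of this ranking (a sketch assuming NumPy; θ, n, and the replication count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
theta, n, reps = 2.0, 10, 100_000
X = rng.uniform(0, theta, size=(reps, n))
mu = theta / 2

mu1 = X.mean(axis=1)                     # sample mean
mu2 = (n + 1) / (2 * n) * X.max(axis=1)  # bias-corrected maximum

print("MSE(mu1):", np.mean((mu1 - mu) ** 2), "theory:", theta**2 / (12 * n))
print("MSE(mu2):", np.mean((mu2 - mu) ** 2), "theory:", theta**2 / (4 * n * (n + 2)))
```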
• Large Sample Analysis
  • Weak consistency: θ̂n → θ0 in probability as n → ∞.
  • Strong consistency: θ̂n → θ0 almost surely as n → ∞.
• Rate of Convergence and Asymptotic Distribution (typically normal)
• Asymptotic Efficiency: this can be a difficult concept.
• Maximum Likelihood Estimator
• Likelihood function, a random function: f (Xn|θ) ≡ L (θ|Xn)
• Joint likelihood, conditional likelihood, marginal likelihood, partial likelihood.
• If Xn = (X1, . . . , Xn) is i.i.d., then f(Xn|θ) = ∏_{i=1}^n f(Xi|θ).
• We can define θ̂MLE = arg maxθ∈Θ f (Xn|θ) ≡ L (θ|Xn).
• But for computational and statistical reasons, define
θ̂LMLE = arg max_{θ∈Θ} log L(θ|Xn), which under i.i.d. sampling equals arg max_{θ∈Θ} ∑_{i=1}^n log f(Xi|θ).
• θ̂LMLE = θ̂MLE whenever θ̂MLE can be computed analytically. But oftentimes θ̂LMLE can be computed numerically while θ̂MLE cannot: e.g., if log L(θ|Xn) ≈ −500, then L(θ|Xn) ≈ 0 in machine precision, so the raw likelihood underflows.
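The underflow point is easy to demonstrate (a sketch assuming NumPy; the standard normal sample is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=1000)

# Per-observation standard normal densities at the true parameter.
dens = np.exp(-X**2 / 2) / np.sqrt(2 * np.pi)

print(np.prod(dens))         # 0.0: the product of 1000 densities underflows
print(np.sum(np.log(dens)))  # about -1400: the log likelihood is fine
```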
• Recall that, using the average log likelihood (to facilitate the proofs):
θ̂ = arg max_{θ∈Θ} (1/n) ∑_{i=1}^n log f(Xi|θ) = arg max_{θ∈Θ} (1/n) log L(θ|Xn).
• Example 1: Xi i.i.d. with Xi = 1 with probability p and Xi = 0 with probability 1 − p.
L(p|Xn) = ∏_{i=1}^n p(Xi|p) = ∏_{i=1}^n p^{Xi} (1 − p)^{1−Xi} = p^{∑ Xi} (1 − p)^{n − ∑ Xi}
max_p log L(p|Xn) = max_p ∑ (Xi log p + (1 − Xi) log(1 − p))
• First order condition:
∂ log L(p|Xn)/∂p = (1/p) ∑ Xi − (1/(1 − p)) ∑ (1 − Xi) = (∑ Xi − np)/(p(1 − p)) = 0
=⇒ p̂ = (1/n) ∑_{i=1}^n Xi.
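As a sanity check, the analytic MLE can be compared with a numerical maximizer (a sketch assuming NumPy and SciPy, which are not used in the slides themselves):

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(4)
X = rng.binomial(1, 0.3, size=500)

def neg_loglik(p):
    # Negative Bernoulli log likelihood (minimized, so L is maximized).
    return -np.sum(X * np.log(p) + (1 - X) * np.log(1 - p))

res = minimize_scalar(neg_loglik, bounds=(1e-6, 1 - 1e-6), method="bounded")
print(res.x, X.mean())  # the two answers agree up to solver tolerance
```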
• Example 2: Xi ∼ N(µ, σ²), θ = (µ, σ²).
L(θ|Xn) = ∏_{i=1}^n f(Xi|θ) = ∏_{i=1}^n (1/√(2πσ²)) exp(−(Xi − µ)²/(2σ²)).
log L(θ|Xn) = C + ∑ [log(1/σ) − (Xi − µ)²/(2σ²)]
µ̂ = arg min_µ ∑_{i=1}^n (Xi − µ)² = X̄n
∂ log L(θ|Xn)/∂σ² = −n/(2σ²) + ∑_{i=1}^n (Xi − µ)²/(2σ⁴) = 0
• σ̂² = (1/n) ∑_{i=1}^n (Xi − µ̂)².
• σ̂² is biased, but
S² = (1/(n − 1)) ∑_{i=1}^n (Xi − µ̂)²
is unbiased.
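A small simulation makes the bias visible (a sketch assuming NumPy; with n = 5 and σ² = 1, E σ̂² = (n − 1)σ²/n = 0.8):

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps = 5, 200_000
X = rng.normal(0.0, 1.0, size=(reps, n))  # true sigma^2 = 1

sigma2_mle = X.var(axis=1, ddof=0)  # divides by n: the MLE
S2 = X.var(axis=1, ddof=1)          # divides by n - 1: unbiased

print(sigma2_mle.mean())  # about 0.8
print(S2.mean())          # about 1.0
```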
• Example 3: Xi is a k × 1 vector, Xi ∼ N(µ, Σ), θ = (µ, Σ): k + k(k + 1)/2 parameters.
L(θ|Xn) = ∏ f(Xi|θ) = ∏ (1/((2π)^{k/2} |Σ|^{1/2})) exp(−(Xi − µ)′ Σ^{−1} (Xi − µ)/2)
log L(θ|Xn) = C + (n/2) log |Σ^{−1}| − (1/2) ∑ (Xi − µ)′ Σ^{−1} (Xi − µ)
• Recall (from the Dhrymes book):
∂(x′Ax)/∂x = (A + A′) x = 2Ax if A is symmetric
(∂/∂A) log |A| = A^{−1} (using principal minors and cofactors)
(∂/∂A) tr(AB) = B′
tr(AB) = tr(BA), tr(ABC) = tr(BCA) = tr(CAB).
log L(θ|Xn) = C + (n/2) log |Σ^{−1}| − (1/2) ∑ tr((Xi − µ)′ Σ^{−1} (Xi − µ))
= C + (n/2) log |Σ^{−1}| − (1/2) ∑ tr(Σ^{−1} (Xi − µ)(Xi − µ)′)
(∂/∂µ) log L(θ|Xn) = ∑ Σ^{−1} (Xi − µ) = 0
=⇒ Σ^{−1} ∑_{i=1}^n (Xi − µ) = 0
=⇒ µ̂ = X̄n.
(∂/∂Σ^{−1}) log L(µ, Σ^{−1}|Xn) = (n/2) Σ − (1/2) ∑_{i=1}^n (Xi − µ)(Xi − µ)′ = 0
=⇒ Σ̂ = (1/n) ∑_{i=1}^n (Xi − µ̂)(Xi − µ̂)′ = (1/n) ∑_{i=1}^n (Xi − X̄n)(Xi − X̄n)′.
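The closed-form multivariate MLE is easy to check numerically (a sketch assuming NumPy; the µ and Σ below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(6)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
X = rng.multivariate_normal(mu, Sigma, size=5000)  # n x k data matrix

mu_hat = X.mean(axis=0)                     # MLE of mu: the sample mean
centered = X - mu_hat
Sigma_hat = centered.T @ centered / len(X)  # MLE of Sigma: divides by n

print(mu_hat)     # close to mu
print(Sigma_hat)  # close to Sigma
```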
Oftentimes the MLE cannot be computed by hand. Write Qn(θ) = log L(θ|Xn).
• Newton-Raphson iteration (maximize a quadratic approximation)
• Stochastic optimization
• (Ken Judd) Numerical Methods in Economics
• Root finding: bisection, Gauss-Newton iteration
Initial guess θ(0). Take a second order expansion of Q around θ(0):
Q(θ) ≈ Q(θ(0)) + (∂Q(θ(0))/∂θ)′ (θ − θ(0)) + (1/2)(θ − θ(0))′ (∂²Q(θ(0))/∂θ∂θ′)(θ − θ(0)) ≡ Q̃(θ, θ(0))
Maximize the quadratic approximation:
0 = ∂Q̃(θ, θ(0))/∂θ = ∂Q(θ(0))/∂θ + (∂²Q(θ(0))/∂θ∂θ′)(θ − θ(0)),
which gives the update
θ(1) = θ(0) − (∂²Q(θ(0))/∂θ∂θ′)^{−1} ∂Q(θ(0))/∂θ,
and iterate until convergence.
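A minimal Newton-Raphson sketch for a scalar parameter, using the Bernoulli log likelihood from Example 1 (assuming NumPy; the gradient and Hessian are hard-coded for this illustration):

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.binomial(1, 0.3, size=500)
s, n = X.sum(), len(X)

def grad(p):
    # Score of the Bernoulli log likelihood Q(p).
    return s / p - (n - s) / (1 - p)

def hess(p):
    # Second derivative of Q(p); negative, so the quadratic model has a max.
    return -s / p**2 - (n - s) / (1 - p) ** 2

p = 0.5  # initial guess theta^(0)
for _ in range(20):
    step = grad(p) / hess(p)  # Newton step from the quadratic approximation
    p -= step
    if abs(step) < 1e-10:
        break

print(p, X.mean())  # converges to the analytic MLE, the sample mean
```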