### Econometrics I (Stanford University): transcript of doubleh/eco270/pointestimation.pdf

• Econometrics I

Department of Economics Stanford University

November, 2016

Part II

• Topics

• Point Estimation.

• Interval Estimation

• Hypothesis Testing

• Sufficiency and Data Reduction (maybe).

• Different Approaches

• Frequentist: There exists a true parameter value θ0.

• Bayesian: θ is a random variable. Prior+Data =⇒ Posterior.

• Fiducial Inference: No prior. Data =⇒ Posterior. Or Bayesian with a uniform (diffuse) prior.

• Data, sample of size n

• I.I.D. sampling (sampling with replacement).

• Different sampling schemes are a science in themselves.

• Usual Notations: X1, . . . ,Xn, Y1, . . . ,Yn, Z1, . . . ,Zn.

• Xn = (X1, . . . , Xn), and similarly Yn, Zn.

• Parameter θ: a function(al) of the distribution:

  µ (FX (·)) = ∫ x fX (x) dx = ∫ x dFX (x).

• Estimators: a function of the data:

  θ̂ = φn (Xn) = φn (X1, X2, . . . , Xn)

• Strictly speaking, a sequence of functions of the data, since it is a different function for each n. For example:

  θ̂ = X̄n = (X1 + X2 + · · · + Xn)/n.

• Estimate: a realized value of the estimator.

• Is θ̂ = 1/2 an estimator or an estimate?

• Empirical Distribution Function (EDF):

  F̂X (x) = (1/n) ∑ᵢ₌₁ⁿ 1 (Xi ≤ x)

• Analog principle: replace the true population value (CDF) with the estimated sample value (EDF):

  X̄n = µ̂ = ∫ x dF̂X (x)
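The analog principle can be made concrete in a few lines: the EDF puts mass 1/n on each observation, so integrating x against it reproduces the sample mean. A minimal sketch (function names like `edf` are illustrative, not from the lecture):

```python
# EDF and the analog principle on a tiny sample.
def edf(sample, x):
    """F_hat(x) = (1/n) * number of observations with X_i <= x."""
    return sum(1 for xi in sample if xi <= x) / len(sample)

def mean_via_edf(sample):
    """Integral of x dF_hat(x): mass 1/n on each X_i, i.e. the sample mean."""
    n = len(sample)
    return sum(xi * (1 / n) for xi in sample)

sample = [1.0, 2.0, 2.0, 5.0]
print(edf(sample, 2.0))      # 0.75
print(mean_via_edf(sample))  # 2.5, the sample mean
```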

• Properties of Estimators:
  • Finite sample properties: unbiasedness, mean square error, finite sample distribution.
  • Asymptotic properties: consistency, asymptotic distribution.

• Unbiasedness:

  Eθ θ̂ = ∫ · · · ∫ θ̂ (X1, . . . , Xn) f (X1, . . . , Xn|θ) dX1 . . . dXn = θ.

• MSE (a function of θ, or of θ0):

  MSE = E (θ̂ − θ0)² = Eθ (θ̂ − θ)²

  MSE = Var (θ̂) + (E θ̂ − θ0)²

• More general loss functions: Eθ ℓ (θ̂ − θ).

• Suppose X1, X2 ∼ i.i.d. Bernoulli(p). Consider

  p̂1 = (X1 + X2)/2,  p̂2 = X1,  p̂3 = 1/2.

  p̂1 and p̂2 are unbiased; p̂3 is biased.

• MSE:

  MSE (p̂1) = Var (p̂1) = (1/2) p (1 − p)

  MSE (p̂2) = Var (p̂2) = p (1 − p)

  MSE (p̂3) = Bias (p̂3)² = (1/2 − p)².

• p̂2 is inadmissible: p̂1 is better.

• θ̂ is admissible if there is no estimator that is better (in the MSE sense) than θ̂ for some p and at least as good as θ̂ for all p.

• p̂1 and p̂3 are admissible: we cannot choose one over the other because p is unknown. A typical Bayesian estimator is p̂4 = w p̂1 + (1 − w) p̂3.
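The three MSE formulas can be verified exactly by enumerating the four possible outcomes of (X1, X2); this is a small illustrative check, not part of the lecture:

```python
# Exact MSE of an estimator (x1, x2) -> value when X1, X2 ~ i.i.d. Bernoulli(p),
# computed by summing over the four outcomes weighted by their probabilities.
def mse(estimator, p):
    total = 0.0
    for x1 in (0, 1):
        for x2 in (0, 1):
            prob = (p if x1 else 1 - p) * (p if x2 else 1 - p)
            total += prob * (estimator(x1, x2) - p) ** 2
    return total

p = 0.3
print(mse(lambda a, b: (a + b) / 2, p))  # ≈ 0.105 = p(1-p)/2
print(mse(lambda a, b: a, p))            # ≈ 0.21  = p(1-p)
print(mse(lambda a, b: 0.5, p))          # ≈ 0.04  = (1/2 - p)^2
```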

• X̄n is the best linear unbiased estimator (BLUE).

• Linearity: θ̂ = ∑ᵢ₌₁ⁿ ωi Xi.

• Unbiasedness: E θ̂ = θ ⇐⇒ ∑ᵢ₌₁ⁿ ωi = 1. Then

  MSE (θ̂) = ∑ᵢ₌₁ⁿ ωi² σ² = Var (θ̂)

  (ω̂1, ω̂2, . . . , ω̂n) = arg min over ω1, ω2, . . . , ωn of ∑ᵢ₌₁ⁿ ωi²  such that  ∑ᵢ₌₁ⁿ ωi = 1.

  Solution:

  ω̂1 = ω̂2 = ω̂3 = . . . = ω̂n = 1/n.
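The equal-weights solution to this constrained minimization can be spot-checked numerically by comparing ∑ωi² at equal weights against a few other weight vectors that also sum to one (the candidates are arbitrary illustrative choices):

```python
# Numeric spot check that equal weights minimize sum(w_i^2) s.t. sum(w_i) = 1.
def sum_sq(w):
    return sum(wi ** 2 for wi in w)

n = 5
equal = [1 / n] * n
candidates = [
    [0.5, 0.5, 0.0, 0.0, 0.0],
    [0.4, 0.3, 0.1, 0.1, 0.1],
    [1.0, 0.0, 0.0, 0.0, 0.0],
]
print(sum_sq(equal))  # ≈ 0.2 = 1/n
print(all(sum_sq(w) > sum_sq(equal) for w in candidates))  # True
```

This is only a sanity check on a handful of points; the full proof uses the Lagrangian of the constrained problem.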

• X̄n is BLUE (Gauss-Markov).

• What can be better than X̄n?
  • A nonlinear unbiased estimator
  • A linear biased estimator
  • A nonlinear biased estimator

• Example 1: Xi ∼ i.i.d. Uniform (0, θ), µ = EX = ∫ x dFX (x) = θ/2. Want to estimate µ. Consider

  µ̂1 = X̄n,  µ̂2 = ((n + 1)/(2n)) Zn,  Zn = max (X1, . . . , Xn).

• Since Zn < θ, we bias-correct by multiplying by (n + 1)/n.

  MSE (µ̂1) = Var (X̄n) = σ²/n = θ²/(12n)

  MSE (µ̂2) = Var (µ̂2) + Bias (µ̂2)²

  FZn (z) = P (max (X1, . . . , Xn) ≤ z) = ∏ᵢ₌₁ⁿ P (Xi ≤ z) = (z/θ)ⁿ

  fZn (z) = ∂z FZn (z) = n zⁿ⁻¹/θⁿ.

• Moments of Zn:

  E Zn = ∫₀^θ z · n zⁿ⁻¹/θⁿ dz = (n/(n + 1)) θ

  E Zn² = ∫₀^θ z² · n zⁿ⁻¹/θⁿ dz = (n/(n + 2)) θ²

  Var (Zn) = E Zn² − (E Zn)² = (n/(n + 2)) θ² − [(n/(n + 1)) θ]² = n θ²/((n + 2)(n + 1)²)

  Var (µ̂2) = ((n + 1)²/(4n²)) Var (Zn) = θ²/(4n (n + 2))

  Bias (µ̂2) = E µ̂2 − θ/2 = 0

  MSE (µ̂1) − MSE (µ̂2) = θ²/(12n) − θ²/(4n (n + 2)) = θ² (1/(12n) − 1/(4n (n + 2))) > 0 if n > 1.
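The two MSE formulas can be checked by Monte Carlo. A minimal sketch, assuming θ = 1 (so the target is µ = 1/2); the sample size and replication count are arbitrary:

```python
import random

# Monte Carlo comparison of mu_hat_1 = X_bar and mu_hat_2, the
# bias-corrected maximum, for Uniform(0, theta) data.
random.seed(0)
theta, n, reps = 1.0, 10, 20000
se1 = se2 = 0.0
for _ in range(reps):
    xs = [random.uniform(0.0, theta) for _ in range(n)]
    mu1 = sum(xs) / n                  # sample mean
    mu2 = (n + 1) / (2 * n) * max(xs)  # bias-corrected maximum
    se1 += (mu1 - theta / 2) ** 2
    se2 += (mu2 - theta / 2) ** 2
print(se1 / reps)  # ≈ theta^2/(12n)      ≈ 0.00833
print(se2 / reps)  # ≈ theta^2/(4n(n+2))  ≈ 0.00208
```

For n = 10 the nonlinear estimator µ̂2 beats the sample mean by roughly a factor of four in MSE, matching the algebra above.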

• Large Sample Analysis
  • Weak consistency: θ̂n →p θ0 as n → ∞.
  • Strong consistency: θ̂n →a.s. θ0 as n → ∞.

• Rate of convergence and asymptotic distribution (typically normal).

• Asymptotic efficiency: this can be a difficult concept.

• Maximum Likelihood Estimator

• Likelihood function, a random function: f (Xn|θ) ≡ L (θ|Xn).

• Joint likelihood, conditional likelihood, marginal likelihood, partial likelihood.

• If Xn = (X1, . . . , Xn) is i.i.d., then f (Xn|θ) = ∏ᵢ₌₁ⁿ f (Xi|θ).

• We can define θ̂MLE = arg maxθ∈Θ f (Xn|θ) ≡ L (θ|Xn).

• But for computational and statistical reasons, define

  θ̂LMLE = arg maxθ∈Θ log L (θ|Xn), which under i.i.d. sampling equals arg maxθ∈Θ ∑ᵢ₌₁ⁿ log f (Xi|θ).

• θ̂LMLE = θ̂MLE if θ̂MLE can be computed analytically. But often θ̂LMLE can be computed numerically while θ̂MLE cannot. E.g. if log L (θ|Xn) ≈ −500, then L (θ|Xn) ≈ 0 to machine precision.

• Recall that, using the average log likelihood (to facilitate the proofs),

  θ̂ = arg maxθ∈Θ (1/n) ∑ᵢ₌₁ⁿ log f (Xi|θ) = arg maxθ∈Θ (1/n) log L (θ|Xn).

• Example 1:

  Xi = 1 with probability p, 0 with probability 1 − p, i.i.d.

  L (p|Xn) = ∏ᵢ₌₁ⁿ f (Xi|p) = ∏ᵢ₌₁ⁿ p^Xi (1 − p)^(1−Xi) = p^∑Xi (1 − p)^(n−∑Xi)

  maxp log L (p|Xn) = maxp ∑ (Xi log p + (1 − Xi) log (1 − p))

• First order condition:

  ∂ log L (p|Xn)/∂p = (1/p) ∑ Xi − (1/(1 − p)) ∑ (1 − Xi) = ∑ (Xi − p) / (p (1 − p)) = 0

  =⇒ p̂ = (1/n) ∑ᵢ₌₁ⁿ Xi
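A brute-force grid search over p confirms that the likelihood is maximized at the sample mean, as the first order condition says. A sketch with an arbitrary illustrative data vector:

```python
import math

# Grid search over p in (0, 1) for the Bernoulli log likelihood.
xs = [1, 0, 1, 1, 0, 1, 0, 1]  # sum = 5, n = 8, so p_hat should be 0.625

def loglik(p):
    return sum(x * math.log(p) + (1 - x) * math.log(1 - p) for x in xs)

grid = [i / 1000 for i in range(1, 1000)]  # 0.001, ..., 0.999
p_star = max(grid, key=loglik)
print(p_star)             # 0.625, exactly the sample mean
print(sum(xs) / len(xs))  # 0.625
```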

• Example 2: Xi ∼ N (µ, σ²), θ = (µ, σ²).

  L (θ|Xn) = ∏ᵢ₌₁ⁿ f (Xi|θ) = ∏ᵢ₌₁ⁿ (1/√(2πσ²)) exp (−(Xi − µ)²/(2σ²))

  log L (θ|Xn) = C + ∑ [log (1/σ) − (Xi − µ)²/(2σ²)]

  µ̂ = arg minµ ∑ᵢ₌₁ⁿ (Xi − µ)² ≡ X̄n

  ∂ log L (θ|Xn)/∂σ² = −n/(2σ²) + ∑ᵢ₌₁ⁿ (Xi − µ)²/(2σ⁴) = 0

• Solving gives σ̂² = (1/n) ∑ᵢ₌₁ⁿ (Xi − µ̂)².

• σ̂² is biased, but

  S² = (1/(n − 1)) ∑ᵢ₌₁ⁿ (Xi − µ̂)²

  is unbiased.
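The bias of σ̂² shows up clearly in simulation: averaging the estimator over many samples recovers ((n − 1)/n) σ² rather than σ². A sketch with arbitrary true variance and sample size:

```python
import random

# Simulated check: sigma_hat^2 = SS/n is biased downward,
# S^2 = SS/(n-1) is unbiased.
random.seed(1)
n, reps, sigma2 = 5, 40000, 4.0
avg_mle = avg_s2 = 0.0
for _ in range(reps):
    xs = [random.gauss(0.0, sigma2 ** 0.5) for _ in range(n)]
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)
    avg_mle += ss / n
    avg_s2 += ss / (n - 1)
print(avg_mle / reps)  # ≈ (n-1)/n * sigma2 = 3.2
print(avg_s2 / reps)   # ≈ sigma2 = 4.0
```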

• Example 3: Xi is a k × 1 vector, Xi ∼ N (µ, Σ), θ = (µ, Σ), with k + k(k + 1)/2 parameters.

  L (θ|Xn) = ∏ f (Xi|θ) = ∏ (1/((√2π)^k |Σ|^(1/2))) exp (−(Xi − µ)′ Σ⁻¹ (Xi − µ)/2)

  log L (θ|Xn) = C + (n/2) log |Σ⁻¹| − (1/2) ∑ (Xi − µ)′ Σ⁻¹ (Xi − µ)

• Recall (from Dhrymes' book):

  ∂ (x′Ax)/∂x = (A + A′) x = 2Ax if A is symmetric

  ∂ log |A|/∂A = A⁻¹ (using principal minors and cofactors)

  ∂ tr (AB)/∂A = B′

  tr (AB) = tr (BA),  tr (ABC) = tr (BCA) = tr (CAB).

• Then

  log L (θ|Xn) = C + (n/2) log |Σ⁻¹| − (1/2) ∑ tr ((Xi − µ)′ Σ⁻¹ (Xi − µ))
               = C + (n/2) log |Σ⁻¹| − (1/2) ∑ tr (Σ⁻¹ (Xi − µ)(Xi − µ)′)

  ∂ log L (θ|Xn)/∂µ = ∑ Σ⁻¹ (Xi − µ) = 0

  =⇒ Σ⁻¹ ∑ᵢ₌₁ⁿ (Xi − µ) = 0

  =⇒ µ̂ = X̄n.

  ∂ log L (µ, Σ⁻¹|Xn)/∂Σ⁻¹ = (n/2) Σ − (1/2) ∑ᵢ₌₁ⁿ (Xi − µ)(Xi − µ)′ = 0

  =⇒ Σ̂ = (1/n) ∑ᵢ₌₁ⁿ (Xi − µ̂)(Xi − µ̂)′ = (1/n) ∑ᵢ₌₁ⁿ (Xi − X̄n)(Xi − X̄n)′.
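The closed forms µ̂ = X̄n and Σ̂ = (1/n) ∑ (Xi − X̄n)(Xi − X̄n)′ can be computed by hand on a tiny fixed dataset; the data below are arbitrary, chosen so the answer comes out in simple fractions:

```python
# MLE of the mean vector and covariance matrix on a tiny k = 2 dataset.
X = [(1.0, 2.0), (3.0, 4.0), (5.0, 0.0)]
n, k = len(X), len(X[0])

# mu_hat: componentwise sample mean
mu_hat = [sum(x[j] for x in X) / n for j in range(k)]

# Sigma_hat: (1/n) * sum of outer products of deviations from mu_hat
Sigma_hat = [[sum((x[a] - mu_hat[a]) * (x[b] - mu_hat[b]) for x in X) / n
              for b in range(k)] for a in range(k)]

print(mu_hat)     # [3.0, 2.0]
print(Sigma_hat)  # [[8/3, -4/3], [-4/3, 8/3]]
```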

• Often we cannot compute the MLE by hand. Let Qn (θ) = log L (θ|Xn).
  • Newton-Raphson iteration (maximize a quadratic approximation)
  • Stochastic optimization
  • (Ken Judd) Numerical Methods in Economics
  • Root finding: bisection, Gauss-Newton iteration

• Start from an initial guess θ(0) and approximate Q by its second-order expansion:

  Q (θ) ≈ Q (θ(0)) + ∂Q (θ(0))/∂θ (θ − θ(0)) + (1/2) (θ − θ(0))′ [∂²Q (θ(0))/∂θ∂θ′] (θ − θ(0))

  Q (θ) ≈ Q̃ (θ, θ(0))

  0 = ∂Q̃ (θ, θ(0))/∂θ = ∂Q (θ(0))/∂θ + [∂²Q (θ(0))/∂θ∂θ′] (θ − θ(0))

  =⇒ θ(1) = θ(0) − [∂²Q (θ(0))/∂θ∂θ′]⁻¹ ∂Q (θ(0))/∂θ.