
  • Econometrics I

    Department of Economics Stanford University

    November 2016

    Part II

  • Topics

    • Point Estimation

    • Interval Estimation

    • Hypothesis Testing

    • Sufficiency and Data Reduction (maybe)

  • Different Approaches

    • Frequentist: There exists a true parameter value θ0.

    • Bayesian: θ is a random variable. Prior+Data =⇒ Posterior.

    • Fiducial Inference: No prior. Data =⇒ Posterior. Equivalently, Bayesian with a uniform (diffuse) prior.

  • Data, sample of size n

    • I.I.D. sampling (sampling with replacement).

    • The study of different sampling schemes is a science in itself.

    • Usual notation: X1, . . . , Xn, Y1, . . . , Yn, Z1, . . . , Zn.

    • Vector notation: Xn = (X1, . . . , Xn), and similarly Yn, Zn.

  • Parameter θ: a function(al) of the distribution:

    $$\mu(F_X(\cdot)) = \int x f_X(x)\, dx = \int x\, dF_X(x).$$

    • Estimator: a function of the data:

    $$\hat\theta = \phi_n(\mathbf{X}_n) = \phi_n(X_1, X_2, \ldots, X_n).$$

    • Strictly speaking, an estimator is a sequence of functions of the data, since it is a different function for each n. For example:

    $$\hat\theta = \bar X_n = \frac{X_1 + X_2 + \cdots + X_n}{n}.$$

    • Estimate: a realized value of the estimator.

    • Is θ̂ = 1/2 an estimator or an estimate?

  • Empirical Distribution Function (EDF):

    $$\hat F_X(x) = \frac{1}{n}\sum_{i=1}^{n} 1(X_i \le x)$$

    • Analog principle: replace the true population value (the CDF) with the estimated sample value (the EDF):

    $$\bar X_n = \hat\mu = \int x\, d\hat F_X(x)$$
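    To make the plug-in (analog) idea concrete, here is a minimal Python sketch; the helper name `edf` and the simulated normal sample are illustrative assumptions, not part of the slides:

```python
import numpy as np

def edf(sample, x):
    """Empirical distribution function: the fraction of observations <= x."""
    return np.mean(np.asarray(sample) <= x)

rng = np.random.default_rng(0)
X = rng.normal(loc=1.0, scale=2.0, size=1000)

# Analog principle: the sample mean is the plug-in estimator of mu = E[X],
# i.e. the integral of x against the EDF.
mu_hat = X.mean()
print(edf(X, 1.0))   # roughly F_X(1.0) = 0.5 for this distribution
print(mu_hat)        # roughly 1.0
```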

    • Properties of estimators:

      • Finite sample properties: unbiasedness, mean square error, the finite sample distribution.

      • Asymptotic properties: consistency, the asymptotic distribution.

  • Unbiasedness:

    $$E_\theta \hat\theta = \int \cdots \int \hat\theta(X_1, \ldots, X_n)\, f(X_1, \ldots, X_n|\theta)\, dX_1 \cdots dX_n = \theta.$$

    • MSE (a function of θ, or of θ0):

    $$MSE = E\left(\hat\theta - \theta_0\right)^2 = E_\theta\left(\hat\theta - \theta\right)^2, \qquad MSE = Var\left(\hat\theta\right) + \left(E\left(\hat\theta - \theta_0\right)\right)^2.$$

    • More general loss functions: $E_\theta\, \ell\left(\hat\theta - \theta\right)$.

  • Suppose X1, X2 ∼ i.i.d. Bernoulli(p). Consider

    $$\hat p_1 = \frac{X_1 + X_2}{2}, \qquad \hat p_2 = X_1, \qquad \hat p_3 = \frac{1}{2}.$$

    p̂1 and p̂2 are unbiased; p̂3 is biased.

    • MSE:

    $$MSE(\hat p_1) = Var(\hat p_1) = \frac{1}{2}p(1-p), \qquad MSE(\hat p_2) = Var(\hat p_2) = p(1-p), \qquad MSE(\hat p_3) = Bias(\hat p_3)^2 = \left(\frac{1}{2} - p\right)^2.$$

    • p̂2 is inadmissible: p̂1 is better for every p.

    • θ̂ is admissible if there is no estimator that is better (in the MSE sense) than θ̂ for some p and at least as good as θ̂ for all p.

    • p̂1 and p̂3 are admissible: one cannot be chosen over the other because p is unknown. A typical Bayesian estimator is p̂4 = w p̂1 + (1 − w) p̂3.
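    The MSE formulas above are easy to verify by simulation. A minimal sketch, assuming an arbitrary grid of p values, seed, and replication count:

```python
import numpy as np

# Monte Carlo check of the MSE formulas for p_hat_1, p_hat_2, p_hat_3.
rng = np.random.default_rng(0)
R = 200_000  # replications, each a sample of size n = 2

for p in (0.1, 0.3, 0.5):
    X = rng.binomial(1, p, size=(R, 2))
    p1 = X.mean(axis=1)           # (X1 + X2) / 2
    p2 = X[:, 0].astype(float)    # X1
    p3 = np.full(R, 0.5)          # the constant 1/2
    for est, theory in ((p1, p * (1 - p) / 2),
                        (p2, p * (1 - p)),
                        (p3, (0.5 - p) ** 2)):
        print(p, np.mean((est - p) ** 2), theory)  # simulated vs. theoretical MSE
```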

  • X̄n is the best linear unbiased estimator (BLUE).

    • Linearity: $\hat\theta = \sum_{i=1}^{n} \omega_i X_i$.

    • Unbiasedness: $E\hat\theta = \theta \iff \sum_{i=1}^{n} \omega_i = 1$.

    $$MSE\left(\hat\theta\right) = Var\left(\hat\theta\right) = \sum_{i=1}^{n} \omega_i^2 \sigma^2$$

    $$(\hat\omega_1, \hat\omega_2, \ldots, \hat\omega_n) = \arg\min_{\omega_1, \omega_2, \ldots, \omega_n} \sum_{i=1}^{n} \omega_i^2 \quad \text{such that} \quad \sum_{i=1}^{n} \omega_i = 1.$$

    Solution (by a Lagrangian argument, or the Cauchy–Schwarz inequality):

    $$\hat\omega_1 = \hat\omega_2 = \cdots = \hat\omega_n = \frac{1}{n}.$$
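    The constrained minimization behind the equal-weights solution can also be checked numerically. A sketch using scipy's SLSQP solver; the dimension n and the starting point are arbitrary assumptions:

```python
import numpy as np
from scipy.optimize import minimize

# Numerical check of the BLUE weights: minimize sum(w_i^2) subject to sum(w_i) = 1.
n = 5
res = minimize(
    fun=lambda w: np.sum(w**2),
    x0=np.random.default_rng(0).random(n),   # arbitrary starting point
    constraints={"type": "eq", "fun": lambda w: np.sum(w) - 1.0},
    method="SLSQP",
)
print(res.x)  # every entry is approximately 1/n = 0.2
```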

  • X̄n is BLUE (Gauss–Markov).

    • What can be better than X̄n?

      • A nonlinear unbiased estimator.

      • A linear biased estimator.

      • A nonlinear biased estimator.

  • Example 1: Xi ∼ i.i.d. Uniform(0, θ), µ = EX = ∫ x dFX (x) = θ/2. We want to estimate µ.

    $$\hat\mu_1 = \bar X_n, \qquad \hat\mu_2 = \frac{n+1}{2n} Z_n, \quad Z_n = \max(X_1, \ldots, X_n)$$

    • Since Zn < θ, we bias-correct by multiplying by (n + 1)/n.

    $$MSE(\hat\mu_1) = Var\left(\bar X_n\right) = \frac{\sigma^2}{n} = \frac{\theta^2}{12n}, \qquad MSE(\hat\mu_2) = Var(\hat\mu_2) + Bias(\hat\mu_2)^2$$

    $$F_{Z_n}(z) = P(\max(X_1, \ldots, X_n) \le z) = \prod_{i=1}^{n} P(X_i \le z) = \left(\frac{z}{\theta}\right)^n, \qquad f_{Z_n}(z) = \frac{\partial}{\partial z} F_{Z_n}(z) = \frac{n z^{n-1}}{\theta^n}.$$

  • Moments of Zn:

    $$EZ_n = \int_0^\theta z\, \frac{n z^{n-1}}{\theta^n}\, dz = \frac{n}{n+1}\theta, \qquad EZ_n^2 = \int_0^\theta z^2\, \frac{n z^{n-1}}{\theta^n}\, dz = \frac{n}{n+2}\theta^2$$

    $$Var(Z_n) = EZ_n^2 - (EZ_n)^2 = \frac{n}{n+2}\theta^2 - \left[\frac{n}{n+1}\theta\right]^2 = \frac{n}{(n+2)(n+1)^2}\theta^2$$

    $$Var(\hat\mu_2) = \frac{(n+1)^2}{4n^2} Var(Z_n) = \frac{\theta^2}{4n(n+2)}, \qquad Bias(\hat\mu_2) = E\hat\mu_2 - \frac{\theta}{2} = 0$$

    $$MSE(\hat\mu_1) - MSE(\hat\mu_2) = \frac{\theta^2}{12n} - \frac{\theta^2}{4n(n+2)} = \theta^2\left(\frac{1}{12n} - \frac{1}{4n(n+2)}\right) > 0 \quad \text{if } n > 1.$$

    • So the nonlinear unbiased estimator µ̂2 strictly dominates the BLUE X̄n in MSE for n > 1.
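    A quick Monte Carlo confirmation of this ranking; θ, n, and the replication count are arbitrary choices:

```python
import numpy as np

# Compare mu_hat_1 = Xbar and mu_hat_2 = (n+1)/(2n) * max(X) for Uniform(0, theta).
rng = np.random.default_rng(0)
theta, n, R = 2.0, 10, 100_000
mu = theta / 2

X = rng.uniform(0, theta, size=(R, n))
mu1 = X.mean(axis=1)
mu2 = (n + 1) / (2 * n) * X.max(axis=1)

print(np.mean((mu1 - mu) ** 2), theta**2 / (12 * n))           # MSE(mu1): ~0.0333
print(np.mean((mu2 - mu) ** 2), theta**2 / (4 * n * (n + 2)))  # MSE(mu2): ~0.0083
```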

  • Large Sample Analysis

    • Weak consistency: $\hat\theta_n \xrightarrow{p} \theta_0$ as $n \to \infty$.

    • Strong consistency: $\hat\theta_n \xrightarrow{a.s.} \theta_0$ as $n \to \infty$.

    • Rate of convergence and asymptotic distribution (typically normal).

    • Asymptotic efficiency: this can be a difficult concept.

  • Maximum Likelihood Estimator

    • Likelihood function, a random function: f (Xn|θ) ≡ L (θ|Xn).

    • Joint likelihood, conditional likelihood, marginal likelihood, partial likelihood.

    • If Xn = (X1, . . . , Xn) is i.i.d., then $f(\mathbf{X}_n|\theta) = \prod_{i=1}^{n} f(X_i|\theta)$.

    • We can define θ̂MLE = arg maxθ∈Θ f (Xn|θ) ≡ L (θ|Xn).

    • But for computational and statistical reasons, define

    $$\hat\theta_{LMLE} = \arg\max_{\theta \in \Theta} \log L(\theta|\mathbf{X}_n) \overset{i.i.d.}{=} \arg\max_{\theta \in \Theta} \sum_{i=1}^{n} \log f(X_i|\theta)$$

    • θ̂LMLE = θ̂MLE whenever θ̂MLE can be computed analytically. But oftentimes θ̂LMLE can be computed numerically while θ̂MLE cannot: e.g., if log L (θ|Xn) ≈ −500, then L (θ|Xn) ≈ e^−500 ≈ 10^−218, effectively zero on a computer.
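    A short illustration of the underflow point, assuming an arbitrary Bernoulli sample:

```python
import numpy as np

# Why we maximize the log likelihood: the raw likelihood underflows to zero
# long before the log likelihood causes any numerical trouble.
rng = np.random.default_rng(0)
X = rng.binomial(1, 0.3, size=5000)
p = 0.3

lik = np.prod(p**X * (1 - p)**(1 - X))                      # underflows: prints 0.0
loglik = np.sum(X * np.log(p) + (1 - X) * np.log(1 - p))    # finite, about -3055
print(lik, loglik)
```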

  • Recall that we use the average log likelihood (to facilitate the proofs):

    $$\hat\theta = \arg\max_{\theta \in \Theta} \frac{1}{n}\sum_{i=1}^{n} \log f(X_i|\theta) = \arg\max_{\theta \in \Theta} \frac{1}{n} \log L(\theta|\mathbf{X}_n).$$

    • Example 1:

    $$X_i = \begin{cases} 1 & \text{with probability } p \\ 0 & \text{with probability } 1 - p \end{cases} \quad i.i.d.$$

    $$L(p|\mathbf{X}_n) = \prod_{i=1}^{n} f(X_i|p) = \prod_{i=1}^{n} p^{X_i}(1-p)^{1-X_i} = p^{\sum X_i}(1-p)^{n - \sum X_i}$$

    We maximize

    $$\log L(p|\mathbf{X}_n) = \sum \left(X_i \log p + (1 - X_i)\log(1 - p)\right).$$

  • First order condition:

    $$\frac{\partial \log L(p|\mathbf{X}_n)}{\partial p} = \frac{1}{p}\sum X_i - \frac{1}{1-p}\sum (1 - X_i) = \frac{\sum (X_i - p)}{p(1-p)} = 0 \implies \hat p = \frac{1}{n}\sum_{i=1}^{n} X_i$$
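    The same MLE can be obtained numerically, previewing the optimization methods discussed later. A sketch using scipy's bounded scalar minimizer; the data-generating p, the bounds, and the sample size are assumptions:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Solve the Bernoulli MLE numerically and compare with the analytic answer.
rng = np.random.default_rng(0)
X = rng.binomial(1, 0.3, size=500)

def neg_loglik(p):
    return -np.sum(X * np.log(p) + (1 - X) * np.log(1 - p))

res = minimize_scalar(neg_loglik, bounds=(1e-6, 1 - 1e-6), method="bounded")
print(res.x, X.mean())  # the two agree up to solver tolerance
```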

    • Example 2: Xi ∼ N(µ, σ2), θ = (µ, σ2).

    $$L(\theta|\mathbf{X}_n) = \prod_{i=1}^{n} f(X_i|\theta) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(X_i - \mu)^2}{2\sigma^2}\right)$$

    $$\log L(\theta|\mathbf{X}_n) = C + \sum\left[\log\frac{1}{\sigma} - \frac{(X_i - \mu)^2}{2\sigma^2}\right]$$

    $$\hat\mu = \arg\min_\mu \sum_{i=1}^{n} (X_i - \mu)^2 \equiv \bar X_n$$

    $$\frac{\partial \log L(\theta|\mathbf{X}_n)}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{\sum_{i=1}^{n} (X_i - \mu)^2}{2\sigma^4} = 0$$

  • Solving the first order condition gives $\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (X_i - \hat\mu)^2$.

    • σ̂2 is biased, but

    $$S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \hat\mu)^2$$

    is unbiased.
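    A minimal numerical illustration of the closed-form normal MLE and of the bias factor (n − 1)/n; the true µ, σ, and n are arbitrary:

```python
import numpy as np

# Closed-form normal MLE versus the unbiased variance estimator S^2.
rng = np.random.default_rng(0)
X = rng.normal(loc=1.0, scale=2.0, size=50)
n = X.size

mu_hat = X.mean()                              # MLE of mu
sigma2_hat = np.sum((X - mu_hat) ** 2) / n     # MLE, biased: E = (n-1)/n * sigma^2
S2 = np.sum((X - mu_hat) ** 2) / (n - 1)       # unbiased
print(mu_hat, sigma2_hat, S2)                  # sigma2_hat = (n-1)/n * S2
```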

    • Example 3: Xi is a k × 1 vector, Xi ∼ N(µ, Σ), θ = (µ, Σ), with k + k(k + 1)/2 parameters.

    $$L(\theta|\mathbf{X}_n) = \prod f(X_i|\theta) = \prod \frac{1}{\left(\sqrt{2\pi}\right)^k |\Sigma|^{1/2}} \exp\left(-\frac{(X_i - \mu)'\Sigma^{-1}(X_i - \mu)}{2}\right)$$

    $$\log L(\theta|\mathbf{X}_n) = C + \frac{n}{2}\log|\Sigma^{-1}| - \frac{1}{2}\sum (X_i - \mu)'\Sigma^{-1}(X_i - \mu)$$

  • Recall (from Dhrymes' book):

    $$\frac{\partial x'Ax}{\partial x} = (A + A')x = 2Ax \ \text{ if } A \text{ is symmetric}$$

    $$\frac{\partial}{\partial A}\log|A| = A^{-1} \quad \text{(using principal minors and cofactors)}$$

    $$\frac{\partial}{\partial A}\mathrm{tr}(AB) = B', \qquad \mathrm{tr}(AB) = \mathrm{tr}(BA), \qquad \mathrm{tr}(ABC) = \mathrm{tr}(BCA) = \mathrm{tr}(CAB).$$

  • Applying these identities to the log likelihood:

    $$\log L(\theta|\mathbf{X}_n) = C + \frac{n}{2}\log|\Sigma^{-1}| - \frac{1}{2}\sum \mathrm{tr}\left((X_i - \mu)'\Sigma^{-1}(X_i - \mu)\right) = C + \frac{n}{2}\log|\Sigma^{-1}| - \frac{1}{2}\sum \mathrm{tr}\left(\Sigma^{-1}(X_i - \mu)(X_i - \mu)'\right)$$

    $$\frac{\partial}{\partial \mu}\log L(\theta|\mathbf{X}_n) = \sum \Sigma^{-1}(X_i - \mu) = 0 \implies \Sigma^{-1}\sum_{i=1}^{n}(X_i - \mu) = 0 \implies \hat\mu = \bar X_n.$$

    $$\frac{\partial}{\partial \Sigma^{-1}}\log L\left(\mu, \Sigma^{-1}|\mathbf{X}_n\right) = \frac{n}{2}\Sigma - \frac{1}{2}\sum_{i=1}^{n}(X_i - \mu)(X_i - \mu)' = 0$$

    $$\implies \hat\Sigma = \frac{1}{n}\sum_{i=1}^{n}(X_i - \hat\mu)(X_i - \hat\mu)' = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar X_n\right)\left(X_i - \bar X_n\right)'.$$
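    The closed-form multivariate MLE in code; a sketch, with the true µ and Σ chosen arbitrarily:

```python
import numpy as np

# Multivariate normal MLE: mu_hat = Xbar, Sigma_hat = (1/n) * sum of
# outer products of deviations.
rng = np.random.default_rng(0)
mu_true = np.array([1.0, -2.0])
Sigma_true = np.array([[2.0, 0.5], [0.5, 1.0]])
X = rng.multivariate_normal(mu_true, Sigma_true, size=5000)   # n x k data matrix

mu_hat = X.mean(axis=0)
dev = X - mu_hat
Sigma_hat = dev.T @ dev / X.shape[0]   # divide by n, not n-1: this is the MLE
print(mu_hat)      # close to mu_true
print(Sigma_hat)   # close to Sigma_true
```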

  • Oftentimes the MLE cannot be computed by hand. Let Qn (θ) = log L (θ|Xn).

    • Newton–Raphson iteration (maximize a quadratic approximation).

    • Stochastic optimization.

    • Kenneth Judd, Numerical Methods in Economics.

    • Root finding: bisection, Gauss–Newton iteration.

  • Initial guess θ(0):

    $$Q(\theta) \approx Q\left(\theta^{(0)}\right) + \frac{\partial Q\left(\theta^{(0)}\right)}{\partial \theta'}\left(\theta - \theta^{(0)}\right) + \frac{1}{2}\left(\theta - \theta^{(0)}\right)' \frac{\partial^2}{\partial\theta\,\partial\theta'} Q\left(\theta^{(0)}\right)\left(\theta - \theta^{(0)}\right) \equiv \tilde Q\left(\theta, \theta^{(0)}\right)$$

    $$0 = \frac{\partial \tilde Q\left(\theta, \theta^{(0)}\right)}{\partial \theta} = \frac{\partial Q\left(\theta^{(0)}\right)}{\partial \theta} + \frac{\partial^2 Q\left(\theta^{(0)}\right)}{\partial\theta\,\partial\theta'}\left(\theta - \theta^{(0)}\right)$$

    Solving for θ gives the update, which is iterated until convergence:

    $$\theta^{(1)} = \theta^{(0)} - \left[\frac{\partial^2 Q\left(\theta^{(0)}\right)}{\partial\theta\,\partial\theta'}\right]^{-1} \frac{\partial Q\left(\theta^{(0)}\right)}{\partial \theta}.$$
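    A minimal sketch of the full iteration for the Bernoulli log likelihood from Example 1, where the gradient and Hessian are available analytically; the starting value and tolerance are arbitrary assumptions:

```python
import numpy as np

# Newton-Raphson for Q(p) = sum Xi log p + (1 - Xi) log(1 - p).
rng = np.random.default_rng(0)
X = rng.binomial(1, 0.3, size=500)
n, s = X.size, X.sum()

def grad(p):   # dQ/dp
    return s / p - (n - s) / (1 - p)

def hess(p):   # d^2 Q/dp^2 (negative: Q is concave in p)
    return -s / p**2 - (n - s) / (1 - p)**2

p = 0.5  # initial guess p^(0)
for _ in range(50):
    step = grad(p) / hess(p)
    p -= step                  # p^(k+1) = p^(k) - H^{-1} g
    if abs(step) < 1e-10:
        break
print(p, X.mean())  # converges to the analytic MLE, the sample mean
```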