Maximum Likelihood Estimation (Addendum), Apr 8, 2004
STAT 24600, University of Chicago (galton.uchicago.edu/~eichler/stat24600/Handouts/s02add.pdf)


Example

Fitting a Poisson distribution (correctly specified case)

Suppose that $X_1, \ldots, X_n$ are independent and Poisson distributed, $X_i \overset{\text{iid}}{\sim} \mathrm{Poisson}(\lambda_0)$.

The log-likelihood function is

$$l_n(\lambda|X) = \log(\lambda)\sum_{i=1}^{n} X_i - \sum_{i=1}^{n}\log(X_i!) - n\lambda.$$

Differentiating with respect to λ, we obtain the score function

$$S(\lambda|X) = \frac{\partial l_n(\lambda|X)}{\partial\lambda} = \frac{1}{\lambda}\sum_{i=1}^{n} X_i - n$$

and the ML estimator

$$\hat{\lambda}_{ML} = \frac{1}{n}\sum_{i=1}^{n} X_i.$$

The second derivative of the log-likelihood function is

$$\frac{\partial^2 l_n(\lambda|X)}{\partial\lambda^2} = -\frac{1}{\lambda^2}\sum_{i=1}^{n} X_i,$$

which yields the observed Fisher information

$$I(\lambda|X) = -\frac{\partial^2 l_n(\lambda|X)}{\partial\lambda^2} = \frac{1}{\lambda^2}\sum_{i=1}^{n} X_i$$

and the (expected) Fisher information

$$I(\lambda) = -\mathbb{E}\left(\frac{\partial^2 l_n(\lambda|X)}{\partial\lambda^2}\right) = \frac{n\lambda}{\lambda^2} = \frac{n}{\lambda}.$$

Therefore the MLE is approximately normally distributed with mean λ

and variance λ/n.
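A quick way to see this approximation at work is to simulate it. The following is a minimal sketch, not part of the handout; the values of λ0, n, and the number of replications are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
lam0, n, reps = 3.0, 200, 10_000        # arbitrary illustration values

# The Poisson MLE is the sample mean; simulate its sampling distribution.
lam_hat = rng.poisson(lam0, size=(reps, n)).mean(axis=1)

print(lam_hat.mean())                   # close to lam0 = 3.0
print(lam_hat.var())                    # close to lam0 / n = 0.015
```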


Example

Fitting a Poisson distribution (misspecified case)

Now suppose that the variables $X_i$ are binomially distributed, $X_i \overset{\text{iid}}{\sim} \mathrm{Bin}(m, \theta_0)$.

How does the MLE λ̂ML of the fitted Poisson model relate to the true

distribution?

The “distance” between the fitted model and the true model can be measured by the Kullback-Leibler distance,

$$\mathbb{E}\left(\log\frac{f_{\mathrm{Bin}}(X|\theta_0)}{f_{\mathrm{Poiss}}(X|\lambda)}\right) = \mathbb{E}\big(\log f_{\mathrm{Bin}}(X|\theta_0)\big) - \mathbb{E}\big(\log f_{\mathrm{Poiss}}(X|\lambda)\big)$$
$$= \mathbb{E}\big(\lambda - X_i\log(\lambda)\big) + \text{terms constant in }\lambda$$
$$= \lambda - m\theta_0\log(\lambda) + \text{terms constant in }\lambda.$$

Differentiating with respect to $\lambda$, we obtain

$$1 - \frac{m\theta_0}{\lambda} = 0 \iff \lambda = m\theta_0.$$

Thus the MLE $\hat\lambda_{ML}$ converges to $\lambda_0 = m\theta_0$.
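A short simulation makes this pseudo-true value visible. Again a sketch with arbitrary illustrative values of m, θ0, and n:

```python
import numpy as np

rng = np.random.default_rng(1)
m, theta0, n = 10, 0.3, 100_000         # arbitrary illustration values

# Fit the Poisson model to binomial data: the MLE is still the sample mean,
# and it converges to the pseudo-true value lambda0 = m * theta0.
x = rng.binomial(m, theta0, size=n)
print(x.mean())                         # close to m * theta0 = 3.0
```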


Asymptotic Properties of the MLE

Let θ̂ be the MLE for θ0. Taylor expansion of the score function at θ̂ about

θ0 yields

$$\frac{\partial l_n(\hat\theta|Y)}{\partial\theta} \approx \frac{\partial l_n(\theta_0|Y)}{\partial\theta} + \frac{\partial^2 l_n(\theta_0|Y)}{\partial\theta^2}\,(\hat\theta - \theta_0) \qquad (1)$$

and hence

$$\hat\theta - \theta_0 \approx -\left(\frac{\partial^2 l_n(\theta_0|Y)}{\partial\theta^2}\right)^{-1}\frac{\partial l_n(\theta_0|Y)}{\partial\theta},$$

since the left side of (1) is zero. Furthermore, since the observed information

$$I(\theta_0|Y) = -\frac{\partial^2 l_n(\theta_0|Y)}{\partial\theta^2} \approx I(\theta_0)$$

approximates the expected information, and

$$\mathbb{E}\left(\frac{\partial l_n(\theta_0|Y)}{\partial\theta}\right) = \mathbb{E}\left(\frac{\partial\log f(Y|\theta_0)}{\partial\theta}\right) = 0,$$

this suggests that

$$\mathrm{var}\big(\hat\theta - \theta_0\big) \approx I(\theta_0)^{-1}\,\mathbb{E}\left(\frac{\partial l_n(\theta_0|Y)}{\partial\theta}\right)^{2} I(\theta_0)^{-1}.$$
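Both factors of this sandwich formula can be estimated from the data. Below is a minimal plug-in sketch for the one-parameter Poisson fit from the first example; the helper name sandwich_var and the per-observation decomposition of the score and second derivative are my own, not from the handout.

```python
import numpy as np

def sandwich_var(x):
    """Plug-in sandwich variance for the Poisson MLE lam_hat = mean(x)."""
    lam = x.mean()                      # the MLE
    score_i = x / lam - 1.0             # per-observation scores at lam
    hess_i = -x / lam**2                # per-observation second derivatives
    bread = -hess_i.sum()               # estimate of I(lam)
    meat = (score_i**2).sum()           # estimate of E(S^2)
    return meat / bread**2              # I^{-1} E(S^2) I^{-1}

rng = np.random.default_rng(2)
print(sandwich_var(rng.poisson(3.0, size=1000)))   # roughly lam0/n = 0.003
```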

If the model is correctly specified, we have

$$\mathbb{E}\left(\frac{\partial^2 l_n(\theta_0|Y)}{\partial\theta^2}\right) = \mathbb{E}\left(\frac{\partial^2\log f(Y|\theta_0)}{\partial\theta^2}\right) = \mathbb{E}\left[\frac{\partial}{\partial\theta}\left(\frac{\partial f(Y|\theta_0)}{\partial\theta}\,\frac{1}{f(Y|\theta_0)}\right)\right]$$
$$= \mathbb{E}\left[\frac{\partial^2 f(Y|\theta_0)}{\partial\theta^2}\,\frac{1}{f(Y|\theta_0)}\right] - \mathbb{E}\left[\left(\frac{\partial f(Y|\theta_0)}{\partial\theta}\right)^{2}\left(\frac{1}{f(Y|\theta_0)}\right)^{2}\right].$$

Noting that

$$\mathbb{E}\left[\frac{\partial^2 f(Y|\theta_0)}{\partial\theta^2}\,\frac{1}{f(Y|\theta_0)}\right] = \int\frac{\partial^2 f(y|\theta_0)}{\partial\theta^2}\,dy = \frac{\partial^2}{\partial\theta^2}\int f(y|\theta_0)\,dy = 0,$$

we obtain

$$I(\theta_0) = \mathbb{E}\left(\frac{\partial f(Y|\theta_0)}{\partial\theta}\,\frac{1}{f(Y|\theta_0)}\right)^{2} = \mathbb{E}\left(\frac{\partial\log f(Y|\theta_0)}{\partial\theta}\right)^{2} = \mathbb{E}\left(\frac{\partial l_n(\theta_0|Y)}{\partial\theta}\right)^{2} = \mathbb{E}\big(S(\theta_0|Y)\big)^{2}.$$
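As a numerical check of this information identity, both sides can be approximated by Monte Carlo. The sketch below does this for a single Poisson observation, using the score and second derivative derived in the first example; the values of λ0 and the replication count are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
lam0, reps = 3.0, 1_000_000             # arbitrary illustration values
x = rng.poisson(lam0, size=reps)

# For one Poisson observation: S = x/lam - 1 and d2 log f / dlam2 = -x/lam^2.
print(np.mean((x / lam0 - 1.0)**2))     # E(S^2)              -> 1/lam0 = 0.333...
print(np.mean(x / lam0**2))             # -E(d2 log f/dlam2)  -> 1/lam0 = 0.333...
```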


Example

Fitting a Poisson distribution (misspecified case)

The variance of the MLE λ̂ can be approximated by

$$I(\lambda_0)^{-1}\,\mathbb{E}\big(S(\lambda_0|Y)^{2}\big)\,I(\lambda_0)^{-1}.$$

Using the formulas for the first and second derivative, we find that

$$\mathbb{E}\big(S(\lambda_0|Y)\big)^{2} = \mathbb{E}\left(\frac{1}{\lambda_0}\sum_{i=1}^{n} X_i - n\right)^{2} = \frac{nm\theta_0(1-\theta_0)}{\lambda_0^{2}} + \frac{n^{2}m^{2}\theta_0^{2}}{\lambda_0^{2}} - \frac{2n^{2}m\theta_0}{\lambda_0} + n^{2}$$
$$= n\,\frac{1-\theta_0}{\lambda_0} = n\left(\frac{1}{m\theta_0} - \frac{1}{m}\right)$$

(the last three terms in the first line sum to $n^{2}(m\theta_0/\lambda_0 - 1)^{2} = 0$ since $\lambda_0 = m\theta_0$)

and

$$I(\lambda_0) = \mathbb{E}\left(\frac{1}{\lambda_0^{2}}\sum_{i=1}^{n} X_i\right) = \frac{nm\theta_0}{\lambda_0^{2}} = \frac{n}{m\theta_0},$$

where we used that $\sum_{i=1}^{n} X_i \sim \mathrm{Bin}(mn, \theta_0)$.

Hence the variance of λ̂ becomes

$$\frac{m^{2}\theta_0^{2}}{n}\left(\frac{1}{m\theta_0} - \frac{1}{m}\right) = \frac{m\theta_0(1-\theta_0)}{n}.$$
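A Monte Carlo check of this sandwich variance, again with arbitrary illustrative values: it equals Var(Xi)/n = mθ0(1−θ0)/n, which is smaller than the naive Poisson value λ0/n = mθ0/n.

```python
import numpy as np

rng = np.random.default_rng(4)
m, theta0, n, reps = 10, 0.3, 200, 20_000   # arbitrary illustration values

# Sampling distribution of the Poisson MLE fitted to binomial data.
lam_hat = rng.binomial(m, theta0, size=(reps, n)).mean(axis=1)

print(lam_hat.var())                    # close to m*theta0*(1-theta0)/n = 0.0105
print(m * theta0 / n)                   # naive Poisson variance lam0/n = 0.015
```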
