Maximum Likelihood Estimation -...

University of Pavia 2007 Maximum Likelihood Estimation Eduardo Rossi University of Pavia


Page 1: Maximum Likelihood Estimation - unipveconomia.unipv.it/pagp/pagine_personali/erossi/macroeconometria_4... · Maximum Likelihood Estimation Eduardo Rossi University of Pavia. Likelihood

University of Pavia

2007

Maximum Likelihood Estimation

Eduardo Rossi

University of Pavia


Likelihood function

Maximum likelihood chooses the parameter values that make what one has observed more likely to occur than any other parameter values do.

Distribution: The pair $(U, V)$ is a random vector and the $N$ pairs

$$\{(U_1, V_1), \ldots, (U_N, V_N)\}$$

are an i.i.d. random sample of $(U, V)$.

$F_{U|V}(u \mid v; \theta_0)$ is completely known except for $\theta_0$, the true value of the real-valued parameter vector, $\theta_0 \in \mathbb{R}^K$.

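The support of $F_{U|V}$ is $S(\theta_0)$:

$$\int_{S(\theta_0)} dF_{U|V}(u \mid v; \theta_0) = 1 = \begin{cases} \sum_{u \in S(\theta_0)} f(u \mid v; \theta_0) & \text{if } U \text{ is discrete} \\ \int_{S(\theta_0)} f(u \mid v; \theta_0)\, du & \text{if } U \text{ is continuous} \end{cases}$$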

Eduardo Rossi © - Macroeconometria 07


Likelihood function

Probability function for $(U_1, \ldots, U_N) \mid (V_1, \ldots, V_N)$:

$$\prod_{t=1}^{N} f(u_t \mid v_t; \theta_0)$$

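Normal Linear Regression: $y_t = x_t'\beta_0 + \epsilon_t$, with $(y_t, x_t)$ i.i.d. and normal errors; $u_t = y_t$, $v_t = x_t$.

$$f(u_t \mid v_t; \theta_0) = \frac{1}{\sqrt{2\pi\sigma_0^2}} \exp\left[-\frac{(y_t - x_t'\beta_0)^2}{2\sigma_0^2}\right]$$

$S(\theta_0) = \mathbb{R}$. Since the observations are i.i.d. normal, the conditional p.d.f. of the sample is

$$\prod_{t=1}^{N} f(u_t \mid v_t; \theta_0) = \left[2\pi\sigma_0^2\right]^{-N/2} \exp\left[-\frac{(y - X\beta_0)'(y - X\beta_0)}{2\sigma_0^2}\right]$$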
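The product form of the sample density can be checked numerically. The following is a minimal sketch, assuming a small made-up data set (the values of `X`, `y` and the trial parameters are hypothetical): it evaluates the log of the sample p.d.f. once as a sum of per-observation log densities and once from the closed form with the residual sum of squares.

```python
import math

def normal_loglik(beta, sigma2, y, X):
    """Sample log-likelihood as a sum of per-observation normal log densities."""
    total = 0.0
    for y_t, x_t in zip(y, X):
        mean_t = sum(b * x for b, x in zip(beta, x_t))  # x_t' beta
        total += -0.5 * math.log(2 * math.pi * sigma2) - (y_t - mean_t) ** 2 / (2 * sigma2)
    return total

def normal_loglik_closed_form(beta, sigma2, y, X):
    """Same quantity from -N/2 log(2 pi sigma2) - (y - X beta)'(y - X beta) / (2 sigma2)."""
    rss = sum((y_t - sum(b * x for b, x in zip(beta, x_t))) ** 2 for y_t, x_t in zip(y, X))
    return -0.5 * len(y) * math.log(2 * math.pi * sigma2) - rss / (2 * sigma2)

# Hypothetical toy data: an intercept and one regressor.
X = [[1.0, 0.5], [1.0, -1.2], [1.0, 2.0], [1.0, 0.3]]
y = [1.1, -0.9, 3.2, 0.4]
ll_sum = normal_loglik([0.2, 1.4], 0.8, y, X)
ll_closed = normal_loglik_closed_form([0.2, 1.4], 0.8, y, X)
```

The two values agree up to floating-point rounding, which is just the statement that the log of the product equals the sum of the logs.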


Likelihood function

The marginal distribution of xt does not depend on θ0.

Student’s t Linear Regression

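$$\frac{y_t - x_t'\beta_0}{\sigma_0} \,\Big|\, x_t \sim t_{\nu_0}$$

$$f(u_t \mid v_t; \theta_0) = \frac{\Gamma[(\nu_0 + 1)/2]}{\Gamma(\nu_0/2)} \frac{1}{\sqrt{\pi\nu_0\sigma_0^2}} \left[1 + \frac{(y_t - x_t'\beta_0)^2}{\nu_0\sigma_0^2}\right]^{-(\nu_0+1)/2}$$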


Likelihood function

Laplace Linear Regression

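$$f(u_t \mid v_t; \theta_0) = \frac{1}{\sqrt{2\sigma_0^2}} \exp\left(-\frac{\sqrt{2}\,|y_t - x_t'\beta_0|}{\sigma_0}\right)$$

$U = y_t$, $V = x_t$, $S(\theta_0) = \mathbb{R}$, $\theta_0 = [\beta_0', \sigma_0^2]'$.

Once the distribution is specified, we can obtain the expected value of any function of the data:

$$h(\theta_0) \equiv E[g(U)] = \int g(u)\, dF(u; \theta_0)$$

$$h(v; \theta_0) \equiv E[g(U, V) \mid V = v] = \int g(u, v)\, dF(u \mid v; \theta_0)$$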
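As a sanity check on the Student's t and Laplace densities, and as an instance of $h(\theta_0) = \int g(u)\,dF(u; \theta_0)$, the sketch below integrates them numerically. It is only an illustration: the values of `sigma` and `nu`, the integration limits and the midpoint rule are all assumptions made for this example.

```python
import math

def laplace_pdf(e, sigma):
    """Laplace density of the error e = y - x'beta, parameterized to have variance sigma^2."""
    return math.exp(-math.sqrt(2) * abs(e) / sigma) / math.sqrt(2 * sigma ** 2)

def student_t_pdf(e, sigma, nu):
    """Scaled Student-t density of the error e with nu degrees of freedom."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(math.pi * nu * sigma ** 2))
    return math.exp(log_c - 0.5 * (nu + 1) * math.log1p(e ** 2 / (nu * sigma ** 2)))

def integrate(f, lo, hi, n):
    """Midpoint rule on n equal cells."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

sigma, nu = 1.5, 5.0  # hypothetical parameter values
laplace_mass = integrate(lambda e: laplace_pdf(e, sigma), -60.0, 60.0, 100_000)
laplace_var = integrate(lambda e: e ** 2 * laplace_pdf(e, sigma), -60.0, 60.0, 100_000)  # g(u) = u^2
t_mass = integrate(lambda e: student_t_pdf(e, sigma, nu), -600.0, 600.0, 200_000)
```

Both densities should integrate to approximately one, and the Laplace second moment should come out close to `sigma ** 2`, which is why the $\sqrt{2}$ appears in the exponent.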


The likelihood function

Unconditional specification: $f(u; \theta)$ describes the likely values of every r.v. $U_t$, $t = 1, 2, \ldots, N$, for a specific value of $\theta$.

The sample likelihood function treats the argument $u$ as given and $\theta$ as variable.

It describes the likely values of the unknown $\theta_0$ given the realizations of the r.v. $U$.

The likelihood function of $\theta$ for a random variable $U$ with p.f. $f(u; \theta_0)$ is defined to be

$$l(\theta; U) = f(U; \theta)$$

and the log-likelihood function is

$$L(\theta; U) = \log l(\theta; U)$$


The Likelihood function

Likelihood function: we evaluate the p.f. at the random variable and consider the result as a function of the variable $\theta$:

$$L(\theta; U_1, \ldots, U_N) = \log\left[\prod_{t=1}^{N} f(U_t; \theta)\right] = \sum_{t=1}^{N} L(\theta; U_t)$$

The conditional likelihood function of $\theta$ for a r.v. $U$ with p.f. $f(u \mid v; \theta_0)$ given the r.v. $V$ is

$$l(\theta; U \mid V) = f(U \mid V; \theta)$$

$$L(\theta; U \mid V) = \log l(\theta; U \mid V)$$

$\theta_0 \in \Theta$, where $\Theta$ is the parameter space, the set of parameter values permitted by the model.


Assumptions

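Assumption (Dominance condition):

$$E\left[\sup_{\theta \in \Theta} |L(\theta; U \mid V)|\right]$$

exists.

This means that $|L(\theta; U \mid V)|$ is dominated by

$$h(U, V) \equiv \sup_{\theta \in \Theta} |L(\theta; U \mid V)|$$

where $h(U, V)$ does not depend on $\theta$. The existence of $E[h(U, V)]$ implies the existence of $E[L(\theta; U \mid V)]$ for every $\theta \in \Theta$.

Lemma (Expected log-likelihood inequality). If $L(\theta; U \mid V)$ is the conditional log-likelihood for $\theta$ and the Dominance condition holds, then

$$E[L(\theta; U \mid V) \mid V] \le E[L(\theta_0; U \mid V) \mid V].$$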


Proof

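Let $f_U$ denote the true p.f. of $U$ (at $\theta_0$) and $f_W$ the p.f. at any other $\theta$. With $Z \equiv f_W(U)/f_U(U)$ and $h(\cdot) = \log(\cdot)$, which is concave, Jensen's inequality gives

$$E\left[\log\left(\frac{f_W(U)}{f_U(U)}\right)\right] = E[h(Z)] \le h[E(Z)] \le \log(1) = 0$$

because $E(Z) = \int_{S(\theta_0)} f_W(u)\, du \le 1$ (with a sum in place of the integral if $U$ is discrete).

Unconditional case:

$$E[L(\theta_0; U)] \ge E[L(\theta; U)]$$

The specification of the p.f. of $U$ determines the expected values of functions of $U$. Therefore define

$$Q(\theta, \theta_0) \equiv E[L(\theta; U)]$$

which depends on $\theta$ because $L$ does, and on $\theta_0$ because the expectation is taken with respect to the distribution $F(u; \theta_0)$ of $U$. The expected log-likelihood inequality states that

$$Q(\theta_0, \theta_0) = \max_{\theta \in \Theta} Q(\theta, \theta_0)$$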


Normal linear regression model

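$$y_t \mid x_t \sim N(x_t'\beta_0, \sigma_0^2)$$

$$\begin{aligned}
E[L(\theta; y_t \mid x_t) \mid x_t] &= -\frac{1}{2}\log(2\pi\sigma^2) - \frac{E[(y_t - x_t'\beta)^2 \mid x_t]}{2\sigma^2} \\
&= -\frac{1}{2}\log(2\pi\sigma^2) - \frac{1}{2}\,\frac{E[(y_t - x_t'\beta_0 + x_t'\beta_0 - x_t'\beta)^2 \mid x_t]}{\sigma^2} \\
&= -\frac{1}{2}\left[\log(2\pi\sigma^2) + \frac{\sigma_0^2 + (x_t'\beta - x_t'\beta_0)^2}{\sigma^2}\right]
\end{aligned}$$

which is uniquely maximized at $x_t'\beta = x_t'\beta_0$ and $\sigma^2 = \sigma_0^2$.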
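The location of the maximum can be illustrated with a crude grid search over the closed-form expression. This is a sketch under assumptions: `sigma2_0 = 2.0` and the grid ranges are arbitrary choices, and `d` stands for the scalar $x_t'\beta - x_t'\beta_0$.

```python
import math

def expected_loglik(d, sigma2, sigma2_0):
    """E[L(theta; y_t | x_t) | x_t] in the normal model, with d = x_t'beta - x_t'beta_0."""
    return -0.5 * (math.log(2 * math.pi * sigma2) + (sigma2_0 + d ** 2) / sigma2)

sigma2_0 = 2.0                               # hypothetical true variance
grid_d = [i / 10 for i in range(-20, 21)]    # d from -2.0 to 2.0
grid_s2 = [i / 10 for i in range(5, 51)]     # sigma^2 from 0.5 to 5.0
best = max(((d, s2) for d in grid_d for s2 in grid_s2),
           key=lambda p: expected_loglik(p[0], p[1], sigma2_0))
```

On this grid the maximizer is `d = 0.0` and `sigma2 = 2.0`, that is, the point where $x_t'\beta = x_t'\beta_0$ and $\sigma^2 = \sigma_0^2$.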


Normal linear regression model

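The conditional expectation of the conditional log-likelihood of the entire sample is the sum of such terms:

$$E[L(\theta; y \mid X) \mid X] = -\frac{N}{2}\log(2\pi\sigma^2) - \frac{N\sigma_0^2 + (\beta - \beta_0)'X'X(\beta - \beta_0)}{2\sigma^2}$$

which is uniquely maximized at $X\beta = X\beta_0$ and $\sigma^2 = \sigma_0^2$, and therefore at $\beta = \beta_0$ if $X$ has full column rank.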


Student t Linear Regression

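The expected log-likelihood is analytically intractable. We show that $E[L(\theta; U \mid V)]$ exists for $\nu_0 > 2$. By the concavity of the logarithmic function,

$$\log(1 + z^2) \le z^2$$

so that

$$E\left[\log\left[1 + \frac{(y_t - x_t'\beta)^2}{\nu\sigma^2}\right] \,\Big|\, x_t\right] \le E\left[\frac{(y_t - x_t'\beta)^2}{\nu\sigma^2} \,\Big|\, x_t\right] = \frac{\nu_0\sigma_0^2}{\nu\sigma^2(\nu_0 - 2)} + \frac{(x_t'\beta_0 - x_t'\beta)^2}{\nu\sigma^2}$$

where the last equality uses $Var[y_t \mid x_t] = \nu_0\sigma_0^2/(\nu_0 - 2)$. Hence, provided that $E[x_t x_t']$ exists, the expected log-likelihood exists.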


Unconditional inequality

The expected log-likelihood inequality implies the unconditional

inequality

E[L(θ; U |V )] ≤ E[L(θ0; U |V )]

starting from

E[L(θ; U |V )|V ] ≤ E[L(θ0; U |V )|V ]

we can take the E[·] over V

E[L(θ; U |V )] = E [E[L(θ; U |V )|V ]]

≤ E[E[L(θ0; U |V )|V ]]

= E[L(θ0; U |V )]


The ML estimator

Because $\theta_0$ maximizes $E[L(\theta; U \mid V)]$, it is natural to construct an estimator of $\theta_0$ from the value of $\theta$ that maximizes the sample analogue, the average of the log-likelihood functions of the $N$ observations:

$$\frac{1}{N}\sum_{t=1}^{N} L(\theta; U_t \mid V_t) \equiv E_N[L(\theta; U \mid V)]$$

$$E[L(\theta; U \mid V)] = \int L(\theta; u \mid v)\, dF(u \mid v; \theta_0)$$

ML estimator: the MLE is a value of the parameter vector that maximizes the sample average log-likelihood function

$$\hat\theta_N \equiv \arg\max_{\theta \in \Theta} E_N[L(\theta)]$$


Normal Linear Regression Model

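The empirical expectation of the log-likelihood:

$$\begin{aligned}
E_N[L(\theta)] &= -\frac{1}{2}\log(2\pi\sigma^2) - \frac{E_N[(y_t - x_t'\beta)^2]}{2\sigma^2} \\
&= -\frac{1}{2}\log(2\pi\sigma^2) - \frac{(y - X\beta)'(y - X\beta)/N}{2\sigma^2}
\end{aligned}$$

The log-likelihood is differentiable. First-order conditions (F.O.C.):

$$E_N[L_\beta(\theta)] = \frac{1}{\sigma^2} E_N[x_t(y_t - x_t'\beta)] = \frac{1}{N\sigma^2}[X'(y - X\beta)]$$

$$E_N[L_{\sigma^2}(\theta)] = -\frac{1}{2\sigma^4}\left\{\sigma^2 - E_N[(y_t - x_t'\beta)^2]\right\} = -\frac{1}{2\sigma^4}\left[\sigma^2 - \frac{1}{N}(y - X\beta)'(y - X\beta)\right]$$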


Normal Linear Regression Model

Solutions:

$$\frac{1}{N\sigma^2}[X'(y - X\hat\beta)] = 0$$

$$\hat\beta = (X'X)^{-1}X'y$$

$$\hat\sigma^2 = \frac{1}{N}(y - X\hat\beta)'(y - X\hat\beta)$$
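The closed-form solutions can be verified on a small example. The sketch below is illustrative only: the data are made up, the model has two regressors so that $(X'X)^{-1}$ can be written out by hand, and the helper name `ols_mle` is hypothetical.

```python
def ols_mle(y, X):
    """MLE in a two-regressor normal linear model: beta = (X'X)^{-1} X'y, sigma2 = RSS / N."""
    n = len(y)
    a = sum(x[0] * x[0] for x in X)              # entries of X'X
    b = sum(x[0] * x[1] for x in X)
    d = sum(x[1] * x[1] for x in X)
    g0 = sum(x[0] * yt for x, yt in zip(X, y))   # entries of X'y
    g1 = sum(x[1] * yt for x, yt in zip(X, y))
    det = a * d - b * b                          # nonzero when X has full column rank
    beta = [(d * g0 - b * g1) / det, (a * g1 - b * g0) / det]
    resid = [yt - beta[0] * x[0] - beta[1] * x[1] for x, yt in zip(X, y)]
    sigma2 = sum(e * e for e in resid) / n
    return beta, sigma2, resid

# Hypothetical toy data: an intercept and one regressor.
X = [[1.0, 0.5], [1.0, -1.2], [1.0, 2.0], [1.0, 0.3], [1.0, 1.1]]
y = [1.1, -0.9, 3.2, 0.4, 1.9]
beta_hat, sigma2_hat, resid = ols_mle(y, X)

# Average score in beta, X'(y - X beta) / (N sigma2), evaluated at the MLE.
score_beta = [sum(x[j] * e for x, e in zip(X, resid)) / (len(y) * sigma2_hat) for j in range(2)]
# Average score in sigma2 at the MLE.
score_sigma2 = -(sigma2_hat - sum(e * e for e in resid) / len(y)) / (2 * sigma2_hat ** 2)
```

Both components of the score vanish at $(\hat\beta, \hat\sigma^2)$ up to rounding error, which is what the first-order conditions require.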

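The Hessian matrix:

$$E_N[L_{\theta\theta}(\theta)] = \begin{bmatrix} -\dfrac{1}{\sigma^2 N}X'X & -\dfrac{X'(y - X\beta)}{\sigma^4 N} \\ -\dfrac{(y - X\beta)'X}{\sigma^4 N} & \dfrac{1}{2\sigma^4} - \dfrac{1}{\sigma^6 N}(y - X\beta)'(y - X\beta) \end{bmatrix}$$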


Normal Linear Regression Model

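Evaluated at the MLE $\hat\theta = (\hat\beta, \hat\sigma^2)$:

$$E_N[L_{\theta\theta}(\hat\theta)] = \begin{bmatrix} -\dfrac{1}{\hat\sigma^2 N}X'X & -\dfrac{X'(y - X\hat\beta)}{\hat\sigma^4 N} \\ -\dfrac{(y - X\hat\beta)'X}{\hat\sigma^4 N} & \dfrac{1}{2\hat\sigma^4} - \dfrac{1}{\hat\sigma^6 N}(y - X\hat\beta)'(y - X\hat\beta) \end{bmatrix} = \begin{bmatrix} -\dfrac{1}{\hat\sigma^2 N}X'X & 0 \\ 0' & \dfrac{1}{2\hat\sigma^4} - \dfrac{1}{\hat\sigma^6 N}(y - X\hat\beta)'(y - X\hat\beta) \end{bmatrix}$$

The off-diagonal blocks vanish because $X'(y - X\hat\beta) = 0$, and the lower-right element equals $-1/(2\hat\sigma^4)$ because $(y - X\hat\beta)'(y - X\hat\beta) = N\hat\sigma^2$. The matrix is therefore negative definite when $X$ has full column rank.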

The second-order necessary condition for a point to be the local

maximum of a twice continuously differentiable function is that the

Hessian be negative semidefinite at the point.
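A numerical check of the Hessian at the MLE, written as a sketch for the simplest case of a single regressor ($K = 1$), so that the Hessian is $2 \times 2$ and negative definiteness reduces to a negative leading element and a positive determinant. The data values are hypothetical.

```python
def hessian_at_mle(y, x):
    """Average Hessian of the normal log-likelihood at the MLE, single regressor (K = 1)."""
    n = len(y)
    sxx = sum(v * v for v in x)
    beta = sum(v * w for v, w in zip(x, y)) / sxx                  # (X'X)^{-1} X'y
    resid = [w - beta * v for v, w in zip(x, y)]
    rss = sum(e * e for e in resid)
    s2 = rss / n                                                   # MLE of sigma^2
    h_bb = -sxx / (s2 * n)
    h_bs = -sum(v * e for v, e in zip(x, resid)) / (s2 ** 2 * n)   # zero up to rounding
    h_ss = 1 / (2 * s2 ** 2) - rss / (s2 ** 3 * n)                 # equals -1 / (2 s2^2)
    return [[h_bb, h_bs], [h_bs, h_ss]], s2

x = [0.5, -1.2, 2.0, 0.3, 1.1]   # hypothetical regressor values
y = [0.8, -1.5, 2.9, 0.1, 1.6]   # hypothetical responses
H, s2 = hessian_at_mle(y, x)
det_H = H[0][0] * H[1][1] - H[0][1] ** 2
```

Here `H[0][0] < 0` and `det_H > 0`, so the sample Hessian is negative definite at the solution of the likelihood equations.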


Normal Linear Regression Model

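The MLE of $\sigma^2$ is

$$\hat\sigma^2 = \frac{\hat\epsilon'\hat\epsilon}{N} = \frac{N - K}{N}\, s^2$$

where $\hat\epsilon = y - X\hat\beta$ and $s^2 = \hat\epsilon'\hat\epsilon/(N - K)$ is the usual unbiased estimator of $\sigma_0^2$.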


Identification

Is the DGP sufficiently informative about the parameters of the

model? If

f(u|v; θ0) = f(u|v; θ1)

data drawn from these two distributions will have the same sampling

properties. There is no way to distinguish whether θ = θ0 or θ = θ1.


Global Identification

The parameter $\theta_0$ is globally identified in $\Theta$ if, for every $\theta_1 \in \Theta$, $\theta_0 \ne \theta_1$ implies that

$$\Pr\{f(U \mid V; \theta_0) \ne f(U \mid V; \theta_1)\} > 0$$

Assumption (Global identification): Every parameter vector $\theta_0 \in \Theta$ is globally identified.

Lemma (Strict expected log-likelihood inequality): Under the Distribution, Dominance and Global identification assumptions, $\theta \ne \theta_0$ implies

$$E[L(\theta)] < E[L(\theta_0)].$$


Example

Exact multicollinearity among explanatory variables in a linear

regression E[y|X] = Xβ0 is a failure of global identification.

If rank(X) < K then

E[L(θ)] ≤ E[L(θ0)]

still holds. The normal log-likelihood still attains its maximum in β

at β0 because

$$-(\beta - \beta_0)'X'X(\beta - \beta_0) \le 0$$

but the inequality is not strict for all $\beta \ne \beta_0$.

If rank(X) = K then β0 is the unique maximum of E[L(θ)].
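A small numerical illustration of the rank condition, using made-up design matrices: when the second column of $X$ is a multiple of the first, a $\beta \ne \beta_0$ in the null space of $X$ makes the quadratic form exactly zero, so the expected log-likelihood cannot separate it from $\beta_0$.

```python
def quad_form(beta, beta0, X):
    """(beta - beta0)' X'X (beta - beta0), computed as the squared norm of X (beta - beta0)."""
    diff = [b - b0 for b, b0 in zip(beta, beta0)]
    return sum(sum(xj * dj for xj, dj in zip(row, diff)) ** 2 for row in X)

beta0 = [1.0, 1.0]                                      # hypothetical true coefficients
X_collinear = [[1.0, 2.0], [2.0, 4.0], [-1.0, -2.0]]    # second column = 2 * first, rank 1
X_full_rank = [[1.0, 2.0], [2.0, 1.0], [-1.0, 0.5]]     # rank 2
beta_alt = [3.0, 0.0]                                   # beta_alt - beta0 = (2, -1)

q_collinear = quad_form(beta_alt, beta0, X_collinear)
q_full_rank = quad_form(beta_alt, beta0, X_full_rank)
```

With the collinear design the quadratic form is zero at `beta_alt`, so $\beta_0$ is not the unique maximizer; with the full-rank design it is strictly positive.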


Example

Identification concerns E[L(θ)] and not the EN [L(θ)].

One can discover failures of identification in the sample log-likelihood.

But if a sample log-likelihood function fails to have a unique global

maximum this does not always imply a failure of global identification.


Example

Exact multicollinearity among explanatory variables in a LRM

E[y|X] = Xβ0

is a failure of global identification. Note that if

rank(X) < K

the expected log-likelihood inequality

E[L(θ)] ≤ E[L(θ0)]

still holds.


Differentiability

When the support of the distribution depends on the unknown parameter values, the MLE generally cannot be found with simple calculus.

In such cases the log-likelihood is not differentiable everywhere in the parameter space.
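A standard textbook case, not taken from these slides, is an i.i.d. sample from the uniform distribution on $(0, \theta_0)$, whose support depends on the parameter. The sketch below, with a hypothetical `theta0 = 4.0`, shows that the log-likelihood jumps to $-\infty$ just below the sample maximum, so the maximizer sits at a discontinuity rather than at a zero of the score.

```python
import math
import random

def uniform_loglik(theta, u):
    """Log-likelihood of an i.i.d. U(0, theta) sample; -inf if theta excludes an observation."""
    if theta < max(u):
        return float("-inf")
    return -len(u) * math.log(theta)

rng = random.Random(3)
theta0 = 4.0                                   # hypothetical true parameter value
u = [rng.uniform(0.0, theta0) for _ in range(50)]
theta_mle = max(u)                             # the log-likelihood is decreasing for theta >= max(u)
```

The estimate `theta_mle` is the largest observation; no first-order condition is solved to obtain it.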

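Assumption (Differentiability): The p.f. $f(u \mid v; \theta)$ is twice continuously differentiable in $\theta$, $\forall \theta \in \Theta$. The support $S(\theta)$ does not depend on $\theta$, and differentiation and integration are interchangeable in the sense that

$$\frac{\partial}{\partial\theta}\int_{S(\theta)} dF(u \mid v; \theta) = \int_{S(\theta)} \frac{\partial}{\partial\theta}\, dF(u \mid v; \theta)$$

$$\frac{\partial^2}{\partial\theta\,\partial\theta'}\int_{S(\theta)} dF(u \mid v; \theta) = \int_{S(\theta)} \frac{\partial^2}{\partial\theta\,\partial\theta'}\, dF(u \mid v; \theta)$$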


Differentiability

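$$\frac{\partial E[L(\theta) \mid V = v]}{\partial\theta} = E\left[\frac{\partial L(\theta)}{\partial\theta} \,\Big|\, V = v\right]$$

$$\frac{\partial^2 E[L(\theta) \mid V = v]}{\partial\theta\,\partial\theta'} = E\left[\frac{\partial^2 L(\theta)}{\partial\theta\,\partial\theta'} \,\Big|\, V = v\right]$$

The interchange of differentiation and integration is ensured in part by $S(\theta) = S$.

$$\theta_0 = \arg\max_{\theta \in \Theta} E[L(\theta)]$$

translates into the first-order conditions

$$\frac{\partial E[L(\theta)]}{\partial\theta}\Big|_{\theta = \theta_0} = 0$$

and the second-order condition that the Hessian matrix

$$\frac{\partial^2 E[L(\theta)]}{\partial\theta\,\partial\theta'}\Big|_{\theta = \theta_0}$$

is a negative definite (n.d.) matrix.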


The score function

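The MLE $\hat\theta$ is an implicit function of the data $u$:

$$\hat\theta = \arg\max_{\theta \in \Theta} E_N[L(\theta)] \in \arg\operatorname{zero}_{\theta \in \Theta} E_N[L_\theta(\theta)]$$

The first-order conditions, called the normal equations or likelihood equations, are

$$E_N[L_\theta(\hat\theta)] = 0$$

where the score function is

$$L_\theta \equiv \frac{\partial L(\theta)}{\partial\theta}$$

When the likelihood equations have no closed-form solution, $\hat\theta$ must be calculated by numerical methods for maximizing differentiable functions.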


Score Identity

Lemma (Score identity): Under Distribution and Differentiability

assumptions

E[Lθ(θ0)|V = v] = 0

Proof : Continuous random variables case

$$1 = \int_S dF(u \mid v; \theta) = \int_S f(u \mid v; \theta)\, du$$


Score Identity

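We can differentiate both sides of this equality w.r.t. $\theta$:

$$\begin{aligned}
0 &= \int_S \frac{\partial}{\partial\theta} f(u \mid v; \theta)\, du \\
&= \int_S f_\theta(u \mid v; \theta)\, du \\
&= \int_S \frac{1}{f(u \mid v; \theta)}\, f_\theta(u \mid v; \theta)\, f(u \mid v; \theta)\, du
\end{aligned}$$

Consider

$$L_\theta(\theta; U \mid V) = \frac{1}{f(U \mid V; \theta)}\, f_\theta(U \mid V; \theta)$$

$$E[L_\theta(\theta; U \mid V) \mid V = v] = \int_S \frac{1}{f(u \mid v; \theta)}\, f_\theta(u \mid v; \theta)\, f(u \mid v; \theta_0)\, du$$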


Score Identity

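The expectation $E[\cdot \mid V = v]$ is taken under the true parameter value $\theta_0$. For $\theta \ne \theta_0$, in general

$$E[L_\theta(\theta; U \mid V) \mid V = v] \ne 0$$

But if $\theta = \theta_0$ then

$$E[L_\theta(\theta_0; U \mid V) \mid V = v] = \int_S \frac{1}{f(u \mid v; \theta_0)}\, f_\theta(u \mid v; \theta_0)\, f(u \mid v; \theta_0)\, du = 0.$$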


Score Identity

In the Normal Linear Regression Model

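$$E[L_\beta(\theta)] = \frac{1}{\sigma^2} E[x_t x_t'](\beta_0 - \beta)$$

$$E[L_{\sigma^2}(\theta)] = -\frac{1}{2\sigma^4}\left(\sigma^2 - \left\{\sigma_0^2 + E[(x_t'\beta_0 - x_t'\beta)^2]\right\}\right)$$

At $\theta_0 = (\beta_0, \sigma_0^2)$:

$$E[L_\beta(\theta_0)] = \frac{1}{\sigma_0^2} E[x_t x_t'](\beta_0 - \beta_0) = 0$$

$$E[L_{\sigma^2}(\theta_0)] = -\frac{1}{2\sigma_0^4}\left(\sigma_0^2 - \left\{\sigma_0^2 + E[(x_t'\beta_0 - x_t'\beta_0)^2]\right\}\right) = 0$$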
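The score identity can be illustrated by simulation. The sketch below is an assumption-laden example: a single standard normal regressor (so $E[x_t^2] = 1$), hypothetical true values `beta0 = 1.5` and `sigma2_0 = 1.0`, and a fixed seed. It averages the per-observation score at $\theta_0$ and at another parameter value.

```python
import math
import random

def mean_score(beta, sigma2, data):
    """Sample average of the score (L_beta, L_sigma2) at theta = (beta, sigma2), K = 1."""
    n = len(data)
    s_beta = sum(x * (y - x * beta) for x, y in data) / (sigma2 * n)
    s_sig = sum(-(sigma2 - (y - x * beta) ** 2) for x, y in data) / (2 * sigma2 ** 2 * n)
    return s_beta, s_sig

rng = random.Random(0)
beta0, sigma2_0 = 1.5, 1.0     # hypothetical true parameter values
data = []
for _ in range(100_000):
    x = rng.gauss(0.0, 1.0)
    data.append((x, beta0 * x + rng.gauss(0.0, math.sqrt(sigma2_0))))

score_at_truth = mean_score(beta0, sigma2_0, data)   # both components near 0
score_elsewhere = mean_score(1.0, 2.0, data)         # near (0.25, -0.09375) by the formulas above
```

At $\theta = (1.0, 2.0)$ the population formulas give $E[L_\beta] = (1.5 - 1.0)/2 = 0.25$ and $E[L_{\sigma^2}] = -(2 - 1.25)/8 = -0.09375$, and the simulated averages should be close to these values.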


The Information Matrix

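If there exists $\tilde\theta_N$ such that

$$E_N[L_\theta(\tilde\theta_N)] = 0$$

we must check that it is a global maximum; otherwise the solution cannot be the MLE $\hat\theta_N$. A sufficient condition for $\tilde\theta_N$ to be a local maximum is that the Hessian matrix

$$E_N[L_{\theta\theta}(\tilde\theta_N)] \equiv \frac{\partial^2 E_N[L(\theta)]}{\partial\theta\,\partial\theta'}\Big|_{\theta = \tilde\theta_N}$$

evaluated at $\tilde\theta_N$ is negative definite: $\forall c \in \mathbb{R}^K$, $c \ne 0$,

$$c' E_N[L_{\theta\theta}(\tilde\theta_N)]\, c < 0$$

This guarantees that $E_N[L(\theta)]$ is strictly concave in a neighborhood of $\tilde\theta_N$.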


Information Matrix

We investigate the second-order conditions for the maximization of $E[L(\theta)]$.

Assumption (Finite Information): V ar[Lθ(θ0)] exists.

Lemma (Information Identity): Under Distribution, Differentiability,

Finite Information assumptions

E[Lθθ(θ0)|V = v] = −V ar[Lθ(θ0)|V = v]

and this matrix is negative semidefinite.


Information Matrix

Proof :

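$$0 = \int_S L_\theta(\theta; u \mid v)\, f(u \mid v; \theta)\, du$$

Differentiating both sides w.r.t. $\theta'$, the integrand becomes

$$\begin{aligned}
\frac{\partial(L_\theta(\theta) f(\theta))}{\partial\theta'} &= \frac{\partial L_\theta}{\partial\theta'}\, f + L_\theta \frac{\partial f}{\partial\theta'} \\
&= L_{\theta\theta} f + L_\theta (f_\theta)' \\
&= (L_{\theta\theta} + L_\theta L_\theta') f
\end{aligned}$$

where $f \equiv f(u \mid v; \theta)$ and $f_\theta = L_\theta f$. Hence

$$0 = \int_S [L_{\theta\theta}(\theta; u \mid v) + L_\theta(\theta; u \mid v) L_\theta(\theta; u \mid v)']\, dF(u \mid v; \theta)$$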


Information Matrix

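$$\int_S L_{\theta\theta}(\theta; u \mid v)\, dF(u \mid v; \theta) = -\int_S [L_\theta(\theta; u \mid v) L_\theta(\theta; u \mid v)']\, dF(u \mid v; \theta)$$

Setting $\theta = \theta_0$:

$$\begin{aligned}
E[L_{\theta\theta}(\theta_0; U \mid V) \mid V = v] &= -E[L_\theta(\theta_0; U \mid V) L_\theta(\theta_0; U \mid V)' \mid V = v] \\
&= -Var[L_\theta(\theta_0; U \mid V) \mid V = v]
\end{aligned}$$

because $E[L_\theta(\theta_0; U \mid V) \mid V] = 0$. The Hessian is negative semidefinite since it is the negative of a variance matrix.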


Conditional Information

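The conditional information matrix is the conditional variance matrix of the score vector $L_\theta(\theta; U \mid V)$ given $V = v$, evaluated at $\theta_0$:

$$I(\theta_0 \mid v) \equiv E[L_\theta(\theta_0) L_\theta(\theta_0)' \mid V = v] = Var[L_\theta(\theta_0) \mid V = v]$$

More generally, we can always define the conditional information matrix function

$$I(\theta \mid v) \equiv \int_S L_\theta(\theta; u \mid v) L_\theta(\theta; u \mid v)'\, dF(u \mid v; \theta)$$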


Population Information

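The marginal expectation

$$I(\theta_0) \equiv E[L_\theta(\theta_0; U \mid V) L_\theta(\theta_0; U \mid V)']$$

is the population information matrix.

The population information matrix is the unconditional variance matrix of the conditional score vector because $E[L_\theta(\theta_0; U \mid V) \mid V] = 0$, so that by the law of total variance

$$\begin{aligned}
Var[L_\theta(\theta_0; U \mid V)] &= E[Var[L_\theta(\theta_0; U \mid V) \mid V]] + Var[E[L_\theta(\theta_0; U \mid V) \mid V]] \\
&= E[I(\theta_0 \mid V)] = I(\theta_0)
\end{aligned}$$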


Normal linear regression model

The conditional information matrix for the normal linear regression

model:

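$$I(\theta_0 \mid x_t) = \begin{bmatrix} \dfrac{1}{\sigma_0^2}\, x_t x_t' & 0 \\ 0' & \dfrac{1}{2\sigma_0^4} \end{bmatrix}$$

The Hessian of the conditional normal regression log-likelihood function:

$$L_{\theta\theta}(\theta; y_t \mid x_t) = \begin{bmatrix} -\dfrac{1}{\sigma^2}\, x_t x_t' & -\dfrac{1}{\sigma^4}\, x_t(y_t - x_t'\beta) \\ -\dfrac{1}{\sigma^4}(y_t - x_t'\beta)\, x_t' & \dfrac{1}{2\sigma^4} - \dfrac{(y_t - x_t'\beta)^2}{\sigma^6} \end{bmatrix}$$

$$-E[L_{\theta\theta}(\theta_0; y_t \mid x_t) \mid x_t] = I(\theta_0 \mid x_t)$$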
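The information identity for the normal model can be checked by Monte Carlo. The sketch below assumes a single standard normal regressor and hypothetical true values `beta0 = 0.7`, `sigma2_0 = 2.0`; it compares the average outer product of the score with minus the average Hessian, both at $\theta_0$.

```python
import math
import random

rng = random.Random(2)
beta0, sigma2_0 = 0.7, 2.0      # hypothetical true parameter values
n = 200_000
outer_bb = outer_bs = outer_ss = 0.0   # running sums of score outer products
hess_bb = hess_ss = 0.0                # running sums of the Hessian's diagonal elements
for _ in range(n):
    x = rng.gauss(0.0, 1.0)
    e = rng.gauss(0.0, math.sqrt(sigma2_0))          # e = y - x * beta0
    s_b = x * e / sigma2_0                           # L_beta at theta_0
    s_s = -(sigma2_0 - e * e) / (2 * sigma2_0 ** 2)  # L_sigma2 at theta_0
    outer_bb += s_b * s_b
    outer_bs += s_b * s_s
    outer_ss += s_s * s_s
    hess_bb += -x * x / sigma2_0
    hess_ss += 1 / (2 * sigma2_0 ** 2) - e * e / sigma2_0 ** 3

info_outer = [[outer_bb / n, outer_bs / n], [outer_bs / n, outer_ss / n]]
info_hessian = [-hess_bb / n, -hess_ss / n]
info_exact = [1.0 / sigma2_0, 1.0 / (2 * sigma2_0 ** 2)]   # diag of I(theta_0), using E[x^2] = 1
```

Both simulated versions should be close to the analytic diagonal $(0.5, 0.125)$, and the off-diagonal element of `info_outer` should be close to zero.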


Nonsingular information

The information matrix can be singular even if $\theta_0$ is globally identified and the expected log-likelihood is uniquely maximized at $\theta_0$.

The second order condition that the Hessian be negative

definite is sufficient but not necessary for a local maximum.

We assume this condition explicitly.

Assumption (Nonsingular Information) The information matrix

I(θ0) is nonsingular for all possible θ0 ∈ Θ.


The Cramer - Rao Lower Bound

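Information matrix: a measure of how much we can learn about $\theta_0$ from the random sample $\{(U_1, V_1), \ldots, (U_N, V_N)\}$.

Theorem: Let $\tilde\theta$ be an unbiased estimator of $\theta_0$ with finite variance matrix, for which differentiation and integration are interchangeable:

$$\frac{\partial E[\tilde\theta \mid v_1, \ldots, v_N]}{\partial\theta_0'} = \frac{\partial}{\partial\theta_0'}\int_S \tilde\theta \prod_{t=1}^{N} dF(u_t \mid v_t; \theta_0) = \int_S \tilde\theta\, \frac{\partial}{\partial\theta_0'} \prod_{t=1}^{N} dF(u_t \mid v_t; \theta_0)$$

If the Distribution, Differentiability, Finite Information and Nonsingular Information assumptions also hold, then for any $a \in \mathbb{R}^K$

$$a'\, Var[\tilde\theta \mid v]\, a \ge a' \left(N E_N[I(\theta_0 \mid v)]\right)^{-1} a.$$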


The Cramer - Rao Lower Bound

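Unbiased estimator $\tilde\theta$:

$$\theta_0 = E[\tilde\theta \mid v] = \int_S \tilde\theta \prod_{t=1}^{N} dF(u_t \mid v_t; \theta_0)$$

Differentiating w.r.t. $\theta_0'$:

$$\begin{aligned}
I_K &= \int_S \tilde\theta \left[\sum_{t=1}^{N} L_\theta(\theta_0; u_t \mid v_t)'\right] \prod_{t=1}^{N} dF(u_t \mid v_t; \theta_0) \\
&= N \int_S \tilde\theta\, E_N[L_\theta(\theta_0)]' \prod_{t=1}^{N} dF(u_t \mid v_t; \theta_0) \\
&= N E[\tilde\theta\, E_N[L_\theta(\theta_0)]' \mid v] \\
&= N\, Cov[\tilde\theta, E_N[L_\theta(\theta_0)] \mid v]
\end{aligned}$$

where the last equality holds because $E[E_N[L_\theta(\theta_0)] \mid v] = 0$ by the score identity.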


The Cramer - Rao Lower Bound

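The covariance matrix of the vector $(\tilde\theta', E_N[L_\theta(\theta_0)]')'$, where $\tilde\theta$ is the unbiased estimator, is

$$\Psi = E\left[\begin{pmatrix} \tilde\theta - \theta_0 \\ E_N[L_\theta(\theta_0)] \end{pmatrix} \begin{pmatrix} (\tilde\theta - \theta_0)' & E_N[L_\theta(\theta_0)]' \end{pmatrix} \Big|\, v\right] = \begin{bmatrix} Var[\tilde\theta \mid v] & N^{-1} I_K \\ N^{-1} I_K & N^{-1} E_N[I(\theta_0 \mid v)] \end{bmatrix}$$

$\Psi$ is a p.s.d. covariance matrix. It follows that for each $b \in \mathbb{R}^{2K}$

$$b' \Psi b \ge 0$$

For any $a \in \mathbb{R}^K$, take

$$b' = \left[a', \; -a' E_N[I(\theta_0 \mid v)]^{-1}\right]$$

Expanding $b'\Psi b$ gives $a'\, Var[\tilde\theta \mid v]\, a - N^{-1} a' E_N[I(\theta_0 \mid v)]^{-1} a \ge 0$, so that

$$a'\, Var[\tilde\theta \mid v]\, a \ge a' N^{-1} E_N[I(\theta_0 \mid v)]^{-1} a = a' \{N E_N[I(\theta_0 \mid v)]\}^{-1} a$$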


The Cramer - Rao Lower Bound

In some cases we can find estimators with variances equal to the

Cramer-Rao lower bound.

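The OLS estimator $\hat\beta$ is efficient relative to all unbiased estimators of $\beta_0$.

Proof: Using

$$I(\theta_0 \mid x_t) = \begin{bmatrix} \dfrac{1}{\sigma_0^2}\, x_t x_t' & 0 \\ 0' & \dfrac{1}{2\sigma_0^4} \end{bmatrix}$$

we obtain

$$\left(N \cdot E_N[I(\theta_0 \mid x_t)]\right)^{-1} = \begin{bmatrix} \dfrac{1}{\sigma_0^2}(X'X) & 0 \\ 0' & \dfrac{N}{2\sigma_0^4} \end{bmatrix}^{-1} = \begin{bmatrix} \sigma_0^2 (X'X)^{-1} & 0 \\ 0' & \dfrac{2\sigma_0^4}{N} \end{bmatrix}$$

Because

$$Var[\hat\beta \mid X] = \sigma_0^2 (X'X)^{-1}$$

the OLS/MLE estimator attains the Cramér-Rao lower bound.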
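A simulation can illustrate that the sampling variance of the OLS/ML slope matches the bound. This is a sketch under assumptions: a single regressor held fixed across replications, hypothetical true values `beta0 = 2.0` and `sigma2_0 = 1.5`, and a fixed seed.

```python
import math
import random

def ols_slope(x, y):
    """OLS/ML estimate of beta in y_t = x_t * beta + e_t (single regressor, K = 1)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

rng = random.Random(1)
beta0, sigma2_0 = 2.0, 1.5                        # hypothetical true values
x = [rng.uniform(-2.0, 2.0) for _ in range(25)]   # regressors held fixed (conditioning on X)
cr_bound = sigma2_0 / sum(a * a for a in x)       # sigma_0^2 (X'X)^{-1} for K = 1

estimates = []
for _ in range(20_000):
    y = [beta0 * a + rng.gauss(0.0, math.sqrt(sigma2_0)) for a in x]
    estimates.append(ols_slope(x, y))
mean_est = sum(estimates) / len(estimates)
mc_var = sum((b - mean_est) ** 2 for b in estimates) / (len(estimates) - 1)
```

The Monte Carlo variance `mc_var` should be within a few percent of `cr_bound`, and `mean_est` should be close to `beta0`, consistent with unbiasedness.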


MLE Asymptotics

The MLE is an implicit function of the random sample. In general, the MLE is not a function of sample averages of the data.

But the sample log-likelihood is a sum of i.i.d. random variables: because the $(U_t, V_t)$ are i.i.d., so are transformations such as $L(\theta) \equiv L(\theta; U_t \mid V_t)$, $t = 1, 2, \ldots, N$. The LLN can therefore be applied to the sample average log-likelihood function itself:

$$E_N[L(\theta)] \xrightarrow{p} E[L(\theta)]$$

for any fixed $\theta$.
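The pointwise convergence can be seen in a simulation of the normal regression model. The sketch assumes one standard normal regressor (so $E[x_t^2] = 1$), hypothetical true values, a fixed seed, and an arbitrary fixed evaluation point $\theta = (1.0, 2.0)$; the closed form for $E[L(\theta)]$ is the one derived earlier for the normal model, averaged over $x_t$.

```python
import math
import random

def avg_loglik(beta, sigma2, data):
    """Sample average log-likelihood E_N[L(theta)] in the normal model, K = 1."""
    return sum(-0.5 * math.log(2 * math.pi * sigma2) - (y - x * beta) ** 2 / (2 * sigma2)
               for x, y in data) / len(data)

def expected_loglik(beta, sigma2, beta0, sigma2_0, ex2):
    """Population counterpart E[L(theta)], with ex2 = E[x_t^2]."""
    return -0.5 * (math.log(2 * math.pi * sigma2) + (sigma2_0 + ex2 * (beta - beta0) ** 2) / sigma2)

rng = random.Random(7)
beta0, sigma2_0 = 1.5, 1.0      # hypothetical true parameter values
data = []
for _ in range(100_000):
    x = rng.gauss(0.0, 1.0)
    data.append((x, beta0 * x + rng.gauss(0.0, math.sqrt(sigma2_0))))

target = expected_loglik(1.0, 2.0, beta0, sigma2_0, 1.0)
gaps = {n: abs(avg_loglik(1.0, 2.0, data[:n]) - target) for n in (100, 10_000, 100_000)}
```

The gap at the largest sample size should be small; the gaps need not shrink monotonically in any single simulated sample.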


Consistency

Under the assumptions

1. Distribution

2. Dominance

3. Global Identification

4. Compactness of Θ

the MLE is consistent:

$$\hat\theta_N \xrightarrow{p} \theta_0$$


Consistency

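• The sample average log-likelihood converges to the expected log-likelihood for any value of $\theta$:

$$E_N[L(\theta)] \xrightarrow{p} E[L(\theta)]$$

$$\hat\theta_N = \arg\max_{\theta \in \Theta} E_N[L(\theta)] \quad \text{by construction}$$

$$\theta_0 = \arg\max_{\theta \in \Theta} E[L(\theta)] \quad \text{by the strict expected log-likelihood inequality}$$

As a result, $\hat\theta_N \xrightarrow{p} \theta_0$, provided that the relationships are continuous.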


Consistency

The argument of $\arg\max_{\theta \in \Theta}$ is a function of $\theta$, namely $E_N[L(\theta)]$, so $\arg\max_{\theta \in \Theta}$ must be a continuous function of its functional argument.

This requires a notion of distance between two functions over a set containing an infinite number of possible comparisons at different values of $\theta$.

Uniform Convergence in Probability: The sequence of real-valued functions $\{g_N(\theta)\}$ converges uniformly in probability to the limit function $g_0(\theta)$ if

$$\sup_{\theta \in \Theta} |g_N(\theta) - g_0(\theta)| \xrightarrow{p} 0$$

We then say $g_N(\theta) \xrightarrow{p} g_0(\theta)$ uniformly.


Consistency

We use the Uniform Convergence in Probability in order to define the

probability limit of a sequence of random functions.

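Uniform LLN. Let $g(\theta; U)$ be a continuous function over $\theta \in \Theta$, where $\Theta \subset \mathbb{R}^K$ is closed and bounded, and let $\{U_t\}$ be a sequence of i.i.d. r.v. with c.d.f. $F_U(u)$. If $E[\sup_{\theta \in \Theta} \|g(\theta; U)\|]$ exists, then

1. $E[g(\theta; U)]$ is continuous over $\theta \in \Theta$

2. $E_N[g(\theta; U)] \xrightarrow{p} E[g(\theta; U)]$ uniformly over $\theta \in \Theta$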


Consistency

We apply the uniform LLN to the sample average log-likelihood.

Consistency of Maxima. If there is a sequence of functions QN (θ)

that converges in probability uniformly to a function Q0(θ) on the

closed and bounded Θ and if Q0(θ) is continuous and uniquely

maximized at θ0, then

$$\hat\theta_N = \arg\max_{\theta \in \Theta} Q_N(\theta) \xrightarrow{p} \theta_0$$

Compactness and differentiability guarantee that EN [L(θ)] has a

maximum.


Consistency

Let

$$g(\theta; U) \equiv L(\theta; U \mid V)$$

be the conditional log-likelihood function for $\theta$ evaluated at the r.v. $(U, V)$.

The conditions for uniform convergence are satisfied:

• Differentiability implies continuity of $L(\theta)$

• Compactness of $\Theta$

• $(U_t, V_t)$ are i.i.d. with conditional c.d.f. $F_{U|V}(u \mid v; \theta_0)$

• Dominance states that $E[\sup_{\theta \in \Theta} |L(\theta)|]$ exists

Then $E[L(\theta)]$ is continuous and

$$E_N[L(\theta)] \xrightarrow{p} E[L(\theta)]$$

uniformly.


Consistency

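For the Consistency of Maxima, set

$$Q_N(\theta) = E_N[L(\theta)] \quad \text{and} \quad Q_0(\theta) = E[L(\theta)].$$

Under the assumptions:

• From Likelihood Identification: $\forall \theta_1 \in \Theta$, $\theta_0 \ne \theta_1$ implies

$$\Pr\{L(\theta_0) \ne L(\theta_1)\} > 0$$

• we have the Strict Expected Log-likelihood Inequality: $\theta \ne \theta_0$ implies

$$E[L(\theta)] < E[L(\theta_0)]$$

Hence $E[L(\theta)]$ is uniquely maximized at $\theta_0$. Therefore

$$\hat\theta_N = \arg\max_{\theta \in \Theta} E_N[L(\theta)] \xrightarrow{p} \theta_0 = \arg\max_{\theta \in \Theta} E[L(\theta)]$$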
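Consistency can be illustrated by computing the normal-regression MLE on nested subsamples of increasing size. This is only a sketch: one standard normal regressor, hypothetical true values `beta0 = 1.5` and `sigma2_0 = 1.0`, and a fixed seed are assumed, and a single simulated path does not prove convergence in probability.

```python
import math
import random

def normal_mle(data):
    """Closed-form MLE (beta_hat, sigma2_hat) in y_t = x_t * beta + e_t, K = 1."""
    sxx = sum(x * x for x, _ in data)
    beta = sum(x * y for x, y in data) / sxx
    sigma2 = sum((y - x * beta) ** 2 for x, y in data) / len(data)
    return beta, sigma2

rng = random.Random(4)
beta0, sigma2_0 = 1.5, 1.0      # hypothetical true parameter values
data = []
for _ in range(100_000):
    x = rng.gauss(0.0, 1.0)
    data.append((x, beta0 * x + rng.gauss(0.0, math.sqrt(sigma2_0))))

# MLE on the first n observations, for growing n.
estimates = {n: normal_mle(data[:n]) for n in (100, 10_000, 100_000)}
```

At the largest sample size both components of the estimate should lie close to $(\beta_0, \sigma_0^2)$.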


Asymptotic Normality

Assumption: There is an open subset of Θ that contains the

population parameter value θ0.

θ0 is not on the boundary of Θ.

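Assumption:

$$E_N[L_\theta(\hat\theta_N)] = 0$$

the MLE solves the normal equations.

First-order Taylor series (mean-value) expansion of the score around $\theta_0$:

$$E_N[L_\theta(\hat\theta_N)] = 0 = E_N[L_\theta(\theta_0)] + E_N[L_{\theta\theta}(\bar\theta_N)](\hat\theta_N - \theta_0)$$

$$\bar\theta_N = \alpha_N \hat\theta_N + (1 - \alpha_N)\theta_0, \quad \alpha_N \in [0, 1]$$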


Asymptotic Normality

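$$\sqrt{N}(\hat\theta_N - \theta_0) = \{-E_N[L_{\theta\theta}(\bar\theta_N)]\}^{-1} \sqrt{N} E_N[L_\theta(\theta_0)]$$

where $\bar\theta_N$ lies between $\hat\theta_N$ and $\theta_0$.

• $\sqrt{N} E_N[L_\theta(\theta_0)] \xrightarrow{d} N(0, I(\theta_0))$ (by CLT)

• $E_N[L_{\theta\theta}(\bar\theta_N)] \xrightarrow{p} -I(\theta_0)$ (by LLN)

then

$$\sqrt{N}(\hat\theta_N - \theta_0) \xrightarrow{d} N(0, I(\theta_0)^{-1})$$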
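The limiting distribution can be illustrated for the slope of the normal regression model. The sketch below makes several assumptions: one standard normal regressor (so the $\beta$ block of $I(\theta_0)^{-1}$ is $\sigma_0^2 / E[x_t^2] = \sigma_0^2$), hypothetical true values, a fixed seed, and modest choices of sample size and replication count.

```python
import math
import random

rng = random.Random(5)
beta0, sigma0 = 1.0, 1.0        # hypothetical true slope and error standard deviation
n_obs, n_rep = 200, 3000

z = []                          # replications of sqrt(N) * (beta_hat - beta0)
for _ in range(n_rep):
    sxy = sxx = 0.0
    for _ in range(n_obs):
        x = rng.gauss(0.0, 1.0)
        y = beta0 * x + rng.gauss(0.0, sigma0)
        sxy += x * y
        sxx += x * x
    z.append(math.sqrt(n_obs) * (sxy / sxx - beta0))

z_mean = sum(z) / n_rep
z_var = sum((v - z_mean) ** 2 for v in z) / (n_rep - 1)
coverage = sum(abs(v) <= 1.96 * sigma0 for v in z) / n_rep   # share inside the 95% normal band
asy_var = sigma0 ** 2 / 1.0     # I(theta_0)^{-1} for beta, with E[x_t^2] = 1
```

The simulated variance `z_var` should be close to `asy_var`, and roughly 95% of the replications should fall inside the normal band.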
