
8.4 Lecture 11 Friday 02/09/01

Homework and Labs: see the logistics section. Please hand in your labs to Johan by next Monday.

9 Maximum Likelihood Estimation

$X_1, X_2, X_3, \ldots, X_n$ have joint density denoted

$$f_\theta(x_1, x_2, \ldots, x_n) = f(x_1, x_2, \ldots, x_n \mid \theta)$$

Given observed values $X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n$, the likelihood of $\theta$ is the function

$$\mathrm{lik}(\theta) = f(x_1, x_2, \ldots, x_n \mid \theta)$$

considered as a function of $\theta$. If the distribution is discrete, $f$ will be the frequency distribution function.

In words: $\mathrm{lik}(\theta)$ = probability of observing the given data as a function of $\theta$.

Definition: The maximum likelihood estimate (mle) of $\theta$ is that value of $\theta$ that maximises $\mathrm{lik}(\theta)$: it is the value that makes the observed data the "most probable".

If the $X_i$ are iid, then the likelihood simplifies to

$$\mathrm{lik}(\theta) = \prod_{i=1}^{n} f(x_i \mid \theta)$$

Rather than maximising this product, which can be quite tedious, we often use the fact that the logarithm is an increasing function, so it is equivalent to maximise the log likelihood:

$$l(\theta) = \sum_{i=1}^{n} \log f(x_i \mid \theta)$$
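A minimal numerical sketch of this idea (not part of the original notes; the helper names log_likelihood and mle_1d are made up, and scipy is assumed available): the maximiser of the summed log density is found directly by a bounded one-dimensional optimiser.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def log_likelihood(theta, data, logpdf):
    # sum of log f(x_i | theta); has the same maximiser as the product of f(x_i | theta)
    return np.sum(logpdf(data, theta))

def mle_1d(data, logpdf, bounds):
    # maximise the log likelihood by minimising its negative over a bounded interval
    res = minimize_scalar(lambda t: -log_likelihood(t, data, logpdf),
                          bounds=bounds, method="bounded")
    return res.x
```

With logpdf set to, say, a Poisson log frequency function, mle_1d reproduces the estimates derived analytically in the examples below.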

9.0.1 Poisson Example

$$P(X = x) = \frac{\lambda^x e^{-\lambda}}{x!}$$

For $X_1, X_2, \ldots, X_n$ iid Poisson random variables, the joint frequency function is the product of the marginal frequency functions, so the log likelihood is:

$$l(\lambda) = \sum_{i=1}^{n} \left( x_i \log\lambda - \lambda - \log x_i! \right) = \log\lambda \sum_{i=1}^{n} x_i - n\lambda - \sum_{i=1}^{n} \log x_i!$$

We find the maximum by setting the derivative equal to zero:

$$l'(\lambda) = \frac{1}{\lambda} \sum_{i=1}^{n} x_i - n = 0$$



which implies that the estimate should be

$$\hat{\lambda} = \bar{X}$$

(as long as we check that the function $l$ is actually concave, which it is). The mle agrees with the method of moments in this case, and so does its sampling distribution.
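A quick numerical check of this result, as a sketch rather than anything from the notes (the data are simulated and scipy's poisson.logpmf is assumed):

```python
import numpy as np
from scipy.stats import poisson
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
x = rng.poisson(lam=3.5, size=200)        # simulated Poisson data (made-up parameters)

# maximise the Poisson log likelihood numerically
neg_loglik = lambda lam: -np.sum(poisson.logpmf(x, lam))
res = minimize_scalar(neg_loglik, bounds=(1e-6, 50.0), method="bounded")

print(res.x, x.mean())                    # numerical mle vs. the sample mean: they agree closely
```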

9.0.2 Normal Example

If $X_1, X_2, \ldots, X_n$ are iid $N(\mu, \sigma^2)$ random variables, their density is written:

$$f(x_1, \ldots, x_n \mid \mu, \sigma) = \prod_{i=1}^{n} \frac{1}{\sigma\sqrt{2\pi}} \exp\left( -\frac{1}{2} \left[ \frac{x_i - \mu}{\sigma} \right]^2 \right)$$

Regarded as a function of the two parameters $\mu$ and $\sigma$, this is the likelihood; its logarithm is:

$$\ell(\mu, \sigma) = -n \log\sigma - \frac{n}{2} \log 2\pi - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2$$

$$\frac{\partial \ell}{\partial \mu} = \frac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \mu)$$

$$\frac{\partial \ell}{\partial \sigma} = -\frac{n}{\sigma} + \sigma^{-3} \sum_{i=1}^{n} (x_i - \mu)^2$$

so setting these to zero gives $\bar{X}$ as the mle for $\mu$, and the usual $\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{X})^2$ as the mle for $\sigma^2$.
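For concreteness, a short sketch with made-up data showing the two closed-form estimates (not part of the notes):

```python
import numpy as np

x = np.array([4.2, 5.1, 3.8, 6.0, 5.5])   # hypothetical sample

mu_hat = x.mean()                          # mle of mu: the sample mean
sigma2_hat = np.mean((x - mu_hat) ** 2)    # mle of sigma^2: note it divides by n, not n - 1

print(mu_hat, sigma2_hat)
```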

9.0.3 Gamma Example

$$f(x \mid \alpha, \lambda) = \frac{1}{\Gamma(\alpha)} \lambda^\alpha x^{\alpha-1} e^{-\lambda x}$$

giving the log-likelihood:

$$l(\alpha, \lambda) = \sum_{i=1}^{n} \left[ \alpha \log\lambda + (\alpha - 1) \log x_i - \lambda x_i - \log\Gamma(\alpha) \right]$$

One ends up with a nonlinear equation in $\hat{\alpha}$ that cannot be solved in closed form. There are basically two methods, both root-finding methods; they are based on the calculus theorem that says that when a function is continuous and changes sign on an interval, it is zero somewhere on that interval.

For this particular problem there is already an mle method coded in Matlab, called gamfit, which also provides a confidence interval.

For general optimization, the function in Matlab is fmin for one variable, and fmins for several variables; you could also look at how to use optimize in Splus.
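As a sketch of how the root-finding approach might look in Python (an assumed analogue of what gamfit does, not code from the notes; it relies on scipy.special.digamma and Brent's method): setting $\partial l / \partial \lambda = 0$ gives $\hat{\lambda} = \alpha / \bar{x}$, and substituting this back leaves a single nonlinear equation in $\alpha$.

```python
import numpy as np
from scipy.special import digamma
from scipy.optimize import brentq

def gamma_mle(x):
    # Profiling out lambda (lambda_hat = alpha / x_bar), alpha_hat solves
    #   log(alpha) - digamma(alpha) = log(x_bar) - mean(log x),
    # a continuous function of alpha that changes sign on a wide bracket for
    # non-degenerate data, so a root-finder such as Brent's method applies.
    xbar, mean_log = x.mean(), np.log(x).mean()
    c = np.log(xbar) - mean_log
    alpha_hat = brentq(lambda a: np.log(a) - digamma(a) - c, 1e-6, 1e6)
    return alpha_hat, alpha_hat / xbar     # (shape alpha_hat, rate lambda_hat)
```

Unlike gamfit, this sketch only returns point estimates, not a confidence interval.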



9.1 Maximum Likelihood of Multinomial Cell Probabilities

$X_1, X_2, \ldots, X_m$ are counts in cells/boxes 1 up to $m$; each box has a different probability (think of the boxes as being bigger or smaller), and we fix the number of balls that fall to be $n$: $x_1 + x_2 + \cdots + x_m = n$. The probability of each box is $p_i$, with the constraint $p_1 + p_2 + \cdots + p_m = 1$. This is a case in which the $X_i$'s are NOT independent. The joint probability of a vector $x_1, x_2, \ldots, x_m$ is called the multinomial and has the form:

$$f(x_1, x_2, \ldots, x_m \mid p_1, \ldots, p_m) = \frac{n!}{\prod x_i!} \prod p_i^{x_i} = \binom{n}{x_1, x_2, \ldots, x_m} p_1^{x_1} p_2^{x_2} \cdots p_m^{x_m}$$

Each box taken separately against all the other boxes is a binomial; this is an extension thereof (look at page 72).

We study the log-likelihood of this:

$$l(p_1, p_2, \ldots, p_m) = \log n! - \sum_{i=1}^{m} \log x_i! + \sum_{i=1}^{m} x_i \log p_i$$

However, we can't just go ahead and maximise this; we have to take the constraint into account, so we have to use Lagrange multipliers again.

We use

$$L(p_1, p_2, \ldots, p_m, \lambda) = l(p_1, p_2, \ldots, p_m) + \lambda \Big( 1 - \sum_{i=1}^{m} p_i \Big)$$

By setting all the derivatives to 0 (each gives $x_i / p_i = \lambda$, and the constraint then forces $\lambda = n$), we get the most natural estimate

$$\hat{p}_i = \frac{x_i}{n}$$

Maximising the log likelihood, with and without constraints, can be unsolvable in closed form; then we have to use iterative procedures.
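As an illustration (a sketch with made-up counts, not from the notes), the Lagrange-multiplier answer can be checked against an iterative constrained maximisation:

```python
import numpy as np
from scipy.optimize import minimize

x = np.array([12.0, 30.0, 8.0, 50.0])      # hypothetical cell counts
n = x.sum()

p_hat = x / n                               # closed-form mle from the Lagrange argument

# iterative check: maximise sum_i x_i log p_i subject to sum_i p_i = 1
# (the log n! and log x_i! terms do not involve p, so they are dropped)
res = minimize(lambda p: -np.sum(x * np.log(p)),
               x0=np.full(x.size, 1.0 / x.size),
               bounds=[(1e-9, 1.0)] * x.size,
               constraints={"type": "eq", "fun": lambda p: p.sum() - 1.0})

print(p_hat, res.x)                         # the two should agree closely
```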

I explained how the parametric bootstrap is often the only way to study the sampling distribution of the mle.
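A minimal sketch of the parametric bootstrap idea, illustrated here with the Poisson mle (the data and number of replications are made up):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.poisson(lam=3.5, size=100)          # the "observed" data (simulated here)
lam_hat = x.mean()                          # mle fitted to the observed data

# simulate from the fitted model and refit, many times
boot = np.array([rng.poisson(lam=lam_hat, size=x.size).mean()
                 for _ in range(2000)])

print(lam_hat, boot.std())                  # spread of the refitted mles
```

The spread of the bootstrap replicates approximates the sampling distribution of the mle; their standard deviation serves as an estimate of its standard error.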
