12. Principles of Parameter Estimation
• Consider the problem of estimating an unknown parameter of interest from a few of its noisy observations.
• Observations (measurements) are made on data that contain the desired parameter and undesired noise. Thus, for example, the observations can be represented as

X_i = θ + n_i,  i = 1, 2, …, n.    (12-1)

PILLAI

Here θ represents the unknown desired parameter, and n_i, i = 1, 2, …, n, represent random variables that may be dependent or independent from observation to observation.
Given n observations X_1 = x_1, X_2 = x_2, …, X_n = x_n, the estimation problem is to obtain the "best" estimator for the unknown parameter θ in terms of these observations.

Let us denote by θ̂(X) the estimator for θ. Obviously θ̂(X) is a function only of the observations. "Best estimator" in what sense? Various optimization strategies can be used to define the term "best".

The ideal solution would be when the estimate θ̂(X) coincides with the unknown θ. This of course may not be possible, and almost always any estimate will result in an error given by

e = θ̂(X) − θ.    (12-3)
Nonlinear estimators

θ̂_MMSE(X) = E[θ | X] = ∫ θ f(θ | X) dθ

θ̂_ML(X) = arg max_θ f(X | θ)
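These two estimators can be compared numerically. The sketch below is an illustration, not part of the text: it assumes a Gaussian observation model (the same kind used later in Example 12.1), discretizes θ on a grid, and reads off both the posterior mean E[θ | X] (taking a flat prior, so the posterior is just the normalized likelihood) and the likelihood maximizer. All names and parameter values are made up for the demonstration.

```python
import numpy as np

# Assumed setup: n noisy observations X_i = theta + w_i with Gaussian noise.
rng = np.random.default_rng(0)
theta_true, sigma, n = 2.0, 1.0, 50
x = theta_true + sigma * rng.standard_normal(n)

# Discretize theta and evaluate the log-likelihood at every grid point.
grid = np.linspace(-5.0, 5.0, 2001)
loglik = -0.5 * ((x[:, None] - grid[None, :]) ** 2).sum(axis=0) / sigma**2

# With a flat prior, f(theta | X) is just the normalized likelihood.
post = np.exp(loglik - loglik.max())
post /= post.sum()

theta_mmse = float((grid * post).sum())    # E[theta | X], the MMSE estimate
theta_ml = float(grid[np.argmax(loglik)])  # the likelihood maximizer (ML)
```

For this symmetric, single-peaked posterior the two estimates essentially coincide; they differ once the posterior is skewed or multimodal.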
Some definitions

• Unbiased estimator
– An estimator is unbiased if E[θ̂] = θ. Otherwise the estimator is a biased estimator.
• Consistent estimator
– An estimator is consistent if σ²_θ̂ → 0 when n → ∞.
– n: the number of observations.
• Best estimator
– An estimator is called the best estimator if it minimizes the error e = E[(θ̂ − θ)²].
Homework

• If random variables X_i are independent with common mean μ and variance σ², show that the sample mean is an unbiased and consistent estimator of the mean. Moreover, show that the sample variance v̄ is an unbiased estimator of the variance.
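The homework claims can be sanity-checked empirically before proving them. The sketch below is an illustration only (the Gaussian choice for the X_i and all parameter values are assumptions): it repeats the experiment many times and checks that the sample mean averages to μ while its spread shrinks with n.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, trials = 3.0, 2.0, 20000

def mean_stats(n):
    """Mean and variance of the sample mean over many repeated experiments."""
    means = rng.normal(mu, sigma, size=(trials, n)).mean(axis=1)
    return float(means.mean()), float(means.var())

avg100, var100 = mean_stats(100)   # E[sample mean] should be near mu
_, var400 = mean_stats(400)        # variance should shrink like sigma^2/n
```

The empirical variance tracks σ²/n (0.04 for n = 100 here), which is exactly the consistency property asked for.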
Maximum Likelihood (ML)

• ML assumes X_1, X_2, …, X_n has something to do with the unknown parameter θ. More precisely, the joint p.d.f. f_X(x_1, x_2, …, x_n; θ) depends on θ.
• ML assumes that the given sample data set is representative of the population, and chooses that value for θ that most likely caused the observed data to occur.
• If x_1, x_2, …, x_n are given, f_X(x_1, x_2, …, x_n; θ) is a function of θ alone, and the value of θ that maximizes the above p.d.f. is the most likely value for θ; it is chosen as the ML estimate θ̂_ML(X) for θ.
Given X_1 = x_1, X_2 = x_2, …, X_n = x_n, the joint p.d.f. f_X(x_1, x_2, …, x_n; θ) represents the likelihood function, and the ML estimate can be determined either from the likelihood equation

θ̂_ML = sup_θ f_X(x_1, x_2, …, x_n; θ)    (12-4)

or using the log-likelihood function (sup in (12-4) represents the supremum operation)

L(x_1, x_2, …, x_n; θ) ≜ log f_X(x_1, x_2, …, x_n; θ).    (12-5)

If L(x_1, x_2, …, x_n; θ) is differentiable and a supremum exists in (12-5), then θ̂_ML must satisfy the equation

∂ log f_X(x_1, x_2, …, x_n; θ)/∂θ |_{θ = θ̂_ML} = 0.    (12-6)
Example 12.1: Let X_i = θ + w_i, i = 1, …, n, represent n observations, where θ is the unknown parameter of interest and w_i, i = 1, …, n, are zero-mean independent normal r.vs with common variance σ². Determine the ML estimate for θ.

Solution: Since the w_i are independent r.vs and θ is an unknown constant, the X_i's are independent normal random variables. Thus the likelihood function takes the form

f_X(x_1, x_2, …, x_n; θ) = ∏_{i=1}^n f_{X_i}(x_i; θ).    (12-7)

Moreover, each X_i is Gaussian with mean θ and variance σ² (Why?). Thus

f_{X_i}(x_i; θ) = (1/√(2πσ²)) e^{−(x_i − θ)²/(2σ²)}.    (12-8)

Substituting (12-8) into (12-7), we get the likelihood function to be
f_X(x_1, x_2, …, x_n; θ) = (2πσ²)^{−n/2} e^{−∑_{i=1}^n (x_i − θ)²/(2σ²)}.    (12-9)

It is easier to work with the log-likelihood function L(X; θ) in this case. From (12-9),

L(X; θ) = ln f_X(x_1, x_2, …, x_n; θ) = −(n/2) ln(2πσ²) − ∑_{i=1}^n (x_i − θ)²/(2σ²),    (12-10)

and taking the derivative with respect to θ as in (12-6), we get
∂ ln f_X(x_1, x_2, …, x_n; θ)/∂θ |_{θ = θ̂_ML} = (1/σ²) ∑_{i=1}^n (x_i − θ̂_ML) = 0,    (12-11)

or

θ̂_ML(X) = (1/n) ∑_{i=1}^n X_i.    (12-12)

Thus (12-12) represents the ML estimate for θ, which happens to be a linear estimator (a linear function of the data) in this case.
Notice that the estimator θ̂_ML is a r.v. Taking its expected value, we get

E[θ̂_ML(X)] = (1/n) ∑_{i=1}^n E(X_i) = θ,    (12-13)

i.e., the expected value of the estimator does not differ from the desired parameter, and hence there is no bias between the two. Such estimators are known as unbiased estimators. Thus (12-12) represents an unbiased estimator for θ.

Moreover, the variance of the estimator is given by

Var(θ̂_ML) = E[(θ̂_ML − θ)²] = (1/n²) E{[∑_{i=1}^n (X_i − θ)]²}
          = (1/n²) { ∑_{i=1}^n E[(X_i − θ)²] + ∑_{i=1}^n ∑_{j≠i} E[(X_i − θ)(X_j − θ)] }.

The latter terms are zeros since X_i and X_j are independent r.vs.
Then

Var(θ̂_ML) = (1/n²) ∑_{i=1}^n Var(X_i) = nσ²/n² = σ²/n.    (12-14)

Thus

Var(θ̂_ML) → 0 as n → ∞,    (12-15)

another desired property. We say such estimators (those that satisfy (12-15)) are consistent estimators.

The next two examples show that the ML estimator can be highly nonlinear.
Example 12.2: Let X_1, X_2, …, X_n be independent, identically distributed uniform random variables in the interval (0, θ) with common p.d.f.

f_{X_i}(x_i; θ) = 1/θ,  0 < x_i < θ,    (12-16)

where θ is an unknown parameter. Find the ML estimate for θ.

Solution: The likelihood function in this case is given by

f_X(X_1 = x_1, X_2 = x_2, …, X_n = x_n; θ) = 1/θⁿ,  0 < x_i ≤ θ, i = 1, …, n
                                           = 1/θⁿ,  0 < max(x_1, x_2, …, x_n) ≤ θ.    (12-17)

From (12-17), the likelihood function is maximized by the minimum value of θ, and since θ ≥ max(X_1, X_2, …, X_n), we get
θ̂_ML(X) = max(X_1, X_2, …, X_n)    (12-18)

to be the ML estimate for θ. Notice that (12-18) represents a nonlinear function of the observations. To determine whether (12-18) represents an unbiased estimate for θ, we need to evaluate its mean. To accomplish that in this case, it is easier to determine its p.d.f. and proceed directly. Let
Z = max(X_1, X_2, …, X_n)    (12-19)

with f_{X_i}(x; θ) as in (12-16). Then

F_Z(z) = P[max(X_1, X_2, …, X_n) ≤ z] = P(X_1 ≤ z, X_2 ≤ z, …, X_n ≤ z)
       = ∏_{i=1}^n P(X_i ≤ z) = [F_{X_i}(z)]ⁿ = (z/θ)ⁿ,  0 < z < θ,    (12-20)

so that

f_Z(z) = n z^{n−1}/θⁿ,  0 < z < θ,
       = 0,  otherwise.    (12-21)
Using (12-21), we get

E[θ̂_ML(X)] = E[Z] = ∫₀^θ z f_Z(z) dz = (n/θⁿ) ∫₀^θ zⁿ dz = nθ/(n + 1) = θ/(1 + 1/n).    (12-22)

In this case E[θ̂_ML(X)] ≠ θ, and hence the ML estimator is not an unbiased estimator for θ. However, from (12-22), as n → ∞,
lim_{n→∞} E[θ̂_ML(X)] = lim_{n→∞} θ/(1 + 1/n) = θ,    (12-23)

i.e., the ML estimator is an asymptotically unbiased estimator. From (12-21), we also get

E[Z²] = ∫₀^θ z² f_Z(z) dz = (n/θⁿ) ∫₀^θ z^{n+1} dz = nθ²/(n + 2),    (12-24)

so that

Var[θ̂_ML(X)] = E[Z²] − (E[Z])² = nθ²/(n + 2) − n²θ²/(n + 1)² = nθ²/[(n + 2)(n + 1)²].    (12-25)

Once again Var[θ̂_ML(X)] → 0 as n → ∞, implying that the estimator in (12-18) is a consistent estimator.
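The bias and variance formulas (12-22) and (12-25) are easy to confirm by simulation. The sketch below is illustrative (the values of θ and n are assumptions): it draws many Uniform(0, θ) samples, takes the maximum of each, and compares the empirical mean and variance against the closed forms.

```python
import numpy as np

rng = np.random.default_rng(2)
theta, trials, n = 4.0, 200_000, 10

# ML estimate (12-18) computed for many independent samples of size n.
z = rng.uniform(0.0, theta, size=(trials, n)).max(axis=1)

emp_mean = float(z.mean())
emp_var = float(z.var())
pred_mean = n * theta / (n + 1)                       # (12-22): biased low
pred_var = n * theta**2 / ((n + 2) * (n + 1) ** 2)    # (12-25)
```

For n = 10 the estimator systematically undershoots θ by about θ/(n + 1), which is the asymptotically vanishing bias discussed above.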
Example 12.3: Let X_1, X_2, …, X_n be i.i.d. Gamma random variables with unknown parameters α and β. Determine the ML estimator for α and β.
Solution: Here x_i ≥ 0, and

f_X(x_1, x_2, …, x_n; α, β) = (β^{nα}/[Γ(α)]ⁿ) (∏_{i=1}^n x_i)^{α−1} e^{−β ∑_{i=1}^n x_i}.    (12-26)

This gives the log-likelihood function to be

L(x_1, x_2, …, x_n; α, β) = log f_X(x_1, x_2, …, x_n; α, β)
  = nα log β − n log Γ(α) + (α − 1) ∑_{i=1}^n log x_i − β ∑_{i=1}^n x_i.    (12-27)
Differentiating L with respect to α and β, we get

∂L/∂α |_{α = α̂, β = β̂} = n log β̂ − n Γ′(α̂)/Γ(α̂) + ∑_{i=1}^n log x_i = 0,    (12-28)

∂L/∂β |_{α = α̂, β = β̂} = nα̂/β̂ − ∑_{i=1}^n x_i = 0.    (12-29)

Thus from (12-29),

β̂_ML(X) = α̂_ML / ((1/n) ∑_{i=1}^n x_i),    (12-30)
and substituting (12-30) into (12-28) gives

log α̂_ML − Γ′(α̂_ML)/Γ(α̂_ML) = log((1/n) ∑_{i=1}^n x_i) − (1/n) ∑_{i=1}^n log x_i.    (12-31)

Notice that (12-31) is highly nonlinear in α̂_ML.

In general the (log-)likelihood function can have more than one solution, or no solutions at all. Further, the (log-)likelihood function may not even be differentiable, or it can be extremely complicated to solve explicitly (see Example 12.3, equation (12-31)).
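Although (12-31) has no closed-form solution, it is easy to solve numerically. The sketch below is one possible approach using only the standard library (the data-generation step and all parameter values are assumptions for illustration): the digamma function Γ′(α)/Γ(α) is approximated by a central difference of math.lgamma, and the root is found by bisection, since log α − ψ(α) is positive and strictly decreasing for α > 0.

```python
import math
import random

def digamma(a, h=1e-5):
    """Central-difference approximation of Gamma'(a)/Gamma(a)."""
    return (math.lgamma(a + h) - math.lgamma(a - h)) / (2.0 * h)

def alpha_ml(xs, lo=1e-3, hi=1e3, iters=200):
    """Solve (12-31) for the ML shape estimate by bisection."""
    # Right side of (12-31): log of the sample mean minus mean of the logs.
    c = math.log(sum(xs) / len(xs)) - sum(math.log(x) for x in xs) / len(xs)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if math.log(mid) - digamma(mid) > c:
            lo = mid   # left side still too large: alpha must be bigger
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Illustrative data: Gamma samples with shape 3 and rate 0.5 (scale 2).
random.seed(3)
xs = [random.gammavariate(3.0, 2.0) for _ in range(500)]

a_hat = alpha_ml(xs)
b_hat = a_hat * len(xs) / sum(xs)   # rate estimate from (12-30)
```

With 500 samples the estimates land near the true shape 3 and rate 0.5; in practice a library routine (e.g. SciPy's digamma and a dedicated root finder) would replace the hand-rolled pieces.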
Cramér-Rao Bound: The variance of any unbiased estimator θ̂ of θ based on the observations X_1 = x_1, X_2 = x_2, …, X_n = x_n is lower bounded by

Var(θ̂) ≥ 1 / E{[∂ ln f_X(x_1, x_2, …, x_n; θ)/∂θ]²} = −1 / E[∂² ln f_X(x_1, x_2, …, x_n; θ)/∂θ²].    (12-32)

This important result states that the right side of (12-32) acts as a lower bound on the variance of all unbiased estimators for θ.
Naturally, any unbiased estimator whose variance coincides with that in (12-32) must be the best; there are no better solutions! Such estimates are known as efficient estimators. Let us examine whether (12-12) represents an efficient estimator. Towards this, using (12-11),

[∂ ln f_X(x_1, x_2, …, x_n; θ)/∂θ]² = (1/σ⁴) [∑_{i=1}^n (X_i − θ)]²,    (12-33)
and

E{[∂ ln f_X(x_1, x_2, …, x_n; θ)/∂θ]²} = (1/σ⁴) { ∑_{i=1}^n E[(X_i − θ)²] + ∑_{i=1}^n ∑_{j≠i} E[(X_i − θ)(X_j − θ)] } = nσ²/σ⁴ = n/σ².    (12-34)

Substituting this into the first form on the right side of (12-32), we obtain the Cramér-Rao lower bound for this problem to be
σ²/n.    (12-35)

But from (12-14) the variance of the ML estimator in (12-12) is the same as (12-35), implying that (12-12) indeed represents an efficient estimator in this case, the best of all possibilities!

It is possible that in certain cases there are no unbiased estimators that are efficient. In that case, the best estimator will be an unbiased estimator with the lowest possible variance.
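The efficiency claim can be checked by simulation. The sketch below is illustrative (parameter values are assumptions): for the Gaussian model of Example 12.1 it estimates θ by the sample mean (12-12) over many repeated experiments and compares the empirical variance to the Cramér-Rao bound σ²/n from (12-35).

```python
import numpy as np

rng = np.random.default_rng(4)
theta, sigma, n, trials = 1.0, 3.0, 25, 100_000

# Sample-mean estimates of theta from many independent size-n samples.
estimates = (theta + sigma * rng.standard_normal((trials, n))).mean(axis=1)

emp_var = float(estimates.var())
crb = sigma**2 / n   # (12-35): 9/25 = 0.36
```

The empirical variance sits right at the bound, as expected for an efficient estimator; a biased or inefficient estimator would sit above it.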
MAP estimator

What if the parameter of interest θ is a r.v. with a-priori p.d.f. f_θ(θ)? How does one obtain a good estimate for θ based on the observations X_1 = x_1, X_2 = x_2, …, X_n = x_n?

One technique is to use the observations to compute its a-posteriori probability density function f_{θ|X}(θ | x_1, x_2, …, x_n). Of course, we can use Bayes' theorem in (11.22) to obtain this a-posteriori p.d.f. This gives

f_{θ|X}(θ | x_1, x_2, …, x_n) = f_{X|θ}(x_1, x_2, …, x_n | θ) f_θ(θ) / f_X(x_1, x_2, …, x_n).    (12-36)

Notice that (12-36) is only a function of θ, since x_1, x_2, …, x_n represent given observations. Once again, we can look for
the most probable value of θ suggested by the above a-posteriori p.d.f. Naturally, the most likely value for θ is the one corresponding to the maximum of the a-posteriori p.d.f. (see Fig. 12.2). This estimator, the maximum of the a-posteriori p.d.f., is known as the MAP estimator θ̂_MAP for θ.

[Fig. 12.2: the a-posteriori p.d.f. f_{θ|X}(θ | x_1, x_2, …, x_n), with its maximum at θ̂_MAP.]
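A minimal MAP sketch for (12-36), under assumptions not made in the text: a Gaussian a-priori p.d.f. N(μ0, s0²) for θ together with the Gaussian observation model of Example 12.1. For this conjugate pair the a-posteriori p.d.f. is itself Gaussian, so its maximum has a closed form, which a grid search over the (unnormalized) posterior should reproduce.

```python
import numpy as np

rng = np.random.default_rng(5)
mu0, s0, sigma, theta_true, n = 0.0, 1.0, 2.0, 1.5, 40
x = theta_true + sigma * rng.standard_normal(n)

# Closed-form posterior mean (= posterior mode for a Gaussian posterior).
closed_form = (mu0 / s0**2 + x.sum() / sigma**2) / (1 / s0**2 + n / sigma**2)

# Grid search over the log of the numerator of (12-36); the denominator
# does not depend on theta, so it cannot move the argmax.
grid = np.linspace(-5.0, 5.0, 20001)
log_post = (-0.5 * ((x[:, None] - grid) ** 2).sum(axis=0) / sigma**2
            - 0.5 * (grid - mu0) ** 2 / s0**2)
theta_map = float(grid[np.argmax(log_post)])
```

Note how the prior pulls the MAP estimate away from the raw sample mean toward μ0; with a flat prior the MAP estimate would reduce to the ML estimate of (12-12).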