15. Minimum Variance Unbiased Estimation - ECE 830, Spring 2014
Transcript of 15. Minimum Variance Unbiased Estimation - ECE 830, Spring 2014
Bias-Variance Trade-Off
Recall that
$$\mathrm{MSE}(\hat\theta) = \mathrm{Bias}^2(\hat\theta) + \mathrm{Var}(\hat\theta).$$
In general, the minimum MSE estimator has non-zero bias and non-zero variance. We can reduce bias only at a potential increase in variance. Conversely, modifying the estimator to reduce the variance may lead to an increase in bias.
Example:
Let
$$x_n = A + w_n, \qquad w_n \sim \mathcal{N}(0, \sigma^2), \qquad \hat{A} = \frac{\alpha}{N}\sum_{n=1}^{N} x_n$$
where $\alpha$ is an arbitrary constant. If
$$S_N \equiv \frac{1}{N}\sum_{n=1}^{N} x_n,$$
then
$$\hat{A} = \alpha S_N, \qquad S_N \sim \mathcal{N}\!\left(A, \frac{\sigma^2}{N}\right).$$
Example: (cont.)
Let's find the value of $\alpha$ that minimizes the MSE.
$$\mathrm{Var}(\hat{A}) = \mathrm{Var}(\alpha S_N) = \alpha^2\, \mathrm{Var}(S_N) = \frac{\alpha^2\sigma^2}{N}$$
$$\mathrm{Bias}(\hat{A}) = E[\hat{A}] - A = E[\alpha S_N] - A = \alpha A - A = (\alpha - 1)A$$
Thus the MSE is
$$\mathrm{MSE}(\hat{A}) = (\alpha - 1)^2 A^2 + \frac{\alpha^2\sigma^2}{N}.$$
Aside: alternatively, we could have computed the MSE as follows.
$$E[x_i x_j] = \begin{cases} A^2 + \sigma^2, & i = j \\ A^2, & i \neq j \end{cases}$$
$$\begin{aligned}
\mathrm{MSE}(\hat{A}) &= E[(\hat{A} - A)^2] = E[\hat{A}^2] - 2E[\hat{A}]A + A^2 \\
&= \alpha^2 E\!\left[\frac{1}{N^2}\sum_{i,j=1}^{N} x_i x_j\right] - 2\alpha E\!\left[\frac{1}{N}\sum_{n=1}^{N} x_n\right] A + A^2 \\
&= \alpha^2 \frac{1}{N^2}\sum_{i,j=1}^{N} E[x_i x_j] - 2\alpha \frac{1}{N}\sum_{n=1}^{N} E[x_n]\, A + A^2 \\
&= \alpha^2\!\left(A^2 + \frac{\sigma^2}{N}\right) - 2\alpha A^2 + A^2 \\
&= \underbrace{\frac{\alpha^2\sigma^2}{N}}_{\mathrm{Var}(\hat{A})} + \underbrace{(\alpha - 1)^2 A^2}_{\mathrm{Bias}^2(\hat{A})}
\end{aligned}$$
So how practical is the MSE as a design criterion?
In the previous example, the MSE is minimized when
$$\frac{d\,\mathrm{MSE}(\hat{A})}{d\alpha} = \frac{2\alpha\sigma^2}{N} + 2(\alpha - 1)A^2 = 0 \;\Rightarrow\; \alpha^* = \frac{A^2}{A^2 + \sigma^2/N}.$$
The optimal (in an MSE sense) value $\alpha^*$ depends on the unknown parameter $A$! Therefore, the estimator is not realizable. This phenomenon occurs in many classes of problems.
We need an alternative to direct MSE minimization.
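The trade-off above is easy to check numerically. The following sketch (with illustrative values $A = 1$, $\sigma = 2$, $N = 10$, which are assumptions, not from the lecture) compares the empirical MSE of $\hat{A} = \alpha S_N$ with the formula $(\alpha-1)^2 A^2 + \alpha^2\sigma^2/N$:

```python
# Monte Carlo check of the bias-variance trade-off above (a sketch;
# the values A=1, sigma=2, N=10 are illustrative assumptions).
import numpy as np

rng = np.random.default_rng(0)
A, sigma, N, trials = 1.0, 2.0, 10, 200_000

x = A + sigma * rng.standard_normal((trials, N))
S = x.mean(axis=1)                              # S_N ~ N(A, sigma^2/N)

alpha_star = A**2 / (A**2 + sigma**2 / N)       # MSE-optimal shrinkage

def mse(alpha):
    # empirical MSE of A_hat = alpha * S_N
    return np.mean((alpha * S - A) ** 2)

def mse_theory(alpha):
    # (alpha-1)^2 A^2 + alpha^2 sigma^2 / N from the slide above
    return (alpha - 1) ** 2 * A**2 + alpha**2 * sigma**2 / N

for alpha in (0.5, alpha_star, 1.0):
    print(f"alpha={alpha:.3f}  empirical MSE={mse(alpha):.4f}  theory={mse_theory(alpha):.4f}")
```

The unbiased choice $\alpha = 1$ is beaten by $\alpha^*$, but computing $\alpha^*$ requires knowing $A$, which is exactly the realizability problem described above.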
Note that in the above example, the problematic dependence on the parameter ($A$) enters through the bias component of the MSE. This occurs in many situations. Thus a reasonable alternative is to constrain the estimator to be unbiased, and then find the estimator that produces the minimum variance (and hence provides the minimum MSE among all unbiased estimators).
Note: Sometimes no unbiased estimator exists, and then we cannot proceed in this direction at all.
Definition: Minimum Variance Unbiased Estimator
$\hat\theta$ is a minimum variance unbiased estimator (MVUE) for $\theta$ if
1. $E[\hat\theta] = \theta \quad \forall\, \theta \in \Theta$
2. For any estimator $\hat\theta_0$ with $E[\hat\theta_0] = \theta \;\; \forall\, \theta \in \Theta$, we have $\mathrm{Var}(\hat\theta) \le \mathrm{Var}(\hat\theta_0) \quad \forall\, \theta \in \Theta$.
Existence of the Minimum Variance Unbiased Estimator (MVUE)
Does an MVUE exist? Suppose there exist three unbiased estimators:
$$\hat\theta_1, \; \hat\theta_2, \; \hat\theta_3$$
Two possibilities exist. [Figure: plots of $\mathrm{Var}(\hat\theta_i)$ versus $\theta$. Left panel: $\hat\theta_3$ has the smallest variance for every $\theta$, so $\hat\theta_3$ is the MVUE. Right panel: the variance curves cross, so no MVUE exists!]
Example:
Suppose we observe a single scalar realization $x$ of
$$X \sim \mathrm{Unif}(0, 1/\theta), \qquad \theta > 0.$$
An unbiased estimator of $\theta$ does not exist. To see this, note that
$$p(x|\theta) = \theta \cdot I_{[0,1/\theta]}(x).$$
If $\hat\theta$ is unbiased, then
$$\forall\, \theta > 0, \quad \theta = E[\hat\theta] = \int_0^{1/\theta} \hat\theta(x)\, \theta\, dx$$
$$\Longrightarrow \int_0^{1/\theta} \hat\theta(x)\, dx = 1 \quad \forall\, \theta > 0$$
$$\Longrightarrow \hat\theta(1/\theta) = 0 \quad \forall\, \theta > 0,$$
where the last step follows by differentiating both sides with respect to $\theta$ and applying the fundamental theorem of calculus (FTC). But if this is true for all $\theta$, then we have $\hat\theta(x) = 0$, which is not an unbiased estimator.
Finding the MVUE Estimator
There is no simple, general procedure for finding the MVUE. In the next several lectures we will discuss several approaches:
1. Find a sufficient statistic and apply the Rao-Blackwell theorem.
2. Determine the so-called Cramér-Rao Lower Bound (CRLB) and verify that the estimator achieves it.
3. Further restrict the estimator to a class of estimators (e.g., linear or polynomial functions of the data).
Recipe for finding a MVUE
(1) Find a complete sufficient statistic $t = T(X)$.
(2) Find any unbiased estimator $\hat\theta_0$ and set
$$\hat\theta(X) := E[\hat\theta_0(X) \mid t = T(X)]$$
or find a function $g$ such that
$$\hat\theta(X) = g(T(X))$$
is unbiased.
These notes answer the following questions:
1. What is a sufficient statistic?
2. What is a complete sufficient statistic?
3. What does step (2) do above?
4. Is this estimator unique?
5. How do we know it's the MVUE?
Definition: Sufficient statistic
Let $X$ be an $N$-dimensional random vector and let $\theta$ denote a $p$-dimensional parameter of the distribution of $X$. The statistic $t := T(X)$ is a sufficient statistic for $\theta$ if and only if the conditional distribution of $X$ given $T(X)$ does not depend on $\theta$.
See Lecture 4 for more information on sufficient statistics and how to find them.
Minimal and Complete Sufficient Statistics
Definition: Minimal Sufficient Statistic
A sufficient statistic $t$ is said to be minimal if the dimension of $t$ cannot be reduced and still be sufficient.
Definition: Complete sufficient statistic
A sufficient statistic $t := T(X)$ is complete if for all real-valued functions $\phi$ satisfying
$$E[\phi(t) \mid \theta] = 0 \quad \forall\, \theta$$
we have
$$P[\phi(t) = 0 \mid \theta] = 1 \quad \forall\, \theta.$$
Under very general conditions, if $t$ is a complete sufficient statistic, then $t$ is minimal.
Example: Bernoulli trials
Consider $N$ independent Bernoulli trials
$$x_i \overset{\mathrm{iid}}{\sim} \mathrm{Bernoulli}(\theta), \qquad \theta \in [0, 1].$$
Recall $k = \sum_{i=1}^{N} x_i$ is sufficient for $\theta$. Now suppose $E[\phi(k) \mid \theta] = 0$ for all $\theta$. But
$$E[\phi(k) \mid \theta] = \sum_{k=0}^{N} \phi(k) \binom{N}{k} \theta^k (1-\theta)^{N-k} = \mathrm{poly}(\theta)$$
where $\mathrm{poly}(\theta)$ is an $N$th-degree polynomial in $\theta$. Then
$$\mathrm{poly}(\theta) = 0 \;\;\forall\, \theta \in [0,1] \Longrightarrow \mathrm{poly}(\theta) \text{ is the zero polynomial} \Longrightarrow \phi(k) = 0 \;\;\forall\, k \Longrightarrow k \text{ is complete.}$$
(The second implication uses the linear independence of the polynomials $\theta^k(1-\theta)^{N-k}$, $k = 0, \ldots, N$.)
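The completeness argument can also be illustrated numerically: stacking the pmf of $k$ at $N+1$ distinct values of $\theta$ gives a nonsingular matrix, so $E[\phi(k)\mid\theta] = 0$ at those points already forces $\phi = 0$. A sketch, where $N = 5$ and the $\theta$ grid are illustrative choices:

```python
# Numeric sketch of the completeness argument: if E[phi(k)|theta] = 0 at
# N+1 distinct values of theta, the only solution is phi = 0.
# (N = 5 and the theta grid are illustrative assumptions.)
import numpy as np
from math import comb

N = 5
thetas = np.linspace(0.1, 0.9, N + 1)          # N+1 distinct points in (0,1)

# M[j, k] = C(N,k) theta_j^k (1-theta_j)^(N-k): row j is the pmf of k at theta_j
M = np.array([[comb(N, k) * t**k * (1 - t)**(N - k) for k in range(N + 1)]
              for t in thetas])

# E[phi(k)|theta_j] = (M @ phi)_j. If M is nonsingular, M @ phi = 0 => phi = 0.
print("rank of M:", np.linalg.matrix_rank(M))   # full rank: N+1
phi = np.linalg.solve(M, np.zeros(N + 1))
print("unique phi solving E[phi(k)|theta]=0 at all grid points:", phi)
```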
Rao-Blackwell Theorem
Rao-Blackwell Theorem
Let $Y, Z$ be random variables and define the function
$$g(z) := E[Y \mid Z = z].$$
Then
$$E[g(Z)] = E[Y]$$
and
$$\mathrm{Var}(g(Z)) \le \mathrm{Var}(Y)$$
with equality iff $Y = g(Z)$ almost surely.
Note that this version of Rao-Blackwell is quite general and has nothing to do with estimation of parameters. However, we can apply it to parameter estimation as follows.
Consider $X \sim p(x|\theta)$. Let $\hat\theta_1$ be an unbiased estimator of $\theta$ and let $t = T(x)$ be a sufficient statistic for $\theta$. Apply Rao-Blackwell with
$$Y := \hat\theta_1(x), \qquad Z := t = T(x).$$
Consider the new estimator
$$\hat\theta_2(x) = g(T(x)) = E[\hat\theta_1(X) \mid T(X) = t].$$
Sufficiency matters here: it guarantees that this conditional expectation does not depend on $\theta$, so $\hat\theta_2$ is a genuine estimator. Then we may conclude:
1. $\hat\theta_2$ is unbiased
2. $\mathrm{Var}(\hat\theta_2) \le \mathrm{Var}(\hat\theta_1)$
In words, if $\hat\theta_1$ is any unbiased estimator, then smoothing $\hat\theta_1$ with respect to a sufficient statistic reduces (or at least does not increase) the variance while preserving unbiasedness.
Therefore, we can restrict our search for the MVUE to functions of a sufficient statistic.
The Rao-Blackwell Theorem
Rao-Blackwell Theorem, special case
Let $X$ be a random variable with pdf $p(x|\theta)$ and let $t(X)$ be a sufficient statistic. Let $\hat\theta_1(X)$ be an estimator of $\theta$ and define
$$\hat\theta_2(t) := E[\hat\theta_1(X) \mid t(X)].$$
Then
$$E[\hat\theta_2(t(X))] = E[\hat\theta_1(X)]$$
and
$$\mathrm{Var}(\hat\theta_2(t(X))) \le \mathrm{Var}(\hat\theta_1(X))$$
with equality iff $\hat\theta_1(X) \equiv \hat\theta_2(t(X))$ with probability one (almost surely).
Rao-Blackwell Theorem in Action
Suppose we observe 2 independent realizations from a $\mathcal{N}(\mu, \sigma^2)$ distribution. Denote these observations $x_1$ and $x_2$, with $X = [x_1, x_2]^T$. Consider the simple estimator of $\mu$:
$$\hat\mu = x_1$$
$$E[\hat\mu] = \mu \quad \text{(it's unbiased)}$$
$$\mathrm{Var}[\hat\mu] = \mathrm{Var}[x_1] = \sigma^2$$
The MSE is therefore
$$E[(\hat\mu - \mu)^2] = \sigma^2.$$
Intuitively, we expect that the sample mean
$$\hat\mu = \frac{1}{2}(x_1 + x_2)$$
should be a better estimator, since it averages the two observations together.
Is this the best possible estimator?
Let's find a sufficient statistic for $\mu$:
$$\begin{aligned}
p(x_1, x_2) &= \frac{1}{2\pi\sigma^2}\, e^{-(x_1-\mu)^2/2\sigma^2}\, e^{-(x_2-\mu)^2/2\sigma^2} \\
&= \frac{1}{2\pi\sigma^2}\, e^{-\frac{1}{2\sigma^2}\left((x_1-\mu)^2 + (x_2-\mu)^2\right)} \\
&= \underbrace{\frac{1}{2\pi\sigma^2}\, e^{-\frac{1}{2\sigma^2}(x_1^2 + x_2^2)}}_{a(X)} \; \underbrace{e^{-\frac{1}{2\sigma^2}\left(-2(x_1+x_2)\mu + 2\mu^2\right)}}_{b_\mu(t)}, \qquad t = x_1 + x_2
\end{aligned}$$
So by Fisher-Neyman, $t$ is a sufficient statistic for $\mu$.
The Rao-Blackwell Theorem states that
$$\mu^* = E[\hat\mu \mid t]$$
is as good as or better than $\hat\mu$ in terms of estimator variance. (See Scharf, p. 94.) What is $\mu^*$? First we need to compute the mean of the conditional density $p(x_1 \mid t)$:
$$p(x_1 \mid t) = \frac{p(x_1, t)}{p(t)}$$
with
$$p(x_1, t) = \frac{1}{2\pi\sigma^2}\, e^{-\frac{(x_1-\mu)^2}{2\sigma^2}}\, e^{-\frac{(t - x_1 - \mu)^2}{2\sigma^2}}$$
$$p(t) = \frac{1}{\sqrt{2\pi(2\sigma^2)}}\, e^{-(t-2\mu)^2/4\sigma^2}, \qquad E[t] = 2\mu, \quad \mathrm{Var}(t) = 2\sigma^2.$$
$$\begin{aligned}
p(x_1 \mid t) &= \frac{\sqrt{4\pi\sigma^2}}{2\pi\sigma^2} \exp\left[-\frac{1}{2\sigma^2}\left((x_1-\mu)^2 + (t - x_1 - \mu)^2 - \tfrac{1}{2}(t-2\mu)^2\right)\right] \\
&= \frac{1}{\sqrt{\pi\sigma^2}} \exp\left[-\frac{1}{2\sigma^2}\left(2x_1^2 - 2x_1 t + \tfrac{1}{2}t^2\right)\right] \qquad \text{(all $\mu$ terms cancel)} \\
&= \frac{1}{\sqrt{\pi\sigma^2}} \exp\left[-\frac{(x_1 - t/2)^2}{\sigma^2}\right]
\end{aligned}$$
$$\Rightarrow x_1 \mid t \sim \mathcal{N}(t/2,\, \sigma^2/2)$$
$$\mu^* = E[\hat\mu \mid t] = t/2 = \frac{1}{2}(x_1 + x_2), \qquad \mathrm{Var}(\mu^*) = \frac{\sigma^2}{2} \;\Rightarrow\; \mathrm{MSE}(\mu^*) = \frac{\sigma^2}{2}.$$
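The conclusion of this example is easy to verify by simulation. A sketch, with illustrative values $\mu = 3$, $\sigma = 2$ (assumptions, not from the lecture):

```python
# Monte Carlo check of the example: smoothing mu_hat = x1 with respect to
# t = x1 + x2 gives t/2, halving the variance while preserving unbiasedness.
# (mu = 3, sigma = 2 are illustrative assumptions.)
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, trials = 3.0, 2.0, 400_000

x1 = mu + sigma * rng.standard_normal(trials)
x2 = mu + sigma * rng.standard_normal(trials)
t = x1 + x2

mu_hat  = x1          # crude unbiased estimator, Var = sigma^2
mu_star = t / 2       # E[mu_hat | t] = t/2, the Rao-Blackwellized estimator

print("Var(mu_hat)  ~", mu_hat.var())    # ~ sigma^2     = 4.0
print("Var(mu_star) ~", mu_star.var())   # ~ sigma^2 / 2 = 2.0
print("means:", mu_hat.mean(), mu_star.mean())  # both ~ mu
```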
The Lehmann-Scheffé Theorem
The Rao-Blackwell Theorem tells us how to decrease the variance of an unbiased estimator. But when can we know that we get a MVUE? Answer: when $t$ is a complete sufficient statistic.
Lehmann-Scheffé Theorem
If $t$ is complete, there is at most one unbiased estimator that is a function of $t$.
Proof. Suppose
$$E[\hat\theta_1] = E[\hat\theta_2] = \theta, \qquad \hat\theta_1(X) := g_1(T(X)), \qquad \hat\theta_2(X) := g_2(T(X)).$$
Define
$$\phi(t) := g_1(t) - g_2(t).$$
Then
$$E[\phi(t)] = E[g_1(t) - g_2(t)] = E[\hat\theta_1(X) - \hat\theta_2(X)] = 0.$$
By definition of completeness, we have
$$P(\phi(t) = 0) = 1 \quad \forall\, \theta.$$
In other words, $\hat\theta_1 = \hat\theta_2$ with probability 1.
Recipe for finding a MVUE
This result suggests the following method for finding a MVUE:
(1) Find a complete sufficient statistic $t = T(X)$.
(2) Find any unbiased estimator $\hat\theta_0$ and set
$$\hat\theta(X) := E[\hat\theta_0(X) \mid t = T(X)]$$
or find a function $g$ such that
$$\hat\theta(X) = g(T(X))$$
is unbiased.
Rao-Blackwell and Complete Suff. Stats.
Theorem
If $\hat\theta$ is constructed by the recipe above, then $\hat\theta$ is the unique MVUE.
Proof: Note that in either construction, $\hat\theta$ is a function of $t$. Let $\hat\theta_1$ be any unbiased estimator. We must show that
$$\mathrm{Var}(\hat\theta) \le \mathrm{Var}(\hat\theta_1).$$
Define
$$\hat\theta_2(X) := E[\hat\theta_1(X) \mid t = T(X)].$$
By Rao-Blackwell, it suffices to show
$$\mathrm{Var}(\hat\theta) \le \mathrm{Var}(\hat\theta_2).$$
Proof (cont.)
But $\hat\theta$ and $\hat\theta_2$ are both unbiased and functions of a complete sufficient statistic, so $\hat\theta = \hat\theta_2$ with probability 1 by Lehmann-Scheffé.
To show uniqueness, suppose in the above argument that $\mathrm{Var}(\hat\theta_1) = \mathrm{Var}(\hat\theta)$. Then the Rao-Blackwell bound holds with equality
$$\Longrightarrow \hat\theta_1 = \hat\theta_2 \text{ with probability 1} \Longrightarrow \hat\theta_1 = \hat\theta \text{ with probability 1},$$
since $\hat\theta_2 = \hat\theta$ with probability 1.
Example: Uniform distribution.
Suppose $X = [x_1 \cdots x_N]^T$ where
$$x_i \overset{\mathrm{iid}}{\sim} \mathrm{Unif}[0, \theta], \qquad i = 1, \ldots, N.$$
What is an unbiased estimator of $\theta$? The estimator
$$\hat\theta_1 = \frac{2}{N}\sum_{i=1}^{N} x_i$$
is unbiased. However, it is not the MVUE.
Example: (cont.)
From the Fisher-Neyman factorization theorem,
$$p(X|\theta) = \prod_{i=1}^{N} \frac{1}{\theta}\, I_{[0,\theta]}(x_i) = \underbrace{\frac{1}{\theta^N}\, I_{[\max_i x_i,\, \infty)}(\theta)}_{b_\theta(t)} \cdot \underbrace{I_{(-\infty,\, \min_i x_i]}(0)}_{a(X)}$$
we see that
$$T = \max_i x_i$$
is a sufficient statistic. It is left as an exercise to show that $T$ is in fact complete. Since $\hat\theta_1$ is not a function of $T$, it is not the MVUE. However,
$$\hat\theta_2(X) = E[\hat\theta_1(X) \mid t = T(X)]$$
is the MVUE.
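To close the loop on this example: given $T = \max_i x_i = t$, the remaining $N-1$ samples are iid $\mathrm{Unif}[0, t]$, and a short calculation gives $\hat\theta_2 = E[\hat\theta_1 \mid T] = \frac{N+1}{N} T$ (a standard result, worked out here rather than in the slides). A simulation sketch with illustrative values $\theta = 2$, $N = 10$:

```python
# Comparing the unbiased estimator theta1 = (2/N) sum(x_i) with its
# Rao-Blackwellization theta2 = (N+1)/N * max(x_i) for Unif[0, theta] data.
# (theta = 2, N = 10 are illustrative assumptions.)
import numpy as np

rng = np.random.default_rng(2)
theta, N, trials = 2.0, 10, 300_000

x = theta * rng.random((trials, N))
theta1 = 2 * x.mean(axis=1)                # unbiased, but not a function of T
theta2 = (N + 1) / N * x.max(axis=1)       # Rao-Blackwellized: the MVUE

print("E[theta1] ~", theta1.mean(), "  Var ~", theta1.var())  # Var ~ theta^2/(3N)
print("E[theta2] ~", theta2.mean(), "  Var ~", theta2.var())  # Var ~ theta^2/(N(N+2))
```

Both estimators are unbiased, but $\hat\theta_2$ has much smaller variance, as Lehmann-Scheffé guarantees.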