15. Minimum Variance Unbiased Estimation - ECE 830, Spring 2014
Transcript of 15. Minimum Variance Unbiased Estimation - ECE 830, Spring 2014
Bias-Variance Trade-Off
Recall that
$$\mathrm{MSE}(\hat\theta) = \mathrm{Bias}^2(\hat\theta) + \mathrm{Var}(\hat\theta).$$
In general, the minimum MSE estimator has non-zero bias and non-zero variance. We can reduce bias only at a potential increase in variance. Conversely, modifying the estimator to reduce the variance may lead to an increase in bias.
Example:
Let
$$x_n = A + w_n, \qquad w_n \sim \mathcal{N}(0, \sigma^2), \qquad \hat{A} = \frac{\alpha}{N}\sum_{n=1}^{N} x_n$$
where $\alpha$ is an arbitrary constant. If
$$S_N \equiv \frac{1}{N}\sum_{n=1}^{N} x_n,$$
then
$$\hat{A} = \alpha S_N, \qquad S_N \sim \mathcal{N}\!\left(A, \frac{\sigma^2}{N}\right).$$
Example: (cont.)
Let's find the value of $\alpha$ that minimizes the MSE.
$$\mathrm{Var}(\hat{A}) = \mathrm{Var}(\alpha S_N) = \alpha^2\, \mathrm{Var}(S_N) = \frac{\alpha^2\sigma^2}{N}$$
$$\mathrm{Bias}(\hat{A}) = E[\hat{A}] - A = E[\alpha S_N] - A = \alpha A - A = (\alpha - 1)A$$
Thus the MSE is
$$\mathrm{MSE}(\hat{A}) = (\alpha - 1)^2 A^2 + \frac{\alpha^2\sigma^2}{N}.$$
Aside: alternatively, we could have computed the MSE as follows.
$$E[x_i x_j] = \begin{cases} A^2 + \sigma^2, & i = j \\ A^2, & i \neq j \end{cases}$$
$$\begin{aligned}
\mathrm{MSE}(\hat{A}) &= E[(\hat{A} - A)^2] = E[\hat{A}^2] - 2E[\hat{A}]A + A^2 \\
&= \alpha^2 E\!\left[\frac{1}{N^2}\sum_{i,j=1}^{N} x_i x_j\right] - 2\alpha E\!\left[\frac{1}{N}\sum_{n=1}^{N} x_n\right] A + A^2 \\
&= \alpha^2 \frac{1}{N^2}\sum_{i,j=1}^{N} E[x_i x_j] - 2\alpha \frac{1}{N}\sum_{n=1}^{N} E[x_n]\, A + A^2 \\
&= \alpha^2\!\left(A^2 + \frac{\sigma^2}{N}\right) - 2\alpha A^2 + A^2 \\
&= \underbrace{\frac{\alpha^2\sigma^2}{N}}_{\mathrm{Var}(\hat{A})} + \underbrace{(\alpha - 1)^2 A^2}_{\mathrm{Bias}^2(\hat{A})}
\end{aligned}$$
So how practical is the MSE as a design criterion?
In the previous example, the MSE is minimized when
$$\frac{d\,\mathrm{MSE}(\hat{A})}{d\alpha} = \frac{2\alpha\sigma^2}{N} + 2(\alpha - 1)A^2 = 0 \;\Rightarrow\; \alpha^* = \frac{A^2}{A^2 + \sigma^2/N}.$$
The optimal (in an MSE sense) value $\alpha^*$ depends on the unknown parameter $A$! Therefore, the estimator is not realizable. This phenomenon occurs in many classes of problems.
We need an alternative to direct MSE minimization.
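The trade-off above is easy to check numerically. The following sketch (with illustrative values $A = 1$, $\sigma = 2$, $N = 10$, which are assumptions, not from the lecture) compares the empirical MSE of $\hat{A} = \alpha S_N$ with the formula $(\alpha-1)^2 A^2 + \alpha^2\sigma^2/N$:

```python
# Monte Carlo check of the bias-variance trade-off above (a sketch;
# the values A=1, sigma=2, N=10 are illustrative assumptions).
import numpy as np

rng = np.random.default_rng(0)
A, sigma, N, trials = 1.0, 2.0, 10, 200_000

x = A + sigma * rng.standard_normal((trials, N))
S = x.mean(axis=1)                              # S_N ~ N(A, sigma^2/N)

alpha_star = A**2 / (A**2 + sigma**2 / N)       # MSE-optimal shrinkage

def mse(alpha):
    # empirical MSE of A_hat = alpha * S_N
    return np.mean((alpha * S - A) ** 2)

def mse_theory(alpha):
    # (alpha-1)^2 A^2 + alpha^2 sigma^2 / N from the slide above
    return (alpha - 1) ** 2 * A**2 + alpha**2 * sigma**2 / N

for alpha in (0.5, alpha_star, 1.0):
    print(f"alpha={alpha:.3f}  empirical MSE={mse(alpha):.4f}  theory={mse_theory(alpha):.4f}")
```

The unbiased choice $\alpha = 1$ is beaten by $\alpha^*$, but computing $\alpha^*$ requires knowing $A$, which is exactly the realizability problem described above.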
Note that in the above example, the problematic dependence on the parameter ($A$) enters through the bias component of the MSE. This occurs in many situations. Thus a reasonable alternative is to constrain the estimator to be unbiased, and then find the estimator that produces the minimum variance (and hence provides the minimum MSE among all unbiased estimators).
Note: Sometimes no unbiased estimator exists, and then we cannot proceed in this direction at all.
Definition: Minimum Variance Unbiased Estimator
$\hat\theta$ is a minimum variance unbiased estimator (MVUE) for $\theta$ if
1. $E[\hat\theta] = \theta \quad \forall\, \theta \in \Theta$
2. For any estimator $\hat\theta_0$ with $E[\hat\theta_0] = \theta \;\; \forall\, \theta \in \Theta$, we have $\mathrm{Var}(\hat\theta) \le \mathrm{Var}(\hat\theta_0) \quad \forall\, \theta \in \Theta$.
Existence of the Minimum Variance Unbiased Estimator (MVUE)
Does an MVUE exist? Suppose there exist three unbiased estimators:
$$\hat\theta_1, \; \hat\theta_2, \; \hat\theta_3$$
Two possibilities exist. [Figure: plots of $\mathrm{Var}(\hat\theta_i)$ versus $\theta$. Left panel: $\hat\theta_3$ has the smallest variance for every $\theta$, so $\hat\theta_3$ is the MVUE. Right panel: the variance curves cross, so no MVUE exists!]
Example:
Suppose we observe a single scalar realization $x$ of
$$X \sim \mathrm{Unif}(0, 1/\theta), \qquad \theta > 0.$$
An unbiased estimator of $\theta$ does not exist. To see this, note that
$$p(x|\theta) = \theta \cdot I_{[0,1/\theta]}(x).$$
If $\hat\theta$ is unbiased, then
$$\forall\, \theta > 0, \quad \theta = E[\hat\theta] = \int_0^{1/\theta} \hat\theta(x)\, \theta\, dx$$
$$\Longrightarrow \int_0^{1/\theta} \hat\theta(x)\, dx = 1 \quad \forall\, \theta > 0$$
$$\Longrightarrow \hat\theta(1/\theta) = 0 \quad \forall\, \theta > 0,$$
where the last step follows by differentiating both sides with respect to $\theta$ and applying the fundamental theorem of calculus (FTC). But if this is true for all $\theta$, then we have $\hat\theta(x) = 0$, which is not an unbiased estimator.
Finding the MVUE Estimator
There is no simple, general procedure for finding the MVUE. In the next several lectures we will discuss several approaches:
1. Find a sufficient statistic and apply the Rao-Blackwell theorem.
2. Determine the so-called Cramér-Rao Lower Bound (CRLB) and verify that the estimator achieves it.
3. Further restrict the estimator to a class of estimators (e.g., linear or polynomial functions of the data).
Recipe for finding a MVUE
(1) Find a complete sufficient statistic $t = T(X)$.
(2) Find any unbiased estimator $\hat\theta_0$ and set
$$\hat\theta(X) := E[\hat\theta_0(X) \mid t = T(X)]$$
or find a function $g$ such that
$$\hat\theta(X) = g(T(X))$$
is unbiased.
These notes answer the following questions:
1. What is a sufficient statistic?
2. What is a complete sufficient statistic?
3. What does step (2) do above?
4. Is this estimator unique?
5. How do we know it's the MVUE?
Definition: Sufficient statistic
Let $X$ be an $N$-dimensional random vector and let $\theta$ denote a $p$-dimensional parameter of the distribution of $X$. The statistic $t := T(X)$ is a sufficient statistic for $\theta$ if and only if the conditional distribution of $X$ given $T(X)$ does not depend on $\theta$.
See Lecture 4 for more information on sufficient statistics and how to find them.
Minimal and Complete Sufficient Statistics
Definition: Minimal Sufficient Statistic
A sufficient statistic $t$ is said to be minimal if the dimension of $t$ cannot be reduced and still be sufficient.
Definition: Complete sufficient statistic
A sufficient statistic $t := T(X)$ is complete if for all real-valued functions $\phi$ satisfying
$$E[\phi(t) \mid \theta] = 0 \quad \forall\, \theta$$
we have
$$P[\phi(t) = 0 \mid \theta] = 1 \quad \forall\, \theta.$$
Under very general conditions, if $t$ is a complete sufficient statistic, then $t$ is minimal.
Example: Bernoulli trials
Consider $N$ independent Bernoulli trials
$$x_i \overset{\mathrm{iid}}{\sim} \mathrm{Bernoulli}(\theta), \qquad \theta \in [0, 1].$$
Recall $k = \sum_{i=1}^{N} x_i$ is sufficient for $\theta$. Now suppose $E[\phi(k) \mid \theta] = 0$ for all $\theta$. But
$$E[\phi(k) \mid \theta] = \sum_{k=0}^{N} \phi(k) \binom{N}{k} \theta^k (1-\theta)^{N-k} = \mathrm{poly}(\theta)$$
where $\mathrm{poly}(\theta)$ is an $N$th-degree polynomial in $\theta$. Then
$$\mathrm{poly}(\theta) = 0 \;\;\forall\, \theta \in [0,1] \Longrightarrow \mathrm{poly}(\theta) \text{ is the zero polynomial} \Longrightarrow \phi(k) = 0 \;\;\forall\, k \Longrightarrow k \text{ is complete.}$$
(The second implication uses the linear independence of the polynomials $\theta^k(1-\theta)^{N-k}$, $k = 0, \ldots, N$.)
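The completeness argument can also be illustrated numerically: stacking the pmf of $k$ at $N+1$ distinct values of $\theta$ gives a nonsingular matrix, so $E[\phi(k)\mid\theta] = 0$ at those points already forces $\phi = 0$. A sketch, where $N = 5$ and the $\theta$ grid are illustrative choices:

```python
# Numeric sketch of the completeness argument: if E[phi(k)|theta] = 0 at
# N+1 distinct values of theta, the only solution is phi = 0.
# (N = 5 and the theta grid are illustrative assumptions.)
import numpy as np
from math import comb

N = 5
thetas = np.linspace(0.1, 0.9, N + 1)          # N+1 distinct points in (0,1)

# M[j, k] = C(N,k) theta_j^k (1-theta_j)^(N-k): row j is the pmf of k at theta_j
M = np.array([[comb(N, k) * t**k * (1 - t)**(N - k) for k in range(N + 1)]
              for t in thetas])

# E[phi(k)|theta_j] = (M @ phi)_j. If M is nonsingular, M @ phi = 0 => phi = 0.
print("rank of M:", np.linalg.matrix_rank(M))   # full rank: N+1
phi = np.linalg.solve(M, np.zeros(N + 1))
print("unique phi solving E[phi(k)|theta]=0 at all grid points:", phi)
```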
Rao-Blackwell Theorem
Rao-Blackwell Theorem
Let $Y, Z$ be random variables and define the function
$$g(z) := E[Y \mid Z = z].$$
Then
$$E[g(Z)] = E[Y]$$
and
$$\mathrm{Var}(g(Z)) \le \mathrm{Var}(Y)$$
with equality iff $Y = g(Z)$ almost surely.
Note that this version of Rao-Blackwell is quite general and has nothing to do with estimation of parameters. However, we can apply it to parameter estimation as follows.
Consider $X \sim p(x|\theta)$. Let $\hat\theta_1$ be an unbiased estimator of $\theta$ and let $t = T(x)$ be a sufficient statistic for $\theta$. Apply Rao-Blackwell with
$$Y := \hat\theta_1(x), \qquad Z := t = T(x).$$
Consider the new estimator
$$\hat\theta_2(x) = g(T(x)) = E[\hat\theta_1(X) \mid T(X) = t].$$
Sufficiency matters here: it guarantees that this conditional expectation does not depend on $\theta$, so $\hat\theta_2$ is a genuine estimator. Then we may conclude:
1. $\hat\theta_2$ is unbiased
2. $\mathrm{Var}(\hat\theta_2) \le \mathrm{Var}(\hat\theta_1)$
In words, if $\hat\theta_1$ is any unbiased estimator, then smoothing $\hat\theta_1$ with respect to a sufficient statistic reduces (or at least does not increase) the variance while preserving unbiasedness.
Therefore, we can restrict our search for the MVUE to functions of a sufficient statistic.
The Rao-Blackwell Theorem
Rao-Blackwell Theorem, special case
Let $X$ be a random variable with pdf $p(x|\theta)$ and let $t(X)$ be a sufficient statistic. Let $\hat\theta_1(X)$ be an estimator of $\theta$ and define
$$\hat\theta_2(t) := E[\hat\theta_1(X) \mid t(X)].$$
Then
$$E[\hat\theta_2(t(X))] = E[\hat\theta_1(X)]$$
and
$$\mathrm{Var}(\hat\theta_2(t(X))) \le \mathrm{Var}(\hat\theta_1(X))$$
with equality iff $\hat\theta_1(X) \equiv \hat\theta_2(t(X))$ with probability one (almost surely).
Rao-Blackwell Theorem in Action
Suppose we observe 2 independent realizations from a $\mathcal{N}(\mu, \sigma^2)$ distribution. Denote these observations $x_1$ and $x_2$, with $X = [x_1, x_2]^T$. Consider the simple estimator of $\mu$:
$$\hat\mu = x_1$$
$$E[\hat\mu] = \mu \quad \text{(it's unbiased)}$$
$$\mathrm{Var}[\hat\mu] = \mathrm{Var}[x_1] = \sigma^2$$
The MSE is therefore
$$E[(\hat\mu - \mu)^2] = \sigma^2.$$
Intuitively, we expect that the sample mean
$$\hat\mu = \frac{1}{2}(x_1 + x_2)$$
should be a better estimator, since it averages the two observations together.
Is this the best possible estimator?
Let's find a sufficient statistic for $\mu$:
$$\begin{aligned}
p(x_1, x_2) &= \frac{1}{2\pi\sigma^2}\, e^{-(x_1-\mu)^2/2\sigma^2}\, e^{-(x_2-\mu)^2/2\sigma^2} \\
&= \frac{1}{2\pi\sigma^2}\, e^{-\frac{1}{2\sigma^2}\left((x_1-\mu)^2 + (x_2-\mu)^2\right)} \\
&= \underbrace{\frac{1}{2\pi\sigma^2}\, e^{-\frac{1}{2\sigma^2}(x_1^2 + x_2^2)}}_{a(X)} \; \underbrace{e^{-\frac{1}{2\sigma^2}\left(-2(x_1+x_2)\mu + 2\mu^2\right)}}_{b_\mu(t)}, \qquad t = x_1 + x_2
\end{aligned}$$
So by Fisher-Neyman, $t$ is a sufficient statistic for $\mu$.
The Rao-Blackwell Theorem states that
$$\mu^* = E[\hat\mu \mid t]$$
is as good as or better than $\hat\mu$ in terms of estimator variance. (See Scharf, p. 94.) What is $\mu^*$? First we need to compute the mean of the conditional density $p(x_1 \mid t)$:
$$p(x_1 \mid t) = \frac{p(x_1, t)}{p(t)}$$
with
$$p(x_1, t) = \frac{1}{2\pi\sigma^2}\, e^{-\frac{(x_1-\mu)^2}{2\sigma^2}}\, e^{-\frac{(t - x_1 - \mu)^2}{2\sigma^2}}$$
$$p(t) = \frac{1}{\sqrt{2\pi(2\sigma^2)}}\, e^{-(t-2\mu)^2/4\sigma^2}, \qquad E[t] = 2\mu, \quad \mathrm{Var}(t) = 2\sigma^2.$$
$$\begin{aligned}
p(x_1 \mid t) &= \frac{\sqrt{4\pi\sigma^2}}{2\pi\sigma^2} \exp\left[-\frac{1}{2\sigma^2}\left((x_1-\mu)^2 + (t - x_1 - \mu)^2 - \tfrac{1}{2}(t-2\mu)^2\right)\right] \\
&= \frac{1}{\sqrt{\pi\sigma^2}} \exp\left[-\frac{1}{2\sigma^2}\left(2x_1^2 - 2x_1 t + \tfrac{1}{2}t^2\right)\right] \qquad \text{(all $\mu$ terms cancel)} \\
&= \frac{1}{\sqrt{\pi\sigma^2}} \exp\left[-\frac{(x_1 - t/2)^2}{\sigma^2}\right]
\end{aligned}$$
$$\Rightarrow x_1 \mid t \sim \mathcal{N}(t/2,\, \sigma^2/2)$$
$$\mu^* = E[\hat\mu \mid t] = t/2 = \frac{1}{2}(x_1 + x_2), \qquad \mathrm{Var}(\mu^*) = \frac{\sigma^2}{2} \;\Rightarrow\; \mathrm{MSE}(\mu^*) = \frac{\sigma^2}{2}.$$
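The conclusion of this example is easy to verify by simulation. A sketch, with illustrative values $\mu = 3$, $\sigma = 2$ (assumptions, not from the lecture):

```python
# Monte Carlo check of the example: smoothing mu_hat = x1 with respect to
# t = x1 + x2 gives t/2, halving the variance while preserving unbiasedness.
# (mu = 3, sigma = 2 are illustrative assumptions.)
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, trials = 3.0, 2.0, 400_000

x1 = mu + sigma * rng.standard_normal(trials)
x2 = mu + sigma * rng.standard_normal(trials)
t = x1 + x2

mu_hat  = x1          # crude unbiased estimator, Var = sigma^2
mu_star = t / 2       # E[mu_hat | t] = t/2, the Rao-Blackwellized estimator

print("Var(mu_hat)  ~", mu_hat.var())    # ~ sigma^2     = 4.0
print("Var(mu_star) ~", mu_star.var())   # ~ sigma^2 / 2 = 2.0
print("means:", mu_hat.mean(), mu_star.mean())  # both ~ mu
```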
The Lehmann-Scheffé Theorem
The Rao-Blackwell Theorem tells us how to decrease the variance of an unbiased estimator. But when can we know that we get a MVUE? Answer: when $t$ is a complete sufficient statistic.
Lehmann-Scheffé Theorem
If $t$ is complete, there is at most one unbiased estimator that is a function of $t$.
Proof. Suppose
$$E[\hat\theta_1] = E[\hat\theta_2] = \theta, \qquad \hat\theta_1(X) := g_1(T(X)), \qquad \hat\theta_2(X) := g_2(T(X)).$$
Define
$$\phi(t) := g_1(t) - g_2(t).$$
Then
$$E[\phi(t)] = E[g_1(t) - g_2(t)] = E[\hat\theta_1(X) - \hat\theta_2(X)] = 0.$$
By definition of completeness, we have
$$P(\phi(t) = 0) = 1 \quad \forall\, \theta.$$
In other words, $\hat\theta_1 = \hat\theta_2$ with probability 1.
Recipe for finding a MVUE
This result suggests the following method for finding a MVUE:
(1) Find a complete sufficient statistic $t = T(X)$.
(2) Find any unbiased estimator $\hat\theta_0$ and set
$$\hat\theta(X) := E[\hat\theta_0(X) \mid t = T(X)]$$
or find a function $g$ such that
$$\hat\theta(X) = g(T(X))$$
is unbiased.
Rao-Blackwell and Complete Suff. Stats.
Theorem
If $\hat\theta$ is constructed by the recipe above, then $\hat\theta$ is the unique MVUE.
Proof: Note that in either construction, $\hat\theta$ is a function of $t$. Let $\hat\theta_1$ be any unbiased estimator. We must show that
$$\mathrm{Var}(\hat\theta) \le \mathrm{Var}(\hat\theta_1).$$
Define
$$\hat\theta_2(X) := E[\hat\theta_1(X) \mid t = T(X)].$$
By Rao-Blackwell, it suffices to show
$$\mathrm{Var}(\hat\theta) \le \mathrm{Var}(\hat\theta_2).$$
Proof (cont.)
But $\hat\theta$ and $\hat\theta_2$ are both unbiased and functions of a complete sufficient statistic, so $\hat\theta = \hat\theta_2$ with probability 1 by Lehmann-Scheffé.
To show uniqueness, suppose in the above argument that $\mathrm{Var}(\hat\theta_1) = \mathrm{Var}(\hat\theta)$. Then the Rao-Blackwell bound holds with equality
$$\Longrightarrow \hat\theta_1 = \hat\theta_2 \text{ with probability 1} \Longrightarrow \hat\theta_1 = \hat\theta \text{ with probability 1},$$
since $\hat\theta_2 = \hat\theta$ with probability 1.
Example: Uniform distribution.
Suppose $X = [x_1 \cdots x_N]^T$ where
$$x_i \overset{\mathrm{iid}}{\sim} \mathrm{Unif}[0, \theta], \qquad i = 1, \ldots, N.$$
What is an unbiased estimator of $\theta$? The estimator
$$\hat\theta_1 = \frac{2}{N}\sum_{i=1}^{N} x_i$$
is unbiased. However, it is not the MVUE.
Example: (cont.)
From the Fisher-Neyman factorization theorem,
$$p(X|\theta) = \prod_{i=1}^{N} \frac{1}{\theta}\, I_{[0,\theta]}(x_i) = \underbrace{\frac{1}{\theta^N}\, I_{[\max_i x_i,\, \infty)}(\theta)}_{b_\theta(t)} \cdot \underbrace{I_{(-\infty,\, \min_i x_i]}(0)}_{a(X)}$$
we see that
$$T = \max_i x_i$$
is a sufficient statistic. It is left as an exercise to show that $T$ is in fact complete. Since $\hat\theta_1$ is not a function of $T$, it is not the MVUE. However,
$$\hat\theta_2(X) = E[\hat\theta_1(X) \mid t = T(X)]$$
is the MVUE.
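To close the loop on this example: given $T = \max_i x_i = t$, the remaining $N-1$ samples are iid $\mathrm{Unif}[0, t]$, and a short calculation gives $\hat\theta_2 = E[\hat\theta_1 \mid T] = \frac{N+1}{N} T$ (a standard result, worked out here rather than in the slides). A simulation sketch with illustrative values $\theta = 2$, $N = 10$:

```python
# Comparing the unbiased estimator theta1 = (2/N) sum(x_i) with its
# Rao-Blackwellization theta2 = (N+1)/N * max(x_i) for Unif[0, theta] data.
# (theta = 2, N = 10 are illustrative assumptions.)
import numpy as np

rng = np.random.default_rng(2)
theta, N, trials = 2.0, 10, 300_000

x = theta * rng.random((trials, N))
theta1 = 2 * x.mean(axis=1)                # unbiased, but not a function of T
theta2 = (N + 1) / N * x.max(axis=1)       # Rao-Blackwellized: the MVUE

print("E[theta1] ~", theta1.mean(), "  Var ~", theta1.var())  # Var ~ theta^2/(3N)
print("E[theta2] ~", theta2.mean(), "  Var ~", theta2.var())  # Var ~ theta^2/(N(N+2))
```

Both estimators are unbiased, but $\hat\theta_2$ has much smaller variance, as Lehmann-Scheffé guarantees.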