1/56
Mathematical Statistics, Fall 2015
Chapter 7: Sufficiency and Comparison
Byeong U. Park
Department of Statistics, Seoul National University
2/56
7.1 Optimality
7.2 Sufficiency and Completeness
7.3 Exponential Family
3/56
Comparison of Estimators
Let X1, . . . , Xn be a random sample from a population with pdf
f(·; θ), θ ∈ Θ ⊂ Rd. Below, we write X = (X1, . . . , Xn). How to compare
different estimators of a parameter of interest, say g(θ) ∈ R?
- Loss function: L(·, ·) : Θ × A → R₊, where A ⊂ R is an action space for the estimation of g(θ), such that L(θ, g(θ)) = 0. For example,
    L(θ, a) = (a − g(θ))²,  L(θ, a) = |a − g(θ)|.
- Risk function: for an estimator δ(X) of g(θ),
    R(θ, δ) = E_θ L(θ, δ(X)).
  In the case of squared error loss, the risk R(θ, δ) is the mean squared error of δ(X), i.e., R(θ, δ) = E_θ (δ(X) − g(θ))².
4/56
Difficulty with Uniform Comparison
- One would prefer δ₁ to δ₂ if
    R(θ, δ₁) ≤ R(θ, δ₂) for all θ ∈ Θ, and R(θ, δ₁) < R(θ, δ₂) for some θ ∈ Θ.
- The difficulty is that there exists no estimator that is best in this sense, i.e., there exists no δ₀ such that
    R(θ, δ₀) ≤ R(θ, δ) for all δ and for all θ ∈ Θ.
- Why? If such a δ₀ existed, then comparing it with the constant estimator δ ≡ g(θ′), whose risk vanishes at θ = θ′, for each fixed θ′ would force R(θ, δ₀) ≡ 0 on Θ, which is impossible in any nontrivial model.
5/56
Optimal Estimation
- Restricted class of estimators: one may find an estimator δ₀ in the class of unbiased estimators of g(θ) such that
    E_θ(δ₀(X) − g(θ))² ≤ E_θ(δ(X) − g(θ))² for all θ ∈ Θ
  for any unbiased estimator δ(X). If it exists, such an estimator is called the UMVUE (Uniformly Minimum Variance Unbiased Estimator).
- Global measures of performance: the estimator that minimizes the maximum risk r(δ) = max_{θ∈Θ} R(θ, δ) is called a minimax estimator. The estimator that minimizes the average risk
    r(δ; π) = ∫_Θ R(θ, δ) π(θ) dµ(θ)
  for a weight function π is called a Bayes estimator.
6/56
Example: Minimax Estimation
Let X₁, …, Xₙ be a random sample from N(µ, σ²), θ = (µ, σ²) ∈ R × R₊. Consider the loss function L(θ, a) = (a/σ² − 1)² for the estimation of g(θ) = σ². Suppose we want to find an estimator that minimizes the maximum risk among all estimators in the class
  {δ_c : δ_c(x) = c ∑_{i=1}^n (x_i − x̄)², c > 0}.

- Let S² = ∑_{i=1}^n (X_i − X̄)². Since S²/σ² ~ χ²(n − 1), it follows that
    R(θ, δ_c) = E_θ(cS²/σ² − 1)²
    = c² Var(χ²(n − 1)) + (c E(χ²(n − 1)) − 1)²
    = 2c²(n − 1) + (c(n − 1) − 1)²,
  which does not depend on θ, so it equals max_θ R(θ, δ_c).
- Since argmin_{c>0} max_θ R(θ, δ_c) = 1/(n + 1), we get σ̂² = ∑_{i=1}^n (X_i − X̄)²/(n + 1) as the minimax estimator among the class.
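As a quick numerical sanity check (a sketch, not part of the original slides; the value n = 10 and the grid are arbitrary choices), the constant-in-θ risk above can be minimized on a grid and compared with 1/(n + 1):

```python
import numpy as np

def risk(c, n):
    # R(theta, delta_c) = 2(n-1)c^2 + (c(n-1) - 1)^2, free of theta
    return 2 * (n - 1) * c**2 + (c * (n - 1) - 1) ** 2

n = 10
grid = np.linspace(0.001, 0.5, 200001)
c_star = grid[np.argmin(risk(grid, n))]
print(c_star, 1 / (n + 1))  # numerical minimizer matches 1/(n+1)
```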
7/56
7.1 Optimality
7.2 Sufficiency and Completeness
7.3 Exponential Family
8/56
The Idea of Sufficiency
Suppose that we observe X1 and X2 that are i.i.d. Bernoulli(θ) random
variables, where 0 < θ < 1.
- The distribution of Y = X₁ + X₂:
    P_θ(Y = 0) = (1 − θ)²,  P_θ(Y = 1) = 2θ(1 − θ),  P_θ(Y = 2) = θ².
- When we observe Y, we may produce (X₁*, X₂*), without knowledge of the true θ, that has the same distribution as the original (X₁, X₂), as follows: (i) put (X₁*, X₂*) = (0, 0) when Y = 0; (ii) put (X₁*, X₂*) = (1, 1) when Y = 2; (iii) when Y = 1, conduct a randomized experiment and put (X₁*, X₂*) = (1, 0) or (0, 1), each with probability 1/2.
- Thus, for any estimator δ(X₁, X₂) of g(θ) one may find an estimator that depends only on Y rather than on (X₁, X₂) but has the same risk as δ(X₁, X₂).
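The three-case reconstruction above can be checked by simulation. This is an illustrative sketch (θ = 0.3 and the replication count are arbitrary), not part of the original slides:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 0.3        # used only to generate the data; the reconstruction never sees it
n_sim = 200000

# Original pairs and the sufficient statistic Y = X1 + X2
x = rng.binomial(1, theta, size=(n_sim, 2))
y = x.sum(axis=1)

# Rebuild (X1*, X2*) from Y alone: deterministic when Y = 0 or 2,
# a fair coin flip between (1, 0) and (0, 1) when Y = 1
coin = rng.integers(0, 2, size=n_sim)
x1_star = np.where(y == 2, 1, np.where(y == 1, coin, 0))
x2_star = y - x1_star

# The empirical joint distributions of (X1, X2) and (X1*, X2*) agree
for a in (0, 1):
    for b in (0, 1):
        p_orig = np.mean((x[:, 0] == a) & (x[:, 1] == b))
        p_star = np.mean((x1_star == a) & (x2_star == b))
        print((a, b), round(p_orig, 3), round(p_star, 3))
```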
9/56
Sufficient Statistic
Let X1, . . . , Xn be a random sample from a population with pdf
f(·; θ), θ ∈ Θ ⊂ Rd. We write X = (X1, . . . , Xn) below.
- Sufficient statistic for θ ∈ Θ: a statistic Y = u(X) is called a sufficient statistic if the conditional distribution of X given Y does not depend on θ ∈ Θ, i.e., if
    P_θ₁(X ∈ A | Y = y) = P_θ₂(X ∈ A | Y = y) for all θ₁, θ₂ ∈ Θ,
  for all A and for all y.
- This definition is more general than those in the textbook.
- Read Remark 7.2.1 on p. 391.
10/56
Factorization Theorem
A statistic Y = u(X) is a sufficient statistic for θ ∈ Θ if and only if there exist functions f₁ and f₂ such that

  ∏_{i=1}^n f(x_i; θ) = f₁(u(x), θ) · f₂(x) for all x and for all θ ∈ Θ.

Proof: We give a proof for the case of discrete X_i. For the necessity part,

  ∏_{i=1}^n f(x_i; θ) = P_θ(X = x, Y = u(x)) = P(X = x | Y = u(x)) · P_θ(Y = u(x)).

The first factor on the right-hand side does not depend on θ ∈ Θ. For the sufficiency part,

  P_θ(X = x | Y = y) = P_θ(X = x, u(X) = y) / P_θ(u(X) = y)
  = P_θ(X = x) I(u(x) = y) / ∑_{z: u(z)=y} P_θ(X = z)
  = f₂(x) I(u(x) = y) / ∑_{z: u(z)=y} f₂(z).
11/56
Sufficient Statistic: Examples
- Bernoulli(θ), θ ∈ (0, 1):
    ∏_{i=1}^n f(x_i; θ) = ∏_{i=1}^n θ^{x_i}(1 − θ)^{1−x_i} = θ^{∑x_i}(1 − θ)^{n−∑x_i},
  so that Y = ∑_{i=1}^n X_i is a sufficient statistic for θ ∈ (0, 1).
- Gamma(2, θ), θ > 0:
    ∏_{i=1}^n f(x_i; θ) = ∏_{i=1}^n θ^{−2} x_i e^{−x_i/θ} I_{(0,∞)}(x_i)
    = θ^{−2n} exp(−∑_{i=1}^n x_i/θ) · (∏_{i=1}^n x_i) · I_{(0,∞)}(min x_i),
  so that Y = ∑_{i=1}^n X_i is a sufficient statistic for θ > 0.
12/56
Sufficient Statistic: Examples
- Gamma(α, β), θ = (α, β) ∈ R₊ × R₊:
    ∏_{i=1}^n f(x_i; θ) = ∏_{i=1}^n Γ(α)^{−1} β^{−α} x_i^{α−1} e^{−x_i/β} I_{(0,∞)}(x_i)
    = Γ(α)^{−n} β^{−nα} (∏_{i=1}^n x_i)^{α−1} exp(−∑_{i=1}^n x_i/β) · I_{(0,∞)}(min x_i),
  so that Y = (∏_{i=1}^n X_i, ∑_{i=1}^n X_i) is a sufficient statistic for θ ∈ R₊ × R₊.
- Uniform(θ₁ − θ₂, θ₁ + θ₂), θ = (θ₁, θ₂) ∈ R × R₊: assume n ≥ 2.
    ∏_{i=1}^n f(x_i; θ) = (2θ₂)^{−n} I_{(θ₁−θ₂,∞)}(x_{(1)}) I_{(−∞,θ₁+θ₂)}(x_{(n)}),
  so that Y = (X_{(1)}, X_{(n)}) is a sufficient statistic for θ ∈ R × R₊.
- Exp(θ, 1), θ ∈ R: Y = X_{(1)} is a sufficient statistic for θ ∈ R.
13/56
Minimal Sufficient Statistic
- There are many sufficient statistics for a given model. As an extreme example, X = (X₁, …, Xₙ) itself is a sufficient statistic. As another example, consider the case where X₁, X₂, X₃ are i.i.d. Bernoulli(θ) random variables and θ ∈ (0, 1). For the latter model, sufficient statistics include (i) (X₁, X₂, X₃); (ii) (X₁ + X₂, X₃); (iii) (X₁ + X₃, X₂); (iv) (X₁, X₂ + X₃); (v) X₁ + X₂ + X₃.
- Minimal sufficient statistic: a sufficient statistic is called a minimal sufficient statistic (MSS) if it is a function of any sufficient statistic.
14/56
Properties of Sufficient Statistic
- Any statistic that is a 1-1 function of a sufficient statistic for θ ∈ Θ is also a sufficient statistic for θ ∈ Θ.
  Proof: Let Y be a sufficient statistic for θ ∈ Θ and let W = g(Y) for a known 1-1 function g. Then the event (W = w) equals (Y = g⁻¹(w)) for any w, so that for any A, w and θ ∈ Θ it holds that
    P_θ(X ∈ A | W = w) = P(X ∈ A | Y = g⁻¹(w)),
  the latter not depending on θ ∈ Θ.
- Any statistic that is a 1-1 function of an MSS is also an MSS.
- MLE and SS: the unique MLE of θ, when it exists, is a function of any sufficient statistic for θ ∈ Θ.
15/56
Properties of Sufficient Statistic
Proof: Let u(X) be a sufficient statistic for θ ∈ Θ. Then, for the unique MLE θ̂,

  θ̂ = argmax_{θ∈Θ} ∏_{i=1}^n f(x_i; θ)
  = argmax_{θ∈Θ} f₁(u(x), θ) f₂(x)
  = argmax_{θ∈Θ} f₁(u(x), θ) = (a function of u(x)).

- Existence of MSS: if the MLE of θ ∈ Θ is unique and it is a sufficient statistic for θ ∈ Θ, then it is a minimal sufficient statistic for θ ∈ Θ.
  Proof: immediate from the MLE-and-SS property above.
- Suppose that there exists θ₀ ∈ Θ such that supp(f(·; θ)) ⊂ supp(f(·; θ₀)) for all θ ∈ Θ. Then the statistic T(X), defined as the (random) function on Θ given by T(X)(θ) = ∏_{i=1}^n (f(X_i; θ) / f(X_i; θ₀)), is an MSS.
16/56
MSS: An Example
Let X₁, …, Xₙ (n ≥ 2) be a random sample from Uniform(θ₁ − θ₂, θ₁ + θ₂), θ = (θ₁, θ₂) ∈ R × R₊. For this model, Y = (X_{(1)}, X_{(n)}) is an MSS.

Proof: The sufficiency was established before. Let the model be reparametrized by η = (η₁, η₂), where η₁ = θ₁ − θ₂ and η₂ = θ₁ + θ₂. Then

  ∏_{i=1}^n f(x_i; η) = (η₂ − η₁)^{−n} I_{(−∞, x_{(1)}]}(η₁) I_{[x_{(n)}, ∞)}(η₂).

Clearly, (η̂₁, η̂₂) = (X_{(1)}, X_{(n)}) is the unique MLE of (η₁, η₂), so that (θ̂₁, θ̂₂) defined by

  θ̂₁ = (X_{(1)} + X_{(n)})/2,  θ̂₂ = (X_{(n)} − X_{(1)})/2

is the unique MLE of (θ₁, θ₂) = ((η₁ + η₂)/2, (η₂ − η₁)/2). Since (θ̂₁, θ̂₂) is a 1-1 function of the SS (X_{(1)}, X_{(n)}), it is also an SS and thus an MSS. This establishes that (X_{(1)}, X_{(n)}) is an MSS since it is a 1-1 function of (θ̂₁, θ̂₂).
17/56
Rao-Blackwell Theorem
- Rao-Blackwell theorem: let X₁, …, Xₙ be a random sample from a population with pdf f(·; θ), θ ∈ Θ ⊂ R^d. Let Y = u(X) be a sufficient statistic. Then, for any estimator η̂(X) of η = g(θ) with finite second moment,
    η*(Y) = E(η̂(X) | Y)
  is a statistic with the properties that (i) E_θ(η*) = E_θ(η̂); (ii) var_θ(η*) ≤ var_θ(η̂); (iii) MSE_θ(η*) ≤ MSE_θ(η̂).
- The Rao-Blackwell theorem tells us that one may obtain a better estimator, in terms of MSE, by conditioning on a sufficient statistic. By the theorem, if η̂ is an unbiased estimator of η, then η* is also an unbiased estimator, with a variance no larger than that of η̂.
- Proof of the theorem: var(W) = var(E(W | V)) + E(var(W | V)).
18/56
Rao-Blackwell theorem

[Figure: within the space of random variables, E(W | V) is the projection of W onto the subspace of functions of V, and E(W) is the further projection onto the subspace of constants.]
19/56
Example: Rao-Blackwellization
Let X₁, …, Xₙ (n ≥ 2) be a random sample from U[0, θ], θ > 0. Take θ̂ = 2X̄ as an unbiased estimator of θ. We know that X_{(n)} is a sufficient statistic for θ > 0. By the Rao-Blackwell theorem, θ* ≡ E(2X̄ | X_{(n)}) is an UE of θ with variance less than or equal to that of θ̂. For 1 ≤ r ≤ n − 1, we note that

  pdf_{X_{(r)}|X_{(n)}}(x | y) = [(n − 1)! / ((r − 1)!(n − r − 1)!)] (x/y)^{r−1} (1 − x/y)^{n−r−1} (1/y) I_{(0,y)}(x)

and that, recalling the pdf of Beta(r + 1, n − r),

  ∫₀^y x · [(n − 1)! / ((r − 1)!(n − r − 1)!)] (x/y)^{r−1} (1 − x/y)^{n−r−1} (1/y) dx
  = (ry/n) ∫₀¹ [n! / (r!(n − r − 1)!)] t^r (1 − t)^{n−r−1} dt
  = (r/n) y.
20/56
Example: Rao-Blackwellization
Thus, we get

  2E(X̄ | X_{(n)} = y) = (2/n) (y + ∑_{r=1}^{n−1} E(X_{(r)} | X_{(n)} = y))
  = (2/n) (y + ∑_{r=1}^{n−1} (r/n) y)
  = ((n + 1)/n) y.

Indeed, θ* = (n + 1)X_{(n)}/n and

  var_θ(θ*) = ((n + 1)/n)² var_θ(X_{(n)}) = θ²/(n(n + 2)) < θ²/(3n) = var_θ(θ̂).
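The variance comparison above can be illustrated by simulation. This is a sketch, not part of the original slides; θ = 2, n = 5 and the replication count are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(1)
theta, n, n_sim = 2.0, 5, 400000

x = rng.uniform(0, theta, size=(n_sim, n))
est_naive = 2 * x.mean(axis=1)         # 2*Xbar, unbiased for theta
est_rb = (n + 1) / n * x.max(axis=1)   # its Rao-Blackwellization (n+1)X(n)/n

print(est_naive.mean(), est_rb.mean())          # both close to theta
print(est_naive.var(), theta**2 / (3 * n))      # ~ theta^2/(3n)
print(est_rb.var(), theta**2 / (n * (n + 2)))   # ~ theta^2/(n(n+2)), smaller
```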
21/56
Uniformly Minimum Variance Unbiased Estimator
We have seen that taking the conditional expectation of a given unbiased estimator, given a sufficient statistic, never increases its variance.

- UMVUE: an estimator η̂ of η = g(θ) is called the uniformly minimum variance unbiased estimator if it is itself unbiased and var_θ(η̂) ≤ var_θ(η̃) for all θ ∈ Θ and for any unbiased estimator η̃ of η.
- Uniqueness of the UMVUE: if there exists an unbiased estimator with finite variance, then the UMVUE is unique. Let η̂₁ and η̂₂ be UMVUEs of η. Since var_θ(η̂₁) = var_θ(η̂₂) ≤ var_θ((η̂₁ + η̂₂)/2) < ∞ for all θ ∈ Θ, expanding the variance of the average shows var_θ(η̂₁) ≤ cov_θ(η̂₁, η̂₂), and hence
    var_θ(η̂₁ − η̂₂) ≤ 0 (and thus = 0) for all θ ∈ Θ.
  This implies P_θ(η̂₁ − η̂₂ = c) = 1 for all θ ∈ Θ for some constant c. That constant equals zero since both η̂₁ and η̂₂ are unbiased.
22/56
Complete Statistic
Let X1, . . . , Xn be a random sample from a population with pdf f(·; θ),
θ ∈ Θ ⊂ Rd. The following notion of completeness facilitates the derivation of
UMVUE.
- Complete statistic for θ ∈ Θ: a statistic Y = u(X) is called a complete statistic for θ ∈ Θ, and {pdf_Y(·; θ) : θ ∈ Θ} is called a complete family of distributions, if
    E_θ ϕ(Y) = 0 for all θ ∈ Θ implies P_θ(ϕ(Y) = 0) = 1 for all θ ∈ Θ.
- A complete statistic Y is "complete" in the sense that any non-constant function of Y has a non-constant expected value (as a function of θ).
- Complete sufficient statistic: a statistic is called a complete sufficient statistic (CSS) for θ ∈ Θ if it is sufficient and complete for θ ∈ Θ.
23/56
Rao-Blackwell-Lehmann-Scheffé Theorem
Let X₁, …, Xₙ be a random sample from a population with pdf f(·; θ), θ ∈ Θ ⊂ R^d. Let Y = u(X) be a CSS for θ ∈ Θ. Assume that there exists an unbiased estimator with finite variance. Then, (i) for any unbiased estimator η̂₀ of η = g(θ) with finite variance, η̂ = E(η̂₀ | Y) is the UMVUE; (ii) any function of Y, say ϕ(Y), is the UMVUE if it is unbiased.

Proof: (i) Clearly, η̂ is an UE. For an arbitrary unbiased estimator η̃ with finite variance,

  var_θ(η̃) ≥ var_θ(E(η̃ | Y)) = var_θ(η̂) for all θ ∈ Θ,

the inequality holding due to the Rao-Blackwell theorem and the equality holding due to the completeness of Y: E(η̃ | Y) and η̂ are both unbiased functions of Y, so completeness gives P_θ(E(η̃ | Y) = η̂) = 1 for all θ.

(ii) Replace η̂ by ϕ(Y) in the proof of (i).

Remark: an UE that is a function of a CSS is unique and is the UMVUE.
24/56
Methods of Finding UMVUE
- Method 1: Rao-Blackwellization with a CSS.
  (1) Find a CSS Y = u(X).
  (2) Find an 'easy' UE η̂₀.
  (3) Compute the conditional expectation E(η̂₀ | Y).
- Method 2: Trial and error.
  (1) Find a CSS Y = u(X).
  (2) Solve E_θ ϕ(Y) = g(θ) for all θ ∈ Θ with respect to ϕ, or try some ϕ(Y) and check unbiasedness.
25/56
CSS and UMVUE: Bernoulli Model
Let X₁, …, Xₙ be a random sample from Bernoulli(θ), θ ∈ [0, 1].

- Finding a CSS: by the factorization theorem, Y = ∑_{i=1}^n X_i is an SS. To check whether it is also complete for θ ∈ [0, 1], suppose that E_θ ϕ(Y) = 0 for all θ ∈ [0, 1] for a function ϕ. Then, since Y ~ Binomial(n, θ), we have
    ∑_{y=0}^n ϕ(y) (n choose y) θ^y (1 − θ)^{n−y} = 0 for all θ ∈ [0, 1]
    ⇒ ∑_{y=0}^n ϕ(y) (n choose y) λ^y = 0 for all λ > 0 (dividing by (1 − θ)^n and setting λ = θ/(1 − θ))
    ⇒ ϕ(y) (n choose y) = 0 for all y (uniqueness of polynomial coefficients)
    ⇒ ϕ(y) = 0 for all y.
- Estimation of the mean: since E_θ(Y/n) = θ for all θ, we conclude that η̂ = X̄ is the UMVUE of θ.
26/56
CSS and UMVUE: Bernoulli Model
- Estimation of the variance (Method 1): for the estimation of η = θ(1 − θ), consider η̂₀ = X₁(1 − X₂) as an UE of η. Then we get
    E(η̂₀ | Y = y) = P(X₁ = 1, X₂ = 0 | ∑_{i=1}^n X_i = y)
    = (n−2 choose y−1) / (n choose y) = y(n − y)/(n(n − 1)),
  so that η̂ = nX̄(1 − X̄)/(n − 1) is the UMVUE.
- Estimation of the variance (Method 2): try the MLE of η: η̂_MLE = X̄(1 − X̄). We get
    E_θ(X̄(1 − X̄)) = E_θ X̄ − [var_θ(X̄) + (E_θ X̄)²] = (n − 1)θ(1 − θ)/n.
  Thus η̂ = nX̄(1 − X̄)/(n − 1) is an UE that is a function of the CSS, concluding that it is the UMVUE.
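Since Y is Binomial(n, θ), the unbiasedness claims above can be verified exactly by summing over the pmf. This is an illustrative sketch (n = 7 and the θ values are arbitrary), not part of the original slides:

```python
from math import comb

def expected_value(phi, n, theta):
    # E_theta phi(Y) for Y ~ Binomial(n, theta), computed exactly
    return sum(phi(y) * comb(n, y) * theta**y * (1 - theta)**(n - y)
               for y in range(n + 1))

n = 7
umvue = lambda y: (y / n) * (n - y) / (n - 1)   # y(n-y)/(n(n-1)) = n*Xbar(1-Xbar)/(n-1)
mle = lambda y: (y / n) * (1 - y / n)           # Xbar(1 - Xbar), biased downward

for theta in (0.2, 0.5, 0.8):
    print(theta, expected_value(umvue, n, theta), theta * (1 - theta),
          expected_value(mle, n, theta))
```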
27/56
CSS and UMVUE: Poisson Model
Let X₁, …, Xₙ be a random sample from Poisson(θ), θ > 0.

- Finding a CSS: by the factorization theorem, Y = ∑_{i=1}^n X_i is an SS. To check whether it is also complete for θ > 0, suppose that E_θ ϕ(Y) = 0 for all θ > 0 for a function ϕ. Then, since Y ~ Poisson(nθ), we have
    ∑_{y=0}^∞ ϕ(y) (nθ)^y e^{−nθ}/y! = 0 for all θ > 0
    ⇒ ∑_{y=0}^∞ [ϕ(y)/y!] λ^y = 0 for all λ > 0
    ⇒ ϕ(y)/y! = 0 for all y (uniqueness of power series coefficients)
    ⇒ ϕ(y) = 0 for all y.
- Estimation of the mean: since E_θ(Y/n) = θ for all θ, we conclude that η̂ = X̄ is the UMVUE of θ.
28/56
CSS and UMVUE: Poisson Model
- Estimation of η = e^{−2θ} (assume n > 2; the manipulation below divides by n − 2): we solve E_θ ϕ(Y) = e^{−2θ} for all θ with respect to ϕ. Note that
    ∑_{y=0}^∞ ϕ(y) (nθ)^y e^{−nθ}/y! = e^{−2θ} for all θ
    ⇔ ∑_{y=0}^∞ ϕ(y) (n/(n − 2))^y ((n − 2)θ)^y / y! = e^{(n−2)θ} for all θ
    ⇔ ∑_{y=0}^∞ ϕ(y) (n/(n − 2))^y ((n − 2)θ)^y / y! = ∑_{y=0}^∞ ((n − 2)θ)^y / y! for all θ.
  This gives ϕ(y) = (1 − 2/n)^y, so that the UMVUE of η is given by
    η̂ = ((n − 2)/n)^{nX̄}.
- Remark: as n → ∞, the UMVUE η̂ is approximated by the MLE e^{−2X̄}, so one may expect the UMVUE to behave nicely for large n.
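The unbiasedness of ϕ(Y) = (1 − 2/n)^Y can be checked numerically by summing the Poisson(nθ) series. This is a sketch, not part of the original slides; the (n, θ) pairs are arbitrary:

```python
import math

def e_phi(n, theta, terms=400):
    # E_theta (1 - 2/n)^Y for Y ~ Poisson(n*theta), via a truncated series
    lam = n * theta
    term = math.exp(-lam)   # Poisson pmf at y = 0
    total = 0.0
    for y in range(terms):
        total += (1 - 2 / n) ** y * term
        term *= lam / (y + 1)
    return total

for n, theta in [(5, 0.7), (12, 1.3)]:
    print(n, theta, e_phi(n, theta), math.exp(-2 * theta))  # last two columns agree
```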
29/56
CSS and UMVUE: Exponential Model
Let X₁, …, Xₙ be a random sample from Exponential(θ), θ > 0, with pdf f(x₁; θ) = θ^{−1}e^{−x₁/θ}, x₁ > 0. By the factorization theorem, Y = ∑_{i=1}^n X_i is an SS. To check whether it is also complete for θ > 0, suppose that E_θ ϕ(Y) = 0 for all θ > 0 for a function ϕ. Then, since Y ~ Gamma(n, θ), we have

  ∫₀^∞ ϕ(y) [y^{n−1}/(Γ(n)θ^n)] e^{−y/θ} dy = 0 for all θ > 0
  ⇒ ∫₀^∞ ϕ(y) y^{n−1} e^{−λy} dy = 0 for all λ > 0
  ⇒ ϕ(y) y^{n−1} = 0 for a.e. y > 0 (uniqueness of Laplace transform)
  ⇒ ϕ(y) = 0 for a.e. y > 0.

Since E_θ(Y/n) = θ for all θ, we conclude that η̂ = X̄ is the UMVUE of θ.
30/56
CSS and UMVUE: Uniform[0, θ] Model
Let X₁, …, Xₙ be a random sample from Uniform[0, θ], θ > 0. By the factorization theorem, it can be verified that Y = X_{(n)} is an SS. To check whether it is also complete for θ > 0, suppose that E_θ ϕ(Y) = 0 for all θ > 0 for a function ϕ. Then, since the pdf of Y is given by pdf_Y(y; θ) = nθ^{−n}y^{n−1}I_{[0,θ]}(y), we have

  ∫₀^θ ϕ(y) nθ^{−n} y^{n−1} dy = 0 for all θ > 0
  ⇒ ∫₀^θ ϕ(y) y^{n−1} dy = 0 for all θ > 0
  ⇒ ϕ(θ) θ^{n−1} = 0 for a.e. θ > 0 (Fundamental Theorem of Calculus)
  ⇒ ϕ(y) = 0 for a.e. y > 0.

Since E_θ(Y) = nθ/(n + 1) for all θ, we conclude that η̂ = (n + 1)X_{(n)}/n is the UMVUE of θ.
31/56
CSS and UMVUE: Uniform[−θ, θ] Model
Let X₁, …, Xₙ (n ≥ 2) be a random sample from Uniform[−θ, θ], θ > 0.

- By the factorization theorem, it can be verified that (X_{(1)}, X_{(n)}) is an SS. But it is not a complete statistic for θ > 0: one may find a non-trivial function ϕ of (X_{(1)}, X_{(n)}) such that E_θ ϕ(X_{(1)}, X_{(n)}) is identically zero. For example, ϕ(X_{(1)}, X_{(n)}) = X_{(n)}/X_{(1)} − a_n, where a_n = E(U_{(n)}/U_{(1)}) for a random sample (U_i : 1 ≤ i ≤ n) from Uniform[−1, 1].
- In fact, for this model, Y = max_{1≤i≤n} |X_i| is a CSS and η̂ = (n + 1) max_{1≤i≤n} |X_i|/n is the UMVUE of θ. To see this, note that the joint density of X at x equals (2θ)^{−n} I_{[0,θ]}(max_{1≤i≤n} |x_i|), which shows that Y is an SS. Since the |X_i| are i.i.d. Uniform[0, θ], one may then show that Y is also complete, and thus η̂ is the UMVUE.
32/56
Ancillary Statistic
Let X1, . . . , Xn be a random sample from a population with pdf f(·; θ),
θ ∈ Θ ⊂ Rd.
- Ancillary statistic: a statistic Z = v(X) is called an ancillary statistic for θ ∈ Θ if P_θ(Z ∈ A) does not depend on θ ∈ Θ for all A, i.e.,
    P_θ₁(Z ∈ A) = P_θ₂(Z ∈ A) for all θ₁, θ₂ ∈ Θ and all A.
- Basu's theorem (independence of CSS and AS): if Y = u(X) is a CSS and Z = v(X) is an AS for θ ∈ Θ, then Y and Z are independent under P_θ for all θ ∈ Θ, i.e.,
    P_θ(Y ∈ A, Z ∈ B) = P_θ(Y ∈ A) P(Z ∈ B) for all A, B and for all θ ∈ Θ.
33/56
Proof of Basu’s Theorem
Let A and B be arbitrary sets. Since Y is a sufficient statistic for θ ∈ Θ, ϕ(y) ≡ P(Z ∈ B | Y = y) does not depend on θ ∈ Θ for all y. Also, we note that E_θ(ϕ(Y)) = P(Z ∈ B), which does not depend on θ either, because of the ancillarity of Z. Thus,

  E_θ(ϕ(Y) − P(Z ∈ B)) = 0 for all θ ∈ Θ.

Due to the completeness of Y, this entails ϕ(y) = P(Z ∈ B) for a.e. y, so that

  P_θ(Y ∈ A, Z ∈ B) = ∫_A P(Z ∈ B | Y = y) · pdf_Y(y; θ) dµ(y) = P(Z ∈ B) · P_θ(Y ∈ A)

for all θ ∈ Θ.
34/56
Ancillary Statistic: Examples
- N(θ, 1), θ ∈ R:
    (X₁ − X̄, …, Xₙ − X̄) =_d (Z₁ − Z̄, …, Zₙ − Z̄)
  for Z_i i.i.d. from N(0, 1), so the vector of deviations is ancillary.
- Exp(θ, 1), θ ∈ R:
    (X₁ − X_{(1)}, …, Xₙ − X_{(1)}) =_d (Z₁ − Z_{(1)}, …, Zₙ − Z_{(1)})
  for Z_i i.i.d. from Exp(0, 1).
- Gamma(α, β), β > 0 with α known (sample of size n + 1):
    (X₁/∑_{i=1}^{n+1} X_i, …, Xₙ/∑_{i=1}^{n+1} X_i) =_d (Z₁/∑_{i=1}^{n+1} Z_i, …, Zₙ/∑_{i=1}^{n+1} Z_i)
  for Z_i i.i.d. from Gamma(α, 1).
35/56
Ancillary Statistic: Examples
- Uniform(0, θ), θ > 0: X_{(n)}/X_{(1)} =_d Z_{(n)}/Z_{(1)} for Z_i i.i.d. from Uniform(0, 1).
- N(µ, σ²), (µ, σ²) ∈ R × R₊:
    ((X₁ − X̄)/S_X, …, (Xₙ − X̄)/S_X) =_d ((Z₁ − Z̄)/S_Z, …, (Zₙ − Z̄)/S_Z)
  for Z_i i.i.d. from N(0, 1), where S²_X = ∑_{i=1}^n (X_i − X̄)²/(n − 1) and S²_Z = ∑_{i=1}^n (Z_i − Z̄)²/(n − 1).
- Exp(µ, σ), (µ, σ) ∈ R × R₊:
    (X₁ − X_{(1)})/∑_{i=1}^n (X_i − X_{(1)}) =_d (Z₁ − Z_{(1)})/∑_{i=1}^n (Z_i − Z_{(1)})
  for Z_i i.i.d. from Exp(0, 1).
36/56
Basu’s Theorem: Examples
- Let X₁, …, Xₙ be a random sample from Exp(µ, σ), µ ∈ R, σ ∈ R₊. Then X_{(1)} and ∑_{i=1}^n (X_i − X_{(1)}) are independent.
  Proof: Fix σ = σ₀. Then X_{(1)} is a CSS for µ ∈ R and ∑_{i=1}^n (X_i − X_{(1)}) is an AS for µ ∈ R. Thus they are independent under P_{µ,σ₀} for all µ ∈ R. Since this holds for any choice of σ₀, they are independent under the entire model.
- Let X₁, …, X_{n+1} be a random sample from Gamma(α, β), α > 0, β > 0. Then ∑_{i=1}^{n+1} X_i is independent of
    Z ≡ (X₁/(X₁ + ⋯ + X_{n+1}), …, Xₙ/(X₁ + ⋯ + X_{n+1})).
  Proof: Fix α = α₀. Then ∑_{i=1}^{n+1} X_i is a CSS for β > 0 (check this!). Since Z is an AS for β > 0, the two statistics are independent under P_{α₀,β} for all β > 0, and thus under P_{α,β} for all α > 0 and β > 0.
37/56
Use Of Ancillary Statistic To Find UMVUE
Let X₁, …, Xₙ be a random sample from Exp(θ), θ > 0. For a given a > 0 we want to find the UMVUE of (the system reliability) η = P_θ(X₁ > a) = e^{−a/θ}. We have seen that Y = ∑_{i=1}^n X_i is a CSS for θ > 0. An easy UE of η is η̂₀ = I_{(a,∞)}(X₁). Thus η̂ = E(η̂₀ | Y) is the UMVUE. Now we note that X₁/Y ~ Beta(1, n − 1) is an AS, so that it is independent of Y. From this we get

  E(η̂₀ | Y = y) = P(X₁ > a | Y = y)
  = P(X₁/Y > a/y | Y = y) = P(X₁/Y > a/y)
  = ∫_{a/y}^1 [Γ(n)/(Γ(1)Γ(n − 1))] z⁰ (1 − z)^{n−2} dz
  = (1 − a/y)^{n−1} I_{(a,∞)}(y).

Thus η̂ = (1 − a/Y)^{n−1} I_{(a,∞)}(Y) is the UMVUE.
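The reliability UMVUE above can be checked against the target e^{−a/θ} by simulation. This is a sketch, not part of the original slides; θ, n, a and the replication count are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(2)
theta, n, a, n_sim = 1.5, 8, 1.0, 500000

x = rng.exponential(theta, size=(n_sim, n))
y = x.sum(axis=1)

# UMVUE (1 - a/Y)^(n-1) * I(Y > a); clip Y at a so the masked branch stays finite
umvue = np.where(y > a, (1 - a / np.maximum(y, a)) ** (n - 1), 0.0)

print(umvue.mean(), np.exp(-a / theta))   # both estimate P(X1 > a) = e^{-a/theta}
print(np.mean(x[:, 0] > a))               # the 'easy' UE it Rao-Blackwellizes
```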
38/56
7.1 Optimality
7.2 Sufficiency and Completeness
7.3 Exponential Family
39/56
Exponential Family
A family of distributions {f(·; θ) : θ ∈ Θ} for Θ ⊂ R^d is called an exponential family if
(i) the support of the density f(·; θ) does not depend on θ ∈ Θ;
(ii) the density has the following form:
  f(x; θ) = exp(η(θ)ᵀT(x) − B(θ)) · h(x)
for some known functions η = (η₁, …, η_k)ᵀ, T = (T₁, …, T_k)ᵀ, B and h.
An exponential family is called a k-parameter regular exponential family if
(iii) η(Θ) ≡ {η(θ) : θ ∈ Θ} ⊂ R^k contains a k-dimensional open rectangle.
40/56
Canonical Form: Reparametrization by Natural Parameter
- The (d-variate) real-valued function B depends on θ only through η ≡ η(θ), i.e., B(θ) = A(η(θ)) for some (k-variate) real-valued function A:
    1 = ∫_X exp(η(θ)ᵀT(x) − B(θ)) h(x) dµ(x)
    = e^{−B(θ)} ∫_X exp(η(θ)ᵀT(x)) h(x) dµ(x)
    =: e^{−B(θ)} · e^{A(η(θ))}.
- This means that the density f(·; θ) actually depends on θ only through η:
    f(x; η) = exp(ηᵀT(x) − A(η)) h(x).
  The parameter η is called a natural parameter.
- Natural parameter space: the maximal possible set of η is given by
    {η ∈ R^k : ∫_X exp(ηᵀT(x)) h(x) dµ(x) < ∞}.
41/56
Exponential Family: Examples
- Bernoulli(θ), 0 < θ < 1: for x ∈ {0, 1},
    f(x; θ) = exp(x log(θ/(1 − θ)) + log(1 − θ)) = exp(ηx − A(η)),
  where η = log(θ/(1 − θ)) and A(η) = log(1 + e^η). Thus it is a 1-parameter regular exponential family.
- N(µ, σ²), θ ≡ (µ, σ²) ∈ R × R₊: for x ∈ R,
    f(x; θ) = exp(−x²/(2σ²) + (µ/σ²)x − µ²/(2σ²) − log(2πσ²)/2)
    = exp(η₁T₁(x) + η₂T₂(x) − A(η₁, η₂)) · (2π)^{−1/2},
  where η₁ = −1/(2σ²), η₂ = µ/σ², A(η₁, η₂) = −η₂²/(4η₁) + log(−1/(2η₁))/2 and T₁(x) = x², T₂(x) = x. Thus it is a 2-parameter regular exponential family.
42/56
Exponential Family: Examples
- Two independent normal populations with a common mean: let X and Y be independent with N(µ, σ₁²) and N(µ, σ₂²) distributions, respectively, for θ ≡ (µ, σ₁², σ₂²) ∈ R × R₊². In this case, for (x, y) ∈ R²,
    f(x, y; θ) = exp(−x²/(2σ₁²) + (µ/σ₁²)x − y²/(2σ₂²) + (µ/σ₂²)y
      − µ²/(2σ₁²) − µ²/(2σ₂²) − log(2πσ₁²)/2 − log(2πσ₂²)/2)
    = exp(η₁T₁(x, y) + η₂T₂(x, y) + η₃T₃(x, y) + η₄T₄(x, y) − A(η₁, …, η₄)) · h(x, y),
  where η₁ = −1/(2σ₁²), η₂ = µ/σ₁², η₃ = −1/(2σ₂²), η₄ = µ/σ₂², and T₁(x, y) = x², T₂(x, y) = x, T₃(x, y) = y², T₄(x, y) = y. In fact, this is not a 4-parameter regular exponential family since η₄ = (η₃/η₁) · η₂.
43/56
Random Sample from Exponential Family
Let X₁, …, Xₙ be a random sample from an exponential family of pdf's f(·; θ), θ ∈ Θ, with f(x; θ) = exp(η(θ)ᵀT(x) − B(θ)) · h(x), where η(θ) is a k-vector. Let X denote the common support of f(·; θ). Then:

(1) the joint densities of (X₁, …, Xₙ) also form an exponential family, with Xⁿ as the common support and
    ∏_{i=1}^n f(x_i; θ) = exp(η(θ)ᵀ ∑_{i=1}^n T(x_i) − nB(θ)) · ∏_{i=1}^n h(x_i)
  as the joint density;
(2) if η(Θ) contains a k-dimensional open rectangle, then Y = ∑_{i=1}^n T(X_i) is a CSS for θ ∈ Θ.
44/56
Proof of (2)
A proof is given only for the case where the X_i are discrete random variables. Clearly, Y is an SS. Let η = η(θ). Then, with ∑*_y denoting the sum over all (x₁, …, xₙ) with ∑_{i=1}^n T(x_i) = y, we get

  pdf_Y(y; η) = ∑*_y exp(ηᵀ ∑_{i=1}^n T(x_i) − nA(η)) · ∏_{i=1}^n h(x_i)
  = exp(ηᵀy − nA(η)) · ∑*_y ∏_{i=1}^n h(x_i)
  =: exp(ηᵀy − A*(η)) h*(y).

Thus it follows that E_θ ϕ(Y) = 0 for all θ implies

  ∑_{all y} ϕ(y) h*(y) e^{ηᵀy} = 0 for all η
  ⇒ ϕ(y) = 0 for all y with h*(y) > 0,

where the second implication is from the uniqueness of the Laplace transform.
45/56
MGF/CGF of T(X) in the Exponential Family
Let X be a random variable having a pdf f(·; η), η ∈ N ⊂ R^k. Assume f(x; η) = exp(ηᵀT(x) − A(η)) · h(x) and that N contains a k-dimensional open rectangle. Then:

(3) the cumulant generating function of T(X) is given by
    cgf_{T(X)}(u; η) ≡ log E_η e^{uᵀT(X)} = A(η + u) − A(η)
  for all η ∈ Int(N) and all u in a neighborhood of 0;
(4) the mean and variance of T(X) under P_η with η ∈ Int(N) are then given by
    E_η T(X) = ∇A(η),  var_η(T(X)) = ∇²A(η),
  where ∇A and ∇²A denote the gradient and Hessian of A.
46/56
Proofs of (3) and (4)
To prove (3), let η ∈ Int(N). We can find a small ε > 0 such that η + u belongs to N for all u with ‖u‖ ≤ ε. For such u, we get

  E_η e^{uᵀT(X)} = ∫ e^{(η+u)ᵀT(x) − A(η)} h(x) dµ(x)
  = e^{A(η+u) − A(η)} ∫ e^{(η+u)ᵀT(x) − A(η+u)} h(x) dµ(x)
  = e^{A(η+u) − A(η)}.

The fact (4) is immediate from the fact that

  E_η T(X) = (∂/∂u) cgf_{T(X)}(u; η) |_{u=0},
  var_η(T(X)) = (∂²/∂u∂uᵀ) cgf_{T(X)}(u; η) |_{u=0}.
47/56
Differentiation Under Integral Sign
Let X be a random variable having a pdf f(·; η), η ∈ N ⊂ R^k. Assume f(x; η) = exp(ηᵀT(x) − A(η)) · h(x) and that N contains a k-dimensional open rectangle. Suppose that E_η φ(X) exists for all η ∈ N. Then:

(5) E_η φ(X), as a function of η, is infinitely differentiable at every η ∈ Int(N), and the differentiation can be carried out under the integral sign. For example,
    (∂/∂η) E_η φ(X) = ∫_X φ(x) (∂/∂η) f(x; η) dµ(x).

Remark: applying the theorem to φ ≡ 1 gives

  0 = E_η(T(X) − ∇A(η)) and 0 = E_η((T(X) − ∇A(η))(T(X) − ∇A(η))ᵀ − ∇²A(η)),

so that E_η T(X) = ∇A(η) and var_η(T(X)) = ∇²A(η).
48/56
MLE and Exponential Family
Let X₁, …, Xₙ be a random sample from an exponential family of pdf's f(·; η), η ∈ N ⊂ R^k, with f(x; η) = exp(ηᵀT(x) − A(η)) · h(x). Assume that N contains a k-dimensional open rectangle. Then:

(6) (i) the log-likelihood is strictly concave, and the unique MLE of η is determined by the likelihood equation
    n⁻¹ ∑_{i=1}^n T(x_i) = ∇A(η),
  provided that it has a solution η̂ ∈ N;
(ii) the Fisher information is given by I₁(η) = ∇²A(η).
49/56
Proof of (6): Positive Definiteness of ∇²A(η)
Suppose that there exists c ≡ (c₁, …, c_k)ᵀ ≠ 0 in R^k such that cᵀ∇²A(η₀)c = 0 for some η₀ ∈ N. Then P_{η₀}(cᵀT(X) = cᵀE_{η₀}T(X)) = 1. This implies that there exists a constant c₀ such that

  cᵀT(x) = c₀ a.e. [µ] for x ∈ X = {x : h(x) > 0}.

Without loss of generality, assume c₁ ≠ 0. Then

  T₁(x) = c₀/c₁ − (c₂/c₁)T₂(x) − ⋯ − (c_k/c₁)T_k(x)

a.e. [µ] for x ∈ X, which gives that, for all η ∈ R^k,

  ηᵀT(x) = (Mη)ᵀT₋₁(x) + (c₀/c₁)η₁

a.e. [µ] for x ∈ X,
50/56
Proof of (6): Continued
where

  Mᵀ = ( −(c₂/c₁) ⋯ −(c_k/c₁)
          1       ⋯  0
          ⋮       ⋱  ⋮
          0       ⋯  1 ),   T₋₁(x) = (T₂(x), T₃(x), …, T_k(x))ᵀ.

Thus it holds that, a.e. [µ],

  f(x; η) = exp((Mη)ᵀT₋₁(x) + (c₀/c₁)η₁ − A(η)) · h(x).

This implies

  (c₀/c₁)η₁ − A(η) = −log ∫_X exp((Mη)ᵀT₋₁(x)) h(x) dµ(x) =: −B(Mη),

so that f(x; η) = exp((Mη)ᵀT₋₁(x) − B(Mη)) · h(x). Clearly, η is then not identifiable: distinct values of η with the same Mη give the same density. Contradiction.
51/56
Multinomial Experiments
Let X_i = (X_{i,1}, …, X_{i,k−1})ᵀ be i.i.d. Multinomial(1, p), p ≡ (p₁, …, p_{k−1})ᵀ, p_j > 0, p₁ + ⋯ + p_{k−1} < 1.

- Let p_k = 1 − p₁ − ⋯ − p_{k−1}. Then the common density of X_i is given by
    f(x; p) = exp(x₁ log(p₁/p_k) + ⋯ + x_{k−1} log(p_{k−1}/p_k) + log p_k),
  so that the distributions of X_i form a (k − 1)-parameter regular exponential family.
- Y = ∑_{i=1}^n X_i = (∑_{i=1}^n X_{i,1}, …, ∑_{i=1}^n X_{i,k−1})ᵀ is a CSS for p.
- MLE of η: the MLE of η ≡ (log(p₁/p_k), …, log(p_{k−1}/p_k))ᵀ =: h(p) solves the equation
    Y/n = E_η X₁, i.e., Y/n = h⁻¹(η).
  Thus the MLE of η is given by η̂ = h(Y/n).
52/56
Multinomial Experiments
- MLE of p: the MLE of p is then p̂ = h⁻¹(η̂) = Y/n.
- UMVUE of p: Y/n is an UE of p and is a function of the CSS Y, so that p̂ = Y/n is also the UMVUE of p.
- UMVUE of Σ ≡ diag(p) − ppᵀ: here, an estimator Σ̂ of Σ is called the UMVUE of Σ if var_p(Σ̂_UE) − var_p(Σ̂) is nonnegative definite for every unbiased Σ̂_UE and all p, with Σ̂ and Σ̂_UE understood as vectorized. Note that
    Σ̂_MLE = diag(Y/n) − (Y/n)(Y/n)ᵀ.
  Computing the expected value of Σ̂_MLE, we get
    E_p(Σ̂_MLE) = diag(p) − var_p(Y/n) − E_p(Y/n)E_p(Y/n)ᵀ
    = diag(p) − n⁻¹Σ − ppᵀ = (1 − 1/n)Σ.
  Thus Σ̂_UMVUE = n · Σ̂_MLE/(n − 1).
53/56
Multivariate Normal Population
Let X_i = (X_{i,1}, …, X_{i,k})ᵀ, 1 ≤ i ≤ n (n ≥ 2), be i.i.d. Normal(µ, Σ), µ ∈ R^k and Σ in the set of k × k positive definite matrices.

- With θ ≡ (µ, Σ) ∈ R^d for d = k + k(k + 1)/2,
    f(x; θ) = det(2πΣ)^{−1/2} exp(−(x − µ)ᵀΣ⁻¹(x − µ)/2)
    = exp(−tr(Σ⁻¹xxᵀ)/2 + µᵀΣ⁻¹x − µᵀΣ⁻¹µ/2 − log det(2πΣ)/2).
- Y = (∑_{i=1}^n X_i, ∑_{i=1}^n X_iX_iᵀ) is a CSS for θ, where ∑_{i=1}^n X_iX_iᵀ is understood to be a k(k + 1)/2-vector.
- MLE of η ≡ (Σ⁻¹µ, Σ⁻¹): it is the solution of
    Y/n = E_η(X₁, X₁X₁ᵀ), i.e., Y/n = (µ, Σ + µµᵀ) =: g(µ, Σ).
  Let h(µ, Σ) = (Σ⁻¹µ, Σ⁻¹). Then η̂_MLE = h ∘ g⁻¹(Y/n).
54/56
Multivariate Normal Population
- MLE of µ and Σ: the MLE of (µ, Σ) = h⁻¹(η) is then
    (µ̂_MLE, Σ̂_MLE) = h⁻¹ ∘ h ∘ g⁻¹(Y/n) = g⁻¹(Y/n).
  By the definition of the function g : R^d → R^d, this means
    n⁻¹ ∑_{i=1}^n X_i = µ̂_MLE and n⁻¹ ∑_{i=1}^n X_iX_iᵀ = Σ̂_MLE + µ̂_MLE µ̂_MLEᵀ,
  so that µ̂_MLE = X̄ and Σ̂_MLE = n⁻¹ ∑_{i=1}^n (X_i − X̄)(X_i − X̄)ᵀ.
- UMVUE of µ and Σ: since (X̄, (n − 1)⁻¹ ∑_{i=1}^n (X_i − X̄)(X_i − X̄)ᵀ) is a 1-1 function of Y, it is also a CSS for (µ, Σ). Since X̄ and (n − 1)⁻¹ ∑_{i=1}^n (X_i − X̄)(X_i − X̄)ᵀ are UEs of µ and Σ, respectively, they are the UMVUEs of the respective parameters.
55/56
UMVUE of Normal Probabilities
Let X₁, …, Xₙ (n ≥ 2) be a random sample from N(θ, 1), θ ∈ R. We want to find the UMVUE of the normal probability η = P_θ(X₁ ≤ c) = Φ(c − θ). Note that X̄ is a CSS for this model. An easy UE of η is I_{(−∞,c]}(X₁). Thus, by the Rao-Blackwell-Lehmann-Scheffé theorem, the UMVUE of η is given by η̂ = E(I_{(−∞,c]}(X₁) | X̄) = P(X₁ ≤ c | X̄). We compute P(X₁ ≤ c | X̄ = y) below. Recall that (X₁ − X̄, …, Xₙ − X̄) is an ancillary statistic and thus independent of X̄ (Basu's theorem).

  P(X₁ ≤ c | X̄ = y) = P(X₁ − X̄ ≤ c − y | X̄ = y)
  = P(X₁ − X̄ ≤ c − y)
  = Φ(√(n/(n − 1)) (c − y)),

since X₁ − X̄ ~ N(0, (n − 1)/n). Thus the UMVUE of η = Φ(c − θ) is given by

  η̂ = Φ(√(n/(n − 1)) (c − X̄)).
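The unbiasedness of Φ(√(n/(n − 1))(c − X̄)) for Φ(c − θ) can be checked by simulation. This is a sketch, not part of the original slides; θ, n, c and the replication count are arbitrary choices:

```python
import math
import numpy as np

def Phi(z):
    # standard normal cdf via the error function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

rng = np.random.default_rng(3)
theta, n, c, n_sim = 0.5, 6, 1.0, 200000

x = rng.normal(theta, 1.0, size=(n_sim, n))
xbar = x.mean(axis=1)

# UMVUE Phi(sqrt(n/(n-1)) (c - Xbar)) versus the naive UE I(X1 <= c)
umvue = np.array([Phi(math.sqrt(n / (n - 1)) * (c - m)) for m in xbar])
print(umvue.mean(), Phi(c - theta), np.mean(x[:, 0] <= c))
```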
56/56
Independence of Normal Sample Mean and Variance
Let X₁, …, Xₙ (n ≥ 2) be a random sample from N(µ, σ²), θ ≡ (µ, σ²) ∈ R × R₊. Then X̄ and ∑_{i=1}^n (X_i − X̄)² are independent under P_θ for all θ ∈ R × R₊.

Proof: Fix σ² = σ₀². Then X̄ is a CSS and (X₁ − X̄, …, Xₙ − X̄) is an ancillary statistic for the submodel Θ(σ₀²) ≡ {(µ, σ₀²) : µ ∈ R}. Thus they are independent under P_θ for all θ ∈ Θ(σ₀²), due to Basu's theorem. Since the choice of σ₀² is arbitrary within R₊, this implies that X̄ and (X₁ − X̄, …, Xₙ − X̄) are independent under P_θ for all

  θ ∈ ⋃_{σ₀²∈R₊} Θ(σ₀²) = R × R₊.

The independence of X̄ and ∑_{i=1}^n (X_i − X̄)² is immediate from this.