1/56
Mathematical Statistics, Fall 2015
Chapter 7: Sufficiency and Comparison
Byeong U. Park
Department of Statistics, Seoul National University
2/56
7.1 Optimality
7.2 Sufficiency and Completeness
7.3 Exponential Family
3/56
Comparison of Estimators
Let X1, . . . , Xn be a random sample from a population with pdf
f(·; θ), θ ∈ Θ ⊂ Rd. Below, we write X = (X1, . . . , Xn). How to compare
different estimators of a parameter of interest, say g(θ) ∈ R?
- Loss function: L(·, ·) : Θ × A → R₊, where A ⊂ R is an action space for the estimation of g(θ), such that L(θ, g(θ)) = 0. For example,
    L(θ, a) = (a − g(θ))²,  L(θ, a) = |a − g(θ)|.
- Risk function: for an estimator δ(X) of g(θ),
    R(θ, δ) = E_θ L(θ, δ(X)).
  In the case of squared error loss, the risk R(θ, δ) is the mean squared error of δ(X), i.e., R(θ, δ) = E_θ (δ(X) − g(θ))².
4/56
Difficulty with Uniform Comparison
- One would prefer δ₁ to δ₂ if
    R(θ, δ₁) ≤ R(θ, δ₂) for all θ ∈ Θ, and R(θ, δ₁) < R(θ, δ₂) for some θ ∈ Θ.
- The difficulty is that there exists no estimator that is best in this sense, i.e., there exists no δ₀ such that
    R(θ, δ₀) ≤ R(θ, δ) for all δ and for all θ ∈ Θ.
- Why? If such a δ₀ existed, then comparing it with the constant estimator δ ≡ g(θ′), whose risk vanishes at θ = θ′, for each fixed θ′ would force R(θ, δ₀) ≡ 0 on Θ, which is impossible in any nontrivial model.
5/56
Optimal Estimation
- Restricted class of estimators: one may find an estimator δ₀ in the class of unbiased estimators of g(θ) such that
    E_θ(δ₀(X) − g(θ))² ≤ E_θ(δ(X) − g(θ))² for all θ ∈ Θ
  for any unbiased estimator δ(X). If it exists, such an estimator is called the UMVUE (Uniformly Minimum Variance Unbiased Estimator).
- Global measures of performance: the estimator that minimizes the maximum risk r(δ) = max_{θ∈Θ} R(θ, δ) is called a minimax estimator. The estimator that minimizes the average risk
    r(δ; π) = ∫_Θ R(θ, δ) π(θ) dµ(θ)
  for a weight function π is called a Bayes estimator.
6/56
Example: Minimax Estimation
Let X₁, …, Xₙ be a random sample from N(µ, σ²), θ = (µ, σ²) ∈ R × R₊. Consider the loss function L(θ, a) = (a/σ² − 1)² for the estimation of g(θ) = σ². Suppose we want to find an estimator that minimizes the maximum risk among all estimators in the class
  {δ_c : δ_c(x) = c ∑_{i=1}^n (x_i − x̄)², c > 0}.

- Let S² = ∑_{i=1}^n (X_i − X̄)². Since S²/σ² ~ χ²(n − 1), it follows that
    R(θ, δ_c) = E_θ(cS²/σ² − 1)²
    = c² Var(χ²(n − 1)) + (c E(χ²(n − 1)) − 1)²
    = 2c²(n − 1) + (c(n − 1) − 1)²,
  which does not depend on θ, so it equals max_θ R(θ, δ_c).
- Since argmin_{c>0} max_θ R(θ, δ_c) = 1/(n + 1), we get σ̂² = ∑_{i=1}^n (X_i − X̄)²/(n + 1) as the minimax estimator among the class.
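As a quick numerical sanity check (a sketch, not part of the original slides; the value n = 10 and the grid are arbitrary choices), the constant-in-θ risk above can be minimized on a grid and compared with 1/(n + 1):

```python
import numpy as np

def risk(c, n):
    # R(theta, delta_c) = 2(n-1)c^2 + (c(n-1) - 1)^2, free of theta
    return 2 * (n - 1) * c**2 + (c * (n - 1) - 1) ** 2

n = 10
grid = np.linspace(0.001, 0.5, 200001)
c_star = grid[np.argmin(risk(grid, n))]
print(c_star, 1 / (n + 1))  # numerical minimizer matches 1/(n+1)
```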
7/56
7.1 Optimality
7.2 Sufficiency and Completeness
7.3 Exponential Family
8/56
The Idea of Sufficiency
Suppose that we observe X1 and X2 that are i.i.d. Bernoulli(θ) random
variables, where 0 < θ < 1.
- The distribution of Y = X₁ + X₂:
    P_θ(Y = 0) = (1 − θ)²,  P_θ(Y = 1) = 2θ(1 − θ),  P_θ(Y = 2) = θ².
- When we observe Y, we may produce (X₁*, X₂*), without knowledge of the true θ, that has the same distribution as the original (X₁, X₂), as follows: (i) put (X₁*, X₂*) = (0, 0) when Y = 0; (ii) put (X₁*, X₂*) = (1, 1) when Y = 2; (iii) when Y = 1, conduct a randomized experiment and put (X₁*, X₂*) = (1, 0) or (0, 1), each with probability 1/2.
- Thus, for any estimator δ(X₁, X₂) of g(θ) one may find an estimator that depends only on Y rather than on (X₁, X₂) but has the same risk as δ(X₁, X₂).
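The three-case reconstruction above can be checked by simulation. This is an illustrative sketch (θ = 0.3 and the replication count are arbitrary), not part of the original slides:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 0.3        # used only to generate the data; the reconstruction never sees it
n_sim = 200000

# Original pairs and the sufficient statistic Y = X1 + X2
x = rng.binomial(1, theta, size=(n_sim, 2))
y = x.sum(axis=1)

# Rebuild (X1*, X2*) from Y alone: deterministic when Y = 0 or 2,
# a fair coin flip between (1, 0) and (0, 1) when Y = 1
coin = rng.integers(0, 2, size=n_sim)
x1_star = np.where(y == 2, 1, np.where(y == 1, coin, 0))
x2_star = y - x1_star

# The empirical joint distributions of (X1, X2) and (X1*, X2*) agree
for a in (0, 1):
    for b in (0, 1):
        p_orig = np.mean((x[:, 0] == a) & (x[:, 1] == b))
        p_star = np.mean((x1_star == a) & (x2_star == b))
        print((a, b), round(p_orig, 3), round(p_star, 3))
```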
9/56
Sufficient Statistic
Let X1, . . . , Xn be a random sample from a population with pdf
f(·; θ), θ ∈ Θ ⊂ Rd. We write X = (X1, . . . , Xn) below.
- Sufficient statistic for θ ∈ Θ: a statistic Y = u(X) is called a sufficient statistic if the conditional distribution of X given Y does not depend on θ ∈ Θ, i.e., if
    P_θ₁(X ∈ A | Y = y) = P_θ₂(X ∈ A | Y = y) for all θ₁, θ₂ ∈ Θ,
  for all A and for all y.
- This definition is more general than those in the textbook.
- Read Remark 7.2.1 on p. 391.
10/56
Factorization Theorem
A statistic Y = u(X) is a sufficient statistic for θ ∈ Θ if and only if there exist functions f₁ and f₂ such that

  ∏_{i=1}^n f(x_i; θ) = f₁(u(x), θ) · f₂(x) for all x and for all θ ∈ Θ.

Proof: We give a proof for the case of discrete X_i. For the necessity part,

  ∏_{i=1}^n f(x_i; θ) = P_θ(X = x, Y = u(x)) = P(X = x | Y = u(x)) · P_θ(Y = u(x)).

The first factor on the right-hand side does not depend on θ ∈ Θ. For the sufficiency part,

  P_θ(X = x | Y = y) = P_θ(X = x, u(X) = y) / P_θ(u(X) = y)
  = P_θ(X = x) I(u(x) = y) / ∑_{z: u(z)=y} P_θ(X = z)
  = f₂(x) I(u(x) = y) / ∑_{z: u(z)=y} f₂(z).
11/56
Sufficient Statistic: Examples
- Bernoulli(θ), θ ∈ (0, 1):
    ∏_{i=1}^n f(x_i; θ) = ∏_{i=1}^n θ^{x_i}(1 − θ)^{1−x_i} = θ^{∑x_i}(1 − θ)^{n−∑x_i},
  so that Y = ∑_{i=1}^n X_i is a sufficient statistic for θ ∈ (0, 1).
- Gamma(2, θ), θ > 0:
    ∏_{i=1}^n f(x_i; θ) = ∏_{i=1}^n θ^{−2} x_i e^{−x_i/θ} I_{(0,∞)}(x_i)
    = θ^{−2n} exp(−∑_{i=1}^n x_i/θ) · (∏_{i=1}^n x_i) · I_{(0,∞)}(min x_i),
  so that Y = ∑_{i=1}^n X_i is a sufficient statistic for θ > 0.
12/56
Sufficient Statistic: Examples
- Gamma(α, β), θ = (α, β) ∈ R₊ × R₊:
    ∏_{i=1}^n f(x_i; θ) = ∏_{i=1}^n Γ(α)^{−1} β^{−α} x_i^{α−1} e^{−x_i/β} I_{(0,∞)}(x_i)
    = Γ(α)^{−n} β^{−nα} (∏_{i=1}^n x_i)^{α−1} exp(−∑_{i=1}^n x_i/β) · I_{(0,∞)}(min x_i),
  so that Y = (∏_{i=1}^n X_i, ∑_{i=1}^n X_i) is a sufficient statistic for θ ∈ R₊ × R₊.
- Uniform(θ₁ − θ₂, θ₁ + θ₂), θ = (θ₁, θ₂) ∈ R × R₊: assume n ≥ 2.
    ∏_{i=1}^n f(x_i; θ) = (2θ₂)^{−n} I_{(θ₁−θ₂,∞)}(x_{(1)}) I_{(−∞,θ₁+θ₂)}(x_{(n)}),
  so that Y = (X_{(1)}, X_{(n)}) is a sufficient statistic for θ ∈ R × R₊.
- Exp(θ, 1), θ ∈ R: Y = X_{(1)} is a sufficient statistic for θ ∈ R.
13/56
Minimal Sufficient Statistic
- There are many sufficient statistics for a given model. As an extreme example, X = (X₁, …, Xₙ) itself is a sufficient statistic. As another example, consider the case where X₁, X₂, X₃ are i.i.d. Bernoulli(θ) random variables and θ ∈ (0, 1). For the latter model, sufficient statistics include (i) (X₁, X₂, X₃); (ii) (X₁ + X₂, X₃); (iii) (X₁ + X₃, X₂); (iv) (X₁, X₂ + X₃); (v) X₁ + X₂ + X₃.
- Minimal sufficient statistic: a sufficient statistic is called a minimal sufficient statistic (MSS) if it is a function of any sufficient statistic.
14/56
Properties of Sufficient Statistic
- Any statistic that is a 1-1 function of a sufficient statistic for θ ∈ Θ is also a sufficient statistic for θ ∈ Θ.
  Proof: Let Y be a sufficient statistic for θ ∈ Θ and let W = g(Y) for a known 1-1 function g. Then the event (W = w) equals (Y = g⁻¹(w)) for any w, so that for any A, w and θ ∈ Θ it holds that
    P_θ(X ∈ A | W = w) = P(X ∈ A | Y = g⁻¹(w)),
  the latter not depending on θ ∈ Θ.
- Any statistic that is a 1-1 function of an MSS is also an MSS.
- MLE and SS: the unique MLE of θ, when it exists, is a function of any sufficient statistic for θ ∈ Θ.
15/56
Properties of Sufficient Statistic
Proof: Let u(X) be a sufficient statistic for θ ∈ Θ. Then, for the unique MLE θ̂,

  θ̂ = argmax_{θ∈Θ} ∏_{i=1}^n f(x_i; θ)
  = argmax_{θ∈Θ} f₁(u(x), θ) f₂(x)
  = argmax_{θ∈Θ} f₁(u(x), θ) = (a function of u(x)).

- Existence of MSS: if the MLE of θ ∈ Θ is unique and it is a sufficient statistic for θ ∈ Θ, then it is a minimal sufficient statistic for θ ∈ Θ.
  Proof: immediate from the MLE-and-SS property above.
- Suppose that there exists θ₀ ∈ Θ such that supp(f(·; θ)) ⊂ supp(f(·; θ₀)) for all θ ∈ Θ. Then the statistic T(X), defined as the (random) function on Θ given by T(X)(θ) = ∏_{i=1}^n (f(X_i; θ) / f(X_i; θ₀)), is an MSS.
16/56
MSS: An Example
Let X₁, …, Xₙ (n ≥ 2) be a random sample from Uniform(θ₁ − θ₂, θ₁ + θ₂), θ = (θ₁, θ₂) ∈ R × R₊. For this model, Y = (X_{(1)}, X_{(n)}) is an MSS.

Proof: The sufficiency was established before. Let the model be reparametrized by η = (η₁, η₂), where η₁ = θ₁ − θ₂ and η₂ = θ₁ + θ₂. Then

  ∏_{i=1}^n f(x_i; η) = (η₂ − η₁)^{−n} I_{(−∞, x_{(1)}]}(η₁) I_{[x_{(n)}, ∞)}(η₂).

Clearly, (η̂₁, η̂₂) = (X_{(1)}, X_{(n)}) is the unique MLE of (η₁, η₂), so that (θ̂₁, θ̂₂) defined by

  θ̂₁ = (X_{(1)} + X_{(n)})/2,  θ̂₂ = (X_{(n)} − X_{(1)})/2

is the unique MLE of (θ₁, θ₂) = ((η₁ + η₂)/2, (η₂ − η₁)/2). Since (θ̂₁, θ̂₂) is a 1-1 function of the SS (X_{(1)}, X_{(n)}), it is also an SS and thus an MSS. This establishes that (X_{(1)}, X_{(n)}) is an MSS since it is a 1-1 function of (θ̂₁, θ̂₂).
17/56
Rao-Blackwell Theorem
- Rao-Blackwell theorem: let X₁, …, Xₙ be a random sample from a population with pdf f(·; θ), θ ∈ Θ ⊂ R^d. Let Y = u(X) be a sufficient statistic. Then, for any estimator η̂(X) of η = g(θ) with finite second moment,
    η*(Y) = E(η̂(X) | Y)
  is a statistic with the properties that (i) E_θ(η*) = E_θ(η̂); (ii) var_θ(η*) ≤ var_θ(η̂); (iii) MSE_θ(η*) ≤ MSE_θ(η̂).
- The Rao-Blackwell theorem tells us that one may obtain a better estimator, in terms of MSE, by conditioning on a sufficient statistic. By the theorem, if η̂ is an unbiased estimator of η, then η* is also an unbiased estimator, with a variance no larger than that of η̂.
- Proof of the theorem: var(W) = var(E(W | V)) + E(var(W | V)).
18/56
Rao-Blackwell theorem

[Figure: within the space of random variables, E(W | V) is the projection of W onto the subspace of functions of V, and E(W) is the further projection onto the subspace of constants.]
19/56
Example: Rao-Blackwellization
Let X₁, …, Xₙ (n ≥ 2) be a random sample from U[0, θ], θ > 0. Take θ̂ = 2X̄ as an unbiased estimator of θ. We know that X_{(n)} is a sufficient statistic for θ > 0. By the Rao-Blackwell theorem, θ* ≡ E(2X̄ | X_{(n)}) is an UE of θ with variance less than or equal to that of θ̂. For 1 ≤ r ≤ n − 1, we note that

  pdf_{X_{(r)}|X_{(n)}}(x | y) = [(n − 1)! / ((r − 1)!(n − r − 1)!)] (x/y)^{r−1} (1 − x/y)^{n−r−1} (1/y) I_{(0,y)}(x)

and that, recalling the pdf of Beta(r + 1, n − r),

  ∫₀^y x · [(n − 1)! / ((r − 1)!(n − r − 1)!)] (x/y)^{r−1} (1 − x/y)^{n−r−1} (1/y) dx
  = (ry/n) ∫₀¹ [n! / (r!(n − r − 1)!)] t^r (1 − t)^{n−r−1} dt
  = (r/n) y.
20/56
Example: Rao-Blackwellization
Thus, we get

  2E(X̄ | X_{(n)} = y) = (2/n) (y + ∑_{r=1}^{n−1} E(X_{(r)} | X_{(n)} = y))
  = (2/n) (y + ∑_{r=1}^{n−1} (r/n) y)
  = ((n + 1)/n) y.

Indeed, θ* = (n + 1)X_{(n)}/n and

  var_θ(θ*) = ((n + 1)/n)² var_θ(X_{(n)}) = θ²/(n(n + 2)) < θ²/(3n) = var_θ(θ̂).
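The variance comparison above can be illustrated by simulation. This is a sketch, not part of the original slides; θ = 2, n = 5 and the replication count are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(1)
theta, n, n_sim = 2.0, 5, 400000

x = rng.uniform(0, theta, size=(n_sim, n))
est_naive = 2 * x.mean(axis=1)         # 2*Xbar, unbiased for theta
est_rb = (n + 1) / n * x.max(axis=1)   # its Rao-Blackwellization (n+1)X(n)/n

print(est_naive.mean(), est_rb.mean())          # both close to theta
print(est_naive.var(), theta**2 / (3 * n))      # ~ theta^2/(3n)
print(est_rb.var(), theta**2 / (n * (n + 2)))   # ~ theta^2/(n(n+2)), smaller
```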
21/56
Uniformly Minimum Variance Unbiased Estimator
We have seen that taking the conditional expectation of a given unbiased estimator, given a sufficient statistic, never increases its variance.

- UMVUE: an estimator η̂ of η = g(θ) is called the uniformly minimum variance unbiased estimator if it is itself unbiased and var_θ(η̂) ≤ var_θ(η̃) for all θ ∈ Θ and for any unbiased estimator η̃ of η.
- Uniqueness of the UMVUE: if there exists an unbiased estimator with finite variance, then the UMVUE is unique. Let η̂₁ and η̂₂ be UMVUEs of η. Since var_θ(η̂₁) = var_θ(η̂₂) ≤ var_θ((η̂₁ + η̂₂)/2) < ∞ for all θ ∈ Θ, expanding the variance of the average shows var_θ(η̂₁) ≤ cov_θ(η̂₁, η̂₂), and hence
    var_θ(η̂₁ − η̂₂) ≤ 0 (and thus = 0) for all θ ∈ Θ.
  This implies P_θ(η̂₁ − η̂₂ = c) = 1 for all θ ∈ Θ for some constant c. That constant equals zero since both η̂₁ and η̂₂ are unbiased.
22/56
Complete Statistic
Let X1, . . . , Xn be a random sample from a population with pdf f(·; θ),
θ ∈ Θ ⊂ Rd. The following notion of completeness facilitates the derivation of
UMVUE.
- Complete statistic for θ ∈ Θ: a statistic Y = u(X) is called a complete statistic for θ ∈ Θ, and {pdf_Y(·; θ) : θ ∈ Θ} is called a complete family of distributions, if
    E_θ ϕ(Y) = 0 for all θ ∈ Θ implies P_θ(ϕ(Y) = 0) = 1 for all θ ∈ Θ.
- A complete statistic Y is "complete" in the sense that any non-constant function of Y has a non-constant expected value (as a function of θ).
- Complete sufficient statistic: a statistic is called a complete sufficient statistic (CSS) for θ ∈ Θ if it is sufficient and complete for θ ∈ Θ.
23/56
Rao-Blackwell-Lehmann-Scheffé Theorem
Let X₁, …, Xₙ be a random sample from a population with pdf f(·; θ), θ ∈ Θ ⊂ R^d. Let Y = u(X) be a CSS for θ ∈ Θ. Assume that there exists an unbiased estimator with finite variance. Then, (i) for any unbiased estimator η̂₀ of η = g(θ) with finite variance, η̂ = E(η̂₀ | Y) is the UMVUE; (ii) any function of Y, say ϕ(Y), is the UMVUE if it is unbiased.

Proof: (i) Clearly, η̂ is an UE. For an arbitrary unbiased estimator η̃ with finite variance,

  var_θ(η̃) ≥ var_θ(E(η̃ | Y)) = var_θ(η̂) for all θ ∈ Θ,

the inequality holding due to the Rao-Blackwell theorem and the equality holding due to the completeness of Y: E(η̃ | Y) and η̂ are both unbiased functions of Y, so completeness gives P_θ(E(η̃ | Y) = η̂) = 1 for all θ.

(ii) Replace η̂ by ϕ(Y) in the proof of (i).

Remark: an UE that is a function of a CSS is unique and is the UMVUE.
24/56
Methods of Finding UMVUE
- Method 1: Rao-Blackwellization with a CSS.
  (1) Find a CSS Y = u(X).
  (2) Find an 'easy' UE η̂₀.
  (3) Compute the conditional expectation E(η̂₀ | Y).
- Method 2: Trial and error.
  (1) Find a CSS Y = u(X).
  (2) Solve E_θ ϕ(Y) = g(θ) for all θ ∈ Θ with respect to ϕ, or try some ϕ(Y) and check unbiasedness.
25/56
CSS and UMVUE: Bernoulli Model
Let X₁, …, Xₙ be a random sample from Bernoulli(θ), θ ∈ [0, 1].

- Finding a CSS: by the factorization theorem, Y = ∑_{i=1}^n X_i is an SS. To check whether it is also complete for θ ∈ [0, 1], suppose that E_θ ϕ(Y) = 0 for all θ ∈ [0, 1] for a function ϕ. Then, since Y ~ Binomial(n, θ), we have
    ∑_{y=0}^n ϕ(y) (n choose y) θ^y (1 − θ)^{n−y} = 0 for all θ ∈ [0, 1]
    ⇒ ∑_{y=0}^n ϕ(y) (n choose y) λ^y = 0 for all λ > 0 (dividing by (1 − θ)^n and setting λ = θ/(1 − θ))
    ⇒ ϕ(y) (n choose y) = 0 for all y (uniqueness of polynomial coefficients)
    ⇒ ϕ(y) = 0 for all y.
- Estimation of the mean: since E_θ(Y/n) = θ for all θ, we conclude that η̂ = X̄ is the UMVUE of θ.
26/56
CSS and UMVUE: Bernoulli Model
- Estimation of the variance (Method 1): for the estimation of η = θ(1 − θ), consider η̂₀ = X₁(1 − X₂) as an UE of η. Then we get
    E(η̂₀ | Y = y) = P(X₁ = 1, X₂ = 0 | ∑_{i=1}^n X_i = y)
    = (n−2 choose y−1) / (n choose y) = y(n − y)/(n(n − 1)),
  so that η̂ = nX̄(1 − X̄)/(n − 1) is the UMVUE.
- Estimation of the variance (Method 2): try the MLE of η: η̂_MLE = X̄(1 − X̄). We get
    E_θ(X̄(1 − X̄)) = E_θ X̄ − [var_θ(X̄) + (E_θ X̄)²] = (n − 1)θ(1 − θ)/n.
  Thus η̂ = nX̄(1 − X̄)/(n − 1) is an UE that is a function of the CSS, concluding that it is the UMVUE.
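Since Y is Binomial(n, θ), the unbiasedness claims above can be verified exactly by summing over the pmf. This is an illustrative sketch (n = 7 and the θ values are arbitrary), not part of the original slides:

```python
from math import comb

def expected_value(phi, n, theta):
    # E_theta phi(Y) for Y ~ Binomial(n, theta), computed exactly
    return sum(phi(y) * comb(n, y) * theta**y * (1 - theta)**(n - y)
               for y in range(n + 1))

n = 7
umvue = lambda y: (y / n) * (n - y) / (n - 1)   # y(n-y)/(n(n-1)) = n*Xbar(1-Xbar)/(n-1)
mle = lambda y: (y / n) * (1 - y / n)           # Xbar(1 - Xbar), biased downward

for theta in (0.2, 0.5, 0.8):
    print(theta, expected_value(umvue, n, theta), theta * (1 - theta),
          expected_value(mle, n, theta))
```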
27/56
CSS and UMVUE: Poisson Model
Let X₁, …, Xₙ be a random sample from Poisson(θ), θ > 0.

- Finding a CSS: by the factorization theorem, Y = ∑_{i=1}^n X_i is an SS. To check whether it is also complete for θ > 0, suppose that E_θ ϕ(Y) = 0 for all θ > 0 for a function ϕ. Then, since Y ~ Poisson(nθ), we have
    ∑_{y=0}^∞ ϕ(y) (nθ)^y e^{−nθ}/y! = 0 for all θ > 0
    ⇒ ∑_{y=0}^∞ [ϕ(y)/y!] λ^y = 0 for all λ > 0
    ⇒ ϕ(y)/y! = 0 for all y (uniqueness of power series coefficients)
    ⇒ ϕ(y) = 0 for all y.
- Estimation of the mean: since E_θ(Y/n) = θ for all θ, we conclude that η̂ = X̄ is the UMVUE of θ.
28/56
CSS and UMVUE: Poisson Model
- Estimation of η = e^{−2θ} (assume n > 2; the manipulation below divides by n − 2): we solve E_θ ϕ(Y) = e^{−2θ} for all θ with respect to ϕ. Note that
    ∑_{y=0}^∞ ϕ(y) (nθ)^y e^{−nθ}/y! = e^{−2θ} for all θ
    ⇔ ∑_{y=0}^∞ ϕ(y) (n/(n − 2))^y ((n − 2)θ)^y / y! = e^{(n−2)θ} for all θ
    ⇔ ∑_{y=0}^∞ ϕ(y) (n/(n − 2))^y ((n − 2)θ)^y / y! = ∑_{y=0}^∞ ((n − 2)θ)^y / y! for all θ.
  This gives ϕ(y) = (1 − 2/n)^y, so that the UMVUE of η is given by
    η̂ = ((n − 2)/n)^{nX̄}.
- Remark: as n → ∞, the UMVUE η̂ is approximated by the MLE e^{−2X̄}, so one may expect the UMVUE to behave nicely for large n.
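The unbiasedness of ϕ(Y) = (1 − 2/n)^Y can be checked numerically by summing the Poisson(nθ) series. This is a sketch, not part of the original slides; the (n, θ) pairs are arbitrary:

```python
import math

def e_phi(n, theta, terms=400):
    # E_theta (1 - 2/n)^Y for Y ~ Poisson(n*theta), via a truncated series
    lam = n * theta
    term = math.exp(-lam)   # Poisson pmf at y = 0
    total = 0.0
    for y in range(terms):
        total += (1 - 2 / n) ** y * term
        term *= lam / (y + 1)
    return total

for n, theta in [(5, 0.7), (12, 1.3)]:
    print(n, theta, e_phi(n, theta), math.exp(-2 * theta))  # last two columns agree
```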
29/56
CSS and UMVUE: Exponential Model
Let X₁, …, Xₙ be a random sample from Exponential(θ), θ > 0, with pdf f(x₁; θ) = θ^{−1}e^{−x₁/θ}, x₁ > 0. By the factorization theorem, Y = ∑_{i=1}^n X_i is an SS. To check whether it is also complete for θ > 0, suppose that E_θ ϕ(Y) = 0 for all θ > 0 for a function ϕ. Then, since Y ~ Gamma(n, θ), we have

  ∫₀^∞ ϕ(y) [y^{n−1}/(Γ(n)θ^n)] e^{−y/θ} dy = 0 for all θ > 0
  ⇒ ∫₀^∞ ϕ(y) y^{n−1} e^{−λy} dy = 0 for all λ > 0
  ⇒ ϕ(y) y^{n−1} = 0 for a.e. y > 0 (uniqueness of Laplace transform)
  ⇒ ϕ(y) = 0 for a.e. y > 0.

Since E_θ(Y/n) = θ for all θ, we conclude that η̂ = X̄ is the UMVUE of θ.
30/56
CSS and UMVUE: Uniform[0, θ] Model
Let X₁, …, Xₙ be a random sample from Uniform[0, θ], θ > 0. By the factorization theorem, it can be verified that Y = X_{(n)} is an SS. To check whether it is also complete for θ > 0, suppose that E_θ ϕ(Y) = 0 for all θ > 0 for a function ϕ. Then, since the pdf of Y is given by pdf_Y(y; θ) = nθ^{−n}y^{n−1}I_{[0,θ]}(y), we have

  ∫₀^θ ϕ(y) nθ^{−n} y^{n−1} dy = 0 for all θ > 0
  ⇒ ∫₀^θ ϕ(y) y^{n−1} dy = 0 for all θ > 0
  ⇒ ϕ(θ) θ^{n−1} = 0 for a.e. θ > 0 (Fundamental Theorem of Calculus)
  ⇒ ϕ(y) = 0 for a.e. y > 0.

Since E_θ(Y) = nθ/(n + 1) for all θ, we conclude that η̂ = (n + 1)X_{(n)}/n is the UMVUE of θ.
31/56
CSS and UMVUE: Uniform[−θ, θ] Model
Let X₁, …, Xₙ (n ≥ 2) be a random sample from Uniform[−θ, θ], θ > 0.

- By the factorization theorem, it can be verified that (X_{(1)}, X_{(n)}) is an SS. But it is not a complete statistic for θ > 0: one may find a non-trivial function ϕ of (X_{(1)}, X_{(n)}) such that E_θ ϕ(X_{(1)}, X_{(n)}) is identically zero. For example, ϕ(X_{(1)}, X_{(n)}) = X_{(n)}/X_{(1)} − a_n, where a_n = E(U_{(n)}/U_{(1)}) for a random sample (U_i : 1 ≤ i ≤ n) from Uniform[−1, 1].
- In fact, for this model, Y = max_{1≤i≤n} |X_i| is a CSS and η̂ = (n + 1) max_{1≤i≤n} |X_i|/n is the UMVUE of θ. To see this, note that the joint density of X at x equals (2θ)^{−n} I_{[0,θ]}(max_{1≤i≤n} |x_i|), which shows that Y is an SS. Since the |X_i| are i.i.d. Uniform[0, θ], one may then show that Y is also complete, and thus η̂ is the UMVUE.
32/56
Ancillary Statistic
Let X1, . . . , Xn be a random sample from a population with pdf f(·; θ),
θ ∈ Θ ⊂ Rd.
- Ancillary statistic: a statistic Z = v(X) is called an ancillary statistic for θ ∈ Θ if P_θ(Z ∈ A) does not depend on θ ∈ Θ for all A, i.e.,
    P_θ₁(Z ∈ A) = P_θ₂(Z ∈ A) for all θ₁, θ₂ ∈ Θ and all A.
- Basu's theorem (independence of CSS and AS): if Y = u(X) is a CSS and Z = v(X) is an AS for θ ∈ Θ, then Y and Z are independent under P_θ for all θ ∈ Θ, i.e.,
    P_θ(Y ∈ A, Z ∈ B) = P_θ(Y ∈ A) P(Z ∈ B) for all A, B and for all θ ∈ Θ.
33/56
Proof of Basu’s Theorem
Let A and B be arbitrary sets. Since Y is a sufficient statistic for θ ∈ Θ, ϕ(y) ≡ P(Z ∈ B | Y = y) does not depend on θ ∈ Θ for all y. Also, we note that E_θ(ϕ(Y)) = P(Z ∈ B), which does not depend on θ either, because of the ancillarity of Z. Thus,

  E_θ(ϕ(Y) − P(Z ∈ B)) = 0 for all θ ∈ Θ.

Due to the completeness of Y, this entails ϕ(y) = P(Z ∈ B) for a.e. y, so that

  P_θ(Y ∈ A, Z ∈ B) = ∫_A P(Z ∈ B | Y = y) · pdf_Y(y; θ) dµ(y) = P(Z ∈ B) · P_θ(Y ∈ A)

for all θ ∈ Θ.
34/56
Ancillary Statistic: Examples
- N(θ, 1), θ ∈ R:
    (X₁ − X̄, …, Xₙ − X̄) =_d (Z₁ − Z̄, …, Zₙ − Z̄)
  for Z_i i.i.d. from N(0, 1), so the vector of deviations is ancillary.
- Exp(θ, 1), θ ∈ R:
    (X₁ − X_{(1)}, …, Xₙ − X_{(1)}) =_d (Z₁ − Z_{(1)}, …, Zₙ − Z_{(1)})
  for Z_i i.i.d. from Exp(0, 1).
- Gamma(α, β), β > 0 with α known (sample of size n + 1):
    (X₁/∑_{i=1}^{n+1} X_i, …, Xₙ/∑_{i=1}^{n+1} X_i) =_d (Z₁/∑_{i=1}^{n+1} Z_i, …, Zₙ/∑_{i=1}^{n+1} Z_i)
  for Z_i i.i.d. from Gamma(α, 1).
35/56
Ancillary Statistic: Examples
- Uniform(0, θ), θ > 0: X_{(n)}/X_{(1)} =_d Z_{(n)}/Z_{(1)} for Z_i i.i.d. from Uniform(0, 1).
- N(µ, σ²), (µ, σ²) ∈ R × R₊:
    ((X₁ − X̄)/S_X, …, (Xₙ − X̄)/S_X) =_d ((Z₁ − Z̄)/S_Z, …, (Zₙ − Z̄)/S_Z)
  for Z_i i.i.d. from N(0, 1), where S²_X = ∑_{i=1}^n (X_i − X̄)²/(n − 1) and S²_Z = ∑_{i=1}^n (Z_i − Z̄)²/(n − 1).
- Exp(µ, σ), (µ, σ) ∈ R × R₊:
    (X₁ − X_{(1)})/∑_{i=1}^n (X_i − X_{(1)}) =_d (Z₁ − Z_{(1)})/∑_{i=1}^n (Z_i − Z_{(1)})
  for Z_i i.i.d. from Exp(0, 1).
36/56
Basu’s Theorem: Examples
- Let X₁, …, Xₙ be a random sample from Exp(µ, σ), µ ∈ R, σ ∈ R₊. Then X_{(1)} and ∑_{i=1}^n (X_i − X_{(1)}) are independent.
  Proof: Fix σ = σ₀. Then X_{(1)} is a CSS for µ ∈ R and ∑_{i=1}^n (X_i − X_{(1)}) is an AS for µ ∈ R. Thus they are independent under P_{µ,σ₀} for all µ ∈ R. Since this holds for any choice of σ₀, they are independent under the entire model.
- Let X₁, …, X_{n+1} be a random sample from Gamma(α, β), α > 0, β > 0. Then ∑_{i=1}^{n+1} X_i is independent of
    Z ≡ (X₁/(X₁ + ⋯ + X_{n+1}), …, Xₙ/(X₁ + ⋯ + X_{n+1})).
  Proof: Fix α = α₀. Then ∑_{i=1}^{n+1} X_i is a CSS for β > 0 (check this!). Since Z is an AS for β > 0, the two statistics are independent under P_{α₀,β} for all β > 0, and thus under P_{α,β} for all α > 0 and β > 0.
37/56
Use Of Ancillary Statistic To Find UMVUE
Let X₁, …, Xₙ be a random sample from Exp(θ), θ > 0. For a given a > 0 we want to find the UMVUE of (the system reliability) η = P_θ(X₁ > a) = e^{−a/θ}. We have seen that Y = ∑_{i=1}^n X_i is a CSS for θ > 0. An easy UE of η is η̂₀ = I_{(a,∞)}(X₁). Thus η̂ = E(η̂₀ | Y) is the UMVUE. Now we note that X₁/Y ~ Beta(1, n − 1) is an AS, so that it is independent of Y. From this we get

  E(η̂₀ | Y = y) = P(X₁ > a | Y = y)
  = P(X₁/Y > a/y | Y = y) = P(X₁/Y > a/y)
  = ∫_{a/y}^1 [Γ(n)/(Γ(1)Γ(n − 1))] z⁰ (1 − z)^{n−2} dz
  = (1 − a/y)^{n−1} I_{(a,∞)}(y).

Thus η̂ = (1 − a/Y)^{n−1} I_{(a,∞)}(Y) is the UMVUE.
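The reliability UMVUE above can be checked against the target e^{−a/θ} by simulation. This is a sketch, not part of the original slides; θ, n, a and the replication count are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(2)
theta, n, a, n_sim = 1.5, 8, 1.0, 500000

x = rng.exponential(theta, size=(n_sim, n))
y = x.sum(axis=1)

# UMVUE (1 - a/Y)^(n-1) * I(Y > a); clip Y at a so the masked branch stays finite
umvue = np.where(y > a, (1 - a / np.maximum(y, a)) ** (n - 1), 0.0)

print(umvue.mean(), np.exp(-a / theta))   # both estimate P(X1 > a) = e^{-a/theta}
print(np.mean(x[:, 0] > a))               # the 'easy' UE it Rao-Blackwellizes
```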
38/56
7.1 Optimality
7.2 Sufficiency and Completeness
7.3 Exponential Family
39/56
Exponential Family
A family of distributions {f(·; θ) : θ ∈ Θ} for Θ ⊂ R^d is called an exponential family if
(i) the support of the density f(·; θ) does not depend on θ ∈ Θ;
(ii) the density has the following form:
  f(x; θ) = exp(η(θ)ᵀT(x) − B(θ)) · h(x)
for some known functions η = (η₁, …, η_k)ᵀ, T = (T₁, …, T_k)ᵀ, B and h.
An exponential family is called a k-parameter regular exponential family if
(iii) η(Θ) ≡ {η(θ) : θ ∈ Θ} ⊂ R^k contains a k-dimensional open rectangle.
40/56
Canonical Form: Reparametrization by Natural Parameter
- The (d-variate) real-valued function B depends on θ only through η ≡ η(θ), i.e., B(θ) = A(η(θ)) for some (k-variate) real-valued function A:
    1 = ∫_X exp(η(θ)ᵀT(x) − B(θ)) h(x) dµ(x)
    = e^{−B(θ)} ∫_X exp(η(θ)ᵀT(x)) h(x) dµ(x)
    =: e^{−B(θ)} · e^{A(η(θ))}.
- This means that the density f(·; θ) actually depends on θ only through η:
    f(x; η) = exp(ηᵀT(x) − A(η)) h(x).
  The parameter η is called a natural parameter.
- Natural parameter space: the maximal possible set of η is given by
    {η ∈ R^k : ∫_X exp(ηᵀT(x)) h(x) dµ(x) < ∞}.
41/56
Exponential Family: Examples
- Bernoulli(θ), 0 < θ < 1: for x ∈ {0, 1},
    f(x; θ) = exp(x log(θ/(1 − θ)) + log(1 − θ)) = exp(ηx − A(η)),
  where η = log(θ/(1 − θ)) and A(η) = log(1 + e^η). Thus it is a 1-parameter regular exponential family.
- N(µ, σ²), θ ≡ (µ, σ²) ∈ R × R₊: for x ∈ R,
    f(x; θ) = exp(−x²/(2σ²) + (µ/σ²)x − µ²/(2σ²) − log(2πσ²)/2)
    = exp(η₁T₁(x) + η₂T₂(x) − A(η₁, η₂)) · (2π)^{−1/2},
  where η₁ = −1/(2σ²), η₂ = µ/σ², A(η₁, η₂) = −η₂²/(4η₁) + log(−1/(2η₁))/2 and T₁(x) = x², T₂(x) = x. Thus it is a 2-parameter regular exponential family.
42/56
Exponential Family: Examples
- Two independent normal populations with a common mean: let X and Y be independent with N(µ, σ₁²) and N(µ, σ₂²) distributions, respectively, for θ ≡ (µ, σ₁², σ₂²) ∈ R × R₊². In this case, for (x, y) ∈ R²,
    f(x, y; θ) = exp(−x²/(2σ₁²) + (µ/σ₁²)x − y²/(2σ₂²) + (µ/σ₂²)y
      − µ²/(2σ₁²) − µ²/(2σ₂²) − log(2πσ₁²)/2 − log(2πσ₂²)/2)
    = exp(η₁T₁(x, y) + η₂T₂(x, y) + η₃T₃(x, y) + η₄T₄(x, y) − A(η₁, …, η₄)) · h(x, y),
  where η₁ = −1/(2σ₁²), η₂ = µ/σ₁², η₃ = −1/(2σ₂²), η₄ = µ/σ₂², and T₁(x, y) = x², T₂(x, y) = x, T₃(x, y) = y², T₄(x, y) = y. In fact, this is not a 4-parameter regular exponential family since η₄ = (η₃/η₁) · η₂.
43/56
Random Sample from Exponential Family
Let X₁, …, Xₙ be a random sample from an exponential family of pdf's f(·; θ), θ ∈ Θ, with f(x; θ) = exp(η(θ)ᵀT(x) − B(θ)) · h(x), where η(θ) is a k-vector. Let X denote the common support of f(·; θ). Then:

(1) the joint densities of (X₁, …, Xₙ) also form an exponential family, with Xⁿ as the common support and
    ∏_{i=1}^n f(x_i; θ) = exp(η(θ)ᵀ ∑_{i=1}^n T(x_i) − nB(θ)) · ∏_{i=1}^n h(x_i)
  as the joint density;
(2) if η(Θ) contains a k-dimensional open rectangle, then Y = ∑_{i=1}^n T(X_i) is a CSS for θ ∈ Θ.
44/56
Proof of (2)
A proof is given only for the case where the X_i are discrete random variables. Clearly, Y is an SS. Let η = η(θ). Then, with ∑*_y denoting the sum over all (x₁, …, xₙ) with ∑_{i=1}^n T(x_i) = y, we get

  pdf_Y(y; η) = ∑*_y exp(ηᵀ ∑_{i=1}^n T(x_i) − nA(η)) · ∏_{i=1}^n h(x_i)
  = exp(ηᵀy − nA(η)) · ∑*_y ∏_{i=1}^n h(x_i)
  =: exp(ηᵀy − A*(η)) h*(y).

Thus it follows that E_θ ϕ(Y) = 0 for all θ implies

  ∑_{all y} ϕ(y) h*(y) e^{ηᵀy} = 0 for all η
  ⇒ ϕ(y) = 0 for all y with h*(y) > 0,

where the second implication is from the uniqueness of the Laplace transform.
45/56
MGF/CGF of T(X) in the Exponential Family
Let X be a random variable having a pdf f(·; η), η ∈ N ⊂ R^k. Assume f(x; η) = exp(ηᵀT(x) − A(η)) · h(x) and that N contains a k-dimensional open rectangle. Then:

(3) the cumulant generating function of T(X) is given by
    cgf_{T(X)}(u; η) ≡ log E_η e^{uᵀT(X)} = A(η + u) − A(η)
  for all η ∈ Int(N) and all u in a neighborhood of 0;
(4) the mean and variance of T(X) under P_η with η ∈ Int(N) are then given by
    E_η T(X) = ∇A(η),  var_η(T(X)) = ∇²A(η),
  where ∇A and ∇²A denote the gradient and Hessian of A.
46/56
Proofs of (3) and (4)
To prove (3), let η ∈ Int(N). We can find a small ε > 0 such that η + u belongs to N for all u with ‖u‖ ≤ ε. For such u, we get

  E_η e^{uᵀT(X)} = ∫ e^{(η+u)ᵀT(x) − A(η)} h(x) dµ(x)
  = e^{A(η+u) − A(η)} ∫ e^{(η+u)ᵀT(x) − A(η+u)} h(x) dµ(x)
  = e^{A(η+u) − A(η)}.

The fact (4) is immediate from the fact that

  E_η T(X) = (∂/∂u) cgf_{T(X)}(u; η) |_{u=0},
  var_η(T(X)) = (∂²/∂u∂uᵀ) cgf_{T(X)}(u; η) |_{u=0}.
47/56
Differentiation Under Integral Sign
Let X be a random variable having a pdf f(·; η), η ∈ N ⊂ R^k. Assume f(x; η) = exp(ηᵀT(x) − A(η)) · h(x) and that N contains a k-dimensional open rectangle. Suppose that E_η φ(X) exists for all η ∈ N. Then:

(5) E_η φ(X), as a function of η, is infinitely differentiable at every η ∈ Int(N), and the differentiation can be carried out under the integral sign. For example,
    (∂/∂η) E_η φ(X) = ∫_X φ(x) (∂/∂η) f(x; η) dµ(x).

Remark: applying the theorem to φ ≡ 1 gives

  0 = E_η(T(X) − ∇A(η)) and 0 = E_η((T(X) − ∇A(η))(T(X) − ∇A(η))ᵀ − ∇²A(η)),

so that E_η T(X) = ∇A(η) and var_η(T(X)) = ∇²A(η).
48/56
MLE and Exponential Family
Let X₁, …, Xₙ be a random sample from an exponential family of pdf's f(·; η), η ∈ N ⊂ R^k, with f(x; η) = exp(ηᵀT(x) − A(η)) · h(x). Assume that N contains a k-dimensional open rectangle. Then:

(6) (i) the log-likelihood is strictly concave, and the unique MLE of η is determined by the likelihood equation
    n⁻¹ ∑_{i=1}^n T(x_i) = ∇A(η),
  provided that it has a solution η̂ ∈ N;
(ii) the Fisher information is given by I₁(η) = ∇²A(η).
49/56
Proof of (6): Positive Definiteness of ∇²A(η)
Suppose that there exists c ≡ (c₁, …, c_k)ᵀ ≠ 0 in R^k such that cᵀ∇²A(η₀)c = 0 for some η₀ ∈ N. Then P_{η₀}(cᵀT(X) = cᵀE_{η₀}T(X)) = 1. This implies that there exists a constant c₀ such that

  cᵀT(x) = c₀ a.e. [µ] for x ∈ X = {x : h(x) > 0}.

Without loss of generality, assume c₁ ≠ 0. Then

  T₁(x) = c₀/c₁ − (c₂/c₁)T₂(x) − ⋯ − (c_k/c₁)T_k(x)

a.e. [µ] for x ∈ X, which gives that, for all η ∈ R^k,

  ηᵀT(x) = (Mη)ᵀT₋₁(x) + (c₀/c₁)η₁

a.e. [µ] for x ∈ X,
50/56
Proof of (6): Continued
where

  Mᵀ = ( −(c₂/c₁) ⋯ −(c_k/c₁)
          1       ⋯  0
          ⋮       ⋱  ⋮
          0       ⋯  1 ),   T₋₁(x) = (T₂(x), T₃(x), …, T_k(x))ᵀ.

Thus it holds that, a.e. [µ],

  f(x; η) = exp((Mη)ᵀT₋₁(x) + (c₀/c₁)η₁ − A(η)) · h(x).

This implies

  (c₀/c₁)η₁ − A(η) = −log ∫_X exp((Mη)ᵀT₋₁(x)) h(x) dµ(x) =: −B(Mη),

so that f(x; η) = exp((Mη)ᵀT₋₁(x) − B(Mη)) · h(x). Clearly, η is then not identifiable: distinct values of η with the same Mη give the same density. Contradiction.
51/56
Multinomial Experiments
Let X_i = (X_{i,1}, …, X_{i,k−1})ᵀ be i.i.d. Multinomial(1, p), p ≡ (p₁, …, p_{k−1})ᵀ, p_j > 0, p₁ + ⋯ + p_{k−1} < 1.

- Let p_k = 1 − p₁ − ⋯ − p_{k−1}. Then the common density of X_i is given by
    f(x; p) = exp(x₁ log(p₁/p_k) + ⋯ + x_{k−1} log(p_{k−1}/p_k) + log p_k),
  so that the distributions of X_i form a (k − 1)-parameter regular exponential family.
- Y = ∑_{i=1}^n X_i = (∑_{i=1}^n X_{i,1}, …, ∑_{i=1}^n X_{i,k−1})ᵀ is a CSS for p.
- MLE of η: the MLE of η ≡ (log(p₁/p_k), …, log(p_{k−1}/p_k))ᵀ =: h(p) solves the equation
    Y/n = E_η X₁, i.e., Y/n = h⁻¹(η).
  Thus the MLE of η is given by η̂ = h(Y/n).
52/56
Multinomial Experiments
- MLE of p: the MLE of p is then p̂ = h⁻¹(η̂) = Y/n.
- UMVUE of p: Y/n is an UE of p and is a function of the CSS Y, so that p̂ = Y/n is also the UMVUE of p.
- UMVUE of Σ ≡ diag(p) − ppᵀ: here, an estimator Σ̂ of Σ is called the UMVUE of Σ if var_p(Σ̂_UE) − var_p(Σ̂) is nonnegative definite for every unbiased Σ̂_UE and all p, with Σ̂ and Σ̂_UE understood as vectorized. Note that
    Σ̂_MLE = diag(Y/n) − (Y/n)(Y/n)ᵀ.
  Computing the expected value of Σ̂_MLE, we get
    E_p(Σ̂_MLE) = diag(p) − var_p(Y/n) − E_p(Y/n)E_p(Y/n)ᵀ
    = diag(p) − n⁻¹Σ − ppᵀ = (1 − 1/n)Σ.
  Thus Σ̂_UMVUE = n · Σ̂_MLE/(n − 1).
53/56
Multivariate Normal Population
Let X_i = (X_{i,1}, …, X_{i,k})ᵀ, 1 ≤ i ≤ n (n ≥ 2), be i.i.d. Normal(µ, Σ), µ ∈ R^k and Σ in the set of k × k positive definite matrices.

- With θ ≡ (µ, Σ) ∈ R^d for d = k + k(k + 1)/2,
    f(x; θ) = det(2πΣ)^{−1/2} exp(−(x − µ)ᵀΣ⁻¹(x − µ)/2)
    = exp(−tr(Σ⁻¹xxᵀ)/2 + µᵀΣ⁻¹x − µᵀΣ⁻¹µ/2 − log det(2πΣ)/2).
- Y = (∑_{i=1}^n X_i, ∑_{i=1}^n X_iX_iᵀ) is a CSS for θ, where ∑_{i=1}^n X_iX_iᵀ is understood to be a k(k + 1)/2-vector.
- MLE of η ≡ (Σ⁻¹µ, Σ⁻¹): it is the solution of
    Y/n = E_η(X₁, X₁X₁ᵀ), i.e., Y/n = (µ, Σ + µµᵀ) =: g(µ, Σ).
  Let h(µ, Σ) = (Σ⁻¹µ, Σ⁻¹). Then η̂_MLE = h ∘ g⁻¹(Y/n).
54/56
Multivariate Normal Population
- MLE of µ and Σ: the MLE of (µ, Σ) = h⁻¹(η) is then
    (µ̂_MLE, Σ̂_MLE) = h⁻¹ ∘ h ∘ g⁻¹(Y/n) = g⁻¹(Y/n).
  By the definition of the function g : R^d → R^d, this means
    n⁻¹ ∑_{i=1}^n X_i = µ̂_MLE and n⁻¹ ∑_{i=1}^n X_iX_iᵀ = Σ̂_MLE + µ̂_MLE µ̂_MLEᵀ,
  so that µ̂_MLE = X̄ and Σ̂_MLE = n⁻¹ ∑_{i=1}^n (X_i − X̄)(X_i − X̄)ᵀ.
- UMVUE of µ and Σ: since (X̄, (n − 1)⁻¹ ∑_{i=1}^n (X_i − X̄)(X_i − X̄)ᵀ) is a 1-1 function of Y, it is also a CSS for (µ, Σ). Since X̄ and (n − 1)⁻¹ ∑_{i=1}^n (X_i − X̄)(X_i − X̄)ᵀ are UEs of µ and Σ, respectively, they are the UMVUEs of the respective parameters.
55/56
UMVUE of Normal Probabilities
Let X₁, …, Xₙ (n ≥ 2) be a random sample from N(θ, 1), θ ∈ R. We want to find the UMVUE of the normal probability η = P_θ(X₁ ≤ c) = Φ(c − θ). Note that X̄ is a CSS for this model. An easy UE of η is I_{(−∞,c]}(X₁). Thus, by the Rao-Blackwell-Lehmann-Scheffé theorem, the UMVUE of η is given by η̂ = E(I_{(−∞,c]}(X₁) | X̄) = P(X₁ ≤ c | X̄). We compute P(X₁ ≤ c | X̄ = y) below. Recall that (X₁ − X̄, …, Xₙ − X̄) is an ancillary statistic and thus independent of X̄ (Basu's theorem).

  P(X₁ ≤ c | X̄ = y) = P(X₁ − X̄ ≤ c − y | X̄ = y)
  = P(X₁ − X̄ ≤ c − y)
  = Φ(√(n/(n − 1)) (c − y)),

since X₁ − X̄ ~ N(0, (n − 1)/n). Thus the UMVUE of η = Φ(c − θ) is given by

  η̂ = Φ(√(n/(n − 1)) (c − X̄)).
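The unbiasedness of Φ(√(n/(n − 1))(c − X̄)) for Φ(c − θ) can be checked by simulation. This is a sketch, not part of the original slides; θ, n, c and the replication count are arbitrary choices:

```python
import math
import numpy as np

def Phi(z):
    # standard normal cdf via the error function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

rng = np.random.default_rng(3)
theta, n, c, n_sim = 0.5, 6, 1.0, 200000

x = rng.normal(theta, 1.0, size=(n_sim, n))
xbar = x.mean(axis=1)

# UMVUE Phi(sqrt(n/(n-1)) (c - Xbar)) versus the naive UE I(X1 <= c)
umvue = np.array([Phi(math.sqrt(n / (n - 1)) * (c - m)) for m in xbar])
print(umvue.mean(), Phi(c - theta), np.mean(x[:, 0] <= c))
```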
56/56
Independence of Normal Sample Mean and Variance
Let X₁, …, Xₙ (n ≥ 2) be a random sample from N(µ, σ²), θ ≡ (µ, σ²) ∈ R × R₊. Then X̄ and ∑_{i=1}^n (X_i − X̄)² are independent under P_θ for all θ ∈ R × R₊.

Proof: Fix σ² = σ₀². Then X̄ is a CSS and (X₁ − X̄, …, Xₙ − X̄) is an ancillary statistic for the submodel Θ(σ₀²) ≡ {(µ, σ₀²) : µ ∈ R}. Thus they are independent under P_θ for all θ ∈ Θ(σ₀²), due to Basu's theorem. Since the choice of σ₀² is arbitrary within R₊, this implies that X̄ and (X₁ − X̄, …, Xₙ − X̄) are independent under P_θ for all

  θ ∈ ⋃_{σ₀²∈R₊} Θ(σ₀²) = R × R₊.

The independence of X̄ and ∑_{i=1}^n (X_i − X̄)² is immediate from this.