BS2a - University of Oxford

Lecture 2 - Sufficiency, Factorization Theorem, Minimal sufficiency
Jonathan Marchini (University of Oxford), BS2a MT 2013


Sufficient statistics

Let X1, . . . ,Xn be a random sample from f (x ; θ).

Definition 2 A statistic T(X1, . . . , Xn) is a function of the data that does not depend on unknown parameters.

Definition 3 A statistic T(X1, . . . , Xn) is said to be sufficient for θ if the conditional distribution of X1, . . . , Xn, given T, does not depend on θ. That is,

g(x | t, θ) = g(x | t)

Comment The definition says that a sufficient statistic T contains all the information there is in the sample about θ.


Example 7

n independent trials where the probability of success is p.

Let X1, . . . , Xn be indicator variables which are 1 or 0 according to whether the trial is a success or a failure. Let T = ∑_{i=1}^n Xi. The conditional distribution of X1, . . . , Xn given T = t is

g(x1, . . . , xn | t, p) = f(x1, . . . , xn, t | p) / h(t | p)
                        = ∏_{i=1}^n p^{xi}(1 − p)^{1−xi} / [C(n, t) p^t(1 − p)^{n−t}]
                        = p^t(1 − p)^{n−t} / [C(n, t) p^t(1 − p)^{n−t}]
                        = 1 / C(n, t),

where C(n, t) is the binomial coefficient. This does not depend on p, so T is sufficient for p.

Comment This makes sense, since the order of the outcomes carries no information about p.
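This calculation is easy to check numerically. The sketch below (an illustration, not part of the lecture notes; function names are mine) enumerates all length-n binary sequences and verifies that the conditional probability of each sequence given T = t equals 1/C(n, t), whatever the value of p:

```python
from itertools import product
from math import comb, isclose

def cond_prob(x, p):
    """P(X = x | T = t) for Bernoulli(p) trials, computed from first principles."""
    n, t = len(x), sum(x)
    joint = p**t * (1 - p)**(n - t)             # P(X = x) = p^t (1-p)^(n-t)
    h = comb(n, t) * p**t * (1 - p)**(n - t)    # P(T = t), T ~ Binomial(n, p)
    return joint / h

n = 4
for p in (0.2, 0.5, 0.9):
    for x in product((0, 1), repeat=n):
        # the conditional probability is 1/C(n, t), free of p
        assert isclose(cond_prob(x, p), 1 / comb(n, sum(x)))
print("conditional distribution given T does not depend on p")
```

The assertions pass for every p on the grid, matching the algebraic cancellation above.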


Theorem 1 : Factorization Criterion

T(X1, . . . , Xn) is a sufficient statistic for θ if and only if there exist two non-negative functions K1, K2 such that the likelihood function L(θ; x) can be written

L(θ; x) = K1[t(x1, . . . , xn); θ]K2[x1, . . . , xn],

where K1 depends on the sample only through T, and K2 does not depend on θ.
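For a concrete instance of the criterion, the Bernoulli likelihood of Example 7 factorizes with K1[t; p] = p^t(1 − p)^{n−t} and K2 ≡ 1. The short check below (illustrative Python, with helper names of my own choosing) confirms this, and in particular that two samples with the same sum have identical likelihood functions:

```python
from math import isclose

def likelihood(x, p):
    """L(p; x) = prod_i p^{x_i} (1-p)^{1-x_i} for a Bernoulli sample x."""
    out = 1.0
    for xi in x:
        out *= p**xi * (1 - p)**(1 - xi)
    return out

def K1(t, n, p):
    """Factor depending on the data only through t = sum(x); here K2(x) = 1."""
    return p**t * (1 - p)**(n - t)

x, y = [1, 0, 1, 1, 0], [0, 1, 0, 1, 1]   # same sum t = 3, different order
for p in (0.1, 0.4, 0.8):
    assert isclose(likelihood(x, p), K1(3, 5, p))
    assert isclose(likelihood(x, p), likelihood(y, p))  # data enter only via t
print("L(p; x) = K1[t; p] * K2[x] with K2 = 1")
```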


Proof - for discrete random variables

1. Assume that T is sufficient. Then the distribution of the sample is

L(θ; x) = f(x | θ) = f(x, t | θ) = g(x | t, θ) h(t | θ),

since t = T(x) is determined by x.

T is sufficient, which implies

g(x | t , θ) = g(x | t)

h(t | θ) depends on x through t(x) only, so

L(θ; x) = g(x | t)h(t | θ)

We set L(θ; x) = K1K2, where K1 ≡ h, K2 ≡ g.


2. Suppose L(θ; x) = f(x | θ) = K1[t; θ]K2[x].

Then

h(t | θ) = ∑_{x: T(x)=t} f(x, t | θ) = ∑_{x: T(x)=t} L(θ; x) = K1[t; θ] ∑_{x: T(x)=t} K2(x).

Thus

g(x | t, θ) = f(x, t | θ) / h(t | θ) = L(θ; x) / h(t | θ) = K2[x] / ∑_{x: T(x)=t} K2(x),

not depending on θ. (K1 cancels out in numerator and denominator.)


Minimal sufficiency

In general, we can envisage a sequence of functions T1(X), T2(X), T3(X), . . . such that Tj(X) is a function of Tj−1(X) (j ≥ 2), so that moving along the sequence gives progressive reductions of the data. If Tj(X) is sufficient for θ, then so is Ti(X) for i < j. Typically there will be a last member Tk(X) that is sufficient, with Ti(X) not sufficient for i > k. Then Tk(X) is minimal sufficient.

Example 7 (cont.) Consider n = 3 Bernoulli trials.
1 T1(X) = (X1, X2, X3) (the individual sequence of trials)
2 T2(X) = (X1, ∑_{i=1}^3 Xi) (the 1st random variable and the total sum)
3 T3(X) = ∑_{i=1}^3 Xi (the total sum)
4 T4(X) = I(T3(X) = 0) (I is the indicator function; Exercise Prove T4 is not sufficient)

Definition 4 A statistic is minimal sufficient if it can be expressed as a function of every other sufficient statistic.
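The exercise about T4 can be explored numerically. The sketch below (illustrative, not from the notes) computes the conditional probability of one sample point given T4 for two values of p and shows it changes with p, so T4 is not sufficient:

```python
from itertools import product

def cond_given_T4(x, p):
    """P(X = x | T4), where T4 = I(sum(X) = 0), for Bernoulli(p) trials."""
    n = len(x)
    def prob(seq):
        t = sum(seq)
        return p**t * (1 - p)**(n - t)
    t4 = int(sum(x) == 0)
    # condition on the event {T4 = t4} by summing over all sequences in it
    denom = sum(prob(s) for s in product((0, 1), repeat=n)
                if int(sum(s) == 0) == t4)
    return prob(x) / denom

x = (1, 0, 0)                 # a sequence with T4 = 0
a, b = cond_given_T4(x, 0.3), cond_given_T4(x, 0.7)
print(a, b)                   # the two values differ, so T4 is not sufficient
assert abs(a - b) > 1e-6
```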


Example 7 (cont.) : Minimal sufficiency

n Bernoulli trials with T = ∑_{i=1}^n Xi. Suppose T above is not minimal sufficient but another statistic U is MS. Then U can be given as a function of T (and not vice versa, otherwise T would be MS), and there exist values t1 ≠ t2 of T such that U(t1) = U(t2) (i.e. T → U is many-to-one, so U → T is not a function; assume for the moment that no other t makes U(t) = U(t1)). The event U = u is then the event T ∈ {t1, t2}. Let x1, . . . , xn contain t1 successes. Then

g(x1, . . . , xn | u, p) = g(x1, . . . , xn | t1, p) P(T = t1 | u, p)
                        = g(x1, . . . , xn | t1) P(T = t1 | T ∈ {t1, t2})
                        = p^{t1}(1 − p)^{n−t1} / [C(n, t1) p^{t1}(1 − p)^{n−t1} + C(n, t2) p^{t2}(1 − p)^{n−t2}],

which depends on p, so U is not sufficient, a contradiction; hence T must be MS (similar reasoning applies when more than two values ti are merged).
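The displayed formula can be evaluated directly. This illustrative sketch (not part of the notes) merges two values t1 ≠ t2 of T, as in the argument, and confirms that the resulting conditional probability varies with p:

```python
from math import comb

def g_merged(n, t1, t2, p):
    """P(X = x | T in {t1, t2}) for a fixed sequence x with t1 successes,
    following the displayed formula."""
    num = p**t1 * (1 - p)**(n - t1)
    den = (comb(n, t1) * p**t1 * (1 - p)**(n - t1)
           + comb(n, t2) * p**t2 * (1 - p)**(n - t2))
    return num / den

n, t1, t2 = 4, 1, 3            # merge two distinct values of T
vals = [g_merged(n, t1, t2, p) for p in (0.2, 0.5, 0.8)]
print(vals)                    # not constant in p: the merged statistic is not sufficient
assert max(vals) - min(vals) > 1e-6
```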


Minimal sufficiency and partitions of the sample space

Intuitively, a minimal sufficient statistic most efficiently captures all possible information about the parameter θ.

Any statistic T(X) partitions the sample space into subsets, and on each subset T(X) has a constant value.

Minimal sufficient statistics correspond to the coarsest possible partition of the sample space.

In the example of n = 3 Bernoulli trials, consider the following 4 statistics and the partitions they induce.


[Figure: the 8 outcomes TTT, TTH, THT, THH, HTT, HHT, HTH, HHH, shown with the partitions induced by T1(X) = (X1, X2, X3), T2(X) = (X1, ∑_{i=1}^3 Xi), T3(X) = ∑_{i=1}^3 Xi, and T4(X) = I(T3(X) = 0).]
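The partitions in the figure can be reproduced by counting the distinct values each statistic takes over the 8 outcomes. A small illustrative script (Python, not part of the notes):

```python
from itertools import product

outcomes = list(product((0, 1), repeat=3))   # the 8 sequences of 3 Bernoulli trials

T = {
    "T1": lambda x: x,                        # the full sequence
    "T2": lambda x: (x[0], sum(x)),           # first trial and the total
    "T3": lambda x: sum(x),                   # the total
    "T4": lambda x: int(sum(x) == 0),         # indicator of "no successes"
}

# Number of classes in the partition each statistic induces:
sizes = {name: len({t(x) for x in outcomes}) for name, t in T.items()}
print(sizes)   # {'T1': 8, 'T2': 6, 'T3': 4, 'T4': 2}
assert sizes["T1"] > sizes["T2"] > sizes["T3"] > sizes["T4"]
```

The partitions get strictly coarser along the sequence T1, T2, T3, T4, matching the figure.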

Lemma 1 : Lehmann-Scheffé partitions

Consider the partition of the sample space defined by putting x and y into the same class of the partition if and only if

L(θ; y)/L(θ; x) = f(y | θ)/f(x | θ) = m(x, y).

Then any statistic corresponding to this partition is minimal sufficient.

Comment This Lemma tells us how to define partitions that correspond to minimal sufficient statistics. It says that the ratio of the likelihoods of two values x and y in the same partition class (and hence with the same statistic value) should not depend on θ.
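Lemma 1 can be applied mechanically: group sample points whose likelihood ratio is constant in θ. The sketch below (illustrative; testing equality of the ratio over a grid of p values is a numerical stand-in for "constant in p") recovers the partition induced by T = ∑ Xi for n = 3 Bernoulli trials:

```python
from itertools import product

def likelihood(x, p):
    t = sum(x)
    return p**t * (1 - p)**(len(x) - t)

def same_class(x, y, grid=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """x, y in the same Lehmann-Scheffe class iff L(p; y)/L(p; x) is constant in p."""
    ratios = [likelihood(y, p) / likelihood(x, p) for p in grid]
    return max(ratios) - min(ratios) < 1e-12

# Build the partition of the n = 3 Bernoulli sample space by likelihood ratios.
outcomes = list(product((0, 1), repeat=3))
classes = []
for x in outcomes:
    for cls in classes:
        if same_class(cls[0], x):
            cls.append(x)
            break
    else:
        classes.append([x])

# Each class consists exactly of the outcomes sharing a value of T = sum(x).
for cls in classes:
    assert len({sum(x) for x in cls}) == 1
print(len(classes), "classes, one per value of T")
```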


Proof (for discrete RVs)

1. Sufficiency. Suppose T is such a statistic. Then

g(x | t, θ) = f(x | θ) / f(t | θ)
            = f(x | θ) / ∑_{y∈τ} f(y | θ),   τ = {y : T(y) = t}
            = f(x | θ) / ∑_{y∈τ} f(x | θ) m(x, y)
            = [ ∑_{y∈τ} m(x, y) ]^{−1},

which does not depend on θ. Hence the Lehmann-Scheffé partition, and any statistic T corresponding to it, is sufficient.


2. Minimal sufficiency. Now suppose U is any other sufficient statistic and that U(x) = U(y) for some pair of values (x, y). If we can show that U(x) = U(y) implies T(x) = T(y), then the Lehmann-Scheffé partition induced by T includes the partition based on any other sufficient statistic. In other words, T is a function of every other sufficient statistic, and so must be minimal sufficient. Since U is sufficient we have

L(θ; y) / L(θ; x) = K1[u(y); θ]K2[y] / ( K1[u(x); θ]K2[x] ) = K2[y] / K2[x]   (since u(x) = u(y)),

which does not depend on θ. Hence x and y lie in the same Lehmann-Scheffé class, so T(x) = T(y): the statistic U produces a partition at least as fine as that induced by T, and the result is proved.


Sufficiency in an exponential family

Random sample X1, . . . , Xn. The likelihood is

L(θ; x) = ∏_{i=1}^n f(xi; θ)
        = ∏_{i=1}^n exp{ ∑_{j=1}^k Aj(θ)Bj(xi) + C(xi) + D(θ) }
        = exp{ ∑_{j=1}^k Aj(θ) ( ∑_{i=1}^n Bj(xi) ) + nD(θ) + ∑_{i=1}^n C(xi) }.

Exponential family form again.


Sufficiency in an exponential family

Suppose the family is in canonical form so φj = Aj(θ), and let tj = ∑_{i=1}^n Bj(xi) and C(x) = ∑_{i=1}^n C(xi).

L(θ; x) = exp{ ∑_{j=1}^k φj tj + nD(θ) + C(x) }.

By the factorization criterion, t1, . . . , tk are sufficient statistics for φ1, . . . , φk. In fact, we do not need canonical form. If

L(θ; x) = exp{ ∑_{j=1}^k Aj(θ) tj + nD(θ) + C(x) }

is a minimal k-dimensional linear exponential family, then (by the regularity conditions above) t1, . . . , tk are minimal sufficient for θ1, . . . , θk. Minimal sufficiency is verified using Lemma 1.
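As a concrete case, for a N(μ, σ²) random sample the statistics t1 = ∑ xi and t2 = ∑ xi² are sufficient for (μ, σ²). The illustrative check below (not part of the notes) computes the log-likelihood from the raw data and from (t1, t2) via the exponential-family form, and confirms they agree:

```python
from math import log, pi, isclose

def loglik_raw(x, mu, sigma2):
    """Normal log-likelihood computed from the raw sample."""
    return sum(-0.5 * log(2 * pi * sigma2) - (xi - mu)**2 / (2 * sigma2)
               for xi in x)

def loglik_suff(t1, t2, n, mu, sigma2):
    """The same log-likelihood via the sufficient statistics t1 = sum x_i,
    t2 = sum x_i^2 (A1 = mu/sigma2, B1 = x; A2 = -1/(2 sigma2), B2 = x^2)."""
    return (-0.5 * n * log(2 * pi * sigma2)
            - t2 / (2 * sigma2) + mu * t1 / sigma2 - n * mu**2 / (2 * sigma2))

x = [1.2, -0.7, 0.3, 2.1, 0.9]
t1, t2, n = sum(x), sum(xi**2 for xi in x), len(x)
for mu, s2 in [(0.0, 1.0), (0.5, 2.0), (-1.0, 0.5)]:
    # the data enter the likelihood only through (t1, t2)
    assert isclose(loglik_raw(x, mu, s2), loglik_suff(t1, t2, n, mu, s2))
print("normal likelihood depends on the data only through (sum x, sum x^2)")
```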
