
Lecture 4. Sufficient Statistics. Factorization Theorem

1 Sufficient statistics

Let f(x|θ) with θ ∈ Θ be some parametric family. Let X = (X1, ..., Xn) be a random sample from distribution f(x|θ). Suppose we would like to learn the parameter value θ from our sample. The concept of a sufficient statistic allows us to separate the information contained in X into two parts. One part contains all the valuable information as far as parameter θ is concerned, while the other part contains pure noise, in the sense that it carries no valuable information. Thus, we can ignore the latter part.

Definition 1. Statistic T(X) is sufficient for θ if the conditional distribution of X given T(X) does not depend on θ.

Let T(X) be a sufficient statistic. Consider the pair (X, T(X)). Obviously, (X, T(X)) contains the same information about θ as X alone, since T(X) is a function of X. But if we know T(X), then X itself has no value for us, since its conditional distribution given T(X) is independent of θ. Thus, by observing X (in addition to T(X)), we cannot say whether one particular value of parameter θ is more likely than another. Therefore, once we know T(X), we can discard X completely.

Example. Let X = (X1, ..., Xn) be a random sample from N(μ, σ²). Suppose that σ² is known, so the only parameter is μ (θ = μ). We have already seen that T(X) = X̄n ∼ N(μ, σ²/n). Let us calculate the conditional distribution of X given T(X) = t. First, note that

$$\sum_{i=1}^n (x_i - \mu)^2 - n(\bar{x}_n - \mu)^2 = \sum_{i=1}^n (x_i - \bar{x}_n + \bar{x}_n - \mu)^2 - n(\bar{x}_n - \mu)^2$$
$$= \sum_{i=1}^n (x_i - \bar{x}_n)^2 + 2\sum_{i=1}^n (x_i - \bar{x}_n)(\bar{x}_n - \mu)$$
$$= \sum_{i=1}^n (x_i - \bar{x}_n)^2,$$

since \sum_{i=1}^n (x_i - \bar{x}_n) = 0.
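The identity just derived is pure algebra, so it holds for any sample; here is a minimal numerical sanity check (the sample values and μ below are arbitrary):

```python
# Verify: sum (x_i - mu)^2 - n*(xbar - mu)^2 == sum (x_i - xbar)^2
mu = 2.0
x = [0.7, 3.1, 1.8, 2.4, -0.5]  # arbitrary sample values
n = len(x)
xbar = sum(x) / n

lhs = sum((xi - mu) ** 2 for xi in x) - n * (xbar - mu) ** 2
rhs = sum((xi - xbar) ** 2 for xi in x)
print(abs(lhs - rhs) < 1e-9)
```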

Cite as: Anna Mikusheva, course materials for 14.381 Statistical Methods in Economics, Fall 2011. MIT OpenCourseWare (http://ocw.mit.edu), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY].


Therefore,

$$f_{X|T(X)}(x \mid T(X) = T(x)) = \frac{f_X(x)}{f_T(T(x))} = \frac{\exp\{-\sum_{i=1}^n (x_i - \mu)^2/(2\sigma^2)\}/((2\pi)^{n/2}\sigma^n)}{\exp\{-n(\bar{x}_n - \mu)^2/(2\sigma^2)\}/((2\pi)^{1/2}\sigma/n^{1/2})}$$
$$= \exp\Big\{-\sum_{i=1}^n (x_i - \bar{x}_n)^2/(2\sigma^2)\Big\}\Big/\big((2\pi)^{(n-1)/2}\sigma^{n-1} n^{1/2}\big),$$

which is independent of μ. We conclude that T(X) = X̄n is a sufficient statistic for our parametric family. Note, however, that X̄n is not sufficient if σ² is not known.
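As a numerical sanity check, the ratio f_X(x)/f_T(x̄n) can be evaluated for several values of μ and seen to coincide. A sketch assuming σ = 1 and an arbitrary sample (the function name is illustrative):

```python
import math

def cond_density(x, mu, sigma):
    """f_X(x) / f_T(xbar) for an iid N(mu, sigma^2) sample,
    where T = sample mean ~ N(mu, sigma^2 / n)."""
    n = len(x)
    xbar = sum(x) / n
    f_x = math.exp(-sum((xi - mu) ** 2 for xi in x) / (2 * sigma ** 2)) \
        / ((2 * math.pi) ** (n / 2) * sigma ** n)
    f_t = math.exp(-n * (xbar - mu) ** 2 / (2 * sigma ** 2)) \
        / ((2 * math.pi) ** 0.5 * sigma / n ** 0.5)
    return f_x / f_t

x = [0.3, -1.2, 2.5, 0.9]  # arbitrary sample
vals = [cond_density(x, mu, 1.0) for mu in (-3.0, 0.0, 7.0)]
print(max(vals) - min(vals) < 1e-9)  # same value for every mu
```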

Example. Let X = (X1, ..., Xn) be a random sample from a Poisson(λ) distribution. From Problem Set 1, we know that

$$T = \sum_{i=1}^n X_i \sim \mathrm{Poisson}(n\lambda).$$

So

$$f_{X|T}\Big(x \,\Big|\, T = \sum_{i=1}^n x_i\Big) = \frac{\prod_{i=1}^n e^{-\lambda}\lambda^{x_i}/x_i!}{e^{-n\lambda}(n\lambda)^{\sum_{i=1}^n x_i}\big/\big(\sum_{i=1}^n x_i\big)!} = \frac{\big(\sum_{i=1}^n x_i\big)!}{n^{\sum_{i=1}^n x_i}\prod_{i=1}^n x_i!},$$

which is independent of λ. We conclude that T = \sum_{i=1}^n X_i is a sufficient statistic in this case.
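The cancellation of λ can also be checked numerically: computing the joint pmf over the pmf of the sum for two different values of λ should give the same number, which should match the closed form (Σxᵢ)!/(n^{Σxᵢ} ∏xᵢ!). A sketch with arbitrary counts (the helper names are illustrative):

```python
import math

def poisson_pmf(k, lam):
    """Poisson pmf: e^{-lam} lam^k / k!."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def cond_pmf(x, lam):
    """Joint pmf of an iid Poisson(lam) sample divided by the pmf of its sum."""
    n, s = len(x), sum(x)
    joint = math.prod(poisson_pmf(xi, lam) for xi in x)
    return joint / poisson_pmf(s, n * lam)

x = [3, 0, 5, 2]  # arbitrary counts
a = cond_pmf(x, 0.5)
b = cond_pmf(x, 4.0)
# closed form: (sum x)! / (n**(sum x) * product of x_i!)
s, n = sum(x), len(x)
closed = math.factorial(s) / (n ** s * math.prod(math.factorial(xi) for xi in x))
print(math.isclose(a, b) and math.isclose(a, closed))
```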

2 Factorization Theorem

The Factorization Theorem gives a general approach for finding a sufficient statistic:

Theorem 2 (Factorization Theorem). Let f(x|θ) be the pdf of X. Then T(X) is a sufficient statistic if and only if there exist functions g(t|θ) and h(x) such that f(x|θ) = g(T(x)|θ)h(x).

Proof. Let l(t|θ) be the pdf of T(X).

Suppose T(X) is a sufficient statistic. Then f_{X|T(X)}(x|T(X) = T(x)) = f_X(x|θ)/l(T(x)|θ) does not depend on θ. Denote it by h(x). Then f(x|θ) = l(T(x)|θ)h(x). Denoting l by g yields the result in one direction.

In the other direction we will give a "sloppy" proof. Denote A(x) = {y : T(y) = T(x)}. Then

$$l(T(x)|\theta) = \int_{A(x)} f(y|\theta)\,dy = \int_{A(x)} g(T(y)|\theta)h(y)\,dy = g(T(x)|\theta)\int_{A(x)} h(y)\,dy.$$

So

$$f_{X|T(X)}(x \mid T(X) = T(x)) = \frac{f(x|\theta)}{l(T(x)|\theta)} = \frac{g(T(x)|\theta)h(x)}{g(T(x)|\theta)\int_{A(x)} h(y)\,dy} = \frac{h(x)}{\int_{A(x)} h(y)\,dy},$$


which is independent of θ. We conclude that T(X) is a sufficient statistic.

Example. Let us show how to use the factorization theorem in practice. Let X1, ..., Xn be a random sample from N(μ, σ²) where both μ and σ² are unknown, i.e. θ = (μ, σ²). Then

$$f(x|\theta) = \exp\Big\{-\sum_{i=1}^n (x_i - \mu)^2/(2\sigma^2)\Big\}\Big/\big((2\pi)^{n/2}\sigma^n\big)$$
$$= \exp\Big\{-\Big[\sum_{i=1}^n x_i^2 - 2\mu\sum_{i=1}^n x_i + n\mu^2\Big]\Big/(2\sigma^2)\Big\}\Big/\big((2\pi)^{n/2}\sigma^n\big).$$

Thus, T(X) = (\sum_{i=1}^n X_i^2, \sum_{i=1}^n X_i) is a sufficient statistic (here h(x) = 1 and g is the whole thing). Note that in this example we actually have a pair of sufficient statistics. In addition, as we have seen before,

$$f(x|\theta) = \exp\Big\{-\Big[\sum_{i=1}^n (x_i - \bar{x}_n)^2 + n(\bar{x}_n - \mu)^2\Big]\Big/(2\sigma^2)\Big\}\Big/\big((2\pi)^{n/2}\sigma^n\big)$$
$$= \exp\big\{-\big[(n-1)s_n^2 + n(\bar{x}_n - \mu)^2\big]/(2\sigma^2)\big\}\big/\big((2\pi)^{n/2}\sigma^n\big).$$

Thus, T(X) = (X̄n, s_n²) is another sufficient statistic. Yet another sufficient statistic is T(X) = (X1, ..., Xn).

Note that X̄n is not sufficient in this example.
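One way to see the factorization at work numerically: since h(x) = 1 and g depends on x only through (Σxᵢ², Σxᵢ), two different samples with matching sums and sums of squares must have identical likelihood for every (μ, σ). A sketch with hand-picked samples (the helper name is illustrative):

```python
import math

def normal_lik(x, mu, sigma):
    """Likelihood of an iid N(mu, sigma^2) sample."""
    n = len(x)
    ss = sum((xi - mu) ** 2 for xi in x)
    return math.exp(-ss / (2 * sigma ** 2)) / ((2 * math.pi) ** (n / 2) * sigma ** n)

# two genuinely different samples sharing sum = 6 and sum of squares = 18
x = [0.0, 3.0, 3.0]
y = [1.0, 1.0, 4.0]
same = all(math.isclose(normal_lik(x, m, s), normal_lik(y, m, s))
           for m in (0.0, 2.5) for s in (0.7, 1.3))
print(same)
```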

Example. A less trivial example: let X1, ..., Xn be a random sample from U[θ, 1 + θ]. Then f(x|θ) = 1 if θ ≤ min_i x_i ≤ max_i x_i ≤ 1 + θ and 0 otherwise. In other words, f(x|θ) = I{θ ≤ X(1)} I{1 + θ ≥ X(n)}. So T(X) = (X(1), X(n)) is sufficient.
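A quick numerical illustration of this sufficiency: the likelihood depends on the sample only through (min, max), so two different samples sharing the same minimum and maximum have identical likelihood at every θ. A sketch with arbitrary sample values:

```python
def unif_lik(x, theta):
    """Likelihood of an iid U[theta, 1 + theta] sample: 1 on the support, else 0."""
    return 1.0 if theta <= min(x) and max(x) <= 1 + theta else 0.0

x = [0.42, 0.11, 0.73, 0.58]
y = [0.11, 0.36, 0.73, 0.70]  # different sample, same min and max
thetas = [i / 100 for i in range(-50, 100)]
print(all(unif_lik(x, t) == unif_lik(y, t) for t in thetas))
```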

3 Minimal Sufficient Statistics

Could we reduce the sufficient statistic T(X) in the previous example even more? Suppose we have two statistics, say, T(X) and T*(X). We say that T* is not bigger than T if there exists some function r such that T*(X) = r(T(X)). In other words, we can calculate T*(X) whenever we know T(X). In this case, whenever T* changes its value, statistic T must change its value as well. In this sense, T* gives at least as much data reduction as T.

Definition 3. A sufficient statistic T*(X) is called minimal if for any sufficient statistic T(X) there exists some function r such that T*(X) = r(T(X)).

Thus, in some sense, the minimal sufficient statistic gives us the greatest data reduction without a loss of information about the parameters. The following theorem gives a characterization of minimal sufficient statistics:

Theorem 4. Let f(x|θ) be the pdf of X and let T(X) be such that, for any x, y, the statement {f(x|θ)/f(y|θ) does not depend on θ} is equivalent to the statement {T(x) = T(y)}. Then T(X) is minimal sufficient.

Proof. This is a "sloppy" proof again.


First, we show that the statistic described above is in fact a sufficient statistic. Let 𝒳 be the space of possible values of x. We divide it into equivalence classes: for any x, let A_x = {y ∈ 𝒳 : T(y) = T(x)}. Then, for any x, y, A_x either coincides with A_y or A_x and A_y have no common elements. Thus, we can choose a subset X̃ of 𝒳 such that ∪_{x ∈ X̃} A_x = 𝒳 and A_x has no common elements with A_y for any x, y ∈ X̃ with x ≠ y. Then T(x) ≠ T(y) if x, y ∈ X̃ and x ≠ y. Hence there is a function g of t and θ ∈ Θ such that f(x|θ) = g(T(x)|θ) for all x ∈ X̃. Fix some θ ∈ Θ. For any x′ ∈ 𝒳, there is x(x′) ∈ X̃ such that x′ ∈ A_{x(x′)}, i.e. T(x(x′)) = T(x′). Denote h(x′) = f(x′|θ)/f(x(x′)|θ). By assumption, this ratio does not depend on θ, so for any θ′ ∈ Θ, f(x′|θ′)/f(x(x′)|θ′) = h(x′). So f(x′|θ′) = g(T(x′)|θ′)h(x′). Thus, T(X) is sufficient by the Factorization Theorem.

Let us now show that T(X) is actually minimal sufficient in the sense of Definition 3. Take any other sufficient statistic, T*(X). Then there exist functions g* and h* such that f(x|θ) = g*(T*(x)|θ)h*(x). If T*(x) = T*(y) for some x, y, then

$$\frac{f(x|\theta)}{f(y|\theta)} = \frac{g^*(T^*(x)|\theta)h^*(x)}{g^*(T^*(y)|\theta)h^*(y)} = \frac{h^*(x)}{h^*(y)},$$

which is independent of θ. Thus T(x) = T(y) as well. So we can define a function r such that T(X) = r(T*(X)).

Example. Let us now go back to the example with X1, ..., Xn ∼ U[θ, 1 + θ]. The ratio f(x|θ)/f(y|θ) is independent of θ if and only if x(1) = y(1) and x(n) = y(n), which is the case if and only if T(x) = T(y). Therefore T(X) = (X(1), X(n)) is minimal sufficient.

Example. Let X1, ..., Xn be a random sample from the Cauchy distribution with location parameter θ, i.e. the distribution with the pdf f(x|θ) = 1/(π(1 + (x − θ)²)). Then

$$f(x_1, ..., x_n|\theta) = \frac{1}{\pi^n \prod_{i=1}^n (1 + (x_i - \theta)^2)}.$$

By the theorem above, T(X) = (X(1), ..., X(n)) is minimal sufficient.
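Since the minimal sufficient statistic here is the full vector of order statistics, the likelihood must be invariant under permutations of the sample but cannot be reduced further. The invariance part is easy to check numerically (a sketch; the helper name is illustrative):

```python
import math

def cauchy_loglik(x, theta):
    """Log-likelihood of an iid Cauchy(theta) sample, pdf 1/(pi*(1 + (x - theta)^2))."""
    return sum(-math.log(math.pi * (1 + (xi - theta) ** 2)) for xi in x)

x = [1.7, -0.4, 3.2, 0.0]
perm = [3.2, 0.0, 1.7, -0.4]  # same values, different order
same = all(math.isclose(cauchy_loglik(x, t), cauchy_loglik(perm, t))
           for t in (-2.0, 0.5, 4.0))
print(same)
```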



MIT OpenCourseWare
http://ocw.mit.edu

14.381 Statistical Methods in Economics
Fall 2013

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.