Stat 709: Mathematical Statistics Lecture 23


Transcript of Stat 709: Mathematical Statistics Lecture 23

Page 1: Stat 709: Mathematical Statistics Lecture 23


Lecture 20: Statistical inference

A major approach in statistical analysis does not make use of any loss or risk function.

Three components in statistical inference:
- Point estimators (Chapters 3-5)
- Hypothesis tests (Chapter 6)
- Confidence sets (Chapter 7)

Point estimators
Let T(X) be an estimator of ϑ ∈ R.

Bias: b_T(P) = E[T(X)] − ϑ

Mean squared error (mse):

mse_T(P) = E[T(X) − ϑ]² = [b_T(P)]² + Var(T(X)).

Bias and mse are two common criteria for the performance of point estimators.

Read Example 2.26
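The bias–mse decomposition above can be checked numerically. Below is a minimal Monte Carlo sketch; the shrinkage estimator T(X) = 0.9·X̄ for the mean of an N(µ, 1) sample and all numeric values are illustrative assumptions, not from the lecture:

```python
import random
import statistics

# Monte Carlo check of mse_T(P) = [b_T(P)]^2 + Var(T(X)).
# Illustrative setup (not from the lecture): X_1, ..., X_n i.i.d. N(mu, 1),
# and T(X) = c * Xbar with c = 0.9, a biased estimator of mu.
random.seed(0)
mu, n, c, reps = 2.0, 10, 0.9, 50_000

estimates = []
for _ in range(reps):
    xbar = statistics.fmean(random.gauss(mu, 1.0) for _ in range(n))
    estimates.append(c * xbar)

bias = statistics.fmean(estimates) - mu    # estimates b_T(P) = (c - 1) * mu
var = statistics.pvariance(estimates)      # estimates Var(T(X)) = c^2 / n
mse = statistics.fmean((t - mu) ** 2 for t in estimates)

# The decomposition holds as an algebraic identity for the empirical moments too:
assert abs(mse - (bias ** 2 + var)) < 1e-6
print(bias, var, mse)
```

Here b_T(P) = (c − 1)µ = −0.2 and Var(T(X)) = c²/n = 0.081, so mse_T(P) = 0.04 + 0.081 = 0.121, which the simulation reproduces up to Monte Carlo error.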

UW-Madison (Statistics) Stat 709 Lecture 20 2016 1 / 12

Page 2


Hypothesis tests
To test the hypotheses

H0 : P ∈ P0 versus H1 : P ∈ P1,

there are two types of errors we may commit:
- rejecting H0 when H0 is true (the type I error), and
- accepting H0 when H0 is wrong (the type II error).

A test T: a statistic from X to {0,1}.

Probabilities of making the two types of errors:

Type I error rate:

α_T(P) = P(T(X) = 1), P ∈ P0.

Type II error rate:

1 − α_T(P) = P(T(X) = 0), P ∈ P1.

α_T(P) is also called the power function of T.
The power function is α_T(θ) if P is in a parametric family indexed by θ.
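As a concrete illustration of the two error rates and the power function, here is a small sketch with an assumed binomial example (not from the slides): X ~ Bi(θ, 10), and the test rejects H0 : θ ≤ 0.5 when X ≥ 8.

```python
from math import comb

# Illustrative power function alpha_T(theta) for a binomial test (assumed example):
# X ~ Binomial(n = 10, theta); T rejects H0: theta <= 0.5 when X >= 8.
def power(theta, n=10, cutoff=8):
    """alpha_T(theta) = P(T(X) = 1) = P(X >= cutoff)."""
    return sum(comb(n, k) * theta**k * (1 - theta)**(n - k)
               for k in range(cutoff, n + 1))

alpha = power(0.5)     # type I error rate at the boundary of P0
beta = 1 - power(0.8)  # type II error rate at theta = 0.8 in P1
print(alpha, beta)
```

At the boundary θ = 0.5 the rejection probability is 56/1024 ≈ 0.055, while at θ = 0.8 the power rises to about 0.678, illustrating that shrinking one error rate at a fixed sample size comes at the expense of the other.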


Page 3


Remarks
- Note that these are risks of T under the 0-1 loss in statistical decision theory.
- Type I and type II error probabilities cannot be minimized simultaneously.
- These two error probabilities cannot both be bounded by a fixed α ∈ (0,1) when we have a sample of a fixed size.

Significance tests
A common approach to finding an "optimal" test is to assign a small bound α to the type I error rate α_T(P), P ∈ P0, and then to attempt to minimize the type II error rate 1 − α_T(P), P ∈ P1, subject to

sup_{P∈P0} α_T(P) ≤ α.    (1)

The bound α is called the level of significance.
The left-hand side of (1) is called the size of the test T.
The level of significance should be positive; otherwise no test satisfies (1) except the silly test T(X) ≡ 0 a.s. P.


Page 4


Example 2.28
Let X1, ..., Xn be i.i.d. from the N(µ, σ²) distribution with an unknown µ ∈ R and a known σ².
Consider the hypotheses H0 : µ ≤ µ0 versus H1 : µ > µ0, where µ0 is a fixed constant.
Since the sample mean X̄ is sufficient for µ ∈ R, it is reasonable to consider the following class of tests: T_c(X) = I_(c,∞)(X̄), i.e., H0 is rejected (accepted) if X̄ > c (X̄ ≤ c), where c ∈ R is a fixed constant.
Let Φ be the c.d.f. of N(0,1).
By the properties of the normal distribution,

α_{T_c}(µ) = P(T_c(X) = 1) = 1 − Φ(√n(c − µ)/σ).

Figure 2.2 provides an example of a graph of the two types of error probabilities, with µ0 = 0.
Since Φ(t) is an increasing function of t,

sup_{P∈P0} α_{T_c}(µ) = 1 − Φ(√n(c − µ0)/σ).


Page 5


Example 2.28 (continued)
In fact, it is also true that

sup_{P∈P1} [1 − α_{T_c}(µ)] = Φ(√n(c − µ0)/σ).

If we would like to use an α as the level of significance, then the most effective way is to choose a c_α (a test T_{c_α}(X)) such that

α = sup_{P∈P0} α_{T_{c_α}}(µ),

in which case c_α must satisfy

1 − Φ(√n(c_α − µ0)/σ) = α,

i.e., c_α = σ z_{1−α}/√n + µ0, where z_a = Φ^{−1}(a).
In Chapter 6, it is shown that for any test T(X) satisfying sup_{P∈P0} α_T(P) ≤ α,

1 − α_T(µ) ≥ 1 − α_{T_{c_α}}(µ), µ > µ0.
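The formula c_α = σ z_{1−α}/√n + µ0 can be checked directly: plugging c_α back into the size formula must return α. A sketch with illustrative values of n, σ, µ0, α:

```python
from statistics import NormalDist
from math import sqrt

# Sketch of choosing c_alpha in Example 2.28 (assumed numeric values):
# c_alpha = sigma * z_{1-alpha} / sqrt(n) + mu0, where z_a = Phi^{-1}(a).
N01 = NormalDist()
n, sigma, mu0, alpha = 25, 2.0, 0.0, 0.05

c_alpha = sigma * N01.inv_cdf(1 - alpha) / sqrt(n) + mu0

# The size of T_{c_alpha} is then exactly alpha (up to numerical precision):
size = 1 - N01.cdf(sqrt(n) * (c_alpha - mu0) / sigma)
print(c_alpha, size)
```

With n = 25, σ = 2, µ0 = 0, α = 0.05 this gives c_α = 2·z_{0.95}/5 ≈ 0.658.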


Page 6


Choice of significance level
The choice of a level of significance α is usually somewhat subjective.
In most applications there is no precise limit to the size of T that can be tolerated.
Standard values, 0.10, 0.05, and 0.01, are often used for convenience.
For most tests satisfying sup_{P∈P0} α_T(P) ≤ α, a small α leads to a "small" rejection region.

p-value
It is good practice to determine not only whether H0 is rejected for a given α and a chosen test T_α, but also the smallest possible level of significance at which H0 would be rejected for the computed T_α(x), i.e.,

α̂ = inf{α ∈ (0,1) : T_α(x) = 1}.

Such an α̂, which depends on x and the chosen test and is a statistic, is called the p-value for the test T_α.


Page 7


Example 2.29
Let us calculate the p-value for T_{c_α} in Example 2.28.
Note that

α = 1 − Φ(√n(c_α − µ0)/σ) > 1 − Φ(√n(x̄ − µ0)/σ)

if and only if x̄ > c_α (or T_{c_α}(x) = 1).
Hence

1 − Φ(√n(x̄ − µ0)/σ) = inf{α ∈ (0,1) : T_{c_α}(x) = 1} = α̂(x)

is the p-value for T_{c_α}.
It turns out that T_{c_α}(x) = I_(0,α)(α̂(x)).
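A sketch of the p-value computation and of the identity T_{c_α}(x) = I_(0,α)(α̂(x)); the values of n, σ, µ0 and the observed x̄ are illustrative assumptions:

```python
from statistics import NormalDist
from math import sqrt

# Sketch of Example 2.29 (assumed numeric values): the p-value of T_{c_alpha}
# at observed xbar is 1 - Phi(sqrt(n) * (xbar - mu0) / sigma).
N01 = NormalDist()
n, sigma, mu0 = 25, 2.0, 0.0
xbar = 0.9  # illustrative observed sample mean

p_value = 1 - N01.cdf(sqrt(n) * (xbar - mu0) / sigma)

# Check T_{c_alpha}(x) = I_(0,alpha)(p-value) over a grid of levels:
# rejecting at level alpha is equivalent to the p-value falling below alpha.
for alpha in (0.01, 0.05, 0.10):
    c_alpha = sigma * N01.inv_cdf(1 - alpha) / sqrt(n) + mu0
    assert (xbar > c_alpha) == (p_value < alpha)
print(p_value)
```

Here the p-value is 1 − Φ(2.25) ≈ 0.012, so H0 is rejected at level 0.05 but not at level 0.01.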

Remarks
- With the additional information provided by p-values, using p-values is typically more appropriate than using fixed-level tests in a scientific problem.
- In some cases, a fixed level of significance is unavoidable, e.g., when acceptance or rejection of H0 is a required decision.


Page 8


Randomized tests
In Example 2.28, sup_{P∈P0} α_T(P) = α can always be achieved by a suitable choice of c.
This is, however, not true in general.
We need to consider randomized tests.
Recall that a randomized decision rule is a probability measure δ(x, ·) on the action space for any fixed x.
Since the action space contains only two points, 0 and 1, for a hypothesis testing problem, any randomized test δ(X, A) is equivalent to a statistic T(X) ∈ [0,1] with T(x) = δ(x, {1}) and 1 − T(x) = δ(x, {0}).
A nonrandomized test is obviously a special case where T(x) does not take any value in (0,1).
For any randomized test T(X), we define the type I error probability to be α_T(P) = E[T(X)], P ∈ P0, and the type II error probability to be 1 − α_T(P) = E[1 − T(X)], P ∈ P1.
For a class of randomized tests, we would like to minimize 1 − α_T(P) subject to sup_{P∈P0} α_T(P) = α.

Page 9


Example 2.30
Assume that the sample X has the binomial distribution Bi(θ, n) with an unknown θ ∈ (0,1) and a fixed integer n > 1.
Consider the hypotheses H0 : θ ∈ (0,θ0] versus H1 : θ ∈ (θ0,1), where θ0 ∈ (0,1) is a fixed value.
Consider the following class of randomized tests:

T_{j,q}(X) = 1 if X > j;  q if X = j;  0 if X < j,

where j = 0, 1, ..., n−1 and q ∈ [0,1].
Then

α_{T_{j,q}}(θ) = P(X > j) + qP(X = j), 0 < θ ≤ θ0,
1 − α_{T_{j,q}}(θ) = P(X < j) + (1 − q)P(X = j), θ0 < θ < 1.

It can be shown that for any α ∈ (0,1), there exist an integer j and q ∈ (0,1) such that the size of T_{j,q} is α.
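The existence claim can be made constructive: walk the upper tail of the Bi(θ0, n) distribution down from X = n until it would exceed α, then pick q to absorb the remainder. A sketch with illustrative n, θ0, α (the size is attained at θ0 because the rejection probability is increasing in θ):

```python
from math import comb

# Sketch of Example 2.30 (assumed numeric values): choose j and q so that
# P(X > j) + q * P(X = j) at theta = theta0 equals alpha exactly.
def binom_pmf(k, n, theta):
    return comb(n, k) * theta**k * (1 - theta)**(n - k)

def choose_jq(alpha, n, theta0):
    tail = 0.0  # P(X > j), starting from the empty tail at j = n
    for j in range(n, -1, -1):
        pj = binom_pmf(j, n, theta0)
        if tail + pj >= alpha:
            return j, (alpha - tail) / pj  # q in [0, 1]
        tail += pj

n, theta0, alpha = 10, 0.5, 0.05
j, q = choose_jq(alpha, n, theta0)
size = (sum(binom_pmf(k, n, theta0) for k in range(j + 1, n + 1))
        + q * binom_pmf(j, n, theta0))
print(j, q, size)
```

For n = 10, θ0 = 0.5, α = 0.05 this yields j = 8 and q ≈ 0.893: no nonrandomized cutoff achieves size 0.05 exactly, so rejecting with probability q when X = 8 fills the gap.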


Page 10


Confidence sets
ϑ: a k-vector of unknown parameters related to the unknown population P ∈ P.
C(X): a Borel set (in the range of ϑ) depending only on the sample X.
If

inf_{P∈P} P(ϑ ∈ C(X)) ≥ 1 − α,    (2)

where α is a fixed constant in (0,1), then C(X) is called a confidence set for ϑ with level of significance 1 − α.
The left-hand side of (2) is called the confidence coefficient of C(X), which is the highest possible level of significance for C(X).

A confidence set is a random element that covers the unknown ϑ with a certain probability.
If (2) holds, then the coverage probability of C(X) is at least 1 − α, although C(x) either covers or does not cover ϑ once we observe X = x.
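The coverage interpretation can be illustrated by simulation. A sketch for the simplest case, the interval X̄ ± z_{1−α/2}σ/√n for the mean of an N(µ, σ²) sample with σ known; all numeric values are illustrative assumptions:

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

# Simulate the standard interval Xbar +/- z_{1-alpha/2} * sigma / sqrt(n)
# (sigma known, assumed values) and check that it covers mu in roughly
# a fraction 1 - alpha of repetitions.
random.seed(1)
mu, sigma, n, alpha, reps = 1.0, 2.0, 25, 0.05, 20_000
z = NormalDist().inv_cdf(1 - alpha / 2)
half = z * sigma / sqrt(n)

covered = 0
for _ in range(reps):
    xbar = fmean(random.gauss(mu, sigma) for _ in range(n))
    covered += (xbar - half <= mu <= xbar + half)

coverage = covered / reps
print(coverage)
```

Each realized interval either covers µ or it does not; only the long-run frequency over repetitions is close to 1 − α = 0.95.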


Page 11


Example 2.32
Let X1, ..., Xn be i.i.d. from the N(µ, σ²) distribution with both µ ∈ R and σ² > 0 unknown.
Let θ = (µ, σ²) and α ∈ (0,1) be given.
Let X̄ be the sample mean and S² be the sample variance.
Since (X̄, S²) is sufficient (Example 2.15), we focus on C(X) that is a function of (X̄, S²).
From Example 2.18, X̄ and S² are independent and (n−1)S²/σ² has the chi-square distribution χ²_{n−1}.
Since √n(X̄ − µ)/σ has the N(0,1) distribution,

P(−c̃_α ≤ (X̄ − µ)/(σ/√n) ≤ c̃_α) = √(1−α),

where c̃_α = Φ^{−1}((1 + √(1−α))/2) (verify).
Since the chi-square distribution χ²_{n−1} is a known distribution, we can always find two constants c_{1α} and c_{2α} such that


Page 12


Example 2.32 (continued)

P(c_{1α} ≤ (n−1)S²/σ² ≤ c_{2α}) = √(1−α).

Then, by the independence of X̄ and S²,

P(−c̃_α ≤ (X̄ − µ)/(σ/√n) ≤ c̃_α, c_{1α} ≤ (n−1)S²/σ² ≤ c_{2α}) = 1 − α,

or

P(n(X̄ − µ)²/c̃²_α ≤ σ², (n−1)S²/c_{2α} ≤ σ² ≤ (n−1)S²/c_{1α}) = 1 − α.    (3)

The left-hand side of (3) defines a set in the range of θ = (µ, σ²) bounded by two straight lines, σ² = (n−1)S²/c_{iα}, i = 1, 2, and a curve σ² = n(X̄ − µ)²/c̃²_α (see the shaded part of Figure 2.3).
This set is a confidence set for θ with confidence coefficient 1 − α, since (3) holds for any θ.
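The choice of c̃_α in this example can be verified numerically: with c̃_α = Φ^{−1}((1 + √(1−α))/2) one gets 2Φ(c̃_α) − 1 = √(1−α), so multiplying by the probability √(1−α) of the independent chi-square event gives the joint probability 1 − α. A quick check over a few levels:

```python
from statistics import NormalDist
from math import sqrt

# Verify the "(verify)" step of Example 2.32:
# P(-c_tilde <= Z <= c_tilde) = 2 * Phi(c_tilde) - 1 = sqrt(1 - alpha)
# for Z ~ N(0,1) and c_tilde = Phi^{-1}((1 + sqrt(1 - alpha)) / 2),
# so the product of the two independent events' probabilities is 1 - alpha.
N01 = NormalDist()
for alpha in (0.01, 0.05, 0.10):
    c_tilde = N01.inv_cdf((1 + sqrt(1 - alpha)) / 2)
    prob = N01.cdf(c_tilde) - N01.cdf(-c_tilde)
    assert abs(prob - sqrt(1 - alpha)) < 1e-7
    assert abs(prob * sqrt(1 - alpha) - (1 - alpha)) < 1e-7
print("ok")
```

(The constants c_{1α} and c_{2α} would additionally require chi-square quantiles, e.g. from a scientific library, which is why only the normal part is checked here.)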
