Transcript of Stat 709: Mathematical Statistics Lecture 23
Lecture 20: Statistical inference
A major approach in statistical analysis does not use any loss or risk function.
Three components in statistical inference:
- Point estimators (Chapters 3-5)
- Hypothesis tests (Chapter 6)
- Confidence sets (Chapter 7)
Point estimators
Let T(X) be an estimator of ϑ ∈ R.
Bias: b_T(P) = E[T(X)] − ϑ.
Mean squared error (mse):
mse_T(P) = E[T(X) − ϑ]^2 = [b_T(P)]^2 + Var(T(X)).
Bias and mse are two common criteria for the performance of point estimators.
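To make the decomposition concrete, here is a minimal Monte Carlo sketch (not from the text): it checks mse_T(P) = [b_T(P)]^2 + Var(T(X)) empirically for the deliberately biased estimator T(X) = max(X1, ..., Xn) of θ under the Uniform(0, θ) distribution. All numeric settings are illustrative choices.

```python
import random
import statistics

# Illustrative check of mse_T(P) = [b_T(P)]^2 + Var(T(X)) by simulation,
# using the biased estimator T(X) = max(X_1, ..., X_n) of theta under
# Uniform(0, theta). All numbers here are arbitrary illustrative choices.
random.seed(0)
theta, n, reps = 2.0, 10, 20000
estimates = [max(random.uniform(0, theta) for _ in range(n)) for _ in range(reps)]

bias = statistics.fmean(estimates) - theta                    # b_T(P) = E[T(X)] - theta
var = statistics.pvariance(estimates)                         # Var(T(X))
mse = statistics.fmean((t - theta) ** 2 for t in estimates)   # E[T(X) - theta]^2

print(round(mse, 4), round(bias ** 2 + var, 4))
```

Since the same empirical moments are used on both sides, the identity holds exactly for the simulated data, so the two printed values agree up to floating-point rounding.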
Read Example 2.26
UW-Madison (Statistics) Stat 709 Lecture 20 2016 1 / 12
Hypothesis tests
To test the hypotheses
H0: P ∈ P0 versus H1: P ∈ P1,
there are two types of errors we may commit:
- rejecting H0 when H0 is true (called the type I error), and
- accepting H0 when H0 is wrong (called the type II error).
A test T: a statistic from X to {0, 1}.
Probabilities of making the two types of errors:
Type I error rate:
α_T(P) = P(T(X) = 1), P ∈ P0.
Type II error rate:
1 − α_T(P) = P(T(X) = 0), P ∈ P1.
α_T(P) is also called the power function of T. The power function is α_T(θ) if P is in a parametric family indexed by θ.
Remarks
- Note that these are risks of T under the 0-1 loss in statistical decision theory.
- Type I and type II error probabilities cannot be minimized simultaneously.
- These two error probabilities cannot be bounded simultaneously by a fixed α ∈ (0, 1) when we have a sample of a fixed size.

Significance tests
A common approach to finding an "optimal" test is to assign a small bound α to the type I error rate α_T(P), P ∈ P0, and then to attempt to minimize the type II error rate 1 − α_T(P), P ∈ P1, subject to
sup_{P∈P0} α_T(P) ≤ α. (1)
The bound α is called the level of significance. The left-hand side of (1) is called the size of the test T. The level of significance should be positive, otherwise no test satisfies (1) except the silly test T(X) ≡ 0 a.s. P.
Example 2.28
Let X1, ..., Xn be i.i.d. from the N(µ, σ^2) distribution with an unknown µ ∈ R and a known σ^2. Consider the hypotheses H0: µ ≤ µ0 versus H1: µ > µ0, where µ0 is a fixed constant.
Since the sample mean X̄ is sufficient for µ ∈ R, it is reasonable to consider the following class of tests: T_c(X) = I_{(c,∞)}(X̄), i.e., H0 is rejected (accepted) if X̄ > c (X̄ ≤ c), where c ∈ R is a fixed constant.
Let Φ be the c.d.f. of N(0, 1). By the property of the normal distributions,
α_{T_c}(µ) = P(T_c(X) = 1) = 1 − Φ(√n (c − µ)/σ).
Figure 2.2 provides an example of a graph of the two types of error probabilities, with µ0 = 0. Since Φ(t) is an increasing function of t,
sup_{P∈P0} α_{T_c}(µ) = 1 − Φ(√n (c − µ0)/σ).
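The power function above is easy to evaluate numerically; Python's statistics.NormalDist supplies Φ. A minimal sketch (the values of n, σ, c, and µ0 are illustrative choices, not from the text) confirming that, because the power function is increasing in µ, its supremum over µ ≤ µ0 is attained at µ0:

```python
from statistics import NormalDist
import math

# Sketch of the power function alpha_{T_c}(mu) from Example 2.28.
# n, sigma, c, mu0 are illustrative choices, not from the text.
Phi = NormalDist().cdf
n, sigma, c, mu0 = 25, 2.0, 0.5, 0.0

def power(mu):
    # alpha_{T_c}(mu) = 1 - Phi(sqrt(n) (c - mu) / sigma)
    return 1 - Phi(math.sqrt(n) * (c - mu) / sigma)

# power is increasing in mu, so the size (sup over mu <= mu0) is power(mu0)
mus = [mu0 - 1, mu0 - 0.5, mu0 - 0.1, mu0]
assert all(power(a) < power(b) for a, b in zip(mus, mus[1:]))
print(round(power(mu0), 4))   # size of T_c for these illustrative values
```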
Example 2.28 (continued)
In fact, it is also true that
sup_{P∈P1} [1 − α_{T_c}(µ)] = Φ(√n (c − µ0)/σ).
If we would like to use an α as the level of significance, then the most effective way is to choose a c_α (a test T_{c_α}(X)) such that
α = sup_{P∈P0} α_{T_{c_α}}(µ),
in which case c_α must satisfy
1 − Φ(√n (c_α − µ0)/σ) = α,
i.e., c_α = σ z_{1−α}/√n + µ0, where z_a = Φ^{−1}(a).
In Chapter 6, it is shown that for any test T(X) satisfying sup_{P∈P0} α_T(P) ≤ α,
1 − α_T(µ) ≥ 1 − α_{T_{c_α}}(µ), µ > µ0.
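The formula c_α = σ z_{1−α}/√n + µ0 can be checked directly, with NormalDist.inv_cdf playing the role of Φ^{−1}. The numeric settings below are illustrative, not from the text:

```python
from statistics import NormalDist
import math

# Computing c_alpha = sigma * z_{1-alpha} / sqrt(n) + mu0 from Example 2.28.
# n, sigma, mu0, alpha are illustrative values, not from the text.
nd = NormalDist()
n, sigma, mu0, alpha = 25, 2.0, 0.0, 0.05

z = nd.inv_cdf(1 - alpha)                  # z_{1-alpha} = Phi^{-1}(1 - alpha)
c_alpha = sigma * z / math.sqrt(n) + mu0

# check: the size of T_{c_alpha} equals alpha
size = 1 - nd.cdf(math.sqrt(n) * (c_alpha - mu0) / sigma)
print(round(c_alpha, 4), round(size, 4))
```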
Choice of significance level
The choice of a level of significance α is usually somewhat subjective. In most applications there is no precise limit to the size of T that can be tolerated. Standard values, 0.10, 0.05, and 0.01, are often used for convenience. For most tests satisfying sup_{P∈P0} α_T(P) ≤ α, a small α leads to a "small" rejection region.

p-value
It is good practice to determine not only whether H0 is rejected for a given α and a chosen test T_α, but also the smallest possible level of significance at which H0 would be rejected for the computed T_α(x), i.e.,
α̂ = inf{α ∈ (0, 1) : T_α(x) = 1}.
Such an α̂, which depends on x and the chosen test and is a statistic, is called the p-value for the test T_α.
Example 2.29
Let us calculate the p-value for T_{c_α} in Example 2.28. Note that
α = 1 − Φ(√n (c_α − µ0)/σ) > 1 − Φ(√n (x̄ − µ0)/σ)
if and only if x̄ > c_α (or T_{c_α}(x) = 1). Hence
1 − Φ(√n (x̄ − µ0)/σ) = inf{α ∈ (0, 1) : T_{c_α}(x) = 1} = α̂(x)
is the p-value for T_{c_α}. It turns out that T_{c_α}(x) = I_{(0,α)}(α̂(x)).
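A short numerical sketch of this p-value, with an illustrative observed sample mean x̄ (the numbers are made up, not from the text), verifying that rejecting at level α is the same as α̂(x) < α:

```python
from statistics import NormalDist
import math

# p-value hat{alpha}(x) = 1 - Phi(sqrt(n) (xbar - mu0) / sigma), Example 2.29.
# n, sigma, mu0, and the observed xbar are illustrative values.
nd = NormalDist()
n, sigma, mu0 = 25, 2.0, 0.0

xbar = 0.8                                   # observed sample mean (illustrative)
p_value = 1 - nd.cdf(math.sqrt(n) * (xbar - mu0) / sigma)

# T_{c_alpha}(x) = I_{(0,alpha)}(hat{alpha}(x)): reject at level alpha iff p-value < alpha
alpha = 0.05
c_alpha = sigma * nd.inv_cdf(1 - alpha) / math.sqrt(n) + mu0
assert (p_value < alpha) == (xbar > c_alpha)
print(round(p_value, 4))
```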
Remarks
- With the additional information provided by p-values, using p-values is typically more appropriate than using fixed-level tests in a scientific problem.
- In some cases, a fixed level of significance is unavoidable when acceptance or rejection of H0 is a required decision.
Randomized tests
In Example 2.28, sup_{P∈P0} α_T(P) = α can always be achieved by a suitable choice of c. This is, however, not true in general; we need to consider randomized tests.
Recall that a randomized decision rule is a probability measure δ(x, ·) on the action space for any fixed x. Since the action space contains only two points, 0 and 1, for a hypothesis testing problem, any randomized test δ(X, A) is equivalent to a statistic T(X) ∈ [0, 1] with T(x) = δ(x, {1}) and 1 − T(x) = δ(x, {0}). A nonrandomized test is obviously a special case where T(x) does not take any value in (0, 1).
For any randomized test T(X), we define the type I error probability to be α_T(P) = E[T(X)], P ∈ P0, and the type II error probability to be 1 − α_T(P) = E[1 − T(X)], P ∈ P1. For a class of randomized tests, we would like to minimize 1 − α_T(P) subject to sup_{P∈P0} α_T(P) = α.
Example 2.30
Assume that the sample X has the binomial distribution Bi(θ, n) with an unknown θ ∈ (0, 1) and a fixed integer n > 1. Consider the hypotheses H0: θ ∈ (0, θ0] versus H1: θ ∈ (θ0, 1), where θ0 ∈ (0, 1) is a fixed value. Consider the following class of randomized tests:
T_{j,q}(X) = 1 if X > j, q if X = j, 0 if X < j,
where j = 0, 1, ..., n − 1 and q ∈ [0, 1]. Then
α_{T_{j,q}}(θ) = P(X > j) + q P(X = j), 0 < θ ≤ θ0,
1 − α_{T_{j,q}}(θ) = P(X < j) + (1 − q) P(X = j), θ0 < θ < 1.
It can be shown that for any α ∈ (0, 1), there exist an integer j and q ∈ (0, 1) such that the size of T_{j,q} is α.
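Such a pair (j, q) can be found constructively: since the power is increasing in θ, the size is attained at θ0, so one may take the smallest j with P_{θ0}(X > j) ≤ α and solve for q. A sketch with illustrative values of n, θ0, and α (not from the text):

```python
import math

# Finding (j, q) so that the randomized test T_{j,q} of Example 2.30 has
# size alpha; the size is attained at theta0 because the power is
# increasing in theta. n, theta0, alpha are illustrative choices.
def binom_pmf(k, n, theta):
    return math.comb(n, k) * theta**k * (1 - theta)**(n - k)

n, theta0, alpha = 10, 0.5, 0.05

# smallest j with P_{theta0}(X > j) <= alpha
j = next(j for j in range(n)
         if sum(binom_pmf(k, n, theta0) for k in range(j + 1, n + 1)) <= alpha)
tail = sum(binom_pmf(k, n, theta0) for k in range(j + 1, n + 1))   # P(X > j)
q = (alpha - tail) / binom_pmf(j, n, theta0)                       # randomize on X = j

size = tail + q * binom_pmf(j, n, theta0)   # alpha_{T_{j,q}}(theta0)
print(j, round(q, 4), round(size, 4))
```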
Confidence sets
ϑ: a k-vector of unknown parameters related to the unknown population P ∈ P.
C(X): a Borel set (in the range of ϑ) depending only on the sample X.
If
inf_{P∈P} P(ϑ ∈ C(X)) ≥ 1 − α, (2)
where α is a fixed constant in (0, 1), then C(X) is called a confidence set for ϑ with level of significance 1 − α. The left-hand side of (2) is called the confidence coefficient of C(X), which is the highest possible level of significance for C(X).
A confidence set is a random element that covers the unknown ϑ with certain probability. If (2) holds, then the coverage probability of C(X) is at least 1 − α, although C(x) either covers or does not cover ϑ once we observe X = x.
Example 2.32
Let X1, ..., Xn be i.i.d. from the N(µ, σ^2) distribution with both µ ∈ R and σ^2 > 0 unknown. Let θ = (µ, σ^2) and α ∈ (0, 1) be given. Let X̄ be the sample mean and S^2 be the sample variance. Since (X̄, S^2) is sufficient (Example 2.15), we focus on C(X) that is a function of (X̄, S^2).
From Example 2.18, X̄ and S^2 are independent and (n − 1)S^2/σ^2 has the chi-square distribution χ^2_{n−1}. Since √n(X̄ − µ)/σ has the N(0, 1) distribution,
P(−c̃_α ≤ √n(X̄ − µ)/σ ≤ c̃_α) = √(1 − α),
where c̃_α = Φ^{−1}((1 + √(1 − α))/2) (verify). Since the chi-square distribution χ^2_{n−1} is a known distribution, we can always find two constants c_{1α} and c_{2α} such that
Example 2.32 (continued)
P(c_{1α} ≤ (n − 1)S^2/σ^2 ≤ c_{2α}) = √(1 − α).
Then
P(−c̃_α ≤ √n(X̄ − µ)/σ ≤ c̃_α, c_{1α} ≤ (n − 1)S^2/σ^2 ≤ c_{2α}) = 1 − α,
or
P(n(X̄ − µ)^2/c̃_α^2 ≤ σ^2, (n − 1)S^2/c_{2α} ≤ σ^2 ≤ (n − 1)S^2/c_{1α}) = 1 − α. (3)
The left-hand side of (3) defines a set in the range of θ = (µ, σ^2) bounded by two straight lines, σ^2 = (n − 1)S^2/c_{iα}, i = 1, 2, and a curve σ^2 = n(X̄ − µ)^2/c̃_α^2 (see the shadowed part of Figure 2.3). This set is a confidence set for θ with confidence coefficient 1 − α, since (3) holds for any θ.
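A Monte Carlo sketch of this confidence set (not from the text): the chi-square quantiles c_{1α}, c_{2α} are approximated by empirical quantiles of simulated χ^2_{n−1} draws, split equal-tailed (an arbitrary choice; the text only requires each event to have probability √(1 − α)), and the joint coverage is then checked to be close to 1 − α. All numeric settings are illustrative.

```python
import random
import math
from statistics import NormalDist

# Monte Carlo sketch of the confidence set in Example 2.32. The exact
# chi-square_{n-1} quantiles are replaced by empirical quantiles (equal-
# tailed split, an arbitrary choice). n, mu, sigma2, alpha are illustrative.
random.seed(0)
n, mu, sigma2, alpha = 20, 1.0, 4.0, 0.05
p = math.sqrt(1 - alpha)                      # each event gets probability sqrt(1 - alpha)

c_tilde = NormalDist().inv_cdf((1 + p) / 2)   # c~_alpha = Phi^{-1}((1 + sqrt(1-alpha))/2)

# empirical chi-square_{n-1} quantiles in place of exact ones
draws = sorted(sum(random.gauss(0, 1) ** 2 for _ in range(n - 1))
               for _ in range(50000))
c1 = draws[int(((1 - p) / 2) * len(draws))]
c2 = draws[int(((1 + p) / 2) * len(draws))]

covered, reps = 0, 20000
for _ in range(reps):
    x = [random.gauss(mu, math.sqrt(sigma2)) for _ in range(n)]
    xbar = sum(x) / n
    s2 = sum((xi - xbar) ** 2 for xi in x) / (n - 1)
    in_mean = n * (xbar - mu) ** 2 / c_tilde ** 2 <= sigma2
    in_var = (n - 1) * s2 / c2 <= sigma2 <= (n - 1) * s2 / c1
    covered += in_mean and in_var
print(round(covered / reps, 3))   # should be close to 1 - alpha = 0.95
```

The two events are independent (X̄ and S^2 are independent), so their probabilities multiply to give the joint coverage 1 − α, which the simulation reproduces up to Monte Carlo error.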