Unit 3: Statistical Inferences
Wenyaw Chan
Division of Biostatistics
School of Public Health
University of Texas - Health Science Center at Houston
Estimation
• Point Estimates
– A point estimate of a parameter $\theta$ is a single number used as an estimate of the value of $\theta$.
– e.g., a natural estimate of the population mean $\mu$ is the sample mean $\bar{X} = \sum_{i=1}^{n} X_i / n$.
• Interval Estimation
– If a random interval $I = (L, U)$ satisfies $\Pr(L < \theta < U) = 1 - \alpha$, then the observed values of $L$ and $U$ for a given sample are called a $1 - \alpha$ confidence interval estimate for $\theta$.
Which one is more accurate? Which one is more precise?
Estimation
What to estimate?
• $B(n, p)$: proportion $p$
• Poisson($\mu$): mean $\mu$
• $N(\mu, \sigma^2)$: mean $\mu$ and/or variance $\sigma^2$
Estimation of the Mean of a Distribution
• A point estimator of the population mean $\mu$ is the sample mean $\bar{X}$.
• The sampling distribution of $\bar{X}$ is the distribution of values of $\bar{X}$ over all possible samples of size $n$ that could have been selected from the reference population.
• $E(\bar{X}) = \mu$
Estimation
• An estimator of a parameter is an unbiased estimator if its expectation is equal to the parameter.
• Note: Unbiasedness alone is not a sufficient criterion for choosing an estimator.
• The unbiased estimator with the minimum variance (MVUE) is preferred.
• If the population is normal, then $\bar{X}$ is the MVUE of $\mu$.
Sample Mean
• Standard error (of the mean) = standard deviation of the sample mean $= \sqrt{\sigma^2 / n} = \sigma / \sqrt{n}$
• The estimated standard error $= s / \sqrt{n}$, where $s$ is the sample standard deviation.
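As a small illustration of the estimated standard error, the sketch below computes $s / \sqrt{n}$ with Python's standard library. The function name and the data are hypothetical, chosen only for the example.

```python
import math
import statistics

def estimated_standard_error(sample):
    """Return s / sqrt(n), where s is the sample standard deviation (n - 1 divisor)."""
    n = len(sample)
    s = statistics.stdev(sample)
    return s / math.sqrt(n)

# Hypothetical data, for illustration only.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(estimated_standard_error(sample))
```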
Central Limit Theorem
• Let $X_1, \ldots, X_n$ be a random sample from some population with mean $\mu$ and variance $\sigma^2$. Then, for large $n$, approximately
$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$
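The theorem can be checked informally by simulation. The sketch below assumes an Exponential(1) population (mean 1, variance 1), which is not from the slides and is chosen only because it is clearly non-normal. With $n = 50$ the sample means should have mean near 1 and variance near $1/50$.

```python
import random
import statistics

def simulate_sample_means(n, reps, seed=0):
    """Draw `reps` samples of size `n` from an Exponential(1) population
    and return the list of sample means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(reps)
    ]

means = simulate_sample_means(n=50, reps=2000)
print("mean of sample means:", statistics.fmean(means))          # CLT predicts about 1
print("variance of sample means:", statistics.variance(means))   # CLT predicts about 1/50
```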
Interval Estimation
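• Let $X_1, \ldots, X_n$ be a random sample from a normal population $N(\mu, \sigma^2)$. If $\sigma^2$ is known, a 95% confidence interval (C.I.) for $\mu$ is
$$\left(\bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}},\; \bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}}\right)$$
Why? (See the next slide.)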
Interval Estimation
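If $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$, then
$$\Pr\left(-1.96 < \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} < 1.96\right) = .95,$$
i.e.
$$\Pr\left(-1.96\,\frac{\sigma}{\sqrt{n}} < \bar{X} - \mu < 1.96\,\frac{\sigma}{\sqrt{n}}\right) = .95$$
$$\Pr\left(-\bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}} < -\mu < -\bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}}\right) = .95$$
$$\Pr\left(\bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}}\right) = .95$$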
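The interval derived above can be computed directly. This is a minimal sketch for the known-variance case. The function name and the numbers ($\bar{x} = 120$, $\sigma = 10$, $n = 25$) are hypothetical, and the multiplier is taken from the standard normal quantile rather than the rounded 1.96.

```python
import math
from statistics import NormalDist

def z_confidence_interval(xbar, sigma, n, level=0.95):
    """Two-sided confidence interval for the mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for level = 0.95
    half_width = z * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width

lo, hi = z_confidence_interval(xbar=120.0, sigma=10.0, n=25)
print(lo, hi)
```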
Interval Estimation
Interpretation of Confidence Interval
• Over the collection of 95% confidence intervals that could be constructed from repeated random samples of size $n$, 95% of them will contain the parameter $\mu$.
• It is wrong to say: "There is a 95% chance that the parameter will fall within a particular 95% confidence interval."
Interval Estimation
• Note:
1. When $\sigma$ and $n$ are fixed, the 99% C.I. is wider than the 95% C.I.
2. If the width of the C.I. is specified, the sample size can be determined.
$n \uparrow \Rightarrow$ length $\downarrow$; $\sigma \uparrow \Rightarrow$ length $\uparrow$
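Since the known-variance interval has width $2 z \sigma / \sqrt{n}$, a target width fixes $n$. The sketch below solves for the smallest such $n$. The function name and inputs are hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_for_width(sigma, width, level=0.95):
    """Smallest n for which the known-sigma C.I. is no wider than `width`."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return math.ceil((2 * z * sigma / width) ** 2)

print(sample_size_for_width(sigma=10.0, width=4.0, level=0.95))
print(sample_size_for_width(sigma=10.0, width=4.0, level=0.99))  # larger: a 99% C.I. is wider
```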
Hypothesis Testing
• Null hypothesis (H0): the statement to be tested, usually reflecting the status quo.
• Alternative hypothesis (H1): the logical complement of H0.
• Note: the null hypothesis is analogous to the defendant in court. It is presumed to be true unless the data argue overwhelmingly to the contrary.
Hypothesis Testing
• Four possible outcomes of the decision:

                      Truth
Decision          H0               H1
Accept H0         OK               Type II error
Reject H0         Type I error     OK

• Notation:
– $\alpha$ = Pr(Type I error) = level of significance
– $\beta$ = Pr(Type II error)
– $1 - \beta$ = power = Pr(reject H0 | H1 is true)
Hypothesis Testing
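• Goal: to make $\alpha$ and $\beta$ both small.
• Facts: for a fixed sample size, if $\alpha \downarrow$ then $\beta \uparrow$; if $\beta \downarrow$ then $\alpha \uparrow$.
• General Strategy: fix $\alpha$, minimize $\beta$.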
Testing for the Population Mean
• When the sample is from a normal population, consider
H0: $\mu = 120$ vs H1: $\mu < 120$
• The best test is based on $\bar{X}$, which is called the test statistic. The "best test" means that the test has the highest power among all tests with a given type I error. Is there any bad test? Yes.
• Rejection Region:
– the range of values of the test statistic for which H0 is rejected.
One-tailed test
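• Our rejection region is $\bar{X} < c$.
• Now,
$$\alpha = \Pr(\text{Type I error}) = \Pr(\bar{X} < c \mid H_0 \text{ is true})$$
$$= \Pr\left(\bar{X} < c \,\middle|\, \bar{X} \sim N\left(\mu_0, \frac{\sigma^2}{n}\right)\right)$$
$$= \Phi\left(\frac{c - \mu_0}{\sigma / \sqrt{n}}\right),$$
i.e.
$$\frac{c - \mu_0}{\sigma / \sqrt{n}} = Z_\alpha \quad \text{or} \quad c = \mu_0 + Z_\alpha\,\frac{\sigma}{\sqrt{n}},$$
where $Z_\alpha$ is the $100\alpha$-th percentile of the standard normal distribution.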
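The cutoff $c$ can be computed and checked numerically. The sketch below assumes known $\sigma$ and uses hypothetical values ($\mu_0 = 120$, $\sigma = 10$, $n = 25$, $\alpha = 0.05$). It then confirms that $\Pr(\bar{X} < c)$ under H0 equals $\alpha$.

```python
import math
from statistics import NormalDist

def lower_tail_critical_value(mu0, sigma, n, alpha=0.05):
    """Cutoff c such that Pr(Xbar < c | mu = mu0) = alpha, for known sigma."""
    z_alpha = NormalDist().inv_cdf(alpha)  # lower alpha quantile, negative for alpha < 0.5
    return mu0 + z_alpha * sigma / math.sqrt(n)

c = lower_tail_critical_value(mu0=120.0, sigma=10.0, n=25)
# Distribution of Xbar under H0: N(mu0, sigma^2 / n).
type_one_error = NormalDist(mu=120.0, sigma=10.0 / math.sqrt(25)).cdf(c)
print(c, type_one_error)
```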
Result
• To test H0: $\mu = \mu_0$ vs H1: $\mu < \mu_0$, based on a sample taken from a normal population with mean and variance unknown, the test statistic is
$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}.$$
• Assume the level of significance is $\alpha$. Then,
– if $t < t_{n-1,\,\alpha}$, we reject H0.
– if $t \ge t_{n-1,\,\alpha}$, we do not reject H0.
P-value
• The minimum $\alpha$-level at which we can reject H0 based on the sample.
• The p-value can also be thought of as the probability of obtaining a test statistic as extreme as or more extreme than the actual test statistic obtained from the sample, given that the null hypothesis is true.
Remarks
• Two different approaches to determining statistical significance:
– Critical value method
– P-value method
One-tailed test
• Testing H0: $\mu = \mu_0$ vs H1: $\mu > \mu_0$ when $\sigma^2$ is unknown and the population is normal
Test Statistic: $t = \dfrac{\bar{x} - \mu_0}{s / \sqrt{n}}$
Rejection Region: $t > t_{n-1,\,1-\alpha}$
p-value $= 1 - F_{t,n-1}(t)$, where $F_{t,n-1}(\cdot)$ is the cdf of the t distribution with df $= n - 1$.
• Note: If $\sigma^2$ is known, the $s$ in the test statistic is replaced by $\sigma$, $t_{n-1,\,1-\alpha}$ in the rejection region is replaced by $z_{1-\alpha}$, and $F_{t,n-1}(t)$ is replaced by $\Phi(t)$.
Testing For Two-Sided Alternative
• Let $X_1, \ldots, X_n$ be a random sample from the population $N(\mu, \sigma^2)$, where $\sigma^2$ is unknown.
• H0: $\mu = \mu_0$ vs H1: $\mu \ne \mu_0$
– Test Statistic: $t = \dfrac{\bar{x} - \mu_0}{s / \sqrt{n}}$
– Rejection Region: $|t| > t_{n-1,\,1-\alpha/2}$
– p-value $= 2 F_{t,n-1}(t)$ if $t \le 0$, and $= 2[1 - F_{t,n-1}(t)]$ if $t > 0$ (see figures on next slide).
• Warning: the exact p-value requires use of a computer.
Testing For Two-Sided Alternative
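[Figure: two-sided p-value shown as the sum of two tail areas. Left panel, $\bar{x} > \mu_0$: tails below $2\mu_0 - \bar{x}$ and above $\bar{x}$. Right panel, $\bar{x} \le \mu_0$: tails below $\bar{x}$ and above $2\mu_0 - \bar{x}$.]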
The Power of A Test
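• To test H0: $\mu = \mu_0$ vs H1: $\mu < \mu_0$ in a normal population with known variance $\sigma^2$, the power is
$$\Phi\left[Z_\alpha + \frac{(\mu_0 - \mu_1)\sqrt{n}}{\sigma}\right].$$
• Review: Power = Pr[rejecting H0 | H0 is false]
• Factors Affecting the Power
1. $\alpha \downarrow \Rightarrow Z_\alpha \downarrow \Rightarrow$ power $\downarrow$
2. $|\mu_0 - \mu_1| \uparrow \Rightarrow$ power $\uparrow$
3. $\sigma \uparrow \Rightarrow$ power $\downarrow$
4. $n \uparrow \Rightarrow$ power $\uparrow$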
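The power formula and the four factors can be explored numerically. This is a sketch under the slide's assumptions (normal population, known $\sigma$, lower-tailed alternative). The function name and the example values are hypothetical.

```python
import math
from statistics import NormalDist

def power_one_sided_z(mu0, mu1, sigma, n, alpha=0.05):
    """Power of the level-alpha z test of H0: mu = mu0 vs H1: mu < mu0 at true mean mu1."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(alpha)
    return std_normal.cdf(z_alpha + (mu0 - mu1) * math.sqrt(n) / sigma)

base = power_one_sided_z(mu0=120.0, mu1=115.0, sigma=10.0, n=25)
print(base)
# Larger n or a larger gap |mu0 - mu1| raises power; larger sigma or smaller alpha lowers it.
print(power_one_sided_z(120.0, 115.0, 10.0, n=50))
print(power_one_sided_z(120.0, 115.0, 15.0, n=25))
```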
The Power of The 1-Sample T Test
• To test H0: $\mu = \mu_0$ vs H1: $\mu < \mu_0$ in a normal population with unknown variance $\sigma^2$, the power, for true mean $\mu_1$ and true s.d. $= \sigma$, is $F(t_{n-1,\,.05})$, where $F(\cdot)$ is the c.d.f. of the non-central t distribution with df $= n - 1$ and non-centrality $(\mu_1 - \mu_0)\sqrt{n} / \sigma$.
Notes:
1. $t_{n-1,\,0.05} = -t_{n-1,\,0.95}$.
2. If $X$ and $Y$ are independent random variables such that $Y \sim N(\delta, 1)$ and $X \sim \chi^2$ with d.f. $= m$, then $Y / \sqrt{X / m}$ is said to have a non-central t distribution with non-centrality $\delta$.
Power Function For Two-Sided Alternative
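• To test H0: $\mu = \mu_0$ vs H1: $\mu \ne \mu_0$ in a normal population with known variance $\sigma^2$, the power is
$$\Phi\left[-Z_{1-\alpha/2} + \frac{(\mu_0 - \mu_1)\sqrt{n}}{\sigma}\right] + \Phi\left[-Z_{1-\alpha/2} + \frac{(\mu_1 - \mu_0)\sqrt{n}}{\sigma}\right],$$
where $\mu_1$ is the true alternative.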
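A quick numerical check of the two-sided power function, again assuming known $\sigma$. The function name and inputs are hypothetical. At $\mu_1 = \mu_0$ the two terms should add up to $\alpha$.

```python
import math
from statistics import NormalDist

def power_two_sided_z(mu0, mu1, sigma, n, alpha=0.05):
    """Power of the level-alpha two-sided z test of H0: mu = mu0 at true mean mu1."""
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1 - alpha / 2)
    shift = (mu0 - mu1) * math.sqrt(n) / sigma
    return std_normal.cdf(-z + shift) + std_normal.cdf(-z - shift)

print(power_two_sided_z(mu0=120.0, mu1=115.0, sigma=10.0, n=25))
print(power_two_sided_z(mu0=120.0, mu1=120.0, sigma=10.0, n=25))  # equals alpha under H0
```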
Case of Unknown Variance
• For the same test in a population with unknown variance, the power is $F(-t_{n-1,\,1-\alpha/2}) + 1 - F(t_{n-1,\,1-\alpha/2})$, where $F(\cdot)$ is the c.d.f. of the non-central t distribution with df $= n - 1$ and non-centrality $(\mu_1 - \mu_0)\sqrt{n} / \sigma$.
Sample Size Determination
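For example: H0: $\mu = \mu_0$ vs H1: $\mu < \mu_0$
power:
$$1 - \beta = \Phi\left[Z_\alpha + \frac{(\mu_0 - \mu_1)\sqrt{n}}{\sigma}\right] \quad \text{if } \mu_1 < \mu_0.$$
Hence,
$$Z_{1-\beta} = Z_\alpha + \frac{(\mu_0 - \mu_1)\sqrt{n}}{\sigma}$$
$$\sqrt{n} = \frac{(Z_{1-\beta} + Z_{1-\alpha})\,\sigma}{\mu_0 - \mu_1}$$
$$n = \frac{(Z_{1-\beta} + Z_{1-\alpha})^2\,\sigma^2}{(\mu_0 - \mu_1)^2}$$
(using $-Z_\alpha = Z_{1-\alpha}$).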
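The closed form for $n$ can be evaluated directly, rounding up to the next integer. The sketch below uses hypothetical inputs ($\mu_0 = 120$, $\mu_1 = 115$, $\sigma = 10$, $\alpha = 0.05$, power $0.80$) and assumes known $\sigma$.

```python
import math
from statistics import NormalDist

def sample_size_one_sided(mu0, mu1, sigma, alpha=0.05, power=0.80):
    """Smallest n giving at least `power` for the one-sided level-alpha z test."""
    std_normal = NormalDist()
    z_power = std_normal.inv_cdf(power)       # Z_{1 - beta}
    z_alpha = std_normal.inv_cdf(1 - alpha)   # Z_{1 - alpha}
    n = (z_power + z_alpha) ** 2 * sigma ** 2 / (mu0 - mu1) ** 2
    return math.ceil(n)

print(sample_size_one_sided(mu0=120.0, mu1=115.0, sigma=10.0))
```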
Factor Affecting Sample Size
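1. $\sigma^2 \uparrow \Rightarrow n \uparrow$
2. $\alpha \downarrow \Rightarrow n \uparrow$
3. $1 - \beta \uparrow \Rightarrow n \uparrow$
4. $|\mu_0 - \mu_1| \uparrow \Rightarrow n \downarrow$
• To test H0: $\mu = \mu_0$ vs H1: $\mu \ne \mu_0$, where $\sigma^2$ is known, the sample size calculation is
$$n = \frac{(Z_{1-\alpha/2} + Z_{1-\beta})^2\,\sigma^2}{(\mu_0 - \mu_1)^2}.$$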
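The two-sided formula and the four factors can be tried out numerically. This sketch assumes known $\sigma$. The function name and the baseline values are hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_two_sided(mu0, mu1, sigma, alpha=0.05, power=0.80):
    """Smallest n from the two-sided formula (Z_{1-alpha/2} + Z_{1-beta})^2 sigma^2 / (mu0 - mu1)^2."""
    std_normal = NormalDist()
    z_sum = std_normal.inv_cdf(1 - alpha / 2) + std_normal.inv_cdf(power)
    return math.ceil(z_sum ** 2 * sigma ** 2 / (mu0 - mu1) ** 2)

print(sample_size_two_sided(120.0, 115.0, 10.0))               # baseline
print(sample_size_two_sided(120.0, 115.0, 20.0))               # larger sigma: larger n
print(sample_size_two_sided(120.0, 115.0, 10.0, alpha=0.01))   # smaller alpha: larger n
print(sample_size_two_sided(120.0, 110.0, 10.0))               # larger |mu0 - mu1|: smaller n
```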
Relationship between Hypothesis Testing and Confidence Interval
• To test H0: µ = µ0 vs H1: µ ≠ µ0, H0 is rejected by a two-sided level-α test if and only if the two-sided 100(1 - α)% confidence interval for µ does not contain µ0.
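The equivalence can be illustrated for the known-variance case, which is an assumption made here so that only normal quantiles are needed. Both function names and all numbers are hypothetical. For each sample mean, the test decision and the interval check should agree.

```python
import math
from statistics import NormalDist

def two_sided_z_test_rejects(xbar, mu0, sigma, n, alpha=0.05):
    """True if the two-sided level-alpha z test rejects H0: mu = mu0."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    return abs(z) > NormalDist().inv_cdf(1 - alpha / 2)

def ci_excludes(xbar, mu0, sigma, n, alpha=0.05):
    """True if the 100(1 - alpha)% C.I. for mu does not contain mu0."""
    half = NormalDist().inv_cdf(1 - alpha / 2) * sigma / math.sqrt(n)
    return not (xbar - half < mu0 < xbar + half)

for xbar in (115.0, 118.0, 120.0, 123.0, 125.0):
    print(xbar,
          two_sided_z_test_rejects(xbar, 120.0, 10.0, 25),
          ci_excludes(xbar, 120.0, 10.0, 25))
```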
One Sample Test for the Variance of A Normal Population
One Sample Test for A Proportion
Exact Method
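• If $\hat{p} < p_0$, the p-value
$$= 2 \times \Pr[X \le \text{\# of events in the } n \text{ observations} \mid X \sim B(n, p_0)] = 2 \sum_{k=0}^{\text{\# of events}} \binom{n}{k} p_0^k (1 - p_0)^{n-k}$$
• If $\hat{p} \ge p_0$, the p-value
$$= 2 \times \Pr[X \ge \text{\# of events in the } n \text{ observations} \mid X \sim B(n, p_0)] = 2 \sum_{k=\text{\# of events}}^{n} \binom{n}{k} p_0^k (1 - p_0)^{n-k}$$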
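The exact method amounts to doubling a binomial tail sum. The sketch below follows the two cases above. The function names are hypothetical, and capping the result at 1 is an added safeguard that is not shown on the slide (a doubled tail can exceed 1 when $\hat{p}$ is close to $p_0$).

```python
from math import comb

def binom_pmf(k, n, p):
    """Pr(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def exact_binomial_p_value(x, n, p0):
    """Two-sided exact p-value for H0: p = p0, given x events in n observations."""
    if x / n < p0:
        tail = sum(binom_pmf(k, n, p0) for k in range(0, x + 1))   # Pr(X <= x)
    else:
        tail = sum(binom_pmf(k, n, p0) for k in range(x, n + 1))   # Pr(X >= x)
    return min(2 * tail, 1.0)  # cap at 1: an assumption, not on the slide

print(exact_binomial_p_value(x=1, n=10, p0=0.5))
```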
Power and Sample size
One-Sample Inference for the Poisson Distribution
• $X \sim$ Poisson with mean $\mu$
• To test H0: $\mu = \mu_0$ vs H1: $\mu \ne \mu_0$ at the $\alpha$ level of significance,
– Obtain a two-sided $100(1 - \alpha)\%$ C.I. for $\mu$, say $(C_1, C_2)$.
– If $\mu_0 \in (C_1, C_2)$, we accept H0; otherwise we reject H0.
One-Sample Inference for the Poisson Distribution
• The p-value (for the above two-sided test)
– If observed $x < \mu_0$, then $P = \min[2 F(x \mid \mu_0),\, 1]$
– If observed $x > \mu_0$, then $P = \min[2(1 - F(x - 1 \mid \mu_0)),\, 1]$
where $F(x \mid \mu_0)$ is the Poisson c.d.f. with mean $= \mu_0$.
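A minimal sketch of this p-value using only the standard library. The function names are hypothetical. The slide leaves the case $x = \mu_0$ unspecified, and the code treats it together with $x > \mu_0$, which is an assumption.

```python
import math

def poisson_cdf(x, mu):
    """Pr(X <= x) for X ~ Poisson(mu), with integer x."""
    if x < 0:
        return 0.0
    return sum(math.exp(-mu) * mu ** k / math.factorial(k) for k in range(x + 1))

def poisson_two_sided_p_value(x, mu0):
    """Two-sided p-value for H0: mu = mu0 given an observed count x."""
    if x < mu0:
        return min(2 * poisson_cdf(x, mu0), 1.0)
    # x >= mu0 (the x == mu0 case is an assumption, not stated on the slide)
    return min(2 * (1 - poisson_cdf(x - 1, mu0)), 1.0)

print(poisson_two_sided_p_value(x=0, mu0=2.0))
print(poisson_two_sided_p_value(x=6, mu0=2.0))
```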
Large-Sample Test for Poisson (for µ0 ≥ 10)
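• To test H0: $\mu = \mu_0$ vs H1: $\mu \ne \mu_0$ at the $\alpha$ level of significance,
– Test Statistic:
$$X^2 = \frac{(x - \mu_0)^2}{\mu_0} = \mu_0 \left(\frac{\text{SMR}}{100} - 1\right)^2 \sim \chi^2_1 \text{ under } H_0,$$
where $\text{SMR} = 100\, x / \mu_0$.
– Rejection Region: $X^2 > \chi^2_{1,\,1-\alpha}$
– p-value: $\Pr(\chi^2_1 > X^2)$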
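The large-sample test needs only the $\chi^2_1$ distribution, and since $\chi^2_1$ is the square of a standard normal, $\Pr(\chi^2_1 > c) = 2[1 - \Phi(\sqrt{c})]$. The sketch below relies on that identity. The function name and the example counts are hypothetical.

```python
import math
from statistics import NormalDist

def poisson_large_sample_test(x, mu0, alpha=0.05):
    """Return (X2, p_value, reject) for H0: mu = mu0, intended for mu0 >= 10."""
    x2 = (x - mu0) ** 2 / mu0
    # chi-square(1) upper tail via the normal: Pr(Z^2 > x2) = erfc(sqrt(x2 / 2))
    p_value = math.erfc(math.sqrt(x2 / 2))
    critical = NormalDist().inv_cdf(1 - alpha / 2) ** 2  # chi-square_{1, 1 - alpha}
    return x2, p_value, x2 > critical

x2, p_value, reject = poisson_large_sample_test(x=20, mu0=12.0)
print(x2, p_value, reject)
```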