CHAPTER 6 Statistical Inference & Hypothesis Testing
• 6.1 - One Sample: Mean μ, Variance σ², Proportion π
• 6.2 - Two Samples: Means, Variances, Proportions (μ1 vs. μ2, σ1² vs. σ2², π1 vs. π2)
• 6.3 - Multiple Samples: Means, Variances, Proportions (μ1, …, μk; σ1², …, σk²; π1, …, πk)
Consider two independent populations, and a random variable X, normally distributed in each.
Classic Example: "Randomized Clinical Trial"… Pop 1 = Treatment, Pop 2 = Control
POPULATION 1: X1 ~ N(μ1, σ1), Random Sample, size n1
POPULATION 2: X2 ~ N(μ2, σ2), Random Sample, size n2
Null Hypothesis H0: μ1 = μ2, i.e., μ1 – μ2 = 0 ("No mean difference"). Test at significance level α.
Sampling Distribution of $\bar{X}_1 - \bar{X}_2$ = ?
Each sample mean is normal: $\bar{X}_1 \sim N\left(\mu_1, \frac{\sigma_1}{\sqrt{n_1}}\right)$ and $\bar{X}_2 \sim N\left(\mu_2, \frac{\sigma_2}{\sqrt{n_2}}\right)$.
Recall from section 4.1 (Discrete Models): Mean(X – Y) = Mean(X) – Mean(Y), and if X and Y are independent… Var(X – Y) = Var(X) + Var(Y).
Since the two samples are independent,
$\bar{X}_1 - \bar{X}_2 \sim N\left(\mu_1 - \mu_2, \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\right)$, where μ1 – μ2 = 0 under H0.
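As a quick sanity check of this result, the R sketch below simulates many pairs of independent samples. It then compares the simulated mean and standard deviation of X̄1 – X̄2 with μ1 – μ2 and sqrt(σ1²/n1 + σ2²/n2). All parameter values are hypothetical and chosen only for illustration. Because this is a Monte Carlo estimate, the agreement is approximate rather than exact.

```r
# Monte Carlo check of the sampling distribution of X1bar - X2bar (a sketch).
# All parameter values below are hypothetical, chosen only for illustration.
set.seed(1)
mu1 <- 10; sigma1 <- 2; n1 <- 20   # Population 1: X1 ~ N(mu1, sigma1)
mu2 <- 10; sigma2 <- 3; n2 <- 30   # Population 2: X2 ~ N(mu2, sigma2); H0 is true here
reps <- 20000

# Each row of the matrix is one random sample; rowMeans gives one sample mean per replicate.
xbar1 <- rowMeans(matrix(rnorm(reps * n1, mu1, sigma1), nrow = reps))
xbar2 <- rowMeans(matrix(rnorm(reps * n2, mu2, sigma2), nrow = reps))
d <- xbar1 - xbar2

se_theory <- sqrt(sigma1^2 / n1 + sigma2^2 / n2)   # sqrt(0.2 + 0.3), about 0.7071
cat("simulated mean:", mean(d), " theory:", mu1 - mu2, "\n")
cat("simulated sd:  ", sd(d), " theory:", se_theory, "\n")
```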
Null Distribution (σ1² and σ2² known):
$\bar{X}_1 - \bar{X}_2 \sim N\left(0, \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\right)$, centered at 0, with standard error (s.e.) equal to the square-root term.
But what if σ1² and σ2² are unknown?
Then use the sample estimates s1² and s2² with a Z- or t-test, if n1 and n2 are large:
$\bar{X}_1 - \bar{X}_2 \sim N\left(0, \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}\right)$ (approximately)
(But what if n1 and n2 are small?)
Later…
Example: X = "$ Cost of a certain medical service"
Assume X is known to be normally distributed at each of k = 2 health care facilities ("groups").
Hospital: X1 ~ N(μ1, σ1)   Clinic: X2 ~ N(μ2, σ2)
• Null Hypothesis H0: μ1 = μ2, i.e., μ1 – μ2 = 0 ("No difference exists.") 2-sided test at significance level α = .05
• Data: Sample 1: n1 = 137, $\bar{x}_1 = 630$, $s_1^2 = 788.5$; Sample 2: n2 = 140, $\bar{x}_2 = 546$, $s_2^2 = 1663.0$
NOTE: $\bar{x}_1 - \bar{x}_2 = 84 > 0$
Null Distribution:
$\bar{X}_1 - \bar{X}_2 \sim N\left(0, \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}\right) = N\left(0, \sqrt{\frac{788.5}{137} + \frac{1663.0}{140}}\right) \approx N(0, 4.2)$, so s.e. = 4.2.
95% Confidence Interval for μ1 – μ2: 95% Margin of Error = (1.96)(4.2) = 8.232
(84 – 8.232, 84 + 8.232) = (75.768, 92.232), which does not contain 0.
Z-score = (84 – 0)/4.2 = 20 >> 1.96, so p << .05.
Reject H0; extremely strong significant difference.
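The large-sample Z-test above works directly from summary statistics, and the R sketch below reproduces it. The helper name `z_test_summary` is hypothetical and does not come from the lecture or any package. It uses the normal quantile qnorm(0.975), about 1.96, for the margin of error.

```r
# Large-sample two-sample Z-test from summary statistics (a sketch).
# z_test_summary is a hypothetical helper name, not a base R or package function.
z_test_summary <- function(xbar1, xbar2, s1sq, s2sq, n1, n2,
                           delta0 = 0, conf = 0.95) {
  se <- sqrt(s1sq / n1 + s2sq / n2)          # estimated s.e. of X1bar - X2bar
  diff <- xbar1 - xbar2
  z <- (diff - delta0) / se                  # standardized test statistic
  p <- 2 * pnorm(-abs(z))                    # two-sided p-value
  moe <- qnorm(1 - (1 - conf) / 2) * se      # margin of error (about 1.96 * se for 95%)
  list(diff = diff, se = se, z = z, p.value = p,
       conf.int = c(diff - moe, diff + moe))
}

# Hospital vs. clinic summary data from the example.
res <- z_test_summary(630, 546, 788.5, 1663.0, 137, 140)
print(res)
```

The unrounded s.e. is about 4.1993, so the interval endpoints differ from the hand calculation (which rounds the s.e. to 4.2) only in the second decimal place.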
Null Hypothesis H0: μ1 = μ2, i.e., μ1 – μ2 = 0 ("No mean difference"). Test at significance level α.
Consider two independent populations, and a random variable X, normally distributed in each: POPULATION 1: X1 ~ N(μ1, σ1), sample size n1; POPULATION 2: X2 ~ N(μ2, σ2), sample size n2.
Null Distribution: $\bar{X}_1 - \bar{X}_2 \sim N\left(0, \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\right)$
If σ1² and σ2² are unknown and
• large n1 and n2: estimate them with the sample variances s1² and s2².
• small n1 and n2: IF the two populations are equivariant, i.e., σ1² = σ2², then conduct a t-test on the "pooled" samples.
To check equivariance, test H0: σ1² = σ2² vs. HA: σ1² ≠ σ2².
Test Statistic: $F = s_1^2 / s_2^2$
Sampling Distribution = ? (Under H0, with normal populations, F follows an F-distribution with n1 – 1 and n2 – 1 degrees of freedom.)
Working Rule of Thumb: Acceptance Region for H0 is ¼ < F < 4.
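The rule of thumb can be scripted as in the R sketch below. The sketch applies it to the hospital (y1) and clinic (y2) cost samples used in this chapter's small-sample example. The helper `equal_var_rule` is a hypothetical name. The ¼ and 4 cutoffs are the lecture's working rule, not a formal test. For comparison, R's `var.test()` gives the formal F-test based on the F(n1 – 1, n2 – 1) null distribution.

```r
# Rule-of-thumb check for equal variances (a sketch).
# equal_var_rule is a hypothetical helper name; the 1/4 < F < 4 cutoffs are the
# lecture's working rule of thumb, not a formal significance test.
equal_var_rule <- function(x, y) {
  F_stat <- var(x) / var(y)                          # F = s1^2 / s2^2
  list(F = F_stat, accept_H0 = (F_stat > 1/4) && (F_stat < 4))
}

# Cost samples: hospital (y1) and clinic (y2).
y1 <- c(667, 653, 614, 612, 604)
y2 <- c(593, 525, 520)
rule <- equal_var_rule(y1, y2)
print(rule)

# Formal F-test of H0: sigma1^2 = sigma2^2 (null distribution F with 4 and 2 df here).
print(var.test(y1, y2))
```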
IF equal variances, H0: σ1² = σ2², is accepted, then estimate their common value with a "pooled" sample variance:
$s_{\text{pooled}}^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$
The pooled variance is a weighted average of s1² and s2², using the degrees of freedom as the weights.
The standard error of $\bar{X}_1 - \bar{X}_2$ then becomes
$\text{s.e.} = \sqrt{\frac{s_{\text{pooled}}^2}{n_1} + \frac{s_{\text{pooled}}^2}{n_2}} = s_{\text{pooled}}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$,
and under H0 the test statistic $(\bar{X}_1 - \bar{X}_2)/\text{s.e.}$ follows a t-distribution with n1 + n2 – 2 degrees of freedom.
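The R sketch below assembles the pooled t-test by hand from these pieces and checks it against R's built-in `t.test(..., var.equal = TRUE)`. The helper name `pooled_t_test` is hypothetical. The data are the hospital and clinic cost samples from this chapter's small-sample example.

```r
# Pooled two-sample t-test computed by hand (a sketch), checked against t.test().
# pooled_t_test is a hypothetical helper name, not an R function.
pooled_t_test <- function(x, y, delta0 = 0) {
  n1 <- length(x); n2 <- length(y)
  dof <- n1 + n2 - 2
  sp2 <- ((n1 - 1) * var(x) + (n2 - 1) * var(y)) / dof   # pooled variance
  se <- sqrt(sp2) * sqrt(1 / n1 + 1 / n2)                # s_pooled * sqrt(1/n1 + 1/n2)
  t_stat <- (mean(x) - mean(y) - delta0) / se
  p <- 2 * pt(-abs(t_stat), dof)                         # two-sided p-value
  list(se = se, t_stat = t_stat, df = dof, p.value = p)
}

y1 <- c(667, 653, 614, 612, 604)   # hospital
y2 <- c(593, 525, 520)             # clinic
manual <- pooled_t_test(y1, y2)
builtin <- t.test(y1, y2, var.equal = TRUE)
print(unlist(manual))
print(c(t = unname(builtin$statistic), p.value = builtin$p.value))
```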
IF equal variances, H0: σ1² = σ2², is rejected, then use the Satterthwaite test, Welch test, etc. SEE LECTURE NOTES AND TEXTBOOK.
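For reference, R's `t.test()` performs the Welch test by default (`var.equal = FALSE`). The Welch test uses the Satterthwaite approximation for its degrees of freedom. The R sketch below computes the Welch statistic and Satterthwaite df by hand and compares them with R's result. The helper name `welch_df` is hypothetical. The cost samples are reused only for illustration: in the example that follows, equivariance is accepted, so the pooled test is the one applied there.

```r
# Welch t-test with Satterthwaite degrees of freedom, by hand (a sketch).
# welch_df is a hypothetical helper name.
welch_df <- function(x, y) {
  v1 <- var(x) / length(x)   # s1^2 / n1
  v2 <- var(y) / length(y)   # s2^2 / n2
  (v1 + v2)^2 / (v1^2 / (length(x) - 1) + v2^2 / (length(y) - 1))
}

y1 <- c(667, 653, 614, 612, 604)   # hospital
y2 <- c(593, 525, 520)             # clinic
se_welch <- sqrt(var(y1) / length(y1) + var(y2) / length(y2))  # unpooled s.e.
t_welch <- (mean(y1) - mean(y2)) / se_welch
df_welch <- welch_df(y1, y2)       # generally not an integer
welch <- t.test(y1, y2)            # var.equal = FALSE is the default
print(c(t = t_welch, df = df_welch))
print(welch)
```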
Example: Y = "$ Cost of a certain medical service"
Assume Y is known to be normally distributed at each of k = 2 health care facilities ("groups").
Hospital: Y1 ~ N(μ1, σ1)   Clinic: Y2 ~ N(μ2, σ2)
• Null Hypothesis H0: μ1 = μ2, i.e., μ1 – μ2 = 0 ("No difference exists.") 2-sided test at significance level α = .05
• Data: Sample 1 = {667, 653, 614, 612, 604}; n1 = 5   Sample 2 = {593, 525, 520}; n2 = 3
• Analysis via T-test (if equivariance holds):
Point estimates ("Group Means"), $\bar{y}_i = \sum y / n_i$:
$\bar{y}_1 = \frac{667 + 653 + 614 + 612 + 604}{5} = 630$, $\bar{y}_2 = \frac{593 + 525 + 520}{3} = 546$
NOTE: $\bar{y}_1 - \bar{y}_2 = 84 > 0$
"Group Variances", $s^2 = SS/df$:
$s_1^2 = \frac{(667 - 630)^2 + \cdots + (604 - 630)^2}{5 - 1} = 788.5$, $s_2^2 = \frac{(593 - 546)^2 + \cdots + (520 - 546)^2}{3 - 1} = 1663$
Equivariance check: F = 1663/788.5 ≈ 2.11 < 4, so the rule of thumb accepts σ1² = σ2².
Pooled Variance:
$s_{\text{pooled}}^2 = \frac{(5 - 1)(788.5) + (3 - 1)(1663)}{(5 - 1) + (3 - 1)} = \frac{SS_1 + SS_2}{df_1 + df_2} = \frac{6480}{6} = 1080$
The pooled variance is a weighted average of the group variances, using the degrees of freedom as the weights.
Standard Error: $\text{s.e.} = s_{\text{pooled}}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = \sqrt{1080\left(\frac{1}{5} + \frac{1}{3}\right)} = 24$, with df = 6.
p-value (under H0) = $2\,P(\bar{Y}_1 - \bar{Y}_2 \ge 84) = 2\,P\left(T_6 \ge \frac{84 - 0}{24}\right) = 2\,P(T_6 \ge 3.5)$
> 2 * (1 - pt(3.5, 6))
[1] 0.01282634
Reject H0 at α = .05: statistically significant, Hosp > Clinic.
R code:
> y1 = c(667, 653, 614, 612, 604)
> y2 = c(593, 525, 520)
> t.test(y1, y2, var.equal = TRUE)

	Two Sample t-test

data:  y1 and y2
t = 3.5, df = 6, p-value = 0.01283
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
  25.27412 142.72588
sample estimates:
mean of x mean of y 
      630       546 
Formal Conclusion: p-value < α = .05; Reject H0 at this level.
Interpretation: The samples provide evidence that the difference between mean costs is (moderately) statistically significant at the 5% level, with the hospital being higher than the clinic (by an average of $84).
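The slides compute the p-value by hand but take the 95% confidence interval from the `t.test()` output. The R sketch below reproduces that interval from the worked quantities (difference 84, s.e. 24, df 6). It uses the t critical value qt(0.975, 6), about 2.447, in place of the normal 1.96.

```r
# Hand computation of the 95% CI for mu1 - mu2 under the pooled t-test (a sketch).
d <- 630 - 546            # difference of group means, 84
se <- 24                  # s_pooled * sqrt(1/5 + 1/3)
dof <- 5 + 3 - 2          # n1 + n2 - 2 = 6
t_crit <- qt(0.975, dof)  # upper 2.5% point of the t-distribution with 6 df
ci <- c(d - t_crit * se, d + t_crit * se)
print(ci)
```

The result should agree with the interval 25.27412 to 142.72588 reported by `t.test()`. Consistent with rejecting H0, the interval excludes 0.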
NEXT UP… PAIRED MEANS (page 6.2-7, etc.)