Chapter 6: Inferences Comparing Two Population Central Values


Transcript of Chapter 6: Inferences Comparing Two Population Central Values

Page 1: Chapter 6: Inferences Comparing Two Population Central Values (websites.uwlax.edu/storibio/math405_spr16/Chapter6_2slides.pdf)

1/31/16  

1  

Chapter 6: Inferences Comparing Two Population Central Values

February  1,  2016  

Measures of Center
• Two commonly used central values of a population are the mean and the median:

> data=c(10,3,2,1,4)
> mean(data)
[1] 4
> median(data)
[1] 3
> median(c(3,2,1,4))
[1] 2.5


Comparing Two Population Means

Population of males with diabetes: μ1 = mean glucose level, σ1 = sd of glucose level; sample statistics x̄1, s1.

Population of females with diabetes: μ2 = mean glucose level, σ2 = sd of glucose level; sample statistics x̄2, s2.

• Test H0: μ1 = μ2 vs. H1: μ1 ≠ μ2 (or H1: μ1 > μ2, or H1: μ1 < μ2)
• Equivalently, test H0: μ1 − μ2 = 0 vs. H1: μ1 − μ2 ≠ 0 (or H1: μ1 − μ2 > 0)

Recall that when both populations are normally distributed, x̄1 ~ N(μ1, σ1/√n1) and x̄2 ~ N(μ2, σ2/√n2), and hence x̄1 − x̄2 ~ N(μ1 − μ2, √(σ1²/n1 + σ2²/n2)).
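As a quick sanity check, this sampling distribution can be sketched by simulation (the parameters and variable names below are illustrative choices of ours, not from the slides):

```r
# Simulation sketch: illustrative parameters, not from the text
set.seed(1)
mu1 <- 110; sigma1 <- 15; n1 <- 25   # hypothetical population 1
mu2 <- 100; sigma2 <- 12; n2 <- 30   # hypothetical population 2
# 10,000 simulated values of xbar1 - xbar2
diffs <- replicate(10000, mean(rnorm(n1, mu1, sigma1)) - mean(rnorm(n2, mu2, sigma2)))
mean(diffs)                      # close to mu1 - mu2 = 10
sd(diffs)                        # close to the theoretical value below
sqrt(sigma1^2/n1 + sigma2^2/n2)  # 3.714835
```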

Two-Sample Statistics


Suppose both populations are normally distributed and the two samples are independent.

1. Two-Sample Z-Test (when both σ1 and σ2 are known)

2. Welch T-Test (when both σ1 and σ2 are unknown; its degrees of freedom come from the Satterthwaite approximation)

3. Pooled T-Test (when σ1 = σ2 but both are unknown)


Using R: Welch Two-Sample T-test
• Example 6.1 (p. 294): Company officials were concerned about the length of time a particular drug product retained its potency. A random sample of 10 bottles of the product was drawn from the production line and analyzed for potency. A second sample of 10 bottles was obtained and stored in a regulated environment for a period of 1 year. The readings obtained from each sample are given below:

> fresh=c(10.2,10.5,10.3,10.8,9.8,10.6,10.7,10.2,10,10.6)
> stored=c(9.8,9.6,10.1,10.2,10.1,9.7,9.5,9.6,9.8,9.9)
> t.test(fresh,stored)

	Welch Two Sample t-test
data:  fresh and stored
t = 4.2368, df = 16.628, p-value = 0.000581
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 0.2706369 0.8093631
sample estimates:
mean of x mean of y 
    10.37      9.83 
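As a sketch, the reported t statistic and Satterthwaite degrees of freedom can be reproduced directly from the formulas (the intermediate variable names below are ours):

```r
fresh  <- c(10.2,10.5,10.3,10.8,9.8,10.6,10.7,10.2,10,10.6)
stored <- c(9.8,9.6,10.1,10.2,10.1,9.7,9.5,9.6,9.8,9.9)
n1 <- length(fresh); n2 <- length(stored)
se2 <- var(fresh)/n1 + var(stored)/n2     # squared standard error of xbar1 - xbar2
t.welch <- (mean(fresh) - mean(stored)) / sqrt(se2)
# Satterthwaite approximation to the degrees of freedom
df.welch <- se2^2 / ((var(fresh)/n1)^2/(n1-1) + (var(stored)/n2)^2/(n2-1))
round(c(t = t.welch, df = df.welch), 3)   # t = 4.237, df = 16.628
```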

Hence, we are 95% confident that the reduction in potency is between 0.27 and 0.81.

Since the p-value is less than α=0.05, we reject H0:µ1=µ2 and conclude that mean potency is statistically lower after 1 year.

Using R: Pooled T-test
• If it is reasonable to assume that the two unknown standard deviations are equal, we should use the Pooled T-Test. (Rule of thumb: if the larger sample standard deviation is not more than twice the smaller, it is reasonable to make this assumption. In the next chapter, we will learn a formal way to test this assumption.)

> sd(fresh)     # 0.3233505
> sd(stored)    # 0.2406011
> t.test(fresh,stored,var.equal=T)

	Two Sample t-test
data:  fresh and stored
t = 4.2368, df = 18, p-value = 0.0004959
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 0.2722297 0.8077703
sample estimates:
mean of x mean of y 
    10.37      9.83 

Although we reach the same conclusion, it is worth noting that the pooled t-test gives a slightly smaller p-value and a slightly shorter confidence interval than the Welch test.
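The pooled statistic can likewise be sketched by hand: the two sample variances are combined into a single pooled variance on n1 + n2 − 2 = 18 degrees of freedom (variable names below are ours):

```r
fresh  <- c(10.2,10.5,10.3,10.8,9.8,10.6,10.7,10.2,10,10.6)
stored <- c(9.8,9.6,10.1,10.2,10.1,9.7,9.5,9.6,9.8,9.9)
n1 <- length(fresh); n2 <- length(stored)
# Pooled variance: weighted average of the two sample variances
sp2 <- ((n1-1)*var(fresh) + (n2-1)*var(stored)) / (n1 + n2 - 2)
t.pooled <- (mean(fresh) - mean(stored)) / sqrt(sp2 * (1/n1 + 1/n2))
round(t.pooled, 4)   # 4.2368, on n1 + n2 - 2 = 18 degrees of freedom
```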


Verifying Assumptions
• The two samples are independent.
 – The two samples are randomly selected from two distinct populations, and the elements of one sample are statistically independent of those of the second sample.
• A second type of dependence is the result of serial or spatial correlation. When measurements are taken over time, observations that are closer together in time tend to be more similar than observations collected at widely different times.
 – An index plot can be used to detect this kind of dependence.
> plot(fresh,type='b')

[Two index plots: one showing independence, one showing serial correlation]

Verifying Assumptions
• The samples came from normal populations.
 – Use the normal quantile-quantile plot (or qq-plot).
> qqnorm(fresh); qqline(fresh)
> qqnorm(stored); qqline(stored)

[QQ-plots for the fresh and stored data sets]

Both qq-plots show no strong evidence against the normality assumption as the points are close to the reference lines.


Shapiro-Wilk Test
• You can also perform a formal test to assess the normality of the data using the Shapiro-Wilk Test.
• Test H0: Data are normal vs. H1: Data are not normal

> shapiro.test(fresh)

	Shapiro-Wilk normality test
data:  fresh
W = 0.9528, p-value = 0.7013

> shapiro.test(stored)

	Shapiro-Wilk normality test
data:  stored
W = 0.9346, p-value = 0.4951

Word of caution: the Shapiro-Wilk test is sensitive to minor deviations from normality. So when the sample size is large, this test may give a significant result even when the data are reasonably normal.

Since both p-values are larger than 0.05, it is reasonable to assume that both samples come from normal populations.
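To illustrate the word of caution, here is a sketch with simulated data of our own choosing: a t(10) distribution is only mildly non-normal, yet a large enough sample will still be flagged:

```r
# Sketch: t(10) data are close to normal, but a large sample still gets rejected
set.seed(2)
x.small <- rt(20,   df = 10)   # small sample from a t(10) distribution
x.large <- rt(4000, df = 10)   # large sample from the same distribution
shapiro.test(x.small)$p.value  # typically well above 0.05
shapiro.test(x.large)$p.value  # very small: "significant" non-normality
```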

Using R: Pooled T-test
• Suppose we want to test H0: μ1 − μ2 = 0.5 vs. H1: μ1 − μ2 > 0.5.

> t.test(fresh,stored,alternative="greater",mu=.5,var.equal=T)

	Two Sample t-test
data:  fresh and stored
t = 0.3138, df = 18, p-value = 0.3786
alternative hypothesis: true difference in means is greater than 0.5
95 percent confidence interval:
 0.3189872       Inf
sample estimates:
mean of x mean of y 
    10.37      9.83 

Since the p-value is not less than α=0.05, we do not reject the null hypothesis. Hence, we did not find sufficient evidence that the reduction in mean potency is more than 0.5.


Wilcoxon Rank Sum Test
• What do we do if at least one of the two samples is not normal?
 – We could try to transform the data to make them normal. We will talk more about this strategy later on.
 – We could use a nonparametric test called the Wilcoxon Rank Sum Test (also known as the Mann-Whitney Test).
• The assumptions for this test are:
 1. The two random samples are independent.
 2. The samples come from identical distributions, with the exception that one distribution may be shifted to the right of the other.

Using R: Wilcoxon Rank Sum Test
• H0: Δ = 0 (The 2 populations are identical) vs. H1: Δ ≠ 0.

> ?wilcox.test
> wilcox.test(fresh,stored)

	Wilcoxon rank sum test with continuity correction
data:  fresh and stored
W = 91, p-value = 0.00211
alternative hypothesis: true location shift is not equal to 0

> median(fresh)    # 10.4
> median(stored)   # 9.8

Since the p-value is less than α=0.05, we reject the null hypothesis. Hence, we found sufficient evidence that the median potency is statistically lower after 1 year.

Note: by default, R computes an exact p-value if the samples contain fewer than 50 values and there are no ties; otherwise, a normal approximation is used. The phrase "with continuity correction" in the output above indicates that a normal approximation was used to compute the p-value (the fresh and stored samples contain ties).


Wilcoxon Rank Sum Test: Example 2 (p. 307)

Example 2: Assumptions
• Independence of the two samples:

Since the values were obtained from two distinct groups of individuals, it is reasonable to assume that the samples are independent.

• Normality of data:
> qqnorm(placebo); qqline(placebo)
> qqnorm(alcohol); qqline(alcohol)
> shapiro.test(placebo)   # W = 0.8634, p-value = 0.08367
> shapiro.test(alcohol)   # W = 0.8149, p-value = 0.02201

Both qq-plots show deviations from the reference line and the Shapiro-Wilk test yielded a p-value for the alcohol data that is less than 0.05. Hence, the normality assumption is violated.


Wilcoxon Rank Sum Test: Example 2
• H0: Δ = 0 (The distributions of reaction times for the placebo and alcohol populations are identical.)
• H1: Δ < 0 (The distribution of reaction times for the placebo population is shifted to the left of the distribution for the alcohol population.)

> placebo=c(.9,.37,1.63,.83,.95,.78,.86,.61,.38,1.97)
> alcohol=c(1.46,1.45,1.76,1.44,1.11,3.07,.98,1.27,2.56,1.32)
> wilcox.test(placebo,alcohol,alternative="less",conf.int=T)

	Wilcoxon rank sum test
data:  placebo and alcohol
W = 15, p-value = 0.003421
alternative hypothesis: true location shift is less than 0
95 percent confidence interval:
      -Inf -0.37
sample estimates:
difference in location 
                 -0.61 

> median(placebo)   # 0.845
> median(alcohol)   # 1.645

Since the p-value is less than α=0.05, we reject the null hypothesis. Hence, we found sufficient evidence that the median reaction time for the placebo population is statistically lower than that of the alcohol population.

Wilcoxon Rank Sum Test Statistic

Test statistic 1 (textbook) = sum of the ranks of the values in the first sample = 1+2+3+4+5+6+7+8+16+18 = 70

Test statistic 2 (R) = sum of the ranks of the values in the first sample − minimum possible sum = (1+2+3+4+5+6+7+8+16+18) − (1+2+...+10) = 70 − 55 = 15
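These sums can be verified in R with rank() (a sketch; the placebo and alcohol vectors are the ones entered earlier, and the W.* names are ours):

```r
placebo <- c(.9,.37,1.63,.83,.95,.78,.86,.61,.38,1.97)
alcohol <- c(1.46,1.45,1.76,1.44,1.11,3.07,.98,1.27,2.56,1.32)
r <- rank(c(placebo, alcohol))    # ranks in the combined sample of 20
W.textbook <- sum(r[1:10])        # textbook statistic: sum of placebo ranks = 70
W.R <- W.textbook - 10*(10+1)/2   # subtract minimum possible sum: 70 - 55 = 15
c(W.textbook, W.R)                # 70 15, matching wilcox.test's W = 15
```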


Paired Data

> data=read.csv("Example6_7.csv",header=T)
> head(data)
  Car Garage1 Garage2
1   1    17.6    17.3
2   2    20.2    19.1
3   3    19.5    18.4
4   4    11.3    11.5
5   5    13.0    12.7
6   6    16.3    15.8
> attach(data)

(p.  314)  

Since the estimates made by Garages I and II are for the same set of cars, these values are not independent.

Paired Data
> plot(Garage1,Garage2,pch=19)
> cor.test(Garage1,Garage2)

	Pearson's product-moment correlation
data:  Garage1 and Garage2
t = 37.5052, df = 13, p-value = 1.243e-14
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.9858393 0.9985176
sample estimates:
      cor 
0.9954108 

Both the linear pattern in the scatterplot and the close-to-1 sample correlation coefficient r indicate that the two sets of values are not independent. The small p-value indicates that the population correlation ρ is not 0.


Paired T-test
• H0: There is no difference in the estimates given by the two garages.
• H1: Garage I estimates are higher than those of Garage II.

> t.test(Garage1,Garage2,alternative="greater",paired=T)

	Paired t-test
data:  Garage1 and Garage2
t = 6.0234, df = 14, p-value = 1.563e-05
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
 0.4339886       Inf
sample estimates:
mean of the differences 
              0.6133333 

Since the p-value is extremely small, we reject the null hypothesis. Hence, we found sufficient evidence that, on average, Garage I estimates are higher than those of Garage II.

Paired T-test
• H0: μd = 0 (No difference in estimates)
• H1: μd > 0 (Garage I estimates are higher on average than Garage II)

> Diff=Garage1-Garage2
> Diff
 [1]  0.3  1.1  1.1 -0.2  0.3  0.5  0.4  0.9  0.2  0.6  0.3  1.1  0.8  0.9  0.9
> t.test(Diff,alternative="greater")

	One Sample t-test
data:  Diff
t = 6.0234, df = 14, p-value = 1.563e-05
alternative hypothesis: true mean is greater than 0
95 percent confidence interval:
 0.4339886       Inf
sample estimates:
mean of x 
0.6133333 

Note that we get the same t-value of 6.0234 and the same p-value of 1.563e-05 as earlier.


Paired T-test
• What if you incorrectly applied the Independent Two-Sample T-test?

> t.test(Garage1,Garage2,alternative="greater",var.equal=T)

	Two Sample t-test
data:  Garage1 and Garage2
t = 0.5462, df = 28, p-value = 0.2946
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
 -1.297014       Inf
sample estimates:
mean of x mean of y 
 16.84667  16.23333 

> sd(Garage1)   # 3.203986
> sd(Garage2)   # 2.94125
> sd(Diff)      # 0.3943651

Note that the p-value is now bigger than 0.05, giving a nonsignificant result. The large variability among the 15 estimates from each garage diminishes the relative size of any difference between the two garages. Taking the difference between the two garages' estimates for each car removes this large car-to-car variability (compare sd(Diff) with sd(Garage1) and sd(Garage2) above).

Power Calculations
• Recall from our review that the power of the test (1 − β) depends on four things: sample size (n), significance level (α), magnitude of the difference (Δ), and the population standard deviation (σ).

> power.t.test(n=20,sig.level=0.05,delta=1,sd=2,
+ alternative="two.sided",type="two.sample")

     Two-sample t test power calculation 

              n = 20
          delta = 1
             sd = 2
      sig.level = 0.05
          power = 0.3377084
    alternative = two.sided

NOTE: n is number in *each* group

Hence, assuming the population standard deviation of the measurements is about 2 units, the likelihood of detecting a difference in means of 1 unit, using a 0.05 level of significance and 20 measurements per group, is only approximately 0.34.
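This power value can also be sketched by hand from the noncentral t distribution, which is what power.t.test computes internally (the variable names below are ours):

```r
# Power of a two-sided two-sample t-test via the noncentral t distribution
n <- 20; delta <- 1; sd <- 2; alpha <- 0.05
ncp   <- delta / (sd * sqrt(2/n))   # noncentrality parameter
nu    <- 2*n - 2                    # degrees of freedom
tcrit <- qt(1 - alpha/2, nu)        # two-sided critical value
power <- pt(-tcrit, nu, ncp) + pt(tcrit, nu, ncp, lower.tail = FALSE)
round(power, 4)   # 0.3377, matching power.t.test
```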


Sample Size Calculations
• Similarly, the required sample size (n) depends on four things: power of the test (1 − β), significance level (α), magnitude of the difference (Δ), and the population standard deviation (σ).

> power.t.test(power=.90,sig.level=0.05,delta=1,sd=2,
+ alternative="two.sided",type="two.sample")

     Two-sample t test power calculation 

              n = 85.03129
          delta = 1
             sd = 2
      sig.level = 0.05
          power = 0.9
    alternative = two.sided

NOTE: n is number in *each* group

Therefore, if we want the power of the test to be 0.90 to detect a difference in means of 1 unit, using a 0.05 level of significance and assuming a population standard deviation of about 2, we need to take about 86 measurements per group (R reports n = 85.03).

Note: It is standard practice to round up when calculating sample sizes.

Sample Size Calculations - Example

> power.t.test(power=.90,sig.level=0.05,delta=1.5,sd=2.4,
+ alternative="one.sided",type="two.sample")

     Two-sample t test power calculation 

              n = 44.53998

(p.  324)  

Therefore, we recommend taking 45 test samples per treatment group.