Chapter 11
Inference for Distributions
Introduction
Single Population when σ unknown
Confidence Intervals
Significance Test
Comparing two populations when σ unknown
Confidence Intervals
Significance Test
Lesson 11-1, Part 1
Inference for the Mean of a Population
Conditions for Testing Claims about the Population Mean, σ Unknown
Simple random sample (SRS)
The sampling distribution of the sample mean is approximately normal if one of the following holds:
The population is normally distributed.
The sampling distribution will resemble that of the population.
n ≥ 30
The Central Limit Theorem tells us that the sampling distribution of the sample mean will be approximately normal.
n < 30
The normal probability plot shows a linear trend with no outliers.
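One-Sample t Statistic
$t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$
The standard error of the sample mean is $s/\sqrt{n}$.
When σ is known, the corresponding z statistic is $z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$.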
Density Curve for the t-Distribution
Symmetric about zero, single peaked, and bell shape
The spread is a bit greater than a normal distribution
Has more probability in the tails and less in the center
As the degrees of freedom (n – 1) increase, the t distribution approaches the standard normal distribution.
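The properties above can be checked numerically. The sketch below is only an illustration: it evaluates the t density from its standard formula and compares it with the standard normal density at the center (0) and in the tail (3). The function names `t_pdf` and `normal_pdf` are made up for this example.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    # Work on the log scale so large df does not overflow the gamma function.
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef - (df + 1) / 2 * math.log1p(x * x / df))

def normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# Compare center and tail heights as the degrees of freedom grow.
for df in (2, 5, 30, 1000):
    print("df =", df, "center:", round(t_pdf(0, df), 4), "tail:", round(t_pdf(3, df), 4))
print("normal", "center:", round(normal_pdf(0), 4), "tail:", round(normal_pdf(3), 4))
```

For small df the center value is below the normal curve and the tail value is above it; by df = 1000 the two curves are nearly indistinguishable.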
t – Confidence Interval
z – confidence interval: $\bar{x} \pm z^* \dfrac{\sigma}{\sqrt{n}}$
t – confidence interval: $\bar{x} \pm t^* \dfrac{s}{\sqrt{n}}$
A confidence interval or significance test is called robust if the confidence level or P-value does not change very much when the assumptions are violated.
Example – Page 620, #11.2
What critical value t* from Table C satisfies each of the following conditions?
A). The t distribution with 5 degrees of freedom has probability 0.05 to the right of t*.
t* = 2.015 (df = 5, upper-tail probability 0.05)
Example – Page 620, #11.2
B). The t distribution with 21 degrees of freedom has probability 0.99 to the left of t*.
Probability 0.99 to the left leaves 0.01 to the right.
t* = 2.518 (df = 21, upper-tail probability 0.01)
Example – Page 620, #11.4
What critical value t* from Table C should be used for a confidence interval for the mean of the population in each of the following situations?
A) A 90% confidence interval based on n = 12 observations.
df = n – 1 = 12 – 1 = 11
The central area is 0.90, which leaves 0.05 in each tail.
t* = 1.796
Example – Page 620, #11.4
B) A 95% confidence interval from an SRS of 30 observations.
df = n – 1 = 30 – 1 = 29
The central area is 0.95, which leaves 0.025 in each tail.
t* = 2.045
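The Table C lookups in Exercises 11.2 and 11.4 can be reproduced approximately without a table. The following is a rough sketch, not how Table C was produced: it integrates the t density with Simpson's rule and then searches for the value with the requested upper-tail probability by bisection. The names `t_pdf`, `upper_tail`, and `t_star` are hypothetical helpers for this example.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef - (df + 1) / 2 * math.log1p(x * x / df))

def upper_tail(x, df, steps=2000):
    """P(T > x) for x >= 0, using Simpson's rule on [0, x] (steps must be even)."""
    h = x / steps
    total = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # Half of the probability lies above 0; subtract the area between 0 and x.
    return 0.5 - total * h / 3

def t_star(tail_prob, df):
    """Critical value with the given upper-tail probability, found by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if upper_tail(mid, df) > tail_prob:
            lo = mid   # tail still too large, so t* is further right
        else:
            hi = mid
    return (lo + hi) / 2

# (upper-tail probability, df) pairs from Exercises 11.2 and 11.4
for tail, df in [(0.05, 5), (0.01, 21), (0.05, 11), (0.025, 29)]:
    print(df, tail, round(t_star(tail, df), 3))
```

For a level C confidence interval the tail probability passed in is (1 – C)/2, for example 0.05 for a 90% interval.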
Example – Cricket Love
The article “Well Fed Crickets Bowl of Maidens Over” (Nature Science Update, February 11, 1999) reported that female field crickets are attracted to males that have high chirp rates and hypothesized that chirp rate is related to nutritional status. The usual chirp rate for male field crickets was reported to vary around a mean of 60 chirps per second. To investigate whether chirp rate was related to nutritional status, investigators fed male crickets a high protein diet for 8 days, after which chirp rate was measured. The mean chirp rate for the crickets on the high protein diet was reported at 109 chirps per second. Is this convincing evidence that the mean chirp rate for crickets on a high protein diet is greater than 60 (which would then imply an advantage in attracting the ladies)? Suppose that the sample size and sample standard deviation are n = 32, s = 40. We test the relevant hypotheses with α = 0.01.
Example – Cricket Love
Step 1 – Identify the population of interest and parameter you want to draw conclusions about.
μ = mean chirp rate for crickets on a high protein diet.
Ho: µ = 60
Ha: µ > 60
Example – Cricket Love
Step 2 – Identify the statistical procedure you should use. Then verify the conditions for using this procedure.
Use a one-sample t test with the following conditions:
• Simple random sample (assume)
• Sampling distribution is approximately normal since n ≥ 30.
Example – Cricket Love
Step 3 – Carry out the inference procedure.
$t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}} = \dfrac{109 - 60}{40/\sqrt{32}} = 6.93$
df = 31
P-value ≈ 0
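The arithmetic in Step 3 can be checked with a few lines of code. This is a minimal sketch using the summary values given in the problem; the function name `one_sample_t` is made up for illustration.

```python
import math

def one_sample_t(xbar, mu0, s, n):
    """Return the one-sample t statistic and its degrees of freedom."""
    standard_error = s / math.sqrt(n)   # estimated standard deviation of x-bar
    t = (xbar - mu0) / standard_error
    return t, n - 1

# Cricket example: x-bar = 109, hypothesized mean 60, s = 40, n = 32
t, df = one_sample_t(xbar=109, mu0=60, s=40, n=32)
print(round(t, 2), df)
```

A t statistic this far from 0 with 31 degrees of freedom is why the P-value is reported as essentially 0.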
Example – Cricket Love
Step 4 – Interpret your results in the context of the problem
There is sufficient evidence to reject Ho since P-value ≈ 0 ≤ α = 0.01, and we conclude that the mean chirp rate is higher for male crickets that eat a high protein diet.
Example – Page 628, #11.10
Here are estimates of the daily intakes of calcium (in milligrams) for 38 women between the ages of 18 and 24 years who participated in a study of women’s bone health:
808 882 1062 970 909 802 374 416 784 997
651 716 438 1420 1425 948 1050 976 572 403
626 774 1253 549 1325 446 465 1269 671 696
1156 684 1933 748 1203 2433 1255 1100
Example – Page 628, #11.10
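A). Display the data using a stemplot and make a normal probability plot. Describe the distribution of calcium intakes for these women.
Stemplot (stem = thousands, leaf = hundreds, values truncated to hundreds):
0 | 3
0 | 4 4 4 4 4 5 5
0 | 6 6 6 6 6 7 7 7 7
0 | 8 8 8 9 9 9 9 9
1 | 0 0 1 1
1 | 2 2 2 2 3
1 | 4 4
1 |
1 | 9
2 |
2 |
2 | 4
Normal probability plot on the calculator: 2nd Y= (STAT PLOT)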
Example – Page 628, #11.10
The data are right skewed with a high outlier of 2433 and possibly another at 1933.
Example – Page 628, #11.10
B). Calculate the mean, the standard deviation, and the standard error.
$\bar{x} = 926.03$
$s = 427.23$
$SE = \dfrac{s}{\sqrt{n}} = \dfrac{427.23}{\sqrt{38}} = 69.3$
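The three summary values in part B can be recomputed directly from the 38 calcium intakes listed in the exercise. This is a small sketch using Python's standard `statistics` module; the variable names are chosen for this example only.

```python
import math
import statistics

# Daily calcium intakes (mg) for the 38 women, as listed in the exercise
calcium = [
    808, 882, 1062, 970, 909, 802, 374, 416, 784, 997,
    651, 716, 438, 1420, 1425, 948, 1050, 976, 572, 403,
    626, 774, 1253, 549, 1325, 446, 465, 1269, 671, 696,
    1156, 684, 1933, 748, 1203, 2433, 1255, 1100,
]

n = len(calcium)
xbar = statistics.mean(calcium)    # sample mean
s = statistics.stdev(calcium)      # sample standard deviation (divides by n - 1)
se = s / math.sqrt(n)              # standard error of the sample mean

print(n, round(xbar, 2), round(s, 2), round(se, 1))
```

Note that `statistics.stdev` uses the n – 1 divisor, which matches the sample standard deviation s used in the t procedures.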
Example – Page 628, #11.10
C). Find a 95% confidence interval for the mean. Use the inference toolbox.
Step 1 – Identify the population of interest and the parameter you want to draw conclusions about.
μ = the mean daily intake of calcium (in mg) for women between the ages of 18 and 24.
Example – Page 628, #11.10
Step 2 – Identify the statistical procedure you should use. Then verify the conditions for using this procedure.
Use a one-sample t interval.
Conditions
1. Simple random sample - assume
2. The sampling distribution is approximately normal since n ≥ 30.
Example – Page 628, #11.10
Step 3 – Carry out the inference procedure.
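STAT TESTS TInterval
$\bar{x} \pm t^* \dfrac{s}{\sqrt{n}} = 926.03 \pm 2.042 \cdot \dfrac{427.23}{\sqrt{38}}$
The exact df is 37. Table C has no row for df = 37, so use df = 30 (t* = 2.042), which gives roughly 784.5 to 1067.5.
TInterval uses df = 37 and reports the slightly narrower interval 785.6 to 1066.5.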
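As a check on Step 3, the interval can be computed from the summary values and the Table C critical value. This is a sketch only; `t_interval` is a hypothetical helper, and t* = 2.042 is the Table C value for df = 30 quoted in the notes. Because the calculator uses df = 37, its reported interval (785.6 to 1066.5) is slightly narrower than the one computed here.

```python
import math

def t_interval(xbar, s, n, t_star):
    """One-sample t confidence interval: x-bar +/- t* * s / sqrt(n)."""
    margin = t_star * s / math.sqrt(n)
    return xbar - margin, xbar + margin

# Calcium example with the conservative Table C value (df = 30)
low, high = t_interval(xbar=926.03, s=427.23, n=38, t_star=2.042)
print(round(low, 1), round(high, 1))
```

Rounding df down to the nearest table row always gives a larger t*, so the table-based interval errs on the wide (safe) side.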
Example – Page 628, #11.10
Step 4 –Interpret your results in the context of the problem
We are 95% confident that the true mean daily intake of calcium for women between 18 and 24 years of age is between 785.6 mg and 1066.5 mg.
Example – Page 628, #11.10
D) Eliminate the two largest values and recompute the 95% confidence interval. What do you notice?
The normal quantile plot is essentially the same as before (except the two points that deviated greatly from the line are gone).
Lesson 11-1, Part 2
Inference for the Mean of a Population
Matched Pairs
To compare the responses to the two treatments in a matched pair, apply the one-sample t procedure to the observed differences.
Example – Page 643, #11.29
The researchers studying vitamin C in CSB in Exercise 11.9 (page 628) were also interested in a similar commodity called wheat soy blend (WSB). A major concern was the possibility that some of the vitamin C content would be destroyed as a result of storage and shipment of the commodity to its final destination. The researchers specially marked a collection of bags at the factory and took a sample from each of these to determine the vitamin C content. Five months later in Haiti they found the specially marked bags and took samples. The data consist of two vitamin C measures for each bag, one at the time of production in the factory and the other five months later in Haiti. The units are mg/100g.
Example – Page 643, #11.29
Factory Haiti Factory Haiti Factory Haiti
44 40 45 38 39 43
50 37 32 40 52 38
48 39 47 35 45 38
44 35 40 38 37 38
42 35 38 34 38 41
47 41 41 35 44 40
49 37 43 37 43 35
50 37 40 34 39 38
39 34 37 40 44 36
Example – Page 643, #11.29
A). Examine the question of interest to these researchers. Provide appropriate statistical evidence to justify your conclusions.
Step 1 – Identify the population of interest and the parameter you want to draw conclusions about.
µ = mean change (Haiti – factory) of vitamin C in mg/100g.
Ho: µ = 0
Ha: µ < 0
Example – Page 643, #11.29
Step 2 – Identify the statistical procedure you should use. Then verify the conditions for using this procedure.
We will use a one-sample t test.
Conditions
1. Simple random sample - assumed
2. The sampling distribution is approximately normal since the normal probability plot appears to be reasonably linear with no outliers.
Example – Page 643, #11.29
Stemplot of the differences (Haiti – factory); stem = tens, leaf = ones:
-1 | 4 3 3 2 2
-0 | 9 9 8 8 7 7 7 6 6 6 6 5 4 4 4 2 1
 0 | 1 3 3 4 8
Example – Page 643, #11.29
Step 3 – Calculate the test statistic and p-value. Illustrate with a graph.
STAT TESTS T-Test
$t = \dfrac{\bar{x}_{diff} - \mu_0}{s_{diff}/\sqrt{n}} = \dfrac{-5.33 - 0}{5.589/\sqrt{27}} = -4.96$
df = 26
P-value ≈ 0.000018
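The matched pairs test statistic can be reproduced from the raw table by forming the 27 differences first and then applying the one-sample t formula to them. The sketch below reads the table row by row as (factory, Haiti) pairs; the variable names are chosen for this example.

```python
import math
import statistics

# (factory, Haiti) vitamin C measurements in mg/100g, one pair per bag
pairs = [
    (44, 40), (45, 38), (39, 43), (50, 37), (32, 40), (52, 38),
    (48, 39), (47, 35), (45, 38), (44, 35), (40, 38), (37, 38),
    (42, 35), (38, 34), (38, 41), (47, 41), (41, 35), (44, 40),
    (49, 37), (43, 37), (43, 35), (50, 37), (40, 34), (39, 38),
    (39, 34), (37, 40), (44, 36),
]

# Matched pairs: analyze the differences, not the two columns separately
diffs = [haiti - factory for factory, haiti in pairs]

n = len(diffs)
mean_diff = statistics.mean(diffs)
s_diff = statistics.stdev(diffs)
t = (mean_diff - 0) / (s_diff / math.sqrt(n))   # Ho: mean difference = 0

print(n, round(mean_diff, 2), round(s_diff, 3), round(t, 2), "df =", n - 1)
```

Taking the differences as Haiti minus factory makes a loss of vitamin C show up as a negative mean difference, which matches the one-sided alternative.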
Example – Page 643, #11.29
Step 4 – Interpret your results in the context of the problem
There is sufficient evidence to reject Ho (P-value ≈ 0.00002 < α = 0.01) and conclude that the mean change between Haiti and factory is less than 0.
This indicates that the average amount of vitamin C has decreased as a result of storage and shipment.
Example – Page 643, #11.29
B). Estimate the loss in vitamin C content over the five-month period. Use a 95% confidence level.
Step 3 – Carry out the inference procedure.
STAT TESTS TInterval
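$\bar{x}_{diff} \pm t^* \dfrac{s_{diff}}{\sqrt{n}} = -5.33 \pm 2.056 \cdot \dfrac{5.588}{\sqrt{27}}$
t*: use Table C with df = 26
(–7.544, –3.123)
We are 95% confident that the population mean change (Haiti – factory) in vitamin C content over the five-month period is between –7.544 mg/100g and –3.123 mg/100g, that is, a mean loss of between 3.123 and 7.544 mg/100g.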
Example – Page 643, #11.29
C). Do these data provide evidence that the mean vitamin C content of all bags of WSB shipped to Haiti differs from the target value of 40 mg/100g?
Step 1 – Identify the population of interest and the parameter you want to draw conclusions about.
μ = mean of vitamin C in mg/100g for all bags shipped to Haiti.
Ho: µ = 40
Ha: µ ≠ 40
Example – Page 643, #11.29
Step 3 – Calculate the test statistic and p-value. Illustrate with a graph.
STAT TESTS T-Test
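$t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}} = \dfrac{42.85 - 40}{4.79/\sqrt{27}} = 3.09$
Here $\bar{x} = 42.85$ is the mean of the 27 factory measurements.
P-value = 0.005
df = 26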
Example – Page 643, #11.29
Step 4 – Interpret your results in the context of the problem
There is sufficient evidence to reject Ho (P-value = 0.005 < α = 0.01) and conclude that the mean vitamin C content is different from the target mean of 40 mg/100g.
Robustness of t Procedures
The t procedures are strongly influenced by outliers.
Always check the data first!
If there are outliers and the sample size is small, the results will not be reliable.
The t procedures are robust when there are no outliers, especially when the distribution is approximately symmetric.
When to Use t Procedures
If the sample size is less than 15, use t procedures only if the data are close to normal.
If the sample size is at least 15, use t procedures unless there are outliers or strong skewness.
If the sample size is at least 40, you may use t procedures even if the data are clearly skewed.
Lesson 11-2
Comparing Two Sample Means
Two-Sample Problems
In a comparative study, we want to compare the responses to two treatments or to compare the characteristic of two populations.
We have separate samples for each treatment or each population.
The samples must be chosen randomly and independently in order to perform statistical inference.
Because matched pairs are NOT chosen independently, we will NOT use two-sample inference for a matched pair design. For a matched pair design, apply the one-sample t procedures to the observed differences.
Notation for Two Samples
Population Variable Mean Standard Deviation
1 x1 μ1 σ1
2 x2 μ2 σ2
Population Sample Size Sample Mean Sample Standard Deviation
1 n1 x̄1 s1
2 n2 x̄2 s2
Null and Alternative Hypothesis
The null hypothesis is that there is no
difference between the two parameters.
Ho: µ1 = µ2 or Ho: µ1 – µ2 = 0
The alternative hypothesis could be that
Ha: µ1 ≠ µ2 (two-sided)
Ha: µ1 < µ2 or Ha: µ1 – µ2< 0 (one-sided)
Ha: µ1 > µ2 or Ha: µ1 – µ2> 0 (one-sided)
Conditions for Comparing Two Samples
Two random samples (SRS) that are independent.
Both sampling distributions are approximately normal if, for each sample, one of the following holds:
The population is normally distributed.
The sampling distribution will resemble that of the population.
n1 ≥ 30 and n2 ≥ 30
The Central Limit Theorem tells us the sampling distribution of the mean will be approximately normal.
n1 < 30 and n2 < 30
The sampling distribution of the mean will be approximately normal if the normal probability plot is linear with no outliers.
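Sampling Distribution of $\bar{x}_1 - \bar{x}_2$
Mean: $\mu_1 - \mu_2$
Variance: $\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}$
Standardize: $z = \dfrac{x - \mu}{\sigma}$
$z = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$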
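Test Statistic for Two Samples when σ Unknown
$t = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$
The denominator is the standard error, the estimated standard deviation of $\bar{x}_1 - \bar{x}_2$.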
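Confidence Interval for Two Samples when σ Unknown
A level C confidence interval for μ1 – μ2 is given by
$(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$
where t* is the upper (1 – C)/2 critical value.
The degrees of freedom are equal to the smaller of n1 – 1 and n2 – 1.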
Example – Page 657, #11.40
How badly does logging damage tropical rainforests? One study compared forest plots in Borneo that had never been logged with similar plots nearby that had been logged 8 years earlier. The study found that the effects of logging were somewhat less severe than expected. Here are the data on the number of tree species in 12 unlogged plots and 9 logged plots:
Unlogged: 22 18 22 20 15 21 13 13 19 13 19 15
Logged: 17 4 18 14 18 15 15 10 12
Example – Page 657, #11.40
A) The study report says, “Loggers were unaware that the effects of logging would be assessed.” Why is this important? The study report also explains why the plots can be considered to be randomly assigned.
If the loggers had known that a study would be done, they might have (consciously or subconsciously) cut down fewer trees than they typically would, in order to reduce the impact of logging.
Example – Page 657, #11.40
B) Does logging significantly reduce the mean number of species in a plot after 8 years? Give appropriate statistical evidence to support your conclusion.
We want to compare the mean number of tree species in unlogged (μ1) plots versus logged (μ2) plots.
Ho: µ1 = µ2 or Ho: µ1 – µ2 = 0
Ha: µ1 > µ2 or Ha: µ1 – µ2 > 0
Example – Page 657, #11.40
We will use a two-sample t test.
Conditions:
1. Plots are randomly assigned, but we are not sure it is an SRS, so we may not be able to generalize the results.
2. The stemplots and the normal probability plots show that both distributions appear to be approximately normal.
Example – Page 657, #11.40
Unlogged
1 | 3 3 3 5 5 8 9 9
2 | 0 1 2 2
Logged
0 | 4
1 | 0 2 4 5 5 7 8 8
Example – Page 657, #11.40
STAT TESTS 2-SampTTest
Example – Page 657, #11.40
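$t = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \dfrac{(17.5 - 13.67) - 0}{\sqrt{\dfrac{3.53^2}{12} + \dfrac{4.5^2}{9}}} = 2.11$
P-value = 0.0259
df = 14.793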
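The two-sample t statistic can be recomputed from the raw plot counts. The sketch below also shows two choices of degrees of freedom: the conservative rule from these notes (smaller of n1 – 1 and n2 – 1) and the Welch-Satterthwaite approximation, which is an assumption about how the calculator arrives at the fractional df = 14.793 and is not stated in the notes. The function name `two_sample_t` is made up for this example.

```python
import math
import statistics

unlogged = [22, 18, 22, 20, 15, 21, 13, 13, 19, 13, 19, 15]
logged = [17, 4, 18, 14, 18, 15, 15, 10, 12]

def two_sample_t(x1, x2):
    """Two-sample t statistic (unpooled) with two df choices."""
    n1, n2 = len(x1), len(x2)
    v1 = statistics.variance(x1) / n1   # s1^2 / n1
    v2 = statistics.variance(x2) / n2   # s2^2 / n2
    t = (statistics.mean(x1) - statistics.mean(x2)) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation (assumed to match the calculator output)
    df_welch = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    # Conservative rule from the notes: smaller of n1 - 1 and n2 - 1
    df_conservative = min(n1, n2) - 1
    return t, df_welch, df_conservative

t, df_welch, df_conservative = two_sample_t(unlogged, logged)
print(round(t, 2), round(df_welch, 3), df_conservative)
```

The conservative df is smaller, so it gives a larger critical value and a larger P-value than the calculator's df.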
Example – Page 657, #11.40
There is sufficient evidence to reject Ho (P-value = 0.0259 < α = 0.05) and conclude that logging reduces the mean number of tree species. Also note that this would not be statistically significant at the 1% level.
Example – Page 657, #11.40
C). Give a 90% confidence interval for the difference in the mean number of species between unlogged and logged plots.
STAT TESTS 2-SampTInt
Example – Page 657, #11.40
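$(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}} = (17.5 - 13.67) \pm t^* \sqrt{\dfrac{3.529^2}{12} + \dfrac{4.5^2}{9}}$
2-SampTInt uses df = 14.79 and reports (0.6517, 7.015).
Table C with the conservative df = 8 gives t* = 1.860 and a wider interval, roughly 0.46 to 7.2.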
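The interval formula can be evaluated with the summary values from the notes. This is a sketch under one stated assumption: t* = 1.860 is the Table C value for a 90% interval with df = 8 (the smaller of n1 – 1 and n2 – 1), so the result is wider than the calculator interval (0.6517, 7.015), which is based on df = 14.79. The helper name `two_sample_t_interval` is hypothetical.

```python
import math

def two_sample_t_interval(xbar1, s1, n1, xbar2, s2, n2, t_star):
    """Confidence interval for mu1 - mu2 using the unpooled standard error."""
    standard_error = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    margin = t_star * standard_error
    diff = xbar1 - xbar2
    return diff - margin, diff + margin

# Unlogged (sample 1) versus logged (sample 2) plots, conservative t* for df = 8
low, high = two_sample_t_interval(17.5, 3.529, 12, 13.67, 4.5, 9, t_star=1.860)
print(round(low, 2), round(high, 2))
```

Either way the interval lies entirely above 0, which agrees with the one-sided test result at the 5% level.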