Chapter 11wellsmat.startlogic.com/.../apstat_ch11_studynotes.pdfThe researchers studying vitamin C...

Chapter 11

Inference for Distributions

Introduction

Single Population when σ unknown

Confidence Intervals

Significance Test

Comparing two populations when σ unknown

Confidence Intervals

Significance Test

Lesson 11-1, Part 1

Inference for the Mean of a Population

Conditions for Testing claims about the Population Mean σ Unknown

Simple random sample (SRS)

Distribution is approximately normal if

Population is normally distributed

The sampling distribution will resemble that of the population.

n ≥ 30

The Central Limit Theorem tells us that the sampling distribution of the sample mean will be approximately normal.

Conditions for Testing claims about the Population Mean σ Unknown

n < 30

Normal probability plot shows a linear trend with no outliers.

One-Sample t Statistic

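$t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$

The standard error of the sample mean is $\dfrac{s}{\sqrt{n}}$.

Compare with the z statistic, used when σ is known:

$z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$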

Density Curve for the t-Distribution

Symmetric about zero, single peaked, and bell shaped

The spread is a bit greater than a normal distribution

Has more probability in the tails and less in the center

As the degrees of freedom (n – 1) increase, the t distribution approaches the standard normal distribution.

t – Confidence Interval

z – confidence interval: $\bar{x} \pm z^* \dfrac{\sigma}{\sqrt{n}}$

t – confidence interval: $\bar{x} \pm t^* \dfrac{s}{\sqrt{n}}$

A confidence interval or significance test is called robust if the confidence level or P-value does not change very much when the assumptions are violated.

Example – Page 620, #11.2

What critical value t* from Table C satisfies each of the following conditions?

A). The t distribution with 5 degrees of freedom has probability 0.05 to the right of t*.

t* = 2.015 (upper-tail area 0.05)

B). The t distribution with 21 degrees of freedom has probability 0.99 to the left of t*.

t* = 2.518 (area 0.99 to the left, so upper-tail area 0.01)


What critical value t* from Table C should be used for a confidence interval for the mean of the population in each of the following situations?

A) A 90% confidence interval based on n = 12 observations.

df = n – 1 = 12 – 1 = 11; a central area of 0.90 leaves 0.05 in each tail.

t* = 1.796

B) A 95% confidence interval from an SRS of 30 observations.

df = n – 1 = 30 – 1 = 29; a central area of 0.95 leaves 0.025 in each tail.

t* = 2.045
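The Table C lookups in this example can be cross-checked numerically. The following is a minimal sketch in Python using only the standard library. The helper names `t_pdf`, `t_cdf`, and `t_ppf` are hypothetical (they are not from the notes), and the cumulative probability is approximated with Simpson's rule rather than taken from a statistics package.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|], then symmetry about zero."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # approximates P(0 <= T <= |x|)
    return 0.5 + area if x >= 0 else 0.5 - area

def t_ppf(p, df):
    """Critical value t* with area p to its left, found by bisection."""
    lo, hi = -50.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Each call passes the area to the LEFT of t*.
print(round(t_ppf(0.95, 5), 3))    # 0.05 to the right, df = 5
print(round(t_ppf(0.99, 21), 3))   # 0.99 to the left, df = 21
print(round(t_ppf(0.95, 11), 3))   # 90% interval, df = 11
print(round(t_ppf(0.975, 29), 3))  # 95% interval, df = 29
```

For a C-level confidence interval the left area passed to `t_ppf` is C + (1 – C)/2, which is why the 90% interval uses 0.95 and the 95% interval uses 0.975.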

Example – Cricket Love

The article “Well Fed Crickets Bowl Maidens Over” (Nature Science Update, February 11, 1999) reported that female field crickets are attracted to males that have high chirp rates and hypothesized that chirp rate is related to nutritional status. The usual chirp rate for male field crickets was reported to vary around a mean of 60 chirps per second. To investigate whether chirp rate was related to nutritional status, investigators fed male crickets a high protein diet for 8 days, after which chirp rate was measured. The mean chirp rate for the crickets on the high protein diet was reported at 109 chirps per second. Is this convincing evidence that the mean chirp rate for crickets on a high protein diet is greater than 60 (which would then imply an advantage in attracting the ladies)? Suppose that the sample size and sample standard deviation are n = 32 and s = 40. We test the relevant hypotheses with α = 0.01.


Step 1 – Identify the population of interest and parameter you want to draw conclusions about.

μ = mean chirp rate for crickets on a high protein diet.

Ho: μ = 60

Ha: μ > 60


Step 2 – Identify the statistical procedure you should use. Then verify the conditions for using this procedure.

Use a one-sample t test with the following conditions:

• Simple random sample (assume)

• Sampling distribution is approximately normal since n ≥ 30.


Step 3 – Carry out the inference procedure.

$t = \dfrac{\bar{x} - \mu_o}{s/\sqrt{n}} = \dfrac{109 - 60}{40/\sqrt{32}} = 6.93$

df = 31

P-value ≈ 0


Step 4 –Interpret your results in the context of the problem

There is sufficient evidence to reject Ho since P-value ≈ 0 ≤ α = 0.01, and we conclude that the mean chirp rate is higher for male crickets that eat a high protein diet.
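The arithmetic in Step 3 of the cricket example can be reproduced with a short sketch in standard-library Python. The function name `t_upper_tail` is hypothetical, and the upper-tail probability is approximated by Simpson's rule on the t density, so the printed P-value is an approximation rather than calculator output.

```python
import math

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T >= t) for t >= 0 by integrating the t density on [0, t]."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    h = t / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 - total * h / 3

# Summary values from the cricket example
xbar, mu0, s, n = 109, 60, 40, 32
alpha = 0.01

se = s / math.sqrt(n)                  # standard error of the sample mean
t_stat = (xbar - mu0) / se             # one-sample t statistic
p_value = t_upper_tail(t_stat, n - 1)  # one-sided test, Ha: mu > 60

print(f"t = {t_stat:.2f}, df = {n - 1}, P-value = {p_value:.2g}")
print("reject Ho" if p_value <= alpha else "fail to reject Ho")
```

The upper tail is used because the alternative hypothesis is one-sided (μ > 60); a two-sided test would double this area.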


Here are estimates of the daily intakes of calcium (in milligrams) for 38 women between the ages of 18 and 24 years who participated in a study of women’s bone health:

808 882 1062 970 909 802 374 416 784 997

651 716 438 1420 1425 948 1050 976 572 403

626 774 1253 549 1325 446 465 1269 671 696

1156 684 1933 748 1203 2433 1255 1100


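A). Display the data using a stemplot and make a normal probability plot. Describe the distribution of calcium intakes for these women.

Stemplot (stem = thousands, leaf = hundreds, values truncated):

0 | 3
0 | 4 4 4 4 4 5 5
0 | 6 6 6 6 6 7 7 7 7
0 | 8 8 8 9 9 9 9 9
1 | 0 0 1 1
1 | 2 2 2 2 3
1 | 4 4
1 |
1 | 9
2 |
2 |
2 | 4

Normal probability plot: 2nd Y= (STAT PLOT)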


The data are right skewed with a high outlier of 2433 and possibly another at 1933.



B). Calculate the mean, the standard deviation, and the standard error.

$\bar{x} = 926.03$

$s = 427.23$

$SE = \dfrac{s}{\sqrt{n}} = \dfrac{427.23}{\sqrt{38}} = 69.3$
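The summary statistics in part B can be recomputed directly from the 38 calcium intakes listed in the exercise. This is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative only.

```python
import math
import statistics

# Daily calcium intakes (mg) for the 38 women, as listed in the exercise
calcium = [
    808, 882, 1062, 970, 909, 802, 374, 416, 784, 997,
    651, 716, 438, 1420, 1425, 948, 1050, 976, 572, 403,
    626, 774, 1253, 549, 1325, 446, 465, 1269, 671, 696,
    1156, 684, 1933, 748, 1203, 2433, 1255, 1100,
]

n = len(calcium)
xbar = statistics.mean(calcium)   # sample mean
s = statistics.stdev(calcium)     # sample standard deviation (divides by n - 1)
se = s / math.sqrt(n)             # standard error of the sample mean

print(f"n = {n}, mean = {xbar:.2f}, s = {s:.2f}, SE = {se:.1f}")
```

`statistics.stdev` is the sample standard deviation; `statistics.pstdev` would divide by n instead of n – 1 and give a slightly smaller value.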


C). Find a 95% confidence interval for the mean. Use the inference toolbox.

Step 1 – Identify the population of interest and the parameter you want to draw conclusions about.

μ = the mean daily intake of calcium for women between the ages of 18 and 24.



Use a one-sample t interval.

Conditions

1. Simple random sample - assume

2. The sampling distribution is approximately normal since n ≥ 30.



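STAT TESTS TInterval

$\bar{x} \pm t^* \dfrac{s}{\sqrt{n}} = 926.03 \pm 2.042 \dfrac{427.23}{\sqrt{38}}$

The actual df is 37; Table C has no row for df = 37, so use df = 30 (t* = 2.042).

785.6 to 1066.5 (TInterval output, which uses df = 37; the Table C value t* = 2.042 gives the slightly wider interval 784.5 to 1067.6)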


Step 4 –Interpret your results in the context of the problem

We are 95% confident that the true mean daily intake of calcium for women between 18 and 24 years of age is between 785.6 mg and 1066.5 mg.
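The one-sample t interval formula can be sketched as a small helper. The function name `t_interval` is hypothetical, and t* is supplied by hand from Table C (2.042 for df = 30, the row the notes use in place of df = 37). For that reason the result is slightly wider than the calculator's 785.6 to 1066.5, which is based on df = 37.

```python
import math

def t_interval(xbar, s, n, t_star):
    """Return (low, high) for xbar +/- t* * s / sqrt(n)."""
    margin = t_star * s / math.sqrt(n)
    return xbar - margin, xbar + margin

# Summary values from the calcium example; t* from Table C with df = 30
low, high = t_interval(xbar=926.03, s=427.23, n=38, t_star=2.042)
print(f"95% CI with Table C t*: {low:.1f} to {high:.1f}")
```

Rounding df down to the nearest table row is conservative: it gives a larger t* and therefore a wider interval than the exact df would.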


D) Eliminate the two largest values and recompute the 95% confidence interval. What do you notice?

The normal quantile plot is essentially the same as before (except the two points that deviated greatly from the line are gone).

Lesson 11-1, Part 2

Inference for the Mean of a Population

Matched Pairs

To compare the responses to the two treatments in a matched pair, apply the one-sample t procedure to the observed differences.


The researchers studying vitamin C in CSB in Exercise 11.9 (page 628) were also interested in a similar commodity called wheat soy blend (WSB). A major concern was the possibility that some of the vitamin C content would be destroyed as a result of storage and shipment of the commodity to its final destination. The researchers specially marked a collection of bags at the factory and took a sample from each of these to determine the vitamin C content. Five months later in Haiti they found the specially marked bags and took samples. The data consist of two vitamin C measures for each bag, one at the time of production in the factory and the other five months later in Haiti. The units are mg/100g.


Factory Haiti Factory Haiti Factory Haiti

44 40 45 38 39 43

50 37 32 40 52 38

48 39 47 35 45 38

44 35 40 38 37 38

42 35 38 34 38 41

47 41 41 35 44 40

49 37 43 37 43 35

50 37 40 34 39 38

39 34 37 40 44 36


A). Examine the question of interest to these researchers. Provide appropriate statistical evidence to justify your conclusions.


µ = mean change (Haiti – factory) of vitamin C in mg/100g.

Ho: μ = 0

Ha: μ < 0



We will use a one-sample t test.

Conditions

1. Simple random sample - assumed

2. The sampling distribution is approximately normal since the normal probability plot of the differences appears reasonably linear with no outliers.


Stemplot of the differences (Haiti – factory):

-1 | 4 3 3 2 2
-0 | 9 9 8 8 7 7 7 6 6 6 6 5 4 4 4 2 1
 0 | 1 3 3 4 8


Step 3 – Calculate the test statistic and p-value. Illustrate with a graph.

STAT TESTS T-Test

$t = \dfrac{\bar{x}_{diff} - \mu_o}{s_{diff}/\sqrt{n}} = \dfrac{-5.33 - 0}{5.589/\sqrt{27}} = -4.96$

P-value ≈ 0.000018, df = 26


Step 4 – Interpret your results in the context of the problem

There is sufficient evidence to reject Ho (P-value ≈ 0.00002 < α = 0.01) and conclude that the mean change between Haiti and factory is less than 0.

This indicates that the average amount of vitamin C has decreased as a result of storage and shipment.


B). Estimate the loss in vitamin C content over the five-month period. Use a 95% confidence level.


STAT TESTS TInterval

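$\bar{x}_{diff} \pm t^* \dfrac{s_{diff}}{\sqrt{n}} = -5.33 \pm 2.056 \dfrac{5.588}{\sqrt{27}}$

For t*, use Table C with df = 26.

(–7.544, –3.123)

We are 95% confident that the population mean change (Haiti – factory) in vitamin C content over the five-month period is between –7.544 mg/100g and –3.123 mg/100g, that is, a mean loss of between 3.123 and 7.544 mg/100g.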
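The matched-pairs calculations in parts A and B reduce to one-sample procedures on the 27 differences. A minimal sketch in standard-library Python follows; the pairs are transcribed from the data table, the variable names are illustrative, and t* = 2.056 is the Table C value quoted in the notes rather than something computed here.

```python
import math
import statistics

# (factory, Haiti) vitamin C readings in mg/100g, one pair per bag
pairs = [
    (44, 40), (45, 38), (39, 43), (50, 37), (32, 40), (52, 38),
    (48, 39), (47, 35), (45, 38), (44, 35), (40, 38), (37, 38),
    (42, 35), (38, 34), (38, 41), (47, 41), (41, 35), (44, 40),
    (49, 37), (43, 37), (43, 35), (50, 37), (40, 34), (39, 38),
    (39, 34), (37, 40), (44, 36),
]

# Work with the differences (Haiti - factory), as in the notes
diffs = [haiti - factory for factory, haiti in pairs]
n = len(diffs)
d_bar = statistics.mean(diffs)
s_d = statistics.stdev(diffs)
se = s_d / math.sqrt(n)

t_stat = (d_bar - 0) / se   # test of Ho: mean difference = 0
t_star = 2.056              # Table C, df = 26, 95% confidence
low, high = d_bar - t_star * se, d_bar + t_star * se

print(f"mean diff = {d_bar:.2f}, s = {s_d:.3f}, t = {t_stat:.2f}")
print(f"95% CI: ({low:.3f}, {high:.3f})")
```

Because the calculator uses an unrounded t*, its endpoints can differ from this sketch in the third decimal place.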


C). Do these data provide evidence that the mean vitamin C content of all bags of WSB shipped to Haiti differs from the target value of 40 mg/100g?


μ = mean of vitamin C in mg/100g for all bags shipped to Haiti.

Ho: μ = 40

Ha: μ ≠ 40


Step 3 – Calculate the test statistic and p-value. Illustrate with a graph.

STAT TESTS T-Test

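$t = \dfrac{\bar{x} - \mu_o}{s/\sqrt{n}} = \dfrac{42.85 - 40}{4.79/\sqrt{27}} = 3.09$

(x̄ = 42.85 and s = 4.79 are the mean and standard deviation of the 27 factory measurements.)

P-value = 0.005, df = 26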


Step 4 – Interpret your results in the context of the problem

There is sufficient evidence to reject Ho (P-value = 0.005 < α = 0.01) and conclude that the mean vitamin C content is different from the target mean of 40 mg/100g.

Robustness of t Procedures

The t procedures are strongly influenced by outliers.

Always check the data first!

If there are outliers and the sample size is small, the results will not be reliable.

The t procedures are robust when there are no outliers, especially when the distribution is approximately symmetric.

When to Use t Procedures

If the sample size is less than 15, only use t procedures if the data are close to normal.

If the sample size is at least 15, only use t procedures if there are no outliers.

If the sample size is at least 40, you may use t procedures even if the data are clearly skewed.

Lesson 11-2

Comparing Two Sample Means

Two-Sample Problems

In a comparative study, we want to compare the responses to two treatments or to compare the characteristics of two populations.

We have separate samples for each treatment or each population.

The samples must be chosen randomly and independently in order to perform statistical inference.

Because matched pairs are NOT chosen independently, we will NOT use two-sample inference for a matched pair design. For a matched pair design, apply the one-sample t procedures to the observed differences.

Notation for Two Samples

Population   Variable   Mean   Standard Deviation
1            x1         μ1     σ1
2            x2         μ2     σ2

Population   Sample Size   Sample Mean   Sample Standard Deviation
1            n1            x̄1            s1
2            n2            x̄2            s2

Null and Alternative Hypothesis

The null hypothesis is that there is no

difference between the two parameters.

Ho: µ1 = µ2 or Ho: µ1 – µ2 = 0

The alternative hypothesis could be that

Ha: µ1 ≠ µ2 (two-sided)

Ha: µ1 < µ2 or Ha: µ1 – µ2< 0 (one-sided)

Ha: µ1 > µ2 or Ha: µ1 – µ2> 0 (one-sided)

Conditions for Comparing Two Samples

Two random samples (SRS) that are independent.

Both sampling distributions are approximately normal if, for each sample, one of the following holds:

The population is normally distributed

The sampling distribution will resemble that of the population.

Conditions for Comparing Two Samples

n1 ≥ 30 and n2 ≥ 30

The central limit theorem tells us the sampling distribution of the mean will be approximately normal.

n1 < 30 and n2 < 30

The sampling distribution of the mean will be approximately normal if the normal probability plot is linear with no outliers.

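Sampling Distribution of $\bar{x}_1 - \bar{x}_2$

Mean: $\mu_1 - \mu_2$

Variance: $\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}$

Standardize z

$z = \dfrac{x - \mu}{\sigma}$

$z = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$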

Test Statistic for Two Samples when σ Unknown

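$t = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$

The denominator is the standard error, the estimated standard deviation of $\bar{x}_1 - \bar{x}_2$.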

Confidence Interval for Two Samples when σ Unknown

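Confidence interval for μ1 – μ2 given by

$(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$

where t* is the upper $\dfrac{1 - C}{2}$ critical value for confidence level C.

Degrees of freedom equal the smaller of n1 – 1 and n2 – 1.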


How badly does logging damage tropical rainforests? One study compared forest plots in Borneo that had never been logged with similar plots nearby that had been logged 8 years earlier. The study found that the effects of logging were somewhat less severe than expected. Here are the data on the number of tree species in 12 unlogged plots and 9 logged plots:

Unlogged: 22 18 22 20 15 21 13 13 19 13 19 15

Logged: 17 4 18 14 18 15 15 10 12


A) The study report says, “Loggers were unaware that the effects of logging would be assessed.” Why is this important? The study report also explains why the plots can be considered to be randomly assigned.

If the loggers had known that a study would be done, they might have (consciously or subconsciously) cut down fewer trees than they typically would, in order to reduce the impact of logging.


B) Does logging significantly reduce the mean number of species in a plot after 8 years? Give appropriate statistical evidence to support your conclusion.

We want to compare the mean number of tree species in unlogged (μ1) plots versus logged (μ2) plots.

Ho: μ1 = μ2 or Ho: μ1 – μ2 = 0

Ha: μ1 > μ2 or Ha: μ1 – μ2 > 0


We will use a two sample t test

Conditions:

1. Plots are randomly assigned, but we are not sure it is an SRS, so we may not be able to generalize the results.

2. The stemplots and normal probability plots show that both distributions are approximately normal.

Unlogged

1 | 3 3 3 5 5 8 9 9
2 | 0 1 2 2

Logged

0 | 4
1 | 0 2 4 5 5 7 8 8


STAT TESTS 2-SampTTest


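$t = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \dfrac{(17.5 - 13.67) - 0}{\sqrt{\dfrac{3.53^2}{12} + \dfrac{4.5^2}{9}}} = 2.11$

P-value = 0.0259, df = 14.793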


There is sufficient evidence to reject Ho (P-value = 0.0259 < α = 0.05) and conclude that logging reduces the mean number of tree species. Also note that this would not be statistically significant at the 1% level.
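The two-sample t statistic and the calculator's degrees of freedom for the logging data can be reproduced with a short sketch in standard-library Python. The df formula below is the usual Welch-Satterthwaite approximation, which is assumed here to be what 2-SampTTest reports (it matches the df = 14.793 shown). The helper `t_upper_tail` is hypothetical and approximates the one-sided P-value with Simpson's rule.

```python
import math
import statistics

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T >= t) for t >= 0 by integrating the t density on [0, t]."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    h = t / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 - total * h / 3

unlogged = [22, 18, 22, 20, 15, 21, 13, 13, 19, 13, 19, 15]
logged = [17, 4, 18, 14, 18, 15, 15, 10, 12]

n1, n2 = len(unlogged), len(logged)
x1, x2 = statistics.mean(unlogged), statistics.mean(logged)
v1 = statistics.variance(unlogged) / n1   # s1^2 / n1
v2 = statistics.variance(logged) / n2     # s2^2 / n2

se = math.sqrt(v1 + v2)                   # standard error of x1 - x2
t_stat = (x1 - x2 - 0) / se               # Ho: mu1 - mu2 = 0

# Welch-Satterthwaite df (assumed calculator method) and the conservative df
df_welch = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
df_conservative = min(n1, n2) - 1

p_value = t_upper_tail(t_stat, df_welch)  # one-sided, Ha: mu1 > mu2
print(f"t = {t_stat:.2f}, df = {df_welch:.3f}, P-value = {p_value:.4f}")
print(f"conservative df = {df_conservative}")
```

Using the conservative df (the smaller of n1 – 1 and n2 – 1) gives a somewhat larger P-value than the calculator's fractional df.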


C). Give a 90% confidence interval for the difference in the mean number of species between unlogged and logged plots.

STAT TESTS 2-SampTInt


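$(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}} = (17.5 - 13.67) \pm 1.860 \sqrt{\dfrac{3.529^2}{12} + \dfrac{4.5^2}{9}}$

t* = 1.860 is the Table C value for the conservative df = 8 (the smaller of n1 – 1 and n2 – 1); it gives approximately (0.46, 7.21).

The calculator (2-SampTInt) uses df = 14.79, for which t* ≈ 1.755, and reports (0.6517, 7.015).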
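The two-sample interval can be sketched for both choices of degrees of freedom. In this standard-library Python sketch, t* = 1.860 is the Table C value for df = 8 quoted in the notes, while t* ≈ 1.755 for df ≈ 14.79 is an approximate value assumed here for comparison with the calculator output. The function name `two_sample_interval` is hypothetical.

```python
import math
import statistics

def two_sample_interval(sample1, sample2, t_star):
    """Return (low, high) for (x1 - x2) +/- t* * sqrt(s1^2/n1 + s2^2/n2)."""
    diff = statistics.mean(sample1) - statistics.mean(sample2)
    se = math.sqrt(statistics.variance(sample1) / len(sample1)
                   + statistics.variance(sample2) / len(sample2))
    return diff - t_star * se, diff + t_star * se

unlogged = [22, 18, 22, 20, 15, 21, 13, 13, 19, 13, 19, 15]
logged = [17, 4, 18, 14, 18, 15, 15, 10, 12]

# Conservative df = 8 (Table C) versus the calculator's df of about 14.79
low_c, high_c = two_sample_interval(unlogged, logged, t_star=1.860)
low_w, high_w = two_sample_interval(unlogged, logged, t_star=1.755)  # approximate t*

print(f"90% CI, df = 8:       ({low_c:.3f}, {high_c:.3f})")
print(f"90% CI, df ~ 14.79:   ({low_w:.3f}, {high_w:.3f})")
```

Both intervals lie entirely above zero, which is consistent with the one-sided test result at the 5% level.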
