Chapter 8: Hypothesis Testing for Population Proportions The basics of Significance Testing

egbert-phillips
### Transcript of Chapter 8: Hypothesis Testing for Population Proportions The basics of Significance Testing.

Chapter 8: Hypothesis Testing for Population Proportions

The basics of Significance Testing

Statistical Inference

• Already discussed confidence intervals for an unknown population parameter, p

• CIs are used when the goal is to estimate an unknown population parameter like p

• This chapter... statistical inference through significance tests

• Evaluate evidence (a statistic) provided by sample data about some claim concerning an unknown population parameter like p

I’m a great free-throw shooter...

• I claim that I make 95% of my basketball free throws.

• To test my claim, I am asked to shoot 20 free throws. I make only 8 of the 20 (only 40%). Now people don’t believe my claim of making 95% of my basketball free throws.

• Making only 8 of 20 attempts would almost never happen/very unlikely if I truly did make 95% of my free throws
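
To put a number on "almost never," here is a minimal sketch. It assumes the 20 shots are independent and each is made with probability 0.95; this binomial model is an assumption of the sketch, not something the slides state.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Probability of making 8 or fewer of 20 free throws if the 95% claim were true
prob_8_or_fewer = binom_cdf(8, 20, 0.95)
print(f"P(X <= 8) = {prob_8_or_fewer:.2e}")
```

The result is far smaller than one in a billion, which is why the outcome counts as strong evidence against the claim.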

Significance Testing

• Basic idea... An outcome that would rarely happen if a claim were really true is good evidence that the claim is not true.

• Example... I claim that 99% of adult humans are 6 feet tall or taller.

• If my claim were true, it would be very rare for most of the adults in an SRS of 100 to be shorter than 6 feet.

Significance Testing... Let’s begin by knowing μ and σ (unrealistic)

Because paramedic response time is critical to saving lives, several cities monitor these response times. In one city, the mean response time to all accidents involving life-threatening injuries last year was μ = 6.7 minutes with a standard deviation of σ = 2 minutes. The city manager encourages them to “do better” next year.

At the end of the following year, the city manager selects an SRS of 400 calls involving life-threatening injuries. For this sample, the mean response time was x̄ = 6.48 minutes. Do these data provide good evidence that response times have decreased since last year?

Previous Year: μ = 6.7 minutes; σ = 2 minutes. Following year: SRS of 400 with x̄ = 6.48 minutes

Do these data provide good evidence that the response times have decreased?

Remember, statistics vary from sample to sample. Maybe x̄ = 6.48 is a result of sampling variability.

Maybe response time hasn’t improved.

Previous Year: μ = 6.7 minutes; σ = 2 minutes. Following year: SRS of 400 with x̄ = 6.48 minutes

Make a claim and see if the data provides evidence against it.

Ho: μ = 6.7 minutes

Ho, null hypothesis; usually no effect, no change, no difference; neutral

A hypothesis always refers to some population parameter, like μ or p (NEVER a sample statistic! We don’t want to make a hypothesis about something we already know.)

Previous Year: μ = 6.7 minutes; σ = 2 minutes. Following year: SRS of 400 with x̄ = 6.48 minutes

Ho: μ = 6.7 minutes

We are seeking evidence of a decrease in response time, so

Ha: μ < 6.7 minutes

Ha, alternative hypothesis, claim about population we are trying to find evidence for.

One-sided, only interested in decrease (in this case) but can be two-sided, such as

Ho: μ = 0 and Ha: μ ≠ 0

Is our sample mean, x̄ = 6.48, rare in the sampling distribution?

Practice: State the appropriate null hypothesis and alternative hypothesis in each case. Be sure to define the parameter.

Larry's car averages 26 miles per gallon on the highway. He switches to a new brand of motor oil that is advertised to increase gas mileage. After driving 3000 highway miles with the new oil, he wants to determine if the average gas mileage has increased.

Parameter: μ = mean gas mileage for Larry’s car on the highway

Ho: μ = 26 mpg Ha: μ > 26 mpg

Practice: State the appropriate null hypothesis and alternative hypothesis in each case. Be sure to define the parameter.

A May 2005 Gallup Poll report on a national survey of 1028 teenagers revealed that 72% of teens said they rarely or never argue with their friends. You wonder whether this national result would be true in your school. So you conduct your own survey of a random sample of students at your school.

Parameter: p = proportion of teens in your school who rarely or never argue with their friends.

Ho: p = 0.72 Ha: p ≠ 0.72

Explain what is wrong in each situation and why it is wrong

A change is made that should improve student satisfaction with the parking situation at your school. The null hypothesis, that there is an improvement, is tested versus the alternative, that there is no change.

Ho and Ha have been switched. The null hypothesis should be a statement of ‘no change.’

Explain what is wrong in each situation and why it is wrong

A researcher tests the following null hypothesis: H0: x̄ = 10.

The null hypothesis (and the alternative hypothesis as well) should be a statement/claim about a population parameter (like µ), not a sample statistic (like x̄)

Explain what is wrong in each situation and why it is wrong

The Survey of Study Habits and Attitudes (SSHA) is a psychological test that measures students' attitudes toward school and study habits. Scores range from 0 to 200. The mean score for U.S. college students is about 115. A teacher suspects that older students have better attitudes toward school.

Ho: μ = 115 Ha: μ > 120

Ho and Ha must share same numeric value (only change =, >, <, ≠)

Explain what is wrong in each situation and why it is wrong

The Census Bureau reports that households spend an average of 31% of their total spending on housing. A homebuilders association in Cleveland believes that this average is lower in their area. They interview a sample of 40 households in the Cleveland metropolitan area to learn what percent of their spending goes toward housing. Take μ to be the mean percent of spending devoted to housing among all Cleveland households.

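H0: p = 31% Ha: p < 31%

The parameter of interest is a mean, μ (the mean percent of spending devoted to housing), not a proportion p. The hypotheses should be stated as H0: μ = 31% and Ha: μ < 31%.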

Conditions for Significance Tests (just like Confidence Intervals)

• SRS (randomization)

• Normality (for means, proportions; requirements are different)

• Independence (population must be --or must be able to reasonably assume that it is-- at least 10 times as large as the sample size; and that one observation has no influence on any others)

Significance tests use test statistics...Some principles that apply to most tests:

• The test is based on a statistic that compares the hypothesized value of the parameter (the value stated in Ho, such as μ = 6.7) with an estimate of the parameter from the sample data (such as x̄ or p̂)

• Values of the estimate far from the parameter value in the direction specified by the alternative hypothesis give evidence against H0

• To assess how far the estimate is from the parameter, standardize the estimate (what does this mean?)
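
To answer "what does this mean?": standardizing expresses the gap between the estimate and the hypothesized value in units of the estimate's standard deviation. A minimal sketch using the paramedic numbers from the slides; the function name `z_statistic` is made up for illustration.

```python
from math import sqrt

def z_statistic(estimate, hypothesized, std_dev_of_estimate):
    """How many standard deviations the estimate lies from the hypothesized value."""
    return (estimate - hypothesized) / std_dev_of_estimate

# Paramedic example: x-bar = 6.48, hypothesized mu = 6.7, sigma = 2, n = 400
sigma, n = 2.0, 400
z = z_statistic(6.48, 6.7, sigma / sqrt(n))  # standard deviation of x-bar is sigma / sqrt(n)
print(round(z, 2))  # -2.2
```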

Good evidence against μ = 6.7 minutes...

... and in favor of μ < 6.7 minutes this year. Data like these would be unlikely to occur if Ho were true. But how unlikely? We need a precise way to measure ‘how unlikely.’

P-Values

A p-value is a quantitative measure of how rare (how unlikely) a finding is

Small p-values are evidence against Ho

Large p-values fail to give evidence against Ho

Definition of a P-Value

The probability, computed assuming that Ho is true, that the test statistic would take a value as extreme as or more extreme than the value actually observed.

Let’s go back to the paramedics example again...

Ho: μ = 6.7 Ha: μ < 6.7

z = (6.48 – 6.7) / (2/√400) = -2.20

(Just fyi, in this example, negative values of z favor Ha over Ho (not always the case))

Remember, p-value is the probability.... Or the area under the curve... what’s the area under the curve for z = -2.20?

Ho: μ = 6.7 Ha: μ < 6.7

p-value = 0.0139

Small p-value: strong evidence against Ho

Favors alternative hypothesis, Ha: μ < 6.7 minutes
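
The area under the standard Normal curve to the left of z = -2.20 can be checked with Python's standard library. This is only a sketch of the table lookup; the slides themselves rely on a table or Minitab.

```python
from statistics import NormalDist

z = -2.20
# Ha: mu < 6.7 is one-sided (lower tail), so the p-value is the area to the left of z
p_value = NormalDist().cdf(z)
print(round(p_value, 4))  # 0.0139
```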

What’s the difference between...

Ho: p = 0.2 Ha: p < 0.2

and

Ho: p = 0.2 Ha: p ≠ 0.2

• When the alternative is two-sided, the P-value is the probability of getting a z at least as far from 0 in either direction as the observed z (here, z = 1.20).

If Ha is 2-sided (≠), both directions count
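
A short sketch of how the direction of Ha changes the p-value for the same z = 1.20. The function name and the labels 'less', 'greater', and 'two-sided' are choices made for this illustration.

```python
from statistics import NormalDist

def p_value_from_z(z, alternative):
    """Normal-curve p-value for a z statistic, given the direction of Ha."""
    std_normal = NormalDist()
    if alternative == "less":        # Ha: parameter < hypothesized value
        return std_normal.cdf(z)
    if alternative == "greater":     # Ha: parameter > hypothesized value
        return 1 - std_normal.cdf(z)
    if alternative == "two-sided":   # Ha: parameter != hypothesized value
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")

print(round(p_value_from_z(1.20, "two-sided"), 4))  # 0.2301
print(round(p_value_from_z(1.20, "less"), 4))       # 0.8849
```

With Ha: p < 0.2, a positive z points away from the alternative, so the one-sided p-value is large; with Ha: p ≠ 0.2, both tails count.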

Statistical Significance...

• Most of the time, we take one more step to assess evidence against Ho

• We compare p-value to some pre-determined value (versus ‘unlikely’) called a significance level, symbol α (alpha)

• Can think of this as a rejection zone (sketch)

Statistical Significance

• Most common α levels are α = 0.05 or α = 0.01

• Interpretation:

– At α = 0.05, we require data that give evidence against Ho so strong that it would happen no more than 5% of the time when Ho is true

Statistical Significance

• If p-value is as small or smaller than α, we say data are statistically significant at level α

• Note: ‘significant’ in statistics doesn’t mean important (like in English); it means not likely to happen by chance

Statistically Significant Sketches

• If p-value is p = 0.03... this is significant at α = 0.05 level (in rejection zone)

• If p-value is p = 0.03... this is not significant at α = 0.01 level (not in rejection zone)
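
The decision rule in these sketches is a single comparison. A minimal illustration; the helper name `is_significant` is hypothetical.

```python
def is_significant(p_value, alpha):
    """Statistically significant at level alpha when the p-value is as small as or smaller than alpha."""
    return p_value <= alpha

for alpha in (0.05, 0.01):
    verdict = "reject Ho" if is_significant(0.03, alpha) else "fail to reject Ho"
    print(f"alpha = {alpha}: {verdict}")
```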

Interpretation/Wording

Reject Ho (Null Hypothesis):

This happens when the sample result is statistically significant: the p-value is at or below α, so a result this extreme would be too unlikely to occur by chance if Ho were true (we don’t believe the null hypothesis); the test statistic is in the rejection zone

Wording must reference all of the following for a complete interpretation... p-value, α level, reject Ho, and conclusion in context (caution about using the word ‘cause’ or ‘prove’).

Interpretation/Wording

Fail to Reject Ho (Null Hypothesis):

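This happens when the sample result could plausibly have occurred by chance if Ho were true: the p-value is larger than α and the test statistic is not in the rejection zone. We lack sufficient evidence for the alternative; this does not prove that the null hypothesis is true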

Wording must reference all of the following for a complete interpretation... p-value, α level, fail to reject Ho, and conclusion in context (caution about using the word ‘cause’ or ‘prove’)

Tests about a population proportion ...

Conditions for Tests about a population proportion...

• Random Sample ... SRS or randomly selected or randomly assigned

• Large Sample Size; Normality ... n·p0 ≥ 10 and n(1 – p0) ≥ 10

• Independence ... Population at least 10 times sample size; and each observation has no influence on any other

Work stress...

According to the National Institute for Occupational Safety and Health, job stress poses a major threat to the health of workers. A national survey of restaurant employees found that 75% said that work stress had a negative impact on their personal lives.

A simple random sample of 100 employees from a large restaurant chain finds that 68 answer “Yes” when asked, “Does work stress have a negative impact on your personal life?” Is this good reason to think that the proportion of all employees in this chain who would say “Yes” differs from the national proportion p0 = 0.75?

H0: p = 0.75 Ha: p ≠ 0.75

We want to test a claim about p, the true proportion of this chain's employees who would say that work stress has a negative impact on their personal lives.

Work stress...

Conditions: 1-proportion z test

SRS – stated in problem

Normality - The expected number of “Yes” and “No” responses are (100)(0.75) = 75 and (100)(0.25) = 25, respectively. Both are at least 10.

Independence - Since we are sampling without replacement, this “large chain” must have at least (10)(100) = 1000 employees; and we must assume that one employee does not influence the response of any other employee
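
The two numeric conditions can be checked mechanically; whether the sample is random has to be judged from how the data were collected. A minimal sketch, with a hypothetical helper name and a hypothetical chain size of 5000 employees used only to show the call:

```python
def check_conditions(n, p0, population_size=None):
    """Check the large-sample and 10% conditions for a one-proportion z test."""
    large_sample = n * p0 >= 10 and n * (1 - p0) >= 10
    # If the population size is unknown, the 10% condition cannot be verified numerically
    ten_percent = None if population_size is None else population_size >= 10 * n
    return {"large_sample": large_sample, "ten_percent": ten_percent}

print(check_conditions(n=100, p0=0.75))                        # work stress example
print(check_conditions(n=100, p0=0.75, population_size=5000))  # hypothetical chain size
```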

Work stress....

Calculations for 1-prop z test; use Minitab

1 sample, proportion; change options and data as needed
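
For readers without Minitab, the same calculation can be sketched with Python's standard library. This is a stand-in using the usual Normal-approximation formula, not a reproduction of Minitab's output; the function name is made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def one_prop_z_test(x, n, p0):
    """Two-sided one-proportion z test; returns (p_hat, z, p_value)."""
    p_hat = x / n
    se = sqrt(p0 * (1 - p0) / n)  # standard deviation of p-hat when Ho is true
    z = (p_hat - p0) / se
    p_value = 2 * NormalDist().cdf(-abs(z))
    return p_hat, z, p_value

p_hat, z, p_value = one_prop_z_test(x=68, n=100, p0=0.75)
print(f"p-hat = {p_hat:.2f}, z = {z:.2f}, p-value = {p_value:.3f}")
```

The p-value comes out a little above 0.10, consistent with the interpretation on the next slide.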

Work stress...

Interpretation:

Fail to reject Ho. There is over a 10% (which is well over a reasonable α level) chance of obtaining a sample result as unusual as or even more unusual than we did (p̂ = 0.68) when the null hypothesis is true. We have insufficient evidence to suggest that the proportion of this chain restaurant's employees who suffer from work stress is different from the national survey result, 0.75.

We want to be rich...

• In a recent study, 73% of first-year college students responding to a national survey identified “being very well-off financially” as an important personal goal. A state university finds that 132 of an SRS of 200 of its first-year students say that this goal is important.

• Is there evidence that the proportion of first-year students at this university who think being very well-off is important differs from the national value, 73%? Carry out a significance test to help answer this question.

n = 200; x = 132; SRS; p0 = 0.73; p̂ = 132/200 = 0.66

We want to test Ho: p = 0.73 versus Ha: p ≠ 0.73, where p is the proportion of all first-year students at this university who think being very well-off is important; that is, we ask whether p differs from the national value of 73%.

n = 200; x = 132; SRS; p0 = 0.73; p̂ = 0.66

Conditions:

SRS – stated in problem

Normality – np0 ≥ 10 and n(1 – p0) ≥ 10: (200)(0.73) = 146 ≥ 10 and (200)(1 – 0.73) = 54 ≥ 10

Independence – We must assume at least (10)(200) first-year students in the population and that one student’s response does not influence any other student’s response.
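
The test statistic and p-value for this example can be sketched as follows, again using the standard Normal approximation in place of Minitab. Looping over several α levels shows how the conclusion depends on the level chosen.

```python
from math import sqrt
from statistics import NormalDist

n, x, p0 = 200, 132, 0.73
p_hat = x / n
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p_value = 2 * NormalDist().cdf(-abs(z))  # two-sided, because Ha: p != 0.73
print(f"z = {z:.2f}, p-value = {p_value:.4f}")

for alpha in (0.10, 0.05, 0.01):
    decision = "reject Ho" if p_value <= alpha else "fail to reject Ho"
    print(f"alpha = {alpha}: {decision}")
```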

Interpretation...

Reject Ho. With a p-value of 0.0258, and assuming an α = 0.05, we conclude that we do have statistically significant evidence that the proportion of all first-year students at this university who think being very well-off is important differs from the national value.

(determination, p-value, α, and context... Always)

Use & Abuse of Tests...

• Significance tests are used in a variety of settings... Marketing, FDA drug testing, discrimination court cases, etc.

• Significance tests quantify how unlikely an observed result would be if it occurred simply by chance

• Different levels of significance (α) are chosen depending on the given situation; typically α = 0.10, 0.05, or 0.01

Use & Abuse of Tests...

• P-values allow us to decide individually if evidence is sufficiently strong

• But, there is still no practical distinction between p-values of, say, 0.049 and 0.051

• Statistical inference does not correct basic flaws in survey or experimental design

Using Inference to Make Decisions...

Sometimes we do everything correctly... data collection, conditions, calculations, interpretation... but we still make an incorrect decision/determination... perhaps we just happen to get a sample statistic that is very extreme... that really doesn’t represent our population accurately

... we reject the null hypothesis when we really should have failed to reject (Ho was really true)

OR we fail to reject the null hypothesis when we really should have rejected the null hypothesis (Ho was really false)

... we make an error

Making errors when using inference...

• Type I Error: We reject Ho (null hypothesis) when Ho is really true

In other words, we conclude that Ha (alternative hypothesis) is true when, in actuality, Ho (null hypothesis) is true

• Type II Error: We fail to reject Ho (null hypothesis) when Ho is really false

In other words, we keep Ho (null hypothesis) as plausible when, in reality, Ha (alternative hypothesis) is true

Type I and Type II Errors...

Paramedic Response Times Revisited...

H0: μ = 6.7 minutes

Ha: μ < 6.7 minutes

... where μ is the mean response time to all calls involving life-threatening injuries this year.

Type I error: reject H0 when H0 is true

Description: The city manager concludes that the mean response time this year is less than 6.7 minutes (last year's average) when in fact the mean response time is still 6.7 minutes (or higher).

Consequences: The city manager believes that paramedic response times have improved when they really haven't. This could result in additional loss of life for accident victims.

Paramedic Response Times Revisited...

H0: μ = 6.7 minutes

Ha: μ < 6.7 minutes

... where μ is the mean response time to all calls involving life-threatening injuries this year.

Type II error: fail to reject H0 when H0 is false

Description: The city manager decides that the paramedics' mean response time this year is still 6.7 minutes (or higher) when it is actually less than 6.7 minutes.

Consequences: The city manager may take action to decrease paramedic response times when such action is unnecessary. This could result in considerable expense for the city, as well as some disgruntled paramedics.

Probabilities of Type I and Type II Errors...

• Probability of Type I Error (rejecting Ho when null is really true): α, your significance level for the hypothesis test.

• Probability of Type II Error (failing to reject Ho when alternative is really true): β. Very complicated to calculate. Beyond scope of this course.
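
That the Type I error probability equals α can be checked by simulation. The sketch below assumes Ho is true in the paramedic setting and draws the sample mean directly from its Normal sampling distribution (mean 6.7, standard deviation σ/√n); the seed and the number of trials are arbitrary choices.

```python
import random
from math import sqrt
from statistics import NormalDist

random.seed(1)  # fixed seed so the run is repeatable
mu0, sigma, n, alpha = 6.7, 2.0, 400, 0.05
se = sigma / sqrt(n)
z_cutoff = NormalDist().inv_cdf(alpha)  # reject Ho when z falls below this (Ha: mu < 6.7)

trials = 20_000
rejections = 0
for _ in range(trials):
    x_bar = random.gauss(mu0, se)  # a sample mean generated with Ho true
    if (x_bar - mu0) / se < z_cutoff:
        rejections += 1  # rejecting a true Ho is a Type I error

type_i_rate = rejections / trials
print(f"Simulated Type I error rate: {type_i_rate:.3f}")
```

The simulated rate should land close to 0.05, the chosen α.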

Power of a Test...

• Power: Probability that a test will reject Ho when Ha is true

• Think of power as making the correct decision, not making an error, not making a mistake

• High level of power is a good thing

• Power = 1 – β (remember β is the probability of making a Type II error); so ‘power’ and β are complementary

Power of a Test...

• How can we increase power (making the correct decision)?

• Increase α

• Increase n

• Decrease standard deviation (same effect as increasing the sample size, n)
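
Although computing β is outside the scope of the course, the three levers above can be illustrated for the simplest case: a one-sided z test with known σ, as in the paramedic example. The true mean of 6.5 minutes is a hypothetical value chosen only for this sketch, and the function name is made up.

```python
from math import sqrt
from statistics import NormalDist

def power_lower_tail(mu0, mu_true, sigma, n, alpha):
    """Power of the z test of Ho: mu = mu0 vs Ha: mu < mu0 when the true mean is mu_true."""
    se = sigma / sqrt(n)
    x_bar_cutoff = mu0 + NormalDist().inv_cdf(alpha) * se  # reject Ho when x-bar is below this
    return NormalDist(mu_true, se).cdf(x_bar_cutoff)

base = power_lower_tail(6.7, 6.5, sigma=2.0, n=400, alpha=0.05)
bigger_alpha = power_lower_tail(6.7, 6.5, sigma=2.0, n=400, alpha=0.10)
bigger_n = power_lower_tail(6.7, 6.5, sigma=2.0, n=800, alpha=0.05)
smaller_sigma = power_lower_tail(6.7, 6.5, sigma=1.0, n=400, alpha=0.05)

print(f"baseline power:  {base:.3f}  (beta = {1 - base:.3f})")
print(f"alpha = 0.10:    {bigger_alpha:.3f}")
print(f"n = 800:         {bigger_n:.3f}")
print(f"sigma = 1:       {smaller_sigma:.3f}")
```

Each change raises the power above the baseline, matching the bullets above.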

Comparing Proportions from Two Populations: Hypothesis Testing

• Ho: p1 = p2

• Ha: p1 ≠ or > or < p2

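We must first find the combined (pooled) proportion of successes in both samples:

p̂c = (count of successes in both samples combined) / (count of individuals in both samples combined) = (x1 + x2) / (n1 + n2)

where x1 and x2 are the numbers of successes in samples 1 and 2.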

Two Proportion Hypothesis Testing

• Ho: p1 = p2

• Ha: p1 ≠ or > or < p2

Minitab will calculate this for us; no need to memorize

Two Proportion Hypothesis Testing Conditions...

• SRS – Each of the two samples must be SRSs from their respective populations or they must each be randomized experiments

• Normality – Each of the following must be ≥ 10, where p̂c is the combined proportion of successes:

(n1)(p̂c)

(n1)(1 – p̂c)

(n2)(p̂c)

(n2)(1 – p̂c)

Two Proportion Hypothesis Testing Conditions...

• Independence – Each of the populations must be at least 10 times the corresponding sample size; and one sample does not influence the other

Hypothesis Test for p1 – p2

To study the long-term effects of preschool programs for poor children, a research foundation has followed two groups of Michigan children since early childhood. A control group of 61 children represents Population 1, poor children with no pre-school. Another group of 62 children from the same area and with similar backgrounds, who attended pre-school as 3- and 4-year-olds, represents Population 2, poor children who attend pre-school. Sample sizes are n1 = 61 and n2 = 62.

One response variable of interest is the need for social services as adults. In the past ten years, 38 of the preschool sample and 49 of the control sample have needed social services (mainly welfare). Carry out a hypothesis test to determine whether there is significant evidence that pre-school reduces or increases the later need for social services.

n(pre-school) = 62; n(no pre-school) = 61

38 of the pre-school group needed social services; 49 of the no pre-school group needed social services

State null and alternative hypotheses

Ho: p(no pre-school) = p(pre-school)

Ha: p(no pre-school) ≠ p(pre-school)

Conditions

Randomization, Normality/Large Sample, Independence
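
The Normality/Large Sample condition for this example can be checked from the combined proportion of successes. A minimal sketch using the counts given above:

```python
x1, n1 = 49, 61  # no pre-school: needed social services, sample size
x2, n2 = 38, 62  # pre-school: needed social services, sample size

pooled = (x1 + x2) / (n1 + n2)  # combined proportion of successes in both samples
expected_counts = [n1 * pooled, n1 * (1 - pooled), n2 * pooled, n2 * (1 - pooled)]

print(round(pooled, 4))
print([round(count, 1) for count in expected_counts])
print(all(count >= 10 for count in expected_counts))  # True
```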

Ho: p(no pre-school) = p(pre-school)

Ha: p(no pre-school) ≠ p(pre-school)

Minitab to calculate test statistic, p-value, etc.

Two Sample, Proportion, Options & Data
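
As a stand-in for the Minitab step, here is a sketch of the pooled two-proportion z test using only Python's standard library. The function name is made up for illustration, and the output format is not Minitab's.

```python
from math import sqrt
from statistics import NormalDist

def two_prop_z_test(x1, n1, x2, n2):
    """Pooled two-proportion z test of Ho: p1 = p2; returns (z, two_sided_p_value)."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))  # standard error when Ho is true
    z = (p1_hat - p2_hat) / se
    return z, 2 * NormalDist().cdf(-abs(z))

# Sample 1: no pre-school (49 of 61); sample 2: pre-school (38 of 62)
z, p_value = two_prop_z_test(49, 61, 38, 62)
print(f"z = {z:.2f}, p-value = {p_value:.3f}")
```

The p-value comes out close to 0.02, in line with the interpretation that follows.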

Ho: p(no pre-school) = p(pre-school)

Ha: p(no pre-school) ≠ p(pre-school)

Interpretation:

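Reject the null hypothesis. With a p-value of approximately 0.02, which is below the significance level of 5% (α = 0.05), there is sufficient evidence that the proportion of poor children who later need social services differs between those who did not attend pre-school and those who did: p(no pre-school) ≠ p(pre-school)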

Fear of Crime...

The elderly fear crime more than younger people, even though they are less likely to be victims of crime. One of the few studies that looked at older blacks recruited random samples of 56 black women and 63 black men over the age of 65 from Atlantic City, New Jersey. Of the women, 27 said they “felt vulnerable” to crime; 46 of the men said this.

What proportion of women in the sample feel vulnerable? Of men? (Note: Men are victims of crime more often than women, so we expect a higher proportion of men to feel vulnerable.)

Fear of Crime...

Test the hypothesis that the true, unknown population proportion of elderly black males who feel vulnerable is higher than that of elderly black women who feel vulnerable.

Hypothesis, Conditions, Computations, Interpretation
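
One way to carry out the computation step is sketched below, with Ho: p(men) = p(women) and the one-sided alternative Ha: p(men) > p(women). It uses the pooled Normal approximation; conditions and the interpretation in context still have to be written out separately.

```python
from math import sqrt
from statistics import NormalDist

women_yes, women_n = 27, 56
men_yes, men_n = 46, 63

p_women = women_yes / women_n  # sample proportion of women who feel vulnerable
p_men = men_yes / men_n        # sample proportion of men who feel vulnerable

pooled = (women_yes + men_yes) / (women_n + men_n)
se = sqrt(pooled * (1 - pooled) * (1 / men_n + 1 / women_n))
z = (p_men - p_women) / se
p_value = 1 - NormalDist().cdf(z)  # upper tail, because Ha: p(men) > p(women)

print(f"p-hat women = {p_women:.3f}, p-hat men = {p_men:.3f}")
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```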

Cholesterol & Heart Attacks...

• High levels of cholesterol in the blood are associated with higher risk of heart attacks. Will using a drug to lower blood cholesterol reduce heart attacks? The Helsinki Heart Study looked at this question. Middle-aged men were assigned at random to one of two treatments: 2051 men took the drug gemfibrozil to reduce their cholesterol levels, and a control group of 2030 men took a placebo. During the next five years, 56 men in the gemfibrozil group and 84 men in the placebo group had heart attacks.

• Is the apparent benefit of gemfibrozil statistically significant?

Ho: p(gemfibrozil) = p(placebo)

Ha: p(gemfibrozil) < p(placebo)

We want to use this comparative randomized experiment to draw conclusions about p1, the proportion of middle-aged men who would suffer heart attacks after taking gemfibrozil, and p2, the proportion of middle-aged men who would suffer heart attacks if they only took a placebo. We hope to show that gemfibrozil reduces heart attacks, so we have a one-sided alternative.

Note: you could also state this as Ho: p(gemfibrozil) – p(placebo) = 0 and Ha: p(gemfibrozil) – p(placebo) < 0
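
A sketch of the calculation for this one-sided test, using the pooled Normal approximation with the counts given in the problem. The slides do not show a result for this example, so the output here should be read as this sketch's calculation only.

```python
from math import sqrt
from statistics import NormalDist

drug_attacks, drug_n = 56, 2051        # gemfibrozil group
placebo_attacks, placebo_n = 84, 2030  # placebo group

p_drug = drug_attacks / drug_n
p_placebo = placebo_attacks / placebo_n
pooled = (drug_attacks + placebo_attacks) / (drug_n + placebo_n)
se = sqrt(pooled * (1 - pooled) * (1 / drug_n + 1 / placebo_n))
z = (p_drug - p_placebo) / se
p_value = NormalDist().cdf(z)  # lower tail, because Ha: p(gemfibrozil) < p(placebo)

print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```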

A Civil Action

The movie A Civil Action tells the story of a major legal battle that took place in the small town of Woburn, Massachusetts. A town well that supplied water to East Woburn residents was contaminated by industrial chemicals. During the period that residents drank water from this well, a sample of 414 births showed 16 birth defects. On the west side of Woburn, a sample of 228 babies born during the same time period revealed 3 with birth defects. The plaintiffs suing the companies responsible for the contamination claimed that these data show that the rate of birth defects was significantly higher in East Woburn, where the contaminated well water was in use.

Assume all conditions have been checked and met. How strong is the evidence supporting this claim? What should the judge for this case conclude?

p̂East = 16/414 = 0.0386; p̂West = 3/228 = 0.0132

• Is the rate of birth defects in East Woburn higher than in West Woburn?

Ho: pEast = pWest or pEast – pWest = 0

Ha: pEast > pWest or pEast – pWest > 0

Is the difference p̂East – p̂West = 0.0386 – 0.0132 = 0.0254 statistically significant? (Remember, these are just p̂s, sample proportions; don’t conclude that 2% is within the rejection zone! This is NOT a p-value; you must actually do the test to reach a p-value.)

p̂East = 16/414 = 0.0386; p̂West = 3/228 = 0.0132

Ho: pEast = pWest or pEast – pWest = 0

Ha: pEast > pWest or pEast – pWest > 0

p-value = 0.034
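
The stated p-value can be reproduced with the pooled two-proportion z test. A minimal sketch under the Normal approximation:

```python
from math import sqrt
from statistics import NormalDist

east_defects, east_births = 16, 414
west_defects, west_births = 3, 228

p_east = east_defects / east_births
p_west = west_defects / west_births
pooled = (east_defects + west_defects) / (east_births + west_births)
se = sqrt(pooled * (1 - pooled) * (1 / east_births + 1 / west_births))
z = (p_east - p_west) / se
p_value = 1 - NormalDist().cdf(z)  # upper tail, because Ha: pEast > pWest

print(f"z = {z:.2f}, p-value = {p_value:.3f}")
```

At α = 0.05 this p-value would lead to rejecting Ho; at α = 0.01 it would not, so the conclusion depends on the level the court considers appropriate.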

Seat Belt Use...

The proportion of drivers who use seat belts depends on things like age (young people are more likely to go unbelted) and gender (women are more likely to buckle up). It also depends on local law. Here are data from observing random samples of female Hispanic drivers in two cities:

Seat Belt Use...

Comparing local law suggests that a larger proportion of drivers wear seat belts in New York than in Boston. Do the data give good evidence that this is true for female Hispanic drivers? Justify your answer. Assume all conditions have been checked and met.

Ho: pNY = pB Ha: pNY > pB

p-value = 0.000000253

Reject Ho. With a p-value of ≈ 0, there is strong evidence at any reasonable α that a smaller proportion of female Hispanic drivers wear seat belts in Boston than in New York.

We check conditions for a reason...

• If conditions are not satisfied, our results may not be accurate, reliable, trustworthy, etc.