The Practice of Statistics Third Edition Chapter (13.1) 14.1: Chi-square Test for Goodness of Fit Copyright © 2008 by W. H. Freeman & Company Daniel S. Yates



When to Use the Chi-Square (χ²) Procedure

• When the variable of interest is categorical (or ranked) and the data are counts.

• When assumptions about the population, such as normality, are not reasonable; for example, when the population distribution is not normal.

This Chapter Will Cover Three Tests Based on the Chi-Square Distributions

• Test whether the observed counts for a categorical variable could have come from a specified hypothesized distribution. (Goodness of Fit)

• Test whether a single categorical variable has the same distribution in two or more distinct populations. (Inference for Two-Way Tables: Test for Homogeneity of Populations)

• Test whether two categorical variables are associated or independent. (Inference for Two-Way Tables: Test of Association/Independence)

Required Conditions for the Goodness of Fit Procedure

• SRS: the data come from a simple random sample.

• The observations must be independent, and each observation must fall into one and only one cell (category).

• All individual expected counts are at least 1, and no more than 20% of the expected counts are less than 5.

• Please note: we work with counts, not proportions.

• There is no normality condition. Chi-square procedures do not require the population from which the sample is selected to follow a normal (or any other particular) distribution.
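The expected-count condition above is mechanical enough to check in code. The following is a minimal Python sketch (3.9+ for the type hints); the function name `expected_counts_ok` is hypothetical, and it applies exactly the two thresholds stated on this slide and nothing more.

```python
def expected_counts_ok(expected: list[float]) -> bool:
    """Check the expected-count condition stated on the slide.

    Every expected count must be at least 1, and no more than 20% of the
    expected counts may be less than 5.
    """
    if not expected:
        raise ValueError("need at least one category")
    if any(e < 1 for e in expected):
        return False
    small = sum(1 for e in expected if e < 5)  # how many cells fall below 5
    return small / len(expected) <= 0.20


print(expected_counts_ok([180, 100, 350, 370]))  # True: Example 1's expected counts
print(expected_counts_ok([4, 4.5, 10, 12, 20]))  # False: 2 of 5 (40%) are below 5
```

The SRS and independence conditions cannot be checked from the counts alone; they depend on how the data were collected.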

Hypotheses for the Goodness of Fit Test

• H0: the actual population proportions are equal to the hypothesized proportions.

• Ha: at least one of the actual population proportions differs from its hypothesized proportion.

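Chi-square Test Statistic

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} = \sum \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}}$$

where O_i and E_i are the observed and expected counts in category i.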

Degrees of freedom: df = k − 1, where k is the number of categories.

Use the chi-square distribution with the appropriate degrees of freedom to find the critical value χ²* for significance level α.
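The statistic and its degrees of freedom follow directly from the formula. Below is a minimal Python sketch; `chi_square_gof` is a hypothetical helper name, and the small three-category data set in the usage line is made up purely for illustration.

```python
def chi_square_gof(observed: list[float], expected: list[float]) -> tuple[float, int]:
    """Return the chi-square goodness of fit statistic and its degrees of freedom.

    statistic = sum over categories of (O - E)**2 / E
    df        = k - 1, where k is the number of categories
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1


# Hypothetical counts: 60 observations, 3 categories, each hypothesized at 1/3,
# so every expected count is 20.
stat, df = chi_square_gof([25, 15, 20], [20, 20, 20])
print(stat, df)  # 2.5 2
```

The critical value χ²* for that df and α still comes from the chi-square table.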

Properties of Chi-square Distributions

• The total area under each density curve is 1.

• For df ≥ 3, each chi-square density curve starts at 0 at the origin, increases to a peak (at χ² = df − 2), and then approaches the x-axis asymptotically from above. For df = 1 and df = 2, the curve is highest near the origin and decreases from there.

• Each distribution is skewed to the right. As the number of degrees of freedom increases, the distribution becomes more symmetrical and looks more like a normal curve.
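These properties can be checked numerically from the chi-square density f(x) = x^(df/2 − 1) e^(−x/2) / (2^(df/2) Γ(df/2)). The Python sketch below uses only the standard library. The helper names are hypothetical, and the grid range and integration step are arbitrary choices, so the printed peaks and areas are approximations.

```python
import math


def chi2_pdf(x: float, df: int) -> float:
    """Chi-square density with df degrees of freedom, for x > 0."""
    k = df / 2.0
    # Computed on the log scale so Gamma(k) does not overflow for large df.
    log_f = (k - 1) * math.log(x) - x / 2 - k * math.log(2) - math.lgamma(k)
    return math.exp(log_f)


def peak_location(df: int) -> float:
    """Grid search for the x value (0.01 to 60) where the density is highest."""
    grid = [i / 100 for i in range(1, 6001)]
    return max(grid, key=lambda x: chi2_pdf(x, df))


def total_area(df: int, upper: float = 150.0, n: int = 30_000) -> float:
    """Midpoint-rule approximation of the area under the density on (0, upper)."""
    h = upper / n
    return h * sum(chi2_pdf((i + 0.5) * h, df) for i in range(n))


for df in (1, 2, 3, 5, 10):
    print(f"df={df:2d}  density just above 0: {chi2_pdf(1e-6, df):10.4f}  "
          f"peak near x = {peak_location(df):5.2f}")
for df in (3, 5, 10):
    print(f"df={df:2d}  area under curve ~ {total_area(df):.4f}")
```

For df = 1 the density grows without bound near 0. For df = 2 it starts at 0.5. For df ≥ 3 it starts near 0 and peaks close to df − 2. The df = 1 and df = 2 peaks land on the first grid point (0.01) because those curves only decrease.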

Example 1

• Consider the problem of determining whether the current-year distribution of car sales in the Eastern United States among Nissans, Mazdas, Toyotas, and Hondas is the same as the known distribution from the previous year, given below:

Nissan 18%

Mazda 10%

Toyota 35%

Honda 37%

From the Motor Vehicle Bureau records, we select a random sample of 1,000 new-car purchases of these four foreign makes in the current year. The observed counts are displayed below:

Observed frequency (current year)

Nissan 150

Mazda 65

Toyota 385

Honda 400

Is the current year’s sales distribution the same as last year’s?

Example 1 Continued

• Step 1 – We want to determine whether this year’s sales distribution is different from last year’s sales distribution.

– Population: this year’s sales of Nissans, Mazdas, Toyotas, and Hondas.

– Parameter: the proportion of sales for each make.

– H0: The current year’s sales distribution is the same as the previous year’s distribution (Nissan: 18%, Mazda: 10%, Toyota: 35%, Honda: 37%).

– Ha: The current year’s sales distribution is not the same as the previous year’s.

Example 1 Continued

• Step 2 – Conditions

– SRS: a random sample was taken from Motor Vehicle Bureau records. We do not know whether the sample was drawn from the records of every state motor vehicle bureau in the Eastern United States, so we will assume we have an SRS.

– Expected counts:
Nissan: 0.18 × 1000 = 180
Mazda: 0.10 × 1000 = 100
Toyota: 0.35 × 1000 = 350
Honda: 0.37 × 1000 = 370
All expected counts are at least 5.

– Independence: the observations are independent.
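The expected counts in Step 2 are each hypothesized proportion multiplied by the sample size. A minimal Python sketch of that arithmetic, using the proportions and n = 1000 from Example 1, is below. The dictionary layout is simply one convenient choice.

```python
import math

# Example 1: last year's (hypothesized) proportions and this year's sample size.
proportions = {"Nissan": 0.18, "Mazda": 0.10, "Toyota": 0.35, "Honda": 0.37}
n = 1000

# Hypothesized proportions must sum to 1.
assert math.isclose(sum(proportions.values()), 1.0)

# Expected count for each category = hypothesized proportion x sample size.
expected = {car: p * n for car, p in proportions.items()}
for car, e in expected.items():
    print(f"{car:6s} {e:6.1f}")

# Condition check: every expected count is at least 5.
print(all(e >= 5 for e in expected.values()))
```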

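Example 1 Continued

• Step 3 – Calculations

| Car | Observed count (O) | Expected count (E) | (O − E)²/E |
|---|---|---|---|
| Nissan | 150 | 180 | 5.00 |
| Mazda | 65 | 100 | 12.25 |
| Toyota | 385 | 350 | 3.50 |
| Honda | 400 | 370 | 2.43 |
| Sum | | | 23.18 |

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} = 23.18$$

From Table D with df = 3 and α = 0.05, the critical value is χ²* = 7.81.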
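The Step 3 arithmetic for Example 1 can be reproduced in a few lines of Python. This sketch takes the critical value 7.81 directly from the slide's Table D lookup rather than computing it.

```python
observed = {"Nissan": 150, "Mazda": 65, "Toyota": 385, "Honda": 400}
expected = {"Nissan": 180, "Mazda": 100, "Toyota": 350, "Honda": 370}

# Contribution of each category to the statistic: (O - E)^2 / E
contrib = {car: (observed[car] - expected[car]) ** 2 / expected[car] for car in observed}
for car, c in contrib.items():
    print(f"{car:6s} O={observed[car]:4d} E={expected[car]:4d} (O-E)^2/E={c:6.2f}")

chi_sq = sum(contrib.values())
critical = 7.81  # Table D, df = 3, alpha = 0.05 (value taken from the slide)
print(f"chi-square = {chi_sq:.2f}, reject H0: {chi_sq > critical}")
```

Keeping the per-category contributions (rather than only the sum) supports the follow-up analysis in Step 4.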

Example 1 Continued

• Step 4 – Interpretation: Since χ² = 23.18 lies to the right of χ²* = 7.81, the P-value is smaller than α = 0.05. The result is statistically significant, so we reject H0. There is evidence that the current year’s sales distribution is not the same as last year’s.

• The test only tells us that the distribution has changed; it does not say how. Further analysis may be needed: look at the (O − E)²/E column to find the largest contributors to the chi-square statistic. In this problem, Mazda contributes the most (12.25): far fewer Mazdas were sold in the current year than expected (65 observed versus 100 expected).
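The slides get χ²* from Table D and only bound the P-value. For readers without a table or calculator, here is a standard-library Python sketch that computes both. This is a swapped-in technique, not the textbook's method. It uses the exact recurrence for the chi-square upper tail with integer df, starting from erfc(√(x/2)) for df = 1 and e^(−x/2) for df = 2. It then finds the critical value by bisection. The names `chi2_sf` and `chi2_critical` are hypothetical.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail area P(X >= x) for a chi-square distribution with integer df >= 1.

    Starts from the closed forms for df = 1, erfc(sqrt(x/2)), and df = 2, exp(-x/2),
    then adds (x/2)**a * exp(-x/2) / Gamma(a + 1) to step from df = 2a to df = 2a + 2.
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 1:
        q, a = math.erfc(math.sqrt(y)), 0.5  # df = 1
    else:
        q, a = math.exp(-y), 1.0             # df = 2
    while 2 * a < df:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1))
        a += 1.0
    return q


def chi2_critical(alpha: float, df: int) -> float:
    """Value chi2* with upper-tail area alpha, found by bisection on chi2_sf."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:  # widen the bracket until it contains chi2*
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


chi_sq = 23.18  # Example 1 statistic
print(f"critical value (df=3, alpha=0.05): {chi2_critical(0.05, 3):.2f}")
print(f"P-value: {chi2_sf(chi_sq, 3):.6f}")
```

The computed critical value should agree with the Table D entry of 7.81. The P-value is far below 0.05, which is consistent with the slide's conclusion.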

Example 2

• Are you more likely to have a motor vehicle collision when using a cell phone? A study of 699 drivers who were using a cell phone when they were involved in a collision examined this question. These drivers made 26,798 cell phone calls during a 14-month study period. Each of the 699 collisions was classified in various ways. Here are the counts for each day of the week:

| Day | Sun | Mon | Tue | Wed | Thu | Fri | Sat | Total |
|---|---|---|---|---|---|---|---|---|
| Number of collisions | 20 | 133 | 126 | 159 | 136 | 113 | 12 | 699 |

Are the accidents equally likely to occur on any day of the week?

Example 2 Continued

• Step 1 – Population? Parameter? H0? Ha?

• Step 1

– Population: all motor vehicle accidents involving cell phone use.

– Parameter: the proportion of these accidents occurring on each day of the week.

– H0: Motor vehicle accidents involving cell phone use are equally likely to occur on each day of the week.

– Ha: The probabilities of a motor vehicle accident involving cell phone use are not all the same across the days of the week.

Example 2 Continued

• Step 2 – Conditions: SRS? Expected counts? Independence?

• Step 2

– SRS: assume an SRS.

– Expected counts: under H0, each of the seven days (Sun, Mon, Tue, Wed, Thu, Fri, Sat) has expected count 699 × (1/7) ≈ 99.857. All expected counts are greater than 5.

– Independence: the observations are independent.
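Under H0 every day has probability 1/7, so all seven expected counts are equal. A minimal Python sketch of the Step 2 arithmetic for Example 2:

```python
days = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]
observed = [20, 133, 126, 159, 136, 113, 12]

n = sum(observed)                     # 699 collisions in total
expected = [n * (1 / 7)] * len(days)  # H0: each day has probability 1/7

for day, e in zip(days, expected):
    print(f"{day}: {e:.3f}")
print("all expected counts > 5:", all(e > 5 for e in expected))
```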

Example 2 Continued

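• Step 3 – Calculations (graphing calculator)

– Enter the observed counts in list L1.

– Enter the expected counts in list L2.

– Store (L1 − L2)²/L2 in list L3; each entry is (O − E)²/E for one day.

– sum(L3) gives the χ² statistic.

– P-value: 2nd DISTR, χ²cdf(lower bound, upper bound, df), with the χ² statistic as the lower bound, a very large number as the upper bound, and df = 7 − 1 = 6.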
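The calculator steps translate directly into Python lists, as sketched below. For the P-value, this sketch relies on a standard fact: for an even number of degrees of freedom, the chi-square upper tail has the closed form e^(−x/2) Σ_{i=0}^{df/2 − 1} (x/2)^i / i!. It therefore handles even df only, which covers df = 6 here. The helper name `chi2_upper_tail_even_df` is hypothetical. Its result plays the role of χ²cdf(χ², large number, 6).

```python
import math

observed = [20, 133, 126, 159, 136, 113, 12]                      # list L1
expected = [sum(observed) / 7] * 7                                # list L2
contrib = [(o - e) ** 2 / e for o, e in zip(observed, expected)]  # list L3
chi_sq = sum(contrib)                                             # sum(L3)
df = len(observed) - 1                                            # k - 1 = 6


def chi2_upper_tail_even_df(x: float, df: int) -> float:
    """P(X >= x) for a chi-square distribution with an EVEN number of df.

    For even df the upper tail is exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)**i / i!
    """
    if df % 2 != 0 or df <= 0:
        raise ValueError("this sketch only handles positive even df")
    y = x / 2
    return math.exp(-y) * sum(y ** i / math.factorial(i) for i in range(df // 2))


p_value = chi2_upper_tail_even_df(chi_sq, df)
print(f"chi-square = {chi_sq:.2f}, df = {df}, P-value = {p_value:.2e}")
```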

Example 2 Continued

• Step 4 – Interpretation

– The P-value is extremely small, so at α = 0.05 we reject H0. There is strong evidence that accidents involving cell phones are not equally likely on every day of the week.

– Additional analysis: Saturday and Sunday make the biggest contributions to the χ² statistic. There were fewer accidents involving cell phones on weekends than H0 predicts.
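The follow-up analysis ranks each day's (O − E)²/E contribution and records whether the day had more or fewer collisions than expected. Here is a minimal Python sketch using the Example 2 counts. Sorting by contribution is just one convenient way to spot the largest terms.

```python
days = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]
observed = [20, 133, 126, 159, 136, 113, 12]
e = sum(observed) / len(observed)  # 699/7, about 99.857 under H0

# Each day's contribution and whether it had more or fewer collisions than expected.
rows = [(day, (o - e) ** 2 / e, "fewer" if o < e else "more")
        for day, o in zip(days, observed)]
ranked = sorted(rows, key=lambda r: r[1], reverse=True)
for day, c, direction in ranked:
    print(f"{day}: contribution {c:6.2f} ({direction} than expected)")
```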