Chi Squared Test

Post on 25-Dec-2015

HYPOTHESIS TESTING

Chi squared test

• The chi squared test is a non-parametric test: it makes no assumption about the parameters of the population distribution under study

• The Greek letter χ (chi) is used as its symbol

• The sampling distribution of the χ² statistic is called the χ² distribution

• The calculated χ² statistic is compared with its critical value

ASSUMPTIONS

• 1. The experiment consists of n identical but independent trials. The outcome of each trial falls into exactly one of k categories. The observed numbers in the categories are written as O1, O2, …, Ok

• 2. If there are only 2 cells the expected frequency in each cell should be 5 or more

• 3. For more than two cells, if more than 20 % of cells have expected frequencies less than 5 then χ ² should not be applied

• 4. Samples must be drawn randomly from the population of interest

• 5. The sample should contain at least 50 observations

• 6. The data should be expressed in original units rather than in percentage or ratio
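The expected-frequency rules (assumptions 2 and 3) and the sample-size rule (assumption 5) can be checked mechanically before the test is run. The sketch below is one possible reading of those rules of thumb; the function name `assumptions_hold` and its parameters are illustrative and do not come from the slides.

```python
def assumptions_hold(expected, sample_size, min_expected=5, min_sample=50):
    """Return True if the rules of thumb listed above are satisfied.

    expected    -- flat list of expected cell frequencies
    sample_size -- total number of observations
    """
    # Assumption 5: at least 50 observations.
    if sample_size < min_sample:
        return False

    small = [e for e in expected if e < min_expected]

    # Assumption 2: with only two cells, every expected frequency must be >= 5.
    if len(expected) <= 2:
        return not small

    # Assumption 3: no more than 20% of cells may have an expected frequency < 5.
    return len(small) / len(expected) <= 0.20
```

A call such as `assumptions_hold([3, 10, 10, 10, 10, 10], 53)` passes, because only one cell in six falls below 5.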

Chi squared test statistic

• χ² = Σ (O-E)² / E

• O = an observed frequency in a particular category

• E = the expected frequency in that category

• Decision rule

– The calculated value of the χ² test statistic is compared with its critical value at a particular level of significance and degrees of freedom

– If χ²cal > χ²critical then the null hypothesis is rejected in favor of the alternate hypothesis

– The degrees of freedom for the χ² test statistic depend on the test and certain other factors
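The statistic and the decision rule above translate almost line for line into code. This is a minimal sketch; the function names are illustrative and the counts in the example are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def chi_squared(observed, expected):
    """Sum (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def reject_null(chi2_cal, chi2_critical):
    """Reject the null hypothesis when the calculated value exceeds the critical value."""
    return chi2_cal > chi2_critical


# Hypothetical counts, for illustration only: three categories, 60 trials.
stat = chi_squared([18, 22, 20], [20, 20, 20])
```

Here the two non-zero cells each contribute 4/20, so `stat` comes to 0.4, well below a critical value such as 5.99.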

APPLICATIONS OF χ² TEST

A few important applications of χ² are

1. Test of independence

2. Test of goodness of fit

3. Yates's correction for continuity

4. Test for population variance

5. Test for homogeneity

Contingency Table Analysis: χ² test of Independence

• The χ² test of independence is used to analyze the frequencies of two qualitative variables or attributes with multiple categories to determine whether the two variables are independent

• The χ² test of independence can be used to analyze data at any level of measurement, but it is particularly useful in analyzing nominal data

• For example

– Whether voters classified by gender are independent of political affiliation

– Whether university students classified by gender are independent of courses of study

– Whether wage earners classified by education level are independent of income

• Contingency Table – When observations (frequencies) are classified according to two qualitative variables or attributes and arranged in a table the display is called a contingency table

• The value Oij is the observed frequency for the cell in row i and column j

• The 'Total' row and column hold the sums of the frequencies in the respective columns (C1 … Cc) and rows (R1 … Rr)

• 'N' is the grand total of all frequencies

Variable B    Variable A
              A1     A2     …     Ac     Total
B1            O11    O12    …     O1c    R1
B2            O21    O22    …     O2c    R2
…             …      …      …     …      …
Br            Or1    Or2    …     Orc    Rr
Total         C1     C2     …     Cc     N

Expected frequency for the cell in row i and column j:

Eij = (row i total / sample size) x (column j total / sample size) x grand total
    = (Ri / N) x (Cj / N) x N
    = (Ri x Cj) / N
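The formula Eij = (Ri x Cj) / N can be applied to a whole table at once. The sketch below assumes the contingency table is stored as a list of rows; the function name and the sample counts are hypothetical and serve only to illustrate the calculation.

```python
def expected_frequencies(observed):
    """Return the table of expected counts E[i][j] = R_i * C_j / N."""
    row_totals = [sum(row) for row in observed]          # R_1 ... R_r
    col_totals = [sum(col) for col in zip(*observed)]    # C_1 ... C_c
    n = sum(row_totals)                                  # grand total N
    return [[r * c / n for c in col_totals] for r in row_totals]


# Hypothetical 2 x 2 table, for illustration only.
table = [[10, 20],
         [30, 40]]
expected = expected_frequencies(table)
```

For this table the row totals are 30 and 70, the column totals are 40 and 60, and N is 100, so the first cell is 30 x 40 / 100 = 12.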

• The analysis of a two-way contingency table helps to answer the question of whether the two variables are unrelated, that is, independent of each other

• The null hypothesis for a χ² test of independence is that the two variables are independent

Procedure

• Step 1 – State the null hypothesis and the alternate hypothesis

– Ho: The variables are independent. No relationship exists

– H1: A relationship exists

• Step 2

– Select a random sample and record the observed frequencies (O values) in each cell of the contingency table

– Calculate the row, column and grand totals

• Step 3 – Calculate the expected frequencies (E values) for each cell

E = (row total x column total) / grand total

• Step 4 – Compute the value of the test statistic

χ² = Σ (O-E)² / E

• Step 5 – Calculate the degrees of freedom. For the χ² test of independence

df = (number of rows - 1)(number of columns - 1) = (r-1)(c-1)

• Step 6 – Using the level of significance α and df, find the critical value χ²α. This value corresponds to an area of α in the right tail of the distribution

• Step 7 – Compare the calculated and table values of χ²

• Decision rule

– Accept Ho if χ²cal is less than the table value of χ² at (r-1)(c-1) degrees of freedom

– Otherwise reject Ho
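The seven steps can be strung together in a single function. The sketch below is an illustration, not a definitive implementation: the function name is made up, the critical values are the usual table entries for α = 0.05 (only df 1 to 4 are included here), and the 2 x 2 counts in the example are hypothetical.

```python
# Critical values of chi-squared at alpha = 0.05, as found in standard tables.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def independence_test(observed, critical_values=CRITICAL_05):
    """Run steps 2 to 7 on a contingency table given as a list of rows."""
    # Step 2: row, column and grand totals.
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    # Steps 3 and 4: expected frequency and contribution of each cell.
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            chi2 += (o - e) ** 2 / e

    # Step 5: degrees of freedom (r - 1)(c - 1).
    df = (len(row_totals) - 1) * (len(col_totals) - 1)

    # Steps 6 and 7: look up the critical value and compare.
    reject = chi2 > critical_values[df]
    return chi2, df, reject


# Hypothetical counts, for illustration only.
chi2, df, reject = independence_test([[20, 30],
                                      [30, 20]])
```

Every expected frequency in this example is 25, each cell contributes 1.0, and the calculated value of 4.0 exceeds 3.841 at df = 1, so Ho would be rejected.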

EXAMPLE 1

• Two hundred randomly selected adults were asked whether TV shows as a whole are primarily entertaining, educational or a waste of time (only one answer to be chosen). The respondents were categorized by gender. Their opinions are as follows

          Opinion
Gender    Entertaining    Educational    Waste of time    Total
Female    52              28             30               110
Male      28              12             50               90
Total     80              40             80               200

• Is this convincing evidence that there is a relationship between gender and opinion in the population of interest?

• The critical value of χ² is 5.99 at α = 0.05 and df = 2

EXAMPLE 1 -Solution

• Let us assume the null hypothesis that the opinion of adults is independent of gender

• The contingency table is of size 2x3, so the degrees of freedom are (2-1)(3-1) = 2. Therefore only two expected frequencies have to be calculated from the formula; the other four are then determined by the row and column totals

E11 = row 1 total x column 1 total / grand total = 110 x 80 / 200 = 44
E12 = row 1 total x column 2 total / grand total = 110 x 40 / 200 = 22
E13 = 110 - (44 + 22) = 44
E21 = 80 - E11 = 80 - 44 = 36
E22 = 40 - E12 = 40 - 22 = 18
E23 = 80 - E13 = 80 - 44 = 36
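The shortcut used here, computing only (r-1)(c-1) cells from the formula and filling in the rest by subtraction from the totals, can be written generally. This is a sketch under the assumption that only the row and column totals are supplied; the function name is illustrative.

```python
def expected_from_margins(row_totals, col_totals):
    """Fill the expected table using the formula for the free cells only."""
    n = sum(row_totals)
    r, c = len(row_totals), len(col_totals)
    e = [[None] * c for _ in range(r)]

    # The (r-1)(c-1) free cells come from E = row total x column total / N.
    for i in range(r - 1):
        for j in range(c - 1):
            e[i][j] = row_totals[i] * col_totals[j] / n

    # The last column of each of those rows follows from the row total.
    for i in range(r - 1):
        e[i][c - 1] = row_totals[i] - sum(e[i][: c - 1])

    # The whole last row follows from the column totals.
    for j in range(c):
        e[r - 1][j] = col_totals[j] - sum(e[i][j] for i in range(r - 1))

    return e


# Totals from Example 1: rows (Female, Male), columns (three opinions).
expected = expected_from_margins([110, 90], [80, 40, 80])
```

For the Example 1 totals this reproduces 44, 22, 44 in the first row and 36, 18, 36 in the second.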

The contingency table of expected frequencies is as follows

Gender    Entertaining    Educational    Waste of time    Total
Female    44              22             44               110
Male      36              18             36               90
Total     80              40             80               200

Observed (O)    Expected (E)    (O-E)    (O-E)²    (O-E)²/E
52              44              8        64        1.455
28              22              6        36        1.636
30              44              -14      196       4.455
28              36              -8       64        1.778
12              18              -6       36        2.000
50              36              14       196       5.444
Total                                              16.768
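The working table can be reproduced cell by cell, which is a convenient way to check the signs of (O-E) and the rounding of the last column. The sketch below uses the observed counts from Example 1 and the expected frequencies derived in the solution; the variable names are illustrative.

```python
# Observed counts (Female row, then Male row) and their expected frequencies.
observed = [52, 28, 30, 28, 12, 50]
expected = [44, 22, 44, 36, 18, 36]

rows = []
for o, e in zip(observed, expected):
    diff = o - e                      # (O - E), keeps its sign
    rows.append((o, e, diff, diff ** 2, diff ** 2 / e))

# The test statistic is the sum of the last column.
total = sum(contribution for *_, contribution in rows)

for o, e, diff, squared, contribution in rows:
    print(f"{o:>3} {e:>3} {diff:>4} {squared:>4} {contribution:7.3f}")
print(f"chi-squared = {total:.3f}")
```

The differences sum to zero within each row of the contingency table (8 + 6 - 14 and -8 - 6 + 14), which is a quick sanity check on the expected frequencies.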

The critical value of χ² is 5.99 at α = 0.05 and df = 2. Since the calculated value of χ² = 16.768 is more than its critical value, the null hypothesis is rejected. Hence we conclude that the opinion of adults is not independent of gender
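For df = 2 the table value can be checked without a table, because the right-tail area of the χ² distribution with two degrees of freedom beyond a value x is exp(-x/2). The sketch below relies on that closed form, which holds only for df = 2; the function names are illustrative.

```python
import math


def critical_value_df2(alpha):
    """Critical value for df = 2: solve exp(-x / 2) = alpha for x."""
    return -2.0 * math.log(alpha)


def p_value_df2(chi2):
    """Right-tail area beyond chi2 for df = 2."""
    return math.exp(-chi2 / 2.0)


critical = critical_value_df2(0.05)   # about 5.99, matching the table value
p = p_value_df2(16.768)               # far smaller than 0.05
```

The p-value is well below α = 0.05, which agrees with the decision reached by comparing the calculated value with the critical value.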