Chi-square Test of Independence Hypotheses Neha Jain Lecturer School of Biotechnology Devi Ahilya University, Indore

Chi-square Test of Independence

The chi-square test of independence is probably the most frequently used hypothesis test in the sciences.

Chi-square, symbolically written as χ² (pronounced "ki-square"), is a statistical measure used in the context of sampling analysis for comparing a variance to a theoretical variance.

A fundamental problem in genetics is determining whether experimentally determined data fit the results expected from theory (i.e., Mendel's laws as expressed in the Punnett square). The chi-square test can be used to answer this question.

It can be used to determine whether categorical data show dependency or whether the two classifications are independent. It can also be used to compare actual data with a theoretical population when the data fall into categories.

The test is, in fact, a technique through which researchers can (i) test goodness of fit; (ii) test the significance of association between two attributes; and (iii) test homogeneity or the significance of population variance.

Independence Defined

Two variables are independent if, for all cases, the classification of a case into a particular category of one variable (the group variable) has no effect on the probability that the case will fall into any particular category of the second variable (the test variable).

When two variables are independent, there is no relationship between them. We would expect the frequency breakdowns of the test variable to be similar for all groups.

Example

Suppose we are interested in the relationship between gender and attending college.

If there is no relationship between gender and attending college and 40% of our total sample attend college, we would expect 40% of the males in our sample to attend college and 40% of the females to attend college.

If there is a relationship between gender and attending college, we would expect a higher proportion of one group to attend college than of the other group, e.g., 60% of one group versus 20% of the other.

Independent and Dependent Relationships

[Figure: two bar charts of the proportion attending college for Males, Females and Total. Independent Relationship between Gender and College: 40%, 40%, 40%. Dependent Relationship between Gender and College: 60%, 20%, 40%.]

Expected Frequencies

Expected frequencies are computed as if there were no difference between the groups, i.e., both groups have the same proportion as the total sample in each category of the test variable.

Since the proportion of subjects in each category of the group variable can differ, we take group category into account in computing expected frequencies as well.

To summarize, the expected frequencies for each cell are computed to be proportional to both the breakdown for the test variable and the breakdown for the group variable.

In chi-square analysis, expected frequencies are the numbers of observations that would be expected in each category of a frequency distribution if the null hypothesis were true.
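In symbols, the rule above gives each cell the expected count E_ij = (row i total × column j total) / grand total. A minimal Python sketch, assuming the observed table is stored as a list of rows; the function name `expected_frequencies` and the counts are hypothetical, chosen so that 60% of 100 males and 20% of 100 females attend college, which reproduces the 60%, 20% and 40% bars of the dependent chart.

```python
def expected_frequencies(observed):
    """Expected count for every cell of an r x c table when the two
    variables are independent: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Hypothetical counts. Rows: males, females.
# Columns: attend college, do not attend college.
observed = [[60, 40],
            [20, 80]]
print(expected_frequencies(observed))  # [[40.0, 60.0], [40.0, 60.0]]
```

Both groups receive the overall 40% attendance rate (40 of 100), which is exactly the "no relationship" pattern described in the example.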

Observed Frequencies

The test of independence starts with the frequencies, or counts, that we observe in our sample: the observed frequencies.

In chi-square analysis, observed frequencies are the numbers of actual observations noted in each category of a frequency distribution.

These are what we obtain through our experimentation.

For example, the frequency of 5 in the sample 4, 6, 5, 7, 4, 5, 2, 5 is 3.
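Counting observed frequencies is plain tallying. A minimal sketch with Python's standard `collections.Counter`: the first lines reproduce the example above, and `observed_table` (a hypothetical helper) cross-tabulates raw pairs of categories into the r × c table of observed frequencies used by the test of independence. The records are invented for illustration.

```python
from collections import Counter

# Frequency of each value in the sample from the text.
sample = [4, 6, 5, 7, 4, 5, 2, 5]
freq = Counter(sample)
print(freq[5])  # 3


def observed_table(pairs, row_levels, col_levels):
    """Cross-tabulate (row category, column category) pairs into an
    r x c table of observed counts; missing combinations count as 0."""
    counts = Counter(pairs)
    return [[counts[(r, c)] for c in col_levels] for r in row_levels]


# Hypothetical raw records: (gender, attends college?)
records = [("male", "yes"), ("female", "no"), ("male", "no"), ("male", "yes")]
print(observed_table(records, ["male", "female"], ["yes", "no"]))  # [[2, 1], [0, 1]]
```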

Hypothesis

The research hypothesis states that the two variables are dependent or related. This is supported if the observed counts for the categories of the variables in the sample differ from the expected counts by more than chance would explain.

The null hypothesis is that the two variables are independent. The data are consistent with it if the observed counts in the sample are close to the expected counts.

Suppose that Variable A has r levels, and Variable B has c levels. The null hypothesis states that knowing the level of Variable A does not help you predict the level of Variable B. That is, the variables are independent.

H0: Variable A and Variable B are independent.
Ha: Variable A and Variable B are not independent.

The alternative hypothesis is that knowing the level of Variable A can help you predict the level of Variable B.

The level of significance

The level of significance is a very important concept in the context of hypothesis testing. It is always some percentage (usually 5%), which should be chosen with great care, thought and reason. If we take the significance level at 5 per cent, H0 will be rejected when the sampling result (i.e., the observed evidence) has a less than 0.05 probability of occurring if H0 is true.

In other words, the 5 per cent level of significance means that the researcher is willing to take as much as a 5 per cent risk of rejecting the null hypothesis when it (H0) happens to be true. Thus the significance level is the maximum probability of rejecting H0 when it is true, and it is usually fixed in advance, before the hypothesis is tested.

Expected Frequencies versus Observed Frequencies

The chi-square test of independence plugs the observed frequencies and expected frequencies into a formula that measures how much the pattern of observed frequencies differs from the pattern of expected frequencies.

The general formula is

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where O = observed frequency in each category; E = expected frequency in each category, based on the experimenter's hypothesis; and Σ = the sum of the calculations over all categories.

If the two distributions (observed and theoretical) are exactly alike, χ² = 0; but, because of sampling errors, χ² is generally not equal to zero.
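The formula translates directly into code. A minimal Python sketch, assuming the observed and expected frequencies are given as two lists in the same category order (a two-way table can be flattened row by row first); `chi_square_statistic` is a hypothetical name.

```python
def chi_square_statistic(observed, expected):
    """Chi-square = sum over categories of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need one entry per category")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Identical distributions give exactly zero ...
print(chi_square_statistic([10, 20, 30], [10, 20, 30]))            # 0.0
# ... while sampling differences make the statistic positive.
print(round(chi_square_statistic([12, 18, 30], [10, 20, 30]), 4))  # 0.6
```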

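Degrees of Freedom

If there are 10 frequency classes and there is one independent constraint, then there are (10 – 1) = 9 degrees of freedom. Thus, if 'n' is the number of groups and one constraint is placed by making the totals of observed and expected frequencies equal, the d.f. would be equal to (n – 1). In a test of independence on an r × c contingency table (Variable A with r levels, Variable B with c levels), the row and column totals impose the constraints and the d.f. are (r – 1)(c – 1).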
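The counting rule for degrees of freedom fits in two one-line functions. A sketch with hypothetical function names; the (r – 1)(c – 1) rule for an r × c contingency table is the standard result for the test of independence, since fixing the row and column totals leaves one row and one column of cells with no freedom.

```python
def df_goodness_of_fit(n_classes, n_constraints=1):
    """Number of classes minus the number of independent constraints."""
    return n_classes - n_constraints


def df_independence(r, c):
    """Degrees of freedom for an r x c contingency table."""
    return (r - 1) * (c - 1)


print(df_goodness_of_fit(10))  # 9, as in the text
print(df_independence(2, 2))   # 1
print(df_independence(3, 4))   # 6
```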

P-value

The p-value is the probability of observing a sample statistic as extreme as the test statistic. More precisely, the probability value (p-value) of a statistical hypothesis test is the probability of getting a value of the test statistic as extreme as or more extreme than the one observed, by chance alone, if the null hypothesis H0 is true.

Equivalently, it is the probability of wrongly rejecting a true null hypothesis if we were to reject H0 whenever the test statistic is at least as extreme as the one observed.

It is equal to the significance level of the test at which we would only just reject the null hypothesis. The p-value is compared with the chosen significance level of our test and, if it is smaller, the result is significant. That is, if the null hypothesis were rejected at the 5% significance level, this would be reported as "p < 0.05".

Small p-values suggest that the observed data would be unlikely if the null hypothesis were true. The smaller the p-value, the more convincing the rejection of the null hypothesis. It indicates the strength of the evidence for, say, rejecting the null hypothesis H0, rather than simply concluding "Reject H0" or "Do not reject H0".
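Critical-value tables only bracket the p-value; it can also be computed directly. Statistical libraries provide this (for example SciPy's `scipy.stats.chi2.sf`). The sketch below avoids any dependency by using the closed-form series for the chi-square upper tail that hold when the degrees of freedom are a positive integer; `chi2_sf` is a hypothetical name, and the example input is the χ² value and d.f. of Example 1.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(chi-square >= x) for integer df >= 1."""
    if not isinstance(df, int) or df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: 2*Q(z) + 2*phi(z) * sum_{r=1}^{(df-1)/2} z^(2r-1) / (1*3*...*(2r-1)),
    # where z = sqrt(x), Q is the upper normal tail and phi the normal density.
    z = math.sqrt(x)
    two_q = math.erfc(z / math.sqrt(2))  # equals 2*Q(z)
    phi = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    term, total = z, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)  # next term of the series
        total += term
    return two_q + 2 * phi * total


p = chi2_sf(4.5, 2)
print(round(p, 4))  # 0.1054
print("significant at 5%" if p < 0.05 else "not significant at 5%")
```

As a check, the function returns approximately 0.05 at the 0.05-column entries of the critical values table (3.841 for 1 d.f., 5.991 for 2 d.f., 7.815 for 3 d.f.).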

CONDITIONS FOR THE APPLICATION OF χ² TEST

The following conditions should be satisfied before the χ² test can be applied:

(i) Observations recorded and used are collected on a random basis.

(ii) All the items in the sample must be independent.

(iii) No group should contain very few items, say less than 10. Where frequencies are less than 10, regrouping is done by combining the frequencies of adjoining groups so that the new frequencies become greater than 10. Some statisticians take this number as 5, but most statisticians regard 10 as better.

(iv) The overall number of items must also be reasonably large. It should normally be at least 50, however small the number of groups may be.
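Conditions (iii) and (iv) can be checked mechanically, and the regrouping of adjoining classes can be automated. A minimal sketch for one-way (goodness-of-fit) classes, assuming the thresholds are applied to the expected frequencies, as is usual; the function names `check_conditions` and `merge_small_classes` and the six-class data are hypothetical.

```python
def merge_small_classes(observed, expected, min_expected=10):
    """Combine adjoining classes until every expected frequency is at
    least min_expected. Classes are scanned left to right; a small group
    left over at the end is folded into the last completed group."""
    new_obs, new_exp = [], []
    acc_o, acc_e = 0, 0
    for o, e in zip(observed, expected):
        acc_o += o
        acc_e += e
        if acc_e >= min_expected:
            new_obs.append(acc_o)
            new_exp.append(acc_e)
            acc_o, acc_e = 0, 0
    if acc_o or acc_e:  # leftover classes that never reached the threshold
        if new_exp:
            new_obs[-1] += acc_o
            new_exp[-1] += acc_e
        else:
            new_obs.append(acc_o)
            new_exp.append(acc_e)
    return new_obs, new_exp


def check_conditions(expected, min_expected=10, min_total=50):
    """List violations of conditions (iii) and (iv): small expected
    frequencies and a small overall number of items."""
    problems = [f"class {i}: expected {e} < {min_expected}"
                for i, e in enumerate(expected) if e < min_expected]
    if sum(expected) < min_total:
        problems.append(f"total {sum(expected)} < {min_total}")
    return problems


# Hypothetical six-class data (N = 80).
observed_counts = [3, 8, 25, 30, 9, 5]
expected_counts = [4, 9, 24, 31, 8, 4]
print(check_conditions(expected_counts))  # four classes below 10
print(merge_small_classes(observed_counts, expected_counts))
# ([11, 25, 30, 14], [13, 24, 31, 12])
```

Regrouping reduces the number of classes, so the degrees of freedom must be counted on the regrouped classes (here 4 – 1 = 3 instead of 6 – 1 = 5).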

Test of Goodness of Fit

As a test of goodness of fit, the χ² test enables us to see how well the assumed theoretical distribution fits the observed data.

When some theoretical distribution is fitted to the given data, we are always interested in knowing how well this distribution fits the observed data.

The chi-square test can answer this. If the calculated value of χ² is less than the table value at a certain level of significance, the fit is considered to be a good one, which means that the divergence between the observed and expected frequencies is attributable to fluctuations of sampling.

But if the calculated value of χ² is greater than its table value, the fit is not considered to be a good one.

Chi-square Test of Independence

As a test of independence, the χ² test enables us to explain whether or not two attributes are associated. For instance, we may be interested in knowing whether or not a new medicine is effective in controlling fever, and the χ² test will help us decide this issue. In such a situation, we proceed with the null hypothesis that the two attributes (viz., new medicine and control of fever) are independent, which means that the new medicine is not effective in controlling fever.

On this basis we first calculate the expected frequencies and then work out the value of χ².

If the calculated value of χ² is less than the table value, we conclude that the null hypothesis stands, which means that the two attributes are regarded as independent or not associated (i.e., the data do not show the new medicine to be effective in controlling the fever). But if the calculated value of χ² is greater than its table value, our inference is that the null hypothesis does not hold good, which means the two attributes are associated and the association is not due to some chance factor but exists in reality (i.e., the new medicine is effective in controlling the fever and as such may be prescribed).
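The slides give no data for the medicine example, so the sketch below uses hypothetical counts to show the whole procedure on a 2 × 2 table: expected frequencies from the row and column totals, the χ² sum, d.f. = (r – 1)(c – 1), and comparison with the table value 3.841 (1 d.f., 5 per cent level). `chi_square_independence` is a hypothetical name, and no Yates continuity correction is applied, although some texts use one for 2 × 2 tables.

```python
def chi_square_independence(observed):
    """Chi-square statistic and degrees of freedom for an r x c table,
    testing H0: the row and column attributes are independent."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count under H0
            stat += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df


# Hypothetical counts. Rows: given the new medicine, not given it.
# Columns: fever controlled, fever not controlled.
observed = [[40, 10],
            [25, 25]]
TABLE_VALUE = 3.841  # chi-square table, 1 d.f., 5 per cent level

stat, df = chi_square_independence(observed)
print(round(stat, 2), df)  # 9.89 1
if stat > TABLE_VALUE:
    print("Reject H0: medicine and fever control appear to be associated")
else:
    print("H0 stands: no evidence of association")
```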

STEPS INVOLVED IN APPLYING CHI-SQUARE TEST

The various steps involved are as follows:

(i) First calculate the expected frequencies on the basis of the given hypothesis or of the null hypothesis.

(ii) Obtain the difference between the observed and expected frequencies and square each difference, i.e., calculate (Oij – Eij)².

(iii) Divide each (Oij – Eij)² by the corresponding expected frequency to get (Oij – Eij)²/Eij.

(iv) Sum the (Oij – Eij)²/Eij values. This sum is the required χ² value.

(v) Compare the χ² value obtained with the relevant table value of χ² and draw the inference as stated above.
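The steps above amount to one pass over the categories. A minimal Python sketch, assuming the observed and expected frequencies are supplied as lists in the same order; `chi_square_steps` is a hypothetical name. It prints the intermediate quantities for the blood-group counts of Example 1 (observed 90, 135, 75; expected 75, 150, 75) and compares the total with the table value 5.991 for 2 d.f. at the 5 per cent level.

```python
def chi_square_steps(observed, expected):
    """Apply the steps category by category.

    Returns the per-category rows (O, E, O - E, (O - E)^2, (O - E)^2 / E)
    and the sum of the last column, which is the chi-square value."""
    rows = []
    for o, e in zip(observed, expected):
        diff = o - e        # difference between observed and expected
        sq = diff ** 2      # squared difference
        rows.append((o, e, diff, sq, sq / e))  # divided by expected frequency
    total = sum(row[-1] for row in rows)        # summation gives chi-square
    return rows, total


# Blood-group counts from Example 1 (types A, AB, B).
rows, total = chi_square_steps([90, 135, 75], [75, 150, 75])
for row in rows:
    print(row)
print("chi-square =", total)  # chi-square = 4.5

TABLE_VALUE = 5.991  # table value for 2 d.f. at the 5 per cent level
print("reject H0" if total > TABLE_VALUE else "do not reject H0")
```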

Critical Values of the χ² Distribution (column headings: P, the probability of exceeding the tabulated value)

df 0.995 0.975 0.9 0.5 0.1 0.05 0.025 0.01 0.005 df

1 .000 .000 0.016 0.455 2.706 3.841 5.024 6.635 7.879 1

2 0.010 0.051 0.211 1.386 4.605 5.991 7.378 9.210 10.597 2

3 0.072 0.216 0.584 2.366 6.251 7.815 9.348 11.345 12.838 3

4 0.207 0.484 1.064 3.357 7.779 9.488 11.143 13.277 14.860 4

5 0.412 0.831 1.610 4.351 9.236 11.070 12.832 15.086 16.750 5

6 0.676 1.237 2.204 5.348 10.645 12.592 14.449 16.812 18.548 6

7 0.989 1.690 2.833 6.346 12.017 14.067 16.013 18.475 20.278 7

8 1.344 2.180 3.490 7.344 13.362 15.507 17.535 20.090 21.955 8

9 1.735 2.700 4.168 8.343 14.684 16.919 19.023 21.666 23.589 9

10 2.156 3.247 4.865 9.342 15.987 18.307 20.483 23.209 25.188 10

11 2.603 3.816 5.578 10.341 17.275 19.675 21.920 24.725 26.757 11

12 3.074 4.404 6.304 11.340 18.549 21.026 23.337 26.217 28.300 12

13 3.565 5.009 7.042 12.340 19.812 22.362 24.736 27.688 29.819 13

14 4.075 5.629 7.790 13.339 21.064 23.685 26.119 29.141 31.319 14

15 4.601 6.262 8.547 14.339 22.307 24.996 27.488 30.578 32.801 15

The Chi-square Test in Genetics

Example 1: Genetic theory states that children having one parent of blood type A and the other of blood type B will always be of one of three types, A, AB or B, and that the proportion of the three types will on average be 1 : 2 : 1. A report states that, out of 300 children having one A parent and one B parent, 30 per cent were found to be type A, 45 per cent type AB and the remainder type B. Test the hypothesis by the χ² test.

Solution: The observed frequencies of types A, AB and B given in the question are 90, 135 and 75 respectively. The expected frequencies of types A, AB and B (as per the genetic theory) should have been 75, 150 and 75 respectively.
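Both sets of frequencies follow from simple arithmetic on the data in the question: the percentages give the observed counts, and splitting 300 in the ratio 1 : 2 : 1 gives the expected counts. A minimal sketch; `expected_from_ratio` is a hypothetical helper name.

```python
def expected_from_ratio(total, ratio):
    """Split a total count in proportion to a theoretical ratio."""
    parts = sum(ratio)
    return [total * r / parts for r in ratio]


total_children = 300
# Observed: 30% type A, 45% type AB, remainder type B.
observed = [total_children * pct // 100 for pct in (30, 45)]
observed.append(total_children - sum(observed))
# Expected under the 1 : 2 : 1 genetic ratio for A : AB : B.
expected = expected_from_ratio(total_children, [1, 2, 1])
print(observed)  # [90, 135, 75]
print(expected)  # [75.0, 150.0, 75.0]
```

The observed and expected totals agree (300 each); this is the single constraint, so with three classes there are 3 – 1 = 2 degrees of freedom.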

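We now calculate the value of χ² as follows:

Type A: (90 – 75)²/75 = 225/75 = 3.0
Type AB: (135 – 150)²/150 = 225/150 = 1.5
Type B: (75 – 75)²/75 = 0
χ² = 3.0 + 1.5 + 0 = 4.5

There are three classes and one constraint (the observed and expected totals are both 300), so d.f. = 3 – 1 = 2.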

The calculated value of χ² is 4.5, which is less than the table value of 5.991 (2 d.f., 5 per cent level of significance), so the divergence can be ascribed to chance. The data are therefore consistent with the genetic theory that, on average, types A, AB and B stand in the proportion 1 : 2 : 1.

Example 2

CAUTION IN USING χ² TEST

The chi-square test is no doubt one of the most frequently used tests, but its correct application is equally an uphill task. It should be borne in mind that the test is to be applied only when the individual observations of the sample are independent, which means that the occurrence of one individual observation (event) has no effect upon the occurrence of any other observation (event) in the sample under consideration. Small theoretical frequencies, if these occur in certain groups, should be dealt with special care. Other possible reasons for improper application or misuse of this test include:

(i) neglect of frequencies of non-occurrence;

(ii) failure to equalise the sum of the observed and the sum of the expected frequencies;

(iii) wrong determination of the degrees of freedom;

(iv) wrong computations, and the like.

The researcher applying this test must remain careful about all these things and must thoroughly understand the rationale of this important test before using it and drawing inferences about the hypothesis.

Thanks