Lecture 9: the χ²-test. S. Massa, Department of Statistics, University of Oxford. 19 January 2016

Hypothesis Testing

1. Begin with a research hypothesis (this is usually the alternative hypothesis);

2. Decide on a significance level, e.g. 5%;

3. Set up the null hypothesis (roughly the opposite of the research hypothesis);

4. Calculate a test statistic from your data;

5. Calculate the p-value: the probability, under the null hypothesis, of observing something at least as extreme as your observation;

6. Reject the null hypothesis if the p-value is less than the significance level (here 5%).

The Z-test

Roughly, the Z-test statistic is given by

$$Z = \frac{\text{observed} - \text{expected}}{\text{standard error}}.$$

Example (Population mean)

Suppose X1, …, Xn are n i.i.d. normal samples with known SD σ.

H0: mean = µ;

H1: mean ≠ µ;

Then, with X̄ the sample mean, and since SE = σ/√n,

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1). \qquad (1)$$

The Z-test

Example (Test for a proportion)

When testing the probability of success in n trials:

H0: P(success) = p0;

H1: P(success) ≠ p0;

Under the null hypothesis the expected proportion of successes is p0, the standard error is $\sqrt{p_0(1 - p_0)/n}$, and the test statistic is

$$Z = \frac{\text{proportion of successes} - p_0}{\sqrt{p_0(1 - p_0)/n}} \approx N(0, 1).$$

Z-test recap

Some good properties of the Z-statistic

- it is computed from the data (the definition of a statistic);

- extreme values correspond intuitively to failure of the null hypothesis;

- under the null hypothesis we know the distribution of Z;

- it is very good for testing whether the mean of quantitative data could have a certain value.

But what about categorical/qualitative data?

Suicide by Month of Birth

Month   Prob     Female   Male     Total
Jan     0.0849   527      1774     2301
Feb     0.0773   435      1639     2074
Mar     0.0849   454      1939     2393
Apr     0.0821   493      1777     2270
May     0.0849   535      1969     2504
Jun     0.0821   515      1739     2254
Jul     0.0849   490      1872     2362
Aug     0.0849   489      1833     2322
Sep     0.0821   476      1624     2100
Oct     0.0849   474      1661     2135
Nov     0.0821   442      1568     2010
Dec     0.0849   471      1690     2161
Total   1.0000   5801     21085    26886
Mean             483.4    1757.1   2240.5

Table: Suicides 1979–2001 by month of birth in England and Wales 1955–66.

Suicide by Month of Birth

- Of course different months of birth have different numbers of suicides, due to chance variation.

- Is there an underlying relation? Is a person more likely to commit suicide if they were born in a particular month?

- It seems there are slightly more suicides among those born in March, April and May. This will be our research hypothesis.

H0: a suicide is equally likely to have been born on any day of the year, i.e. each day has probability 1/365.25.

Does the z-test work in this setting?

Does the z-test Work in This Setting?

- One choice is to group March to June into "Spring", and all the other months into "Other". We can now test for the proportion of spring-born suicides.

- Then there are 9421 spring-born suicides and 17465 others, with 122 days in the spring category.

H0: p0 = probability of a suicide being born in spring = 122/365.25 ≈ 0.334.

- Expected = 26886 × p0 = 8980, SE = $\sqrt{p_0(1 - p_0)\,n}$ = 77.3.

- Then the z-statistic becomes

$$Z = \frac{\text{observed} - \text{expected}}{\text{SE}} = \frac{9421 - 8980}{77.3} = 5.71,$$

giving a p-value of less than $10^{-8}$.
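The spring/other calculation can be reproduced with a few lines of Python. This is only a sketch: the function name `proportion_z_test` is made up for illustration, and the choice of a one-sided (upper-tail) p-value is an assumption, since the slides do not say which tail convention they use. Without intermediate rounding the statistic comes out at about 5.70 rather than 5.71.

```python
import math

def proportion_z_test(successes, n, p0):
    """Z-statistic and one-sided (upper-tail) p-value for H0: P(success) = p0.

    Hypothetical helper, not from the lecture. Works on counts, so the
    standard error is sqrt(n * p0 * (1 - p0)) as on the slide.
    """
    expected = n * p0
    se = math.sqrt(n * p0 * (1 - p0))
    z = (successes - expected) / se
    # Upper tail of N(0, 1): P(Z >= z) = 0.5 * erfc(z / sqrt(2))
    p_upper = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_upper

# Numbers from the slide: 9421 spring-born out of 26886, 122 spring days.
z, p = proportion_z_test(9421, 26886, 122 / 365.25)
print(f"z = {z:.2f}, one-sided p = {p:.1e}")
```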

Problem with Using the z-test with More than Two Levels

- The problem is that we made a choice about how to split up the months.

- This may give one the opportunity to choose the particular splitting that proves the desired result.

- This problem will always appear whenever there are more than two levels in our categorical variable (here, 12 months).

- We need a method that deals with all categories simultaneously.

Goodness of Fit Tests

- Observe n samples of a categorical variable with k levels.

H0: the categories have the probabilities p1, …, pk.

- Under the null hypothesis, since there are n observations, the expected counts in the categories are np1, …, npk.

- Our statistic should measure how far the observed counts n1, …, nk deviate from the expected ones np1, …, npk.

- A first idea: add up the squared deviations $(n_i - np_i)^2$.

- But getting 1 when expecting 10 should count for more than getting 1010 when expecting 1000, so we need to normalise.

- Therefore we get the χ²-statistic

$$X^2 := \frac{(n_1 - np_1)^2}{np_1} + \cdots + \frac{(n_k - np_k)^2}{np_k} = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}.$$
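As a rough illustration of the statistic and of why the normalisation matters, here is a small Python sketch. The helper names `chi_square_term` and `chi_square_statistic` are hypothetical. The last part applies the statistic to the monthly totals and the "Prob" column of the suicide table, which treats all 12 months at once instead of picking a split; the slides do not report this value, so no result is quoted here.

```python
def chi_square_term(observed, expected):
    """Contribution of one category: (observed - expected)^2 / expected."""
    return (observed - expected) ** 2 / expected

def chi_square_statistic(counts, probs):
    """X^2 for observed counts against null probabilities p1, ..., pk."""
    n = sum(counts)
    return sum(chi_square_term(c, n * p) for c, p in zip(counts, probs))

# Normalisation: the same kind of miss weighs very differently.
small = chi_square_term(1, 10)        # 1 observed, 10 expected
large = chi_square_term(1010, 1000)   # 1010 observed, 1000 expected
print(small, large)

# Monthly totals (Jan..Dec) and null probabilities from the table.
totals = [2301, 2074, 2393, 2270, 2504, 2254,
          2362, 2322, 2100, 2135, 2010, 2161]
probs = [0.0849, 0.0773, 0.0849, 0.0821, 0.0849, 0.0821,
         0.0849, 0.0849, 0.0821, 0.0849, 0.0821, 0.0849]
x2_months = chi_square_statistic(totals, probs)
df_months = len(totals) - 0 - 1   # no parameters fitted from the data
print(f"X^2 = {x2_months:.1f} on {df_months} d.f.")
```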

The χ² Distribution

Straightaway we know that

(i) X² is a statistic: it can be computed from the data;

(ii) large values suggest that the null hypothesis is likely not true;

(iii) do we know its distribution?

YES, but only approximately, for large n. This distribution is called the χ² distribution and is parameterized by a positive integer, the number of degrees of freedom (d.f.).

How do we figure out the degrees of freedom?

The Number of Degrees of Freedom

The number of degrees of freedom depends on:

- the number of categories;

- the form of the null hypothesis, specifically the number of parameters.

In general

degrees of freedom = #categories − #parameters fitted from data − 1.

- The approximation needs n large enough: the expected count in each category should be at least 5. If not, merge categories together.

Properties of χ²

- The χ² distribution with d degrees of freedom is a continuous distribution.

- It has mean d and variance 2d.

- The distribution is right-skewed, but the skew drops as d grows.

- For large d it becomes close to normal with mean d and variance 2d.

Some Plots

Figure: χ² distributions with 0.5, 3, 5, and 10 degrees of freedom.

Table

d.f.   P = 0.05   P = 0.01
1      3.84       6.63
2      5.99       9.21
3      7.81       11.34
4      9.49       13.28
5      11.07      15.09
6      12.59      16.81
7      14.07      18.48
8      15.51      20.09
9      16.92      21.67
10     18.31      23.21
15     25.00      30.58
20     31.41      37.57
25     37.65      44.31
30     43.77      50.89
40     55.76      63.69
60     79.08      88.38

If there are more than 60 degrees of freedom, approximate with N(d, 2d).

Find the p-value for the observed X² using

$$Z = \frac{X^2 - d}{\sqrt{2d}} \approx N(0, 1),$$

and the normal table.
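In place of the normal table, the large-d.f. approximation can be sketched in Python using only the standard library. The function name `chi2_pvalue_normal_approx` and the example values (X² = 130 on 100 d.f.) are invented for illustration; the p-value is the upper tail, since large values of X² are the ones that count against the null hypothesis.

```python
import math

def chi2_pvalue_normal_approx(x2, d):
    """Approximate P(chi-square_d >= x2) via Z = (x2 - d) / sqrt(2d).

    Intended only for large d (the slide suggests more than 60).
    """
    z = (x2 - d) / math.sqrt(2 * d)
    # Upper tail of the standard normal distribution.
    return 0.5 * math.erfc(z / math.sqrt(2))

# Hypothetical example: an observed statistic of 130 on 100 d.f.
p_example = chi2_pvalue_normal_approx(130, 100)
print(f"approximate p-value = {p_example:.3f}")
```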

Assumptions for the χ² Goodness of Fit Test

Simple random sample: The sample is taken from a fixed population where each individual has an equal chance of being selected.

Independence: The observations are independent.

Sample size: We have a sufficiently large sample size for the whole table and also for the individual cells: each cell must have an expected count of at least 5.

The sample size assumption goes back to the central limit theorem: here, under the null hypothesis, you are sampling from a multinomial distribution. You need adequate expected cell counts for the CLT to kick in.

Fixed Distribution: a Fair Die

- Given a standard die, we want to test if it's fair.

H0: each face has probability 1/6.

- If it is fair then each side has probability 1/6.

- Roll it 60 times. The expected count is 10 for each side.

Face       1    2    3    4    5    6
Observed   16   15   4    6    14   5
Expected   10   10   10   10   10   10

Compute the χ² statistic

$$X^2 = \frac{(16 - 10)^2}{10} + \frac{(15 - 10)^2}{10} + \cdots + \frac{(5 - 10)^2}{10} = 3.6 + 2.5 + 3.6 + 1.6 + 1.6 + 2.5 = 15.4$$

Fixed Distribution: a Fair Die

- To compute the p-value we need to know the number of degrees of freedom.

- There are 6 categories and no parameters, so

d.f. = 6 − 0 − 1 = 5.

- Therefore, comparing X² = 15.4 with the critical values 11.07 and 15.09, we reject the null hypothesis at both the 5% and the 1% significance level.
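The whole die example fits in a short Python sketch. The critical values are copied from the row for 5 d.f. in the table; the variable names are illustrative only.

```python
# Observed counts for faces 1..6 in 60 rolls (from the slide).
observed = [16, 15, 4, 6, 14, 5]
n = sum(observed)
expected = [n / 6] * 6            # fair die: 10 per face

x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 0 - 1        # 6 categories, no fitted parameters

# Critical values for 5 d.f., taken from the table in the lecture.
critical = {0.05: 11.07, 0.01: 15.09}
decisions = {alpha: x2 > c for alpha, c in critical.items()}

print(f"X^2 = {x2:.1f}, d.f. = {df}")
for alpha, reject in decisions.items():
    print(f"level {alpha}: {'reject' if reject else 'do not reject'} H0")
```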

Parametric Families

Sometimes we want to know if the data came from a whole family of distributions, for example from the Binomial or Poisson distribution.

H0: the data come from a Poisson(µ) distribution, for some µ.

In its current form this hypothesis cannot be checked: we don't know µ, and thus we cannot calculate probabilities under the null hypothesis.

First use the data to estimate the parameter. This gives you a concrete null hypothesis to test.

Example (Deaths from Horsekicks)

Deaths      0     1    2    3   4   ≥ 5   Total
Frequency   109   65   22   3   1   0     200

Table: Deaths from horsekicks in the Prussian army (Preece, Ross and Kirby, 1988)

We want to test

H0: The observations come from a Poisson distribution.

Recall that the Poisson distribution has a single parameter µ, equal to its mean. Approximate the mean with the sample mean:

$$\bar{X} = \frac{1}{200}\left(109 \times 0 + 65 \times 1 + 22 \times 2 + 3 \times 3 + 1 \times 4\right) = \frac{122}{200} = 0.61.$$

Example (Deaths from Horsekicks)

Now test:

H0 : The observations come from a Poisson(0.61).

Construct the expected table:

What is the expected frequency of zeroes in 200 samples from a Poisson(0.61) distribution?

$$\text{expected frequency of zeroes} = 200 \times P(\text{zero}) = 200 \times \frac{e^{-0.61}\, 0.61^{0}}{0!} = 200 \times 0.5434 = 108.67$$

Computing the Expected Row

$$\text{expected frequency of zeroes} = 200 \times P(\text{zero}) = 200 \times \frac{e^{-0.61}\, 0.61^{0}}{0!} = 200 \times 0.5434 = 108.67,$$

$$\text{expected frequency of 1} = 200 \times \frac{e^{-0.61}\, 0.61^{1}}{1!} = 200 \times 0.3314 = 66.29,$$

and so on until we arrive at

Deaths           0        1      2      3     4        ≥ 5
Expected freq.   108.67   66.3   20.2   4.1   0.6269   0....
Observed freq.   109      65     22     3     1        0

The expected frequencies for 3 or more deaths are each less than 5, and still less than 5 when added together, so we merge them with the column for 2 to form a single category ≥ 2.

Deaths           0        1      ≥ 2
Expected freq.   108.67   66.3   24.93
Observed freq.   109      65     26

- We then calculate

$$X^2 = \frac{(108.67 - 109)^2}{108.67} + \frac{(66.3 - 65)^2}{66.3} + \frac{(24.93 - 26)^2}{24.93} = 0.07.$$

- The final table has three categories, so we would have 2 degrees of freedom, but we have also estimated one parameter from the data, which leaves 1 degree of freedom.

- Consulting the table, the critical value for the χ² distribution with 1 degree of freedom is 3.84 at 5% and 6.63 at 1%.

Conclusion: Since our χ² statistic is 0.07, which is less than both critical values, we retain (do not reject) the null hypothesis that the data come from a Poisson distribution.
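The horsekick calculation can be checked end to end with the following Python sketch (variable names are illustrative). One difference from the slides is worth flagging: here the probability of 5 or more deaths is included as 1 minus the other probabilities, so the merged expected count for "≥ 2" is about 25.0 rather than 24.93, and the statistic comes out near 0.06 rather than 0.07. The conclusion is the same.

```python
import math

# Observed frequencies for 0, 1, 2, 3, 4 and >= 5 deaths (from the table).
observed = [109, 65, 22, 3, 1, 0]
n = sum(observed)

# Sample mean; valid here because the ">= 5" cell is empty.
mu = sum(k * c for k, c in enumerate(observed)) / n

# Poisson(mu) probabilities for 0..4, with the remainder assigned to ">= 5".
probs = [math.exp(-mu) * mu ** k / math.factorial(k) for k in range(5)]
probs.append(1 - sum(probs))
expected = [n * p for p in probs]

# Merge everything from 2 upwards so that each expected count is at least 5.
obs_merged = observed[:2] + [sum(observed[2:])]
exp_merged = expected[:2] + [sum(expected[2:])]

x2 = sum((o - e) ** 2 / e for o, e in zip(obs_merged, exp_merged))
df = len(obs_merged) - 1 - 1      # 3 categories, 1 fitted parameter

print(f"mu = {mu:.2f}, X^2 = {x2:.3f}, d.f. = {df}")
print("reject at 5%" if x2 > 3.84 else "do not reject at 5%")
```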

The Normal Distribution

Suppose we are given the heights of 100 students in centimetres

Height (cm)   155-160   161-166   167-172   173-178   179-184   185-190
Frequency     5         17        38        25        9         6

and we want to test whether they come from a Normal distribution.

H0: The data follow N(µ, σ²), for some µ and σ².
H1: The data do not follow a Normal distribution.

How do you compute the sample mean and sample standard deviation here?

Replace each bin with its midpoint, e.g. 157.5 for 155-160, 163.5 for 161-166, and so on up to 187.5 for 185-190.

Estimate the Parameters

As with the Poisson distribution, we use the data to estimate the parameters µ and σ².

Use the sample mean and sample variance:

x̄ = 172, s² = 7.15².

H0: The data follow N(172, 7.15²).
H1: The data do not follow N(172, 7.15²).
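A minimal Python sketch of the midpoint estimates, assuming the sample variance uses the usual n − 1 divisor (the slides do not say which divisor they used). It gives a mean of about 171.5, which the slide rounds to 172, and a standard deviation of about 7.15.

```python
import math

# Height bins (cm) and their frequencies, as given in the table.
bins = [(155, 160), (161, 166), (167, 172), (173, 178), (179, 184), (185, 190)]
freq = [5, 17, 38, 25, 9, 6]

# Represent every observation in a bin by the bin midpoint.
midpoints = [(lo + hi) / 2 for lo, hi in bins]
n = sum(freq)

mean = sum(f * m for f, m in zip(freq, midpoints)) / n
# Sample variance with the n - 1 divisor (an assumption, see above).
variance = sum(f * (m - mean) ** 2 for f, m in zip(freq, midpoints)) / (n - 1)
sd = math.sqrt(variance)

print(f"mean = {mean:.2f}, sd = {sd:.2f}")
```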

Compute Expected counts in Each Bin I

- Our observations are provided in bins, so what we can do is compute the expected count in each bin from N(172, 7.15²).

- So first compute the probability of each interval under the null hypothesis. Let Y ∼ N(172, 7.15²) and Z ∼ N(0, 1).

- Replace the first interval with (−∞, 160.5]:

$$P(Y \le 160.5) = P\left(\frac{Y - 172}{7.15} \le \frac{160.5 - 172}{7.15}\right) = P(Z \le -1.61) = 0.054.$$

Compute Expected counts in Each Bin II

- For the second interval

$$P(160.5 \le Y \le 166.5) = P\left(\frac{160.5 - 172}{7.15} \le \frac{Y - 172}{7.15} \le \frac{166.5 - 172}{7.15}\right) = P(-1.61 \le Z \le -0.77) = 0.221 - 0.054 = 0.167.$$

- For the last interval replace [185, 190] with [184.5, ∞).

- Multiply each probability by 100, the total number of observations.

Test Statistic I

Height (cm)   155-160   161-166   167-172   173-178   179-184   185-190
Prob.         0.054     0.167     0.307     0.290     0.142     0.040
Expected      5.4       16.7      30.7      29.0      14.2      4.0
Observed      5         17        38        25        9         6

Check: do all categories have expected count ≥ 5? No, so combine the last two columns to get

Height (cm)   155-160   161-166   167-172   173-178   179-190
Expected      5.4       16.7      30.7      29.0      18.2
Observed      5         17        38        25        15

Test Statistic II

The χ² statistic is then

$$X^2 = \frac{(5 - 5.4)^2}{5.4} + \cdots + \frac{(15 - 18.2)^2}{18.2} = 2.88.$$

The number of degrees of freedom is 5 (categories) − 1 − 2 (for the number of parameters estimated from the data) = 2.

The critical value for χ²(2) is 5.99 at the 5% significance level.

Conclusion: At the 5% significance level, there is not enough evidence to reject the null hypothesis.
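The bin probabilities, expected counts and statistic for the height data can be reproduced with the sketch below. The helper `norm_cdf` is a hypothetical name; it uses `math.erf` from the standard library in place of the normal table. The cut points at 160.5, 166.5, … follow the half-unit boundaries used on the slides, with the first and last bins extended to −∞ and +∞.

```python
import math

def norm_cdf(x, mu, sigma):
    """P(Y <= x) for Y ~ N(mu, sigma^2), via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

mu, sigma, n = 172, 7.15, 100
observed = [5, 17, 38, 25, 9, 6]

# Upper boundaries of the first five bins; the sixth bin runs to infinity.
cuts = [160.5, 166.5, 172.5, 178.5, 184.5]
cdf = [0.0] + [norm_cdf(c, mu, sigma) for c in cuts] + [1.0]
probs = [hi - lo for lo, hi in zip(cdf, cdf[1:])]
expected = [n * p for p in probs]

# The last expected count is below 5, so merge the last two bins.
obs_merged = observed[:4] + [sum(observed[4:])]
exp_merged = expected[:4] + [sum(expected[4:])]

x2 = sum((o - e) ** 2 / e for o, e in zip(obs_merged, exp_merged))
df = len(obs_merged) - 1 - 2      # 5 categories, 2 fitted parameters

print([round(p, 3) for p in probs])
print(f"X^2 = {x2:.2f}, d.f. = {df}")
```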

Contingency Tables

Table: NHANES handedness data for Americans aged 25–34

               Men    Women   Total
right-handed   934    1070    2004
left-handed    113    92      205
ambidextrous   20     8       28
Total          1067   1170    2237

This type of table is known as a contingency table.

The data seem to suggest that women tend to be right-handed more often than men. Let's test this.

One can use the χ² statistic to test whether two variables are independent.

χ² Test of Association

We have two variables, one for gender and one for being right- or left-handed.

Our research hypothesis implies that gender has an effect on whether one is right- or left-handed.

H0: The two variables are independent.
H1: The two variables are associated.

We will use the χ² test: we have the observed counts, but what are the expected counts under the null hypothesis?

The null hypothesis implies that the two variables are independent. This means that

P(man and left-handed) = P(man) × P(left-handed),

and so on.

χ² Test of Association

But we still don't know P(man) or P(left-handed): estimate them from the data.

$$P(\text{man}) = \frac{\text{No. of men in sample}}{\text{No. of participants}} = \frac{1067}{2237} = 0.477$$

$$P(\text{left-handed}) = \frac{\text{No. of left-handed in sample}}{\text{No. of participants}} = \frac{205}{2237} = 0.092.$$

This gives us the expected table: e.g.

P(man) × P(left-handed) × 2237 = 0.477 × 0.092 × 2237 ≈ 98.

               Men   Women
right-handed   956   1048
left-handed    98    107
ambidextrous   13    15

Table: Expected table

Computing the χ² Statistic

Then

$$X^2 = \frac{(934 - 956)^2}{956} + \frac{(1070 - 1048)^2}{1048} + \cdots + \frac{(8 - 15)^2}{15} = 12.4.$$

But how many degrees of freedom?

d.f. for Test of Association I

We have two categorical variables, one with 2 levels and one with 3 levels.

For gender, we estimated the probability of the event {man}, while the other one follows immediately since there are only two possibilities.

For handedness, we had to estimate the probabilities of two of the three levels, the last one following since the probabilities must sum to 1.

Thus for each variable, the number of parameters we estimated is one less than the number of levels.

d.f. for Test of Association II

The number of categories is 3 × 2 = 6. Therefore

d.f. = 3 × 2 − 1 − 3 = 2.

Remark: In general, if we have two variables with r and c levels respectively,

d.f. = r × c − 1 − (r − 1) − (c − 1) = (r − 1) × (c − 1).

Conclusion: Since the critical value at the 1% significance level is 9.21 and our statistic is 12.4, there is enough evidence to reject the null hypothesis. There is an association between gender and being right- or left-handed.
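The test of association can be sketched in Python as follows (variable names are illustrative). Each expected count is row total × column total / n, which is the same as n × P(row) × P(column). Because this version does not round the expected counts to whole numbers, the statistic comes out at roughly 11.8 rather than the 12.4 obtained from the rounded table; it still exceeds the 1% critical value of 9.21.

```python
# Observed counts: rows are right-handed, left-handed, ambidextrous;
# columns are men, women (from the NHANES table).
observed = [[934, 1070],
            [113, 92],
            [20, 8]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected counts under independence.
expected = [[r * c / n for c in col_totals] for r in row_totals]

x2 = sum((o - e) ** 2 / e
         for obs_row, exp_row in zip(observed, expected)
         for o, e in zip(obs_row, exp_row))

rows, cols = len(observed), len(observed[0])
df = (rows - 1) * (cols - 1)

print([[round(e) for e in row] for row in expected])
print(f"X^2 = {x2:.2f}, d.f. = {df}")
print("reject at 1%" if x2 > 9.21 else "do not reject at 1%")
```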

Recap I

χ² distribution: parametrized by degrees of freedom;

Goodness of fit test: Given n samples of a categorical variable with k levels, we test the null hypothesis

H0: the categories have probabilities p1, …, pk.

If the observed counts are n1, …, nk then

$$X^2 := \frac{(n_1 - np_1)^2}{np_1} + \cdots + \frac{(n_k - np_k)^2}{np_k} = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$

d.f. = #categories − #parameters fitted from data − 1.

Assumptions: simple random sample, independence, expected cell count ≥ 5;

Recap II

χ² test of association: two categorical variables summarized in a contingency table

H0 : the two variables are independent.

H1 : the two variables are associated.

If variables have r and c levels resp.

d.f. = (r − 1)× (c− 1).