Using Statistics To Make Inferences 8

8.1 Using Statistics To Make Inferences 8 Summary Contingency tables. Goodness of fit test. Friday 17 June 2022 12:20 AM

Using Statistics To Make Inferences 8. Summary: Contingency tables. Goodness of fit test. Thursday, 23 October 2014 12:33 PM. Goals: To assess contingency tables for independence. To perform and interpret a goodness of fit test. Practical: Construct and analyse contingency tables.

Page 1: Using Statistics To Make Inferences 8

8.11

Using Statistics To Make Inferences 8

Summary  

Contingency tables. Goodness of fit test.

Thursday 20 April 2023 06:29 PM

Page 2: Using Statistics To Make Inferences 8

8.22

Goals

To assess contingency tables for independence. To perform and interpret a goodness of fit test.

 Practical

Construct and analyse contingency tables.

Page 3: Using Statistics To Make Inferences 8

8.33

Recall

To compare a population and sample variance we employed?

χ²

Page 4: Using Statistics To Make Inferences 8

8.44

Today

The probability approach from last week is employed to tell if “observed” data conforms to the pattern “expected” under a given model.

Page 5: Using Statistics To Make Inferences 8

8.55

Categorical Data - Example

Assessed intelligence of athletic and non-athletic schoolboys.

          bright  stupid  Total
athletic     581     567   1148
lazy         209     351    560
Total        790     918   1708

K. Pearson, “On The Relationship Of Intelligence To Size And Shape Of Head, And To Other Physical And Mental Characters”, Biometrika, 1906, 5, 105-146, data on page 144.

Page 6: Using Statistics To Make Inferences 8

8.66

Procedure

1. Formulate a null hypothesis. Typically the null hypothesis is that there is no association between the factors.

2. Calculate expected frequencies for the cells in the table on the assumption that the null hypothesis is true.

3. Calculate the chi-squared statistic. This is for an r x c table with entries in row i and column j.

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(\text{observed}_{i,j} - \text{expected}_{i,j})^2}{\text{expected}_{i,j}}$$

Page 7: Using Statistics To Make Inferences 8

8.77

Procedure

4. Compare the calculated statistic with tabulated values of the chi-squared distribution with ν degrees of freedom.

ν = (rows – 1)(columns – 1) = (r – 1)(c – 1)
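The four steps can be sketched in a few lines of Python. This is only an illustration, not part of the original slides: the function name `chi_squared` is hypothetical, and the expected frequencies use the "row total x column total / grand total" rule stated later in the lecture. The data are Pearson's schoolboy table.

```python
def chi_squared(observed):
    """Return (statistic, degrees of freedom, expected table) for an r x c table.

    Hypothetical helper for illustration; `observed` is a list of rows of counts.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    # Step 2: expected frequency = row total x column total / grand total
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

    # Step 3: sum (observed - expected)^2 / expected over every cell
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

    # Step 4: degrees of freedom (r - 1)(c - 1)
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof, expected


# Rows: athletic, lazy. Columns: bright, stupid.
pearson = [[581, 567], [209, 351]]
statistic, dof, expected = chi_squared(pearson)
print(round(statistic, 2), dof)
```

The returned statistic is then compared with the tabulated chi-squared value for `dof` degrees of freedom.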

Page 8: Using Statistics To Make Inferences 8

8.88

Key Assumptions

1. Independence of the observations. The data found in each cell of the contingency table used in the chi-squared test must be independent observations and non-correlated.

2. Large enough expected cell counts. As described by Yates et al., "No more than 20% of the expected counts are less than 5 and all individual expected counts are 1 or greater" (Yates, Moore & McCabe, 1999, The Practice of Statistics, New York: W.H. Freeman p. 734).

Page 9: Using Statistics To Make Inferences 8

8.99

Key Assumptions

3. Randomness of data. The data in the table should be randomly selected.

4. Sufficient Sample Size. It is also generally assumed that the sample size for the entire contingency table is sufficiently large to avoid wrongly accepting the null hypothesis when it is in fact false.
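The expected-count rule quoted from Yates, Moore & McCabe can be checked mechanically. A minimal sketch, assuming the expected frequencies are already available as a list of rows; the function name `expected_counts_ok` is made up for this example.

```python
def expected_counts_ok(expected):
    """Check the quoted rule: no more than 20% of expected counts
    below 5, and every expected count at least 1."""
    cells = [e for row in expected for e in row]
    share_below_5 = sum(e < 5 for e in cells) / len(cells)
    return share_below_5 <= 0.20 and min(cells) >= 1


# Expected frequencies for the schoolboy table used in this lecture
schoolboys_expected = [[530.98, 617.02], [259.02, 300.98]]
print(expected_counts_ok(schoolboys_expected))
```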

Page 10: Using Statistics To Make Inferences 8

8.1010

Example

Assessed intelligence of athletic and non-athletic schoolboys.

Observed

          bright  stupid  Total
athletic     581     567   1148
lazy         209     351    560
Total        790     918   1708

Page 11: Using Statistics To Make Inferences 8

8.1111

Probabilities

The probability a random boy is athletic is

$$\frac{1148}{1708} = 0.6721$$

The probability a random boy is bright is

$$\frac{790}{1708} = 0.4625$$

Assuming independence, the probability a random boy is both athletic and bright is

$$\frac{1148}{1708} \times \frac{790}{1708} = 0.3109$$

          bright  stupid  Total
athletic     581     567   1148
lazy         209     351    560
Total        790     918   1708

For 1708 respondents the expected number of athletic bright boys is

$$\frac{1148 \times 790}{1708} = 530.98$$

Page 12: Using Statistics To Make Inferences 8

8.1212

Expected

          bright  stupid  Total
athletic  530.98           1148
lazy                        560
Total        790     918   1708

The expected number of athletic bright boys is

$$\frac{1148 \times 790}{1708} = 530.98$$

Page 13: Using Statistics To Make Inferences 8

8.1313

Expected

          bright  stupid  Total
athletic  530.98       ?   1148
lazy                        560
Total        790     918   1708

The expected number of athletic stupid boys is

Page 14: Using Statistics To Make Inferences 8

8.1414

Expected

          bright  stupid  Total
athletic  530.98  617.02   1148
lazy                        560
Total        790     918   1708

The expected number of athletic stupid boys is

1148 – 530.98 = 617.02

Page 15: Using Statistics To Make Inferences 8

8.1515

Expected

          bright  stupid  Total
athletic  530.98  617.02   1148
lazy           ?            560
Total        790     918   1708

The expected number of lazy bright boys is

Page 16: Using Statistics To Make Inferences 8

8.1616

Expected

          bright  stupid  Total
athletic  530.98  617.02   1148
lazy      259.02       ?    560
Total        790     918   1708

The expected number of stupid lazy boys is

Page 17: Using Statistics To Make Inferences 8

8.1717

Expected

          bright  stupid  Total
athletic  530.98  617.02   1148
lazy      259.02  300.98    560
Total        790     918   1708

The expected number of stupid lazy boys is

918 – 617.02 = 300.98

Page 18: Using Statistics To Make Inferences 8

8.1818

Expected

          bright  stupid  Total
athletic  530.98  617.02   1148
lazy      259.02  300.98    560
Total        790     918   1708

Page 19: Using Statistics To Make Inferences 8

8.1919

χ²

$$\chi^2_{calc} = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$$

$$\chi^2_{calc} = \frac{(581 - 530.98)^2}{530.98} + \frac{(567 - 617.02)^2}{617.02} + \frac{(209 - 259.02)^2}{259.02} + \frac{(351 - 300.98)^2}{300.98} = 26.73$$

$$\nu = (r - 1)(c - 1) = 1$$

Observed

          bright  stupid  Total
athletic     581     567   1148
lazy         209     351    560
Total        790     918   1708

Expected

          bright  stupid  Total
athletic  530.98  617.02   1148
lazy      259.02  300.98    560
Total        790     918   1708

Only one cell is free.

Page 20: Using Statistics To Make Inferences 8

8.2020

χ2

As a general rule to employ this statistic, all expected frequencies should exceed 5.

If this is not the case categories are pooled (merged) to achieve this goal. See the Prussian data later.

Page 21: Using Statistics To Make Inferences 8

8.2121

Conclusion

$$\chi^2_{calc} = 26.73 \qquad \nu = 1 \qquad \chi^2_{1}(0.05) = 3.84$$

ν   p=0.1   p=0.05   p=0.025   p=0.01   p=0.005   p=0.002
1   2.706   3.841    5.024     6.635    7.879     9.550

The result is significant (26.73 > 3.84) at the 5% level. So we reject the hypothesis of independence between athletic prowess and intelligence.
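As a cross-check on the table lookup, the p value for one degree of freedom can be computed with the standard library alone. The sketch below relies on the fact that a chi-squared variable with 1 degree of freedom is the square of a standard normal, so its upper tail is `erfc(sqrt(x / 2))`; it does not apply to other degrees of freedom. The function name is hypothetical.

```python
import math


def chi2_pvalue_1df(statistic):
    """Upper-tail probability for a chi-squared statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))


p_value = chi2_pvalue_1df(26.73)       # calculated statistic from the slide
p_at_critical = chi2_pvalue_1df(3.841)  # tabulated 5% critical value
print(p_value < 0.001, round(p_at_critical, 3))
```

The tabulated critical value gives back a tail probability of about 0.05, while the calculated statistic gives a far smaller one.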

Page 22: Using Statistics To Make Inferences 8

8.2222

SPSS

Raw data

Note: v1 are the row labels, v2 are the column labels, v3 is the frequency for each cell.

Page 23: Using Statistics To Make Inferences 8

8.2323

SPSS

Data > Weight Cases

Since frequency data has been input, it is necessary to weight the cases. This is essential; do not use percentages.

Page 24: Using Statistics To Make Inferences 8

8.2424

SPSS

Analyze > Descriptive Statistics > Crosstabs

Set row and column variables.

Frequencies already set.

Page 25: Using Statistics To Make Inferences 8

8.2525

SPSS

Select chi-square

Page 26: Using Statistics To Make Inferences 8

8.2626

SPSS

Select

Observed – input data

Expected – output data, under the model

Page 27: Using Statistics To Make Inferences 8

8.2727

SPSS

V1 * V2 Crosstabulation

                                      V2
                                  bright   stupid    Total
V1     athletic  Count               581      567     1148
                 Expected Count    531.0    617.0   1148.0
       lazy      Count               209      351      560
                 Expected Count    259.0    301.0    560.0
Total            Count               790      918     1708
                 Expected Count    790.0    918.0   1708.0

Expected cell frequencies: expected under the model.

Page 28: Using Statistics To Make Inferences 8

8.2828

SPSS

Chi-Square Tests

                          Value      df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                          (2-sided)     (2-sided)    (1-sided)
Pearson Chi-Square        26.736 b    1   .000
Continuity Correction a   26.204      1   .000
Likelihood Ratio          26.973      1   .000
Fisher's Exact Test                                     .000         .000
N of Valid Cases          1708

a. Computed only for a 2x2 table.

b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 259.02.

Pearson Chi Square is the required statistic.

Do not report p = .000, rather p < .001.

Note Fisher’s exact test, only available in SPSS for 2x2 tables (see next slide).

Page 29: Using Statistics To Make Inferences 8

8.2929

What If We Have Small Cell Counts?

Fisher's exact test

The Fisher's exact test is used when you want to conduct a chi-square test but one or more of your cells has an expected frequency of five or less. Remember that the chi-square test assumes that each cell has an expected frequency of five or more, but the Fisher's exact test has no such assumption and can be used regardless of how small the expected frequency is. In SPSS, unless you have the SPSS Exact Test Module, you can only perform a Fisher's exact test on a 2x2 table, and these results are presented by default.
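For readers without SPSS, Fisher's exact test for a 2x2 table can be sketched from the hypergeometric distribution. This is an illustration under stated assumptions: the function name is invented, the two-sided p value uses one common convention (sum the probabilities of all tables no more likely than the observed one), and the small table at the end is made-up data, not from the lecture.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact p value for the table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(k):
        # Hypergeometric probability that the top-left cell equals k,
        # with all row and column totals held fixed
        return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Small relative tolerance guards against floating-point ties
    return sum(
        prob(k)
        for k in range(lowest, highest + 1)
        if prob(k) <= p_observed * (1 + 1e-9)
    )


# Hypothetical small-count table, chosen only to exercise the function
p_small = fisher_exact_2x2(3, 1, 1, 3)
print(round(p_small, 4))
```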

Page 30: Using Statistics To Make Inferences 8

8.3030

Aside

Two dials were compared. A subject was asked to read each dial many times, and the experimenter recorded his errors. Altogether 7 subjects were tested. The data shows how many errors each subject produced. Do the two conditions differ at the 0.05 significance level (give the appropriate p value)?

Observed data

Subject   1   2   3   4   5   6   7
Dial 1   36  31  31  29  32  25  26
Dial 2   29  35  34  35  34  35  30

What key word describes this data?

Page 31: Using Statistics To Make Inferences 8

8.3131

Aside

What tests are available for paired data?

One sample t test

Sign test

Wilcoxon Signed Ranks Test


Page 32: Using Statistics To Make Inferences 8

8.3232

Aside

What tests are available for paired data? What assumptions are made?

One sample t test: normality.

Sign test: no assumption of normality.

Wilcoxon Signed Ranks Test: resembles the Sign Test in scope, but it is much more sensitive. In fact, for large numbers it is almost as sensitive as the Student t-test.

Page 33: Using Statistics To Make Inferences 8

8.3333

Aside

What tests are available for paired data? One sample t test

Sign test

Wilcoxon Signed Ranks Test

Sign test answers the question How Often?, whereas other tests answer the question How Much?

One sample t test – mean

Wilcoxon Signed Ranks Test – median
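The "How Often?" idea behind the sign test can be shown on the dial data from this aside. A minimal sketch, assuming the first row of seven values is one dial and the second row the other, paired by subject; only the signs of the differences are used, and the two-sided p value comes from the binomial distribution with probability 0.5.

```python
from math import comb

# Errors per subject, read from the aside (pairing by subject is assumed)
dial_1 = [36, 31, 31, 29, 32, 25, 26]
dial_2 = [29, 35, 34, 35, 34, 35, 30]

differences = [a - b for a, b in zip(dial_1, dial_2)]
nonzero = [d for d in differences if d != 0]  # ties carry no sign information
n = len(nonzero)
positives = sum(d > 0 for d in nonzero)

# Two-sided sign test: double the binomial tail of the rarer sign
k = min(positives, n - positives)
p_value = min(1.0, 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n)
print(positives, n, p_value)
```

Under this reading one difference is positive and six are negative, giving a two-sided sign test p value of 0.125. The t test and the Wilcoxon test also use the sizes of the differences, so they need not agree with this figure.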

Page 34: Using Statistics To Make Inferences 8

8.3434

Example

The table is based on case-records of women employees in Royal Ordnance factories during 1943-6. The same test was carried out on the left eye (columns) and right eye (rows).

Stuart “The estimation and comparison of strengths of association in contingency tables”, Biometrika, 1953, 40, 105-110.

Page 35: Using Statistics To Make Inferences 8

8.3535

Observed

          Highest  Second  Third  Lowest  Total
Highest      1520     266    124      66   1976
Second        234    1512    432      78   2256
Third         117     362   1772     205   2456
Lowest         36      82    179     492    789
Total        1907    2222   2507     841   7477

Is there any obvious structure?

Is there any obvious structure?

Page 36: Using Statistics To Make Inferences 8

8.3636

Expected

In general to find the expected frequency in a particular cell the equation is

Row total x Column total / Grand total

Page 37: Using Statistics To Make Inferences 8

8.3737

Expected

          Highest  Second  Third  Lowest  Total
Highest      1520     266    124      66   1976
Second        234    1512    432      78   2256
Third         117     362   1772     205   2456
Lowest         36      82    179     492    789
Total        1907    2222   2507     841   7477

In general to find the expected frequency in a particular cell the equation is

Row total x Column total / Grand total

So for the top left cell (highest right eye, highest left eye) the equation becomes

1976 x 1907 / 7477 = 503.98

Page 38: Using Statistics To Make Inferences 8

8.3838

Expected

          Highest  Second   Third  Lowest  Total
Highest    503.98                       ?   1976
Second                                  ?   2256
Third                                   ?   2456
Lowest          ?       ?       ?       ?    789
Total        1907    2222    2507     841   7477

Row total x Column total / Grand total

1976 x 1907 / 7477 = 503.98

Page 39: Using Statistics To Make Inferences 8

8.3939

Expected

          Highest  Second   Third  Lowest  Total
Highest    503.98  587.22  662.54       ?   1976
Second     575.39  670.43  756.43       ?   2256
Third      626.40  729.87  823.48       ?   2456
Lowest          ?       ?       ?       ?    789
Total        1907    2222    2507     841   7477

Row total x Column total / Grand total

Page 40: Using Statistics To Make Inferences 8

8.4040

Expected

          Highest  Second   Third  Lowest  Total
Highest    503.98  587.22  662.54       ?   1976
Second     575.39  670.43  756.43       ?   2256
Third      626.40  729.87  823.48       ?   2456
Lowest          ?       ?       ?       ?    789
Total        1907    2222    2507     841   7477

The missing values are simply found by subtraction

Page 41: Using Statistics To Make Inferences 8

8.4141

Expected

          Highest  Second   Third  Lowest  Total
Highest    503.98  587.22  662.54       ?   1976
Second     575.39  670.43  756.43           2256
Third      626.40  729.87  823.48           2456
Lowest                                       789
Total        1907    2222    2507     841   7477

1976 – 503.98 – 587.22 – 662.54 = 222.26

Page 42: Using Statistics To Make Inferences 8

8.4242

Expected

          Highest  Second   Third  Lowest  Total
Highest    503.98  587.22  662.54  222.26   1976
Second     575.39  670.43  756.43           2256
Third      626.40  729.87  823.48           2456
Lowest                                       789
Total        1907    2222    2507     841   7477

1976 – 503.98 – 587.22 – 662.54 = 222.26

Page 43: Using Statistics To Make Inferences 8

8.4343

Expected

          Highest  Second   Third  Lowest  Total
Highest    503.98  587.22  662.54  222.26   1976
Second     575.39  670.43  756.43       ?   2256
Third      626.40  729.87  823.48       ?   2456
Lowest          ?       ?       ?       ?    789
Total        1907    2222    2507     841   7477

Similarly for the remaining cells

Page 44: Using Statistics To Make Inferences 8

8.4444

Expected

          Highest  Second   Third  Lowest  Total
Highest    503.98  587.22  662.54  222.26   1976
Second     575.39  670.43  756.43  253.75   2256
Third      626.40  729.87  823.48  276.25   2456
Lowest     201.23  234.47  264.55   88.75    789
Total        1907    2222    2507     841   7477

Page 45: Using Statistics To Make Inferences 8

8.4545

Short Cut

Contributions to the χ² statistic,

$$\frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$

for the top left cell the contribution is

$$\frac{(1520 - 503.98)^2}{503.98} = 2048.32$$

Page 46: Using Statistics To Make Inferences 8

8.4646

Conclusion

$$\chi^2_{calc} = 2048.32 \qquad \nu = (r - 1)(c - 1) = 9 \qquad \chi^2_{9}(0.05) = 16.92$$

For the top left cell only.

ν   p=0.1    p=0.05   p=0.025   p=0.01   p=0.005   p=0.002
9   14.684   16.919   19.023    21.666   23.589    26.056

The above statistic makes it very clear that there is some relationship between the quality of the right and left eyes.

Nine cells are free.

Page 47: Using Statistics To Make Inferences 8

8.4747

          Highest   Second    Third   Lowest   Total
Highest   2048.32   175.72   437.75   109.86
Second     202.55  1056.38   139.14   121.73
Third      414.25   185.41  1092.53    18.38
Lowest     135.67    99.15    27.66  1832.37
Total                                           8097

Total χ²: 8097
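The table of contributions can be reproduced cell by cell. A short sketch using the observed vision data from this example (rows are right eye grades, columns are left eye grades); the variable names are illustrative only.

```python
# Observed counts: rows = right eye, columns = left eye,
# both ordered Highest, Second, Third, Lowest
observed = [
    [1520, 266, 124, 66],
    [234, 1512, 432, 78],
    [117, 362, 1772, 205],
    [36, 82, 179, 492],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Contribution of each cell: (observed - expected)^2 / expected,
# with expected = row total x column total / grand total
contributions = [
    [
        (o - r * c / grand_total) ** 2 / (r * c / grand_total)
        for o, c in zip(row, col_totals)
    ]
    for row, r in zip(observed, row_totals)
]

total_chi2 = sum(sum(row) for row in contributions)
print(round(contributions[0][0], 2), round(total_chi2, 2))
```

The four diagonal cells (same grade in both eyes) supply most of the total, which is one way to see the structure asked about on the earlier slide.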

Page 48: Using Statistics To Make Inferences 8

8.4848

Conclusion

$$\chi^2_{calc} = 8096.87 \qquad \nu = (r - 1)(c - 1) = 9 \qquad \chi^2_{9}(0.05) = 16.92$$

For all cells.

ν   p=0.1    p=0.05   p=0.025   p=0.01   p=0.005   p=0.002
9   14.684   16.919   19.023    21.666   23.589    26.056

The above statistic makes it very clear that there is some relationship between the quality of the right and left eyes.

Nine cells are free.

Page 49: Using Statistics To Make Inferences 8

8.4949

SPSS

Raw data

Page 50: Using Statistics To Make Inferences 8

8.5050

SPSS

Expected cell frequencies

V1 * V2 Crosstabulation

                                   V2
                                   Highest   Lowest   Second    Third    Total
V1     Highest  Count                 1520       36      234      117     1907
                Expected Count       504.0    201.2    575.4    626.4   1907.0
       Lowest   Count                   66      492       78      205      841
                Expected Count       222.3     88.7    253.8    276.2    841.0
       Second   Count                  266       82     1512      362     2222
                Expected Count       587.2    234.5    670.4    729.9   2222.0
       Third    Count                  124      179      432     1772     2507
                Expected Count       662.5    264.5    756.4    823.5   2507.0
Total           Count                 1976      789     2256     2456     7477
                Expected Count      1976.0    789.0   2256.0   2456.0   7477.0

Page 51: Using Statistics To Make Inferences 8

8.5151

SPSS

Pearson Chi Square is the required statistic

Chi-Square Tests

                     Value        df   Asymp. Sig. (2-sided)
Pearson Chi-Square   8096.877 a    9   .000
Likelihood Ratio     6671.512      9   .000
N of Valid Cases     7477

a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 88.75.

Page 52: Using Statistics To Make Inferences 8

8.5252

Poisson Distribution

The Poisson distribution is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time and/or space if these events occur with a known average rate and independently of the time since the last event. The Poisson distribution can also be used for the number of events in other specified intervals such as distance, area or volume.

Typical applications are to queues/arrivals. The number of phone calls received per day. The occurrence of accidents/industrial injuries. More exotically, birth defects and the number of genetic mutations. The occurrence of rare diseases.

Page 53: Using Statistics To Make Inferences 8

8.5353

Poisson Distribution

1. Discrete events which are independent.

2. Events occur at a fixed rate λ (lambda) per unit continuum.

Page 54: Using Statistics To Make Inferences 8

8.5454

Poisson Distribution

The probability of x successes is

$$\text{Prob}(x; \lambda) = \frac{e^{-\lambda} \lambda^{x}}{x!}$$

e is approximately equal to 2.718

λ is the rate per unit continuum

the mean is λ, the variance is λ
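The Poisson probability formula translates directly into code. A minimal sketch with a hypothetical function name `poisson_pmf`; the rate 0.7 is the value fitted later in this lecture, used here only as an example input.

```python
import math


def poisson_pmf(x, lam):
    """Probability of exactly x events when the rate is lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


lam = 0.7
p_zero = poisson_pmf(0, lam)

# Numerical check that the mean and variance both equal the rate;
# terms beyond x = 50 are negligible for a rate this small
mean = sum(x * poisson_pmf(x, lam) for x in range(51))
variance = sum((x - mean) ** 2 * poisson_pmf(x, lam) for x in range(51))
print(round(p_zero, 4), round(mean, 4), round(variance, 4))
```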

Page 55: Using Statistics To Make Inferences 8

8.5555

Casio 83ES

exp or “e”

exp(1) = 2.7182818

exp(2) = 7.389056

Its inverse, on the same key is ln, so

ln(2.7182818) = 1

ln(7.389056) = 2

Page 56: Using Statistics To Make Inferences 8

8.5656

Alternate applications

A similar approach may be employed to test if simple models are plausible.

Page 57: Using Statistics To Make Inferences 8

8.5757

χ² Goodness of Fit Test

$$\chi^2_{calc} = \sum_i \frac{(O_i - E_i)^2}{E_i}$$

The degrees of freedom are ν = m – n – 1, where there are m frequencies left in the problem, after pooling, and n parameters have been fitted from the raw data.

For example…

Page 58: Using Statistics To Make Inferences 8

8.5858

Example

The number of Prussian army corps in which soldiers died from the kicks of a horse in a year.

 Typical “industrial injury” data

Page 59: Using Statistics To Make Inferences 8

8.5959

Which distribution is appropriate?

Is the data discrete or continuous?

Discrete, since a simple count

Page 60: Using Statistics To Make Inferences 8

8.6060

Check list of distributions

Discrete   Continuous
Binomial   Normal
Poisson    Exponential

Page 61: Using Statistics To Make Inferences 8

8.6161

Check list of distribution parameters

Discrete            Continuous
Binomial   n, p     Normal        μ, σ²
Poisson    λ        Exponential   λ

Discrete, no “n” implies Poisson

Page 62: Using Statistics To Make Inferences 8

8.6262

Observed Data

Number deaths in a corps   Observed frequency (Oi)
0                          144
1                           91
2                           32
3                           11
4                            2
5 or more                    0
Total                      280

We need to estimate the Poisson parameter λ, which is the mean of the distribution.

Page 63: Using Statistics To Make Inferences 8

8.6363

Observed Data

Number deaths in a corps   Observed frequency (Oi)
0                          144
1                           91
2                           32
3                           11
4                            2
5 or more                    0
Total                      280

Page 64: Using Statistics To Make Inferences 8

8.6464

Mean

$$\bar{x} = \frac{0 \times 144 + 1 \times 91 + 2 \times 32 + 3 \times 11 + 4 \times 2}{144 + 91 + 32 + 11 + 2} = \frac{196}{280} = 0.7$$

Number deaths in a corps   Observed frequency (Oi)
0                          144
1                           91
2                           32
3                           11
4                            2
5 or more                    0
Total                      280

Page 65: Using Statistics To Make Inferences 8

8.6565

Expected

Number deaths in a corps   Poisson model                  Expected probability
0                          $e^{-\lambda}$                 0.4966
1                          $\lambda e^{-\lambda}$         0.3476
2                          $\lambda^2 e^{-\lambda} / 2!$  0.1217
3                          $\lambda^3 e^{-\lambda} / 3!$  0.0284
4                          $\lambda^4 e^{-\lambda} / 4!$  0.0050
5 or more                  By subtraction                 ?
Total                      1                              1

λ = 0.7 and “e” is a constant on your calculator

Page 66: Using Statistics To Make Inferences 8

8.6666

Expected

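Number deaths in a corps   Poisson model                  Expected probability
0                          $e^{-\lambda}$                 0.4966
1                          $\lambda e^{-\lambda}$         0.3476
2                          $\lambda^2 e^{-\lambda} / 2!$  0.1217
3                          $\lambda^3 e^{-\lambda} / 3!$  0.0284
4                          $\lambda^4 e^{-\lambda} / 4!$  0.0050
5 or more                  By subtraction                 0.0008
Total                      1                              1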

Page 67: Using Statistics To Make Inferences 8

8.6767

Expected Frequency

Expected frequency for no deaths 280 x 0.4966 = 139.04

Number deaths in a corps   Expected probability   Expected frequency (Ei)
0                          0.4966                 139.04
1                          0.3476
2                          0.1217
3                          0.0284
4                          0.0050
5 or more                  0.0008
Total                      1

Page 68: Using Statistics To Make Inferences 8

8.6868

Expected Frequency

Expected frequency for remaining rows

280 × probability = frequency

Number deaths in a corps   Expected probability   Expected frequency (Ei)
0                          0.4966                 139.04
1                          0.3476                  97.33
2                          0.1217                  34.07
3                          0.0284                   7.95
4                          0.0050                   1.39
5 or more                  0.0008                   0.22
Total                      1                      280

Note the two expected frequencies less than 5!
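The expected frequencies can be regenerated from the observed counts alone. A sketch under the assumptions used on the slides: λ is estimated by the sample mean, the last category ("5 or more") takes its probability by subtraction, and the variable names are illustrative.

```python
import math

# Observed number of corps-years with 0..4 deaths ("5 or more" was never observed)
observed = {0: 144, 1: 91, 2: 32, 3: 11, 4: 2}

total = sum(observed.values())
lam = sum(deaths * freq for deaths, freq in observed.items()) / total

# Poisson probabilities for 0..4 deaths, then "5 or more" by subtraction
probabilities = [
    math.exp(-lam) * lam ** k / math.factorial(k) for k in range(5)
]
probabilities.append(1.0 - sum(probabilities))

expected = [total * p for p in probabilities]
too_small = [round(e, 2) for e in expected if e < 5]
print(lam, [round(e, 2) for e in expected], too_small)
```

The categories collected in `too_small` are the ones that need pooling before the χ² statistic is calculated.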

Page 69: Using Statistics To Make Inferences 8

8.6969

χ² Calculation

Number deaths in a corps   Observed frequency (Oi)   Expected frequency (Ei)   $(O_i - E_i)^2 / E_i$
0                          144                       139.04                    0.18
1                           91                        97.33                    0.41
2                           32                        34.07                    0.13
3 or more                   13                         9.56                    1.24
Total                      280                       280                       1.95

Pool to ensure all expected frequencies exceed 5

Page 70: Using Statistics To Make Inferences 8

8.7070

Conclusion

Here m (frequencies) = 4, n (fitted parameters) = 1 then ν = m – n – 1 = 4 – 1 – 1 = 2

$$\chi^2_{calc} = 1.95 \qquad \chi^2_{2}(0.05) = 5.991$$

ν   p=0.1   p=0.05   p=0.025   p=0.01   p=0.005   p=0.002
2   4.605   5.991    7.378     9.210    10.597    12.429

The hypothesis that the data come from a Poisson distribution is not rejected at the 5% level (1.95 < 5.991).
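The whole goodness of fit calculation for the Prussian data fits in a few lines. This is a sketch rather than a general routine: the pooling into "3 or more" is done by hand as on the slide, and the p value uses the fact that a chi-squared variable with exactly 2 degrees of freedom has upper tail `exp(-x / 2)`, which does not hold for other degrees of freedom.

```python
import math

# Observed frequencies after pooling: 0, 1, 2 and "3 or more" (11 + 2 + 0)
observed = [144, 91, 32, 13]
lam = 0.7            # Poisson rate fitted from the raw data
n_total = sum(observed)

# Poisson probabilities for 0, 1, 2; the pooled tail is found by subtraction
probabilities = [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(3)]
probabilities.append(1.0 - sum(probabilities))
expected = [n_total * p for p in probabilities]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
dof = len(observed) - 1 - 1      # m - n - 1 with n = 1 fitted parameter
p_value = math.exp(-chi2 / 2)    # valid only because dof == 2

print(round(chi2, 2), dof, round(p_value, 3))
```

A p value well above 0.05 matches the table lookup: the calculated statistic falls below the 5% critical value of 5.991.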

Page 71: Using Statistics To Make Inferences 8

8.7171

Next Week

Bring your calculators next week

Page 72: Using Statistics To Make Inferences 8

8.7272

Read

Read Howitt and Cramer pages 134-152

Read Howitt and Cramer (e-text) pages 125-134

Read Russo (e-text) pages 100-119

Read Davis and Smith pages 434-448

Page 73: Using Statistics To Make Inferences 8

8.7373

Practical 8

This material is available from the module web page.

http://www.staff.ncl.ac.uk/mike.cox

Module Web Page

Page 74: Using Statistics To Make Inferences 8

8.7474

Practical 8

This material for the practical is available.

Instructions for the practical

Practical 8

Material for the practical

Practical 8

Page 75: Using Statistics To Make Inferences 8

8.7575

Assignment 2

You will find submission details on the module web site

Note the dialers lower down the page give access to your individual assignment. It is necessary to enter your student number exactly as it appears on your smart card.

Page 76: Using Statistics To Make Inferences 8

8.7676

Assignment 2As a general rule make sure you can perform the calculations manually.

It does no harm to check your calculations using a software package.

Some software packages employ non-standard definitions and should be used with caution.

Page 77: Using Statistics To Make Inferences 8

8.7777

Assignment 2

All submissions must be typed.

Page 78: Using Statistics To Make Inferences 8

8.7878

Whoops!

Researchers at Cardiff University School of Social Science claim errors made by the Hawk-Eye line-calling technology can be greater than 3.6mm, the average error quoted by the manufacturers.

Teletext, p388

12 June 2008

Page 79: Using Statistics To Make Inferences 8

8.7979

Whoops!

Kate Middleton 'marries Prince Harry' on souvenir mug

The Telegraph - Thursday 17 March 2011

Page 80: Using Statistics To Make Inferences 8

8.8080

Whoops!

Poldark - BBC - 8 March 2015