Download - Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Transcript

Lesson 15 - 1

Nonparametric Statistics

Overview

Objectives

• Understand Difference between Parametric and Nonparametric Statistical Procedures

• Nonparametric methods use techniques to test claims that are distribution free

Page 3: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Vocabulary• Parametric statistical procedures – inferential procedures that

rely on testing claims regarding parameters such as the population mean μ, the population standard deviation, σ, or the population proportion, p. Many times certain requirements had to be met before we could use those procedures.

• Nonparametric statistical procedures – inferential procedures that are not based on parameters, which require fewer requirements be satisfied to perform the tests. They do not require that the population follow a specific type of distribution.

• Efficiency – compares sample size for a nonparametric test to the equivalent parametric test. Example: If a nonparametric statistical test has an efficiency of 0.85, a sample size of 100 would be required in the nonparametric test to achieve the same results a sample of 85 would produce in the equivalent parametric test.

Page 4: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Nonparametric Advantages

• Most of the tests have very few requirements, so it is unlikely that these tests will be used improperly.

• For some nonparametric procedures, the computations are fairly easy.

• The procedures can be used for count data or rank data, so nonparametric methods can be used on data such as rankings of a movie as excellent, good, fair, or poor.

Page 5: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Nonparametric Disadvantages

• The results of the test are typically less powerful. Recall that the power of a test refers to the probability of making a Type II error. A Type II error occurs when a researcher does not reject the null hypothesis when the alternative hypothesis is true.

• Nonparametric procedures are less efficient than parametric procedures. This means that a larger sample size is required when conducting a nonparametric procedure to have the same probability of a Type I error as the equivalent parametric procedure.

Page 6: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Power vs Efficiency

● The power of a test refers to the probability of a Type II error● Thus when both nonparametric and parametric procedures

apply, for the nonparametric method The researcher does not reject H0 when H1 is true, with higher

probability The researcher cannot distinguish between H0 and H1 as effectively

● The efficiency of a test refers to the sample size needed to achieve a certain Type I error

● Thus when both nonparametric and parametric procedures apply, for the nonparametric method The researcher requires larger sample sizes when using

nonparametric methods The researcher incurs higher costs associated with the larger

number of subjects

Page 7: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Nonparametric Test ParametricTest Efficiency of

Nonparametric Test

Sign test Single-sample z-test or t-test

0.955 (for small samples from a normal population) 0.75 (for samples of size 13 or larger if data are normal)

Mann–Whitney testInference about the difference of two means—independent samples

0.955 (if data are normal)

Wilcoxon matched-pairs test

Inference about the difference of two means—dependent samples

0.955 (if the differences are normal)

Kruskal–Wallis test One-way ANOVA

0.955 (if the data are normal) 0.864 (if the distributions are identical except for medians)

Spearman rank-correlation

Linear correlation 0.912 (if the data are bivariate coefficient normal)

Efficiency

Page 8: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Summary and Homework

• Summary– Nonparametric tests require few assumptions, and

thus are applicable in situations where parametric tests are not

– In particular, nonparametric tests can be used on rankings data (which cannot be analyzed by parametric tests)

– When both are applicable, nonparametric tests are less efficient than parametric tests

• Homework– problems 1-5 from CD

Page 9: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Lesson 15 - 2

Runs Test for Randomness

Objectives

• Perform a runs test for randomness

• Runs tests are used to test whether it is reasonable to conclude data occur randomly, not whether the data are collected randomly.

Page 11: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Vocabulary

• Runs test for randomness – used to test claims that data have been obtained or occur randomly

• Run – sequence of similar events, items, or symbols that is followed by an event, item, or symbol that is mutually exclusive from the first event, item, or symbol

• Length – number of events, items, or symbols in a run

Page 12: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Runs Test for Randomness

● Assume that we have a set of data, and we wish to know whether it is random, or not

● Some examples A researcher takes a systematic sample, choosing

every 10th person who passes by … and wants to check whether the gender of these people is random

A professor makes up a true/false test … and wants to check that the sequence of answers is random

Page 13: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Runs, Hits and Errors

• A run is a sequence of similar events– In flipping coins, the number of “heads” in a row– In a series of patients, the number of “female

patients” in a row– In a series of experiments, the number of

“measured value was more than 17.1” in a row

• It is unlikely that the number of runs is too small or too large

• This forms the basis of the runs test

Page 14: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Runs Test (small case)

• Small-Sample Case: If n1 ≤20 and n2 ≤ 20, the test statistic in the runs test for randomness is r, the number of runs.

• Critical Values for a Runs Test for Randomness: Use Table IX (critical value at α = 0.05)

Page 15: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Critical Values for a Runs Test for RandomnessUse Table IV, the standard normal table.

Large-Sample Case: If n1 > 20 or n2 > 20 the test statistic in the runs test for randomness is

r - μrz = ------- σr

where

2n1n2μr = -------- + 1 n

2n1n2 (2n1n2 – n) σr = ----------------------- n²(n – 1)

Let n represent the sample size of which there are two mutually exclusive types.Let n1 represent the number of observations of the first type.Let n2 represent the number of observations of the second type.Let r represent the number of runs.

Runs Test (large case)

Page 16: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Hypothesis Tests for Randomness Use Runs TestStep 0 Requirements:

1) sample is a sequence of observations recorded in order of their occurrence2) observations have two mutually exclusive categories.

Step 1 Hypotheses: H0: The sequence of data is random. H1: The sequence of data is not random.

Step 2 Level of Significance: (level of significance determines the critical value) Large-sample case: Determine a level of significance, based on the

seriousness of making a Type I error. Small-sample case: we must use the level of significance, α = 0.05.

Step 3 Compute Test Statistic:

Step 4 Critical Value Comparison: Reject H0 if

Step 5 Conclusion: Reject or Fail to Reject

r - μrz0 = ------- σr

Small-Sample: r Large Sample:

Small-Sample Case: r outside Critical intervalLarge-Sample Case: z0 < -zα/2 or z0 > zα/2

Page 17: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Example 1

The following sequence was observed when flipping a coin:

H, T, T, H, H, T, H, H, H, T, H, T, T, T, H, H

The coin was flipped 16 times with 9 heads and 7 tails. There were 9 runs observed.

Valuesn = 16n1 = 9n2 = 7r = 9

Critical values from table IX (9,7) = 4, 14

Since 4 < r = 9 < 14, then we Fail to reject and conclude that we don’t have enough evidence to say that it is not random.

Page 18: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Example 2The following sequence was observed when flipping a coin:

H, T, T, H, H, T, H, H, H, T, H, T, T, T, H, H, H, TT, T, T, H, H, T, T, T, H, T, T, H, H, T, T, H, T, T, T, T

The coin was flipped 38 times with 16 heads and 22 tails. There were 18 runs observed.

Valuesn = 38n1 = 16n2 = 22r = 18

r - μrz = --------- r

Since z (-0.515) > -Zα/2 (-2.32) we fail to reject and conclude that we don’t have enough evidence to say its not random.

2n1n2μr = -------- + 1 = 19.5263 n

2n1n2 (2n1n2 – n) σr = ----------------------- = 9.9624 n²(n – 1)

z = -0.515

Page 19: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Example 3, Using Confidence Intervals

Trey flipped a coin 100 times and got 54 heads and 46 tails, so

– n = 100

– n1 = 54

– n2 = 46

68501100

46542.r

944)1001(100

)10046542(465422

r - μrz = --------- r

We transform this into a confidence interval, PE +/- MOE.

Page 20: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Using Confidence Intervals

● The z-value for α = 0.05 level of significance is 1.96

LB: 50.68 – 1.96 • 4.94 = 41.0

UB: 50.68 + 1.96 • 4.94 = 60.4

● We reject the null hypothesis if there are 41 or fewer runs, or if there are 61 or more

● We do not reject the null hypothesis if there are 42 to 60 runs

Page 21: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Summary and Homework

• Summary– The runs test is a nonparametric test for the

independence of a sequence of observations– The runs test counts the number of runs of

consecutive similar observations– The critical values for small samples are given in

tables– The critical values for large samples can be

approximated by a calculation with the normal distribution

• Homework– problems 1, 2, 5, 6, 7, 8, 15 from the CD

Page 22: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Lesson 15 - 3

Inferences about Measures of Central Tendency

Objectives

• Conduct a one-sample sign test

Vocabulary

• One-sample sign test -- requires data converted to plus and minus signs to test a claim regarding the median– Change all data to + (above H0 value)

or – (below H0 value)

– Any values = to H0 value change to 0

Page 25: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Sign Test

● Like the runs test, the test statistic used depends on the sample size

● In the small sample case, where the number of observations n is 25 or less, we use the number of +’s and the number of –’s directly

● In the large sample case, where the number of observations n is more than 25, we use a normal approximation

Page 26: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Critical Values for a Runs Test for RandomnessSmall-Sample Case: Use Table VII to find critical value for a one-sample sign test

Large-Sample Case: Use Table IV, standard normal table (one-tailed -zα; two-tailed -zα/2).

Small-Sample Case: If n ≤ 25, the test statistic in the signs test is k, defined as below.

Large-Sample Case: If n > 25 the test statistic is (k + ½ ) – ½ nz0 = --------------------- ½ √n

Left-Tailed Two-Tailed Right-Tailed

H0: M = M0

H1: M < M0

H0: M = M0

H1: M ≠ M0

H0: M = M0

H1: M > M0

k = # of + signs k = smaller # of + or - signs

k = # of - signs

where k = is defined from above and n = number of + and – signs (zeros excluded)

Test Statistic

Signs Test for Central Tendencies

Page 27: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Hypothesis Tests for Central Tendency Using Signs Test

Step 0: Convert all data to +, - or 0 (based on H0)

Step 1 Hypotheses: Left-tailedTwo-Tailed Right-Tailed H0: Median = M0 H0: Median = M0 H0: Median = M0

H1: Median < M0 H1: Median ≠ M0 H1: Median > M0

Step 2 Level of Significance: (level of significance determines critical value) Determine a level of significance, based on the seriousness of making a

Type I error Small-sample case: Use Table X. Large-sample case: Use Table IV, standard normal (one-tailed -zα; two-

tailed -zα/2). Step 3 Compute Test Statistic:

Step 4 Critical Value Comparison: Reject H0 if Small-Sample Case: k ≤ critical value

Large-Sample Case: z0 < -zα/2 (two tailed) or z0 < -zα (one-tailed)

Step 5 Conclusion: Reject or Fail to Reject

Small-Sample: k Large-Sample: (k + ½ ) – ½ nz0 = --------------------- ½ √n

Page 28: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Small Number ExampleA recent article in the school newspaper reported that the typical credit-card debt of a student is $500. Professor McCraith claims that the median credit-card debt of students at Joliet Junior College is different from $500. To test this claim, he obtains a random sample of 20 students enrolled at the college and asks them to disclose their credit-card debt.

$6000 $0 $200 $0 $400 $1060 $0 $1200 $200 $250$250 $580 $1000 $0 $0 $200 $400 $800 $700 $1000$6000 $0 $200 $0 $400 $1060 $0 $1200 $200 $250$250 $580 $1000 $0 $0 $200 $400 $800 $700 $1000

+ = 8- = 12k = 8n = 20

CV = 5 (from table X)

Two-Tailed Test: (Med ≠ 300)so k = number of smaller of the signs

We reject H0 if k ≤ critical value (out in the tail). Since 8 > 5, we do not reject H0.

Page 29: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Large Number Example

(k + ½ ) – ½ nz0 = --------------------- ½ √n

285 310 300300 320 308310 293 329293 326 310297 301 315332 305 340242 310 312329 320 300311 286 309292 287 305

A sports reporter claims that the median weight of offensive linemenin the NFL is greater than 300 pounds. He obtains a random sample of 30 offensive linemen and obtains the data shown in Table 4. Test the reporter’s claim at the α = 0.1 level of significance.

+ = 19- = 80 = 3n = 30-3 = 27k = 8

z0 = -13/30 = -1.92

285 310 300300 320 308310 293 329293 326 310297 301 315332 305 340242 310 312329 320 300311 286 309292 287 305

Right-Tailed Test: (Med > 300)so k = number of - signs

Since z0 < zα (-1.28), we would reject the H0 (median = 300) in favor of the alternative, median > 300

Page 30: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Summary and Homework

• Summary– The sign test is a nonparametric test for the median, a

measure of central tendency– This test counts the number of observations higher

and lower than the assumed value of the median– The critical values for small samples are given in

tables– The critical values for large samples can be

approximated by a calculation with the normal distribution

• Homework– problems 5, 6, 10, 12 from the CD

Page 31: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Lesson 15 - 4

Inferences about the Differences between Two Medians:

Dependent Samples

Objectives

• Test a claim about the difference between the medians of two dependent samples

Vocabulary• Wilcoxon Matched-Pairs Signed-Ranks Test -- a

nonparametric procedure that is used to test the equality of two population medians by dependent sampling.

• Ranks – 1 through n with ties award the sum of the tied ranks divided by the number tied. For example, if 4th and 5th observation was tied then they would both receive a 4.5 ranking.

• Signed-Ranks – ranks are constructed with absolute values and then the sign of the value (positive or negative) is applied to the ranking.

Page 34: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Parametric vs Nonparametric

● For our parametric test for matched-pairs (dependent samples), we Compared the corresponding observations by

subtracting one from the other Performed a test of whether the mean is 0

● For our nonparametric case for matched-pairs (dependent samples), we will Compare the corresponding observations by

subtracting one from the other Perform a test of whether the median is 0

Page 35: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Hypothesis Tests Using Wilcoxon TestStep 0: Compute the differences in the matched-pairs observations. Rank the absolute value of all sample differences from smallest to largest after discarding those differences that equal 0. Handle ties by finding the mean of the ranks for tied values. Assign negative values to the ranks where the differences are negative and positive values to the ranks where the differences are positive.

Step 1 Hypotheses:

Step 2 Box Plot: Draw a boxplot of the differences to compare the sample data from the two populations. This helps to visualize the difference in the medians.

Step 3 Level of Significance: (level of significance determines the critical value) Determine a level of significance, based on the seriousness of making a Type I error Small-sample case: Use Table XI. Large-sample case: Use Table IV.

Step 4 Compute Test Statistic:

Step 5 Critical Value Comparison: Reject H0 if test statistic < critical value

Step 6 Conclusion: Reject or Fail to Reject

Left-Tailed Two-Tailed Right-Tailed

H0: MD = 0H1: MD < 0

H0: MD = 0H1: MD ≠ 0

H0: MD= 0H1: MD > 0

Page 36: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Test Statistic for the Wilcoxon Matched-Pairs Signed-Ranks Test

Small-Sample Case: (n ≤ 30)

Large-Sample Case: (n > 30)

n (n + 1) T – -------------- 4z0 = ----------------------------- n (n + 1)(2n + 1) ------------------------ 24

Left-Tailed Two-Tailed Right-Tailed

H0: MD = 0H1: MD < 0

H0: MD = 0H1: MD ≠ 0

H0: MD= 0H1: MD > 0

T = T+ T = smaller of T+ or |T-| T = |T-|

where T is the test statistic from the small-sample case.

Note: MD is the median of the differences of matched pairs.

T+ is the sum of the ranks of the positive differencesT- is the sum of the ranks of the negative differences

Page 37: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Small-Sample Case (n ≤ 30): Using α as the level of significance, the critical value is obtained from Table XI in Appendix A.

Large-Sample Case (n > 30): Using α as the level of significance, the critical value is obtained from Table IV in Appendix A. The critical value is always in the left tail of the standard normal distribution.

Left-Tailed Two-Tailed Right-Tailed

-zα -zα/2 -zα

Left-Tailed Two-Tailed Right-Tailed

-Tα -Tα/2 -Tα

Critical Value for Wilcoxon Matched-Pairs Signed-Ranks Test

Page 38: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Example 1 from 15.4X Y D = Y-X |D| Signed Rank

11.5 26.0 14.5 14.5 +7.5

14.1 26.2 12.1 12.1 +5

19.3 24.6 5.3 5.3 +3

35.0 30.8 -4.2 4.2 -2

15.9 37.5 21.6 21.6 +11

21.5 36.0 14.5 14.5 +7.5

11.7 25.9 14.2 14.2 +6

17.1 16.9 -0.2 0.2 -1

27.3 50.2 22.9 22.9 +12

13.8 33.1 19.3 19.3 +10

43.2 99.9 56.7 56.7 +14

11.2 26.1 14.9 14.9 +9

34.2 43.8 9.6 9.6 +4

26.7 63.8 37.1 37.1 +13

T+ = 88 T- = |-3| = 3

Page 39: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Example continued

H0: Meddif = 0

Ha: Meddif > 0 (Right tailed test)

Test Statistic: |T-| = 3

From Table XI: Tcritcal = T0.05 = 25

The test statistic is less than the critical value (3 < 25) so we reject the null hypothesis.

Page 40: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Summary and Homework

• Summary– The Wilcoxon sign test is a nonparametric test for

comparing the median of two dependent samples– This test is a weighted count of the differences in

signs between the paired observations– The critical values for small samples are given in

tables– The critical values for large samples can be

approximated by a calculation with the normal distribution

• Homework– problems 2, 4, 5, 8, 9, 15 from the CD

Page 41: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Lesson 15 - 5

Inferences about the Differences between Two Medians:

Independent Samples

Objectives

• Test a claim about the difference between the medians of two independent samples

Vocabulary

• Mann–Whitney Test – nonparametric procedure used to test the equality of two population medians from independent samples

• Population “X” – is the population of interest from the problem statement

Page 44: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Mann-Whitney Test

• Similar in structure to Wilcoxon Matched-Pairs Signed-Ranks Test

• Because samples are independent different test statistics and critical values are used

Page 45: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Hypothesis Tests Using Mann–Whitney TestStep 0 Requirements:

1. the samples are independent random samples and2. the shape of the distributions are the same (assume to met in our problems)

Step 1 Box Plots: Draw a side-by-side boxplot to compare the sample data from the two populations. This helps to visualize the difference in the medians.

Step 2 Hypotheses:

Step 3 Ranks: Rank all sample observations from smallest to largest. Handle ties by finding the mean of the ranks for tied values. Find the sum of the ranks for the sample from population X.

Step 4 Level of Significance: (level of significance determines the critical value) Determine a level of significance, based on the seriousness of making a Type I error Small-sample case: Use Table XII. Large-sample case: Use Table IV.

Step 5 Compute Test Statistic:

Step 6 Critical Value Comparison: Reject H0 if test statistic more extreme (further away from 0) than the critical value

Step 7 Conclusion: Reject or Fail to Reject

Left-Tailed Two-Tailed Right-Tailed

H0: Mx = My H1: Mx < My

H0: Mx = My H1: Mx ≠ My

H0: Mx= My H1: Mx > My

Page 46: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Test Statistic for the Mann–Whitney Test

The test statistic will depend on the size of the samples from each population. Let n1 represent the sample size for population X and represent the n2 sample size for population Y.

Small-Sample Case: (n1 ≤ 20 and n2 ≤ 20 )If S is the sum of the ranks corresponding to the sample from population X, then the test statistic, T, is given by

n1 (n1 - 1) T = S – --------------

Note: The value of S is always obtained by summing the ranks of the sample data that correspond to Mx , the median of population X, in the hypothesis.

Large-Sample Case: (n1 > 20 or n2 > 20 )From the Central Limit Theorem, the test statistic is given by

where T is the test statistic from the small-sample case.

n1 n2 T – ---------- 2z0 = ----------------------------- n1n2 (n1 + n2 + 1) ------------------------ 12

Page 47: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Small-Sample Case: (n1 ≤ 20 and n2 ≤ 20 )Using α as the level of significance, the critical value is obtained from Table XII in Appendix A.

Large-Sample Case: (n1 > 20 or n2 > 20 )Using α as the level of significance, the critical value is obtained from Table IV in Appendix A.

Left-Tailed Two-Tailed Right-Tailed

-zα zα/2

-zα/2

zα

Left-Tailed Two-Tailed Right-Tailed

wα wα/2

w1- α/2 = n1n2 - wα/2

w1-α = n1n2 - wα

Critical Value for Mann–Whitney Test

Page 48: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Example 1 from 15.5

H I Order Rank Order Rank

10 640 10 1.5 320 12.5

320 80 10 1.5 320 12.5

320 1280 80 3.5 320 12.5

320 160 80 3.5 320 12.5

320 640 160 7 640 18

80 640 160 7 640 18

160 1280 160 7 640 18

10 640 160 7 640 18

640 160 160 7 640 18

160 320 320 12.5 1280 21.5

320 160 320 12.5 1280 21.5

101 152 = S

Page 49: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Example Cont.

• Hypothesis: H0: Med ill = Med healthy

Ha: Med ill > Med healthy

• Test Statistic: T = S – ½ n1(n1+1) = 152 – ½ 11(12) = 86

where S is the sum of the ranks obtained from the sample observations from ill population and n1 is the number of observations from sample of ill people

• Critical Value: w1-α = n1n2 – wα = 121 – 41 = 80

• Conclusion: Since T > w1-α , we reject the null hypothesis and conclude that there is a difference between the median values for ill and healthy people

Page 50: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Summary and Homework

• Summary– The Mann-Whitney test is a nonparametric test for

comparing the median of two independent samples– This test is a sum of ranks of the combined dataset– The critical values for small samples are given in

tables– The critical values for large samples can be

approximated by a calculation with the normal distribution

• Homework– problems 4, 5, 9, 10, 12 from the CD

Page 51: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Lesson 15 - 6

Inferences Between Two Variables

Objectives

• Perform Spearman’s rank-correlation test

Vocabulary• Rank-correlation test -- nonparametric procedure

used to test claims regarding association between two variables.

• Spearman’s rank-correlation coefficient -- test statistic, rs

6Σdi² rs = 1 – --------------

n(n²- 1)

Page 54: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Association

● Parametric test for correlation: Assumption of bivariate normal is difficult to verify Used regression instead to test whether the slope

is significantly different from 0

● Nonparametric case for association: Compare the relationship between two variables

without assuming that they are bivariate normal Perform a nonparametric test of whether the

association is 0

Page 55: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Tale of Two Associations

Similar to our previous hypothesis tests, we can have a two-tailed, a left-tailed, or a right-tailed alternate hypothesis

– A two-tailed alternative hypothesis corresponds to a test of association

– A left-tailed alternative hypothesis corresponds to a test of negative association

– A right-tailed alternative hypothesis corresponds to a test of positive association

Page 56: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Test Statistic for Spearman’s Rank-Correlation Test

The test statistic will depend on the size of the sample, n, and on the sum of the squared differences (di²).

6Σdi² rs = 1 – --------------

n(n²- 1)

where di = the difference in the ranks of the two observations (Yi – Xi) in the ith ordered pair.

Spearman’s rank-correlation coefficient, rs, is our test statistic

z0 = rs √n – 1

Small Sample Case: (n ≤ 100)

Large Sample Case: (n > 100)

Page 57: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Critical Value for Spearman’s Rank-Correlation Test

Left-Tailed Two-Tailed Right-Tailed

Significance α α/2 α

Decision Rule

Reject if rs < -CV

Reject if rs < -CV or rs > CV

Reject if rs > CV

Using α as the level of significance, the critical value(s) is (are) obtained from Table XIII in Appendix A. For a two-tailed test, be sure to divide the level of significance, α, by 2.

Small Sample Case: (n ≤ 100)

Large Sample Case: (n > 100)

Page 58: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Hypothesis Tests Using Spearman’s Rank-Correlation TestStep 0 Requirements: 1. The data are a random sample of n ordered pairs. 2. Each pair of observations is two measurements taken on the same individual

Step 1 Hypotheses: (claim is made regarding relationship between two variables, X and Y) H0: see below H1: see below

Step 2 Ranks: Rank the X-values, and rank the Y-values. Compute the differences between ranks and then square these differences. Compute the sum of the squared differences.

Step 3 Level of Significance: (level of significance determines the critical value) Table XIII in Appendix A. (see below) Step 4 Compute Test Statistic:

Step 5 Critical Value Comparison: Left-Tailed Two-Tailed Right-Tailed

Significance α α/2 α

H0 not associated not associated not associated

H1 negatively associated associated positively associated

Decision Rule

Reject if rs < -CVReject if

rs < -CV or rs > CVReject if rs > CV

6Σdi² rs = 1 – --------------

n(n²- 1)

Page 59: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Expectations

• If X and Y were positively associated, then Small ranks of X would tend to correspond to small ranks of Y Large ranks of X would tend to correspond to large ranks of Y The differences would tend to be small positive and small

negative values The squared differences would tend to be small numbers

● If X and Y were negatively associated, then Small ranks of X would tend to correspond to large ranks of Y Large ranks of X would tend to correspond to small ranks of Y The differences would tend to be large positive and large

negative values The squared differences would tend to be large numbers

Page 60: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Example 1 from 15.6

S D S-Rank D-Rank d = X - Y d²

100 257 2.5 1 1.5 2.25

102 264 5 4 1 1

103 274 6 6 0 0

101 266 4 5 -1 1

105 277 7.5 8 -0.5 0.25

100 263 2.5 3 -0.5 0.25

99 258 1 2 -1 1

105 275 7.5 7 0.5 0.25

102 267 Ave Sum 6

Calculations:

Page 61: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Example 1 Continued

• Hypothesis: H0: X and Y are not associated Ha: X and Y are associated

• Test Statistic:

6 Σdi² 6 (6) 36 rs = 1 - ----------- = 1 – ------------- = 1 - -------- = 0.929 n(n² - 1) 8(64 - 1) 8(63)

• Critical Value: 0.738 (from table XIII)

• Conclusion: Since rs > CV, we reject H0; therefore there is a relationship between club-head speed and distance.

Page 62: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Summary and Homework

• Summary– The Spearman rank-correlation test is a

nonparametric test for testing the association of two variables

– This test is a comparison of the ranks of the paired data values

– The critical values for small samples are given in tables

– The critical values for large samples can be approximated by a calculation with the normal distribution

• Homework– problems 3, 6, 7, 10 from the CD

Page 63: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Lesson 15 - 7

Test to See if Samples Come From Same Population

Page 64: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Objectives• Test a claim using the Kruskal–Wallis test

Page 65: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Vocabulary• Kruskal–Wallis Test -- nonparametric procedure

used to test the claim that k (3 or more) independent samples come from populations with the same distribution.

Page 66: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Test of Means of 3 or more groups

● Parametric test of the means of three or more groups: Compared the corresponding observations by

subtracting one mean from the other Performed a test of whether the mean is 0

● Nonparametric case for three or more groups: Combine all of the samples and rank this combined

set of data Compare the rankings for the different groups

Page 67: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Kruskal-Wallis Test

● Assumptions: Samples are simple random samples from three or

more populations Data can be ranked

● We would expect that the values of the samples, when combined into one large dataset, would be interspersed with each other

● Thus we expect that the average relative ratings of each sample to be about the same

Page 68: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Test Statistic for Kruskal–Wallis Test

A computational formula for the test statistic is

where Ri is the sum of the ranks of the ith sample R²1 is the sum of the ranks squared for the first sample R²2 is the sum of the ranks squared for the second sample, and so on n1 is the number of observations in the first sample n2 is the number of observations in the second sample, and so on N is the total number of observations (N = n1 + n2 + … + nk) k is the number of populations being compared.

12 1 ni(N + 1) ²H = -------------- --- Ri - ------------ N(N + 1) ni 2

12 R²1 R²2 R²kH = ------------- ----- + ----- + … + ------- - 3(N + 1) N(N + 1) n1 n2 nk

Page 69: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Test Statistic (cont)

● Large values of the test statistic H indicate that the Ri’s are different than expected

● If H is too large, then we reject the null hypothesis that the distributions are the same

● This always is a right-tailed test

Page 70: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Critical Value for Kruskal–Wallis Test

Small-Sample CaseWhen three populations are being compared and when the sample size from each population is 5 or less, the critical value is obtained from Table XIV in Appendix A.

Large-Sample CaseWhen four or more populations are being compared or the sample size from one population is more than 5, the critical value is χ²α with k – 1 degrees of freedom, where k is the number of populations and α is the level of significance.

Page 71: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Hypothesis Tests Using Kruskal–Wallis TestStep 0 Requirements: 1. The samples are independent random samples. 2. The data can be ranked.

Step 1 Box Plots: Draw side-by-side boxplots to compare the sample data from the populations. Doing so helps to visualize the differences, if any, between the medians.

Step 2 Hypotheses: (claim is made regarding distribution of three or more populations) H0: the distributions of the populations are the same H1: the distributions of the populations are not the same

Step 3 Ranks: Rank all sample observations from smallest to largest. Handle ties by finding the mean of the ranks for tied values. Find the sum of the ranks for each sample.

Step 4 Level of Significance: (level of significance determines the critical value) The critical value is found from Table XIV for small samples. The critical value is χ²α with k – 1 degrees of freedom (found in Table VI) for large samples.

Step 5 Compute Test Statistic:

Step 6 Critical Value Comparison: We reject the null hypothesis if the test statistic is greater than the critical value.

12 R²1 R²2 R²kH = ------------- ----- + ----- + … + ------- - 3(N + 1) N(N + 1) n1 n2 nk

Page 72: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Kruskal–Wallis Test Hypothesis

• In this test, the hypotheses are

H0: The distributions of all of the populations are the same

H1: The distributions of all of the populations are not the same

• This is a stronger hypothesis than in ANOVA, where only the means (and not the entire distributions) are compared

Page 73: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Example 1 from 15.7

S 20-29 40-49 60-69

1 54 (29) 61 (31.5) 44 (18)

2 43 (16) 41 (14) 65 (34.5)

3 38 (11.5) 44 (18) 62 (33)

4 30 (2) 47 (21) 53 (27.5)

5 61 (31.5) 33 (3) 51 (26)

6 53 (27.5) 29 (1) 49 (22.5)

7 35 (7.5) 59 (30) 49 (22.5)

8 34 (4.5) 35 (7.5) 42 (15)

9 39 (13) 34 (4.5) 35 (7.5)

10 46 (20) 74 (36) 44 (18)

11 50 (24.5) 50 (24.5) 37 (10)

12 35 (7.5) 65 (34.5) 38 (11.5)

Medians (Sums)

41(194.5)

45.5(225.5)

46.5(246)

Page 74: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Example 1 (cont)

12 R²1 R²2 R²kH = ------------- ----- + ----- + … + ------- - 3(N + 1) N(N + 1) n1 n2 nk

12 194.5² 225.5² 246²H = ------------- ---------- + --------- + -------- - 3(36 + 1) = 1.009 36(36 + 1) 12 12 12

Critical Value: (Large-Sample Case)χ²α with 2 (3 – 1) degrees of freedom, where 3 is the number of populations and 0.05 is the level of significance

CV= 5.991

Conclusion: Since H < CV, therefore we FTR H0 (distributions are the same)

Page 75: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Summary and Homework

• Summary– The Kruskal-Wallis test is a nonparametric test for

comparing the distributions of three or more populations

– This test is a comparison of the rank sums of the populations

– Critical values for small samples are given in tables– The critical values for large samples can be

approximated by a calculation with the chi-square distribution

• Homework– problems 3, 6, 7, 10 from the CD

Page 76: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Lesson 15 - R

Chapter 15 Review

Objectives

• Summarize the chapter

• Define the vocabulary used

• Complete all objectives

• Successfully answer any of the review exercises

• Use the technology to compute statistical data in the chapter

Page 78: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Nonparametric vs Parametric

• The following nonparametric methods analyze similar problems as parametric methods

The Problem Parametric Nonparametric

Independence of observations

No corresponding procedure

Runs test

Test of central tendency

z-test for the mean

t-test for the mean

Sign test for the median

Test of dependent samples

t-test of the mean of the differences

Wilcoxon signed rank test

Test of independent samples

t-tests of the difference in means

Mann-Whitney test

Correlation Regression Spearman’s rank correlation

Centers of multiple groups

ANOVA (means) Kruskal-Wallis test of distributions

Page 79: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Large vs Small Case Sample Sizes

• The following table breaks down small versus large sample sizes for each nonparametric test

Nonparametric Small Sample Large Sample

Runs Test n1 ≤ 20 and n2 ≤ 20 n1 > 20 or n2 > 20

Sign Test n ≤ 25 n > 25

Wilcoxon Test n ≤ 30 n > 30

Mann-Whitney test n1 ≤ 20 and n2 ≤ 20 n1 > 20 or n2 > 20

Spearman’s Test n ≤ 100 n > 100

Kruskal-WallisTest k = 3 and ni ≤ 5 k > 3 or ni > 5

Page 80: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Problem 1

A difference between parametric statistical methods and nonparametric statistical methods is that

1) Nonparametric statistical methods are better

2) Parametric methods are always easier to compute

3) Nonparametric methods use few, if any, distribution assumptions

4) All of the above

Page 81: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Problem 2

Another name for nonparametric methods is

1) Distribution-free procedures

2) Normal distribution procedures

3) Mean and variance procedures

4) Descriptive methods

Page 82: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Problem 3

The runs test is a test of

1) Whether a set of data has a certain mean

2) Whether a set of data is random

3) Whether a set of data has a certain slope

4) Whether two sets of data have different slopes

Page 83: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Problem 4

In the runs test, if there are very few runs, then we are likely to

1) Reject or do not reject the null hypothesis depending on whether this is a two-tailed, left-tailed, or right-tailed test

2) Reject the null hypothesis that the data are random

3) Not reject the null hypothesis that the data are random

4) All of the above

Page 84: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Problem 5

The sign test is a test of

1) Whether a set of data has a certain mean

2) Whether a set of data is random

3) Whether a set of data has a certain slope

4) Whether a set of data has a certain median

Page 85: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Problem 6

The sign test counts

1) The number of values more than and less than the hypothesized mean

2) The sum of all the positive values of the variable

3) The z-score of the sample median

4) The number of values more than and less than the hypothesized median

Page 86: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Problem 7

The Wilcoxon signed-rank test is a test of

1) Whether two sets of data with matched observations have the same median

2) Whether one set of data has more positive or negative values

3) Whether one set of data has a certain slope

4) Whether two sets of data have the same slopes

Page 87: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Problem 8

To perform the Wilcoxon matched-pairs signed-ranks test, we

1) Compare the number of positive values for the two sets of data

2) Rank the differences of the matched observations

3) Switch the signs of all of the data values

4) All of the above

Page 88: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Problem 9

The Mann-Whitney test is a test of

1) Whether a set of data has a certain mean

2) Whether two sets of data are both random

3) Whether two independent samples have the same medians

4) Whether two independent samples have the same standard deviations

Page 89: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Problem 10

The Mann-Whitney test uses

1) The rankings of each sample’s observations when the two samples are combined

2) A sum of the ranks of one sample’s observations

3) Either a table or an approximation using the normal distribution to find the critical values

4) All of the above

Page 90: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Problem 11

Spearman's rank correlation test is a test of

1) Whether a set of ordered pairs of data has an association

2) Whether a set of ordered pairs of data is listed in a random order

3) Whether two sets of data have the same median

4) Whether two sets of data have the same slope

Page 91: Lesson 15 - 1 Nonparametric Statistics Overview. Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric.

Problem 12

A group of students enter a contest that has two parts. If the X variable is the student’s rank on the first part and the Y variable is the student’s rank on the second part, then Spearman’s rank correlation test

1) Can be applied because the data is ordered

2) Can be applied because the data is bivariate normal

3) Cannot be applied because ranks cannot be added

4) Cannot be applied because the two variables are dependent