STAT 416 – Nonparametric Tests - University of Illinois...


STAT 416 – Nonparametric Tests

1. Test for Randomness (Trend)

(1). Test based on total number of runs

(2). Runs Up and Down

(3). Rank von Neumann Test
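The notes only name the runs test. Here is a minimal Python sketch of the test based on the total number of runs, assuming the data have already been reduced to a sequence of two symbol types (for example, above/below the median). The mean and variance used for the normal approximation are the standard moments of the total number of runs; they are not stated in the notes. Function names are hypothetical.

```python
import math

def total_runs(seq):
    """Total number of runs R: maximal blocks of identical consecutive symbols."""
    if len(seq) == 0:
        return 0
    return 1 + sum(1 for a, b in zip(seq, seq[1:]) if a != b)

def runs_z(seq):
    """Normal-approximation z-score for R in a sequence of two symbol types.

    Uses E[R] = 2*n1*n2/n + 1 and
    Var[R] = 2*n1*n2*(2*n1*n2 - n) / (n**2 * (n - 1)).
    """
    symbols = sorted(set(seq))
    if len(symbols) != 2:
        raise ValueError("sequence must contain exactly two distinct symbols")
    n1 = sum(1 for s in seq if s == symbols[0])
    n2 = len(seq) - n1
    n = n1 + n2
    mean = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    return (total_runs(seq) - mean) / math.sqrt(var)

# Hypothetical signs: observation above (+) or below (-) the sample median.
signs = "++--+-+++-"
print(total_runs(signs))  # 6
print(runs_z(signs))
```

Too few runs suggests clustering or trend, and too many suggests alternation, so the test is usually two-sided.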

2. Goodness-of-fit test.

$H_0: F_X(x) = F_0(x)$ for all $x$, where $F_0(x)$ is specified.

(1). Chi-square Test.

$$Q = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i} \sim \chi^2(k-1) \quad \text{(approximately) under the null hypothesis},$$

where $f_i$ is the observed frequency and $e_i$ is the expected frequency in the $i$-th group, $i = 1, \dots, k$.
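A minimal Python sketch of $Q$, assuming the expected frequencies $e_i$ have already been computed from $F_0$. The die-roll counts are hypothetical illustration data, and `chi_square_gof` is a hypothetical name.

```python
def chi_square_gof(observed, expected):
    """Pearson statistic Q = sum_i (f_i - e_i)^2 / e_i over the k groups."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length k")
    return sum((f - e) ** 2 / e for f, e in zip(observed, expected))

# Hypothetical counts from 60 rolls of a die; a fair die gives e_i = 60/6 = 10.
observed = [8, 12, 9, 11, 6, 14]
expected = [10] * 6
q = chi_square_gof(observed, expected)
df = len(observed) - 1  # k - 1 degrees of freedom
print(q, df)  # compare q with the upper chi-square(5) critical value
```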

(2). Kolmogorov-Smirnov Test.

$$D_n = \sup_x |F_n(x) - F_0(x)|,$$

where $F_n(x)$ is the empirical cdf of the observed sample.
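A minimal Python sketch of $D_n$, assuming $F_0$ is continuous. Because $F_n$ only jumps at the order statistics, the supremum can be found by checking each $x_{(i)}$ from both sides. Uniform(0, 1) serves as a hypothetical $F_0$, and the function names are illustrative.

```python
def ks_one_sample(sample, cdf):
    """D_n = sup_x |F_n(x) - F_0(x)| for a continuous hypothesized cdf F_0.

    F_n jumps at the order statistics x_(1) <= ... <= x_(n), so the supremum is
    max over i of max(i/n - F_0(x_(i)), F_0(x_(i)) - (i-1)/n).
    """
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f0 = cdf(x)
        d = max(d, i / n - f0, f0 - (i - 1) / n)
    return d

def uniform_cdf(x):
    """cdf of Uniform(0, 1), used here as a hypothetical F_0."""
    return min(max(x, 0.0), 1.0)

print(ks_one_sample([0.1, 0.4, 0.45, 0.9], uniform_cdf))
```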

3. Location Test for One-Sample and Paired-Sample.

Data: $\{X_i\}_{i=1}^{n}$, or $\{D_i = Y_i - X_i\}_{i=1}^{n}$ for a paired sample $\{(X_i, Y_i)\}_{i=1}^{n}$.

Hypothesis: $H_0: M = M_0$ (equivalently $\theta = P(X > M_0) = 0.5$), where $M_0$ is a specified median.

(1). Sign Test

$$K = \sum_{i=1}^{n} I\{X_i > M_0\} \sim \text{Binomial}(n, p = 0.5) \quad \text{under } H_0.$$
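A minimal Python sketch of the sign test, assuming observations exactly equal to $M_0$ are discarded before counting (a common convention the notes do not state). The exact two-sided p-value here doubles the smaller Binomial$(n, 0.5)$ tail. The name `sign_test` and the data are hypothetical.

```python
from math import comb

def sign_test(data, m0):
    """Sign test of H0: M = m0.

    K counts observations above m0; values equal to m0 are dropped (an assumed
    convention). The two-sided p-value doubles the smaller exact Binomial(n, 0.5)
    tail probability and is capped at 1.
    """
    kept = [x for x in data if x != m0]
    n = len(kept)
    k = sum(1 for x in kept if x > m0)
    lower = sum(comb(n, j) for j in range(0, k + 1)) / 2 ** n  # P(K <= k)
    upper = sum(comb(n, j) for j in range(k, n + 1)) / 2 ** n  # P(K >= k)
    return k, min(1.0, 2 * min(lower, upper))

# Hypothetical sample tested against M0 = 2.0: seven of eight values exceed it.
print(sign_test([1.2, 3.4, 2.2, 5.1, 4.4, 6.0, 3.9, 4.8], 2.0))  # (7, 0.0703125)
```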

(2). Wilcoxon Signed-Rank Test

$$T^+ = \sum_{i=1}^{n} Z_i \cdot r(|D_i|),$$

where $D_i = X_i - M_0$, $Z_i = I\{D_i > 0\}$, and $r(|D_i|)$ is the rank of $|D_i|$ among $|D_1|, \dots, |D_n|$.
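A minimal Python sketch of $T^+$. It makes two assumptions that go beyond the notes: tied $|D_i|$ receive average ranks, and zero differences are dropped. Both function names are hypothetical.

```python
def average_ranks(values):
    """Ranks 1..n; tied values receive the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 0-based positions i..j, shifted to 1-based
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks

def signed_rank_t_plus(data, m0):
    """T+ = sum Z_i * r(|D_i|) with D_i = X_i - M_0; zero differences are dropped."""
    d = [x - m0 for x in data if x != m0]
    r = average_ranks([abs(v) for v in d])
    return sum(rank for v, rank in zip(d, r) if v > 0)

# Hypothetical data, M0 = 2.0: D = [0.5, -1.0, 2.0, 1.5, -1.5]
print(signed_rank_t_plus([2.5, 1.0, 4.0, 3.5, 0.5], 2.0))  # 9.5
```

Since the ranks sum to $n(n+1)/2 = 15$ here, the negative-rank sum is $15 - 9.5 = 5.5$.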


4. Test for General Two-Sample

Data: two independent samples $\{X_1, \dots, X_m\}$ and $\{Y_1, \dots, Y_n\}$.

(1). Hypothesis $H_0: F_X(x) = F_Y(x)$ for all $x$. Kolmogorov-Smirnov Test.

$$D_{m,n} = \sup_x |F_{m,X}(x) - F_{n,Y}(x)|,$$

where $F_{m,X}$ and $F_{n,Y}$ are the empirical cdfs of samples $X$ and $Y$.
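A minimal Python sketch of $D_{m,n}$. Both empirical cdfs are step functions that change only at data points, so it suffices to evaluate them at every value of the pooled sample. `bisect_right` on a sorted list counts the observations $\le t$. The function name is hypothetical.

```python
from bisect import bisect_right

def ks_two_sample(x, y):
    """D_{m,n} = sup_t |F_{m,X}(t) - F_{n,Y}(t)|, checked at every pooled data point."""
    xs, ys = sorted(x), sorted(y)
    m, n = len(xs), len(ys)
    return max(abs(bisect_right(xs, t) / m - bisect_right(ys, t) / n)
               for t in xs + ys)

print(ks_two_sample([1, 2, 3], [4, 5, 6]))  # 1.0 (completely separated samples)
```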

(2). Location Test for General Two-Sample from a Location Family. Hypothesis $H_0: M_X = M_Y$.

a). Median Test:

$$U = \sum_{i=1}^{m} I\{X_i < M\},$$

where $M$ is the median of the pooled sample $\{X_1, \dots, X_m, Y_1, \dots, Y_n\}$.

b). Control Median Test ($Y$ is a control sample):

$$V = \sum_{i=1}^{m} I\{X_i < M_Y\},$$

where $M_Y$ is the median of the control sample $Y$.

c). Mann-Whitney Test (for tendency):

$$U = \sum_{i=1}^{m} \sum_{j=1}^{n} D_{ij}, \qquad D_{ij} = I\{Y_j < X_i\}.$$
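A minimal Python sketch of the three counting statistics a), b), and c). `statistics.median` supplies the pooled and control-sample medians. The samples and function names are hypothetical.

```python
from statistics import median

def median_test_u(x, y):
    """Median test: U = number of X's below the pooled-sample median M."""
    m = median(list(x) + list(y))
    return sum(1 for xi in x if xi < m)

def control_median_v(x, y):
    """Control median test: V = number of X's below the median M_Y of the control sample Y."""
    m_y = median(y)
    return sum(1 for xi in x if xi < m_y)

def mann_whitney_u(x, y):
    """Mann-Whitney: U = number of pairs (i, j) with Y_j < X_i."""
    return sum(1 for xi in x for yj in y if yj < xi)

# Hypothetical samples (m = 4, n = 5).
x = [3.1, 4.7, 5.2, 6.0]
y = [1.9, 2.8, 3.5, 4.1, 5.5]
print(median_test_u(x, y), control_median_v(x, y), mann_whitney_u(x, y))  # 1 1 15
```

A small $U$ or $V$ (few $X$'s below the median) or a large Mann-Whitney $U$ points toward $X$ tending to exceed $Y$.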

5. Linear Rank Test for Location Problems

Hypothesis $H_0: F_Y(x) = F_X(x)$ for all $x$, vs. $H_1: F_Y(x) = F_X(x - \theta)$ for all $x$ and some $\theta \neq 0$. Pooled sample size $N = m + n$.

Wilcoxon Rank Sum Test. $H_0: \theta = 0$ vs. $H_1: \theta \neq 0$,

$$W_N = \sum_{i=1}^{N} i Z_i,$$

where $Z_i = 1$ if the $i$-th smallest observation in the pooled sample is an $X$; otherwise $Z_i = 0$.
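A minimal Python sketch of $W_N$, assuming no ties in the pooled sample. It also checks the well-known identity $W_N = U + m(m+1)/2$, which links $W_N$ to the Mann-Whitney count $U = \#\{Y_j < X_i\}$. The data and function name are hypothetical.

```python
def wilcoxon_rank_sum(x, y):
    """W_N = sum_i i * Z_i: the sum of the pooled-sample ranks of the X's (no ties assumed)."""
    pooled = sorted([(v, 1) for v in x] + [(v, 0) for v in y])  # (value, Z) pairs
    return sum(i for i, (_, z) in enumerate(pooled, start=1) if z == 1)

x = [3.1, 4.7, 5.2, 6.0]
y = [1.9, 2.8, 3.5, 4.1, 5.5]
w = wilcoxon_rank_sum(x, y)
u = sum(1 for xi in x for yj in y if yj < xi)  # Mann-Whitney U
print(w, u + len(x) * (len(x) + 1) // 2)       # 25 25
```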

6. Linear Rank Test for Scale Problems

Hypothesis $H_0: F_Y(x) = F_X(x)$ for all $x$, vs. $H_1: F_Y(x) = F_X(\theta x)$ for all $x$ and some $\theta > 0$, $\theta \neq 1$.

(1). Mood Test:

$$M_N = \sum_{i=1}^{N} \left(i - \frac{N+1}{2}\right)^2 Z_i$$
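A minimal Python sketch of the Mood statistic, assuming no ties in the pooled sample. It reuses the $Z_i$ indicator from the linear rank statistics above. The data and function name are hypothetical.

```python
def mood_statistic(x, y):
    """M_N = sum_i (i - (N+1)/2)^2 Z_i, Z_i = 1 when the i-th smallest pooled value is an X."""
    pooled = sorted([(v, 1) for v in x] + [(v, 0) for v in y])
    center = (len(pooled) + 1) / 2
    return sum((i - center) ** 2
               for i, (_, z) in enumerate(pooled, start=1) if z == 1)

# Hypothetical data: two of the X's sit at the extremes of the pooled sample (N = 7).
print(mood_statistic([-3.0, 0.1, 4.0], [-1.0, -0.5, 0.5, 1.0]))  # 18.0
```

Large values of $M_N$ indicate that the $X$'s occupy the extreme ranks, i.e., $X$ is more dispersed than $Y$.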


(2). Siegel-Tukey Test

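$$S_N = \sum_{i=1}^{N} a_i Z_i, \quad \text{where} \quad a_i = \begin{cases}
2i & \text{for } i \text{ even},\ 1 < i \le N/2, \\
2i - 1 & \text{for } i \text{ odd},\ 1 \le i \le N/2, \\
2(N - i) + 2 & \text{for } i \text{ even},\ N/2 < i \le N, \\
2(N - i) + 1 & \text{for } i \text{ odd},\ N/2 < i \le N.
\end{cases}$$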
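A minimal Python sketch of the Siegel-Tukey scores and $S_N$, written directly from the piecewise definition of $a_i$. It only handles an even pooled size $N$, where the scores form a permutation of $1, \dots, N$ that alternates between the two ends of the ordered sample. It also assumes no ties. The function names are hypothetical.

```python
def siegel_tukey_scores(n_total):
    """Scores a_1..a_N from the piecewise definition (even N only in this sketch)."""
    if n_total % 2:
        raise ValueError("this sketch assumes an even pooled sample size N")
    scores = []
    for i in range(1, n_total + 1):
        if i <= n_total / 2:
            scores.append(2 * i if i % 2 == 0 else 2 * i - 1)
        else:
            scores.append(2 * (n_total - i) + 2 if i % 2 == 0 else 2 * (n_total - i) + 1)
    return scores

def siegel_tukey_statistic(x, y):
    """S_N = sum_i a_i Z_i, Z_i = 1 when the i-th smallest pooled value is an X (no ties)."""
    pooled = sorted([(v, 1) for v in x] + [(v, 0) for v in y])
    a = siegel_tukey_scores(len(pooled))
    return sum(a[i] for i, (_, z) in enumerate(pooled) if z == 1)

print(siegel_tukey_scores(10))  # [1, 4, 5, 8, 9, 10, 7, 6, 3, 2]
```

Extreme observations receive small scores, so a small $S_N$ suggests that $X$ is more dispersed than $Y$.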

(3). Sukhatme Test

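$$T = \sum_{i=1}^{m} \sum_{j=1}^{n} D_{ij}, \quad \text{where } D_{ij} = I\{Y_j < X_i < 0 \text{ or } 0 < X_i < Y_j\},$$

with the common median of the two populations assumed known and equal to 0.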
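A minimal Python sketch of the Sukhatme count. $D_{ij} = 1$ exactly when $X_i$ lies strictly between 0 and $Y_j$, on the same side of 0. The data and function name are hypothetical.

```python
def sukhatme_t(x, y):
    """T = number of pairs (i, j) with Y_j < X_i < 0 or 0 < X_i < Y_j."""
    return sum(1 for xi in x for yj in y if yj < xi < 0 or 0 < xi < yj)

# Hypothetical data centered at the assumed common median 0.
print(sukhatme_t([-0.5, 0.3], [-2.0, -0.1, 1.0, 0.2]))  # 2
```

A large $T$ means the $X$'s tend to lie closer to 0 than the $Y$'s, i.e., $X$ is less dispersed.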

7. Test for Equality of k Independent Samples

All samples come from a location model $F(x - \theta_i)$, $i = 1, \dots, k$.

Hypothesis: $H_0: \theta_1 = \theta_2 = \cdots = \theta_k$ vs. $H_1: \theta_i \neq \theta_j$ for at least one pair $(i, j)$.

(1). Extension of Median Test and Control Median Test

(3). Kruskal-Wallis Test:

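$$H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{1}{n_i} \left[R_i - \frac{n_i(N+1)}{2}\right]^2,$$

where $R_i$ is the rank sum of the $i$-th sample in the pooled ranking, $n_i$ is the size of the $i$-th sample, and $N = n_1 + \cdots + n_k$.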
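A minimal Python sketch of $H$, assuming no ties in the pooled sample. Each sample's rank sum $R_i$ is accumulated from a single pooled ranking. The groups and function name are hypothetical.

```python
def kruskal_wallis_h(samples):
    """H = 12/(N(N+1)) * sum_i (1/n_i) * (R_i - n_i(N+1)/2)^2 (no ties assumed)."""
    pooled = sorted((v, g) for g, s in enumerate(samples) for v in s)  # (value, group)
    n_total = len(pooled)
    rank_sums = [0] * len(samples)
    for rank, (_, g) in enumerate(pooled, start=1):
        rank_sums[g] += rank
    total = sum((r - len(s) * (n_total + 1) / 2) ** 2 / len(s)
                for r, s in zip(rank_sums, samples))
    return 12 / (n_total * (n_total + 1)) * total

# Hypothetical data: three samples of size 3 with rank sums 6, 15, 24.
groups = [[2.1, 3.4, 1.8], [4.2, 5.0, 3.9], [6.1, 5.5, 7.0]]
print(kruskal_wallis_h(groups))  # about 7.2
```

Expanding the square gives the equivalent form $H = \frac{12}{N(N+1)} \sum_i R_i^2/n_i - 3(N+1)$. Here that evaluates to $\frac{12}{90}(12 + 75 + 192) - 30 = 7.2$.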

(4). Multiple Comparison

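$$Z_{ij} = \frac{|\bar{R}_i - \bar{R}_j|}{\sqrt{\frac{N(N+1)}{12}\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}},$$

where $\bar{R}_i = R_i / n_i$ is the average rank of the $i$-th sample. Compare this statistic with the normal score $z^* = \Phi^{-1}\left(1 - \alpha^*/(k(k-1))\right)$, the upper $\alpha^*/(k(k-1))$ quantile of the standard normal distribution, and declare samples $i$ and $j$ different when $Z_{ij} \ge z^*$. For the multiple comparisons, take the overall error rate $\alpha^* = 0.20$.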
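A minimal Python sketch of the pairwise comparison, assuming no ties. `statistics.NormalDist().inv_cdf` supplies $\Phi^{-1}$. The function name `pairwise_rank_comparisons` and the groups are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import NormalDist

def pairwise_rank_comparisons(samples, alpha=0.20):
    """Z_ij = |Rbar_i - Rbar_j| / sqrt(N(N+1)/12 * (1/n_i + 1/n_j)) for every pair,
    compared with z* = Phi^{-1}(1 - alpha/(k(k-1))). Assumes no ties."""
    pooled = sorted((v, g) for g, s in enumerate(samples) for v in s)
    n_total, k = len(pooled), len(samples)
    rank_sums = [0] * k
    for rank, (_, g) in enumerate(pooled, start=1):
        rank_sums[g] += rank
    mean_ranks = [r / len(s) for r, s in zip(rank_sums, samples)]
    z_star = NormalDist().inv_cdf(1 - alpha / (k * (k - 1)))
    results = {}
    for i, j in combinations(range(k), 2):
        var_factor = 1 / len(samples[i]) + 1 / len(samples[j])
        z = abs(mean_ranks[i] - mean_ranks[j]) / sqrt(n_total * (n_total + 1) / 12 * var_factor)
        results[(i, j)] = (z, z >= z_star)  # True: samples i and j judged different
    return z_star, results

groups = [[2.1, 3.4, 1.8], [4.2, 5.0, 3.9], [6.1, 5.5, 7.0]]
z_star, results = pairwise_rank_comparisons(groups)
for pair, (z, differ) in results.items():
    print(pair, round(z, 3), differ)
```

With $k = 3$ and $\alpha^* = 0.20$, $z^* \approx 1.83$. Only the pair of samples with the lowest and highest mean ranks is flagged.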

8. Measure of Association for Bivariate Sample

Data: $\{(X_i, Y_i), i = 1, \dots, n\}$. Hypothesis $H_0$: $X$ and $Y$ are independent.


(1). Kendall’s Tau Statistic for $\tau = p_c - p_d$ (the difference between the probabilities of concordance and discordance)

$$T = \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} A_{ij}}{n(n-1)}, \quad \text{where } A_{ij} = \operatorname{sign}(X_j - X_i)\,\operatorname{sign}(Y_j - Y_i).$$
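A minimal Python sketch of $T$ that sums over all ordered pairs, exactly as in the double sum. The diagonal terms vanish, so $T = (C - D)/\binom{n}{2}$, where $C$ and $D$ are the numbers of concordant and discordant pairs. The data and function names are hypothetical.

```python
def sign(v):
    """Sign function returning -1, 0, or 1."""
    return (v > 0) - (v < 0)

def kendall_t(x, y):
    """T = sum_{i,j} sign(X_j - X_i) sign(Y_j - Y_i) / (n(n-1)); terms with i = j are 0."""
    n = len(x)
    total = sum(sign(x[j] - x[i]) * sign(y[j] - y[i])
                for i in range(n) for j in range(n))
    return total / (n * (n - 1))

# Hypothetical data: 5 concordant and 1 discordant pair, so T = (5 - 1) / 6.
print(kendall_t([1, 2, 3, 4], [1, 3, 2, 4]))
```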

(2). Spearman’s Rho coefficient of rank correlation

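$$R = \frac{12 \sum_{i=1}^{n} (R_i - \bar{R})(S_i - \bar{S})}{n(n^2 - 1)} = 1 - \frac{6 \sum_{i=1}^{n} D_i^2}{n(n^2 - 1)},$$

where $R_i$ and $S_i$ are the ranks of $X_i$ and $Y_i$ respectively, and $D_i = R_i - S_i$ (the two forms agree when there are no ties).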
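A minimal Python sketch that computes both forms of Spearman's $R$ and confirms they agree on tie-free data. With ranks $1, \dots, n$, $\bar{R} = \bar{S} = (n+1)/2$. The data and function names are hypothetical.

```python
def ranks(values):
    """Ranks 1..n of distinct values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman_both_forms(x, y):
    """Return (cross-product form, 1 - 6*sum(D^2)/(n(n^2-1)) form) of Spearman's R."""
    n = len(x)
    rx, ry = ranks(x), ranks(y)
    mean = (n + 1) / 2  # R-bar = S-bar for ranks 1..n
    cross = sum((a - mean) * (b - mean) for a, b in zip(rx, ry))
    form1 = 12 * cross / (n * (n ** 2 - 1))
    form2 = 1 - 6 * sum((a - b) ** 2 for a, b in zip(rx, ry)) / (n * (n ** 2 - 1))
    return form1, form2

# Hypothetical data: ranks [5,1,3,2,4] and [5,1,4,3,2], so sum(D^2) = 6.
print(spearman_both_forms([10.2, 3.1, 7.7, 5.0, 8.8], [2.0, 0.5, 1.9, 1.1, 0.7]))
```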

9. Friedman’s ANOVA Test by Ranks

Observations are collected over $k$ blocks and $n$ treatments (randomized complete block design). $R_{ij}$ is the rank, within the $i$-th block, of the observation on treatment $j$, and $R_j = \sum_{i=1}^{k} R_{ij}$ is the rank sum of the $j$-th treatment.

Hypothesis on treatment effects: $H_0: \theta_1 = \theta_2 = \cdots = \theta_n$.

Friedman’s Test Statistic

$$S = \sum_{j=1}^{n} \left(R_j - \frac{k(n+1)}{2}\right)^2,$$

$$Q = \frac{12\,S}{kn(n+1)} \sim \chi^2(n-1) \quad \text{(approximately) under } H_0.$$
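A minimal Python sketch of Friedman's $Q$, assuming a table laid out as `table[i][j]` = observation on treatment $j$ in block $i$ (a hypothetical layout choice), with no ties within a block. Ranking happens separately inside each block.

```python
def friedman_q(table):
    """Q = 12 S / (k n (n + 1)), ranking within each of the k blocks (no ties assumed)."""
    k, n = len(table), len(table[0])
    rank_sums = [0] * n  # R_j for each treatment j
    for row in table:
        order = sorted(range(n), key=lambda j: row[j])
        for rank, j in enumerate(order, start=1):
            rank_sums[j] += rank
    s = sum((r - k * (n + 1) / 2) ** 2 for r in rank_sums)
    return 12 * s / (k * n * (n + 1))

# Hypothetical data: every block orders the 3 treatments identically.
print(friedman_q([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # 6.0
```

When all blocks agree, $Q$ reaches its maximum $k(n-1)$, which is 6 here.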

10. Kendall’s Coefficient of Concordance of k sets of n objects

There are $k$ sets of observations, and each set consists of the same $n$ objects. $R_{ij}$ is the rank of object $j$ within the $i$-th set, and $R_j = \sum_{i=1}^{k} R_{ij}$.

Hypothesis $H_0$: the $k$ sets are independent (there is no association).


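The deviation statistic is

$$S = \sum_{j=1}^{n} \left(R_j - \frac{k(n+1)}{2}\right)^2,$$

$$Q = \frac{12\,S}{kn(n+1)} \sim \chi^2(n-1) \quad \text{(approximately) under } H_0.$$

Kendall’s Coefficient of Concordance (ratio statistic):

$$W = \frac{12\,S}{k^2 n (n^2 - 1)},$$

where $0 \le W \le 1$, and $W = 1$ when all $k$ rankings agree. The two statistics are related by $Q = k(n-1)W$.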
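A minimal Python sketch returning $S$, $W$, and $Q$, assuming `rankings[i][j]` holds the rank of object $j$ in set $i$ (a hypothetical layout) with no ties. The example rankings are hypothetical.

```python
def kendall_w(rankings):
    """Return (S, W, Q) with W = 12 S / (k^2 n (n^2 - 1)) and Q = 12 S / (k n (n + 1))."""
    k, n = len(rankings), len(rankings[0])
    rank_sums = [sum(row[j] for row in rankings) for j in range(n)]  # R_j
    s = sum((r - k * (n + 1) / 2) ** 2 for r in rank_sums)
    w = 12 * s / (k ** 2 * n * (n ** 2 - 1))
    q = 12 * s / (k * n * (n + 1))
    return s, w, q

# Hypothetical rankings of n = 3 objects by k = 3 judges: R_j = 4, 6, 8.
s, w, q = kendall_w([[1, 2, 3], [1, 3, 2], [2, 1, 3]])
print(s, w, q)
```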

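11. Chi-square Test for Independence (Count Data)

A two-dimensional contingency table lists the count $X_{ij}$ at the $i$-th level of factor A ($A_i$) and the $j$-th level of factor B ($B_j$), $i = 1, \dots, r$, $j = 1, \dots, k$. Denote by $X_{i\cdot}$ and $X_{\cdot j}$ the row and column totals, and by $N$ the grand total. Let $\theta_{ij} = P(A_i \cap B_j)$, $\theta_{i\cdot} = \sum_j \theta_{ij} = P(A_i)$, and $\theta_{\cdot j} = \sum_i \theta_{ij} = P(B_j)$, subject to the restriction $\sum_i \theta_{i\cdot} = \sum_j \theta_{\cdot j} = 1$.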

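The hypothesis of independence:

$$H_0: \theta_{ij} = \theta_{i\cdot}\,\theta_{\cdot j} \quad \text{for all } i \text{ and all } j.$$

Under the null hypothesis, the test statistic

$$Q = \sum_{i=1}^{r} \sum_{j=1}^{k} \frac{(N X_{ij} - X_{i\cdot} X_{\cdot j})^2}{N X_{i\cdot} X_{\cdot j}} \sim \chi^2\big((r-1)(k-1)\big)$$

approximately.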
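A minimal Python sketch of $Q$ for an $r \times k$ table of counts, written in the $N X_{ij} - X_{i\cdot} X_{\cdot j}$ form above. That form equals the familiar $\sum (X_{ij} - E_{ij})^2/E_{ij}$ with $E_{ij} = X_{i\cdot} X_{\cdot j}/N$. The table and function name are hypothetical.

```python
def chi_square_independence(table):
    """Q = sum_ij (N X_ij - X_i. X_.j)^2 / (N X_i. X_.j); returns (Q, (r-1)(k-1))."""
    r, k = len(table), len(table[0])
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(r)) for j in range(k)]
    n = sum(row_tot)  # grand total N
    q = sum((n * table[i][j] - row_tot[i] * col_tot[j]) ** 2 / (n * row_tot[i] * col_tot[j])
            for i in range(r) for j in range(k))
    return q, (r - 1) * (k - 1)

# Hypothetical 2 x 2 table: every expected count is 15.
print(chi_square_independence([[10, 20], [20, 10]]))  # Q = 20/3, df = 1
```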

12. Fisher’s Exact Test

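Two independent binomial samples, $Y_i \sim \text{Bin}(n_i, \theta_i)$, $i = 1, 2$. Under the null hypothesis $H_0: \theta_1 = \theta_2 = \theta$, the exact conditional distribution of $Y_1$ given $Y = Y_1 + Y_2 = y$ is hypergeometric:

$$P(Y_1 = y_1 \mid Y = y) = \frac{\binom{n_1}{y_1}\binom{n_2}{y - y_1}}{\binom{N}{y}},$$

where $N = n_1 + n_2$.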
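A minimal Python sketch of the conditional probability, plus a one-sided p-value $P(Y_1 \ge y_1^{\text{obs}} \mid Y = y)$ for the alternative $\theta_1 > \theta_2$. The one-sided choice is an illustrative assumption, and both function names are hypothetical.

```python
from math import comb

def fisher_pmf(y1, n1, n2, y):
    """P(Y1 = y1 | Y1 + Y2 = y) under H0: theta1 = theta2 (hypergeometric)."""
    if y1 < 0 or y1 > n1 or y - y1 < 0 or y - y1 > n2:
        return 0.0
    return comb(n1, y1) * comb(n2, y - y1) / comb(n1 + n2, y)

def fisher_upper_p(y1_obs, n1, n2, y):
    """One-sided p-value P(Y1 >= y1_obs | Y = y), e.g. for H1: theta1 > theta2."""
    return sum(fisher_pmf(t, n1, n2, y) for t in range(y1_obs, min(n1, y) + 1))

# Hypothetical data: n1 = n2 = 5, y = 5 successes in total, 4 of them in sample 1.
print(fisher_upper_p(4, 5, 5, 5))  # (25 + 1) / 252
```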
