STAT 416 – Nonparametric Tests - University of Illinois...
STAT 416 – Nonparametric Tests
1. Test for Randomness (Trend)
(1). Test based on total number of runs
(2). Runs Up and Down:
(3). Rank von Neumann Test
2. Goodness-of-fit test.
$H_0: F_X(x) = F_0(x)$ for all $x$, where $F_0(x)$ is fully specified.
(1). Chi-square Test.
$$Q = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i} \sim \chi^2(k-1) \quad \text{under the null hypothesis,}$$
where $f_i$ is the observed frequency in the $i$-th group and $e_i$ is the expected frequency in the $i$-th group, $i = 1, \ldots, k$.
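As a minimal sketch of the chi-square statistic above, the sum of squared deviations scaled by the expected frequencies can be computed directly. The function name `chi_square_gof` and the counts are hypothetical and chosen only for illustration.

```python
def chi_square_gof(observed, expected):
    """Return Q = sum over groups of (f_i - e_i)^2 / e_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((f - e) ** 2 / e for f, e in zip(observed, expected))


# Hypothetical counts for k = 3 groups with equal expected frequencies.
q = chi_square_gof([10, 20, 30], [20, 20, 20])
print(q)  # 10.0, to be compared with a chi-square(k - 1 = 2) distribution
```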
(2). Kolmogorov-Smirnov Test.
$$D_n = \sup_x |F_n(x) - F_0(x)|$$
where $F_n(x)$ is the empirical cdf of the observed sample.
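Because the empirical cdf is a step function, the supremum is attained at one of the sample points, either just before or at the jump. The sketch below uses that fact and assumes a continuous hypothesized cdf; `ks_statistic` and `uniform_cdf` are hypothetical names.

```python
def ks_statistic(sample, cdf):
    """Return D_n = sup_x |F_n(x) - F_0(x)| for a continuous F_0 given as `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f0 = cdf(x)
        # At the (i+1)-th order statistic F_n jumps from i/n to (i+1)/n,
        # so both sides of the jump are checked.
        d = max(d, abs((i + 1) / n - f0), abs(f0 - i / n))
    return d


def uniform_cdf(x):
    """cdf of the Uniform(0, 1) distribution, used here as F_0."""
    return min(1.0, max(0.0, x))


d = ks_statistic([0.25, 0.5, 0.75], uniform_cdf)
print(d)
```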
3. Location Test for One-Sample and Paired-Sample.
Data: $\{X_i\}_{i=1}^{n}$, or $\{D_i = Y_i - X_i\}_{i=1}^{n}$ for a paired sample $\{(X_i, Y_i)\}_{i=1}^{n}$.
Hypothesis $H_0: M = M_0$ (equivalently, $\theta = P(X > M_0) = 0.5$), where $M_0$ is the specified median.
(1). Sign Test
$$K = \sum_{i=1}^{n} I\{X_i > M_0\} \sim \text{Binomial}(n, p = 0.5) \quad \text{under } H_0.$$
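A small sketch of the sign test follows: it counts the observations above the hypothesized median and evaluates an exact two-sided p-value from the Binomial(n, 0.5) distribution. Dropping observations equal to the hypothesized median and doubling the smaller tail are assumptions of this sketch, not something the outline specifies; `sign_test` is a hypothetical name.

```python
from math import comb


def sign_test(sample, m0):
    """Return (K, n, two-sided p-value) for H0: median = m0."""
    # Assumption: observations equal to m0 are discarded.
    signs = [x > m0 for x in sample if x != m0]
    n = len(signs)
    k = sum(signs)
    lower = sum(comb(n, j) for j in range(0, k + 1)) / 2 ** n  # P(K <= k)
    upper = sum(comb(n, j) for j in range(k, n + 1)) / 2 ** n  # P(K >= k)
    return k, n, min(1.0, 2 * min(lower, upper))


k, n, p = sign_test([1, 2, 3, 4, 5, 6, 7, 8], 2.5)
print(k, n, p)  # 6 8 0.2890625
```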
(2). Wilcoxon Signed-Rank Test
$$T^+ = \sum_{i=1}^{n} Z_i \cdot r(|D_i|)$$
where $D_i = X_i - M_0$, $Z_i = I\{D_i > 0\}$, and $r(|D_i|)$ is the rank of $|D_i|$ among $|D_1|, \ldots, |D_n|$.
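The signed-rank statistic can be sketched as follows: rank the absolute deviations from the hypothesized median and add up the ranks that belong to positive deviations. The sketch assumes no zero deviations and no ties among the absolute deviations; `wilcoxon_signed_rank` is a hypothetical name.

```python
def wilcoxon_signed_rank(sample, m0):
    """Return T+ = sum of ranks of |D_i| over the D_i that are positive."""
    d = [x - m0 for x in sample]
    # Assumption: no D_i is zero and all |D_i| are distinct.
    order = sorted(range(len(d)), key=lambda i: abs(d[i]))
    t_plus = 0
    for rank, i in enumerate(order, start=1):
        if d[i] > 0:
            t_plus += rank
    return t_plus


# |D| = 1.5, 0.5, 3.0, 2.0 have ranks 2, 1, 4, 3; the positive ones carry ranks 2 and 4.
print(wilcoxon_signed_rank([1.5, -0.5, 3.0, -2.0], 0))  # 6
```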
4. Test for General Two-Sample
Data: $\{X_1, \ldots, X_m\}$ and $\{Y_1, \ldots, Y_n\}$, two independent samples.
(1). Hypothesis $H_0: F_X(x) = F_Y(x)$ for all $x$. Kolmogorov-Smirnov Test.
$$D_{m,n} = \sup_x |F_{m,X}(x) - F_{n,Y}(x)|$$
where $F_{m,X}$ and $F_{n,Y}$ are the empirical cdfs of samples $X$ and $Y$.
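Since both empirical cdfs are right-continuous step functions, the two-sample statistic can be found by evaluating their difference at every observed value. This is a simple quadratic-time sketch with the hypothetical name `ks_two_sample`.

```python
def ks_two_sample(x, y):
    """Return D_{m,n} = sup_t |F_{m,X}(t) - F_{n,Y}(t)|."""
    m, n = len(x), len(y)
    d = 0.0
    for t in sorted(set(x) | set(y)):
        fx = sum(1 for v in x if v <= t) / m  # empirical cdf of X at t
        fy = sum(1 for v in y if v <= t) / n  # empirical cdf of Y at t
        d = max(d, abs(fx - fy))
    return d


print(ks_two_sample([1, 3], [2, 4]))        # 0.5
print(ks_two_sample([1, 2, 3], [4, 5, 6]))  # 1.0
```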
(2). Location Test for General Two-Sample from a Location Family. Hypothesis $H_0: M_X = M_Y$.
a). Median Test: $U = \sum_{i=1}^{m} I\{X_i < M\}$, where $M$ is the median of the pooled sample $\{X_1, \ldots, X_m, Y_1, \ldots, Y_n\}$.
b). Control Median Test ($Y$ is a control sample): $V = \sum_{i=1}^{m} I\{X_i < M_Y\}$, where $M_Y$ is the median of the control sample.
c). Mann-Whitney Test (for tendency):
$$U = \sum_{i=1}^{m} \sum_{j=1}^{n} D_{ij}, \quad D_{ij} = I\{Y_j < X_i\}.$$
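The three counting statistics above are straightforward to compute by hand; the sketch below spells them out on a small hypothetical data set. The function names are illustrative only, and the sample medians are taken with `statistics.median`.

```python
from statistics import median


def median_test_count(x, y):
    """U = number of X_i below the median of the pooled sample."""
    m_pooled = median(list(x) + list(y))
    return sum(1 for xi in x if xi < m_pooled)


def control_median_count(x, y):
    """V = number of X_i below the median of the control sample Y."""
    m_y = median(y)
    return sum(1 for xi in x if xi < m_y)


def mann_whitney_u(x, y):
    """U = number of pairs (i, j) with Y_j < X_i."""
    return sum(1 for xi in x for yj in y if yj < xi)


x, y = [3, 5], [1, 4, 6]
print(median_test_count(x, y), control_median_count(x, y), mann_whitney_u(x, y))  # 1 1 3
```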
5. Linear Rank Test for Location Problems
Hypothesis $H_0: F_Y(x) = F_X(x)$ for all $x$, vs. $H_1: F_Y(x) = F_X(x - \theta)$ for all $x$ and some $\theta \neq 0$. Pooled sample size $N = n + m$.
Wilcoxon Rank Sum Test. $H_0: \theta = 0$ vs. $H_1: \theta \neq 0$,
$$W_N = \sum_{i=1}^{N} i Z_i,$$
where $Z_i = 1$ if the $i$-th order statistic of the pooled sample is an $X$; otherwise $Z_i = 0$.
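A minimal sketch of the rank sum statistic: sort the pooled sample, mark which positions come from the X sample, and add up those positions. It assumes no ties in the pooled sample; `wilcoxon_rank_sum` is a hypothetical name.

```python
def wilcoxon_rank_sum(x, y):
    """Return W_N = sum of i * Z_i, with Z_i = 1 when the i-th pooled order statistic is an X."""
    # Assumption: all pooled values are distinct.
    pooled = sorted([(v, 1) for v in x] + [(v, 0) for v in y])
    return sum(i * z for i, (_, z) in enumerate(pooled, start=1))


x, y = [3, 5], [1, 4, 6]
w = wilcoxon_rank_sum(x, y)
print(w)  # 6: the X values occupy positions 2 and 4 of the pooled ordering

# Without ties, W_N equals the number of pairs with Y_j < X_i plus m(m + 1)/2.
u = sum(1 for xi in x for yj in y if yj < xi)
assert w == u + len(x) * (len(x) + 1) // 2
```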
6. Linear Rank Test for Scale Problems
Hypothesis $H_0: F_Y(x) = F_X(x)$ for all $x$, vs. $H_1: F_Y(x) = F_X(x \cdot \theta)$ for all $x$ and some $\theta > 0$, $\theta \neq 1$.
(1). Mood Test:
$$M_N = \sum_{i=1}^{N} \left(i - \frac{N+1}{2}\right)^2 Z_i$$
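The Mood statistic weights each X position in the pooled ordering by its squared distance from the middle rank (N + 1)/2. A short sketch, assuming no ties and using the hypothetical name `mood_statistic`:

```python
def mood_statistic(x, y):
    """Return M_N = sum of (i - (N + 1)/2)^2 * Z_i over the pooled ordering."""
    # Assumption: all pooled values are distinct.
    pooled = sorted([(v, 1) for v in x] + [(v, 0) for v in y])
    center = (len(pooled) + 1) / 2
    return sum(z * (i - center) ** 2 for i, (_, z) in enumerate(pooled, start=1))


# N = 5, center rank 3; the X values sit at ranks 2 and 4.
print(mood_statistic([3, 5], [1, 4, 6]))  # 2.0
```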
(2). Siegel-Tukey Test
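$$S_N = \sum_{i=1}^{N} a_i Z_i, \quad \text{where } a_i =
\begin{cases}
2i & \text{for } i \text{ even, } 1 < i \le N/2 \\
2i - 1 & \text{for } i \text{ odd, } 1 \le i \le N/2 \\
2(N - i) + 2 & \text{for } i \text{ even, } N/2 < i \le N \\
2(N - i) + 1 & \text{for } i \text{ odd, } N/2 < i \le N
\end{cases}$$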
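The piecewise weights are easier to check once they are listed out. The sketch below generates them, taking the weight at i = 1 to be 1 and assuming N is even so that the weights form a permutation of 1, ..., N; both function names are hypothetical.

```python
def siegel_tukey_weights(n_total):
    """Return [a_1, ..., a_N] for even N."""
    if n_total % 2 != 0:
        raise ValueError("this sketch assumes an even pooled sample size N")
    weights = []
    for i in range(1, n_total + 1):
        if i <= n_total / 2:
            a = 2 * i if i % 2 == 0 else 2 * i - 1
        else:
            a = 2 * (n_total - i) + 2 if i % 2 == 0 else 2 * (n_total - i) + 1
        weights.append(a)
    return weights


def siegel_tukey_statistic(x, y):
    """Return S_N = sum of a_i * Z_i over the pooled ordering (no ties assumed)."""
    pooled = sorted([(v, 1) for v in x] + [(v, 0) for v in y])
    weights = siegel_tukey_weights(len(pooled))
    return sum(a * z for a, (_, z) in zip(weights, pooled))


print(siegel_tukey_weights(6))                       # [1, 4, 5, 6, 3, 2]
print(siegel_tukey_statistic([3, 5], [1, 4, 6, 7]))  # 10
```

Small weights go to the extremes of the pooled ordering and large weights to the middle, so a small value of the statistic suggests that the X sample is the more dispersed one.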
(3). Sukhatme Test
$$T = \sum_{i=1}^{m} \sum_{j=1}^{n} D_{ij}, \quad \text{where } D_{ij} = I\{Y_j < X_i < 0 \text{ or } 0 < X_i < Y_j\}.$$
7. Test for Equality of k Independent Samples
All samples are from a location model $F(x - \theta_i)$, $i = 1, \ldots, k$.
Hypothesis: $H_0: \theta_1 = \theta_2 = \cdots = \theta_k$ vs. $H_1: \theta_i \neq \theta_j$ for at least one pair $(i, j)$.
(1). Extension of Median Test and Control Median Test
(3). Kruskal-Wallis Test:
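$$H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{1}{n_i} \left[R_i - \frac{n_i(N+1)}{2}\right]^2$$
where $R_i$ is the rank sum of the $i$-th sample in the pooled ranking, $n_i$ is the size of the $i$-th sample, and $N = \sum_{i=1}^{k} n_i$.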
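The Kruskal-Wallis statistic can be sketched by ranking the pooled observations, accumulating a rank sum per sample, and comparing each rank sum with its null expectation. The sketch assumes no ties; `kruskal_wallis_h` is a hypothetical name.

```python
def kruskal_wallis_h(samples):
    """Return H for a list of k samples (no ties assumed)."""
    # Pool the data, remembering which sample each value came from.
    pooled = sorted((v, g) for g, s in enumerate(samples) for v in s)
    n_total = len(pooled)
    rank_sums = [0] * len(samples)
    for rank, (_, g) in enumerate(pooled, start=1):
        rank_sums[g] += rank
    total = sum(
        (r - len(s) * (n_total + 1) / 2) ** 2 / len(s)
        for r, s in zip(rank_sums, samples)
    )
    return 12 / (n_total * (n_total + 1)) * total


# Rank sums are 3, 7, 11 with N = 6, so H = 12/42 * (16/2 + 0 + 16/2) = 32/7.
print(kruskal_wallis_h([[1, 2], [3, 4], [5, 6]]))
```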
(4). Multiple Comparison
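$$Z_{ij} = \frac{|\bar{R}_i - \bar{R}_j|}{\sqrt{\dfrac{N(N+1)}{12}\left(\dfrac{1}{n_i} + \dfrac{1}{n_j}\right)}}$$
where $\bar{R}_i = R_i / n_i$ is the average rank of the $i$-th sample. Compare the above statistic with the normal score $z^* = \Phi^{-1}(1 - \alpha^*/(k(k-1)))$, the upper-tail critical value with tail probability $\alpha^*/(k(k-1))$; for multiple comparisons, $\alpha^* = 0.20$.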
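As a rough sketch of the pairwise comparison, the statistic and its critical value can be computed with the standard library. The critical value here is taken as the upper-tail normal quantile with tail probability alpha/(k(k - 1)), which is an assumption about how the normal score is meant to be read; the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def pairwise_z(rank_sums, sizes, i, j):
    """Return Z_ij from the rank sums and sample sizes of k samples."""
    n_total = sum(sizes)
    mean_i = rank_sums[i] / sizes[i]
    mean_j = rank_sums[j] / sizes[j]
    se = sqrt(n_total * (n_total + 1) / 12 * (1 / sizes[i] + 1 / sizes[j]))
    return abs(mean_i - mean_j) / se


def critical_value(alpha, k):
    """Upper-tail standard normal quantile with tail probability alpha / (k (k - 1))."""
    return NormalDist().inv_cdf(1 - alpha / (k * (k - 1)))


# Hypothetical rank sums 3, 7, 11 from three samples of size 2.
z = pairwise_z([3, 7, 11], [2, 2, 2], 0, 2)
z_star = critical_value(0.20, 3)
print(z, z_star, z > z_star)
```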
8. Measure of Association for Bivariate Sample
Data: $\{(X_i, Y_i), i = 1, \ldots, n\}$. Hypothesis $H_0$: the two samples ($X$ and $Y$) are independent.
(1). Kendall’s Tau Statistic for $\tau = p_c - p_d$ (difference of the probabilities of concordance and discordance)
$$T = \sum_{i=1}^{n} \sum_{j=1}^{n} \frac{A_{ij}}{n(n-1)}, \quad \text{where } A_{ij} = \text{sign}(X_j - X_i)\,\text{sign}(Y_j - Y_i).$$
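The double sum runs over all ordered pairs, so each unordered pair is counted twice and the diagonal terms contribute zero, which is why the divisor is n(n - 1). A direct sketch with the hypothetical names `sign` and `kendall_t`:

```python
def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def kendall_t(x, y):
    """Return T = sum over all (i, j) of A_ij / (n (n - 1))."""
    n = len(x)
    total = sum(
        sign(x[j] - x[i]) * sign(y[j] - y[i])
        for i in range(n)
        for j in range(n)
    )
    return total / (n * (n - 1))


# Two concordant pairs and one discordant pair out of three.
print(kendall_t([1, 2, 3], [1, 3, 2]))
```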
(2). Spearman’s Rho coefficient of rank correlation
$$R = \frac{12 \sum_{i=1}^{n} (R_i - \bar{R})(S_i - \bar{S})}{n(n^2 - 1)} = 1 - \frac{6 \sum_{i=1}^{n} D_i^2}{n(n^2 - 1)}$$
where $R_i$ and $S_i$ are the ranks of $X_i$ and $Y_i$ respectively, and $D_i = R_i - S_i$.
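The two forms of the coefficient agree when there are no ties, which the sketch below checks numerically on a small hypothetical sample. The helper names `ranks` and `spearman_r` are illustrative only.

```python
def ranks(values):
    """Return the ranks 1..n of the values (no ties assumed)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def spearman_r(x, y):
    """Return R = 1 - 6 * sum(D_i^2) / (n (n^2 - 1))."""
    n = len(x)
    rx, ry = ranks(x), ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


x, y = [10, 20, 30], [1, 3, 2]
r = spearman_r(x, y)

# Cross-check against the centered-product form of the same coefficient.
n = len(x)
rx, ry = ranks(x), ranks(y)
mean_rank = (n + 1) / 2
r_alt = 12 * sum((a - mean_rank) * (b - mean_rank) for a, b in zip(rx, ry)) / (n * (n ** 2 - 1))
print(r, r_alt)  # 0.5 0.5
```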
9. Friedman’s ANOVA Test by Ranks
A set of observations is collected over $k$ blocks and $n$ treatments (randomized complete block design). $R_{ij}$ is the rank of the observation for the $j$-th treatment within the $i$-th block, and $R_j = \sum_{i=1}^{k} R_{ij}$ is the rank sum of the $j$-th treatment.
Hypothesis on treatment effect: $H_0: \theta_1 = \theta_2 = \cdots = \theta_n$.
Friedman’s Test Statistic
$$S = \sum_{j=1}^{n} \left(R_j - \frac{k(n+1)}{2}\right)^2$$
$$Q = \frac{12 \cdot S}{k n (n+1)} \sim \chi^2(n-1) \quad \text{under } H_0.$$
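A compact sketch of the Friedman computation: rank the observations within each block, sum the ranks per treatment, and measure how far the rank sums are from their common null expectation k(n + 1)/2. It assumes no ties within a block; `friedman_q` and the data are hypothetical.

```python
def friedman_q(blocks):
    """Return (S, Q) for a k x n table: one row per block, one column per treatment."""
    k, n = len(blocks), len(blocks[0])
    rank_sums = [0] * n
    for row in blocks:
        # Rank the n treatments within this block (no ties assumed).
        order = sorted(range(n), key=lambda j: row[j])
        for rank, j in enumerate(order, start=1):
            rank_sums[j] += rank
    s = sum((r - k * (n + 1) / 2) ** 2 for r in rank_sums)
    q = 12 * s / (k * n * (n + 1))
    return s, q


# Hypothetical data: k = 3 blocks, n = 3 treatments; rank sums are 4, 6, 8.
s, q = friedman_q([[1.0, 2.0, 3.0], [1.0, 3.0, 2.0], [2.0, 1.0, 3.0]])
print(s, q)  # S = 8.0; Q is compared with a chi-square(n - 1 = 2) distribution
```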
10. Kendall’s Coefficient of Concordance of k sets of n objects
There are $k$ sets of observations, and each set includes $n$ objects. $R_{ij}$ is the rank of the $j$-th object within the $i$-th set, and $R_j = \sum_{i=1}^{k} R_{ij}$.
Hypothesis $H_0$: the $k$ sets are independent (there is no association).
The deviation statistic is
$$S = \sum_{j=1}^{n} \left(R_j - \frac{k(n+1)}{2}\right)^2$$
$$Q = \frac{12 \cdot S}{k n (n+1)} \sim \chi^2(n-1) \quad \text{under } H_0.$$
Kendall’s Coefficient of Concordance (ratio statistic):
$$W = \frac{12 \cdot S}{k^2 n (n^2 - 1)}$$
where $0 \le W \le 1$, so that $Q = k(n-1)W$.
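The sketch below computes the coefficient of concordance from a table of ranks using the standard normalization 12 S / (k^2 n (n^2 - 1)), under which perfect agreement among the k sets gives exactly 1 and Q = k(n - 1)W. The function name `kendall_w` and the rank tables are hypothetical.

```python
def kendall_w(rank_table):
    """Return W for a k x n table of ranks: one row per set, one column per object."""
    k, n = len(rank_table), len(rank_table[0])
    col_sums = [sum(row[j] for row in rank_table) for j in range(n)]
    s = sum((r - k * (n + 1) / 2) ** 2 for r in col_sums)
    return 12 * s / (k ** 2 * n * (n ** 2 - 1))


# Perfect agreement among three rankings of three objects.
print(kendall_w([[1, 2, 3], [1, 2, 3], [1, 2, 3]]))  # 1.0

# Partial agreement: rank sums 4, 6, 8 give S = 8.
w = kendall_w([[1, 2, 3], [1, 3, 2], [2, 1, 3]])
print(w, 3 * (3 - 1) * w)  # W and the corresponding Q = k (n - 1) W
```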
11. Chi-square Test for Independence (Count Data)
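A two-dimensional contingency table lists the count $X_{ij}$ at the $i$-th level of factor A ($A_i$) and the $j$-th level of factor B ($B_j$), $i = 1, \ldots, r$, $j = 1, \ldots, k$. Let $X_{i\cdot}$ and $X_{\cdot j}$ denote the row and column totals, and $N$ the grand total. Let $\theta_{ij} = P(A_i \cap B_j)$, $\theta_{i\cdot} = \sum_j \theta_{ij} = P(A_i)$, $\theta_{\cdot j} = \sum_i \theta_{ij} = P(B_j)$, subject to the restriction $\sum_i \theta_{i\cdot} = \sum_j \theta_{\cdot j} = 1$.
The hypothesis of independence:
$$H_0: \theta_{ij} = \theta_{i\cdot}\,\theta_{\cdot j} \quad \text{for all } i \text{ and all } j$$
Under the null hypothesis, the test statistic
$$Q = \sum_{i=1}^{r} \sum_{j=1}^{k} \frac{(N X_{ij} - X_{i\cdot} X_{\cdot j})^2}{N X_{i\cdot} X_{\cdot j}} \sim \chi^2((r-1)(k-1)).$$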
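As an illustration, the statistic can be computed directly from the table of counts and its margins. The function name `chi_square_independence` and the counts are hypothetical; the sketch assumes every row and column total is positive.

```python
def chi_square_independence(table):
    """Return (Q, degrees of freedom) for an r x k table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n_total = sum(row_totals)
    q = 0.0
    for i, row in enumerate(table):
        for j, x in enumerate(row):
            margin = row_totals[i] * col_totals[j]
            q += (n_total * x - margin) ** 2 / (n_total * margin)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return q, df


# Hypothetical 2 x 2 table; every expected count is 15, so Q = 4 * 25 / 15.
q, df = chi_square_independence([[10, 20], [20, 10]])
print(q, df)
```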
12. Fisher’s Exact Test
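Two independent binomial random samples, $Y_i \sim \text{Bin}(n_i, \theta_i)$, $i = 1, 2$. Under the null hypothesis $H_0: \theta_1 = \theta_2 = \theta$, the exact conditional distribution of $Y_1$ given $Y = Y_1 + Y_2 = y$ is
$$P(Y_1 = y_1 \mid Y = y) = \frac{\dbinom{n_1}{y_1} \dbinom{n_2}{y - y_1}}{\dbinom{N}{y}}$$
where $N = n_1 + n_2$.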
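The conditional distribution above is hypergeometric and does not depend on the common success probability, which is what makes the test exact. A minimal sketch with the hypothetical name `fisher_conditional_pmf`, valid for y1 between max(0, y - n2) and min(n1, y):

```python
from math import comb


def fisher_conditional_pmf(y1, n1, n2, y):
    """Return P(Y1 = y1 | Y1 + Y2 = y) under H0: theta_1 = theta_2."""
    return comb(n1, y1) * comb(n2, y - y1) / comb(n1 + n2, y)


# Hypothetical example: n1 = n2 = 3 and y = 3 successes in total.
probs = [fisher_conditional_pmf(y1, 3, 3, 3) for y1 in range(0, 4)]
print(probs)  # [0.05, 0.45, 0.45, 0.05]
```

A p-value would then be obtained by summing these probabilities over the outcomes at least as extreme as the observed one.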