Checking data for outliers 2. 3σ edit rule and Hampel’s method
7th Seminar on statistics in seed testing
Gregoire, Laffont, Remund
ISTA Statistics Committee 2
Overview
• Consider this set of real-time PCR results (%):
i 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
xi 0.1193 0.1038 0.0923 0.1173 0.1494 0.1229 0.1125 0.1061 0.0940 0.1213 0.1314 0.1151 0.1159 0.1298 0.5977
[Figure: boxplot of the values on an axis from 0.1 to 0.6. Alarm from the boxplot.]
• Can we detect this exotic value with an automatic method?
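3σ edit rule
If $x \sim N(\mu, \sigma^2)$, then $P(|x - \mu| > 3\sigma) \approx 0.0027$
[Figure: normal density P(x) plotted against x, with a distance of 3σ marked on each side of μ]
The probability that an observed value is outside the range [μ − 3σ ; μ + 3σ] is very small.
Hence the 3σ edit rule.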
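The 0.0027 figure can be checked numerically. The sketch below uses the standard identity P(|x − μ| > kσ) = erfc(k/√2) for a normal variable; the function name `two_sided_tail` is only an illustrative choice, not something from the slides.

```python
import math

def two_sided_tail(k):
    """P(|x - mu| > k * sigma) for a normally distributed x."""
    # For a standard normal Z, P(|Z| > k) = erfc(k / sqrt(2)).
    return math.erfc(k / math.sqrt(2))

p3 = two_sided_tail(3)
print(f"P(|x - mu| > 3 sigma) = {p3:.4f}")
```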
3σ edit rule
1. Estimate the mean $\bar{x}$ and the standard deviation $s$: $\bar{x} = 0.1486$, $s = 0.1251$
i 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
xi 0.1193 0.1038 0.0923 0.1173 0.1494 0.1229 0.1125 0.1061 0.0940 0.1213 0.1314 0.1151 0.1159 0.1298 0.5977
2. For each value xi in the dataset, compute: $z_i = \dfrac{x_i - \bar{x}}{s}$
zi -0.2341 -0.3581 -0.4500 -0.2501 0.0065 -0.2054 -0.2885 -0.3397 -0.4364 -0.2181 -0.1374 -0.2677 -0.2613 -0.1502 3.5905
3. Identify xi as an outlier if |zi| > 3
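The three steps above can be sketched in a few lines of Python. This is a minimal illustration on the PCR data from the slides, assuming s is the sample standard deviation (n − 1 denominator), which is what reproduces s = 0.1251; the function name `three_sigma_outliers` is hypothetical.

```python
from statistics import mean, stdev

# Real-time PCR results (%) from the slides, i = 1..15
x = [0.1193, 0.1038, 0.0923, 0.1173, 0.1494, 0.1229, 0.1125, 0.1061,
     0.0940, 0.1213, 0.1314, 0.1151, 0.1159, 0.1298, 0.5977]

def three_sigma_outliers(values, threshold=3.0):
    """Return (i, x_i, z_i) for every value with |z_i| > threshold (i is 1-based)."""
    x_bar = mean(values)
    s = stdev(values)  # assumption: sample standard deviation (n - 1)
    flagged = []
    for i, v in enumerate(values, start=1):
        z = (v - x_bar) / s
        if abs(z) > threshold:
            flagged.append((i, v, z))
    return flagged

outliers_3sigma = three_sigma_outliers(x)
for i, v, z in outliers_3sigma:
    print(f"i = {i}: x = {v}, z = {z:.4f}")
```

On this dataset only the 15th value has |zi| > 3, in agreement with the zi row above.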
3σ edit rule
Problem: the mean and the standard deviation are themselves sensitive to outliers
• The mean moves towards the outliers
• The standard deviation is inflated
Hampel’s method is therefore preferred
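To make this sensitivity concrete, the sketch below recomputes the mean and standard deviation of the PCR data with and without the 15th value. Dropping that value is purely an illustration of the effect, not a step in either method, and the variable names are hypothetical.

```python
from statistics import mean, stdev

# Real-time PCR results (%) from the slides, i = 1..15
x = [0.1193, 0.1038, 0.0923, 0.1173, 0.1494, 0.1229, 0.1125, 0.1061,
     0.0940, 0.1213, 0.1314, 0.1151, 0.1159, 0.1298, 0.5977]

# Illustration only: the same data without the suspect 15th value
x_without = x[:-1]

mean_all, sd_all = mean(x), stdev(x)
mean_without, sd_without = mean(x_without), stdev(x_without)

print(f"with x15:    mean = {mean_all:.4f}, s = {sd_all:.4f}")
print(f"without x15: mean = {mean_without:.4f}, s = {sd_without:.4f}")
```

A single extreme value pulls the mean upwards and inflates s several times over. Both effects shrink |zi| for that value, which is why the 3σ rule can miss outliers.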
Hampel’s method
• An outlier-resistant alternative to the mean: the median $\tilde{x}$
• An outlier-resistant alternative to the standard deviation: the MAD (median absolute deviation)
$\mathrm{MAD} = \mathrm{median}\{\, |x_i - \tilde{x}| \,\}$
• Hampel’s method: identify xi as an outlier if $|x_i - \tilde{x}| > 5.2\,\mathrm{MAD}$
Hampel’s method
i 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
xi 0.1193 0.1038 0.0923 0.1173 0.1494 0.1229 0.1125 0.1061 0.0940 0.1213 0.1314 0.1151 0.1159 0.1298 0.5977
1. Compute the median $\tilde{x}$ and the MAD: $\tilde{x} = 0.1173$, MAD = 0.0112
2. For each value xi in the dataset, compute: $d_i = |x_i - \tilde{x}|$
di 0.0020 0.0135 0.0250 0.0000 0.0321 0.0056 0.0048 0.0112 0.0233 0.0040 0.0141 0.0022 0.0014 0.0125 0.4804
3. Identify xi as an outlier if di is greater than 5.2 MAD = 0.05824.
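The same procedure can be sketched in Python using only the standard library. The function name `hampel_outliers` and its return layout are illustrative choices; the factor 5.2 is the one given on the slides.

```python
from statistics import median

# Real-time PCR results (%) from the slides, i = 1..15
x = [0.1193, 0.1038, 0.0923, 0.1173, 0.1494, 0.1229, 0.1125, 0.1061,
     0.0940, 0.1213, 0.1314, 0.1151, 0.1159, 0.1298, 0.5977]

def hampel_outliers(values, k=5.2):
    """Return (median, MAD, flagged), where flagged lists (i, x_i, d_i) with d_i > k * MAD."""
    x_med = median(values)
    d = [abs(v - x_med) for v in values]  # absolute deviations from the median
    mad = median(d)                       # median absolute deviation
    cutoff = k * mad
    flagged = [(i, v, di)
               for i, (v, di) in enumerate(zip(values, d), start=1)
               if di > cutoff]
    return x_med, mad, flagged

x_med, mad, outliers_hampel = hampel_outliers(x)
print(f"median = {x_med:.4f}, MAD = {mad:.4f}, cutoff = {5.2 * mad:.5f}")
for i, v, di in outliers_hampel:
    print(f"i = {i}: x = {v}, d = {di:.4f}")
```

On this dataset only d15 = 0.4804 exceeds the cutoff of 0.05824, so the 15th value is the only one flagged. If the MAD is 0 (more than half of the values identical), every value that differs from the median is flagged, so such data need separate handling.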
Exercise
Using the 3σ edit rule and Hampel’s method, do you identify any outliers in the following dataset?
xi 0.2954 0.1436 0.1075 0.1731 0.1771 0.1259