Checking data for outliers 2. 3σ edit rule and Hampel’s method

7th Seminar on statistics in seed testing

Gregoire, Laffont, Remund

ISTA Statistics Committee 2

Overview

• Consider this set of real-time PCR results (%):

i    1       2       3       4       5       6       7       8       9       10      11      12      13      14      15
xi   0.1193  0.1038  0.0923  0.1173  0.1494  0.1229  0.1125  0.1061  0.0940  0.1213  0.1314  0.1151  0.1159  0.1298  0.5977

• Can we detect this exotic value with an automatic method?

[Figure: boxplot of the results on an axis from 0.1 to 0.6, captioned "Alarm from the boxplot"]

3σ edit rule

If x ~ N(μ, σ²), then P(|x − μ| > 3σ) ≈ 0.0027

[Figure: curve of P(x) against x, with 3σ marked on either side of the centre]

The probability that an observed value is outside the range [μ − 3σ ; μ + 3σ] is very small.
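The value 0.0027 can be checked numerically. The sketch below is not from the slides; it assumes only the standard identity P(|x − μ| > kσ) = erfc(k / √2) for a normal variable, and the helper name two_sided_tail is made up for illustration.

```python
import math


def two_sided_tail(k: float) -> float:
    """P(|x - mu| > k * sigma) for a normally distributed x."""
    # For a standard normal Z, P(|Z| > k) = erfc(k / sqrt(2)).
    return math.erfc(k / math.sqrt(2))


p3 = two_sided_tail(3)
print(f"P(|x - mu| > 3 sigma) = {p3:.4f}")  # prints 0.0027
```

The same helper with k = 2 gives the familiar figure of about 5 %, which shows how quickly the tail shrinks between 2σ and 3σ.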

3σ edit rule

1. Estimate the mean x̄ and the standard deviation s: x̄ = 0.1486, s = 0.1251

i    1       2       3       4       5       6       7       8       9       10      11      12      13      14      15
xi   0.1193  0.1038  0.0923  0.1173  0.1494  0.1229  0.1125  0.1061  0.0940  0.1213  0.1314  0.1151  0.1159  0.1298  0.5977

2. For each value xi in the dataset, compute: zi = (xi − x̄) / s

i    1        2        3        4        5        6        7        8        9        10       11       12       13       14       15
zi   -0.2341  -0.3581  -0.4500  -0.2501  0.0065   -0.2054  -0.2885  -0.3397  -0.4364  -0.2181  -0.1374  -0.2677  -0.2613  -0.1502  3.5905

3. Identify xi as an outlier if |zi| > 3

Here only x15 = 0.5977 is flagged, with |z15| = 3.5905 > 3.
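The three steps above can be sketched in a few lines of Python. This is an illustration rather than the authors' own code; the n − 1 denominator used by statistics.stdev is an assumption, chosen because it appears to reproduce s = 0.1251.

```python
import statistics

# Real-time PCR results (%) from the slide, indexed i = 1..15.
x = [0.1193, 0.1038, 0.0923, 0.1173, 0.1494, 0.1229, 0.1125, 0.1061,
     0.0940, 0.1213, 0.1314, 0.1151, 0.1159, 0.1298, 0.5977]

# Step 1: sample mean and sample standard deviation (n - 1 denominator, assumed).
x_bar = statistics.mean(x)
s = statistics.stdev(x)

# Step 2: standardised distance of each value from the mean.
z = [(xi - x_bar) / s for xi in x]

# Step 3: flag values lying more than 3 standard deviations from the mean.
outliers = [(i, xi) for i, (xi, zi) in enumerate(zip(x, z), start=1)
            if abs(zi) > 3]

print(f"mean = {x_bar:.4f}, s = {s:.4f}")
print("flagged (i, xi):", outliers)
```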

3σ edit rule

Problem: the mean and the standard deviation are sensitive to outliers

• The mean moves towards outliers

• The standard deviation is inflated

Hampel’s method is preferred
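To see this sensitivity on the PCR data, the sketch below compares the mean and standard deviation with and without the suspect value x15 = 0.5977. Dropping x15 is done purely for illustration and is not a step of either method.

```python
import statistics

# Real-time PCR results (%) from the slides.
x = [0.1193, 0.1038, 0.0923, 0.1173, 0.1494, 0.1229, 0.1125, 0.1061,
     0.0940, 0.1213, 0.1314, 0.1151, 0.1159, 0.1298, 0.5977]
clean = x[:-1]  # same data without the suspect value x15

mean_all, sd_all = statistics.mean(x), statistics.stdev(x)
mean_clean, sd_clean = statistics.mean(clean), statistics.stdev(clean)

print(f"with x15:    mean = {mean_all:.4f}, s = {sd_all:.4f}")
print(f"without x15: mean = {mean_clean:.4f}, s = {sd_clean:.4f}")
```

A single value pulls the mean from roughly 0.117 up to roughly 0.149 and inflates the standard deviation several-fold, which is why |z15| ends up only a little above the threshold of 3.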

Hampel’s method

• An outlier-resistant alternative to the mean: the median x̃

• An outlier-resistant alternative to the standard deviation: the MAD

MAD = median{ |xi − x̃| }

• Hampel’s method: identify xi as an outlier if |xi − x̃| > 5.2 MAD

Hampel’s method

i    1       2       3       4       5       6       7       8       9       10      11      12      13      14      15
xi   0.1193  0.1038  0.0923  0.1173  0.1494  0.1229  0.1125  0.1061  0.0940  0.1213  0.1314  0.1151  0.1159  0.1298  0.5977

1. Compute the median x̃ and the MAD: x̃ = 0.1173, MAD = 0.0112

2. For each value xi in the dataset, compute: di = |xi − x̃|

i    1       2       3       4       5       6       7       8       9       10      11      12      13      14      15
di   0.0020  0.0135  0.0250  0.0000  0.0321  0.0056  0.0048  0.0112  0.0233  0.0040  0.0141  0.0022  0.0014  0.0125  0.4804

3. Identify xi as an outlier if di is greater than 5.2 MAD = 0.05824.
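The same three steps can be sketched in Python using only the standard library. This is a minimal illustration rather than the authors' implementation; the variable names are made up, and the MAD is used unscaled, as on the slide.

```python
import statistics

# Real-time PCR results (%) from the slide, indexed i = 1..15.
x = [0.1193, 0.1038, 0.0923, 0.1173, 0.1494, 0.1229, 0.1125, 0.1061,
     0.0940, 0.1213, 0.1314, 0.1151, 0.1159, 0.1298, 0.5977]

# Step 1: median and MAD (median of the absolute deviations from the median).
x_med = statistics.median(x)
d = [abs(xi - x_med) for xi in x]  # Step 2: di = |xi - median|
mad = statistics.median(d)

# Step 3: flag values whose deviation exceeds 5.2 * MAD.
threshold = 5.2 * mad
outliers = [(i, xi) for i, (xi, di) in enumerate(zip(x, d), start=1)
            if di > threshold]

print(f"median = {x_med:.4f}, MAD = {mad:.4f}, 5.2 MAD = {threshold:.5f}")
print("flagged (i, xi):", outliers)
```

On this dataset the deviation of x15 (0.4804) exceeds the threshold by a wide margin, whereas the 3σ edit rule flags the same value only narrowly.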

Exercise

Using the 3σ edit rule and Hampel’s method, do you identify any outliers in the following dataset?

xi   0.2954  0.1436  0.1075  0.1731  0.1771  0.1259