Outlier Detection with Dirichlet Process Mixtures · Outlier Detection with Dirichlet Process...

14
Outlier Detection with Dirichlet Process Mixtures Matthew S. Shotwell 1 Elizabeth H. Slate 2 Vanderbilt University 1 Medical University of South Carolina 2 August 3, 2011

Transcript of Outlier Detection with Dirichlet Process Mixtures · Outlier Detection with Dirichlet Process...

Page 1: Outlier Detection with Dirichlet Process Mixtures · Outlier Detection with Dirichlet Process Mixtures ... outlier detection with Dirichlet process mixtures is ... I we proposed a

Outlier Detection with Dirichlet Process Mixtures

Matthew S. Shotwell1 Elizabeth H. Slate2

Vanderbilt University1

Medical University of South Carolina2

August 3, 2011

Page 2: Outlier Detection with Dirichlet Process Mixtures · Outlier Detection with Dirichlet Process Mixtures ... outlier detection with Dirichlet process mixtures is ... I we proposed a

Dirichlet Process Mixture (DPM)

yi|θi ∼ L(θi; yi) i = 1 .. nθi ∼ GG ∼ DP (α,G0)

I DP is a distribution over distributions

I G is discrete ⇒ P (θj = θk) > 0

I if θj = θk, then yj and yk are clustered

Page 3: Outlier Detection with Dirichlet Process Mixtures · Outlier Detection with Dirichlet Process Mixtures ... outlier detection with Dirichlet process mixtures is ... I we proposed a

Product Partition Model (PPM)

yi|zi = k, φk ∼ L(φk; yi) i = 1 .. nφk ∼ G0(φk) k = 1 .. r

P (z) ∝r∏

k=1

αΓ(nk)

I z is the data partition parameter

I estimating z ‘partitions’, or ‘clusters’ the data

I if zj = zk, then yj and yk are clustered

I [Hartigan, 1990]

Page 4: Outlier Detection with Dirichlet Process Mixtures · Outlier Detection with Dirichlet Process Mixtures ... outlier detection with Dirichlet process mixtures is ... I we proposed a

Outlier Detection Using PartitioningSteps:

1. set a “small” cluster threshold (e.g. 1% of n)2. estimate the data partition (i.e. cluster the data)3. “small” clusters are considered outlying

I an outlier partition contains one or more small outlier clusters

Histogram

Den

sity

0 5 10

0.0

0.1

0.2

0.3

Page 5: Outlier Detection with Dirichlet Process Mixtures · Outlier Detection with Dirichlet Process Mixtures ... outlier detection with Dirichlet process mixtures is ... I we proposed a

Quantifying Evidence to Detect Outliers: Questions

If the data partition (z) is estimated, and outlier clustersare discovered, how much evidence suggests that theseclusters are truely different from the others?

Can the partition estimate be restricted such that aminimum level of evidence is required to identify outlierclusters? Yes!

Page 6: Outlier Detection with Dirichlet Process Mixtures · Outlier Detection with Dirichlet Process Mixtures ... outlier detection with Dirichlet process mixtures is ... I we proposed a

A Criterion for Outlier Detection: Setup

Consider an outlier partition zo (n = 10):

zo = [1, 1, 1, 1, 1, 1, 1, 1, 2, 3]

zm1 = [1, 1, 1, 1, 1, 1, 1, 1, 1, 3]

zm2 = [1, 1, 1, 1, 1, 1, 1, 1, 2, 1]

zm3 = [1, 1, 1, 1, 1, 1, 1, 1, 2, 2]

zm4 = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

I zm· are formed by merging the outliers in zo.

I outlier detection is a decision between zo and zm·.

I denote the collection zm· as Mo

Page 7: Outlier Detection with Dirichlet Process Mixtures · Outlier Detection with Dirichlet Process Mixtures ... outlier detection with Dirichlet process mixtures is ... I we proposed a

A Criterion for Outlier Detection: The Trick

zo is favored if, for all zm ∈Mo

P (zo|y) > P (zm|y)

P (y|zo)P (y|zm)

>P (zm)

P (zo)

BFom >P (zm)

P (zo)

BFom >1

ανβom

I ν is the number of clusters merged to arrive at zm

I βom (a ratio involving Γ(·)) is always ≥ 1 for zm ∈Mo

I to favor zo, BFom must exceed 1αν

I BFom must increase 1/α fold for each outlier

Page 8: Outlier Detection with Dirichlet Process Mixtures · Outlier Detection with Dirichlet Process Mixtures ... outlier detection with Dirichlet process mixtures is ... I we proposed a

A Criterion for Outlier Detection: How to Fix α

I set the criteria by fixing α

I use Jeffrey’s scale of evidence for Bayes factors

I [Efron and Gous, 2001]

Evidence for zo1/α < 1 negative

1 ≤ 1/α < 3 barely worth a mention3 ≤ 1/α < 20 positive

20 ≤ 1/α < 150 strong150 ≤ 1/α very strong

Page 9: Outlier Detection with Dirichlet Process Mixtures · Outlier Detection with Dirichlet Process Mixtures ... outlier detection with Dirichlet process mixtures is ... I we proposed a

A Criterion for Outlier Detection: Nice Properties

MAP partition estimates automatically satisfy thecriterion for fixed α. Hence, no special or novelcomputational methods are required.

Because the DPM accommodates any data likelihood,outlier detection with Dirichlet process mixtures ispossible with any statistical model that specifies alikelihood function.

Page 10: Outlier Detection with Dirichlet Process Mixtures · Outlier Detection with Dirichlet Process Mixtures ... outlier detection with Dirichlet process mixtures is ... I we proposed a

Microarray Time Series in Cell Cycle Synchronized YeastI [Spellman et al., 1998]I 66 minute period, 2 cycles

G1 Phase

Time (min)

−2

−1

0

1

2

0 20 40 60 80 100 120

Norm

aliz

ed E

xpre

ssio

n

Page 11: Outlier Detection with Dirichlet Process Mixtures · Outlier Detection with Dirichlet Process Mixtures ... outlier detection with Dirichlet process mixtures is ... I we proposed a

Microarray Time Series in Cell Cycle Synchronized Yeast

α = 1150

G1 Phase

Time (min)

Nor

mal

ized

Exp

ress

ion

−2

−10

12

20 60 100

n=107 n=81

20 60 100

n=20 n=6

20 60 100

n=16

n=11 n=4 n=4 n=6

−2−10

12n=10

−2−1

012 n=5 n=1 n=2 n=4 n=1

n=3 n=4 n=2 n=2

−2−1

012n=1

−2−1

01

2 n=1

20 60 100

n=2 n=2

20 60 100

n=1 n=1

Page 12: Outlier Detection with Dirichlet Process Mixtures · Outlier Detection with Dirichlet Process Mixtures ... outlier detection with Dirichlet process mixtures is ... I we proposed a

MAP Estimation for z

I Agglomeration [Ward, 1963]

I Polya Urn Gibbs Sampler [MacEachern, 1994]

I Split-Merge Sampler [Jain and Neal, 2004]

I SUGS [Wang and Dunson, 2010]

I sampling is overkill for MAP estimationI we proposed a stochastic algorithm:

I consists of ‘Explode’ and ‘Merge’ stepsI consistent for the MAP estimateI avoids complexity of samplingI facilitates parallel search of partition space

I R package profdpm

Page 13: Outlier Detection with Dirichlet Process Mixtures · Outlier Detection with Dirichlet Process Mixtures ... outlier detection with Dirichlet process mixtures is ... I we proposed a

Outlier Detection with Finite Mixtures

I [Fraley and Raftery, 2002]

I select z that maximizes the BIC

I requires BFom > nρ2ν

I i.e. BFom must increase nρ2 fold for each outlier

I DPM outlier detection is generally more conservative.

Page 14: Outlier Detection with Dirichlet Process Mixtures · Outlier Detection with Dirichlet Process Mixtures ... outlier detection with Dirichlet process mixtures is ... I we proposed a

Efron, B. and Gous, A. (2001).

Scales of evidence for model selection: Fisher versus jeffreys.In Lahiri, P., editor, Model Selection: Lecture Notes–Monograph Series, volume 38, pages 208–246.Institute of Mathematical Statistics, Beachwood, OH.

Fraley, C. and Raftery, A. E. (2002).

Model-based clustering, descriminant analysis, and density estimation.Journal of the American Statistical Association, 97:611–631.

Hartigan, J. A. (1990).

Partition models.Communications in Statistics, Theory and Methods, 19:2745–2756.

Jain, S. and Neal, R. M. (2004).

A split-merge markov chain monte carlo procedure for the dirichlet process mixture model.Journal of Computational and Graphical Statistics, 13(1):158–182.

MacEachern, S. N. (1994).

Estimating normal means with a conjugate style dirichlet process prior.Communications in Statistics B, 23:727–741.

Spellman, P. T., Sherlock, G., Zhang, M. Q. Iyer, V. R., Anders, K., Eisen, M. B., Brown, P. O., Botstein,

D., and Futcher, B. (1998).Comprehensive identification of cell cycle-regulated genes of the yeast saccharomyces cerevisiae bymicroarray hybridization.Molecular Biology of the Cell, 9:3273–3297.

Wang, L. and Dunson, D. B. (2010).

Fast bayesian inference in dirichlet process mixture models.Journal of Computational and Graphical Statistics, In Press.

Ward, J. H. (1963).

Hierarchical grouping to optimize an objective function.Journal of the American Statistical Association, 58(301):236–244.