Database Implementation of a Model-Free Classifier

Database Implementation of a Model-Free Classifier Konstantinos Morfonios ADBIS 2007 University of Athens


Transcript of Database Implementation of a Model-Free Classifier

Page 1: Database Implementation of a Model-Free Classifier

Database Implementation of a Model-Free Classifier

Konstantinos Morfonios

ADBIS 2007

University of Athens

Page 2: Database Implementation of a Model-Free Classifier

Introduction

Motivation

LOCUS

Parallel Execution

Experimental Evaluation

Conclusions & Future Work

Page 4: Database Implementation of a Model-Free Classifier

Introduction

Classification

x = <x1, x2, …, xD>, ω = f(x), with ω ∈ {ω1, ω2}

Page 5: Database Implementation of a Model-Free Classifier

Introduction

Training set:
<x1,1, x1,2, …, x1,D, ω1>
<x2,1, x2,2, …, x2,D, ω2>
<x3,1, x3,2, …, x3,D, ω1>
<x4,1, x4,2, …, x4,D, ω1>
...

Tuples to classify:
x1 = <x1, x2, …, xD>
x2 = <x1, x2, …, xD>

“Lazy” (Nearest Neighbors) vs. “Eager” (Decision Trees)

Eager:
(+) Faster decisions
(-) Large/complex datasets
(-) Dynamic datasets
(-) Dynamic models

Page 15: Database Implementation of a Model-Free Classifier

Motivation

• Large/complex datasets
• Dynamic datasets
• Dynamic models

Lazy (model-free)

Nearest Neighbors

Disk-based

Page 16: Database Implementation of a Model-Free Classifier

Motivation

Nearest Neighbors

Suffers from the “curse of dimensionality”
• Not reliable [Beyer et al., ICDT 1999]
• Not indexable [Shaft et al., ICDT 2005]

LOCUS (Lazy Optimal Classifier of Unlimited Scalability)

Page 24: Database Implementation of a Model-Free Classifier

Motivation

LOCUS (Lazy Optimal Classifier of Unlimited Scalability)

• Category? Lazy
• Scaling? Based on simple SQL queries
• Accuracy? Converges to the optimal Bayes classifier
• Other features? Parallelizable

Page 27: Database Implementation of a Model-Free Classifier

LOCUS

Example

x = <f1, f2>, with f1 ∈ [0, 20] and f2 ∈ [0, 10]

Two classes: ω1 and ω2

[Figure: plot of the f1 by f2 feature space]

Page 28: Database Implementation of a Model-Free Classifier

LOCUS

Ideally: Dense space

[Figure: plot of the f1 by f2 feature space]

Page 29: Database Implementation of a Model-Free Classifier

LOCUS

Ideally: Dense space

[Figure: plot of the f1 (0 to 20) by f2 (0 to 10) feature space]

ω(<7, 4>) = ?

Page 30: Database Implementation of a Model-Free Classifier

LOCUS

[Figure: plot of the f1 (0 to 20) by f2 (0 to 10) feature space]

ω(<7, 4>) =

Page 31: Database Implementation of a Model-Free Classifier

LOCUS

Reality:
• Many features
• Large domains
→ Sparse space

[Figure: plot of the f1 (0 to 20) by f2 (0 to 10) feature space]

Page 32: Database Implementation of a Model-Free Classifier

LOCUS

Reality:
• Many features
• Large domains
→ Sparse space

[Figure: plot of the f1 (0 to 20) by f2 (0 to 10) feature space]

ω(<7, 4>) = ?

Page 33: Database Implementation of a Model-Free Classifier

LOCUS

3-NN

[Figure: plot of the f1 (0 to 20) by f2 (0 to 10) feature space]

ω(<7, 4>) = ?

ω1: 2, ω2: 1

Page 34: Database Implementation of a Model-Free Classifier

LOCUS

3-NN

[Figure: plot of the f1 (0 to 20) by f2 (0 to 10) feature space]

ω1: 2, ω2: 1

ω(<7, 4>) = ω1
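The 3-NN vote on this slide can be illustrated with a small sketch. The training tuples below are hypothetical (the points plotted on the slide are not recoverable from the transcript); they are chosen only so that the three nearest neighbors of <7, 4> split 2 to 1, as on the slide. The labels "w1" and "w2" stand in for ω1 and ω2, and `knn_classify` is an illustrative name, not something from the talk.

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """Majority vote among the k training tuples closest to `query`.

    Each training tuple is (f1, ..., fD, label); distance is Euclidean.
    Returns (winning label, per-class vote counts).
    """
    nearest = sorted(train, key=lambda t: math.dist(t[:-1], query))[:k]
    votes = Counter(t[-1] for t in nearest)
    return votes.most_common(1)[0][0], votes

# Hypothetical training set, NOT the data shown on the slide.
train = [(6, 4, "w1"), (8, 5, "w1"), (7, 6, "w2"), (15, 2, "w2"), (2, 9, "w1")]
label, votes = knn_classify(train, (7, 4))
```

Every call sorts the whole training set by distance, which is the step that becomes expensive for large, disk-resident data.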

Page 35: Database Implementation of a Model-Free Classifier

LOCUS

[Figure: plot of the f1 (0 to 20) by f2 (0 to 10) feature space]

ω(<7, 4>) = ?

Page 36: Database Implementation of a Model-Free Classifier

LOCUS

[Figure: plot of the f1 (0 to 20) by f2 (0 to 10) feature space]

ω(<7, 4>) = ?

ω1: 7, ω2: 3

Page 37: Database Implementation of a Model-Free Classifier

LOCUS

[Figure: plot of the f1 (0 to 20) by f2 (0 to 10) feature space]

ω1: 7, ω2: 3

ω(<7, 4>) = ω1

Page 38: Database Implementation of a Model-Free Classifier

LOCUS

Disk-based implementation

[Figure: plot of the f1 (0 to 20) by f2 (0 to 10) feature space]

Page 39: Database Implementation of a Model-Free Classifier

LOCUS

[Figure: plot of the f1 (0 to 20) by f2 (0 to 10) feature space, with a box of size 2δ1 by 2δ2 around <x1, x2>]

R(f1, f2, ω)

SELECT ω, count(*)
FROM R
WHERE f1≥x1-δ1 AND f1≤x1+δ1 AND f2≥x2-δ2 AND f2≤x2+δ2
GROUP BY ω

ω1: 7, ω2: 3

ω(<7, 4>) = ω1
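The query on this slide can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch under stated assumptions: the column `w` is an ASCII stand-in for ω, the table contents are hypothetical (picked so that the box around <7, 4> with δ1 = 3 and δ2 = 2 contains 7 tuples of one class and 3 of the other, as on the slide), and `locus_classify` is an illustrative name. `BETWEEN` is inclusive, so it matches the ≥ / ≤ pairs of the original query.

```python
import sqlite3

def locus_classify(conn, x, delta):
    """Count tuples per class inside the box x ± delta and pick the majority.

    Returns (majority class or None if the box is empty, {class: count}).
    """
    (x1, x2), (d1, d2) = x, delta
    rows = conn.execute(
        "SELECT w, COUNT(*) FROM R "
        "WHERE f1 BETWEEN ? AND ? AND f2 BETWEEN ? AND ? "
        "GROUP BY w",
        (x1 - d1, x1 + d1, x2 - d2, x2 + d2),
    ).fetchall()
    counts = dict(rows)
    label = max(counts, key=counts.get) if counts else None
    return label, counts

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (f1 REAL, f2 REAL, w TEXT)")
# Hypothetical training tuples, NOT the data shown on the slide.
conn.executemany("INSERT INTO R VALUES (?, ?, ?)", [
    (5, 3, "w1"), (6, 4, "w1"), (6, 5, "w1"), (7, 2, "w1"),
    (8, 3, "w1"), (8, 5, "w1"), (9, 4, "w1"),           # inside the box
    (5, 5, "w2"), (7, 6, "w2"), (10, 2, "w2"),          # inside the box
    (1, 1, "w2"), (15, 8, "w2"), (3, 4, "w1"),
    (12, 4, "w1"), (7, 7, "w2"),                        # outside the box
])

label, counts = locus_classify(conn, (7, 4), (3, 2))
```

No model is built in advance: each decision is a single aggregate query over the training relation, which is what makes the classifier lazy.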

Page 40: Database Implementation of a Model-Free Classifier

LOCUS

What if R is large?

Classical optimization techniques for a well-known type of aggregate queries
• Indexing
• Presorting
• Materialized views
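As one illustration of the indexing option, the sketch below adds a composite index to a hypothetical relation in SQLite and checks that the counts are unchanged. The table contents, the index name `idx_r_f1_f2_w` and the choice of a composite index on (f1, f2, w) are assumptions made for this example; the slides do not say which index structure the author used, and whether the planner actually picks the index depends on the engine and the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (f1 REAL, f2 REAL, w TEXT)")
# Hypothetical synthetic tuples with f1 in [0, 20] and f2 in [0, 10].
conn.executemany(
    "INSERT INTO R VALUES (?, ?, ?)",
    [(i % 21, (i * 7) % 11, "w1" if i % 3 else "w2") for i in range(1000)],
)

QUERY = (
    "SELECT w, COUNT(*) FROM R "
    "WHERE f1 BETWEEN 4 AND 10 AND f2 BETWEEN 2 AND 6 "
    "GROUP BY w"
)

before = dict(conn.execute(QUERY).fetchall())

# Composite index on the filter columns; adding w may let the engine
# answer the query from the index without touching the base table.
conn.execute("CREATE INDEX idx_r_f1_f2_w ON R (f1, f2, w)")
after = dict(conn.execute(QUERY).fetchall())

# Inspect the access path the planner chose (output varies by SQLite version).
plan = conn.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()
for row in plan:
    print(row[-1])
```

The index changes only how the tuples inside the box are found, not which tuples are counted, so the classification result stays the same.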

Page 41: Database Implementation of a Model-Free Classifier

LOCUS

Method reliability?

LOCUS converges to the optimal Bayes classifier as the size of the dataset increases (proof in the paper)
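The decision rule implied by the query can be written down explicitly. The notation below (n_ω, the hat on ω, N for the number of training tuples) is introduced here for illustration and is not taken from the paper. The last line is only the informal intuition behind the convergence claim, not the paper's proof.

```latex
% Count of training tuples of class \omega inside the box around x
n_{\omega}(x) = \bigl|\{\, t \in R : t.\omega = \omega,\;
                 |t.f_i - x_i| \le \delta_i \ \text{for all } i \,\}\bigr|

% LOCUS decision: the class with the largest count
\hat{\omega}(x) = \arg\max_{\omega} \; n_{\omega}(x)

% Bayes-optimal decision, for comparison
\omega^{*}(x) = \arg\max_{\omega} \; P(\omega \mid x)

% Intuition: n_{\omega}(x)/N estimates the probability that a tuple has
% class \omega and falls in the box; with more data the box can shrink
% around x, so the two arg max rules are expected to agree.
\frac{n_{\omega}(x)}{N} \approx
  P\bigl(\omega,\; |f_i - x_i| \le \delta_i \ \text{for all } i\bigr)
```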

Page 43: Database Implementation of a Model-Free Classifier

LOCUS

What if a feature, say f2, is categorical? (e.g. sex)

R(f1, f2, ω)

SELECT ω, count(*)
FROM R
WHERE f1≥x1-δ1 AND f1≤x1+δ1 AND f2=x2
GROUP BY ω

Not a problem, since generally in practice:
• Combinations of categorical and numeric features
• Categorical features have small domains

Hence, they do not contribute to sparsity
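A small sketch of the mixed case, again with `sqlite3`: the numeric feature keeps its range predicate and the categorical one is matched by equality. The table contents, the "F"/"M" encoding and the name `classify_mixed` are hypothetical, and `w` stands in for ω.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (f1 REAL, f2 TEXT, w TEXT)")
# Hypothetical tuples: f1 numeric, f2 categorical.
conn.executemany("INSERT INTO R VALUES (?, ?, ?)", [
    (5, "F", "w1"), (6, "F", "w1"), (8, "F", "w2"),
    (7, "M", "w2"), (9, "M", "w2"), (20, "F", "w2"),
])

def classify_mixed(conn, x1, x2, d1):
    """Per-class counts with a range on numeric f1 and equality on categorical f2."""
    rows = conn.execute(
        "SELECT w, COUNT(*) FROM R "
        "WHERE f1 BETWEEN ? AND ? AND f2 = ? "
        "GROUP BY w",
        (x1 - d1, x1 + d1, x2),
    ).fetchall()
    return dict(rows)

counts = classify_mixed(conn, 7, "F", 3)
```

Only tuples with the same categorical value as the query point are counted, so no δ is needed for f2.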

Page 46: Database Implementation of a Model-Free Classifier

Parallel Execution

R = R1 ∪ R2 ∪ R3 ∪ R4

[Figure: the same SELECT query is issued against each of the four partitions R1, R2, R3, R4]

Page 47: Database Implementation of a Model-Free Classifier

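Parallel Execution

Count: distributive function

Partial results from the four partitions (R1 to R4):
• ω1: 5, ω2: 2
• ω1: 7, ω2: 1
• ω1: 5, ω2: 1
• ω1: 6, ω2: 0

Running totals (ω1, ω2): (5, 2), (12, 3), (18, 3), (23, 4)

Total: ω1: 23, ω2: 4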
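Because count is distributive, the per-partition results can simply be added. The sketch below merges the four partial results shown on the slide with `collections.Counter`; the labels "w1" and "w2" stand in for ω1 and ω2, and the order of the partitions is not significant.

```python
from collections import Counter

# Partial per-class counts as shown on the slide, one Counter per partition.
partials = [
    Counter({"w1": 5, "w2": 2}),
    Counter({"w1": 7, "w2": 1}),
    Counter({"w1": 5, "w2": 1}),
    Counter({"w1": 6, "w2": 0}),
]

# Adding Counters sums the counts class by class.
total = sum(partials, Counter())
label = total.most_common(1)[0][0]
```

The main server only adds a handful of small numbers per decision, regardless of how large each partition is.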

Page 48: Database Implementation of a Model-Free Classifier

Parallel Execution

• Small network traffic
• Load balancing
• Lightweight operations on the main server

[Figure: each partition R1 to R4 runs the SELECT locally and returns only its per-class counts (ω1: 7, ω2: 1; ω1: 5, ω2: 1; ω1: 6, ω2: 0; ω1: 5, ω2: 2) to the main server]
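The scheme on this slide can be mimicked end to end in a few lines. This is only a sketch under stated assumptions: threads in one process stand in for separate database servers, in-memory lists stand in for the partitions R1 to R4, the data is synthetic, and the names `count_in_box` and `parallel_classify` are illustrative. The slides list a parallel implementation as future work, so nothing here reflects the author's code.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def count_in_box(partition, x, delta):
    """Per-class counts of tuples (f1, f2, w) inside the box x ± delta.

    Plays the role of the SELECT ... GROUP BY query run on one partition.
    """
    return Counter(
        w for f1, f2, w in partition
        if abs(f1 - x[0]) <= delta[0] and abs(f2 - x[1]) <= delta[1]
    )

def parallel_classify(partitions, x, delta):
    """Run the count on every partition concurrently, then merge on the caller."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(pool.map(lambda p: count_in_box(p, x, delta), partitions))
    total = sum(partials, Counter())  # count is distributive: just add
    label = total.most_common(1)[0][0] if total else None
    return label, total

# Hypothetical synthetic relation, split round-robin into four partitions.
R = [(i % 21, (i * 7) % 11, "w1" if i % 3 else "w2") for i in range(400)]
partitions = [R[i::4] for i in range(4)]

label, total = parallel_classify(partitions, (7, 4), (3, 2))
```

Each worker returns one small counter per query, which is why network traffic and the work left for the main server stay small.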

Page 51: Database Implementation of a Model-Free Classifier

Experimental Evaluation

LOCUS vs. DTs and NNs (Weka)

Synthetic datasets
• Ten functions [Agrawal et al., IEEE TKDE 1993]
• D = 9
• N ∈ [5·10^3, 5·10^6]

Real-world datasets
• UCI Repository

Page 52: Database Implementation of a Model-Free Classifier

Experimental Evaluation

[Figure: error rate (%) per function (1 to 10) for LOCUS and DT]

Classification error rate (synthetic datasets, N = 5·10^4)

Page 53: Database Implementation of a Model-Free Classifier

Experimental Evaluation

[Figure: error rate (%) per function (1 to 10) for N = 5000, 50000, 500000, 5000000]

Effect of dataset size on classification error rate of LOCUS (synthetic datasets, N ∈ [5·10^3, 5·10^6])

Page 54: Database Implementation of a Model-Free Classifier

Experimental Evaluation

[Figure: average decision time (msec) vs. training set size (1.E+03 to 1.E+07)]

Effect of dataset size on time scalability of LOCUS (synthetic datasets, N ∈ [5·10^3, 5·10^6])

Page 55: Database Implementation of a Model-Free Classifier

Experimental Evaluation

[Figure: error rate (%) of LOCUS and DT on the datasets Patient, Glass, Liver, BreastCancer, Diabetes, Letters, CovType 50000]

Classification error rate (real-world datasets)

Page 56: Database Implementation of a Model-Free Classifier

Experimental Evaluation

[Figure: error rate (%) of DT and LOCUS for N = 5000, 50000, 500000]

Effect of dataset size on classification error rate (dataset CovType, N ∈ [5·10^3, 5·10^5])

Page 59: Database Implementation of a Model-Free Classifier

Conclusions & Future Work

LOCUS
• Lazy (complex/dynamic datasets and models)
• Efficient (based on simple SQL queries)
• Reliable (converging to optimal)
• Parallelizable

Page 60: Database Implementation of a Model-Free Classifier

Conclusions & Future Work

Similar techniques for
• feature selection
• regression

Implementation of a parallel version

Page 61: Database Implementation of a Model-Free Classifier

Questions?

Page 62: Database Implementation of a Model-Free Classifier

Thank you!