Advances in Deep Learning with Applications in Text and ......C. Alippi, G. Boracchi and M. Roveri,...

Post on 16-Jul-2020



[Figure. a) Dataset: observations versus T, with legend entries for the two classes ω and T*. b) Classification error as a function of time: Classification Error (%) versus T, comparing the JIT classifier, the Continuous Update Classifier, the Sliding Window Classifier and the Bayes error.]


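$C_i = (Z_i, F_i, D_i)$

• $Z_i = \{(\mathbf{x}_0, y_0), \dots, (\mathbf{x}_n, y_n)\}$: the labeled samples of the $i$th concept

• $F_i$: features of $p(\mathbf{x})$ in the $i$th concept: $M(\cdot)$, $V(\cdot)$

• $D_i$: $M(\cdot)$, $V(\cdot)$, $p_t(\cdot)$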

$C = (Z, F, D)$, with the operators $\mathcal{D}$, $\Upsilon$, $\mathcal{E}$ and $\mathcal{U}$.

$C_0$ is built from $TR$:

• $Z_0$: $p(y \mid \mathbf{x})$

• $F_0$: $p(\mathbf{x})$

• $D_0$

• $\mathcal{D}(C_i) \in \{0, 1\}$

• $\mathcal{D}(C_0) = 1$

• $\Upsilon(C_0) = (C_0, C_1)$

$$\begin{cases} H_0: \text{``$F_i$ contains i.i.d. samples''} \\ H_1: \text{``$F_i$ contains a change point''} \end{cases}$$
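The two hypotheses above ask whether a feature sequence is stationary or contains a change point. The following is a minimal sketch of one generic way to decide between them, by scanning every admissible split and keeping the largest standardized difference of segment means. It is not the test used in the slides; `change_point_scan`, `min_seg`, `THRESHOLD` and the synthetic `features` sequence are all assumptions made for illustration.

```python
import math
import random
from statistics import mean, variance

def change_point_scan(seq, min_seg=5):
    """Return (best_split, max_statistic) over all admissible split points."""
    best_k, best_stat = None, 0.0
    for k in range(min_seg, len(seq) - min_seg + 1):
        left, right = seq[:k], seq[k:]
        # Welch-type standardized difference between the two segment means.
        se = math.sqrt(variance(left) / len(left) + variance(right) / len(right))
        if se == 0.0:
            continue
        stat = abs(mean(left) - mean(right)) / se
        if stat > best_stat:
            best_k, best_stat = k, stat
    return best_k, best_stat

random.seed(0)
# Hypothetical feature sequence: the mean shifts from 0 to 3 at index 60.
features = ([random.gauss(0, 1) for _ in range(60)]
            + [random.gauss(3, 1) for _ in range(40)])

tau_hat, stat = change_point_scan(features)
THRESHOLD = 5.0  # assumed value; in practice it is set from a target false-alarm rate
reject_h0 = stat > THRESHOLD
```

When `reject_h0` is true, `tau_hat` serves as the estimate of where the sequence should be split.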

$\mathcal{E}(C_m, C_n) = 1$

• $F$: $\phi(\mathbf{x})$ in $C_m$ and $C_n$

• $\phi(y \mid \mathbf{x})$ in $C_m$ and $C_n$

P. Domingos and G. Hulten, "Mining high-speed data streams", in Proc. of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 71–80, 2000.

G. Hulten, L. Spencer, and P. Domingos, "Mining time-changing data streams", in Proc. of Conference on Knowledge Discovery in Data, pp. 97–106, 2001.

L. Cohen, G. Avrahami-Bakish, M. Last, A. Kandel, and O. Kipersztok, "Real-time data mining of non-stationary data streams from sensor networks", Information Fusion, vol. 9, no. 3, pp. 344–353, 2008.

Y. Ye, S. Squartini, and F. Piazza, "Online sequential extreme learning machine in nonstationary environments", Neurocomputing, vol. 116, no. 20, pp. 94–101, 2013.

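$\mathcal{H} = \{h_0, \dots, h_N\}$

$h_i, \; i = 1, \dots, N$

$$\mathcal{H}(\mathbf{x}_t) = \arg\max_{\omega \in \Lambda} \sum_{h_i \in \mathcal{H}} \alpha_i \, [h_i(\mathbf{x}_t) = \omega]$$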
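The ensemble decision above is a weighted vote: each classifier adds its weight to the label it predicts, and the label with the largest total wins. A minimal sketch, assuming classifiers are plain Python callables; `weighted_vote` and the toy threshold rules are hypothetical names used only for illustration.

```python
from collections import defaultdict

def weighted_vote(classifiers, weights, x):
    """Return the label with the largest total weight among the ensemble votes."""
    scores = defaultdict(float)
    for h, alpha in zip(classifiers, weights):
        scores[h(x)] += alpha  # alpha_i counts only for the label h_i predicts
    return max(scores, key=scores.get)

# Hypothetical toy ensemble: three threshold rules on a scalar input.
ensemble = [lambda x: int(x > 0), lambda x: int(x > 1), lambda x: int(x > 2)]
alphas = [0.5, 0.3, 0.4]

label = weighted_vote(ensemble, alphas, 1.5)  # votes: 1 (0.5), 1 (0.3), 0 (0.4)
```

Setting all weights equal reduces this to plain majority voting.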


W. N. Street and Y. Kim, "A streaming ensemble algorithm (SEA) for large scale classification", in Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 377–382, 2001.

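$S = \{(\mathbf{x}_0^t, y_0^t), (\mathbf{x}_1^t, y_1^t), \dots, (\mathbf{x}_B^t, y_B^t)\}$

• Train $h_t$ on $S$

• Test $h_{t-1}$ on $S$

• If $\#\mathcal{H} < N$, add $h_{t-1}$ to $\mathcal{H}$

• Otherwise, replace an $h_i \in \mathcal{H}$ that performs worse on $S$ than $h_{t-1}$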
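The batch-wise ensemble update can be sketched as follows. This is a simplified illustration, not the published procedure: it assumes plain accuracy as the quality measure, leaves the training of the new classifier to the caller, and uses the hypothetical names `accuracy` and `sea_step`.

```python
def accuracy(h, batch):
    """Fraction of (x, y) pairs in the batch that h labels correctly."""
    return sum(h(x) == y for x, y in batch) / len(batch)

def sea_step(ensemble, h_prev, batch, max_size):
    """Offer the previous classifier h_prev to the ensemble, judged on the new batch."""
    if h_prev is None:
        return ensemble
    if len(ensemble) < max_size:
        ensemble.append(h_prev)  # room left: always keep it
        return ensemble
    # Ensemble is full: find the weakest member on the new batch.
    worst = min(range(len(ensemble)), key=lambda i: accuracy(ensemble[i], batch))
    if accuracy(h_prev, batch) > accuracy(ensemble[worst], batch):
        ensemble[worst] = h_prev
    return ensemble

# Hypothetical batch and classifiers.
batch = [(0.2, 0), (0.8, 1), (1.4, 1), (-0.5, 0)]
always_zero = lambda x: 0
threshold = lambda x: int(x > 0.5)

ens = sea_step([always_zero], threshold, batch, max_size=1)
```

Here the full ensemble swaps `always_zero` (accuracy 0.5 on the batch) for `threshold` (accuracy 1.0).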


J. Kolter and M. Maloof, "Dynamic weighted majority: An ensemble method for drifting concepts", Journal of Machine Learning Research, vol. 8, pp. 2755–2790, 2007.

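For two classifiers $h_i$ and $h_k$:

$$Q_{i,k} = \frac{N_{11} N_{00} - N_{01} N_{10}}{N_{11} N_{00} + N_{01} N_{10}}$$

$N_{ab} = \#\{\mathbf{x} : h_i(\mathbf{x}) = a \text{ and } h_k(\mathbf{x}) = b\}$, with $a, b \in \{0, 1\}$

When $h_i$ and $h_k$ always agree, $N_{01} = N_{10} = 0$ and therefore $Q_{i,k} = 1$.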
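As a worked illustration of the Q statistic, the sketch below counts the four agreement cells for two binary classifiers over a set of inputs and applies the formula. The function name `q_statistic` and the toy classifiers are hypothetical; classifiers are assumed to return 0 or 1.

```python
def q_statistic(h_i, h_k, xs):
    """Q statistic of two binary classifiers over the inputs xs."""
    n = {(a, b): 0 for a in (0, 1) for b in (0, 1)}
    for x in xs:
        n[(h_i(x), h_k(x))] += 1  # cell N_ab: h_i outputs a, h_k outputs b
    agree = n[(1, 1)] * n[(0, 0)]
    disagree = n[(0, 1)] * n[(1, 0)]
    if agree + disagree == 0:
        raise ValueError("Q is undefined when the denominator is zero")
    return (agree - disagree) / (agree + disagree)

# Hypothetical classifiers on scalar inputs.
xs = [-1, 0.5, 1.5, 3]
h_a = lambda x: int(x > 0)
h_c = lambda x: int(x < 1)

q_same = q_statistic(h_a, h_a, xs)     # identical outputs
q_diverse = q_statistic(h_a, h_c, xs)  # mostly opposite outputs
```

Identical classifiers give the maximum value, while classifiers that mostly disagree push Q toward its minimum.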

L. L. Minku and X. Yao, "DDD: A New Ensemble Approach for Dealing with Concept Drift", IEEE Transactions on Knowledge and Data Engineering, vol. 24, no. 4, pp. 619–633, April 2012.


• Initially Labeled Data

• Receive Unlabeled Data

• Classify Using SSL

• Construct a Boundary

• Compact the Boundary

• Extract Core Set


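$$\mathrm{perf}^{(t)} = \begin{cases} \mathrm{perf}_{ex}^{(t)}, & \text{if } t = 1 \\ \dfrac{(t-1)\,\mathrm{perf}^{(t-1)} + \mathrm{perf}_{ex}^{(t)}}{t}, & \text{otherwise} \end{cases}$$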
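The recursion above is a running mean of the per-example performance, updated one example at a time without storing the history. A minimal sketch, where `prequential_mean` is a hypothetical name and the input values stand for per-example scores such as 1 for a correct prediction and 0 for an error:

```python
def prequential_mean(per_example_perf):
    """Yield perf(t) for t = 1, 2, ... from the per-example values perf_ex(t)."""
    perf = 0.0
    for t, perf_ex in enumerate(per_example_perf, start=1):
        perf = perf_ex if t == 1 else ((t - 1) * perf + perf_ex) / t
        yield perf

history = list(prequential_mean([1, 0, 1, 1]))
```

Each value in `history` equals the plain average of the scores seen so far, so every past example keeps the same influence.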


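$$\mathrm{perf}^{(t)} = \begin{cases} \mathrm{perf}_{ex}^{(t)}, & \text{if } t = 1 \\ \eta \cdot \mathrm{perf}^{(t-1)} + (1 - \eta) \cdot \mathrm{perf}_{ex}^{(t)}, & \text{otherwise} \end{cases}$$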
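With the fading factor $\eta$, older examples are discounted geometrically, so the estimate follows recent performance more closely after a change. A minimal sketch of this update; `fading_perf` and the default `eta=0.9` are assumptions for illustration, not values taken from the slides.

```python
def fading_perf(per_example_perf, eta=0.9):
    """Yield the exponentially faded performance perf(t) for t = 1, 2, ..."""
    perf = 0.0
    for t, perf_ex in enumerate(per_example_perf, start=1):
        perf = perf_ex if t == 1 else eta * perf + (1 - eta) * perf_ex
        yield perf

faded = list(fading_perf([1, 0, 0], eta=0.5))
```

A smaller `eta` forgets faster; as `eta` approaches 1 the estimate reacts more and more slowly to new examples.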


D. M. Hawkins, P. Qiu, and C. W. Kang, "The changepoint model for statistical process control", Journal of Quality Technology, 2003.

C. Alippi, G. Boracchi and M. Roveri, "Just In Time Classifiers for Recurrent Concepts", IEEE Transactions on Neural Networks and Learning Systems, vol. 24, no. 4, pp. 620–634, 2013.