Alpha Algorithm: Limitations - courses.edsa-project.eu

35
©Wil van der Aalst & TU/e (use only with permission & acknowledgements) Process Mining: Data Science in Action Alpha Algorithm: Limitations prof.dr.ir. Wil van der Aalst www.processmining.org

Transcript of Alpha Algorithm: Limitations - courses.edsa-project.eu

Page 1: Alpha Algorithm: Limitations - courses.edsa-project.eu

©Wil van der Aalst & TU/e (use only with permission & acknowledgements)

Process Mining: Data Science in Action

Alpha Algorithm: Limitations

prof.dr.ir. Wil van der Aalst www.processmining.org

Page 2: Alpha Algorithm: Limitations - courses.edsa-project.eu

Let L be an event log over T. α(L) is defined as follows. 1. TL = { t ∈ T | ∃σ ∈ L t ∈ σ}, 2. TI = { t ∈ T | ∃σ ∈ L t = first(σ) }, 3. TO = { t ∈ T | ∃σ ∈ L t = last(σ) }, 4. XL = { (A,B) | A ⊆ TL ∧ A ≠ ø ∧ B ⊆ TL ∧ B

≠ ø ∧ ∀a ∈ A∀b ∈ B a →L b ∧ ∀a1,a2 ∈ A a1#L a2 ∧ ∀b1,b2 ∈ B b1#L b2 },

5. YL = { (A,B) ∈ XL | ∀(A′,B′) ∈ XL A ⊆ A′ ∧B ⊆ B′⇒ (A,B) = (A′,B′) },

6. PL = { p(A,B) | (A,B) ∈ YL } ∪{iL,oL}, 7. FL = { (a,p(A,B)) | (A,B) ∈ YL ∧ a ∈ A } ∪ {

(p(A,B),b) | (A,B) ∈ YL ∧ b ∈ B } ∪{ (iL,t) | t ∈ TI} ∪{ (t,oL) | t ∈ TO}, and

8. α(L) = (PL,TL,FL).

Page 3: Alpha Algorithm: Limitations - courses.edsa-project.eu

©Wil van der Aalst & TU/e (use only with permission & acknowledgements)

Page 4: Alpha Algorithm: Limitations - courses.edsa-project.eu

©Wil van der Aalst & TU/e (use only with permission & acknowledgements)

Alpha algorithm

start register request

examine thoroughly

examine casually

check ticket

decide

pay compensation

reject request

reinitiate request

end

c1

c2

c3

c4

c5

Page 5: Alpha Algorithm: Limitations - courses.edsa-project.eu

©Wil van der Aalst & TU/e (use only with permission & acknowledgements)

Limitation of the α algorithm: Implicit places

g

a

c

d

e

fb

p1

p2

p1 and p2 are implicit places!

PAGE 4

Page 6: Alpha Algorithm: Limitations - courses.edsa-project.eu

©Wil van der Aalst & TU/e (use only with permission & acknowledgements)

Limitation of the α algorithm: Loops of length 1

a c

b

a c

b

PAGE 5

a>b a>c b>b b>c

a→b a→c b→c

b||b a#a c#c

desired model:

Page 7: Alpha Algorithm: Limitations - courses.edsa-project.eu

©Wil van der Aalst & TU/e (use only with permission & acknowledgements)

Limitation of the α algorithm: Loops of length 2

a b d

c

a

b

d

cPAGE 6

a>b b>c b>d c>b

a→b b→d

b||c

desired model:

Page 8: Alpha Algorithm: Limitations - courses.edsa-project.eu

©Wil van der Aalst & TU/e (use only with permission & acknowledgements)

Limitation of the α algorithm: Non-local dependencies

PAGE 7

b

c

a

e

d??

??

Page 9: Alpha Algorithm: Limitations - courses.edsa-project.eu

©Wil van der Aalst & TU/e (use only with permission & acknowledgements)

Limitation of the α algorithm: Non-local dependencies

b

c

a

e

dp1

p2

p1 and p2 are not discovered! PAGE 8

Page 10: Alpha Algorithm: Limitations - courses.edsa-project.eu

©Wil van der Aalst & TU/e (use only with permission & acknowledgements)

Two event logs: Same discovered model

PAGE 9

b

c

a

e

d

Page 11: Alpha Algorithm: Limitations - courses.edsa-project.eu

©Wil van der Aalst & TU/e (use only with permission & acknowledgements)

Difficult constructs for the Alpha algorithm

a

b

c

PAGE 10

Page 12: Alpha Algorithm: Limitations - courses.edsa-project.eu

©Wil van der Aalst & TU/e (use only with permission & acknowledgements)

Question

PAGE 11

Consider the event log:

What model will the Alpha algorithm create?

Give a sound WF-net that can produce the observed behavior and nothing more?

Page 13: Alpha Algorithm: Limitations - courses.edsa-project.eu

©Wil van der Aalst & TU/e (use only with permission & acknowledgements)

Answer (1/2): Model generated by Alpha algorithm

b

c

a

e

d

Model generated by Alpha algorithm also allows for trace starting with b and ending with d!

Page 14: Alpha Algorithm: Limitations - courses.edsa-project.eu

©Wil van der Aalst & TU/e (use only with permission & acknowledgements)

Answer (2/2): A sound WF-net that can produce the observed behavior and nothing more

b

c

a

e

d

e

Note the duplicated e transition! The Alpha algorithm will never create a WF-net with two transitions having the same label.

Page 15: Alpha Algorithm: Limitations - courses.edsa-project.eu

©Wil van der Aalst & TU/e (use only with permission & acknowledgements)

Limitation of the α algorithm: representational bias

a a

start endp

There is no WF-net with unique visible labels that exhibits this behavior.

Page 16: Alpha Algorithm: Limitations - courses.edsa-project.eu

©Wil van der Aalst & TU/e (use only with permission & acknowledgements)

Another example

There is no WF-net with unique visible labels that exhibits this behavior.

a b

start endp1

c

τ

p1

a b

start endp1

c

p1

a

a b

start endp1

c

p1

Page 17: Alpha Algorithm: Limitations - courses.edsa-project.eu

©Wil van der Aalst & TU/e (use only with permission & acknowledgements)

OR-split/join model

• Let us take an event log containing all possible full firing sequences and apply the Alpha algorithm.

• What will happen?

a

b

start end

d

c

b

c

Page 18: Alpha Algorithm: Limitations - courses.edsa-project.eu

©Wil van der Aalst & TU/e (use only with permission & acknowledgements)

Applying the Alpha algorithm using ProM

???

Page 19: Alpha Algorithm: Limitations - courses.edsa-project.eu

©Wil van der Aalst & TU/e (use only with permission & acknowledgements)

Region-based miner (with label splitting)

Page 20: Alpha Algorithm: Limitations - courses.edsa-project.eu

©Wil van der Aalst & TU/e (use only with permission & acknowledgements)

Limitation of the α algorithm: resulting model does not need to be a sound WF-net

b

a

c

d

e

f

The discovered model is not sound (has deadlock).

Page 21: Alpha Algorithm: Limitations - courses.edsa-project.eu

©Wil van der Aalst & TU/e (use only with permission & acknowledgements)

Challenge: Noise and Incompleteness • To discover a suitable process model it is assumed that

the event log contains a representative sample of behavior.

• Two related phenomena: − Noise: the event log contains rare and infrequent

behavior not representative for the typical behavior of the process.

− Incompleteness: the event log contains too few events to be able to discover some of the underlying control-flow structures.

Page 22: Alpha Algorithm: Limitations - courses.edsa-project.eu

©Wil van der Aalst & TU/e (use only with permission & acknowledgements)

Flower model

g

ac

d

ef

b

start end

h

Page 23: Alpha Algorithm: Limitations - courses.edsa-project.eu

©Wil van der Aalst & TU/e (use only with permission & acknowledgements)

What is the best model?

⟨a,c,d⟩99

⟨b,c,e⟩85

a d

c

eb

a d

c

eb

Page 24: Alpha Algorithm: Limitations - courses.edsa-project.eu

©Wil van der Aalst & TU/e (use only with permission & acknowledgements)

What is the best model? a d

c

eb

a d

c

eb

⟨a,c,d⟩99

⟨a,c,e⟩50

⟨b,c,e⟩85

⟨b,c,d⟩48

Page 25: Alpha Algorithm: Limitations - courses.edsa-project.eu

©Wil van der Aalst & TU/e (use only with permission & acknowledgements)

What is the best model? a d

c

eb

a d

c

eb

⟨a,c,d⟩99

⟨a,c,e⟩1

⟨b,c,e⟩85

⟨b,c,d⟩2

Page 26: Alpha Algorithm: Limitations - courses.edsa-project.eu

©Wil van der Aalst & TU/e (use only with permission & acknowledgements)

Alpha algorithm cannot deal with noise (rare behavior, i.e., outliers)

Page 27: Alpha Algorithm: Limitations - courses.edsa-project.eu

©Wil van der Aalst & TU/e (use only with permission & acknowledgements)

Process models are like maps: we may not want to see all paths and only see the highways

Page 28: Alpha Algorithm: Limitations - courses.edsa-project.eu

©Wil van der Aalst & TU/e (use only with permission & acknowledgements)

Related to noise: Completeness

PAGE 27

start register request

examine thoroughly

examine casually

check ticket

decide

pay compensation

reject request

reinitiate request

end

c1

c2

c3

c4

c5

Infinitely many possible traces, 7 possible states

a

start

b1

c

...

b2

b3

bk

end

k number of states: 2k+2

number of different traces: k!

1 4 1 2 6 2 5 34 120

10 1026 3628800 20 1048578 2.432902e+18

Page 29: Alpha Algorithm: Limitations - courses.edsa-project.eu

©Wil van der Aalst & TU/e (use only with permission & acknowledgements)

Alpha algorithm depends on the directly follows relation

a

start

b1

c

...

b2

b3

bk

end

k number of states: 2k+2

number of different traces: k!

1 4 1 2 6 2 5 34 120

10 1026 3628800 20 1048578 2.432902e+18

Only k(k-1) observations are needed to discover the concurrent part. However, if one of these is missing, the result will be incorrect.

Page 30: Alpha Algorithm: Limitations - courses.edsa-project.eu

cannot assume completeness

Page 31: Alpha Algorithm: Limitations - courses.edsa-project.eu

©Wil van der Aalst & TU/e (use only with permission & acknowledgements)

Limitations (1/2)

• Implicit places (places that are redundant): harmless and be solved through preprocessing.

• Loops of length 1: can be solved in multiple ways (change of algorithm or pre/post-processing).

• Loops of length 2: idem. • Non-local dependencies: foundational problem,

not specific for Alpha algorithm.

Page 32: Alpha Algorithm: Limitations - courses.edsa-project.eu

©Wil van der Aalst & TU/e (use only with permission & acknowledgements)

Limitations (2/2)

• Representational bias (cannot discover transitions with duplicate or invisible labels): other algorithms may have a different bias.

• Discovered model does not need to be sound: some algorithms ensure this.

• Noise: foundational problem, not specific for Alpha algorithm.

• Incompleteness: also a foundational problem.

Page 33: Alpha Algorithm: Limitations - courses.edsa-project.eu

©Wil van der Aalst & TU/e (use only with permission & acknowledgements)

How to measure the quality of a discovered model?

• There may be conflicting requirements (simplicity versus accuracy).

• Confusion matrix and F1-score have the problem that we do not have negative examples.

• Topics will be discussed later. • For the moment, we only mention the

rediscovery problem as a quality criterion.

Page 34: Alpha Algorithm: Limitations - courses.edsa-project.eu

©Wil van der Aalst & TU/e (use only with permission & acknowledgements)

Rediscovering process models

The rediscovery problem: Is the discovered model N’ "equivalent" to the original model N?

discoveredprocessmodel

original process model event

log

simulate discover

N=N’ ?N N’

Page 35: Alpha Algorithm: Limitations - courses.edsa-project.eu

©Wil van der Aalst & TU/e (use only with permission & acknowledgements)

Part I: PreliminariesChapter 2 Process Modeling and Analysis

Chapter 3Data Mining

Part II: From Event Logs to Process ModelsChapter 4 Getting the Data

Chapter 5 Process Discovery: An Introduction

Chapter 6 Advanced Process Discovery Techniques

Part III: Beyond Process DiscoveryChapter 7 Conformance Checking

Chapter 8 Mining Additional Perspectives

Chapter 9 Operational Support

Part IV: Putting Process Mining to WorkChapter 10 Tool Support

Chapter 11 Analyzing “Lasagna Processes”

Chapter 12 Analyzing “Spaghetti Processes”

Part V: ReflectionChapter 13Cartography and Navigation

Chapter 14Epilogue

Chapter 1 Introduction

©Wil van der Aalst & TU/e (use only with permission & acknowledgements)