©Wil van der Aalst & TU/e (use only with permission & acknowledgements)
Process Mining: Data Science in Action
Alpha Algorithm: Limitations
prof.dr.ir. Wil van der Aalst www.processmining.org
Let L be an event log over T. α(L) is defined as follows. 1. TL = { t ∈ T | ∃σ ∈ L t ∈ σ}, 2. TI = { t ∈ T | ∃σ ∈ L t = first(σ) }, 3. TO = { t ∈ T | ∃σ ∈ L t = last(σ) }, 4. XL = { (A,B) | A ⊆ TL ∧ A ≠ ø ∧ B ⊆ TL ∧ B
≠ ø ∧ ∀a ∈ A∀b ∈ B a →L b ∧ ∀a1,a2 ∈ A a1#L a2 ∧ ∀b1,b2 ∈ B b1#L b2 },
5. YL = { (A,B) ∈ XL | ∀(A′,B′) ∈ XL A ⊆ A′ ∧B ⊆ B′⇒ (A,B) = (A′,B′) },
6. PL = { p(A,B) | (A,B) ∈ YL } ∪{iL,oL}, 7. FL = { (a,p(A,B)) | (A,B) ∈ YL ∧ a ∈ A } ∪ {
(p(A,B),b) | (A,B) ∈ YL ∧ b ∈ B } ∪{ (iL,t) | t ∈ TI} ∪{ (t,oL) | t ∈ TO}, and
8. α(L) = (PL,TL,FL).
©Wil van der Aalst & TU/e (use only with permission & acknowledgements)
©Wil van der Aalst & TU/e (use only with permission & acknowledgements)
Alpha algorithm
start register request
examine thoroughly
examine casually
check ticket
decide
pay compensation
reject request
reinitiate request
end
c1
c2
c3
c4
c5
©Wil van der Aalst & TU/e (use only with permission & acknowledgements)
Limitation of the α algorithm: Implicit places
g
a
c
d
e
fb
p1
p2
p1 and p2 are implicit places!
PAGE 4
©Wil van der Aalst & TU/e (use only with permission & acknowledgements)
Limitation of the α algorithm: Loops of length 1
a c
b
a c
b
PAGE 5
a>b a>c b>b b>c
a→b a→c b→c
b||b a#a c#c
desired model:
©Wil van der Aalst & TU/e (use only with permission & acknowledgements)
Limitation of the α algorithm: Loops of length 2
a b d
c
a
b
d
cPAGE 6
a>b b>c b>d c>b
a→b b→d
b||c
desired model:
©Wil van der Aalst & TU/e (use only with permission & acknowledgements)
Limitation of the α algorithm: Non-local dependencies
PAGE 7
b
c
a
e
d??
??
©Wil van der Aalst & TU/e (use only with permission & acknowledgements)
Limitation of the α algorithm: Non-local dependencies
b
c
a
e
dp1
p2
p1 and p2 are not discovered! PAGE 8
©Wil van der Aalst & TU/e (use only with permission & acknowledgements)
Two event logs: Same discovered model
PAGE 9
b
c
a
e
d
©Wil van der Aalst & TU/e (use only with permission & acknowledgements)
Difficult constructs for the Alpha algorithm
a
b
c
PAGE 10
©Wil van der Aalst & TU/e (use only with permission & acknowledgements)
Question
PAGE 11
Consider the event log:
What model will the Alpha algorithm create?
Give a sound WF-net that can produce the observed behavior and nothing more?
©Wil van der Aalst & TU/e (use only with permission & acknowledgements)
Answer (1/2): Model generated by Alpha algorithm
b
c
a
e
d
Model generated by Alpha algorithm also allows for trace starting with b and ending with d!
©Wil van der Aalst & TU/e (use only with permission & acknowledgements)
Answer (2/2): A sound WF-net that can produce the observed behavior and nothing more
b
c
a
e
d
e
Note the duplicated e transition! The Alpha algorithm will never create a WF-net with two transitions having the same label.
©Wil van der Aalst & TU/e (use only with permission & acknowledgements)
Limitation of the α algorithm: representational bias
a a
start endp
There is no WF-net with unique visible labels that exhibits this behavior.
©Wil van der Aalst & TU/e (use only with permission & acknowledgements)
Another example
There is no WF-net with unique visible labels that exhibits this behavior.
a b
start endp1
c
τ
p1
a b
start endp1
c
p1
a
a b
start endp1
c
p1
©Wil van der Aalst & TU/e (use only with permission & acknowledgements)
OR-split/join model
• Let us take an event log containing all possible full firing sequences and apply the Alpha algorithm.
• What will happen?
a
b
start end
d
c
b
c
©Wil van der Aalst & TU/e (use only with permission & acknowledgements)
Applying the Alpha algorithm using ProM
???
©Wil van der Aalst & TU/e (use only with permission & acknowledgements)
Region-based miner (with label splitting)
©Wil van der Aalst & TU/e (use only with permission & acknowledgements)
Limitation of the α algorithm: resulting model does not need to be a sound WF-net
b
a
c
d
e
f
The discovered model is not sound (has deadlock).
©Wil van der Aalst & TU/e (use only with permission & acknowledgements)
Challenge: Noise and Incompleteness • To discover a suitable process model it is assumed that
the event log contains a representative sample of behavior.
• Two related phenomena: − Noise: the event log contains rare and infrequent
behavior not representative for the typical behavior of the process.
− Incompleteness: the event log contains too few events to be able to discover some of the underlying control-flow structures.
©Wil van der Aalst & TU/e (use only with permission & acknowledgements)
Flower model
g
ac
d
ef
b
start end
h
©Wil van der Aalst & TU/e (use only with permission & acknowledgements)
What is the best model?
⟨a,c,d⟩99
⟨b,c,e⟩85
a d
c
eb
a d
c
eb
©Wil van der Aalst & TU/e (use only with permission & acknowledgements)
What is the best model? a d
c
eb
a d
c
eb
⟨a,c,d⟩99
⟨a,c,e⟩50
⟨b,c,e⟩85
⟨b,c,d⟩48
©Wil van der Aalst & TU/e (use only with permission & acknowledgements)
What is the best model? a d
c
eb
a d
c
eb
⟨a,c,d⟩99
⟨a,c,e⟩1
⟨b,c,e⟩85
⟨b,c,d⟩2
©Wil van der Aalst & TU/e (use only with permission & acknowledgements)
Alpha algorithm cannot deal with noise (rare behavior, i.e., outliers)
©Wil van der Aalst & TU/e (use only with permission & acknowledgements)
Process models are like maps: we may not want to see all paths and only see the highways
©Wil van der Aalst & TU/e (use only with permission & acknowledgements)
Related to noise: Completeness
PAGE 27
start register request
examine thoroughly
examine casually
check ticket
decide
pay compensation
reject request
reinitiate request
end
c1
c2
c3
c4
c5
Infinitely many possible traces, 7 possible states
a
start
b1
c
...
b2
b3
bk
end
k number of states: 2k+2
number of different traces: k!
1 4 1 2 6 2 5 34 120
10 1026 3628800 20 1048578 2.432902e+18
©Wil van der Aalst & TU/e (use only with permission & acknowledgements)
Alpha algorithm depends on the directly follows relation
a
start
b1
c
...
b2
b3
bk
end
k number of states: 2k+2
number of different traces: k!
1 4 1 2 6 2 5 34 120
10 1026 3628800 20 1048578 2.432902e+18
Only k(k-1) observations are needed to discover the concurrent part. However, if one of these is missing, the result will be incorrect.
cannot assume completeness
©Wil van der Aalst & TU/e (use only with permission & acknowledgements)
Limitations (1/2)
• Implicit places (places that are redundant): harmless and be solved through preprocessing.
• Loops of length 1: can be solved in multiple ways (change of algorithm or pre/post-processing).
• Loops of length 2: idem. • Non-local dependencies: foundational problem,
not specific for Alpha algorithm.
©Wil van der Aalst & TU/e (use only with permission & acknowledgements)
Limitations (2/2)
• Representational bias (cannot discover transitions with duplicate or invisible labels): other algorithms may have a different bias.
• Discovered model does not need to be sound: some algorithms ensure this.
• Noise: foundational problem, not specific for Alpha algorithm.
• Incompleteness: also a foundational problem.
©Wil van der Aalst & TU/e (use only with permission & acknowledgements)
How to measure the quality of a discovered model?
• There may be conflicting requirements (simplicity versus accuracy).
• Confusion matrix and F1-score have the problem that we do not have negative examples.
• Topics will be discussed later. • For the moment, we only mention the
rediscovery problem as a quality criterion.
©Wil van der Aalst & TU/e (use only with permission & acknowledgements)
Rediscovering process models
The rediscovery problem: Is the discovered model N’ "equivalent" to the original model N?
discoveredprocessmodel
original process model event
log
simulate discover
N=N’ ?N N’
©Wil van der Aalst & TU/e (use only with permission & acknowledgements)
Part I: PreliminariesChapter 2 Process Modeling and Analysis
Chapter 3Data Mining
Part II: From Event Logs to Process ModelsChapter 4 Getting the Data
Chapter 5 Process Discovery: An Introduction
Chapter 6 Advanced Process Discovery Techniques
Part III: Beyond Process DiscoveryChapter 7 Conformance Checking
Chapter 8 Mining Additional Perspectives
Chapter 9 Operational Support
Part IV: Putting Process Mining to WorkChapter 10 Tool Support
Chapter 11 Analyzing “Lasagna Processes”
Chapter 12 Analyzing “Spaghetti Processes”
Part V: ReflectionChapter 13Cartography and Navigation
Chapter 14Epilogue
Chapter 1 Introduction
©Wil van der Aalst & TU/e (use only with permission & acknowledgements)
Top Related