Minimization of Symbolic Automata Presented By: Loris D’Antoni Joint work with: Margus Veanes...

Minimization of Symbolic Automata

Presented by: Loris D’Antoni

Joint work with: Margus Veanes

01/24/14, POPL14

What is automata minimization?

Deterministic Finite Automaton

[Figure: a DFA with states q0 and q and transitions labeled a and b]

A = (Q, q0, F, δ, Σ)
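A minimal sketch of the tuple A = (Q, q0, F, δ, Σ) as a Python data structure may help fix the notation. The field names and the example automaton (strings over {a, b} that end in a) are illustrative assumptions, not the automaton drawn on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DFA:
    states: frozenset    # Q
    initial: str         # q0
    finals: frozenset    # F
    delta: dict          # δ: (state, symbol) -> state
    alphabet: frozenset  # Σ

    def accepts(self, word):
        """Follow δ from q0 and check whether the last state is final."""
        state = self.initial
        for symbol in word:
            state = self.delta[(state, symbol)]
        return state in self.finals


# Hypothetical example: strings over {a, b} that end in "a".
ends_in_a = DFA(
    states=frozenset({"q0", "q"}),
    initial="q0",
    finals=frozenset({"q"}),
    delta={("q0", "a"): "q", ("q0", "b"): "q0",
           ("q", "a"): "q", ("q", "b"): "q0"},
    alphabet=frozenset({"a", "b"}),
)
```

Note that δ stores one entry per (state, symbol) pair, which is exactly what becomes impractical for very large alphabets.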

Automata Minimization

Minimization = find and collapse equivalent states

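[Figure: states p and q are distinguishable when some string s leads one of them to a final state and the other to a non-final state]

[Figure: an example DFA with transitions over a and b, and its minimized version, in which states 1 and 3, states 2 and 4, and states 5 and 6 are merged]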

A simple application: random password generation

Given constraints:
• Length is within the allowed range (here 5 to 20 characters): "^.{5,20}$"
• Contains 2 capital letters: "[A-Z].*[A-Z]"
• Contains a digit: "\d"

Generate random instances with uniform distribution that match all the above conditions.
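As a rough illustration of what "match all the above conditions" means, the three regexes can be checked with Python's re module. This sketch only tests membership; it does not produce uniformly distributed samples, which is what the automaton is needed for. The names CONSTRAINTS and satisfies_all are made up for this example.

```python
import re

# The three constraints from the slide, used with search semantics.
CONSTRAINTS = [
    re.compile(r"^.{5,20}$"),     # length between 5 and 20
    re.compile(r"[A-Z].*[A-Z]"),  # at least two capital letters
    re.compile(r"\d"),            # at least one digit
]


def satisfies_all(candidate):
    """True when the candidate password matches every constraint."""
    return all(rx.search(candidate) for rx in CONSTRAINTS)
```

For instance, satisfies_all("AbcD1") holds, while "abcD1" fails the capital-letter constraint.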

Key idea

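[Figure: the constraints ^.{5,20}$, [A-Z].*[A-Z], and \d are each turned into an automaton, and the automata are combined by intersection (∩)]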

Problems

• Big automaton: minimization
• Big alphabet (2^16 characters in UTF16): symbolic automata

Symbolic Finite Automaton (SFA)

[Figure: an SFA with states q0 and q whose transitions are guarded by the predicates λx. x mod 2 = 0 and λx. x mod 2 = 1]

A = (Q, q0, F, δ, σ)

σ is the input sort: in this case int

Separate theory for the input alphabet, handled by an SMT solver

Symbolic Finite Automata (SFA)

[Figure: an SFA with states p and q whose transitions are guarded by λx. x mod 2 = 0 and λx. x mod 2 = 1]

Execution example

Input: 1 2 5 3
States: p p q p p

p is final: accept the input
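The execution example can be replayed with a small sketch in which guards are ordinary Python functions. The transition layout below (odd inputs lead to p, even inputs lead to q) is an assumption reconstructed from the run p p q p p on input 1 2 5 3; the slide figure itself is not available here.

```python
def is_even(x):
    return x % 2 == 0


def is_odd(x):
    return x % 2 == 1


# state -> list of (guard, target); assumed layout, see the note above
TRANSITIONS = {
    "p": [(is_odd, "p"), (is_even, "q")],
    "q": [(is_odd, "p"), (is_even, "q")],
}
INITIAL = "p"
FINALS = {"p"}


def run(word):
    """Return the sequence of states visited while reading the word."""
    state = INITIAL
    trace = [state]
    for x in word:
        # Deterministic SFA: exactly one guard is true for each input.
        state = next(target for guard, target in TRANSITIONS[state] if guard(x))
        trace.append(state)
    return trace


def accepts(word):
    return run(word)[-1] in FINALS
```

Two guarded transitions per state cover every integer, which is the succinctness that an explicit transition table cannot offer.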

Advantages of Symbolic Automata

• Alphabet is represented symbolically
– UTF16 abstracted using BDDs
– Integers using predicates over integers

• Succinctness
– at most n^2 transitions
– One transition captures many symbols

• BUT: do DFA algorithms generalize to SFAs?

An example: SFA intersection

A1: p1 → q1 with guard φ1

A2: p2 → q2 with guard φ2

A1 ∩ A2: (p1, p2) → (q1, q2) with guard φ1 ∧ φ2; delete the transition when φ1 ∧ φ2 is unsatisfiable

REQUIREMENTS: the input theory must be a Boolean algebra, and decidable
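The product construction can be sketched with guards modeled as Python frozensets over a small finite universe, which is one simple decidable Boolean algebra (conjunction is set intersection, satisfiability is non-emptiness). This is a stand-in for a real solver, and the automata A1 and A2 below are made-up examples. For brevity the sketch pairs all transitions rather than only those reachable from the initial state.

```python
from itertools import product


def is_sat(guard):
    # A set-based predicate is satisfiable iff it is non-empty.
    return len(guard) > 0


def intersect(trans1, trans2):
    """Product of two transition lists [(source, guard, target), ...]."""
    result = []
    for (p1, g1, q1), (p2, g2, q2) in product(trans1, trans2):
        guard = g1 & g2      # conjunction of the two guards
        if is_sat(guard):    # delete the transition when unsatisfiable
            result.append(((p1, p2), guard, (q1, q2)))
    return result


UNIVERSE = frozenset(range(10))
EVEN = frozenset(x for x in UNIVERSE if x % 2 == 0)
ODD = UNIVERSE - EVEN

A1 = [("p1", EVEN, "q1"), ("p1", ODD, "p1")]
A2 = [("p2", ODD, "q2")]
PRODUCT = intersect(A1, A2)
```

Here the pair of EVEN and ODD guards is unsatisfiable, so only one product transition survives.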

Moore’s algorithm

[Figure: p and q move to p’ and q’ on the same symbol a; if p’ and q’ are distinguishable (by some string s), then p and q are distinguishable]

n^2 iterations over k symbols: O(kn^2)

Symbolic Moore’s algorithm

Initially D = F x (Q\F) U (Q\F) x F
for each (p’,q’) in D, (p,q) not in D
    let φ, ψ be the guards of δ(p,p’), δ(q,q’)
    if isSat(φ ∧ ψ)
        add (p,q) to D

[Figure: p moves to p’ with guard φ and q moves to q’ with guard ψ; if p’ and q’ are distinguishable and φ ∧ ψ is satisfiable, then p and q are distinguishable]

m transitions: O(m^2 f(k))

k = size of biggest predicate in SFA
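The pair-marking loop can be sketched as follows, again with guards modeled as frozensets so that isSat(φ ∧ ψ) becomes a non-empty intersection. The function name, the four-state automaton, and the set-based guards are assumptions for illustration; the SFA is assumed complete and deterministic.

```python
def symbolic_moore(states, finals, transitions):
    """Return the set D of distinguishable state pairs.

    transitions: list of (source, guard, target) with frozenset guards.
    """
    marked = {(p, q) for p in states for q in states
              if (p in finals) != (q in finals)}
    changed = True
    while changed:
        changed = False
        for (p, phi, p2) in transitions:
            for (q, psi, q2) in transitions:
                if (p, q) in marked or (p2, q2) not in marked:
                    continue
                if phi & psi:  # isSat(φ ∧ ψ): some input drives both moves
                    marked.add((p, q))
                    changed = True
    return marked


UNIVERSE = frozenset(range(10))
EVEN = frozenset(x for x in UNIVERSE if x % 2 == 0)
ODD = UNIVERSE - EVEN

# Hypothetical SFA: C is final; A and B behave identically, D does not.
TRANSITIONS = [
    ("A", EVEN, "C"), ("A", ODD, "A"),
    ("B", EVEN, "C"), ("B", ODD, "B"),
    ("C", UNIVERSE, "C"),
    ("D", ODD, "C"), ("D", EVEN, "D"),
]
D_PAIRS = symbolic_moore(["A", "B", "C", "D"], {"C"}, TRANSITIONS)
```

States whose pair never enters D, here A and B, are the ones that minimization collapses.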

Sometimes Moore is Less

From: Rani Abdellatif
Sent: Tuesday, November 13, 2012 12:55 PM
To: Margus Veanes
Cc: Patrick McFalls
Subject: RE: Password generation help

Margus, I tested the perf of the sample you sent me with password lengths from 8 to 15 chars and here are the results:

Chars   Time (ms)
8       171
9       406
10      1061
11      2044
12      3698
13      6271
14      11591
15      18362

This time is the time it takes to run sfa.Determinize(rex.Solver).Minimize(rex.Solver). The time required to create the SFA or generate samples once it’s created is quite small in comparison. We are expecting 15 characters to be on the shorter end of password we’ll generate, going up to 128 characters.

• 18 sec for 15 characters!
• The culprit: sfa.Determinize(rex.Solver).Minimize(rex.Solver)
• Should scale up to 128 characters!

Hopcroft’s algorithm: intuition

[Figure: the initial partition into F and Q\F. For a block R, S is the set of states that reach R on symbol a, and a block A is split according to S]

[Figure: a partition into blocks P1, P2, P3, P4, refined with respect to R on symbols a and b]

Keep partitioning with respect to W, for every input symbol

Let’s assume I already split according to R

[Figure: R is divided into P1 and P2]

Do I need to consider both P1 and P2 for future splitting?

NO, I ONLY NEED ONE!

Hopcroft’s algorithm

P := {F, Q\F}
W := {if |F| < |Q\F| then F else Q\F}
while W != {}
    R := pickFrom(W)
    foreach a in Σ
        S := δ^-1(R, a)
        while ∃ T ∈ P. T∩S ≠ {} ∧ T\S ≠ {}
            P, W := split(P, T∩S, T\S)
return partitioned DFA

log n iterations: O(kn log n)
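A compact Python sketch of the pseudocode above, assuming a complete DFA given as a dictionary δ: (state, symbol) -> state. The helper name hopcroft and the four-state example are illustrative; computing S by scanning all states keeps the code short but gives up the O(kn log n) bound, which needs precomputed inverse transitions.

```python
def hopcroft(states, alphabet, delta, finals):
    """Return the coarsest partition of the states into equivalent blocks."""
    finals = frozenset(finals)
    non_finals = frozenset(states) - finals
    partition = {block for block in (finals, non_finals) if block}
    worklist = {min(partition, key=len)}
    while worklist:
        r = worklist.pop()
        for a in alphabet:
            # S: states that reach R on symbol a
            s = {q for q in states if delta[(q, a)] in r}
            for t in list(partition):
                inside, outside = t & s, t - s
                if not inside or not outside:
                    continue  # S does not split T
                partition.remove(t)
                partition.update((inside, outside))
                if t in worklist:
                    worklist.remove(t)
                    worklist.update((inside, outside))
                else:
                    # Only the smaller half is needed for future splitting.
                    worklist.add(min(inside, outside, key=len))
    return partition


# Hypothetical DFA: states 0 and 2 are equivalent, state 3 is a sink.
DELTA = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 1, (2, "b"): 2,
    (3, "a"): 3, (3, "b"): 3,
}
BLOCKS = hopcroft([0, 1, 2, 3], "ab", DELTA, {1})
```

Each block of the result becomes one state of the minimized DFA.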

Hopcroft’s algorithm example

[Figure: the example DFA over a and b, initially partitioned into blocks P1 and P2, with R the block being analyzed]

PARTITION: {P1, P2}
TO ANALYZE: {P2}

[Figure: P1 is split into P11 and P12]

PARTITION: {P11, P12, P2}
TO ANALYZE: {P2, P12}

PARTITION: {P11, P12, P2}
TO ANALYZE: {P12}

[Figure: the resulting minimized DFA, in which states 1 and 3, states 2 and 4, and states 5 and 6 are merged]

Symbolic Hopcroft’s algorithm

P := {F, Q\F}
W := {if |F| < |Q\F| then F else Q\F}
while W != {}
    R := pickFrom(W)
    foreach a in Σ
        S := δ^-1(R, a)
        while ∃ T ∈ P. T∩S ≠ {} ∧ T\S ≠ {}
            P, W := split(P, T∩S, T\S)

Alphabet might not be finite, so "foreach a in Σ" cannot simply enumerate it

Finitize the alphabet

[Figure: Venn diagram of three predicates φ1, φ2, φ3; the regions formed by their overlaps are labeled φ'1 to φ'8]

Predicates: {x>5, x<10, x=3}

Minterms: {x=3, x≤5 ∧ x≠3, 5<x<10, x≥10}
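The minterms of the example can be recomputed by grouping values according to which predicates they satisfy. This sketch enumerates a small finite range as a stand-in for the integers, which is an assumption made for illustration; a real implementation would instead ask a solver which Boolean combinations are satisfiable.

```python
UNIVERSE = range(0, 15)  # assumed finite stand-in for the integers

PREDICATES = {
    "x>5": lambda x: x > 5,
    "x<10": lambda x: x < 10,
    "x=3": lambda x: x == 3,
}


def minterms(predicates, universe):
    """Group elements by the truth values of all predicates.

    Each non-empty group corresponds to one minterm, that is, one
    satisfiable conjunction of predicates and negated predicates.
    """
    groups = {}
    for x in universe:
        signature = tuple(pred(x) for pred in predicates.values())
        groups.setdefault(signature, []).append(x)
    return groups


MINTERMS = minterms(PREDICATES, UNIVERSE)
```

Three predicates allow 2^3 = 8 combinations, but only 4 of them are satisfiable here; in the worst case all 2^m combinations survive.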

Symbolic Hopcroft’s algorithm

P := {F, Q\F}
W := {if |F| < |Q\F| then F else Q\F}
while W ≠ {}
    R := pickFrom(W)
    foreach φ in Minterms(A)
        S := δ^-1(R, φ)
        while ∃ T ∈ P. T∩S ≠ {} ∧ T\S ≠ {}
            P, W := split(P, T∩S, T\S)

log n iterations: O(2^m n log n + 2^m f(mk))

We need something better

New Algorithm: Intuition

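[Figure: states p and q of block A both have transitions into R, with guards Φ and ψ]

What if Φ ≠ ψ?

[Figure: A is split into P1 and P2 using Φ\ψ]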

Example 1/2

[Figure: an SFA with guards x<0, x≥0, -2<x<5, -5<x<3, and true; the initial partition is {F, Q\F}, and R is the block being analyzed]

false ≠ -5<x<3

Example 2/2

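[Figure: states p and q both have a transition to r, which lies in R; p’s transitions are guarded by x<2 and x≥2, q’s by x<5 and x≥5]

Both p and q go to r, but…

Is x≥2 equivalent to x≥5? NO

Then p is distinguishable from q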
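A value on which the two guards disagree can be found by brute force over a range of candidates, as in the sketch below. The candidate range and the helper name witness are assumptions for this example; with a solver one would instead check satisfiability of the difference of the two guards.

```python
def witness(phi, psi, candidates):
    """Return some value satisfying exactly one of phi and psi, or None."""
    for x in candidates:
        if phi(x) != psi(x):
            return x
    return None


p_to_r = lambda x: x >= 2  # guard on p's transition into r
q_to_r = lambda x: x >= 5  # guard on q's transition into r

W = witness(p_to_r, q_to_r, range(-10, 11))
```

Here W is 2: on input 2, p moves to r while q does not, so the two states cannot be merged.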

New Algorithm

P := {F, Q\F}
W := {if |F| < |Q\F| then F else Q\F}
while W ≠ {}
    R := pickFrom(W); S := δ^-1(R, true)
    while ∃ A ∈ P. A∩S ≠ {} ∧ ∃ p1,p2. δ^-1(p1) ≠ δ^-1(p2)
        P, W := split(P, A∩S, A\S, witness(δ^-1(p1) ≠ δ^-1(p2)))

log n iterations: O(n^2 log n f(nk))

Experiments

1. Randomly generated DFAs
   SFAs using BDDs (sort = bitvec 7 bits)

2. SFAs generated from regexes
   SFAs using BDDs (sort = bitvec 16 bits)

3. A corner case of Minterm generation
   SFAs using BDDs (sort = bitvec 20 bits)

4. Randomly generated SFAs over string x int
   SFAs using Z3 (sort = string x int)

5. Monadic second order logic to DFA transformation
   SFAs using BDDs (sort = bitvec 40 bits)

1) Randomly generated DFAs

5 billion DFAs: 10 to 100 states, 2 to 50 symbols
From [Almeida, Moreira, Reis, TR05]

2) SFAs generated from regexes (regexplib.com)

3000 regexes over UTF16 alphabet (2^16 elements)
From [regexplib.com]

[Plot: both axes in log scale]

More states => Moore worse

3) A corner case of Minterm generation

This SFA has 2^k minterms!!

brics.automata.dk: uses intervals instead of BDDs

[Plot: log scale]

4) Randomly generated SFAs over string x int

Randomly generated 10 SFAs over string x int and minimized all the intersections, complements, differences, and unions of such SFAs

Random generation causes many predicate overlaps, and therefore many minterms

5) MSO logic to DFA transformation

[IJFCS05]: state of the art for MSO

Conclusion

Results
• Adapted classical minimization algorithms to the symbolic setting
• New minimization algorithm for symbolic automata (faster than previous ones)

Future work
• Extend to tree automata
• Extend classical automata problems to SFAs
– Edit distance?
– Regex for symbolic automata?
