Regular Expression Sub-Matching using Partial Derivativessuma0002/talks/ppdp12-part... · 2013. 8....

Page 1: Regular Expression Sub-Matching using Partial Derivativessuma0002/talks/ppdp12-part... · 2013. 8. 7. · Regular Expression Sub-Matching using Partial Derivatives Martin Sulzmann

Regular Expression Sub-Matching using Partial Derivatives

Martin Sulzmann (Hochschule Karlsruhe)

Kenny Zhuo Ming Lu (Nanyang Polytechnic)

Regular Expression Sub-Matching using Partial Derivatives – p. 1/18

Regular Expressions - The Basics

Words: w ∈ Σ∗

Regular expressions

r ::= r + r    Choice
    | r r      Concatenation
    | r∗       Kleene star
    | ε        Empty word
    | φ        Empty language
    | l ∈ Σ    Letters

(A + (BC))∗ denotes a regular language

L( (A + (BC))∗ ) = {ε, A, BC, ABC, ...}

Regular Expression Sub-Matching

Matching

w matches r iff w ∈ L(r)

ABAAC matches (A + AB)(BAA + A)(AC + C)

L( (A + AB)(BAA + A)(AC + C) ) = {ABAAAC, ABAAC, AAAC, AAC, ABBAAAC, ABBAAC, ABAC}

Which sub-parts are matched?

(x1 : (A + AB))(x2 : (BAA + A))(x3 : (AC + C))

Sub-matchings for ABAAC are

x1 = AB

x2 = A

x3 = AC

Now that the difference is clear: Matching = Sub-matching
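
The difference between matching and sub-matching can be made concrete with a small sketch. The Python below is illustrative only and not taken from the talk: the names PATTERN and sub_matchings are made up, and each group is assumed to be a finite choice of literal words, which holds for this example but not for regular expressions in general. It enumerates every way of splitting a word across the groups x1, x2, x3.

```python
from itertools import product

# Each named group is a finite choice of literal words, mirroring
# (x1 : (A + AB))(x2 : (BAA + A))(x3 : (AC + C)).
# PATTERN and sub_matchings are hypothetical names used for illustration.
PATTERN = [("x1", ["A", "AB"]), ("x2", ["BAA", "A"]), ("x3", ["AC", "C"])]

def sub_matchings(word, pattern=PATTERN):
    """Return every binding of group names to sub-words that spells `word`."""
    names = [name for name, _ in pattern]
    choices = [alternatives for _, alternatives in pattern]
    return [dict(zip(names, combo))
            for combo in product(*choices)
            if "".join(combo) == word]

for binding in sub_matchings("ABAAC"):
    print(binding)
```

For ABAAC this finds two bindings: the one shown on the slide (x1 = AB, x2 = A, x3 = AC) and also x1 = A, x2 = BAA, x3 = C. Which one a matcher reports depends on its disambiguation policy.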

Slow Regular Expression Matching

Example

A^n = “A n-times”

(x : A?^n)(y : A^n)    where r? = ε + r

Consider n = 2

AA ⊢ (x : A?A?)(y : AA)

AA ⊢ (x : A?A?)(y : AA)

Fail ⇒ Backtrack

AA ⊢ (x : A?A?)(y : AA)

...

AA ⊢ (x : A?A?)(y : AA)

Success but exponential complexity
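
The blow-up can be observed with a deliberately naive matcher. This is a sketch under stated assumptions, not any real engine: the function name backtrack_match and the list encoding of (A?)^n A^n are invented for illustration, and each optional A tries to consume a letter first before backtracking to the "skip" choice.

```python
def backtrack_match(n):
    """Match A^n against (A?)^n A^n by naive backtracking.

    Returns (matched, calls), where calls counts the recursive attempts.
    """
    pattern = ["opt"] * n + ["lit"] * n   # (A?)^n followed by A^n
    word = "A" * n
    calls = 0

    def go(i, j):
        # i: position in the pattern, j: position in the word
        nonlocal calls
        calls += 1
        if i == len(pattern):
            return j == len(word)
        can_read = j < len(word)          # every letter of the word is an A
        if pattern[i] == "lit":
            return can_read and go(i + 1, j + 1)
        # Optional A: first try to consume a letter; on failure, backtrack and skip.
        return (can_read and go(i + 1, j + 1)) or go(i + 1, j)

    return go(0, 0), calls

for n in (2, 4, 8):
    print(n, backtrack_match(n))
```

The only successful path skips every optional A, and it is the last one this strategy tries, so the number of attempts is at least 2^n.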

Fast Regular Expression Matching

For brevity, we ignore sub-matching locations

Convert A?A?AA to NFA

[Figure: NFA for A?A?AA, with the states reached so far marked •]

Simultaneous search for match AA

AA

No backtracking, linear search

Number of states to be tracked is linear in the size of the regular expression

Linear complexity!

So far: Thompson and Glushkov NFA construction

See Russ Cox, Alain Frisch et al., Ville Laurikari, ...
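
The simultaneous search can be sketched by tracking a set of active states instead of one path at a time. The code below is an illustrative assumption, neither the Thompson nor the Glushkov construction itself: the name simulate is made up, and state i simply means "about to match the i-th item of (A?)^n A^n".

```python
def simulate(n, word):
    """Run a position-style NFA for (A?)^n A^n on `word` using a set of states.

    Returns (accepted, widest), where widest is the largest state set seen.
    """
    pattern = ["opt"] * n + ["lit"] * n
    accept = len(pattern)

    def closure(states):
        # An optional A may be skipped without reading input (epsilon move).
        result = set(states)
        for i in sorted(states):
            while i < accept and pattern[i] == "opt":
                i += 1
                result.add(i)
        return result

    current = closure({0})
    widest = len(current)
    for letter in word:
        # Every non-final state reads an A and moves one step to the right.
        moved = {i + 1 for i in current if i < accept and letter == "A"}
        current = closure(moved)
        widest = max(widest, len(current))
    return accept in current, widest

print(simulate(2, "AA"))
print(simulate(20, "A" * 20))
```

The state set never holds more than 2n + 1 entries, so each input letter costs work proportional to the size of the expression and no backtracking is needed.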

Our Contributions

Matching automata construction based on

    Brzozowski’s Derivatives (DFA)

    Antimirov’s Partial Derivatives (NFA)

Fast and elegant algorithms for

    POSIX matching

    greedy left-most matching

Implementation in Haskell supporting real-world regular expressions

Brzozowski’s Derivatives

r\l “take away the leading l”

r\l derivative of r w.r.t. l

L(r\l) = {w | lw ∈ L(r)}

Compute r\l by induction, e.g.

A\A = ε

B\A = φ

r∗\l = (r\l) r∗

(r1 r2)\l = (r1\l) r2 + r2\l    if ε ∈ L(r1)

(r1 r2)\l = (r1\l) r2           otherwise

Matching derivation:

r1 --l--> r2 iff r2 = r1\l

w = l1...ln: check if r --l1--> ... --ln--> r′ where ε ∈ L(r′)
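
The derivative rules above translate almost directly into code. The following Python is a minimal sketch, not the authors' Haskell implementation: the class names (Phi, Eps, Sym, Alt, Seq, Star) and the functions nullable, deriv and matches are assumptions chosen for illustration, and no simplification is applied to the derivatives.

```python
from dataclasses import dataclass

class Re:
    """Base class for regular expression syntax trees."""

@dataclass(frozen=True)
class Phi(Re):        # empty language
    pass

@dataclass(frozen=True)
class Eps(Re):        # empty word
    pass

@dataclass(frozen=True)
class Sym(Re):        # a single letter
    letter: str

@dataclass(frozen=True)
class Alt(Re):        # r1 + r2
    left: Re
    right: Re

@dataclass(frozen=True)
class Seq(Re):        # r1 r2
    first: Re
    second: Re

@dataclass(frozen=True)
class Star(Re):       # r*
    body: Re

def nullable(r):
    """True iff the empty word is in L(r)."""
    if isinstance(r, (Eps, Star)):
        return True
    if isinstance(r, Alt):
        return nullable(r.left) or nullable(r.right)
    if isinstance(r, Seq):
        return nullable(r.first) and nullable(r.second)
    return False                                   # Phi and Sym

def deriv(r, l):
    """Brzozowski derivative r\\l: take away the leading letter l."""
    if isinstance(r, Sym):
        return Eps() if r.letter == l else Phi()
    if isinstance(r, Alt):
        return Alt(deriv(r.left, l), deriv(r.right, l))
    if isinstance(r, Seq):
        left = Seq(deriv(r.first, l), r.second)
        return Alt(left, deriv(r.second, l)) if nullable(r.first) else left
    if isinstance(r, Star):
        return Seq(deriv(r.body, l), r)
    return Phi()                                   # Phi and Eps

def matches(r, word):
    """w matches r iff the derivative of r w.r.t. all letters of w is nullable."""
    for l in word:
        r = deriv(r, l)
    return nullable(r)

A, B = Sym("A"), Sym("B")
star = Star(Alt(A, Alt(Seq(A, B), B)))             # (A + AB + B)*
print(matches(star, "AB"), matches(star, "ABC"))
```

Without simplification the derivative terms keep growing with the input, which is acceptable for short words but would need the kind of rewriting discussed later for practical use.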

Matching with Derivatives

Example: Match AB against (x : A + y : AB + z : B)∗

(A + AB + B)∗    {x : ε, y : ε, z : ε}

--A--> (A + AB + B)∗\A

= (A + AB + B)\A (A + AB + B)∗

= (A\A + AB\A + B\A) (A + AB + B)∗

= (ε + B)_1 (A + AB + B)∗    {x_1 : A, y_1 : A, z_1 : ε}

Record matchings for each iteration

Paper records matchings within pattern

--B--> ((ε + B)_1 (A + AB + B)∗)\B    with p1 = (ε + B)_1 and p2 = (A + AB + B)∗

(p1 p2)\l = (p1\l, p2) + (empty(p1) p2\l)    if p1 empty

Choice of matchings. Don’t drop p1: keep p1 and make p1 “empty”, (ε + B)_1 ⇒ (ε + φ)_1, so p1 won’t contribute further matchings

= ((ε + B)_1\B (A + AB + B)∗) + (ε + φ)_1 ((A + AB + B)∗\B)

... = ε_1 (A + AB + B)∗ + (ε + φ)_1 ε_2 (A + AB + B)∗

{y_1 : AB} and {x_1 : A, z_2 : B}

POSIX and greedy left-most match, respectively

Derivatives Matching Summary

Computes all matchings ⇒ exponential complexity

Optimization:

Simplify, e.g. aggressively to the left

(1) r + r ⇒ r (keep left r)

(2) φ r ⇒ φ

(3) ε r ⇒ r

ε_1 (A + AB + B)∗ + (ε + φ)_1 ε_2 (A + AB + B)∗    (p1 + p2)

⇒∗ ε_1 (A + AB + B)∗ + ε_2 (A + AB + B)∗

⇒ ε_1 (A + AB + B)∗

⇒ (A + AB + B)∗

Yields POSIX match

Matching with Partial Derivatives

Derivatives represent states of a DFA

·\· : r ↦ L ↦ r

(A + AB + B)∗\A = (ε + B) (A + AB + B)∗

On the fly DFA construction.

Partial derivatives represent states of an NFA

·\p· : r ↦ L ↦ 2^r

L(r\l) = L(r1 + ... + rn) where r\p l = {r1, ..., rn}

(A + AB + B)∗\p A = {ε (A + AB + B)∗, B (A + AB + B)∗}

Set of partial derivatives finite and linear in size of regular expression.

Build NFA match automata.
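
Partial derivatives can be sketched in the same style as ordinary derivatives, returning a set of expressions instead of a single one. The Python below is an illustration under assumptions, not the construction from the paper: all class and function names are made up, and the helper seq normalizes ε r to r, so the slide's ε (A + AB + B)∗ shows up here simply as (A + AB + B)∗.

```python
from dataclasses import dataclass

class Re:
    """Base class for regular expression syntax trees."""

@dataclass(frozen=True)
class Eps(Re):        # empty word
    pass

@dataclass(frozen=True)
class Sym(Re):        # a single letter
    letter: str

@dataclass(frozen=True)
class Alt(Re):        # r1 + r2
    left: Re
    right: Re

@dataclass(frozen=True)
class Seq(Re):        # r1 r2
    first: Re
    second: Re

@dataclass(frozen=True)
class Star(Re):       # r*
    body: Re

def nullable(r):
    """True iff the empty word is in L(r)."""
    if isinstance(r, (Eps, Star)):
        return True
    if isinstance(r, Alt):
        return nullable(r.left) or nullable(r.right)
    if isinstance(r, Seq):
        return nullable(r.first) and nullable(r.second)
    return False                                   # Sym

def seq(p, q):
    """Concatenation that drops a leading empty word (the rule: eps r => r)."""
    return q if isinstance(p, Eps) else Seq(p, q)

def pd(r, l):
    """Partial derivatives of r w.r.t. letter l, as a set of expressions."""
    if isinstance(r, Sym):
        return {Eps()} if r.letter == l else set()
    if isinstance(r, Alt):
        return pd(r.left, l) | pd(r.right, l)
    if isinstance(r, Seq):
        result = {seq(p, r.second) for p in pd(r.first, l)}
        if nullable(r.first):
            result |= pd(r.second, l)
        return result
    if isinstance(r, Star):
        return {seq(p, r) for p in pd(r.body, l)}
    return set()                                   # Eps

def nfa_matches(r, word):
    """Treat partial derivatives as NFA states and track them simultaneously."""
    states = {r}
    for l in word:
        states = set().union(*(pd(s, l) for s in states))
    return any(nullable(s) for s in states)

def reachable(r, alphabet):
    """All NFA states reachable from r, found with a simple worklist."""
    seen, todo = {r}, [r]
    while todo:
        state = todo.pop()
        for l in alphabet:
            for nxt in pd(state, l) - seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen

A, B = Sym("A"), Sym("B")
star = Star(Alt(A, Alt(Seq(A, B), B)))             # (A + AB + B)*
print(pd(star, "A") == {star, Seq(B, star)})
print(len(reachable(star, "AB")))
```

For this example the automaton has only two states, (A + AB + B)∗ and B (A + AB + B)∗, which illustrates why the partial derivative construction tends to stay small.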

Matching with Partial Derivatives

(Sketch of) NFA match automata

[Figure: sketch of the NFA match automaton with states p1, p2, p3, ...; p1 has A-transitions to p2 and to p3]

(A + AB + B)∗\p A = {ε (A + AB + B)∗, B (A + AB + B)∗}

where p1 = (A + AB + B)∗, p2 = ε (A + AB + B)∗, p3 = B (A + AB + B)∗

Depth-first left-most traversal ⇒ greedy left-most match

Not POSIX because structure is broken apart

See paper for details of construction

POSIX versus Greedy Left-Most Match

Derivatives for POSIX matching

POSIX = maximal match w.r.t. structure

AB matches (A + AB + B)∗

(A + AB + B)∗\A = (ε + B) (A + AB + B)∗

Partial derivatives for greedy left-most matching

Greedy left-most = maximal match ignoring any structure

AB matches (A + AB + B)∗

(A + AB + B)∗\p A = {ε (A + AB + B)∗, B (A + AB + B)∗}
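
The greedy left-most outcome can be observed with an off-the-shelf backtracking engine. The snippet below uses Python's standard re module, which is not the implementation discussed in this talk; it tries alternatives from left to right, and the group names x, y, z are chosen to mirror (x : A + y : AB + z : B)∗.

```python
import re

# Named groups x, y, z mirror (x : A + y : AB + z : B)*.
pattern = re.compile(r"(?:(?P<x>A)|(?P<y>AB)|(?P<z>B))*")

m = pattern.fullmatch("AB")
print(m.groupdict())   # {'x': 'A', 'y': None, 'z': 'B'}
```

The first iteration commits to the left-most alternative A and the second iteration matches B, so y stays unset. A POSIX matcher would instead prefer the longer first iteration and report y = AB.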

Implementation

Fine-tuned Haskell implementation of greedy left-most using partial derivative NFAs.

Real-world extensions: group matchings, anchored match, ...

Competitive performance (see paper for details):

    C-based: RE2, PCRE

    Haskell-based: Weighted, TDFA

Partial derivative NFA construction “smaller” compared to Thompson and Glushkov NFA construction

Reference implementation of Thompson, Glushkov and Partial Derivative NFA construction

Conclusion

First application of derivatives and partial derivatives for regular expression sub-matching

Future work:

    Implementation in other languages

    Efficient POSIX implementation

        Tricky for NFA but see “backwards scanning” trick by Russ Cox

        Exploiting laziness of the on-the-fly derivative construction

    Error explanation

        Why is there no match?

Errata - Simplifications

Page 7, left column: the derivation should be as follows, where we have underlined the corrected parts and expressions involving φ have already been removed.

(x|ε : A∗, y|ε : A∗)

--A--> (x|A : A∗, y|ε : A∗) + (x|ε : ε, y|A : A∗)

--A--> ((x|AA : A∗, y|ε : A∗) + (x|A : ε, y|A : A∗)) + (x|ε : ε, y|AA : A∗)

--A--> ...

Simplifications at the pattern and regular expression level.

Errata - Derivative Matching

Figure 8 “Derivative Matching”:

env(·) :: p → {Γ}

env((x|w : r)) = {{(x, w)}}    if ε ∈ L(r)
               = {}            otherwise

env((x|w : p)) = {{(x, w)} ⊎ es|es ∈ env(p)}

env((p1, p2)) = {e1 ⊎ e2|e1 ∈ env(p1), e2 ∈ env(p2)}

env((p1 + p2)) = env(p1) ⊎ env(p2)

env(p∗) = env(p)

match(·, ·) :: p → w → {Γ}

match(p, w) = env(p\w)

There’s an issue:

In case env(p) yields {} but the pattern is empty, we should actually return {{}} instead.

Errata - Derivative Matching (2)

To explain the issue, consider the initial pattern (x|ε : (y|ε : A)∗).

Building the derivative w.r.t. A yields

(x|A : (y|A : ε, (y|ε : A)∗))

Applying env() on the subpattern (y|ε : A)∗ yields {} because the underlying pattern (y|ε : A) is not empty. Clearly, (y|ε : A)∗ contains the empty word (zero iterations). Hence, in this situation, we shouldn’t return {} (“no match”) but rather report {{}} (“empty match”).

Errata - Derivative Matching (3)

Here’s the fix:

env(·) :: p → {Γ}

env((x|w : r)) = {{(x, w)}}    if ε ∈ L(r)
               = {}            otherwise

env((x|w : p)) = envH(((x|w : p), {{(x, w)} ⊎ es | es ∈ env(p)}))

env((p1, p2)) = envH(((p1, p2), {e1 ⊎ e2 | e1 ∈ env(p1), e2 ∈ env(p2)}))

env((p1 + p2)) = envH((p1 + p2, env(p1) ⊎ env(p2)))

env(p∗) = envH((p∗, env(p)))

envH((p, e)) = {{}}    if ε ∈ L(p↓) and e = {}
             = e       otherwise
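
The effect of the fix can be checked on the example pattern from the previous slide. The Python below is a sketch under assumptions rather than the code from the paper: the class names (Var, Group, Pair, Choice, Iter), the tuple encoding of regular expressions, and the fixed flag are all invented for illustration. Sets of environments are modelled as lists, and each environment as a tuple of (variable, word) bindings.

```python
from dataclasses import dataclass

# Regular expressions as plain tuples (an assumed encoding):
# ("eps",), ("sym", "A"), ("seq", r1, r2), ("alt", r1, r2), ("star", r).
EPS = ("eps",)

def nullable(r):
    """True iff the empty word is in L(r)."""
    tag = r[0]
    if tag in ("eps", "star"):
        return True
    if tag == "alt":
        return nullable(r[1]) or nullable(r[2])
    if tag == "seq":
        return nullable(r[1]) and nullable(r[2])
    return False                                   # "sym"

@dataclass(frozen=True)
class Var:            # (x|w : r)
    name: str
    word: str
    regex: tuple

@dataclass(frozen=True)
class Group:          # (x|w : p)
    name: str
    word: str
    body: object

@dataclass(frozen=True)
class Pair:           # (p1, p2)
    left: object
    right: object

@dataclass(frozen=True)
class Choice:         # p1 + p2
    left: object
    right: object

@dataclass(frozen=True)
class Iter:           # p*
    body: object

def strip(p):
    """The regular expression underlying pattern p (written p↓ on the slide)."""
    if isinstance(p, Var):
        return p.regex
    if isinstance(p, Group):
        return strip(p.body)
    if isinstance(p, Pair):
        return ("seq", strip(p.left), strip(p.right))
    if isinstance(p, Choice):
        return ("alt", strip(p.left), strip(p.right))
    return ("star", strip(p.body))                 # Iter

def env_h(p, envs):
    """envH: report the empty match when p is nullable but no environment was found."""
    if not envs and nullable(strip(p)):
        return [()]
    return envs

def env(p, fixed=True):
    """Collect environments; fixed=False follows the original Figure 8 definition."""
    if isinstance(p, Var):
        return [((p.name, p.word),)] if nullable(p.regex) else []
    if isinstance(p, Group):
        envs = [((p.name, p.word),) + e for e in env(p.body, fixed)]
    elif isinstance(p, Pair):
        envs = [e1 + e2 for e1 in env(p.left, fixed) for e2 in env(p.right, fixed)]
    elif isinstance(p, Choice):
        envs = env(p.left, fixed) + env(p.right, fixed)
    else:                                          # Iter
        envs = env(p.body, fixed)
    return env_h(p, envs) if fixed else envs

# (x|A : (y|A : eps, (y|eps : A)*)), the derivative from the errata example
example = Group("x", "A", Pair(Var("y", "A", EPS), Iter(Var("y", "", ("sym", "A")))))
print(env(example, fixed=False))
print(env(example, fixed=True))
```

With fixed=False the starred subpattern yields no environment and the whole result collapses to "no match"; with the envH wrapper the bindings x = A and y = A are reported.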
