Regular Expressions - Cornell University€¦ · Regular Expressions CS 2800: Discrete Structures,...

34
Regular Expressions CS 2800: Discrete Structures, Fall 2014 Sid Chaudhuri

Transcript of Regular Expressions - Cornell University€¦ · Regular Expressions CS 2800: Discrete Structures,...

Page 1: Regular Expressions - Cornell University€¦ · Regular Expressions CS 2800: Discrete Structures, Fall 2014 Sid Chaudhuri. Regular expressions (“regex”-es) are defned inductively

Regular Expressions

CS 2800: Discrete Structures, Fall 2014

Sid Chaudhuri

Page 2: Regular Expressions - Cornell University€¦ · Regular Expressions CS 2800: Discrete Structures, Fall 2014 Sid Chaudhuri. Regular expressions (“regex”-es) are defned inductively

Regular expressions (“regex”-es) are defned inductively

(start with simple base expressions, construct more complicated ones recursively)

Page 3: Regular Expressions - Cornell University€¦ · Regular Expressions CS 2800: Discrete Structures, Fall 2014 Sid Chaudhuri. Regular expressions (“regex”-es) are defned inductively

Empty set

L(∅) = ∅

Page 4: Regular Expressions - Cornell University€¦ · Regular Expressions CS 2800: Discrete Structures, Fall 2014 Sid Chaudhuri. Regular expressions (“regex”-es) are defned inductively

Empty string

L(ε) = {ε}

Page 5: Regular Expressions - Cornell University€¦ · Regular Expressions CS 2800: Discrete Structures, Fall 2014 Sid Chaudhuri. Regular expressions (“regex”-es) are defned inductively

Literal character

L(x) = {x}

e.g. L(1) = {1}, L(2) = {2}, L(a) = {a}

Page 6: Regular Expressions - Cornell University€¦ · Regular Expressions CS 2800: Discrete Structures, Fall 2014 Sid Chaudhuri. Regular expressions (“regex”-es) are defned inductively

Concatenation

L(AB) = {ab | a ∈ L(A), b ∈ L(B)}

e.g. L(12) = {12}, L(aabb) = {aabb}

Page 7: Regular Expressions - Cornell University€¦ · Regular Expressions CS 2800: Discrete Structures, Fall 2014 Sid Chaudhuri. Regular expressions (“regex”-es) are defned inductively

Concatenation

L(AB) = {ab | a ∈ L(A), b ∈ L(B)}

e.g. L(12) = {12}, L(aabb) = {aabb},L(aε) = ?

Page 8: Regular Expressions - Cornell University€¦ · Regular Expressions CS 2800: Discrete Structures, Fall 2014 Sid Chaudhuri. Regular expressions (“regex”-es) are defned inductively

Concatenation

L(AB) = {ab | a ∈ L(A), b ∈ L(B)}

e.g. L(12) = {12}, L(aabb) = {aabb},L(aε) = {a}

Page 9: Regular Expressions - Cornell University€¦ · Regular Expressions CS 2800: Discrete Structures, Fall 2014 Sid Chaudhuri. Regular expressions (“regex”-es) are defned inductively

Concatenation

L(AB) = {ab | a ∈ L(A), b ∈ L(B)}

e.g. L(12) = {12}, L(aabb) = {aabb},L(aε) = {a}, L(a∅) = ?

Page 10: Regular Expressions - Cornell University€¦ · Regular Expressions CS 2800: Discrete Structures, Fall 2014 Sid Chaudhuri. Regular expressions (“regex”-es) are defned inductively

Concatenation

L(AB) = {ab | a ∈ L(A), b ∈ L(B)}

e.g. L(12) = {12}, L(aabb) = {aabb},L(aε) = {a}, L(a∅) = ∅

Page 11: Regular Expressions - Cornell University€¦ · Regular Expressions CS 2800: Discrete Structures, Fall 2014 Sid Chaudhuri. Regular expressions (“regex”-es) are defned inductively

Alternation

L(A|B) = L(A) ∪ L(B)

e.g. L(1|2) = {1, 2}, L(aa|bb) = {aa, bb}

Page 12: Regular Expressions - Cornell University€¦ · Regular Expressions CS 2800: Discrete Structures, Fall 2014 Sid Chaudhuri. Regular expressions (“regex”-es) are defned inductively

Alternation

L(A|B) = L(A) ∪ L(B)

e.g. L(1|2) = {1, 2}, L(aa|bb) = {aa, bb},L(a|ε) = ?

Page 13: Regular Expressions - Cornell University€¦ · Regular Expressions CS 2800: Discrete Structures, Fall 2014 Sid Chaudhuri. Regular expressions (“regex”-es) are defned inductively

Alternation

L(A|B) = L(A) ∪ L(B)

e.g. L(1|2) = {1, 2}, L(aa|bb) = {aa, bb},L(a|ε) = {a, ε}

Page 14: Regular Expressions - Cornell University€¦ · Regular Expressions CS 2800: Discrete Structures, Fall 2014 Sid Chaudhuri. Regular expressions (“regex”-es) are defned inductively

Alternation

L(A|B) = L(A) ∪ L(B)

e.g. L(1|2) = {1, 2}, L(aa|bb) = {aa, bb},L(a|ε) = {a, ε}, L(a|∅) = ?

Page 15: Regular Expressions - Cornell University€¦ · Regular Expressions CS 2800: Discrete Structures, Fall 2014 Sid Chaudhuri. Regular expressions (“regex”-es) are defned inductively

Alternation

L(A|B) = L(A) ∪ L(B)

e.g. L(1|2) = {1, 2}, L(aa|bb) = {aa, bb},L(a|ε) = {a, ε}, L(a|∅) = {a}

Page 16: Regular Expressions - Cornell University€¦ · Regular Expressions CS 2800: Discrete Structures, Fall 2014 Sid Chaudhuri. Regular expressions (“regex”-es) are defned inductively

Kleene star

L(A*) ={ε} ∪ { x

1x

2...x

n | n ∈ N, x

i ∈ L(A) }

e.g. L(a*) = {ε, a, aa, aaa, aaaa, ...}

Page 17: Regular Expressions - Cornell University€¦ · Regular Expressions CS 2800: Discrete Structures, Fall 2014 Sid Chaudhuri. Regular expressions (“regex”-es) are defned inductively

Kleene star

L(A*) ={ε} ∪ { x

1x

2...x

n | n ∈ N, x

i ∈ L(A) }

e.g. L(a*) = {ε, a, aa, aaa, aaaa, ...},L((ab)*) = ?

Page 18: Regular Expressions - Cornell University€¦ · Regular Expressions CS 2800: Discrete Structures, Fall 2014 Sid Chaudhuri. Regular expressions (“regex”-es) are defned inductively

Kleene star

L(A*) ={ε} ∪ { x

1x

2...x

n | n ∈ N, x

i ∈ L(A) }

e.g. L(a*) = {ε, a, aa, aaa, aaaa, ...},L((ab)*) = {ε, ab, abab, ababab, ...}

Page 19: Regular Expressions - Cornell University€¦ · Regular Expressions CS 2800: Discrete Structures, Fall 2014 Sid Chaudhuri. Regular expressions (“regex”-es) are defned inductively

Kleene star

L(A*) ={ε} ∪ { x

1x

2...x

n | n ∈ N, x

i ∈ L(A) }

e.g. L(a*) = {ε, a, aa, aaa, aaaa, ...},L((ab)*) = {ε, ab, abab, ababab, ...},

L((a|b)*) = ?

Page 20: Regular Expressions - Cornell University€¦ · Regular Expressions CS 2800: Discrete Structures, Fall 2014 Sid Chaudhuri. Regular expressions (“regex”-es) are defned inductively

Kleene star

L(A*) ={ε} ∪ { x

1x

2...x

n | n ∈ N, x

i ∈ L(A) }

e.g. L(a*) = {ε, a, aa, aaa, aaaa, ...},L((ab)*) = {ε, ab, abab, ababab, ...},

L((a|b)*) = {ε, a, b, aa, ab, ba, bb, aaa, aab, aba, ...}

Page 21: Regular Expressions - Cornell University€¦ · Regular Expressions CS 2800: Discrete Structures, Fall 2014 Sid Chaudhuri. Regular expressions (“regex”-es) are defned inductively

Kleene star

L(A*) ={ε} ∪ { x

1x

2...x

n | n ∈ N, x

i ∈ L(A) }

e.g. L(a*) = {ε, a, aa, aaa, aaaa, ...},L((ab)*) = {ε, ab, abab, ababab, ...},

L((a|b)*) = {ε, a, b, aa, ab, ba, bb, aaa, aab, aba, ...},L(ε*) = ?

Page 22: Regular Expressions - Cornell University€¦ · Regular Expressions CS 2800: Discrete Structures, Fall 2014 Sid Chaudhuri. Regular expressions (“regex”-es) are defned inductively

Kleene star

L(A*) ={ε} ∪ { x

1x

2...x

n | n ∈ N, x

i ∈ L(A) }

e.g. L(a*) = {ε, a, aa, aaa, aaaa, ...},L((ab)*) = {ε, ab, abab, ababab, ...},

L((a|b)*) = {ε, a, b, aa, ab, ba, bb, aaa, aab, aba, ...},L(ε*) = {ε}

Page 23: Regular Expressions - Cornell University€¦ · Regular Expressions CS 2800: Discrete Structures, Fall 2014 Sid Chaudhuri. Regular expressions (“regex”-es) are defned inductively

Kleene star

L(A*) ={ε} ∪ { x

1x

2...x

n | n ∈ N, x

i ∈ L(A) }

e.g. L(a*) = {ε, a, aa, aaa, aaaa, ...},L((ab)*) = {ε, ab, abab, ababab, ...},

L((a|b)*) = {ε, a, b, aa, ab, ba, bb, aaa, aab, aba, ...},L(ε*) = {ε}, L(∅*) = ?

Page 24: Regular Expressions - Cornell University€¦ · Regular Expressions CS 2800: Discrete Structures, Fall 2014 Sid Chaudhuri. Regular expressions (“regex”-es) are defned inductively

Kleene star

L(A*) ={ε} ∪ { x

1x

2...x

n | n ∈ N, x

i ∈ L(A) }

e.g. L(a*) = {ε, a, aa, aaa, aaaa, ...},L((ab)*) = {ε, ab, abab, ababab, ...},

L((a|b)*) = {ε, a, b, aa, ab, ba, bb, aaa, aab, aba, ...},L(ε*) = {ε}, L(∅*) = {ε}

Page 25: Regular Expressions - Cornell University€¦ · Regular Expressions CS 2800: Discrete Structures, Fall 2014 Sid Chaudhuri. Regular expressions (“regex”-es) are defned inductively

Playing with regexes

● http://regex101.com/

● http://rubular.com/

● http://www.google.com/search?q=online+regex+tester

Page 26: Regular Expressions - Cornell University€¦ · Regular Expressions CS 2800: Discrete Structures, Fall 2014 Sid Chaudhuri. Regular expressions (“regex”-es) are defned inductively

Kleene's Theorem

● A language is recognized by a regular expression (that is, it is a “regular language”) if and only if it is recognized by a fnite automaton

Page 27: Regular Expressions - Cornell University€¦ · Regular Expressions CS 2800: Discrete Structures, Fall 2014 Sid Chaudhuri. Regular expressions (“regex”-es) are defned inductively

Kleene's Theorem

● A language is recognized by a regular expression (that is, it is a “regular language”) if and only if it is recognized by a fnite automaton– Regex has FA

● Relatively simple construction

Page 28: Regular Expressions - Cornell University€¦ · Regular Expressions CS 2800: Discrete Structures, Fall 2014 Sid Chaudhuri. Regular expressions (“regex”-es) are defned inductively

Kleene's Theorem

● A language is recognized by a regular expression (that is, it is a “regular language”) if and only if it is recognized by a fnite automaton– Regex has FA

● Relatively simple construction

– FA has regex● Tricky to prove

Page 29: Regular Expressions - Cornell University€¦ · Regular Expressions CS 2800: Discrete Structures, Fall 2014 Sid Chaudhuri. Regular expressions (“regex”-es) are defned inductively

Regex → FA

● For every regular expression, there is a fnite automaton that recognizes the same language

Page 30: Regular Expressions - Cornell University€¦ · Regular Expressions CS 2800: Discrete Structures, Fall 2014 Sid Chaudhuri. Regular expressions (“regex”-es) are defned inductively

Regex → FA

● For every regular expression, there is a fnite automaton that recognizes the same language

● We will construct an ε-NFA

Page 31: Regular Expressions - Cornell University€¦ · Regular Expressions CS 2800: Discrete Structures, Fall 2014 Sid Chaudhuri. Regular expressions (“regex”-es) are defned inductively

Regex → FA

● For every regular expression, there is a fnite automaton that recognizes the same language

● We will construct an ε-NFA– … which can be converted to an NFA

Page 32: Regular Expressions - Cornell University€¦ · Regular Expressions CS 2800: Discrete Structures, Fall 2014 Sid Chaudhuri. Regular expressions (“regex”-es) are defned inductively

Regex → FA

● For every regular expression, there is a fnite automaton that recognizes the same language

● We will construct an ε-NFA– … which can be converted to an NFA– … which can be converted to a DFA

Page 33: Regular Expressions - Cornell University€¦ · Regular Expressions CS 2800: Discrete Structures, Fall 2014 Sid Chaudhuri. Regular expressions (“regex”-es) are defned inductively

Recap: NFAs with epsilon transitions

● Just like ordinary NFAs, but...– Can “instantaneously” change state without reading an input symbol

– Valid transitions of this type are shown by arcs labeled 'ε'

– Note that ε does not suddenly become a member of the alphabet. Instead, we assume ε does not belong to any alphabet – it's a special symbol.

q0

q2

q1

a ε

εb

Page 34: Regular Expressions - Cornell University€¦ · Regular Expressions CS 2800: Discrete Structures, Fall 2014 Sid Chaudhuri. Regular expressions (“regex”-es) are defned inductively

Why ε-NFAs?● Suitable for representing “or” relations● E.g. L = { an | n ∈ N is divisible by 2 or 3 }

● … but they're equivalent to NFAs and DFAs

p

r0

q0

q1

r1 r

2

ε

ε

a

aa a

a