Download - Finite Automata and Regular Expressions Haniel Barbosahomepage.divms.uiowa.edu/~hbarbosa/teaching/cs4330/... · Regular Expressions A regular expression (RE in short) is a string

Transcript
Page 1: Finite Automata and Regular Expressions Haniel Barbosahomepage.divms.uiowa.edu/~hbarbosa/teaching/cs4330/... · Regular Expressions A regular expression (RE in short) is a string

CS:4330 Theory of ComputationSpring 2018

Regular LanguagesFinite Automata and Regular Expressions

Haniel Barbosa

Page 2: Finite Automata and Regular Expressions Haniel Barbosahomepage.divms.uiowa.edu/~hbarbosa/teaching/cs4330/... · Regular Expressions A regular expression (RE in short) is a string

Readings for this lecture

Chapter 1 of [Sipser 1996], 3rd edition. Sections 1.1 and 1.3.

Page 3: Finite Automata and Regular Expressions Haniel Barbosahomepage.divms.uiowa.edu/~hbarbosa/teaching/cs4330/... · Regular Expressions A regular expression (RE in short) is a string

Abstraction of Problems

B Data: abstracted as a word in a given alphabet

I Σ: alphabet, a finite, non-empty set of symbolsI Σ∗: all the words of finite length built up using Σ

B Conditions: abstracted as a set of words, called language

I Any subset L⊆ Σ∗

B Unknown: Implicitly a Boolean variable: true if a word is in the language,false otherwise

I Given w ∈ Σ∗ and L⊆ Σ∗, does w ∈ L?

1 / 14

Page 4: Finite Automata and Regular Expressions Haniel Barbosahomepage.divms.uiowa.edu/~hbarbosa/teaching/cs4330/... · Regular Expressions A regular expression (RE in short) is a string

Finite Automata

B The simplest computational model is called a finite state machine or a finiteautomaton

B Representations:

I GraphicalI TabularI Mathematical

2 / 14

Page 5: Finite Automata and Regular Expressions Haniel Barbosahomepage.divms.uiowa.edu/~hbarbosa/teaching/cs4330/... · Regular Expressions A regular expression (RE in short) is a string

Computation of a Finite Automaton

B The automaton receives the input symbols one by one from left to right,changing the “active” state

B After reading each symbol, the “active state” moves from one state toanother along the transition that has that symbol as its label

B When the last symbol of the input is read the automaton produces theoutput: accept if the “active state” is in an accept state, or reject otherwise

3 / 14

Page 6: Finite Automata and Regular Expressions Haniel Barbosahomepage.divms.uiowa.edu/~hbarbosa/teaching/cs4330/... · Regular Expressions A regular expression (RE in short) is a string

Applications

B Finite automata are popular in parser construction of compilers

B Finite automata and their probabilistic counterparts, Markov chains, areuseful tools for pattern recognitionExample: speech processing and optical character recognition

B Markov chains have been used to model and predict price changes infinancial applications

4 / 14

Page 7: Finite Automata and Regular Expressions Haniel Barbosahomepage.divms.uiowa.edu/~hbarbosa/teaching/cs4330/... · Regular Expressions A regular expression (RE in short) is a string

Formal Definition of Finite Automata

A finite automaton is a 5-tuple (Q, Σ, δ, q0, F) in which:

B Q is a finite set called the states

B Σ is a finite set called the alphabet

B δ : Q×Σ→ Q is the transition function

B q0 ∈ Q is the start state, also called initial stat

B F ⊆ Q is the set of accepted states, also called the final states

5 / 14

Page 8: Finite Automata and Regular Expressions Haniel Barbosahomepage.divms.uiowa.edu/~hbarbosa/teaching/cs4330/... · Regular Expressions A regular expression (RE in short) is a string

Language Recognized by an Automaton

B If L is the set of all strings that a finite automaton M accepts, we say that Lis the language of the machine M and write L (M) = L

B An automaton may accept several strings, but it always recognizes only onelanguage

B If a machine accepts no strings, it still recognizes one language, namely theempty language ∅

6 / 14

Page 9: Finite Automata and Regular Expressions Haniel Barbosahomepage.divms.uiowa.edu/~hbarbosa/teaching/cs4330/... · Regular Expressions A regular expression (RE in short) is a string

Formal Definition of Acceptance

Let M = (Q, Σ, δ, q0, F) be a finite automaton and w = a0 . . .an be a string over Σ

Then M accepts w if a sequence of states r0, . . . , rn exists in Q such that:

1. r0 = q0

2. δ(ri, ai+1) = ri+1 for i = 0,1, . . . ,n−1

3. rn ∈ F

Condition (1) says where the machine starts

Condition (2) says that the machine goes from state to state according to its transitionfunction δ

Condition (3) says when the machine accepts its input: if it ends up in an accept state

7 / 14

Page 10: Finite Automata and Regular Expressions Haniel Barbosahomepage.divms.uiowa.edu/~hbarbosa/teaching/cs4330/... · Regular Expressions A regular expression (RE in short) is a string

Regular Languages

We say that a finite automaton M recognizes the language L ifL = {w |M accepts w}

Definition: A language is called regular language if there exists a finiteautomaton that recognizes it.

8 / 14

Page 11: Finite Automata and Regular Expressions Haniel Barbosahomepage.divms.uiowa.edu/~hbarbosa/teaching/cs4330/... · Regular Expressions A regular expression (RE in short) is a string

Designing Finite Automata

B Whether it be of automaton or artwork, design is a creative process.Consequently it cannot be reduced to a simple recipe or formula.

B The approach:

I Identify the finite pieces of information you need to solve the problem.These are the states

I Identify the condition (alphabet) to change from one state to anotherI Identify the start and final statesI Add missing transitions

9 / 14

Page 12: Finite Automata and Regular Expressions Haniel Barbosahomepage.divms.uiowa.edu/~hbarbosa/teaching/cs4330/... · Regular Expressions A regular expression (RE in short) is a string

Operations on Regular Languages

Let A and B be languages. We define regular operations union, concatenation,and start as follows

B Union: A∪B = {x | x ∈ A∨ x ∈ B}B Concatenation: A◦B = {xy | x ∈ A∧ y ∈ B}B Star: A∗ = {x1 . . .xk | k ≥ 0∧ xi ∈ A, 0≤ i≤ k}

Note:

1. ε ∈ A∗, no matter what A is

2. A+ denotes A◦A∗

10 / 14

Page 13: Finite Automata and Regular Expressions Haniel Barbosahomepage.divms.uiowa.edu/~hbarbosa/teaching/cs4330/... · Regular Expressions A regular expression (RE in short) is a string

Regular Expressions

A regular expression (RE in short) is a string of symbols that describes a regularlanguage.

B Three base cases:

I For any a ∈ Σ, a is a regular expression denoting the language {a}I ε is a regular expression denoting the language {ε}I ∅ is a regular expression denoting the language ∅;

B Three recursive cases: If r1 and r2 are regular expressions denotinglanguages L1 and L2, respectively, then

I Union: r1∪ r2 denotes L1∪L2I Concatenation: r1r2 denotes L1 ◦L2I Star: r∗1 denotes L∗1

11 / 14

Page 14: Finite Automata and Regular Expressions Haniel Barbosahomepage.divms.uiowa.edu/~hbarbosa/teaching/cs4330/... · Regular Expressions A regular expression (RE in short) is a string

Some useful notation

Let r be a regular expression:

B The string r+ represents rr∗, and it also holds that r+∪{ε}= r∗

B The string rk represents rr . . .r︸ ︷︷ ︸k times

B Recall that the symbol Σ represents the alphabet {a1, . . . , ak}

B As with automata, the language represented by r is denoted L (r)

12 / 14

Page 15: Finite Automata and Regular Expressions Haniel Barbosahomepage.divms.uiowa.edu/~hbarbosa/teaching/cs4330/... · Regular Expressions A regular expression (RE in short) is a string

Precedence Rules

B The star (∗) operation has the highest precedence

B The concatenation (◦) operation is second on precedence order

B The union (∪) or (+) operation is the least preferred

B Parenthesis can be omitted using these rules

13 / 14

Page 16: Finite Automata and Regular Expressions Haniel Barbosahomepage.divms.uiowa.edu/~hbarbosa/teaching/cs4330/... · Regular Expressions A regular expression (RE in short) is a string

Examples

0∗10∗ =

{w | w contains a single 1}Σ∗1Σ∗ = {w | w has at least a single 1}Σ∗(101)Σ∗ = {w | w contains 101 as a substring}1∗(01+)∗ = {w | every 0 in w is followed by at least a single 1}(ΣΣ)∗ = {w | w is of even length}

0Σ∗0∪1Σ∗1∪0∪1 − all words starting and ending with the same letter(0∪ ε)1∗ = 01∗∪1∗ − all strings of forms 1,11,1 . . .1 and 0,1,11,1 . . .1R∅ − ∅∅∗ − {ε}

14 / 14

Page 17: Finite Automata and Regular Expressions Haniel Barbosahomepage.divms.uiowa.edu/~hbarbosa/teaching/cs4330/... · Regular Expressions A regular expression (RE in short) is a string

Examples

0∗10∗ = {w | w contains a single 1}Σ∗1Σ∗ =

{w | w has at least a single 1}Σ∗(101)Σ∗ = {w | w contains 101 as a substring}1∗(01+)∗ = {w | every 0 in w is followed by at least a single 1}(ΣΣ)∗ = {w | w is of even length}

0Σ∗0∪1Σ∗1∪0∪1 − all words starting and ending with the same letter(0∪ ε)1∗ = 01∗∪1∗ − all strings of forms 1,11,1 . . .1 and 0,1,11,1 . . .1R∅ − ∅∅∗ − {ε}

14 / 14

Page 18: Finite Automata and Regular Expressions Haniel Barbosahomepage.divms.uiowa.edu/~hbarbosa/teaching/cs4330/... · Regular Expressions A regular expression (RE in short) is a string

Examples

0∗10∗ = {w | w contains a single 1}Σ∗1Σ∗ = {w | w has at least a single 1}Σ∗(101)Σ∗ =

{w | w contains 101 as a substring}1∗(01+)∗ = {w | every 0 in w is followed by at least a single 1}(ΣΣ)∗ = {w | w is of even length}

0Σ∗0∪1Σ∗1∪0∪1 − all words starting and ending with the same letter(0∪ ε)1∗ = 01∗∪1∗ − all strings of forms 1,11,1 . . .1 and 0,1,11,1 . . .1R∅ − ∅∅∗ − {ε}

14 / 14

Page 19: Finite Automata and Regular Expressions Haniel Barbosahomepage.divms.uiowa.edu/~hbarbosa/teaching/cs4330/... · Regular Expressions A regular expression (RE in short) is a string

Examples

0∗10∗ = {w | w contains a single 1}Σ∗1Σ∗ = {w | w has at least a single 1}Σ∗(101)Σ∗ = {w | w contains 101 as a substring}1∗(01+)∗ =

{w | every 0 in w is followed by at least a single 1}(ΣΣ)∗ = {w | w is of even length}

0Σ∗0∪1Σ∗1∪0∪1 − all words starting and ending with the same letter(0∪ ε)1∗ = 01∗∪1∗ − all strings of forms 1,11,1 . . .1 and 0,1,11,1 . . .1R∅ − ∅∅∗ − {ε}

14 / 14

Page 20: Finite Automata and Regular Expressions Haniel Barbosahomepage.divms.uiowa.edu/~hbarbosa/teaching/cs4330/... · Regular Expressions A regular expression (RE in short) is a string

Examples

0∗10∗ = {w | w contains a single 1}Σ∗1Σ∗ = {w | w has at least a single 1}Σ∗(101)Σ∗ = {w | w contains 101 as a substring}1∗(01+)∗ = {w | every 0 in w is followed by at least a single 1}(ΣΣ)∗ =

{w | w is of even length}

0Σ∗0∪1Σ∗1∪0∪1 − all words starting and ending with the same letter(0∪ ε)1∗ = 01∗∪1∗ − all strings of forms 1,11,1 . . .1 and 0,1,11,1 . . .1R∅ − ∅∅∗ − {ε}

14 / 14

Page 21: Finite Automata and Regular Expressions Haniel Barbosahomepage.divms.uiowa.edu/~hbarbosa/teaching/cs4330/... · Regular Expressions A regular expression (RE in short) is a string

Examples

0∗10∗ = {w | w contains a single 1}Σ∗1Σ∗ = {w | w has at least a single 1}Σ∗(101)Σ∗ = {w | w contains 101 as a substring}1∗(01+)∗ = {w | every 0 in w is followed by at least a single 1}(ΣΣ)∗ = {w | w is of even length}

0Σ∗0∪1Σ∗1∪0∪1 −

all words starting and ending with the same letter(0∪ ε)1∗ = 01∗∪1∗ − all strings of forms 1,11,1 . . .1 and 0,1,11,1 . . .1R∅ − ∅∅∗ − {ε}

14 / 14

Page 22: Finite Automata and Regular Expressions Haniel Barbosahomepage.divms.uiowa.edu/~hbarbosa/teaching/cs4330/... · Regular Expressions A regular expression (RE in short) is a string

Examples

0∗10∗ = {w | w contains a single 1}Σ∗1Σ∗ = {w | w has at least a single 1}Σ∗(101)Σ∗ = {w | w contains 101 as a substring}1∗(01+)∗ = {w | every 0 in w is followed by at least a single 1}(ΣΣ)∗ = {w | w is of even length}

0Σ∗0∪1Σ∗1∪0∪1 − all words starting and ending with the same letter(0∪ ε)1∗ = 01∗∪1∗ −

all strings of forms 1,11,1 . . .1 and 0,1,11,1 . . .1R∅ − ∅∅∗ − {ε}

14 / 14

Page 23: Finite Automata and Regular Expressions Haniel Barbosahomepage.divms.uiowa.edu/~hbarbosa/teaching/cs4330/... · Regular Expressions A regular expression (RE in short) is a string

Examples

0∗10∗ = {w | w contains a single 1}Σ∗1Σ∗ = {w | w has at least a single 1}Σ∗(101)Σ∗ = {w | w contains 101 as a substring}1∗(01+)∗ = {w | every 0 in w is followed by at least a single 1}(ΣΣ)∗ = {w | w is of even length}

0Σ∗0∪1Σ∗1∪0∪1 − all words starting and ending with the same letter(0∪ ε)1∗ = 01∗∪1∗ − all strings of forms 1,11,1 . . .1 and 0,1,11,1 . . .1R∅ −

∅∅∗ − {ε}

14 / 14

Page 24: Finite Automata and Regular Expressions Haniel Barbosahomepage.divms.uiowa.edu/~hbarbosa/teaching/cs4330/... · Regular Expressions A regular expression (RE in short) is a string

Examples

0∗10∗ = {w | w contains a single 1}Σ∗1Σ∗ = {w | w has at least a single 1}Σ∗(101)Σ∗ = {w | w contains 101 as a substring}1∗(01+)∗ = {w | every 0 in w is followed by at least a single 1}(ΣΣ)∗ = {w | w is of even length}

0Σ∗0∪1Σ∗1∪0∪1 − all words starting and ending with the same letter(0∪ ε)1∗ = 01∗∪1∗ − all strings of forms 1,11,1 . . .1 and 0,1,11,1 . . .1R∅ − ∅∅∗ −

{ε}

14 / 14

Page 25: Finite Automata and Regular Expressions Haniel Barbosahomepage.divms.uiowa.edu/~hbarbosa/teaching/cs4330/... · Regular Expressions A regular expression (RE in short) is a string

Examples

0∗10∗ = {w | w contains a single 1}Σ∗1Σ∗ = {w | w has at least a single 1}Σ∗(101)Σ∗ = {w | w contains 101 as a substring}1∗(01+)∗ = {w | every 0 in w is followed by at least a single 1}(ΣΣ)∗ = {w | w is of even length}

0Σ∗0∪1Σ∗1∪0∪1 − all words starting and ending with the same letter(0∪ ε)1∗ = 01∗∪1∗ − all strings of forms 1,11,1 . . .1 and 0,1,11,1 . . .1R∅ − ∅∅∗ − {ε}

14 / 14