     • date post

23-Aug-2020
• Category

## Documents

• view

0

0

Embed Size (px)

### Transcript of Finite Automata and Regular Expressions Haniel hbarbosa/teaching/cs4330/...آ  Regular Expressions...

• CS:4330 Theory of Computation Spring 2018

Regular Languages Finite Automata and Regular Expressions

Haniel Barbosa

Chapter 1 of [Sipser 1996], 3rd edition. Sections 1.1 and 1.3.

• Abstraction of Problems

B Data: abstracted as a word in a given alphabet

I Σ: alphabet, a finite, non-empty set of symbols I Σ∗: all the words of finite length built up using Σ

B Conditions: abstracted as a set of words, called language

I Any subset L⊆ Σ∗

B Unknown: Implicitly a Boolean variable: true if a word is in the language, false otherwise

I Given w ∈ Σ∗ and L⊆ Σ∗, does w ∈ L?

1 / 14

• Finite Automata

B The simplest computational model is called a finite state machine or a finite automaton

B Representations:

I Graphical I Tabular I Mathematical

2 / 14

• Computation of a Finite Automaton

B The automaton receives the input symbols one by one from left to right, changing the “active” state

B After reading each symbol, the “active state” moves from one state to another along the transition that has that symbol as its label

B When the last symbol of the input is read the automaton produces the output: accept if the “active state” is in an accept state, or reject otherwise

3 / 14

• Applications

B Finite automata are popular in parser construction of compilers

B Finite automata and their probabilistic counterparts, Markov chains, are useful tools for pattern recognition Example: speech processing and optical character recognition

B Markov chains have been used to model and predict price changes in financial applications

4 / 14

• Formal Definition of Finite Automata

A finite automaton is a 5-tuple (Q, Σ, δ, q0, F) in which: B Q is a finite set called the states

B Σ is a finite set called the alphabet B δ : Q×Σ→ Q is the transition function B q0 ∈ Q is the start state, also called initial stat B F ⊆ Q is the set of accepted states, also called the final states

5 / 14

• Language Recognized by an Automaton

B If L is the set of all strings that a finite automaton M accepts, we say that L is the language of the machine M and write L (M) = L

B An automaton may accept several strings, but it always recognizes only one language

B If a machine accepts no strings, it still recognizes one language, namely the empty language ∅

6 / 14

• Formal Definition of Acceptance

Let M = (Q, Σ, δ, q0, F) be a finite automaton and w = a0 . . .an be a string over Σ

Then M accepts w if a sequence of states r0, . . . , rn exists in Q such that:

1. r0 = q0 2. δ(ri, ai+1) = ri+1 for i = 0,1, . . . ,n−1 3. rn ∈ F

Condition (1) says where the machine starts

Condition (2) says that the machine goes from state to state according to its transition function δ

Condition (3) says when the machine accepts its input: if it ends up in an accept state

7 / 14

• Regular Languages

We say that a finite automaton M recognizes the language L if L = {w |M accepts w}

Definition: A language is called regular language if there exists a finite automaton that recognizes it.

8 / 14

• Designing Finite Automata

B Whether it be of automaton or artwork, design is a creative process. Consequently it cannot be reduced to a simple recipe or formula.

B The approach:

I Identify the finite pieces of information you need to solve the problem. These are the states

I Identify the condition (alphabet) to change from one state to another I Identify the start and final states I Add missing transitions

9 / 14

• Operations on Regular Languages

Let A and B be languages. We define regular operations union, concatenation, and start as follows

B Union: A∪B = {x | x ∈ A∨ x ∈ B} B Concatenation: A◦B = {xy | x ∈ A∧ y ∈ B} B Star: A∗ = {x1 . . .xk | k ≥ 0∧ xi ∈ A, 0≤ i≤ k}

Note:

1. � ∈ A∗, no matter what A is 2. A+ denotes A◦A∗

10 / 14

• Regular Expressions

A regular expression (RE in short) is a string of symbols that describes a regular language.

B Three base cases:

I For any a ∈ Σ, a is a regular expression denoting the language {a} I � is a regular expression denoting the language {�} I ∅ is a regular expression denoting the language ∅;

B Three recursive cases: If r1 and r2 are regular expressions denoting languages L1 and L2, respectively, then I Union: r1∪ r2 denotes L1∪L2 I Concatenation: r1r2 denotes L1 ◦L2 I Star: r∗1 denotes L

∗ 1

11 / 14

• Some useful notation

Let r be a regular expression:

B The string r+ represents rr∗, and it also holds that r+∪{�}= r∗

B The string rk represents rr . . .r︸ ︷︷ ︸ k times

B Recall that the symbol Σ represents the alphabet {a1, . . . , ak}

B As with automata, the language represented by r is denoted L (r)

12 / 14

• Precedence Rules

B The star (∗) operation has the highest precedence

B The concatenation (◦) operation is second on precedence order

B The union (∪) or (+) operation is the least preferred

B Parenthesis can be omitted using these rules

13 / 14

• Examples

0∗10∗ =

{w | w contains a single 1} Σ∗1Σ∗ = {w | w has at least a single 1} Σ∗(101)Σ∗ = {w | w contains 101 as a substring} 1∗(01+)∗ = {w | every 0 in w is followed by at least a single 1} (ΣΣ)∗ = {w | w is of even length}

0Σ∗0∪1Σ∗1∪0∪1 − all words starting and ending with the same letter (0∪ �)1∗ = 01∗∪1∗ − all strings of forms 1,11,1 . . .1 and 0,1,11,1 . . .1 R∅ − ∅ ∅∗ − {�}

14 / 14

• Examples

0∗10∗ = {w | w contains a single 1} Σ∗1Σ∗ =

{w | w has at least a single 1} Σ∗(101)Σ∗ = {w | w contains 101 as a substring} 1∗(01+)∗ = {w | every 0 in w is followed by at least a single 1} (ΣΣ)∗ = {w | w is of even length}

0Σ∗0∪1Σ∗1∪0∪1 − all words starting and ending with the same letter (0∪ �)1∗ = 01∗∪1∗ − all strings of forms 1,11,1 . . .1 and 0,1,11,1 . . .1 R∅ − ∅ ∅∗ − {�}

14 / 14

• Examples

0∗10∗ = {w | w contains a single 1} Σ∗1Σ∗ = {w | w has at least a single 1} Σ∗(101)Σ∗ =

{w | w contains 101 as a substring} 1∗(01+)∗ = {w | every 0 in w is followed by at least a single 1} (ΣΣ)∗ = {w | w is of even length}

0Σ∗0∪1Σ∗1∪0∪1 − all words starting and ending with the same letter (0∪ �)1∗ = 01∗∪1∗ − all strings of forms 1,11,1 . . .1 and 0,1,11,1 . . .1 R∅ − ∅ ∅∗ − {�}

14 / 14

• Examples

0∗10∗ = {w | w contains a single 1} Σ∗1Σ∗ = {w | w has at least a single 1} Σ∗(101)Σ∗ = {w | w contains 101 as a substring} 1∗(01+)∗ =

{w | every 0 in w is followed by at least a single 1} (ΣΣ)∗ = {w | w is of even length}

0Σ∗0∪1Σ∗1∪0∪1 − all words starting and ending with the same letter (0∪ �)1∗ = 01∗∪1∗ − all strings of forms 1,11,1 . . .1 and 0,1,11,1 . . .1 R∅ − ∅ ∅∗ − {�}

14 / 14

• Examples

0∗10∗ = {w | w contains a single 1} Σ∗1Σ∗ = {w | w has at least a single 1} Σ∗(101)Σ∗ = {w | w contains 101 as a substring} 1∗(01+)∗ = {w | every 0 in w is followed by at least a single 1} (ΣΣ)∗ =

{w | w is of even length}

0Σ∗0∪1Σ∗1∪0∪1 − all words starting and ending with the same letter (0∪ �)1∗ = 01∗∪1∗ − all strings of forms 1,11,1 . . .1 and 0,1,11,1 . . .1 R∅ − ∅ ∅∗ − {�}

14 / 14

• Examples

0∗10∗ = {w | w contains a single 1} Σ∗1Σ∗ = {w | w has at least a single 1} Σ∗(101)Σ∗ = {w | w contains 101 as a substring} 1∗(01+)∗ = {w | every 0 in w is followed by at least a single 1} (ΣΣ)∗ = {w | w is of even length}

0Σ∗0∪1Σ∗1∪0∪1 −

all words starting and ending with the same letter (0∪ �)1∗ = 01∗∪1∗ − all strings of forms 1,11,1 . . .1 and 0,1,11,1 . . .1 R∅ − ∅ ∅∗ − {�}

14 / 14

• Examples

0∗10∗ = {w | w contains a single 1} Σ∗1Σ∗ = {w | w has at least a single 1} Σ∗(101)Σ∗ = {w | w contains 101 as a substring} 1∗(01+)∗ = {w | every 0 in w is followed by at least a single 1} (ΣΣ)∗ = {w | w is of even length}

0Σ∗0∪1Σ∗1∪0∪1 − all words starting and ending with the same letter (0∪ �)1∗ = 01∗∪1∗ −

all strings of forms 1,11,1 . . .1 and 0,1,11,1 . . .1 R∅ − ∅ ∅∗ − {�}

14 / 14

• Examples

0∗10∗ = {w | w contains a single 1} Σ∗1Σ∗ = {w | w has at least a single 1} Σ∗(101)Σ∗ = {w | w contains 101 as a substring} 1∗(01+)∗ = {w | every 0 in w is followed by at least a single 1} (ΣΣ)∗ = {w | w is of even length}

0Σ∗0∪1Σ∗1∪0∪1 − all words starting and ending with the same letter (0∪ �)1∗ = 01∗∪1∗ − all strings of forms 1,11,1 . . .1 and 0,1,11,1 . . .1 R∅ −

∅ ∅∗ − {�}

14 / 14