Uses of finite automata - University of Bathag248/CM10020/Automata.pdfWe shall study finite...

32
Automata . – p.26/39 Automata in computer science In computer science: automaton = abstract computing device, or “machine” . – p.27/39 Automata in this lecture Turing machines (1937) and abacus machines (1960s): have all capabilities of today’s computers. Used to study the boundary between computable and uncomputable. Finite automata (also called finite state machines, emerged during the 1940’s and 1950’s): useful e.g. text search, protocol verification, compilers, descriptions of certain formal grammars (N. Chomsky, 1950’s). . – p.28/39 Finite automata We shall study finite automata first, because they can be seen as a first step towards Turing ma- chines and abacus machines. . – p.29/39 Uses of finite automata Used in software for verifying all kinds of systems with a finite number of states, such as communication protocols Used in software for scanning text, to find certain patterns Used in “Lexical analyzers” of compilers (to turn program text into “tokens”, e.g. identifiers, keywords, brackets, punctuation) Part of Turing machines and abacus machines . – p.30/39 Example: comm. protocol Store Customer Bank redeem ship cancel pay transfer Customer, Store, and Bank will be finite automata. . – p.31/39 Communication protocol a b d f c g 2 1 3 4 pay redeem transfer e redeem transfer ship ship ship cancel redeem transfer Start Start Start Store Bank Customer pay, cancel . – p.32/39 Simulating the whole system Idea: running Customer, Store, and Bank “in parallel”. Initially, each automaton is in its start position. The system can move on for every action that is possible in each of the three automata. . – p.33/39 1

Transcript of Uses of finite automata - University of Bathag248/CM10020/Automata.pdfWe shall study finite...

Page 1: Uses of finite automata - University of Bathag248/CM10020/Automata.pdfWe shall study finite automata first, ... exactly one) following state, δ ... This DFA accepts 0011, but not

Automata

. – p.26/39

Automata incomputer science

In computer science:

automaton = abstract computing device, or“machine”

. – p.27/39

Automata in thislecture

Turing machines (1937) and abacusmachines (1960s): have all capabilities oftoday’s computers. Used to study theboundary between computable anduncomputable.Finite automata (also called finite statemachines, emerged during the 1940’s and1950’s): useful e.g. text search, protocolverification, compilers, descriptions of certainformal grammars (N. Chomsky, 1950’s).

. – p.28/39

Finite automataWe shall study finite automata first, because theycan be seen as a first step towards Turing ma-chines and abacus machines.

. – p.29/39

Uses of finiteautomata

Used in software for verifying all kinds ofsystems with a finite number of states, suchas communication protocols

Used in software for scanning text, to findcertain patterns

Used in “Lexical analyzers” of compilers (toturn program text into “tokens”, e.g.identifiers, keywords, brackets, punctuation)

Part of Turing machines and abacusmachines

. – p.30/39

Example: comm.protocol

Store

Customer Bank

redeem

ship

cancel

pay

transfer

Customer, Store, and Bank will be finite automata.

. – p.31/39

Communicationprotocol

a b d f

c g

2

1 3 4

pay redeem transfer

eredeem transfer

ship ship ship

cancel

redeem transferStart Start

Start

Store

BankCustomer

pay, cancel

. – p.32/39

Simulating the wholesystem

Idea: running Customer, Store, and Bank “inparallel”.

Initially, each automaton is in its start position.

The system can move on for every action thatis possible in each of the three automata.

. – p.33/39

1

Page 2: Uses of finite automata - University of Bathag248/CM10020/Automata.pdfWe shall study finite automata first, ... exactly one) following state, δ ... This DFA accepts 0011, but not

The missingirrelevant actions

Problem: Bank gets stuck during the payaction, although paying is only betweenCustomer and Store.

Solution: we need to add a loop labeled “pay”to state 1 of Bank.

More generally, we need loops for all suchirrelevant actions.

But illegal actions should remain impossible.E.g. Bank should not allow “redeem” after“cancel”.

. – p.34/39

Adding irrelevantactions

a b d f

c g

2

1 3 4

pay redeem transfer

eredeem transfer

ship ship ship

cancel

redeem transferStart Start

Start

Store

BankCustomer

pay, cancel

. – p.35/39

Adding irrelevantactions

a b d f

c g

2

1 3 4

pay redeem transfer

eredeem transfer

ship ship ship

cancel

redeem transferStart Start

Start

Store

BankCustomer

cancel pay, cancel pay, cancel pay, cancel

pay, cancel pay, cancel pay, cancel

ship, redeem, transfer,pay, cancel

pay,ship

pay, ship

pay, redeem,cancel, ship

pay, redeem,cancel, ship

. – p.35/39

Simulating the wholesystem

Simulation by product automaton.

Its states are pairs (StoreState,BankState),e.g. (a,1) or (c,3). (Because Customer hasonly one state and allows every action, it canbe neglected.)

It has a transition(StoreState,BankState)

action� (StoreState′, BankState′)whenever Store has a transitionStoreState

action� StoreState′ and Bank has atransition BankState

action� BankState′.

. – p.36/39

Product automaton

p s s s

c c c c c c

p s s s

p ss

s

p s s s

r r

ttr r

rr

a b c d e f g

1

2

3

4

p

pppp

ppppp

pp

c

c

p,cp,cp,cp,cp,c

p,c

p,cp,cp,cp,cp,cp,c

Start

c

. – p.37/39

Without unreachablestates

p s

c c

p s

s

s

tt

rr

a b c d e f g

1

2

3

4

pp

pp

p,cp,c

p,cp,c

Start

c

. – p.38/39

Usefulness forprotocol verification

We can now answer all kinds of interestingquestions, e.g. “Can it happen that Storeships the product and never receives themoney transfer?”

Yes! If Customer has indicated to pay, butsent a cancellation message to the Bank, weare in state (b,2). If Store ships then, wemake a transition into (c,2), and the Store willnever receive a money transfer!

So store should never ship before redeeming.

. – p.39/39

Automata incomputer science

In computer science, an automaton is anabstract computing machine.

“Abstract” means here that it need not exist inphysical form, but only as aprecisely-described idea.

. – p.8/37

2

Page 3: Uses of finite automata - University of Bathag248/CM10020/Automata.pdfWe shall study finite automata first, ... exactly one) following state, δ ... This DFA accepts 0011, but not

We saw an example of a communicationprotocol that we could simulate with finitestate automaton.

We drew the finite state automaton: circlesrepresented states, labelled arrowsrepresented transitions. We labelled a startstate and we said that a machine would “die”if there was no transition to follow from thecurrent state to another.

We are going to give the formal definition of adeterministic finite state automaton.

. – p.9/37

Formal definition ofDFA’s

Definition. A deterministic finite automaton(DFA) consists of

a finite set of states, often denoted Q,

a finite set Σ of input symbols,a total transition function δ : Q × Σ → Q,

a start state q0 ∈ Q, and

a set F ⊆ Q of final or accepting states.

Remark: we require the transition function to betotal, but some people allow it to be partial. . – p.10/37

Terminology andintuitions

The transition graph we used before is aninformal presentation of the transition function

δ. We have q'qa

if δ(q, a) = q′.

“Deterministic” means that for every state qand input symbol a, there is a unique (i.e.exactly one) following state, δ(q, a).

Later, we shall also see non-deterministicfinite automata (NFA’s), where (q, a) canhave any number of following states.

FA’s are also called “finite state machines”.. – p.11/37

Useful notations forDFA’s

Transition graph, like that for Customer, Store,or Bank.

Transition table, which is a tabular listing ofthe δ function.

. – p.12/37

Transition graph:example

q3q2

q1q0

1

1

0 0 0

1

1

0

Start

Q = {q0, q1, q2, q3}, Σ = {0, 1}, δ(q3, 0) = q1 . . . ,F = {q0}.

. – p.13/37

Meaning of thetransition graph

The nodes of the graph are the states.

The labels of the arrows are input symbols.

The labeled arrows describe the transitionfunction.

The node labeled “Start” is the start state q0.

The states with double circles are the finalstates.

. – p.14/37

Transition table:example

0 1

∗ → q0 q2 q1

q1 q3 q0

q2 q0 q3

q3 q1 q2

This is the transition table for the transition graphgiven above.

. – p.15/37

Meaning of thetransition table

The symbols in the leftmost column are thestates.

The symbols in the top row are the inputsymbols.

The symbols “inside” the table describe thetransition function.

The arrow in the leftmost column marks thestart symbol.

The symbol ∗ in the leftmost column marksthe final states.

. – p.16/37

3

Page 4: Uses of finite automata - University of Bathag248/CM10020/Automata.pdfWe shall study finite automata first, ... exactly one) following state, δ ... This DFA accepts 0011, but not

How a DFAprocesses strings

Let a1a2 · · · an be a string of input symbols.

Initially, the DFA is in its start state q0.

Let q be the state reached after the first isymbols a1a2 · · · ai of the input string. Uponreading the next symbol ai+i, the DFA makesa transition into the new state δ(q, ai+1).

Repeated until the last symbol an.

The DFA said to accept the input string if thestate reached after the last symbol an is in theset F of final states.

. – p.17/37

Accepted strings:example

q3q2

q1q0

1

1

0 0 0

1

1

0

Start

This DFA accepts 1010, but not 1110

It accepts those strings that have an evennumber of 0’s and an even number of 1’s.

Therefore, we call this DFA “parity checker”.. – p.18/37Formal approach to

accepted strings

We define the extended transition function δ̂. Ittakes a state q and an input string w to theresulting state. The definition proceeds byinduction over the length of w.

Induction basis (w has length 0): in this case,w is the empty string, i.e. the string of length0, for which we write ε. We define

δ̂(q, ε) = q.

. – p.19/37

Formal approach toaccepted strings

Induction step (from length l to length l + 1):in this case, w, which has length l + 1, is ofthe form va, where v is a string of length l anda is a symbol. We define

δ̂(q, va) = δ(δ̂(q, v), a).

This works because, by induction hypothesis,δ̂(q, v) is already defined.

. – p.20/37

The language of aDFA

Intuitively, the language of a DFA A is the setof strings w that take the start state to one ofthe accepting states.

Formally, the language L(A) accepted by theDFA A is defined as follows:

L(A) = {w | δ̂(q0, w) ∈ F}.

. – p.21/37

Transition graph:example

q0

q1

q2

Start 0

1 0

1

1

0

Q = {q0, q1, q2}, Σ = {0, 1}, δ(q0, 0) = q1 . . . , F =

{q2}.

. – p.22/37

Transition table:example

0 1

→ q0 q1 q0

q1 q1 q2

∗q2 q2 q2

This is the transition table for the same automa-ton. Let us call this machine A. What languagedoes this automaton accept?

. – p.23/37

Accepted strings:example

q0

q1

q2

Start 0

1 0

1

1

0

This DFA accepts 0011, but not 1110 forinstance. Try some other strings.

To find out what language L(A) is acceptedby A you need to work out what strings A willaccept. Try and describe L(A) in English.

. – p.24/37

4

Page 5: Uses of finite automata - University of Bathag248/CM10020/Automata.pdfWe shall study finite automata first, ... exactly one) following state, δ ... This DFA accepts 0011, but not

ExercisesGive DFA’s accepting the following languagesover the alphabet {0, 1}. (Note that you canchoose between giving a transition table, atransition graph, or a formal presentation of Q, Σ,q0, δ, and F .)

1. The set of all strings ending in 00.

2. The set of all strings with two consecutive 0’s(not necessarily at the end).

3. The set of strings with 011 as a substring.

. – p.25/37

ExerciseFor the alphabet {a, b, c}, give a DFA accepting allstrings that have abc as a substring.

. – p.26/37

Exercises(More advanced; do not worry if you need tutor’shelp to solve this.) Give DFA’s accepting thefollowing languages over the alphabet {0, 1}.

1. The set of all strings such that each block offive consecutive symbols contains at leasttwo 0’s.

2. The set of all strings whose tenth symbolfrom the right is a 1.

3. The set of strings such that the number of 0 isdivisible by five, and the number of 1’s isdivisible by three.

. – p.27/37

ExerciseConsider the DFA with the following transitiontable:

0 1

→ A A B

∗B B A

(1) Informally describe the language accepted bythis DFA; (2) prove by induction on the length ofan input string that your description is correct.(Don’t worry if you need tutor’s help for (2).)

. – p.28/37

Non-deterministic FA(NFA)

An NFA is like a DFA, except that it can be inseveral states at once.

This can be seen as the ability to guesssomething about the input.

Useful for searching texts.

. – p.29/37

NFA: example

An NFA accepting all strings that end in 01:

1q1q00

0,1

q2Start

It is non-deterministic because input 0 in state q0

can lead to both q0 and q1.

. – p.30/37

Using the NFA

1q1q00

0,1

q2Start

Suppose the input string is 100101. The NFAstarts in state q0, as indicated by the token.

. – p.31/37

Using the NFA

1q1q00

0,1

q2Start

The remaining input string is 100101. The NFAreads the first symbol, 1. It remains in state q0.

. – p.32/37

5

Page 6: Uses of finite automata - University of Bathag248/CM10020/Automata.pdfWe shall study finite automata first, ... exactly one) following state, δ ... This DFA accepts 0011, but not

Using the NFA

1q1q00

0,1

q2Start

The remaining input string is 00101. The NFAreads the next symbol, 0. The resulting possiblestates are q0 or q1.

. – p.33/37

Using the NFA

1q1q00

0,1

q2Start

The remaining input string is 0101. The NFA readsthe next symbol, 0. The resulting possible statesare still q0 or q1.

. – p.34/37

Using the NFA

1q1q00

0,1

q2Start

The remaining input string is 101. The NFA readsthe next symbol, 1. The resulting possible statesare q0 and q2. (Because q2 is a final states, thismeans that the word so far, 1001, would be ac-cepted.)

. – p.35/37

Using the NFA

1q1q00

0,1

q2Start

The remaining input string is 01. The NFA readsthe next symbol, 0. There is no transition for 0

from q2, so the token on q2 dies. The resultingpossible states are q0 or q1.

. – p.36/37

Using the NFA

1q1q00

0,1

q2Start

The remaining input string is 1. The NFA readsthe next symbol, 1. The possible states are q0 andq2. Because q2 is final, the NFA accepts the word,100101.

. – p.37/37

Deterministic Finite StateAutomaton (DFA).

. – p.1/39

Formal definition ofDFA’s

We saw the formal definition of a DFA:Definition. A deterministic finite automaton(DFA) consists of

a finite set of states, often denoted Q,

a finite set Σ of input symbols,a total transition function δ : Q × Σ → Q,

a start state q0 ∈ Q, and

a set F ⊆ Q of final or accepting states.

. – p.2/39

Accepted strings:example

q0

q1

q2

Start 0

1 0

1

1

0

01 is accepted.

Strings of length 3 that are accepted: 101,011, 010, 001. Strings not accepted: 111, 110,100, 000

Clearly 01 is going to be a substring of anystring accepted. Is anything else required? . – p.3/39

6

Page 7: Uses of finite automata - University of Bathag248/CM10020/Automata.pdfWe shall study finite automata first, ... exactly one) following state, δ ... This DFA accepts 0011, but not

Formal approach toaccepted strings

We define the extended transition function δ̂. Ittakes a state q and an input string w to theresulting state. The definition proceeds byinduction over the length of w.

Induction basis (w has length 0): in this case,w is the empty string, i.e. the string of length0, for which we write ε. We define

δ̂(q, ε) = q.

. – p.4/39

Formal approach toaccepted strings

Induction step (from length l to length l + 1):in this case, w, which has length l + 1, is ofthe form va, where v is a string of length l anda is a symbol. We define

δ̂(q, va) = δ(δ̂(q, v), a).

This works because, by induction hypothesis,δ̂(q, v) is already defined.

. – p.5/39Accepted strings:example

q0

q1

q2

Start 0

1 0

1

1

0

It accepts the language of all strings of 0 and1 that contain the substring 01.

To prove that this is the language accepted bythe machine we would need to make aninductive argument based on the length of thestring w.

. – p.6/39

Non-deterministic finiteautomata (NFA’s)

. – p.7/39

Non-deterministic FA(NFA)

An NFA is like a DFA, except that it can be inseveral states at once.

This can be seen as the ability to guesssomething about the input.

Useful for searching texts.

. – p.8/39

NFA: example

An NFA accepting all strings that end in 01:

1q1q00

0,1

q2Start

It is non-deterministic because input 0 in state q0

can lead to both q0 and q1.

. – p.9/39

Using the NFA

Last time we saw how to use the NFA. We sawthat when in state q0 given input 0 the resultingstates are q0 or q1:

1q1q00

0,1

q2Start

We also saw that when in state q2 any input withcause the token to die and that the same occurswhen we have input 0 to state q1. . – p.10/39

Formal definition ofNFA

Definition. A non-deterministic finiteautomaton (NFA) consists of

a finite set of states, often denoted Q,

a finite set Σ of input symbols,a transition function δ : Q × Σ → P (Q),

a start state q0 ∈ Q, and

a set F ⊆ Q of final or accepting states.

. – p.11/39

7

Page 8: Uses of finite automata - University of Bathag248/CM10020/Automata.pdfWe shall study finite automata first, ... exactly one) following state, δ ... This DFA accepts 0011, but not

Difference betweenNFA and DFA

Suppose that q is a state and a is an inputsymbol.

In a DFA, we have δ(q, a) ∈ Q, that is, δ(q, a)is a state.

In a NFA, we have δ(q, a) ∈ P (Q), that is,δ(q, a) is a set of states; it can be seen asthe possible states that can result from input ain state q.

. – p.12/39

Formal approach toaccepted strings

We are aiming to describe the language L(A)accepted by a NFA A.

This description is similar to the DFA case,but a bit more sophisticated.

As in the DFA case, we first define theextended transition function:δ̂ : Q × Σ → P (Q).

That function δ̂ will be used to define L(A).

. – p.13/39Example of δ̂ (inputstring 100101)

1q1q00

0,1

q2Start

Before reading any symbols, the set of possiblestates is δ̂(q0, ε) = {q0}.

. – p.14/39

Example of δ̂ (inputstring: 100101)

1q1q00

0,1

q2Start

We have δ̂(q0, 1) = {q0}.

. – p.15/39

Example of δ̂ (inputstring: 100101)

1q1q00

0,1

q2Start

We have δ̂(q0, 10) = {q0, q1}.

. – p.16/39

Example of δ̂ (inputstring: 100101)

1q1q00

0,1

q2Start

We have δ̂(q0, 100) = {q0, q1}.

. – p.17/39Example of δ̂ (inputstring: 100101)

1q1q00

0,1

q2Start

We have δ̂(q0, 1001) = {q0, q2}.

. – p.18/39

Example of δ̂ (inputstring: 100101)

1q1q00

0,1

q2Start

We have δ̂(q0, 10010) = {q0, q1}.

. – p.19/39

8

Page 9: Uses of finite automata - University of Bathag248/CM10020/Automata.pdfWe shall study finite automata first, ... exactly one) following state, δ ... This DFA accepts 0011, but not

Example of δ̂ (inputstring: 100101)

1q1q00

0,1

q2Start

We have δ̂(q0, 100101) = {q0, q2}. Because{q0, q2} ∩ F = {q0, q2} ∩ {q2} = {q2} �= ∅, the NFAaccepts.

. – p.20/39

Formal definition of δ̂

Definition. The extended transition functionδ̂ : Q × Σ → P (Q) of an NFA is definedinductively as follows:

Induction basis (length 0):

δ̂(q, ε) = {q}Induction step (from length l to length l + 1):

δ̂(q, va) =⋃

q′∈δ̂(q,v)

δ(q′, a).

. – p.21/39

The language of anNFA

Intuitively, the language of a NFA A is the setof strings w that lead from the start state to anaccepting possible state.

Formally, the language L(A) accepted by theFA A is defined as follows:

L(A) = {w | δ̂(q0, w) ∩ F �= ∅}.

. – p.22/39

ExerciseGive NFA to accept the following languages.

1. The set of strings over an alphabet{0, 1, . . . , 9} such that the final digit hasappeared before.

2. The set of strings over an alphabet{0, 1, . . . , 9} such that the final digit has notappeared before.

3. The set of strings of 0’s and 1’s such thatthere are two 0’s separated by a number ofpositions that is a multiple of 4.

. – p.23/39

DFA’s and NFA’sEvidently, DFA’s are precisely those NFA’s forwhich the set of states δ(q, a) has exactly oneelement for all q and a.

So, trivially, every language accepted by aDFA is also accepted by some NFA.

Is every language accepted by an NFA alsoaccepted by some DFA?

Surprisingly, the answer is “yes”!

. – p.24/39

Simulation of an NFAby a DFA

Let N = (QN ,Σ, δN , qN0 , FN ) be a NFA. The

equivalent DFA D is obtained from the so-calledpowerset construction (also called “subsetconstruction”.) We define

D = (QD,Σ, δD, qD0 , FD),

where. . .

. – p.25/39

Simulation of an NFAby a DFA

The alphabet of D is that of N .

The states of D are sets of states of N :

QD = P (QN )

The initial state qD0 of D is {qN

0 }.

. – p.26/39

Simulation of an NFAby a DFA

The final states of D are those sets thatcontain the final state of N :

FD = {S ∈ P (QN ) |S ∩ FN �= ∅}The transition function of D arises from thetransition function of N as follows:

δD(S, a) =⋃q′∈S

δN(q′, a)

That is, δD(S, a) is the set of all states of Nthat are reachable from some state q via a.

. – p.27/39

9

Page 10: Uses of finite automata - University of Bathag248/CM10020/Automata.pdfWe shall study finite automata first, ... exactly one) following state, δ ... This DFA accepts 0011, but not

Example of powersetconstruction: tableNFA DFA

1

q1

q0

0

0,1

q2

Start

0 1

∅ ∅ ∅→ {q0} {q0, q1} {q0}{q1} ∅ {q2}∗{q2} ∅ ∅{q0, q1} {q0, q1} {q0, q2}∗{q0, q2} {q0, q1} {q0}∗{q1, q2} ∅ {q2}

∗{q0, q1, q2} {q0, q1} {q0, q2}. – p.28/39

Example of powersetconstruction: graph

{q0,q1,q2}

{}

{q0}

{q1}

{q0,q1}

{q1,q2}

{q2}

{q0,q2}

1

0

0,1

0,1

0

1

1

0

1

00 10

1

Transition graph of the resulting DFA.. – p.29/39Example of powerset

construction: graph

{q0} {q0,q1} {q0,q2}

1

0 1

0

0

1

Optionally, we can remove the unreachable statesof the DFA.

. – p.30/39

Proposition about thesimulation

Proposition.For every NFA N , there is a DFA D

such that L(D) = L(N).

. – p.31/39

Proof of theproposition (part 1/3)

First, we show that for every string w we have

δ̂D({q0}, w) = δ̂N(q0, w) (1)

We proceed by induction on the length l of w.

Base case (l = 0): in this case, w is the emptystring, ε. We have

δ̂D({q0}, ε) = {q0} (by defn. of δ̂D)

= δ̂N(q0, ε) (by defn. of δ̂N ).

. – p.32/39

Proof of theproposition (part 2/3)

Induction step (from l to l + 1): in this case, w,which is of length l + 1, is of the form va,where v is a string of length l and a is asymbol. We have

δ̂D({q0}, va) = δD(δ̂D({q0}, v), a) (by defn. of δ̂D)

= δD(δ̂N(q0, v), a) (by indn. hypoth.)

=⋃

q′∈̂δN (q0,v)

δN(q′, a) (by defn. of δD)

= δ̂N(q0, va) (by defn. of δ̂N ).. – p.33/39

Proof of theproposition (part 3/3)

Finally, we use Equation (1), which we just proved, to provethat the languages of D and N are equal:

w ∈ L(D) ⇐⇒ ̂δD({q0}, w) ∈ FD (by defn. of L(D))

⇐⇒ ̂δN(q0, w) ∈ FD (by Equation (1))

⇐⇒ ̂δN(q0, w) ∩ FN �= ∅ (by defn. of FD)

⇐⇒ w ∈ L(N) (by defn. of L(N)).

. – p.34/39

Languages acceptedby DFAs and NFAs

The proposition implies:

Corollary. A language L is accepted by someDFA if and only if L is accepted by some NFA.

Proof. ⇒: this is the powerset construction wehave just seen.⇐: this is true because every DFA is a specialcase of an NFA, as observed earlier.

. – p.35/39

10

Page 11: Uses of finite automata - University of Bathag248/CM10020/Automata.pdfWe shall study finite automata first, ... exactly one) following state, δ ... This DFA accepts 0011, but not

Warning

Let N be an NFA, and let D be the DFA thatarises from the powerset construction.

As we have seen, we have QD = P (QN).

So, if QN has size k, then the size of QD is 2k.

This exponential growth of the number ofstates makes the powerset constructionunusable in practice.

It can be shown that removing unreachablestates does not prevent this exponentialgrowth.

. – p.36/39

ExerciseConvert the following NFA to a DFA:

0 1

→ p {p, q} {p}q {r} {r}r {s} {}∗s {s} {s}.

. – p.37/39

ExerciseConvert the following NFA to a DFA:

0 1

→ p {q, s} {q}∗q {r} {q, r}r {s} {p}∗s {} {p}.

. – p.38/39

ExerciseConvert the following NFA to a DFA:

0 1

→ p {p, q} {p}q {r, s} {t}r {p, r} {t}∗s {} {}∗t {} {}

Describe informally the language accepted by thisNFA accept? (Don’t worry if you need tutor’s helpfor this.)

. – p.39/39

Non-Deterministic FiniteAutomaton (NFA).

. – p.1/29

Formal definition ofNFA

Definition. A non-deterministic finiteautomaton (NFA) consists of

a finite set of states, often denoted Q,

a finite set Σ of input symbols,a transition function δ : Q × Σ → P (Q),

a start state q0 ∈ Q, and

a set F ⊆ Q of final or accepting states.

. – p.2/29

Formal definition of δ̂

Definition. The extended transition functionδ̂ : Q × Σ → P (Q) of an NFA is definedinductively as follows:

Induction basis (length 0):

δ̂(q, ε) = {q}Induction step (from length l to length l + 1):

δ̂(q, va) =⋃

q′∈δ̂(q,v)

δ(q′, a).

. – p.3/29

The language of anNFA

Intuitively, the language of a NFA A is the setof strings w that lead from the start state to anaccepting possible state.

Formally, the language L(A) accepted by theFA A is defined as follows:

L(A) = {w | δ̂(q0, w) ∩ F �= ∅}.

. – p.4/29

11

Page 12: Uses of finite automata - University of Bathag248/CM10020/Automata.pdfWe shall study finite automata first, ... exactly one) following state, δ ... This DFA accepts 0011, but not

Simulation of an NFAby a DFA

Let N = (QN ,Σ, δN , qN0 , FN ) be a NFA. The

equivalent DFA D is obtained from the so-calledpowerset construction (also called “subsetconstruction”.) We define

D = (QD,Σ, δD, qD0 , FD),

where. . .

. – p.5/29

Simulation of an NFAby a DFA

The alphabet of D is that of N .

The states of D are sets of states of N :

QD = P (QN )

The initial state qD0 of D is {qN

0 }.

. – p.6/29

Simulation of an NFAby a DFA

The final states of D are those sets thatcontain the final state of N :

FD = {S ∈ P (QN ) |S ∩ FN �= ∅}The transition function of D arises from thetransition function of N as follows:

δD(S, a) =⋃q′∈S

δN(q′, a)

That is, δD(S, a) is the set of all states of Nthat are reachable from some state q via a.

. – p.7/29

Example of powersetconstruction: table

NFA:

q0

q1

Start

0,1

1

DFA:0 1

∅ ∅ ∅→ {q0} {q0} {q0, q1}∗{q1} ∅ ∅

∗{q0, q1} {q0} {q0, q1}. – p.8/29

Resulting DFA:

{q }1

{}

0

0

1

1

0{q }

0 1{q ,q }

Start

0

0

1

1

We cannot reach the states ∅ and {q1} from thestart state so we can remove that part of themachine. You should be able to see that thisresulting machine accepts the same languageand how it is forced to work in a slightly moresubtle way.

. – p.9/29

Proposition about thesimulation

Proposition.For every NFA N , there is a DFA Dsuch that L(D) = L(N).

We used the powerset construction to show this.

We have an NFA N with start state q0 andextended transition function δ̂N and the DFA Dobtained using the powerset construction whichhas start state {q0} and extended transitionfunction δ̂D.

. – p.10/29

Summary of proof

Firstly we assume that for every string w wehave

δ̂D({q0}, w) = δ̂N(q0, w). (1)

We use this to show that the languagesaccepted by N and D are the same that is,L(N) = L(D).

We then need to show that (1) is true and wedo this by induction on the length of the inputstring.

. – p.11/29

Languages acceptedby DFAs and NFAs

The proposition implies:

Corollary. A language L is accepted by someDFA if and only if L is accepted by some NFA.

Proof. ⇒: this is the powerset constructionsummarised above.⇐: this is true because every DFA is a specialcase of an NFA, as observed earlier.

. – p.12/29

12

Page 13: Uses of finite automata - University of Bathag248/CM10020/Automata.pdfWe shall study finite automata first, ... exactly one) following state, δ ... This DFA accepts 0011, but not

Regular expressions

. – p.13/29

Regular expressions:motivation

Useful for describing text patterns (withwildcards etc.).

Used e.g. for text search in the text editor“Emacs” and in the Unix search command“grep”.

Used in compilers for recognizing tokens ofprogramming languages, e.g. identifiers,floating-point-numbers, and so on. (Seecompilers lecture.)

. – p.14/29

First example

The regular expression

01∗ + 10∗

denotes the language consisting of all strings thatare either a single 0 followed by any number of1’s, or a single 1 followed by any number of 0’s.

. – p.15/29

Operations onlanguages

Before describing the regular-expression nota-tion, we need to define the operations on lan-guages that the operators of regular expressionsrepresent.

. – p.16/29

ConcatenationThe concatenation L · L′ (or just LL′) oflanguages L and L′ is defined to be the set ofstrings ww′ where w ∈ L and w ∈ L′.

For example, if L = {001, 10, 111} andL′ = {ε, 001}, thenLL′ = {001, 10, 111, 001001, 10001, 111001}.

. – p.17/29

Self-concatenationFor a language L, we write Ln for

L · L · · · · · L︸ ︷︷ ︸n times

That is, Ln is the language that consists ofstrings w1w2 . . . wn, where each wi is in L.

For example, if L = {ε, 001}, thenL3 = {ε, 001, 001001, 001001001}.

Note that L1 = L. The language L0 is definedto be {ε}.

. – p.18/29

Closure (Kleene-star)The closure (or star or Kleene closure) L∗

of a language L is defined to be

L∗ =⋃n≥0

Ln

That is, L∗ is the language that consists ofstrings w1w2 . . . wk, where k is anynon-negative integer and each wi is in L.

E.g. if L = {0, 11}, then L∗ consists of allstrings such that the 1’s come in pairs, e.g.011, 11110, and ε, but not 01011 or 101.

. – p.19/29

Regular expressions:definition

Definition. The regular expressions over analphabet Σ are defines as follows:

Every symbol a ∈ Σ is a regular expression.

If E and E ′ are regular expressions, then sois E + E ′ and E · E ′. (We shall abbreviate thelatter by EE ′.)

If E is a regular expressions, then so is E∗.

The symbol ε is a regular expression.

The symbol ∅ is a regular expression.. – p.20/29

13

Page 14: Uses of finite automata - University of Bathag248/CM10020/Automata.pdfWe shall study finite automata first, ... exactly one) following state, δ ... This DFA accepts 0011, but not

Semantics of regularexpressions

(Remark: “semantics” is the technical term for“meaning”.)

Regular expression E denoted language L(E)

a ∈ Σ {a}E + E ′ L(E) ∪ L(E ′)E · E ′ L(E) · L(E ′)

E∗ (L(E))∗

ε {ε}∅ the empty language, ∅

. – p.21/29

Example

Suppose you want to search some messy textfile for the street parts of addresses, e.g.“Milsom Street” or “Wells Road”.

Let [A − Z] stand for A + B + · · · + Z.

Let [a − z] stand for a + b + · · · + z.

You may want to use a regular expression like

[A−Z][a−z]∗ (Street+St.+Road+Rd.+Lane)

Expressions like this are accepted e.g. by theUNIX command grep, the EMACS text editor,and various other tools.

. – p.22/29

ExercisesWrite regular expressions for the followinglanguages:

1. The set of strings over alphabet {a, b, c} withat least one a and at least one b.

2. The set of strings of 0’s and 1’s whose tenthsymbol from the right end is 1.

3. The set of strings of 0’s and 1’s with at moston pair of consecutive 1’s.

. – p.23/29

ExercisesWrite regular expressions for the followinglanguages:

1. The set of all strings of 0’s and 1’s such thatevery pair of adjacent 0’s appears before anypair of adjacent 1’s.

2. The set of strings of 0’s and 1’s whosenumber of 0’s is divisible by five.

. – p.24/29

ExerciseFor any alphabet Σ, which are the subsets S of Σ∗

such that the set S∗ is finite?

. – p.25/29

Regular expressionsand FA’s: overview

As we shall see, for every regular expressionE, there is an NFA (and therefore also a DFA)that accepts the language defined by E.

Tools that scan text for regular expressionswork in this way.

Also, for every DFA (and therefore for everyNFA) A, there is a regular expression thatdenotes the language accepted by A.

So finite automata and regular expressionsare equivalent with respect to the definablelanguages.

. – p.26/29

NFA’s withε-transitions

For simulating regular expressions, it ishelpful to introduce NFA’s with ε-transitions,or ε-NFA’s in short.The only difference between ε-NFA’s andNFA’s is that the former can makespontaneous transitions, i.e. transitions thatuse up no input—technically speaking, theempty string ε.

q q'

. – p.27/29

NFA’s withε-transitions

More formally, an ε-NFA differs from an NFAonly in that its transition function also acceptsε as an argument:

δ : Q × (Σ∪ ε) → P (Q)

It can be shown by some modified powersetconstruction that for every ε-NFA there is aDFA accepting the same language (seeHopcroft/Motwani/Ullman).

. – p.28/29

14

Page 15: Uses of finite automata - University of Bathag248/CM10020/Automata.pdfWe shall study finite automata first, ... exactly one) following state, δ ... This DFA accepts 0011, but not

From regularexpressions to FA’s

Formally, we shall prove:

Proposition. For every regular expression E,there is an ε-NFA NE such that L(NE) = L(E).

We shall see how this works in the next lecture.

. – p.29/29

15

Page 16: Uses of finite automata - University of Bathag248/CM10020/Automata.pdfWe shall study finite automata first, ... exactly one) following state, δ ... This DFA accepts 0011, but not

Regular expressions ctd.

. – p.1/30

Operations onlanguages

For languages L and L′

The Concatenation L · L′ (or just LL′) isdefined to be the set of strings ww′ wherew ∈ L and w ∈ L′.

We write Ln for L · L · · · · · L (n times). Ln isthe language that consists of stringsw1w2 . . . wn, where each wi is in L.

The closure (or star or Kleene closure) L∗ isdefined to be L∗ =

⋃n≥0 Ln. L∗ is the

language that consists of strings w1w2 . . . wk,where k ∈ Z, k ≥ 0 and each wi is in L.

. – p.2/30

Regular expressions:definition

Definition. The regular expressions over analphabet Σ are defines as follows:

Every symbol a ∈ Σ is a regular expression.

If E and E ′ are regular expressions, then sois E + E ′ and E · E ′. (We shall abbreviate thelatter by EE ′.)

If E is a regular expressions, then so is E∗.

The symbol ε is a regular expression.

The symbol ∅ is a regular expression.. – p.3/30

Semantics of regularexpressions

(Remark: “semantics” is the technical term for“meaning”.)

Regular expression E denoted language L(E)

a ∈ Σ {a}E + E ′ L(E) ∪ L(E ′)E · E ′ L(E) · L(E ′)

E∗ (L(E))∗

ε {ε}∅ the empty language, ∅

. – p.4/30Exercises 5.1 (lastlecture)

Write regular expressions for the followinglanguages:

1. The set of strings over alphabet {a, b, c} withat least one a and at least one b.

2. The set of strings of 0’s and 1’s whose tenthsymbol from the right end is 1.

3. The set of strings of 0’s and 1’s with at moston pair of consecutive 1’s.

. – p.5/30

Exercises 5.2 (lastlecture)

Write regular expressions for the followinglanguages:

1. The set of all strings of 0’s and 1’s such thatevery pair of adjacent 0’s appears before anypair of adjacent 1’s.

2. The set of strings of 0’s and 1’s whosenumber of 0’s is divisible by five.

. – p.6/30

16

Page 17: Uses of finite automata - University of Bathag248/CM10020/Automata.pdfWe shall study finite automata first, ... exactly one) following state, δ ... This DFA accepts 0011, but not

Exercise 5.3 (lastlecture)

For any alphabet Σ, which are the subsets S of Σ∗

such that the set S∗ is finite?

. – p.7/30

Regular expressionsand FA’s: overview

As we shall see, for every regular expressionE, there is an NFA (and therefore also a DFA)that accepts the language defined by E.

Tools that scan text for regular expressionswork in this way.

Also, for every DFA (and therefore for everyNFA) A, there is a regular expression thatdenotes the language accepted by A.

So finite automata and regular expressionsare equivalent with respect to the definablelanguages.

. – p.8/30

NFA’s withε-transitions

For simulating regular expressions, it ishelpful to introduce NFA’s with ε-transitions,or ε-NFA’s in short.The only difference between ε-NFA’s andNFA’s is that the former can makespontaneous transitions, i.e. transitions thatuse up no input—technically speaking, theempty string ε.

q q'

. – p.9/30

NFA’s withε-transitions

More formally, an ε-NFA differs from an NFAonly in that its transition function also acceptsε as an argument:

δ : Q × (Σ∪ ε) → P (Q)

It can be shown by some modified powersetconstruction that for every ε-NFA there is aDFA accepting the same language (seeHopcroft/Motwani/Ullman).

. – p.10/30

From regularexpressions to FA’s

Formally, we shall prove:

Proposition. For every regular expression E,there is an ε-NFA NE such that L(NE) = L(E).

We shall see how this works on the next fewslides.

. – p.11/30

The ε-NFA NE of aregular expression E

For every regular expression E, we shall build anε-NFA NE with exactly one accepting state, fromwhich no further transitions are possible:

NEstart accept

. – p.12/30

NE for E = a ∈ Σ

start accepta

. – p.13/30

NE for E = ε

start accept

. – p.14/30

17

Page 18: Uses of finite automata - University of Bathag248/CM10020/Automata.pdfWe shall study finite automata first, ... exactly one) following state, δ ... This DFA accepts 0011, but not

NE+E′

NE'startE'accept E'

NEstart E accept E

start E+E' accept E+E'

. – p.15/30

NE·E′

NE'startE'

accept E'=accept EE'

NEstart E

=start EE'

accept E

. – p.16/30

NE∗

NEstart E

accept E*accept Estart E*

. – p.17/30

NE for E = ∅

start accept

There is no way to get from the start state to theaccepting state.

. – p.18/30

NFA for (0 + 1)∗1(0 + 1)

See overhead.

. – p.19/30

Exercises 6.1Convert each of the following regularexpressions to an ε-NFA:

1. 01∗

2. (0 + 1)01

3. 00(0 + 1)∗

. – p.20/30The big picture (part1/2)

Trivially, every DFA is an NFA, and every NFAis an ε-NFA.

One goal is to show that all three types ofautomata accept the same languages.

To show this, it suffices to show that for everyε-NFA there is a DFA that accepts the samelanguage.

To that end, we shall use a modifiedpowerset construction.

. – p.21/30

The big picture (part2/2)

We have also seen that every regularexpression is accepted by an ε-NFA.

We shall see later that for every FA there is aregular expression describing the samelanguage.

So all four formalisms (DFA’s, NFA’s, ε-NFA’s,and regular expressions) describe the samelanguages.

. – p.22/30

18

Page 19: Uses of finite automata - University of Bathag248/CM10020/Automata.pdfWe shall study finite automata first, ... exactly one) following state, δ ... This DFA accepts 0011, but not

From ε-NFA to DFASuppose that N is an ε-NFA. We shall nowstudy the modified powerset construction,which produces a DFA D that accepts thesame language as N .

To that end, we need one auxiliary definition:given a set S of states of N , the ε-closurecl(S) of S is the set of states that arereachable from S by any number ofε-transitions.

. – p.23/30

From ε-NFA to DFAThe construction of D from N looks as thepowerset construction, except that we use cl:

The alphabet of D is that of N .

The states of D are sets of the form cl(S),where S ∈ P (N).

The initial state qD0 of D is cl{qN

0 }.

The final states of D are those sets of theform cl(S) that contain a final state of N :

FD = {cl(S) |S ∩ FN �= ∅}. – p.24/30

From ε-NFA to DFAThe transition function of D arises from thetransition function of N as follows:

δD(S, a) =⋃q∈S

cl(δN (q, a))

That is, δD(S, a) is the set of all states of Nthat are reachable from some state q ∈ S viaa, followed by any number of ε-transitions.

. – p.25/30

The simulationproposition

Proposition. For every ε-NFA N, the DFA Dresulting from the modified powerset constructionaccepts the same language.

Proof. One shows that every string w that

w ∈ L(D) ⇐⇒ w ∈ L(N).

The proof works by induction on the length ofw, and is only slighty more complicated than theproof we have seen for the (ordinary) powersetconstruction . – p.26/30

Exercise 6.2Consider the following ε-NFA.

ε a b c

→ p ∅ {p} {q} {r}q {p} {q} {r} ∅∗r {q} {r} ∅ {p}

1. Compute the ε-closure of each state.

2. Give all strings of length three or lessaccepted by this automaton.

3. Convert the automaton to a DFA.. – p.27/30

Exercise 6.3Repeat the previous exercise for the followingε-NFA.

ε a b c

→ p {q, r} ∅ {q} {r}q ∅ {p} {r} {p, q}∗r ∅ ∅ ∅ ∅

. – p.28/30

Exercise 6.4In an earlier exercise, we converted the regularexpressions 01∗, (0 + 1)01, and 00(0 + 1)∗ into ε-NFA’s. Convert each of those ε-NFA’s into a DFA.

. – p.29/30

Formal languages

Next time we will start on formal languages.

A formal language (or simpy “language”) is aset L of strings over some finite alphabet Σ.That is, a subset L ⊆ Σ∗.

Finite automata and regular expressionsdescribe certain formal languages but manyimportant languages are not regular e.g.programming languages.

To describe formal languages we will useformal grammars which we will introduce nextlecture. . – p.30/30

19

Page 20: Uses of finite automata - University of Bathag248/CM10020/Automata.pdfWe shall study finite automata first, ... exactly one) following state, δ ... This DFA accepts 0011, but not

Formal languages

. – p.2/30

Formal languages:overview

A formal language (or simpy “language”) is aset L of strings over some finite alphabet Σ.That is, a subset L ⊆ Σ∗.

Finite automata and regular expressionsdescribe certain formal languages.

But many important formal languages, e.g.programming languages, are not regular.

To describe formal languages, we shall useformal grammars.

. – p.3/30

Formal grammars:overview

Important for describing programminglanguages.

Different kinds of formal grammars aredescribed by the the Chomsky hierarchy.Among the simplest grammars in theChomsky hierarchy are the regulargrammars, which—as we shall seelater—describe the same languages asregular expressions.

. – p.4/30

Formal grammars:basic idea

To generate strings by beginning with a startsymbol S and then apply rules that indicate how

certain combinations of symbols may be replaced

with other combinations of symbols.

. – p.5/30

20

Page 21: Uses of finite automata - University of Bathag248/CM10020/Automata.pdfWe shall study finite automata first, ... exactly one) following state, δ ... This DFA accepts 0011, but not

Formal grammar:definition

Definition. A formal grammar G = (N,Σ, P, S)consists of

a finite set N of non-terminal symbols;a finite set Σ of terminal symbols not in N ;

a finite set P of production rules of the form

u → v

where u and v are strings in (Σ ∪ N)∗ and ucontains at least one non-terminal symbol;

a start symbol S in N .. – p.6/30

Example

The grammar G with non-terminal symbolsN = {S,B}, terminal symbols Σ = {a, b, c}, andproductions

S → abc

S → aSBc

cB → Bc

bB → bb

Following a common practice, we use capital let-

ters for non-terminal symbols and small letters for

terminal symbols.. – p.7/30

Language of a formalgrammar

Definition. The language of a formal grammar

G = (N,Σ, P, S), denoted as L(G), is defined as

all those strings over Σ that can be generated by

starting with the start symbol S and then apply-

ing the production rules in P until no more non-

terminal symbols are present.

. – p.8/30

The Chomskyhierarchy

There are 4 different types of formal grammarsthat we will meet. Some we will spend more timeon than others.

Type 3: regular grammars.

Type 2: context free grammars.

Type 1: context sensitive grammars.

Type 0: unrestricted or free grammars.

To define each type we need only restrict the type

of production rules permitted.. – p.9/30

Context-free grammars andlanguages

. – p.10/30

MotivationIt turns out that many important languages,e.g. programming languages, cannot bedescribed by finite automata/regularexpressions.

To describe such languages, we shallintroduce context-free grammars (CFG’s).

. – p.11/30

Example

The grammar G with non-terminal symbolsN = {S}, terminal symbols Σ = {a, b}, andproductions

S → aSb

S → ab.

Following a common practice, we use capital let-

ters for non-terminal symbols and small letters for

terminal symbols.

. – p.12/30

Context-freegrammar: definition

Definition. A context-free grammarG = (N,Σ, P, S) consists of

a finite set N of non-terminal symbols;a finite set Σ of terminal symbols not in N ;

a finite set P of production rules of the form

u → v

where u is in N and v is a string in (Σ ∪ N)∗;

a start symbol S in N .. – p.13/30

21

Page 22: Uses of finite automata - University of Bathag248/CM10020/Automata.pdfWe shall study finite automata first, ... exactly one) following state, δ ... This DFA accepts 0011, but not

Example: integerexpressions

A simple form of integer expressions.

The non-terminal symbols are

symbol intended meaningE expressions, e.g. (x + y) ∗ 3

I identifiers, e.g. x, y

C constants (here: natural numbers).

The alphabet Σ of terminal symbols is{x, y, 0, 1, . . . , 9, ∗,+, (, )}.

. – p.14/30

Productions forinteger expressions

E → C C → 0 C → C0

E → I C → 1 C → C1

E → E + E... ...

E → E ∗ E C → 9 C → C9

E → (E)

I → x

I → y

. – p.15/30

Compact notation

More compact notation for productions:

E → I |C |E + E |E ∗ E | (E)

C → 0 |C0 | . . . | 9 |C9

I → x | y

. – p.16/30

Language of a formalgrammar

Definition. The language of a formal grammar

G = (N,Σ, P, S), denoted as L(G), is defined as

all those strings over Σ that can be generated by

beginning with the start symbol S and then apply-

ing the productions P .

. – p.17/30

Parse treesEvery string w in a context-free language hasa parse tree.The root of a parse tree is the start symbol S.

The leaves of a parse tree are the terminalsymbols that make up the string w.

The branches of the parse tree describe howthe productions are applied.

. – p.18/30

ExerciseConsider the grammar

S → A1B

A → 0A | εB → 0B | 1B | ε.

Give parse trees for the strings

1. 00101

2. 1001

3. 00011. – p.19/30

ExerciseFor the CFG G defined by the productions

S → aS |Sb | a | b,

prove by induction on the size of the parse tree

that the no string in language L(G) has ba as a

substring.

. – p.20/30

ExercisesGive a context-free grammar for the language of

all palindromes over the alphabet {a, b, c}. (A

palindrome is a word that is equal to the reversed

version of itself, e.g. “abba”, “bab”.)

. – p.21/30

22

Page 23: Uses of finite automata - University of Bathag248/CM10020/Automata.pdfWe shall study finite automata first, ... exactly one) following state, δ ... This DFA accepts 0011, but not

Parsing

Finding a parse tree for a string is calledparsing. Software tools that do this are calledparsers.A compiler must parse every program beforeproducing the executable code.

There are tools called “parser generators”that turn CFG’s into parsers. One of them isthe famous YACC (“Yet Another CompilerCompiler”).

Parsing is a science unto itself, and describedin detail in courses about compilers.

. – p.22/30

Ambiguity

A context-free grammar with more than oneparse tree for some expression is calledambiguous.Ambiguity is dangerous, because it can affectthe meaning of expressions; e.g. in theexpression b ∗ a + b, it matters whether ∗ hasprecedence over + or vice versa.

To address this problem, parser generators(like YACC) allow the language designer tospecify the operator precedence.

. – p.23/30

ExerciseConsider the grammar

S → aS | aSbS | ε.

Show that this grammar is ambiguous.

. – p.24/30

ExerciseConsider the grammar

S → A1B

A → 0A | εB → 0B | 1B | ε.

Show that this grammar is unambiguous.

. – p.25/30

Regular languages

. – p.26/30

Regular grammars

Next, we shall see certain CFG’s calledregular grammars and show that they definethe same languages as finite automata andregular expressions.

These are the regular languages and are type3 on the Chomsky hierarchy.

They form a subset of the context freelanguages, which are type 2, defined by thecontext free grammars we have been talkingabout.

. – p.27/30

Regular grammars:definition

Definition. A CFG is called regular if everyproduction has one of the three forms below

A → aB

A → a

A → ε

where A and B are terminal symbols and a is

some non-terminal symbol.

. – p.28/30

Example

The grammar G with N = {A,B,C,D},Σ = {0, 1}, S = A, and the productions below isregular.

A → 0C | 1B | εB → 0D | 1AC → 0A | 1DD → 0B | 1C

. – p.29/30

23

Page 24: Uses of finite automata - University of Bathag248/CM10020/Automata.pdfWe shall study finite automata first, ... exactly one) following state, δ ... This DFA accepts 0011, but not

Non-example

The grammar below is not regular because of theb on the right of S.

S → aSb

S → ε.

. – p.30/30

Lecture 8: Regularlanguages

. – p.1/27

Formal grammar:definition

Definition. A formal grammar G = (N,Σ, P, S)consists of

a finite set N of non-terminal symbols;a finite set Σ of terminal symbols not in N ;

a finite set P of production rules of the form

u → v

where u and v are strings in (Σ ∪ N)∗ and ucontains at least one non-terminal symbol;

a start symbol S in N .. – p.2/27

The Chomskyhierarchy

There are 4 different types of formal grammarsthat we will meet. Some we will spend more timeon than others.

Type 3: regular grammars.

Type 2: context free grammars.

Type 1: context sensitive grammars.

Type 0: unrestricted or free grammars.

To define each type we need only restrict the typeof production rules permitted.

. – p.3/27

Context-freegrammar: definition

Definition. A context-free grammarG = (N,Σ, P, S) consists of

a finite set N of non-terminal symbols;a finite set Σ of terminal symbols not in N ;

a finite set P of production rules of the form

u → v

where u is in N and v is a string in (Σ ∪ N)∗;

a start symbol S in N .. – p.4/27

Regular grammars

Next, we shall see certain CFG’s calledregular grammars and show that they definethe same languages as finite automata andregular expressions.

These are the regular languages and are type3 on the Chomsky hierarchy.

They form a subset of the context freelanguages, which are type 2, defined by thecontext free grammars we have been talkingabout.

. – p.5/27

Regular grammars:definition

Definition. A CFG is called regular if everyproduction has one of the three forms below

A → aB

A → a

A → ε

where A and B are non-terminal symbols and a issome terminal symbol.

. – p.6/27

Example

The grammar G with N = {S,A}, Σ = {0, 1}, Sthe start state and the productions below isregular.

S → 0S

S → 1B

B → ε

What language does this grammar describe? Canyou find a DFA and a regular expression that de-fine the same language?

. – p.7/27

24

Page 25: Uses of finite automata - University of Bathag248/CM10020/Automata.pdfWe shall study finite automata first, ... exactly one) following state, δ ... This DFA accepts 0011, but not

Non-example

The grammar below is not regular because of theb on the right of S.

S → aSb

S → ε.

This is a context free grammar that describes thelanguage {anbn|n ≥ 1}. You cannot write a regulargrammar that describes this language. Why not?

. – p.8/27

From NFA to regulargrammar

For an NFA over alphabet Σ, we construct aregular grammar G over alphabet Σ as follows:

1. The non-terminal symbols are the states ofthe NFA.

2. The start symbol is the initial state of the NFA.

3. For every transition qa � q′ in the NFA,

introduce a production q → aq′.

4. For every final state q, add a productionq → ε.

. – p.9/27

Parity Checker

A B

C D

1

1

00

1

1

00

Q = {A,B,C,D}, Σ = {0, 1}, F = {A}.

. – p.10/27

Parity checker

The grammar G with N = {A,B,C,D},Σ = {0, 1}, S = A, and the productions below isregular.

A → 0C | 1B | εB → 0D | 1AC → 0A | 1DD → 0B | 1C

. – p.11/27

8.1 ExerciseTurn some of the NFA’s in this lecture course intoregular grammars.

. – p.12/27

From reg. grammarto reg. expression

We shall explain this in a number of steps.

First, we write the grammar in its compactnotation, e.g.

X → aY | bZ | ε (1)

Y → cX | dZ (2)

Z → eZ | fY (3).

. – p.13/27From reg. grammarto reg. expression

Next, we replace → by = and | by +, to make thegrammar look like an equation system:

X = aY + bZ + ε (1)

Y = cX + dZ (2)

Z = eZ + fY (3).

The trick is now to get a solution for the start sym-bol X that does not rely on Y and Z.

. – p.14/27

From reg. grammarto reg. expression

X = aY + bZ + ε (1)

Y = cX + dZ (2)

Z = eZ + fY (3).

We shall reduce the three equations above totwo equations as follows:

1. Find a solution for Z in terms of only X and Y .

2. Replace that solution for Z in equations (1)and (2). (That yields equations for X and Ythat don’t rely on Z anymore.)

. – p.15/27

25

Page 26: Uses of finite automata - University of Bathag248/CM10020/Automata.pdfWe shall study finite automata first, ... exactly one) following state, δ ... This DFA accepts 0011, but not

From reg. grammarto reg. expression

How do we find a solution for

Z = eZ + fY (3)

that doesn’t rely on Z anymore? The idea is

“Z can produce an e and become Z again for anumber of times, but finally Z must become fY .”

Formally, the solution of Equation (3) is

Z = e∗fY.

. – p.16/27

From reg. grammarto reg. expression

Generally, the solution of any equation of theform

A = cA + B

is

A = c∗B.

. – p.17/27From reg. grammarto reg. expression

Next, we replace the solution

Z = e∗fY

in equations (1) and (2)

X = aY + bZ + ε (1)

Y = cX + dZ (2).

This yields

X = aY + be∗fY + ε (1′)

Y = cX + de∗fY (2′).. – p.18/27

From reg. grammarto reg. expression

Simplifying (1’) by using the distributivity lawyields

X = (a + be∗f)Y + ε (1′)

Now instead of three equations over X,Y, Z, wehave only two equations over X,Y :

X = (a + be∗f)Y + ε (1′)

Y = cX + de∗fY (2′).

. – p.19/27

From reg. grammarto reg. expression

X = (a + be∗f)Y + ε (1′)

Y = cX + de∗fY (2′).

Next, we repeat our game of eliminatingnon-terminals and find a solution for Y :

Y = (de∗f)∗cX.

. – p.20/27

From reg. grammarto reg. expression

Using Y = (de∗f)∗cX in (1’) yields

X = (a + be∗f)(de∗f)∗cX + ε (1′′)

The solution of (1”) is

X = ((a + be∗f)(de∗f)∗c)∗ε = ((a + be∗f)(de∗f)∗c)∗

So we found a regular expression for the lan-guage generated by X.

. – p.21/27

8.2 ExerciseConsider the NFA given by the transition tablebelow.

0 1

→ X {X} {X,Y }∗Y {Y } {Y }

1. Give a regular grammar for it.

2. Calculate a regular expression that describesthe language.

. – p.22/27

8.3 ExerciseRepeat the exercise for the NFA below.

0 1

→ ∗X {X} {Y }Y {Y } {Z}Z {Z} {X}

Also, describe the accepted language in English.

. – p.23/27

26

Page 27: Uses of finite automata - University of Bathag248/CM10020/Automata.pdfWe shall study finite automata first, ... exactly one) following state, δ ... This DFA accepts 0011, but not

The big picture: finalversion

See lecture for diagram.

Definition. Languages that are definable by reg-ular expressions or regular grammars or finite au-tomata are called regular languages.

. – p.24/27

A non-regularlanguage

Proposition. The language

L = {anbn |n ≥ 1}(for which we gave a CFG earlier) is not regular.

Proof. By contradiction: Assume that L is regular.

Then there is a DFA A that accepts L.

Let n be the number of states of A.

We read in the string an. We make n state transitionsso must “visit” n + 1 states hence we must “visit” onestate twice, lets call that state q.

. – p.25/27A non-regularlanguage

That means there are two substrings of an, say am andak with m < k such that q = δ̂(q0, a

m) = δ̂(q0, ak).

Now, the machine accepts ambm so the machine readsin am and is in state q and starts from q, reads in bm

and then accepts the string.

If we input akbm the machine reads in ak and is in stateq. We know that from state q with input bm the machineends up in an accepting state so the machine acceptsakbm which implies akbm ∈ L, k < m. Contradiction!

The machine “can’t remember” whether the input wasam or ak.

. – p.26/27

Significance of theexample

That L is not regular has great practicalsignificance:

Recall that a string anbn can be seen as nopening brackets followed by n closingbrackets.

Realistic programming languages containbalanced brackets.

So it can be shown that realistic programminglanguages are are not regular!

. – p.27/27

27

Page 28: Uses of finite automata - University of Bathag248/CM10020/Automata.pdfWe shall study finite automata first, ... exactly one) following state, δ ... This DFA accepts 0011, but not

28

Page 29: Uses of finite automata - University of Bathag248/CM10020/Automata.pdfWe shall study finite automata first, ... exactly one) following state, δ ... This DFA accepts 0011, but not

Lecture 9: The Chomskyhierarchy

. – p.1/31

The Chomskyhierarchy

There are 4 different types of formal grammarsthat we will meet. Some we will spend more timeon than others.

Type 3: regular grammars.

Type 2: context free grammars.

Type 1: context sensitive grammars.

Type 0: unrestricted or free grammars.

To define each type we need only restrict the typeof production rules permitted.

. – p.2/31

29

Page 30: Uses of finite automata - University of Bathag248/CM10020/Automata.pdfWe shall study finite automata first, ... exactly one) following state, δ ... This DFA accepts 0011, but not

Formal grammar:definition

Definition. A formal grammar G = (N,Σ, P, S)consists of

a finite set N of non-terminal symbols;a finite set Σ of terminal symbols not in N ;

a finite set P of production rules of the form

u → v

where u and v are strings in (Σ ∪ N)∗ and ucontains at least one non-terminal symbol;

a start symbol S in N .. – p.3/31

Context-freegrammar: definition

Definition. A context-free grammarG = (N,Σ, P, S) consists of

a finite set N of non-terminal symbols;a finite set Σ of terminal symbols not in N ;

a finite set P of production rules of the form

u → v

where u is in N and v is a string in (Σ ∪ N)∗;

a start symbol S in N .. – p.4/31

Example

The grammar G with non-terminal symbolsN = {S}, terminal symbols Σ = {a, b}, andproductions

S → aSb

S → ab.

This grammar describes the languageL = {anbn|n ≥ 1}.

We cannot build an FA that accepts thislanguage.

What type of machines accept context freelanguages? . – p.5/31

Pushdown automata

. – p.6/31

Pushdown automataInformally a pushdown automata (PDA) is anε-NFA with one additional capability.

We introduce a stack on which we can store astring of symbols.

This means a pushdown automata can“remember” an arbitary amount ofinformation.

However we can only access the informationon the stack in a last-in, first-out way.

. – p.7/31

The stackThe PDA works like an FA does except for thestack.

The PDA can write symbols on to the top ofthe stack, “pushing down” all the othersymbols on the stack.

The PDA can read and remove a symbol fromthe top of the stack and the symbols belowmove back up.

Writing a symbol is often referred to as“pushing”; reading and removing a symbol isreferred to as “popping”.

. – p.8/31

Formal definition ofPDA

Definition. Let Σε = Σ ∪ {ε} and Γε = Γ ∪ {ε}. A(non deterministic) pushdown automaton(PDA) consists of

a finite set of states, often denoted Q,

a finite set Σ of input symbols,a finite set Γ of stack symbols,a transition functionδ : Q × Σε × Γε → P (Q × Γ∗),

a start state q0 ∈ Q,

a set F ⊆ Q of final or accepting states.. – p.9/31

The transitionfunction

The transition function δ has as input thecurrent state q ∈ Q, the next input symbola ∈ Σε and the top symbol of the stack s ∈ Γε.

The PDA is non deterministic so the transitionfunction produces a set of pairs. Each pair isa new state q′ and a string of stack symbolsγ ∈ Γ∗.

If a = ε the PDA changes state withoutreading an input symbol.

If s = ε the PDA changes state withoutreading a stack symbol.

. – p.10/31

30

Page 31: Uses of finite automata - University of Bathag248/CM10020/Automata.pdfWe shall study finite automata first, ... exactly one) following state, δ ... This DFA accepts 0011, but not

Transition diagramsand tables

We can still draw transition diagrams andtransition tables but they need to include thestack information as well.

We label a transition arrow from q to q′ witha, s → γ to signify that on input a we move tostate q′ and pop s off the stack and push thestring γ on to the stack.

The columns of the transition table are splitinto subcolumns for each stack symbol.

. – p.11/31

Transition graph:example

b,a −>ε

ε, ε−> $

a, ε−> a

q 0 q 1

b,a −>εε, $−>ε

q 2q 3

The language accepted by the machine isL = {anbn|n ≥ 1}.

How does it “remember” the number of a’s?. – p.12/31

Transition table:example

Q = {q0, q1, q2, q3}, Σ = {a, b}, Γ = {a, $},F = {q0, q3} and δ is defined by the table:

input a b ε

stack a $ ε a $ ε a $ ε

q0 − − − − − − − − {(q1, $)}q1 − − {(q1, a)} {(q2, ε)} − − − − −q2 − − − {(q2, ε)} − − − {(q3, ε)} −q3 − − − − − − − − −

A dash signifies ∅ (to make it easier to read). . – p.13/31

Equivalence of CFG’sand PDA’s

Theorem A language is context free if and only ifsome pushdown automaton accepts it.

The proof of this theorem is beyond the scope ofthe course, if you are interested you should seeHopcroft/Motwani/Ullman.

. – p.14/31

Exercise 9.1

ε,ε −>ε

ε, ε−> $0, ε−> 0

1, ε−> 1

ε, $ −>ε

0, 0 −>ε1, 1−>ε

q p

rs

1. Write out the transition table for the abovePDA.

2. What is the stack used to store?

3. What language does the PDA accept? . – p.15/31

Exercise 9.2x−>xxi,

e,x −>εε,ε −>x$

ε, x −>ε

q p

r

1. What strings of length 1, 2, 3 or 4 does themachine accept?

2. Let i stand for “if” and e stand for “else”. Whatdoes the PDA do?

. – p.16/31

A context sensitive grammar

. – p.17/31

Context-sensitivegrammar: definition

Definition. A formal grammar is called contextsensitive if every production rule is of the form

αAβ → αγβ

where α, β ∈ (Σ ∪ N)∗, γ ∈ (Σ ∪ N)\{ε} and A isa non-terminal or of the form

S → ε

if and only if S is not on the right hand side of anyrule.

. – p.18/31

31

Page 32: Uses of finite automata - University of Bathag248/CM10020/Automata.pdfWe shall study finite automata first, ... exactly one) following state, δ ... This DFA accepts 0011, but not

Example

The grammar G with non-terminal symbolsN = {S,B}, terminal symbols Σ = {a, b, c}, andproductions

S → abc

S → aSBc

cB → Bc

bB → bb

This grammar describes the language L =

{anbncn|n ≥ 1}. We cannot build a PDA to acceptthis language (see Hopcroft/Motwani/Ullman). . – p.19/31

Chomsky hierarchy:in summary

We have seen a lot about the regularlanguages - see the big picture overhead.

We have seen a context free language that isnot regular - last lecture.

We have seen machines that accept contextfree languages (although we have not provedthis).

We are not going to consider the class ofcontext sensitive grammars in any detail butwe have seen an example of one.

What about unrestricted grammars? . – p.20/31

32