Regular Languages and Expressions

Surinder Kumar Jain,

University of Sydney

Automaton: DFA, NFA, ε-NFA, CFG as a DFA, Equivalence, Minimal DFA

Expressions: Definition, Conversion from/to Automaton

Regular Languages: Pumping Lemma (proving a language is not regular), Closures, Equivalence

A system with many states
Can transition from one state to another
Transitions are usually caused by external input
Set of states is finite
System is in exactly one state at any given time

Mathematical Definition of a DFA: A = (Q, Σ, δ, q0, F)
Q : States. The DFA is in exactly one of these finitely many states at any time.
Σ : Input symbols. The DFA changes from one state to another on consuming an input symbol.
δ : Transition function. Given a state and an input symbol, it gives the next DFA state. A function Q x Σ -> Q.
q0 : Initial DFA state.
F : Accepting states. An input string is accepted if the DFA is in one of these states once the whole string has been consumed.

Q = { waiting, pending, rejected, accepted, paid }

Σ = { receive, reject, accept, pay }

δ : (waiting -> receive -> pending), (pending -> reject -> rejected), (pending -> accept -> accepted), (accepted -> pay -> paid)

q0 = waiting

F = { rejected, paid }
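The five-tuple above can be written down almost literally as data. The following is a minimal sketch, not a definitive implementation: the names `DELTA`, `START`, `ACCEPTING` and `accepts` are illustrative, the state reached by accept is called "accepted" as in the transition list, and a missing transition is treated as rejection (an implicit dead state), which is an assumption the slide does not state.

```python
# Payment-workflow DFA from the slide, written as plain Python data.
STATES = {"waiting", "pending", "rejected", "accepted", "paid"}
ALPHABET = {"receive", "reject", "accept", "pay"}
DELTA = {
    ("waiting", "receive"): "pending",
    ("pending", "reject"): "rejected",
    ("pending", "accept"): "accepted",
    ("accepted", "pay"): "paid",
}
START = "waiting"
ACCEPTING = {"rejected", "paid"}


def accepts(symbols):
    """Return True if the DFA is in an accepting state after reading all symbols."""
    state = START
    for symbol in symbols:
        # The slide's delta is partial; a missing entry is treated here as
        # a move to an implicit dead state (assumption of this sketch).
        if (state, symbol) not in DELTA:
            return False
        state = DELTA[(state, symbol)]
    return state in ACCEPTING
```

For example, `accepts(["receive", "accept", "pay"])` follows waiting, pending, accepted, paid and ends in an accepting state, while `accepts(["receive", "accept"])` stops in the non-accepting state accepted.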

[Figure: state diagram of the example DFA. start -> Waiting; Waiting -receive-> Pending; Pending -reject-> Rejected; Pending -accept-> Accepted; Accepted -pay-> Paid]


An alphabet is a set of symbols
Strings are formed by concatenation (joining) of symbols
A language is a subset of all the strings over the alphabet

A DFA defines a language
The alphabet is the set of input symbols
Concatenation: one symbol follows another
Acceptance: the sequence of symbols takes the DFA from the start state to one of the accepting states

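Five-tuple like a DFA, (Q, Σ, δ, q0, F)
Transition function returns a set of states, not a single state
Several outgoing arcs may carry the same symbol
Can be in several states at the same time
Language of an NFA: the strings that can take it from the start state to at least one accepting state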

Any NFA language can be described by some DFA

Adding non-determinism does not add any expressive power

Why use NFAs then:
  Easier to construct for some languages
  May have fewer states and be less complex

Algorithm to convert NFA to DFA (subset construction)
For an n-state NFA, the DFA may have up to 2^n states
Inaccessible states can be thrown away
Observation: in practice the DFA has about the same number of states as the NFA, though it often has more transitions

For an NFA N = (Q, Σ, δ, q0, F), construct the DFA D = (Qd, Σ, δd, {q0}, Fd)
Qd = power set of Q
δd(S, a) = ∪ { δ(p, a) : p in S }, for every S in Qd
Fd = { S : S is a subset of Q and S contains an accepting state of the NFA }

A DFA operates on one state at a time; an NFA operates on sets of states.

Given a state, the NFA gives a set of new states
Make every possible set of NFA states a DFA state
Transit from one set of states to the set of all possible next states
Any set containing an accepting state of the NFA is an accepting state of the DFA

Worst case O(2^n) (the number of subsets of an n-element set)
Efficient algorithm:
  Do not construct the entire power set
  Start with the start state
  Only construct subsets that are reachable from the start state
  The number of states in the DFA is usually much less than 2^n
  In practice the DFA has about the same number of states as the NFA, though it often has more transitions
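The lazy version of the subset construction described above can be sketched as follows. This is an illustration under assumptions, not the slide author's code: the function name `nfa_to_dfa`, the dictionary encoding of δ, and the example NFA (strings over {0, 1} ending in "01") are all hypothetical.

```python
from collections import deque


def nfa_to_dfa(delta, start, accepting, alphabet):
    """Lazy subset construction.

    delta maps (state, symbol) -> set of next NFA states (missing = empty set).
    Every DFA state is a frozenset of NFA states; only subsets reachable
    from {start} are ever built.
    """
    dfa_start = frozenset([start])
    dfa_states = {dfa_start}
    dfa_delta = {}
    queue = deque([dfa_start])
    while queue:
        subset = queue.popleft()
        for symbol in alphabet:
            # Union of delta(p, symbol) over every NFA state p in the subset.
            target = frozenset(
                q for p in subset for q in delta.get((p, symbol), ())
            )
            dfa_delta[(subset, symbol)] = target
            if target not in dfa_states:
                dfa_states.add(target)
                queue.append(target)
    # A subset is accepting if it contains at least one accepting NFA state.
    dfa_accepting = {s for s in dfa_states if s & set(accepting)}
    return dfa_states, dfa_delta, dfa_start, dfa_accepting


# Hypothetical example NFA: strings over {0, 1} that end in "01".
example_delta = {
    ("q0", "0"): {"q0", "q1"},
    ("q0", "1"): {"q0"},
    ("q1", "1"): {"q2"},
}
states, trans, d_start, d_accepting = nfa_to_dfa(
    example_delta, "q0", {"q2"}, ["0", "1"]
)
```

For this three-state NFA the construction builds only the subsets {q0}, {q0, q1} and {q0, q2}, not all 2^3 = 8 subsets.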

Includes ε (the empty string, not in the alphabet) as a transition label
ε is the identity for concatenation: a.ε = ε.a = a for all a
An ε transition is a spontaneous transition without consuming input

Any ε-NFA language can be described by some NFA

Any NFA language can be described by some DFA

Adding ε transitions does not add any expressive power

Why use ε-NFAs then:
  Easier to construct for some languages
  Useful in proving equivalence of languages

Conversion aims to remove ε transitions
Define a new set of states:
  ε arcs are contained inside each set
  No ε arc leaves or enters the new set of states
Epsilon closure (eclose):
  For a state, the set of all states reachable spontaneously
  Follow the ε arcs recursively and include every reachable state in the epsilon closure
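Computing an epsilon closure is a plain graph reachability search over the ε arcs. A minimal sketch, assuming the ε arcs are given as a dictionary `eps` from a state to the set of states one ε arc away (this encoding and the example arcs are hypothetical):

```python
def eclose(state, eps):
    """Set of states reachable from `state` using only ε arcs.

    eps maps a state to the set of states one ε arc away (missing = none).
    The state itself is always in its own closure.
    """
    closure = {state}
    stack = [state]
    while stack:
        p = stack.pop()
        for q in eps.get(p, ()):
            if q not in closure:
                closure.add(q)
                stack.append(q)
    return closure


# Hypothetical ε arcs: 0 -ε-> 1, 1 -ε-> 2, 3 -ε-> 0
EPS = {0: {1}, 1: {2}, 3: {0}}
```

An explicit stack is used instead of recursion so that long chains of ε arcs do not hit Python's recursion limit.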

For an ε-NFA N = (Q, Σ, δ, q0, F), construct the DFA D = (Qd, Σ, δd, eclose(q0), Fd)
Qd = { S : S is a subset of Q and S = eclose(S) }, where eclose(S) is the union of eclose(p) for p in S
δd(S, a) = eclose( ∪ { δ(p, a) : p in S } ), for every S in Qd
Fd = { S : S is in Qd and S contains an accepting state of the ε-NFA }

A DFA operates on one state at a time; an ε-NFA operates on sets of states with no ε transition leaving the set

Make the eclose sets the DFA states
Transit from one set of states to the eclose of the set of all possible next states
Any set containing an accepting state of the ε-NFA is an accepting state of the DFA
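The conversion can be sketched by combining the subset construction with epsilon closures: the start state is eclose(q0), and each move on a symbol is followed by taking the closure of the result. The code below is an illustrative sketch; the function names, the encoding of δ and of the ε arcs, and the example automaton for a*b* are assumptions, not part of the slides.

```python
def eclose_set(states, eps):
    """Epsilon closure of a set of states, returned as a frozenset."""
    closure = set(states)
    stack = list(states)
    while stack:
        p = stack.pop()
        for q in eps.get(p, ()):
            if q not in closure:
                closure.add(q)
                stack.append(q)
    return frozenset(closure)


def enfa_to_dfa(delta, eps, start, accepting, alphabet):
    """Build only the closed subsets reachable from eclose(start)."""
    dfa_start = eclose_set({start}, eps)
    dfa_delta = {}
    seen = {dfa_start}
    todo = [dfa_start]
    while todo:
        subset = todo.pop()
        for symbol in alphabet:
            moved = {q for p in subset for q in delta.get((p, symbol), ())}
            target = eclose_set(moved, eps)  # close the result under ε arcs
            dfa_delta[(subset, symbol)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    dfa_accepting = {s for s in seen if s & set(accepting)}
    return seen, dfa_delta, dfa_start, dfa_accepting


def dfa_accepts(dfa_delta, dfa_start, dfa_accepting, word):
    state = dfa_start
    for symbol in word:
        state = dfa_delta[(state, symbol)]
    return state in dfa_accepting


# Hypothetical ε-NFA for a*b*: 0 -a-> 0, 0 -ε-> 1, 1 -b-> 1, accepting {1}.
E_DELTA = {(0, "a"): {0}, (1, "b"): {1}}
E_EPS = {0: {1}}
E_STATES, E_TRANS, E_START, E_ACC = enfa_to_dfa(E_DELTA, E_EPS, 0, {1}, "ab")
```

In this example the DFA start state is the set {0, 1}, which already contains the accepting state, so the empty string is accepted.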

An imperative program can be represented as a Control Flow Graph (CFG) with statements at nodes and predicates at edges

It can be converted into a CFG with both statements and predicates at edges by pushing node statements up incoming edges

Such a CFG is a DFA

Program points are states
Statements are input symbols that change the program state from one program point to another

Algebraic expressions to denote languages

Composed of the symbols “ε”, “Ø”, “+”, “*”, “.”, “(”, “)” and the symbols of the alphabet

The language is generated using the rules:
L(ε) = { ε }
L(Ø) = empty set
L(a) = { a } for every alphabet symbol a
L(p+q) = L(p) U L(q)
L(p.q) = { p’.q’ | p’ in L(p) and q’ in L(q) }
L(p*) = { q1.q2…qn | each qi in L(p) and n >= 0 }, where n = 0 gives ε

Example: a+b.c
The language generated is: { a, b.c }

Example: a.b.c*.d
The language generated is: { a.b.d, a.b.c.d, a.b.c.c.d, a.b.c.c.c.d, … }
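The two examples can be checked mechanically with Python's standard `re` module. The sketch below assumes single-letter alphabet symbols and translates the slide notation ("." for concatenation, "+" for union) into `re` syntax; the helper names `to_python_regex` and `in_language` are illustrative, and ε and Ø are deliberately not handled.

```python
import re


def to_python_regex(slide_expr):
    """Translate slide notation into Python `re` syntax.

    '.' (concatenation) is dropped, '+' (union) becomes '|', '*' is kept.
    Assumes single-letter symbols; ε and Ø are not handled in this sketch.
    """
    return slide_expr.replace(" ", "").replace(".", "").replace("+", "|")


def in_language(slide_expr, string):
    """True if the whole string belongs to the language of the expression."""
    return re.fullmatch(to_python_regex(slide_expr), string) is not None
```

With this, `in_language("a.b.c*.d", "abccd")` holds, while `in_language("a.b.c*.d", "abc")` does not because the trailing d is missing.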

A finite way to express an infinite language

DEFINITION

Two regular expressions (or automata) are EQUAL if they both generate the same language

Thus (a.b)* + (b.a)* + a.(b.a)* + b.(a.b)* = (ε + b).(a.b)*.(ε + a)

p + q = q + p
(p + q) + r = p + (q + r)
(p.q).r = p.(q.r)
Ø + p = p + Ø = p
ε.p = p.ε = p
Ø.p = p.Ø = Ø
p.(q + r) = p.q + p.r
(p + q).r = p.r + q.r
p + p = p
(p*)* = p*
Ø* = ε
ε* = ε
p.p* = p*.p
(p + q)* = (p*.q*)*
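An equality between two regular expressions can be sanity-checked by brute force: compare membership for every string up to some length. This is evidence, not a proof. The sketch below uses Python's `re` module with hypothetical helper names; the last term of the left-hand side is written b.(a.b)*, the form for which the check passes (with b.(b.a)* the string b.b.a would be on the left but not on the right).

```python
import re
from itertools import product


def to_python_regex(slide_expr):
    """Slide notation to `re` syntax: ε -> empty group, '.' dropped, '+' -> '|'."""
    return (
        slide_expr.replace(" ", "")
        .replace("ε", "(?:)")
        .replace(".", "")
        .replace("+", "|")
    )


def agree_up_to(expr1, expr2, alphabet="ab", max_len=8):
    """True if both expressions accept exactly the same strings of length <= max_len."""
    r1 = re.compile(to_python_regex(expr1))
    r2 = re.compile(to_python_regex(expr2))
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            w = "".join(letters)
            if bool(r1.fullmatch(w)) != bool(r2.fullmatch(w)):
                return False
    return True


LHS = "(a.b)* + (b.a)* + a.(b.a)* + b.(a.b)*"
RHS = "(ε + b).(a.b)*.(ε + a)"
```

Both sides describe the strings in which a and b strictly alternate, so `agree_up_to(LHS, RHS)` finds no disagreement on the strings it enumerates.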

Every language defined by a finite automaton is also defined by some regular expression

Every language defined by a regular expression is also defined by some DFA

Hopcroft’s formula:
R_{ij}^{(k)} = R_{ij}^{(k-1)} + R_{ik}^{(k-1)}.(R_{kk}^{(k-1)})*.R_{kj}^{(k-1)}

States are sorted in some order and numbered 1 to n (n is the number of states)
R_{ij}^{(k)} is the regular expression of all paths from i to j passing only through intermediate nodes whose sort order is at most k
R_{ij}^{(n)} is the regular expression of all paths from i to j
Computed for all i, j for k = 0, then k = 1, …, k = n
R_{s,f1}^{(n)} + … + R_{s,fk}^{(n)} is the regular expression of the DFA, where s is the start state and f1, …, fk are the accepting states
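The recurrence can be turned into a short program that builds the expressions as strings in the slide notation. This is a sketch under assumptions: states are numbered 1 to n, δ is a dictionary from (state, symbol) to state, `None` stands for Ø, and a few simplifications (Ø and ε absorbed, the ε alternative dropped under a star) keep the output small. None of these names or encodings come from the slides.

```python
EMPTY = None  # stands for Ø


def union(r, s):
    if r is EMPTY:
        return s
    if s is EMPTY or r == s:
        return r
    return f"{r}+{s}"


def concat(r, s):
    if r is EMPTY or s is EMPTY:
        return EMPTY
    if r == "ε":
        return s
    if s == "ε":
        return r
    return f"({r}).({s})"


def star(r):
    if r is EMPTY or r == "ε":
        return "ε"
    if r.startswith("ε+"):
        r = r[2:]  # (ε + x)* = x*
    return f"({r})*"


def dfa_to_regex(n, delta, start, accepting):
    """Regular expression for a DFA with states 1..n, via R_ij^(k)."""
    # k = 0: direct arcs only, plus ε on the diagonal.
    R = [[EMPTY] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        R[i][i] = "ε"
    for (i, symbol), j in sorted(delta.items()):
        R[i][j] = union(R[i][j], symbol)
    # k = 1..n: additionally allow state k as an intermediate node.
    for k in range(1, n + 1):
        loop = star(R[k][k])
        R = [[EMPTY] * (n + 1)] + [
            [EMPTY]
            + [
                union(R[i][j], concat(concat(R[i][k], loop), R[k][j]))
                for j in range(1, n + 1)
            ]
            for i in range(1, n + 1)
        ]
    result = EMPTY
    for f in sorted(accepting):
        result = union(result, R[start][f])
    return result


# Hypothetical DFA over {a, b}: accepts strings containing at least one b.
EXAMPLE_DELTA = {(1, "a"): 1, (1, "b"): 2, (2, "a"): 2, (2, "b"): 2}
```

Calling `dfa_to_regex(2, EXAMPLE_DELTA, 1, {2})` returns a correct but far from minimal expression; the growth in size with each k is visible even for two states.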

Hopcroft’s formula is O(n^3 4^n): n^3 to compute the table, and 4^n because the size of the regular expression can grow by a factor of 4 at every step

In practice it is close to O(n^3):
  by simplifying the regular expression at every step, and
  by using a judicious algorithm that avoids recomputation of R_{kk}^{(k)}
Most DFAs have almost n, not 2^n, accessible states
A faster state elimination method, close to O(n^2), is also available

A regular expression is converted to an ε-NFA
The ε-NFA can then be converted to an NFA and to a DFA
RE to ε-NFA conversion rules:
ε -> Two-state automaton with a single ε edge
Ø -> Two-state automaton with no edges
a -> Two-state automaton with an “a” transition
+ -> A new start/accept state joining the two arguments of + in parallel
. -> Accept state of the first argument is the start state of the second
* -> An ε edge joining the start/accept state of the argument and a new start/accept state

Convert the resulting ε-NFA to a DFA

Augment the regular expression r to (r).#
Assign a position number to each occurrence of an alphabet symbol
Compute for each node of the syntax tree:
  nullable (ε is in the language)
  firstpos (set of positions of possible first symbols)
  lastpos (set of positions of possible last symbols)
Compute for each position:
  followpos (set of positions of possible next symbols after this position)
Construct the DFA

Unix text search, searching for matching patterns (grep)

Lexical analysis / parsing:
  Parse text against a regular expression
  Find the set of first tokens at this expression root
  Find the set of last tokens at this expression root
  Can the expression at this root be null?
  Find the set of next tokens after an alphabet position in a regular expression
Efficient search of patterns in a very large repository (web text search)

DEFINITION

A language (a set of strings) is defined to be a regular language if it can be defined by a finite automaton or a regular expression:
  by a DFA, or
  by an NFA, or
  by an ε-NFA, or
  by a regular expression

Four different ways to describe a regular language

If L is a regular language then there exists an integer n such that for every string w in L with |w| >= n we can break w into x, y, z such that w = x.y.z and
  y ≠ ε
  |x.y| <= n
  x.y^k.z is in L (for all k >= 0)

Proof is based on: for a DFA with n states, any string of length >= n must revisit a state

Used to prove that a language is not regular
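A typical use of the lemma is the language { a^m b^m | m >= 0 }: for w = a^n b^n, every allowed split puts y inside the leading a's, so pumping breaks the balance. The sketch below checks this exhaustively for small n. The function names are hypothetical, and only k = 0..3 is tried, so a result of True is merely evidence while a result of False means that no split of that particular w survives pumping.

```python
def in_anbn(s):
    """Membership test for L = { a^m b^m | m >= 0 }."""
    half = len(s) // 2
    return len(s) % 2 == 0 and s == "a" * half + "b" * half


def has_pumpable_split(w, n, in_language, max_k=3):
    """True if some split w = x.y.z with y != ε and |x.y| <= n keeps
    x.y^k.z in the language for every k in 0..max_k.

    Assumes len(w) >= n.
    """
    for i in range(0, n + 1):          # |x| = i
        for j in range(i + 1, n + 1):  # |x.y| = j, so y is non-empty
            x, y, z = w[:i], w[i:j], w[j:]
            if all(in_language(x + y * k + z) for k in range(max_k + 1)):
                return True
    return False
```

Since the argument works for every n, no pumping length exists, and the language is not regular. For a regular language such as (a.b)*, the same function does find a split (for example x = ε, y = a.b).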

A language is a set of strings over a finite alphabet

Language operators:
Union of two languages: L(A ∪ B) = L(A) ∪ L(B) (via regular expressions)
Intersection: L(A ∩ B) = L(A) ∩ L(B)
Concatenation: L(A.B) = { a.b | a in A, b in B }
Kleene closure: L(A*) = { a1.a2…an | each ai in A, n >= 0 }, where n = 0 gives ε
Complement: L(A’) = { a | a not in A } (with respect to some overall alphabet set) (via DFA)
Difference: L(A - B) = L(A) - L(B) (via DFA)
Reversal: L(A^R) = { ak.ak-1…a1 | a1…ak-1.ak in A } (switch q0 and F)
Homomorphism: replace each alphabet symbol with another regular expression
Inverse homomorphism
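Two of the DFA-based closure arguments are short enough to sketch: complement flips the accepting set of a complete DFA, and intersection runs two DFAs in lockstep (the product construction). The tuple layout `(states, delta, start, accepting)` and the two example automata are assumptions made for this illustration.

```python
from itertools import product


def complement(states, accepting):
    """Accepting set of the complement DFA (transitions stay the same).

    Only valid for a complete DFA, where delta is defined for every pair.
    """
    return set(states) - set(accepting)


def intersect(dfa1, dfa2, alphabet):
    """Product construction; each DFA is (states, delta, start, accepting)
    with a total delta: (state, symbol) -> state."""
    states1, delta1, start1, acc1 = dfa1
    states2, delta2, start2, acc2 = dfa2
    states = set(product(states1, states2))
    delta = {
        ((p, q), a): (delta1[(p, a)], delta2[(q, a)])
        for (p, q) in states
        for a in alphabet
    }
    accepting = {(p, q) for (p, q) in states if p in acc1 and q in acc2}
    return states, delta, (start1, start2), accepting


def run(dfa, word):
    _, delta, state, accepting = dfa
    for a in word:
        state = delta[(state, a)]
    return state in accepting


# Hypothetical examples over {a, b}.
EVEN_A = ({0, 1}, {(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1}, 0, {0})
ENDS_B = (
    {"n", "y"},
    {("n", "a"): "n", ("n", "b"): "y", ("y", "a"): "n", ("y", "b"): "y"},
    "n",
    {"y"},
)
BOTH = intersect(EVEN_A, ENDS_B, "ab")
ODD_A = (EVEN_A[0], EVEN_A[1], EVEN_A[2], complement(EVEN_A[0], EVEN_A[3]))
```

Difference follows from these two, since L(A) - L(B) is the intersection of L(A) with the complement of L(B).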

Is the language described empty?
Is a particular string in the described language?
Do two different descriptions of languages actually describe the same language?

Decision properties may require conversion between various forms.

Can the conversion be done in reasonable time?

Conversion Complexity

Computing ε closures: O(n^3) (Warshall’s), O(n)
Subset construction: O(2^n)
NFA to DFA: O(n^3 2^n) (in practice O(n^3 s), where s is the number of DFA states actually constructed)
DFA to NFA conversion: O(n)
NFA/DFA to Regular Expression: O(n^3 4^n) (worst case; actual is much less)
Regular Expression to ε-NFA: O(n)
Regular Expression to NFA: O(n^3)
Regular Expression to DFA: O(n34n^32^n)

Equivalence of two states
States p and q in an automaton are defined to be equivalent if, for every input string applied at state p or q, p ends up in an accepting state if and only if q also ends up in an accepting state

The accepting state reached from p does not have to be the same accepting state as that reached from q

If two states p and q are equivalent, we can combine them into a single state; it won't affect the language accepted by the DFA
This process of combining states together is called minimization
The table-filling algorithm can find whether two states are equivalent or not. Complexity O(n^2)

Non-equivalent pairs are distinguishable

Minimum DFA is unique
Eliminate all states not reachable from the start state
Determine which states are equivalent
Partition states into blocks of equivalent states
Equivalence is transitive, thus no state is in two blocks
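The table-filling idea can be sketched as a fixpoint computation: mark every pair that is split by acceptance, then keep marking pairs whose successors on some symbol are already marked. This simple version repeats full passes and is therefore slower than the O(n^2) bound quoted for the careful algorithm. The function names, the encoding of δ, and the example DFA are hypothetical; the DFA is assumed complete with unreachable states already removed.

```python
from itertools import combinations


def distinguishable_pairs(states, delta, accepting, alphabet):
    """Return the set of frozenset({p, q}) pairs that are distinguishable."""
    # Base case: one state accepting, the other not.
    marked = {
        frozenset((p, q))
        for p, q in combinations(states, 2)
        if (p in accepting) != (q in accepting)
    }
    changed = True
    while changed:
        changed = False
        for p, q in combinations(states, 2):
            pair = frozenset((p, q))
            if pair in marked:
                continue
            for a in alphabet:
                # If the successors are distinguishable, so are p and q.
                succ = frozenset((delta[(p, a)], delta[(q, a)]))
                if succ in marked:
                    marked.add(pair)
                    changed = True
                    break
    return marked


def equivalence_blocks(states, delta, accepting, alphabet):
    """Partition the states into blocks of mutually equivalent states."""
    marked = distinguishable_pairs(states, delta, accepting, alphabet)
    blocks = []
    for s in states:
        for block in blocks:
            if frozenset((s, block[0])) not in marked:
                block.append(s)
                break
        else:
            blocks.append([s])
    return blocks


# Hypothetical DFA: B and C behave identically, so they fall into one block.
M_STATES = ["A", "B", "C", "D"]
M_DELTA = {
    ("A", "a"): "B", ("A", "b"): "C",
    ("B", "a"): "D", ("B", "b"): "D",
    ("C", "a"): "D", ("C", "b"): "D",
    ("D", "a"): "D", ("D", "b"): "D",
}
M_BLOCKS = equivalence_blocks(M_STATES, M_DELTA, {"D"}, "ab")
```

Each block becomes one state of the minimum DFA; comparing against the first member of a block is enough because equivalence is transitive.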

Equivalence of two regular languages
Convert them into their minimum DFAs and check for isomorphism

Union method:
  Run the minimization on the union of the two DFAs
  The start states of the two original DFAs are equivalent if and only if the DFAs are equivalent
