Regular Languages and Expressions

Date posted: 04-Feb-2016

Regular Languages and Expressions. Surinder Kumar Jain, University of Sydney. Topics: automaton (DFA, NFA, ε-NFA, CFG as a DFA, equivalence, minimal DFA); expressions (definition, conversion from/to automaton); regular languages.

Transcript of Regular Languages and Expressions

  • Surinder Kumar Jain, University of Sydney

  • Automaton: DFA, NFA, ε-NFA, CFG as a DFA, equivalence, minimal DFA
    Expressions: definition, conversion from/to automaton
    Regular languages: pumping lemma, proving regularity, closures, equivalence

  • An automaton is a system with many states.
    It can transition from one state to another.
    A transition is usually caused by an external input.
    The set of states is finite.
    The system is in exactly one state at any given time.

  • Mathematical definition of a DFA: $A = (Q, \Sigma, \delta, q_0, F)$
    $Q$: the finite set of states. The DFA is in exactly one of these states at any time.
    $\Sigma$: the input symbols. The DFA moves from one state to another on consuming an input symbol.
    $\delta$: the transition function, $\delta : Q \times \Sigma \to Q$. Given a state and an input symbol, it gives the next DFA state.
    $q_0$: the initial state, $q_0 \in Q$.
    $F$: the accepting states, $F \subseteq Q$. An input is accepted if the DFA is in one of these states once all input symbols have been consumed.

  • Q = { waiting, pending, rejected, accepted, paid }

    Σ = { receive, reject, accept, pay }

    δ : (waiting -> receive -> pending), (pending -> reject -> rejected), (pending -> accept -> accepted), (accepted -> pay -> paid)

    q0 = waiting

    F = { rejected, paid }

  • [Figure: state diagram of the DFA defined above. States: Waiting (start), Pending, Accepted, Rejected, Paid. Arcs: receive, reject, accept, pay.]
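
The payment example above can be run directly as a table-driven DFA. This is a minimal sketch, not part of the original slides: the names `DELTA`, `START`, `ACCEPTING` and `accepts` are illustrative, and treating a missing transition as rejection is an assumption, since the slide leaves those transitions undefined.

```python
# Transition table for the payment-workflow DFA.
# Assumption: a missing (state, symbol) entry means the input is rejected.
DELTA = {
    ("waiting", "receive"): "pending",
    ("pending", "reject"): "rejected",
    ("pending", "accept"): "accepted",
    ("accepted", "pay"): "paid",
}
START = "waiting"
ACCEPTING = {"rejected", "paid"}


def accepts(symbols):
    """Return True if the input symbols drive the DFA from START to an
    accepting state."""
    state = START
    for symbol in symbols:
        key = (state, symbol)
        if key not in DELTA:
            return False  # undefined transition: treat as a dead state
        state = DELTA[key]
    return state in ACCEPTING


print(accepts(["receive", "accept", "pay"]))  # True
print(accepts(["receive", "accept"]))         # False
```

The second call fails because the input stops in `accepted`, which is not in F.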

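  • Alphabet: a finite set of symbols.
    Concatenation: joining symbols one after another.
    String: a finite sequence of symbols obtained by concatenation.
    Language: a subset of the set of all strings over the alphabet.

    A DFA defines a language:
    the alphabet is the set of input symbols;
    concatenation means one input symbol follows another;
    a string is accepted when its sequence of symbols takes the DFA from the start state to one of the accepting states;
    the language of the DFA is the set of all accepted strings.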

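  • An NFA is a five-tuple like a DFA, $(Q, \Sigma, \delta, q_0, F)$.
    The transition function returns a set of states, not one state: $\delta : Q \times \Sigma \to 2^Q$.
    A state may have several outgoing arcs with the same symbol.
    The NFA can be in several states at the same time.
    Language of an NFA: the strings for which at least one of the states reached after the last symbol is accepting.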
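
Being "in several states at the same time" can be simulated by tracking the set of current states. The sketch below is illustrative only: `nfa_accepts` and the example NFA (strings over {0, 1} ending in 01) are assumptions, not taken from the slides.

```python
def nfa_accepts(delta, start, accepting, word):
    """Simulate an NFA by tracking every state it could be in.

    delta maps (state, symbol) -> set of next states; a missing key
    means the empty set.
    """
    current = {start}
    for symbol in word:
        current = {q for p in current for q in delta.get((p, symbol), ())}
    # Accept if at least one reachable state is accepting.
    return bool(current & accepting)


# Hypothetical example NFA: accepts strings over {0, 1} that end in "01".
ENDS_IN_01 = {
    ("q0", "0"): {"q0", "q1"},
    ("q0", "1"): {"q0"},
    ("q1", "1"): {"q2"},
}

print(nfa_accepts(ENDS_IN_01, "q0", {"q2"}, "1101"))  # True
print(nfa_accepts(ENDS_IN_01, "q0", {"q2"}, "0110"))  # False
```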

  • Any NFA language can be described by some DFA.
    Adding non-determinism does not add expressive power.
    Why use NFAs then?
    They are easier to construct for some languages.
    They may have fewer states and be less complex.
    There is an algorithm to convert an NFA to a DFA.
    For an $n$-state NFA, the DFA may have up to $2^n$ states.
    Inaccessible states can be thrown away.
    Observation: in practice the DFA often has about the same number of states as the NFA, though it often has more transitions.

  • For an NFA $N = (Q, \Sigma, \delta, q_0, F)$, construct the DFA $D = (Q_d, \Sigma, \delta_d, \{q_0\}, F_d)$.

    $Q_d = 2^Q$ (the power set of $Q$)
    $\delta_d(S, a) = \bigcup_{p \in S} \delta(p, a)$ for every $S \in Q_d$ and $a \in \Sigma$
    $F_d = \{ S \subseteq Q : S \cap F \neq \emptyset \}$

    A DFA operates on one state at a time; an NFA operates on sets of states.
    Given a state and a symbol, the NFA gives a set of new states.
    Make every possible set of NFA states a DFA state.
    Transit from one set of states to the set of all states reachable from it on the input symbol.
    Any set containing an accepting state of the NFA is an accepting state of the DFA.

  • The worst case is $O(2^n)$ (the number of subsets of a set of $n$ states).
    Efficient algorithm:
    do not construct the entire power set;
    start with the start state;
    only construct subsets that can be reached from the start state.
    In practice the number of states in the DFA is usually much less than $2^n$: the DFA often has about the same number of states as the NFA, though it often has more transitions.
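
The lazy subset construction described above can be sketched as a breadth-first search over sets of NFA states. This is an illustrative sketch under stated assumptions: the function names and the example NFA (strings over {0, 1} ending in 01) are hypothetical, and missing NFA transitions are taken to mean the empty set.

```python
from collections import deque


def nfa_to_dfa(alphabet, delta, start, accepting):
    """Lazy subset construction.

    delta maps (state, symbol) -> set of NFA states; missing keys mean
    the empty set. Only subsets reachable from {start} are built.
    """
    start_set = frozenset([start])
    dfa_delta = {}
    seen = {start_set}
    queue = deque([start_set])
    while queue:
        subset = queue.popleft()
        for symbol in alphabet:
            # Union of the NFA moves of every state in the subset.
            target = frozenset(
                q for p in subset for q in delta.get((p, symbol), ())
            )
            dfa_delta[(subset, symbol)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    # A subset is accepting if it contains an accepting NFA state.
    dfa_accepting = {s for s in seen if s & accepting}
    return seen, dfa_delta, start_set, dfa_accepting


def dfa_accepts(dfa, word):
    _, dfa_delta, state, dfa_accepting = dfa
    for symbol in word:
        state = dfa_delta[(state, symbol)]
    return state in dfa_accepting


# Hypothetical example NFA: strings over {0, 1} that end in "01".
NFA_DELTA = {
    ("q0", "0"): {"q0", "q1"},
    ("q0", "1"): {"q0"},
    ("q1", "1"): {"q2"},
}
DFA = nfa_to_dfa("01", NFA_DELTA, "q0", {"q2"})
print(len(DFA[0]))  # 3
```

For this three-state NFA only three of the eight possible subsets are reachable, which matches the observation that the full power set is rarely needed.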

  • An ε-NFA includes ε (the empty string, not in the alphabet set) as a transition label.
    ε is the identity for concatenation: a.ε = ε.a = a for all a.
    An ε-transition is a spontaneous transition taken without consuming an input.

  • Any ε-NFA language can be described by some NFA.
    Every NFA language can be described by some DFA.
    Adding ε-transitions does not add expressive power.
    Why use ε-NFAs then?
    They are easier to construct for some languages.
    They are useful in proving equivalence of languages.

  • Conversion aims to remove ε-transitions.
    Define new states, each a set of ε-NFA states, such that every ε-transition is contained inside a set.
    No ε-arc leaves or enters the new set of states.

    Epsilon closure (eclose):
    for a state, the set of all states reachable spontaneously (by ε-arcs alone), including the state itself.
    Follow the ε-arcs recursively and include every reachable state in the epsilon closure.
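
The epsilon closure is a plain graph reachability search over ε-arcs. A minimal sketch, assuming the ε-arcs are given as a dictionary from a state to the set of states one ε-arc away (the name `eps` and the example graph are illustrative):

```python
def eclose(state, eps):
    """Return the set of states reachable from `state` via ε-arcs only.

    eps maps a state to the set of states one ε-arc away; a missing key
    means the state has no outgoing ε-arcs.
    """
    closure = {state}
    stack = [state]
    while stack:
        p = stack.pop()
        for q in eps.get(p, ()):
            if q not in closure:
                closure.add(q)
                stack.append(q)
    return closure


# Hypothetical ε-arcs: A -> B -> C -> A form a cycle, D is isolated.
EPS = {"A": {"B"}, "B": {"C"}, "C": {"A"}, "D": set()}
print(sorted(eclose("A", EPS)))  # ['A', 'B', 'C']
print(sorted(eclose("D", EPS)))  # ['D']
```

An explicit stack is used instead of recursion so that cycles of ε-arcs terminate and deep chains do not hit the recursion limit.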

  • For an ε-NFA $N = (Q, \Sigma, \delta, q_0, F)$, construct the DFA $D = (Q_d, \Sigma, \delta_d, \mathrm{eclose}(q_0), F_d)$.

    $Q_d = \{ S \subseteq Q : S = \mathrm{eclose}(S) \}$, where $\mathrm{eclose}(S) = \bigcup_{q \in S} \mathrm{eclose}(q)$
    $\delta_d(S, a) = \mathrm{eclose}\big( \bigcup_{p \in S} \delta(p, a) \big)$ for every $S \in Q_d$ and $a \in \Sigma$
    $F_d = \{ S \in Q_d : S \cap F \neq \emptyset \}$

    A DFA operates on one state at a time; an ε-NFA operates on sets of states, with no ε-transition leaving the set.
    Make the ε-closed sets the DFA states.
    Transit from one set of states to the ε-closure of the states reachable from it on the input symbol.
    Any set containing an accepting state of the ε-NFA is an accepting state of the DFA.

  • An imperative program can be represented as a Control Flow Graph (CFG) with statements at nodes and predicates at edges.
    It can be converted into a CFG with both statements and predicates at edges by pushing node statements onto their incoming edges.
    Such a CFG is a DFA:
    program points are the states;
    statements are the input symbols that change the program state from one program point to the next.

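  • A regular expression is an algebraic expression that denotes a language.
    It is composed of the symbols ε, ∅, +, *, ., (, ) and the symbols of the alphabet.
    The language is generated using the rules:
    $L(\emptyset) = \emptyset$ (the empty set)
    $L(\varepsilon) = \{ \varepsilon \}$
    $L(a) = \{ a \}$ for every alphabet symbol $a$
    $L(p+q) = L(p) \cup L(q)$
    $L(p.q) = \{ x.y \mid x \in L(p), y \in L(q) \}$
    $L(p^*) = \bigcup_{n \ge 0} L(p)^n$, where $L(p)^0 = \{ \varepsilon \}$ and $L(p)^k = L(p).L(p)^{k-1}$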

  • a+b.c
    The language generated is: { a, b.c }

    a.b.c*.d
    The language generated is: { a.b.d, a.b.c.d, a.b.c.c.d, a.b.c.c.c.d, ... }

    A regular expression is a finite way to express an infinite language.
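
The generation rules can be turned into a small evaluator that lists the strings of a language up to a length bound. This is a sketch under assumptions: the nested-tuple encoding of expressions and the function name `language` are illustrative, and the bound exists only because the full language may be infinite.

```python
def language(r, max_len):
    """Strings of L(r) with length <= max_len.

    r is a nested tuple: ("empty",), ("eps",), ("sym", a),
    ("alt", p, q) for p+q, ("cat", p, q) for p.q, ("star", p) for p*.
    """
    kind = r[0]
    if kind == "empty":
        return set()
    if kind == "eps":
        return {""}
    if kind == "sym":
        return {r[1]} if max_len >= 1 else set()
    if kind == "alt":
        return language(r[1], max_len) | language(r[2], max_len)
    if kind == "cat":
        left = language(r[1], max_len)
        right = language(r[2], max_len)
        return {x + y for x in left for y in right if len(x + y) <= max_len}
    if kind == "star":
        # Dropping "" from the base guarantees every round adds longer
        # strings, so the loop stops once max_len is exceeded.
        base = language(r[1], max_len) - {""}
        result = {""}
        frontier = {""}
        while frontier:
            frontier = {
                x + y for x in frontier for y in base
                if len(x + y) <= max_len
            } - result
            result |= frontier
        return result
    raise ValueError(f"unknown node {kind!r}")


a, b, c, d = (("sym", ch) for ch in "abcd")
R1 = ("alt", a, ("cat", b, c))                        # a+b.c
R2 = ("cat", ("cat", ("cat", a, b), ("star", c)), d)  # a.b.c*.d

print(language(R1, 5) == {"a", "bc"})                # True
print(language(R2, 5) == {"abd", "abcd", "abccd"})   # True
```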

  • DEFINITION

    Two regular expressions (or automata) are EQUAL if they generate the same language.

    Thus (a.b)* + (b.a)* + a.(b.a)* + b.(a.b)* = (ε + b).(a.b)*.(ε + a)
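
An equality like this can be sanity-checked by comparing both expressions on every short string. The sketch below uses Python's `re` module and takes the fourth term of the left-hand side as b.(a.b)*, so that both sides describe the strings in which a and b alternate. The translation into `re` syntax is an assumption of this sketch, and agreement up to a length bound is evidence, not a proof.

```python
import re
from itertools import product

# Python `re` spelling of the two expressions: `+` becomes `|`,
# `.` (concatenation) is implicit, and (ε + x) becomes x?.
LEFT = re.compile(r"(ab)*|(ba)*|a(ba)*|b(ab)*")
RIGHT = re.compile(r"b?(ab)*a?")


def agree_up_to(max_len):
    """Compare the two expressions on every string over {a, b} of
    length <= max_len."""
    for n in range(max_len + 1):
        for chars in product("ab", repeat=n):
            word = "".join(chars)
            if bool(LEFT.fullmatch(word)) != bool(RIGHT.fullmatch(word)):
                return False
    return True


print(agree_up_to(8))  # True
```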

  • p + q = q + p
    (p + q) + r = p + (q + r)
    (p.q).r = p.(q.r)
    ∅ + p = p + ∅ = p
    ε.p = p.ε = p
    ∅.p = p.∅ = ∅
    p.(q + r) = p.q + p.r
    (p + q).r = p.r + q.r
    p + p = p
    (p*)* = p*
    ∅* = ε
    ε* = ε
    p.p* = p*.p
    (p + q)* = (p*.q*)*

  • Every language defined by a finite automaton is also defined by some regular expression.
    Every language defined by a regular expression is also defined by some DFA.

  • Hopcroft's formula:
    $R_{ij}^{(k)} = R_{ij}^{(k-1)} + R_{ik}^{(k-1)} . (R_{kk}^{(k-1)})^* . R_{kj}^{(k-1)}$
    States are sorted in some order and numbered 1 to $n$ ($n$ is the number of states).
    $R_{ij}^{(k)}$ is the regular expression for all paths from $i$ to $j$ whose intermediate nodes all have sort order no greater than $k$.
    $R_{ij}^{(n)}$ is the regular expression for all paths from $i$ to $j$.
    It is computed for all $i, j$ for $k = 0$, then $k = 1, \ldots, k = n$.
    $R_{s,f_1}^{(n)} + \cdots + R_{s,f_m}^{(n)}$ is the regular expression of the DFA, where $s$ is the start state and $f_1, \ldots, f_m$ are the accepting states.
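
The recurrence can be sketched as a triple loop that builds regular expressions as strings in Python `re` syntax. This is an illustrative variant, not the slide's exact table: it tracks only non-empty paths and adds ε at the very end when the start state is accepting, so that no starred subexpression can match the empty string. All function names are hypothetical, and input symbols are assumed to be single characters.

```python
import re


def union(p, q):
    """p + q, with None standing for the empty set."""
    if p is None:
        return q
    if q is None:
        return p
    return p if p == q else f"(?:{p}|{q})"


def concat(p, q):
    """p.q; concatenation with the empty set is the empty set."""
    if p is None or q is None:
        return None
    return p + q


def star(p):
    """p*; the star of the empty set is ε (the empty pattern)."""
    return "" if p is None else f"(?:{p})*"


def dfa_to_regex(n, delta, start, accepting):
    """States are numbered 1..n; delta maps (state, symbol) -> state."""
    # k = 0: direct arcs only (paths of length exactly one).
    R = [[None] * (n + 1) for _ in range(n + 1)]
    for (i, a), j in delta.items():
        R[i][j] = union(R[i][j], re.escape(a))
    # Allow state k as an intermediate node, for k = 1..n.
    for k in range(1, n + 1):
        prev = R
        R = [[None] * (n + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            for j in range(1, n + 1):
                via_k = concat(
                    concat(prev[i][k], star(prev[k][k])), prev[k][j]
                )
                R[i][j] = union(prev[i][j], via_k)
    result = None
    for f in sorted(accepting):
        result = union(result, R[start][f])
    if start in accepting:
        # The empty path: add ε once, at the top level.
        result = "" if result is None else f"(?:{result})?"
    return result


# Hypothetical DFA: strings over {a, b} with an even number of a's.
EVEN_A = {(1, "a"): 2, (2, "a"): 1, (1, "b"): 1, (2, "b"): 2}
PATTERN = re.compile(dfa_to_regex(2, EVEN_A, 1, {1}))
print(bool(PATTERN.fullmatch("abba")))  # True
print(bool(PATTERN.fullmatch("ab")))    # False
```

No simplification is attempted, so the output grows quickly with the number of states.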

  • Hopcroft's formula is $O(n^3 4^n)$: $n^3$ to compute the table, and $4^n$ because the size of the regular expression can grow by a factor of 4 at every step.
    In practice it is close to $O(n^3)$,
    by simplifying the regular expression at every step and by using a judicious algorithm that avoids recomputing the starred term $(R_{kk}^{(k-1)})^*$.
    Most DFAs obtained from an $n$-state NFA have about $n$, not $2^n$, accessible states.
    A faster state elimination method, close to $O(n^2)$, is also available.

  • A regular expression is converted to an ε-NFA.
    The ε-NFA can then be converted to an NFA and to a DFA.
    RE to ε-NFA conversion rules:
    ε -> a two-state automaton with one edge, an ε-transition
    ∅ -> a two-state automaton with no edges
    a -> a two-state automaton with one transition on a
    + -> a new start state and a new accept state joining the two arguments of + in parallel
    . -> the accept state of the first argument is the start state of the second
    * -> an edge joining the start and accept states of the argument, plus a new start state and a new accept state
    Convert the resulting ε-NFA to a DFA.

  • Augment the regular expression r to (r).#
    Assign a position number to each occurrence of an alphabet symbol.
    Compute for each node of the syntax tree:
    nullable (is ε in the language of the node?)
    firstpos (the set of positions that can match the first symbol)
    lastpos (the set of positions that can match the last symbol)
    Compute for each position:
    followpos (the set of positions that can come next after this position)
    Construct the DFA from these sets.

  • Unix text search for matching patterns (grep)
    Lexical analysis and parsing
    Parsing text against a regular expression:
    find the set of first tokens at this expression root
    find the set of last tokens at this expression root
    can the expression at this root match the empty string?
    find the set of next tokens after an alphabet position in a regular expression
    Efficient search for patterns in a very large repository (web text search)

  • DEFINITION

    A language (a set of strings) is defined to be a regular language if it can be defined
    by a DFA, or
    by an NFA, or
    by an ε-NFA, or
    by a regular expression.
    These are four different ways to describe a regular language.

  • Pumping lemma: if $L$ is a regular language, then there exists an integer $n$ such that every string $w$ in $L$ with $|w| \ge n$ can be broken into $w = x.y.z$ such that
    $y \neq \varepsilon$
    $|x.y| \le n$
    $x.y^k.z$ is in $L$ for all $k \ge 0$
    The proof is based on the following: for a DFA with $n$ states, any accepted string of length at least $n$ must revisit a state.
    The lemma is used to prove that a language is not regular.
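
A standard worked use of the lemma, added here as an illustration (the language $\{ a^m b^m \}$ is a textbook example and does not appear on the slide):

```latex
\begin{align*}
&\text{Claim: } L = \{ a^m b^m \mid m \ge 0 \} \text{ is not regular.} \\
&\text{Assume } L \text{ is regular and let } n \text{ be the integer given by the lemma.} \\
&\text{Take } w = a^n b^n \in L, \text{ so } |w| = 2n \ge n. \\
&\text{For any split } w = x.y.z \text{ with } |x.y| \le n \text{ and } y \ne \varepsilon:
  \quad y = a^j,\ 1 \le j \le n. \\
&\text{Pumping with } k = 0: \quad x.y^0.z = a^{n-j} b^n \notin L
  \text{ since } n - j < n. \\
&\text{This contradicts the lemma, so } L \text{ is not regular.}
\end{align*}
```

The key step is that $|x.y| \le n$ forces $y$ to lie entirely inside the leading block of a's, so pumping changes the number of a's but not the number of b's.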

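  • A language is a set of strings over a finite alphabet.
    Language operators:
    Union: $L(A \cup B) = L(A) \cup L(B)$ (shown via regular expressions)
    Intersection: $L(A \cap B) = L(A) \cap L(B)$
    Concatenation: $L(A.B) = \{ a.b \mid a \in A, b \in B \}$
    Kleene closure: $L(A^*) = \{ a_1.a_2 \ldots a_n \mid n \ge 0, \text{each } a_i \in A \}$, where $n = 0$ gives $\varepsilon$
    Complement: $L(\overline{A}) = \{ a \mid a \notin A \}$ (with respect to the strings over some overall alphabet set) (shown via a DFA)
    Difference: $L(A - B) = L(A) \cap L(\overline{B})$ (shown via a DFA)
    Reversal: $L(A^R) = \{ a_k.a_{k-1} \ldots a_1 \mid a_1 \ldots a_{k-1}.a_k \in A \}$ (switch $q_0$ and $F$)
    Homomorphism: replace each alphabet symbol with another regular expression
    Inverse homomorphism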
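
Complement, intersection and difference can all be carried out directly on DFAs. The sketch below assumes complete DFAs (every state has a transition on every symbol) represented as tuples `(states, alphabet, delta, start, accepting)`; the representation, function names and the two example DFAs are illustrative.

```python
from itertools import product


def complement(dfa):
    """Complement of a complete DFA: swap accepting and non-accepting."""
    states, alphabet, delta, start, accepting = dfa
    return states, alphabet, delta, start, states - accepting


def intersection(d1, d2):
    """Product construction: run both complete DFAs in lockstep."""
    s1, alphabet, t1, q1, f1 = d1
    s2, _, t2, q2, f2 = d2
    states = set(product(s1, s2))
    delta = {
        ((p, q), a): (t1[(p, a)], t2[(q, a)])
        for (p, q) in states for a in alphabet
    }
    accepting = {(p, q) for (p, q) in states if p in f1 and q in f2}
    return states, alphabet, delta, (q1, q2), accepting


def difference(d1, d2):
    """L(A) - L(B), computed as L(A) intersected with the complement of L(B)."""
    return intersection(d1, complement(d2))


def accepts(dfa, word):
    _, _, delta, state, accepting = dfa
    for a in word:
        state = delta[(state, a)]
    return state in accepting


# Hypothetical examples over {a, b}.
EVEN_A = ({"e", "o"}, "ab",
          {("e", "a"): "o", ("o", "a"): "e", ("e", "b"): "e", ("o", "b"): "o"},
          "e", {"e"})   # even number of a's
ENDS_B = ({"n", "y"}, "ab",
          {("n", "a"): "n", ("n", "b"): "y", ("y", "a"): "n", ("y", "b"): "y"},
          "n", {"y"})   # last symbol is b

print(accepts(intersection(EVEN_A, ENDS_B), "aab"))  # True
print(accepts(difference(EVEN_A, ENDS_B), "aab"))    # False
```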

  • Is the language described empty?
    Is a particular string in the described language?
    Do two different descriptions actually describe the same language?
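
For a language given as a DFA, the emptiness question reduces to graph reachability: the language is empty exactly when no accepting state can be reached from the start state. A minimal sketch, with illustrative names and an assumed dictionary representation of the transition function:

```python
def is_empty(alphabet, delta, start, accepting):
    """Return True if no accepting state is reachable from `start`.

    delta maps (state, symbol) -> state; missing keys mean no transition.
    """
    seen = {start}
    stack = [start]
    while stack:
        p = stack.pop()
        if p in accepting:
            return False
        for a in alphabet:
            q = delta.get((p, a))
            if q is not None and q not in seen:
                seen.add(q)
                stack.append(q)
    return True


# Hypothetical DFA: state "t" is accepting but unreachable from "s".
DELTA = {("s", "a"): "s", ("t", "a"): "t"}
print(is_empty("a", DELTA, "s", {"t"}))  # True
print(is_empty("a", DELTA, "s", {"s"}))  # False
```

Membership of a particular string is answered by running the DFA on it.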

  • Decision properties may require conversion between various forms.
    Can the conversion be done in reasonable time?

    Conversion | Complexity
    Computing ε-closures | $O(n^3)$ (Warshall's) $O(n)$
    Subset construction | $O(2^n)$
    NFA to DFA | $O(n^3 2^n)$ (in practice $O(n^3 s)$, where $s$ is the number of DFA states actually constructed)
    DFA to NFA | $O(n)$
    NFA/DFA to regular expression | $O(n^3 4^n)$ (worst case; the actual cost is much less)
    Regular expression to ε-NFA | $O(n)$
    Regular expression to NFA | $O(n^3)$
    Regular expression to DFA | O(n34n^32^n)

  • Equivalence of two