Regular and Context-Free Grammars - Computer Science rlc/Courses/Theory/ClassNotes/... ·...

Click here to load reader

  • date post

    13-Mar-2020
  • Category

    Documents

  • view

    3
  • download

    0

Embed Size (px)

Transcript of Regular and Context-Free Grammars - Computer Science rlc/Courses/Theory/ClassNotes/... ·...

  • Regular and Context-Free Grammars

    Foundations of Computer Science Theory

  • Grammars Define Languages

    • A grammar is a set of rules that are stated in terms of two alphabets: • terminal alphabet, Σ, that contains the symbols that

    make up the strings in L(G), and • nonterminal alphabet, the elements of which will

    function as working symbols that will be used while the grammar is operating • These symbols will disappear by the time the grammar

    finishes its job and generates a string

    • A grammar has a unique start symbol, often called S

  • Rewrite Systems and Grammars

    • A rewrite system (or production system or rule- based system) is: – a list of rules, and – an algorithm for applying them

    • Each rule has a left-hand side and a right-hand side

    • Example rules: S → aSb aS → ε aSb → bSabSa

  • Simple Rewrite

    simple-rewrite(R: rewrite system, w: initial string) : 1. Set working-string to w 2. Until told by R to halt do:

    • Match the left-hand-side of some rule against some part of the working-string

    • Replace the matched part of the working-string with the right-hand-side of the rule that was matched

    3. Return the working-string

  • Simple Rewrite Example

    • If the re-write rules are: S → aSb, S → bSa, S → ε, then a derivation of working string w = aaabbb is: S ⇒ aSb ⇒ aaSbb ⇒ aaaSbbb ⇒ aaabbb This derivation uses the first rule three times, and then the third rule to derive aaabbb.

  • Rewrite System Formalism

    • A rewrite system formalism specifies: – The form of the rules

    • Variables may be used: aSa → aa • Regular expressions may be used: ab*ab*a → aaa

    – How do I know which rule to apply? • Any rule that matches may be tried, or • Rules have to tried in the order given, where the first

    one that matches is the one that is applied – When should I stop applying rules?

    • See next slide

  • When To Stop

    The derivation process ends when one of the following things happens:

    1. The working string no longer contains any non-terminal symbols (including, as a special case, when the working string is ε)

    Example: S ⇒ aSb ⇒ aaSbb ⇒ aabb

    2. There are non-terminal symbols in the working string but none of them appears on the left-hand side of any rule in the grammar. In this case, we have a blocked or non-terminated derivation but no generated string.

    Example: Rules: S → aSb, S → bTa, and S → ε Derivations: S ⇒ aSb ⇒ abTab ⇒ [blocked]

  • When To Stop

    It is possible that neither (1) nor (2) is achieved. Example: If G contains only the rules S → Ba and B → bB, with

    S the start symbol, then all derivations will proceed as:

    S ⇒ Ba ⇒ bBa ⇒ bbBa ⇒ bbbBa ⇒ bbbbBa ⇒ ...

    This grammar generates the language ∅

  • Generating Many Strings

    • A rule might match in more than one way • If S → aTTb, T → bTa, and T → ε, then a

    derivation would begin as S ⇒ aTTb ⇒ • Notice there are four choices at the next step:

    1. S ⇒ aTTb ⇒ abTaTb ⇒ 2. S ⇒ aTTb ⇒ aTb ⇒ 3. S ⇒ aTTb ⇒ aTbTab ⇒ 4. S ⇒ aTTb ⇒ aTb ⇒

  • Regular Grammars

    • A regular grammar G is a 4-tuple (V, Σ, R, S) where: – V is the rule alphabet which contains non-terminals – Σ is the set of terminals – R is a finite set of rules of the form:

    X → Y – S is the start symbol (a non-terminal)

  • Regular Grammars

    • In a right-linear regular grammar, all rules must have a left hand side that is a single non- terminal, and a right hand side that is either zero or more terminals, or zero or more terminals followed by a single non-terminal – Legal right-linear regular grammar: S → ε, S → a,

    S → abb, S → T, S → aT, S → abaT – Not legal right-linear regular grammar: S → aSa, aSa → T, S → TA

  • Regular Grammars

    • A grammar is said to be right-linear if all productions are of the form S → x or S → xT, where x is any string of zero or more terminals

    • A grammar is said to be left-linear if all productions are of the form S → x or S → Tx, where x is any string of zero or more terminals

    • A regular grammar is one that is either right- linear or left-linear

  • Regular Grammars

    • The grammar G1 = ({S}, {a,b}, R1, S} with R1 as: S → abS | a is right-linear • Using G1 we can derive the string ababa using

    the sequence: S ⇒ abS ⇒ ababS ⇒ ababa • From this derivation, we can see that L(G1) is

    the language denoted by the regular expression (ab)*a

  • Regular Grammars

    • The grammar G2 = ({S, A, B}, {a,b}, R2, S} with R2 as:

    S → Aab A → Aab | B B → a is left-linear • L(G2) is the language denoted by the regular

    expression L(aab(ab)*)

  • Non-Regular Grammars

    • A linear grammar is a grammar in which at most one variable can occur on the right side of any production, without restriction on the position of this variable

    • A regular grammar is always linear, but not all linear grammars are regular

  • Non-Regular Grammars

    The grammar G = ({S, A, B}, {a,b}, R, S} with R: S → A A → aB | ε B → Ab is not regular. Although every production is either right-linear or left-linear, the grammar itself is neither right-linear nor left-linear, and therefore it is not regular (but it is linear).

  • L = {w ∈ {a, b}* : |w| is even} ((aa) ∪ (ab) ∪ (ba) ∪ (bb))*

    S → aT | ε S → bT | ε T → aS T → bS

    Regular Grammars and DFAs

    The job of the non-terminal S is to generate an even length string. It does this either by generating the empty string or by generating a single character and then creating T. The job of T is to generate an odd length string. It does this by generating a single character and then creating S.

  • Regular Grammars and Regular Languages

    • Theorem: The class of languages that can be defined with regular grammars is exactly the regular languages

    • Proof: By two constructions (see the next slide)

  • Regular grammar → NFA: 1. Create the NFA with a separate state for each non-terminal 2. The start state is the state corresponding to S 3. If there are any rules in R of the form X → w, for some w ∈ Σ, create a new

    state labeled # and add a transition from X to # labeled w 5. For each rule of the form X → wY, add a transition from X to Y labeled w 6. For each rule of the form X → ε, mark state X as accepting 7. Mark state # as accepting NFA → Regular grammar: This proof is effectively the reverse of the

    above construction.

    Regular Grammars and Regular Languages

  • RG to NFA Construction L = {w ∈ {a, b}* : w ends with the pattern aaaa}. S → aS /* An arbitrary number of a’s and b’s can be generated S → bS before the pattern starts S → aB /* Generate the first a of the pattern B → aC /* Generate the second a of the pattern C → aD /* Generate the third a of the pattern D → a /* Generate the last a of the pattern and quit

  • S → ε A → bA C → aC S → aB A → cA C → bC S → aC A → ε C → ε S → bA B → aB S → bC B → cB S → cA B → ε S → cB

    NFA to RG Construction L = {w ∈ {a, b,c}* : there is a symbol of Σ not appearing in w}

  • Satisfying Multiple Criteria Let L = {w ∈ {a,b}* : w contains an odd number of a’s and w ends in a} We can write a regular grammar G that defines L. G will contain three non-terminals, each with a unique function (corresponding to the states of a simple NFA that accepts L). In any partially derived string, if the remaining non-terminal is:

    • S, then the number of a’s so far is even. We don’t have to worry about whether the string ends in a since, to derive a string in L, it will be necessary to generate at least one more a anyway

    • T, then the number of a’s so far is odd and the derived string ends in a • X, then the number of a’s so far is odd and the derived string does not

    end in a

  • Satisfying Multiple Criteria Since only T captures the situation in which the number of a’s so far is odd and the derived string ends in a, T is the only non-terminal that can generate ε. G contains the following rules:

    S → bS /* Initial b’s don’t matter S → aT /* After this, the number of a’s is odd and the generated string ends in a T → ε /* Since the number of a’s is odd, and the string ends in a, it’s OK to quit T → aS /* After this, the number of a’s will be even again T → bX /* After this, the number of a’s is still odd, but the