# Regular and Context-Free Grammars - Computer Science rlc/Courses/Theory/ClassNotes/... ·...

date post

13-Mar-2020Category

## Documents

view

3download

0

Embed Size (px)

### Transcript of Regular and Context-Free Grammars - Computer Science rlc/Courses/Theory/ClassNotes/... ·...

Regular and Context-Free Grammars

Foundations of Computer Science Theory

Grammars Define Languages

• A grammar is a set of rules that are stated in terms of two alphabets: • terminal alphabet, Σ, that contains the symbols that

make up the strings in L(G), and • nonterminal alphabet, the elements of which will

function as working symbols that will be used while the grammar is operating • These symbols will disappear by the time the grammar

finishes its job and generates a string

• A grammar has a unique start symbol, often called S

Rewrite Systems and Grammars

• A rewrite system (or production system or rule- based system) is: – a list of rules, and – an algorithm for applying them

• Each rule has a left-hand side and a right-hand side

• Example rules: S → aSb aS → ε aSb → bSabSa

Simple Rewrite

simple-rewrite(R: rewrite system, w: initial string) : 1. Set working-string to w 2. Until told by R to halt do:

• Match the left-hand-side of some rule against some part of the working-string

• Replace the matched part of the working-string with the right-hand-side of the rule that was matched

3. Return the working-string

Simple Rewrite Example

• If the re-write rules are: S → aSb, S → bSa, S → ε, then a derivation of working string w = aaabbb is: S ⇒ aSb ⇒ aaSbb ⇒ aaaSbbb ⇒ aaabbb This derivation uses the first rule three times, and then the third rule to derive aaabbb.

Rewrite System Formalism

• A rewrite system formalism specifies: – The form of the rules

• Variables may be used: aSa → aa • Regular expressions may be used: ab*ab*a → aaa

– How do I know which rule to apply? • Any rule that matches may be tried, or • Rules have to tried in the order given, where the first

one that matches is the one that is applied – When should I stop applying rules?

• See next slide

When To Stop

The derivation process ends when one of the following things happens:

1. The working string no longer contains any non-terminal symbols (including, as a special case, when the working string is ε)

Example: S ⇒ aSb ⇒ aaSbb ⇒ aabb

2. There are non-terminal symbols in the working string but none of them appears on the left-hand side of any rule in the grammar. In this case, we have a blocked or non-terminated derivation but no generated string.

Example: Rules: S → aSb, S → bTa, and S → ε Derivations: S ⇒ aSb ⇒ abTab ⇒ [blocked]

When To Stop

It is possible that neither (1) nor (2) is achieved. Example: If G contains only the rules S → Ba and B → bB, with

S the start symbol, then all derivations will proceed as:

S ⇒ Ba ⇒ bBa ⇒ bbBa ⇒ bbbBa ⇒ bbbbBa ⇒ ...

This grammar generates the language ∅

Generating Many Strings

• A rule might match in more than one way • If S → aTTb, T → bTa, and T → ε, then a

derivation would begin as S ⇒ aTTb ⇒ • Notice there are four choices at the next step:

1. S ⇒ aTTb ⇒ abTaTb ⇒ 2. S ⇒ aTTb ⇒ aTb ⇒ 3. S ⇒ aTTb ⇒ aTbTab ⇒ 4. S ⇒ aTTb ⇒ aTb ⇒

Regular Grammars

• A regular grammar G is a 4-tuple (V, Σ, R, S) where: – V is the rule alphabet which contains non-terminals – Σ is the set of terminals – R is a finite set of rules of the form:

X → Y – S is the start symbol (a non-terminal)

Regular Grammars

• In a right-linear regular grammar, all rules must have a left hand side that is a single non- terminal, and a right hand side that is either zero or more terminals, or zero or more terminals followed by a single non-terminal – Legal right-linear regular grammar: S → ε, S → a,

S → abb, S → T, S → aT, S → abaT – Not legal right-linear regular grammar: S → aSa, aSa → T, S → TA

Regular Grammars

• A grammar is said to be right-linear if all productions are of the form S → x or S → xT, where x is any string of zero or more terminals

• A grammar is said to be left-linear if all productions are of the form S → x or S → Tx, where x is any string of zero or more terminals

• A regular grammar is one that is either right- linear or left-linear

Regular Grammars

• The grammar G1 = ({S}, {a,b}, R1, S} with R1 as: S → abS | a is right-linear • Using G1 we can derive the string ababa using

the sequence: S ⇒ abS ⇒ ababS ⇒ ababa • From this derivation, we can see that L(G1) is

the language denoted by the regular expression (ab)*a

Regular Grammars

• The grammar G2 = ({S, A, B}, {a,b}, R2, S} with R2 as:

S → Aab A → Aab | B B → a is left-linear • L(G2) is the language denoted by the regular

expression L(aab(ab)*)

Non-Regular Grammars

• A linear grammar is a grammar in which at most one variable can occur on the right side of any production, without restriction on the position of this variable

• A regular grammar is always linear, but not all linear grammars are regular

Non-Regular Grammars

The grammar G = ({S, A, B}, {a,b}, R, S} with R: S → A A → aB | ε B → Ab is not regular. Although every production is either right-linear or left-linear, the grammar itself is neither right-linear nor left-linear, and therefore it is not regular (but it is linear).

L = {w ∈ {a, b}* : |w| is even} ((aa) ∪ (ab) ∪ (ba) ∪ (bb))*

S → aT | ε S → bT | ε T → aS T → bS

Regular Grammars and DFAs

The job of the non-terminal S is to generate an even length string. It does this either by generating the empty string or by generating a single character and then creating T. The job of T is to generate an odd length string. It does this by generating a single character and then creating S.

Regular Grammars and Regular Languages

• Theorem: The class of languages that can be defined with regular grammars is exactly the regular languages

• Proof: By two constructions (see the next slide)

Regular grammar → NFA: 1. Create the NFA with a separate state for each non-terminal 2. The start state is the state corresponding to S 3. If there are any rules in R of the form X → w, for some w ∈ Σ, create a new

state labeled # and add a transition from X to # labeled w 5. For each rule of the form X → wY, add a transition from X to Y labeled w 6. For each rule of the form X → ε, mark state X as accepting 7. Mark state # as accepting NFA → Regular grammar: This proof is effectively the reverse of the

above construction.

Regular Grammars and Regular Languages

RG to NFA Construction L = {w ∈ {a, b}* : w ends with the pattern aaaa}. S → aS /* An arbitrary number of a’s and b’s can be generated S → bS before the pattern starts S → aB /* Generate the first a of the pattern B → aC /* Generate the second a of the pattern C → aD /* Generate the third a of the pattern D → a /* Generate the last a of the pattern and quit

S → ε A → bA C → aC S → aB A → cA C → bC S → aC A → ε C → ε S → bA B → aB S → bC B → cB S → cA B → ε S → cB

NFA to RG Construction L = {w ∈ {a, b,c}* : there is a symbol of Σ not appearing in w}

Satisfying Multiple Criteria Let L = {w ∈ {a,b}* : w contains an odd number of a’s and w ends in a} We can write a regular grammar G that defines L. G will contain three non-terminals, each with a unique function (corresponding to the states of a simple NFA that accepts L). In any partially derived string, if the remaining non-terminal is:

• S, then the number of a’s so far is even. We don’t have to worry about whether the string ends in a since, to derive a string in L, it will be necessary to generate at least one more a anyway

• T, then the number of a’s so far is odd and the derived string ends in a • X, then the number of a’s so far is odd and the derived string does not

end in a

Satisfying Multiple Criteria Since only T captures the situation in which the number of a’s so far is odd and the derived string ends in a, T is the only non-terminal that can generate ε. G contains the following rules:

S → bS /* Initial b’s don’t matter S → aT /* After this, the number of a’s is odd and the generated string ends in a T → ε /* Since the number of a’s is odd, and the string ends in a, it’s OK to quit T → aS /* After this, the number of a’s will be even again T → bX /* After this, the number of a’s is still odd, but the