60-354, Theory of Computation Fall 2013
Asish Mukhopadhyay
School of Computer Science
University of Windsor
Pushdown Automata (PDA)
• PDA = ε-NFA + stack
• Acceptance
– ε-NFA enters a final state or
– Stack is empty
• Nondeterministic in nature
• More powerful than its deterministic counterpart
An example
• Design a PDA to accept the language
– L = {ww^R | w in {0,1}*} by empty stack
• Main idea underlying the construction
– Keep stacking symbols from the input string
– Guess that the middle has been reached
– Now start emptying the stack, as long as the stack top and the input symbol match
Formal Description of a PDA
• A PDA is a 6-tuple: (Q, ∑, Λ, δ, q0, Z0)
– Q is the set of states of the ε-NFA
– ∑ is the alphabet of the input string
– Λ is the stack alphabet
– δ is the transition function, δ: Q × (∑ ∪ {ε}) × Λ → 2^(Q × Λ*)
– q0 is the start state of the ε-NFA
– Z0 is the bottom of stack marker
PDA for the example language
• Q = {q0, q1}
• ∑ = {0, 1}
• Λ = {0, 1 ,Z0}
PDA for the example language
• δ():
– δ(q0, a, b) = {(q0, ab), (q1, ab)}, a ∈ {0,1}, b ∈ {0,1,Z0}
– δ(q0, ε, Z0) = {(q1, Z0)} // accept empty string
– δ(q1, a, b) = {}, a, b ∈ {0,1}, a ≠ b
– δ(q1, a, a) = {(q1, ε)}, a ∈ {0,1}
– δ(q1, ε, Z0) = {(q1, ε)} // empty stack
Instantaneous Description (ID)
• An ID is a 3-tuple (q, α, β) describing the state of a PDA
– q is the current state of the ε-NFA
– α is the remaining input string
– β is the contents of the stack
Transition
• (q, α, β) |- (q’, α’, β’)
– Useful for simulating moves of a PDA on an input string
• Example:
(q0, 1001, Z0) |- (q0, 001, 1Z0) |- (q1, 01, 01Z0) |- (q1, 1, 1Z0) |- (q1, ε, Z0) |- (q1, ε, ε)
Language accepted by a PDA
• L = {w | (q0, w, Z0) |-* (q, ε, ε) , where q is some state in Q}
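The definitions above can be tried out with a minimal Python sketch. It encodes the transition function of the example PDA for {ww^R}, computes the one-step relation |- on IDs, and tests acceptance by empty stack with a breadth-first search over IDs. The names `Z`, `DELTA`, `step` and `accepts` are illustrative choices, not part of the course material; the single character "Z" stands for Z0 and the empty string stands for ε.

```python
from collections import deque

Z = "Z"  # assumption: one character standing for the bottom marker Z0


def build_delta():
    """Transition function of the example PDA; "" plays the role of ε."""
    delta = {}
    for a in "01":
        for b in "01" + Z:
            # In q0: push a, then either keep pushing or guess the middle.
            delta[("q0", a, b)] = {("q0", a + b), ("q1", a + b)}
        # In q1: pop when the input symbol matches the stack top.
        delta[("q1", a, a)] = {("q1", "")}
    delta[("q0", "", Z)] = {("q1", Z)}  # accept the empty string
    delta[("q1", "", Z)] = {("q1", "")}  # empty the stack
    return delta


DELTA = build_delta()


def step(ident):
    """All IDs reachable from ident = (state, input, stack) in one move."""
    q, rest, stack = ident
    if not stack:
        return set()
    top, below = stack[0], stack[1:]
    result = {(p, rest, push + below) for p, push in DELTA.get((q, "", top), ())}
    if rest:
        for p, push in DELTA.get((q, rest[0], top), ()):
            result.add((p, rest[1:], push + below))
    return result


def accepts(w):
    """True if (q0, w, Z0) |-* (q, ε, ε) for some state q."""
    start = ("q0", w, Z)
    seen, frontier = {start}, deque([start])
    while frontier:
        ident = frontier.popleft()
        if ident[1] == "" and ident[2] == "":
            return True
        for nxt in step(ident) - seen:
            seen.add(nxt)
            frontier.append(nxt)
    return False
```

For instance, `accepts("1001")` follows the same sequence of IDs as the worked example, while `accepts("10")` fails because no branch empties the stack after consuming the whole input.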
Context-free grammars
• An example:
S -> 0S0
S -> 1S1
S -> ε
Productions generate the language L = {ww^R | w in {0,1}*}
Context-free grammars
• S is a variable, 0 and 1 are terminal symbols
• A string in L is derived by starting with the symbol S and making a sequence of substitutions, each replacing a variable by the right-hand side of one of its productions
Formal definition
• A context-free grammar is described by a 4-tuple: (V, T, P, S)
– V is the set of variables of the grammar
– T is the set of terminals
– P is the set of productions
– S is a special symbol in V, called the start symbol
– Productions are of the form A -> α, where α is a string in (V ∪ T)* and A is a variable in V
Derivations
• Define a relation ⇒ on (V ∪ T)* thus
– Let α and β be arbitrary strings in (V ∪ T)* and
A -> γ be a production in P
– Then α A β ⇒ α γ β
– The reflexive-transitive closure of ⇒ is denoted by ⇒*
Example of a derivation
• S ⇒ 1S1 ⇒ 10S01 ⇒ 1001
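A small Python sketch of how such a derivation unfolds, assuming the productions S -> 0S0 | 1S1 | ε of the example grammar. The function name `derivation` is illustrative; it lists the sentential forms obtained when deriving ww^R for a given w.

```python
def derivation(w):
    """Sentential forms in the derivation of w + reverse(w) from S."""
    forms = ["S"]
    left = ""
    for ch in w:
        # Apply S -> ch S ch to the single variable in the current form.
        left += ch
        forms.append(left + "S" + left[::-1])
    # Finish with S -> ε.
    forms.append(left + left[::-1])
    return forms
```

Calling `derivation("10")` reproduces the forms S, 1S1, 10S01, 1001 of the example.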
Language generated
• Let G be a context-free grammar
• L(G) = {w ∈ T* | S ⇒* w}
Design a context-free grammar
• L = {w | w ∈ {a,b}* and w is not of the form zz}
• Claim:
– The grammar
S -> AB | BA | C
A -> aAb | bAa | aAa | bAb
B -> aBb | bBa | aBa | bBb
A -> a
B -> b
C -> aCb | bCa | aCa | bCb | a | b
generates L
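The claim can be checked empirically for short strings. The Python sketch below (function names are illustrative) enumerates every terminal string of bounded length derivable from S and compares the result with a brute-force list of strings that are not of the form zz. Pruning by length is sound here only because no production of this grammar has an empty body, so sentential forms never shrink.

```python
from itertools import product

PRODUCTIONS = {
    "S": ["AB", "BA", "C"],
    "A": ["aAb", "bAa", "aAa", "bAb", "a"],
    "B": ["aBb", "bBa", "aBa", "bBb", "b"],
    "C": ["aCb", "bCa", "aCa", "bCb", "a", "b"],
}


def generated(max_len):
    """All terminal strings of length <= max_len derivable from S."""
    seen, todo, words = {"S"}, ["S"], set()
    while todo:
        form = todo.pop()
        # Position of the leftmost variable, if any.
        idx = next((i for i, ch in enumerate(form) if ch in PRODUCTIONS), None)
        if idx is None:
            words.add(form)
            continue
        for body in PRODUCTIONS[form[idx]]:
            new = form[:idx] + body + form[idx + 1:]
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                todo.append(new)
    return words


def not_of_form_zz(max_len):
    """Brute force: strings over {a,b} of length <= max_len that are not zz."""
    result = set()
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            w = "".join(letters)
            half = n // 2
            if n % 2 == 1 or w[:half] != w[half:]:
                result.add(w)
    return result
```

Agreement of `generated(6)` and `not_of_form_zz(6)` is evidence for the claim on short strings, not a proof of it.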
A derivation in G
• S ⇒ AB ⇒ aAbB ⇒ aaAabB ⇒ aaAabbBb ⇒ aaaabbBb ⇒ aaaabbbb
Another problem
• Design a context-free grammar for the language over {0,1} that consists of the set S
of all strings with twice as many 0’s as 1’s
Claim
• The following grammar generates exactly the strings with this property
S -> ε | SS, plus six productions whose bodies each contain a single S and the terminals 010, 100, 001, 100, 001, 010 respectively
Claim (2)
• L(G) is contained in S
– In any derivation every application of a production other than S -> ε introduces two 0’s and a single 1.
– Since the property is vacuously true for an empty string, the derived string retains this property whenever the production S -> ε is used
– A formal inductive argument on the number of steps in a derivation can be easily given
Claim (3)
• S is contained in L(G)
– Any of the sequences 001, 010, 100 can be treated as a balanced pair of parentheses
– A string with the above property has an adjacent pair of 00’s when the length is more than 3
Claim (3)
• Completing an inductive argument
– Assume inductively that every sequence of length 3n (n > 1) corresponds to a balanced sequence of parentheses
– Consider a sequence of length 3n + 3
– We can remove a sequence 001 or 100 from this sequence
– The residual sequence corresponds to a balanced sequence of parentheses and into this we can reinsert 001 or 100, each of which corresponds to a balanced parentheses pair
Example 20
• Design a cfg that generates the language L = {0^i 1^j | 2i = 3j+1, j = 1, 3, 5, …}
• Set j = 2k+1, k = 0, 1, …
• 2i = 6k+4 or i = 3k+2
• The strings are of the form 0^(3k+2) 1^(2k+1), which can be written as 0^2 (0^3)^k (1^2)^k 1
Example 20
• Grammar:
– S -> 00B1 ; B -> 000B11|ε
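A quick Python check of this grammar against the defining condition, as a sketch with illustrative function names: `derive(k)` applies B -> 000B11 exactly k times before B -> ε, and `in_language` tests 2i = 3j+1 with j odd directly.

```python
def derive(k):
    """String obtained from S using B -> 000B11 exactly k times."""
    form = "S".replace("S", "00B1")  # S -> 00B1
    for _ in range(k):
        form = form.replace("B", "000B11")  # B -> 000B11
    return form.replace("B", "")  # B -> ε


def in_language(w):
    """True if w = 0^i 1^j with 2i = 3j + 1 and j odd."""
    i = len(w) - len(w.lstrip("0"))  # number of leading 0's
    j = len(w) - i
    return w == "0" * i + "1" * j and 2 * i == 3 * j + 1 and j % 2 == 1
```

For k = 1 the derived string is 00000111, with i = 5 and j = 3, so 2i = 10 = 3j + 1.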
Canonical derivations
• In a derivation, productions can be applied in an arbitrary order
• In a leftmost (rightmost) derivation, we always replace the leftmost (rightmost) variable by its body in a production
Parse Tree
• Any derivation can be represented by a parse tree
Ambiguous grammar
• A cfg G is ambiguous
– if there exists more than one parse tree for a string in L(G)
• In terms of canonical derivations:
– some string in L(G) has more than one leftmost (equivalently, more than one rightmost) derivation
Decision algorithm
• Is it decidable whether a cfg G is ambiguous?
• We need considerable infrastructure to answer this question
An ambiguous grammar G
• The productions in G are:
– E -> E + E | E * E | (E)|I
– I -> Ia | Ib | I0 | I1 | a | b | 0 | 1
• In G:
– a + b * a has two leftmost derivations (see courseware)
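The ambiguity can be made concrete by counting parse trees. The Python sketch below (the name `count_parse_trees` is illustrative) counts the parse trees of a string under E -> E + E | E * E | (E) | I by dynamic programming over substrings. It relies on the observation that the productions for I are left-linear, so each identifier has exactly one parse tree.

```python
from functools import lru_cache


def count_parse_trees(expr):
    """Number of parse trees for expr with root E in the ambiguous grammar G."""
    s = expr.replace(" ", "")

    @lru_cache(maxsize=None)
    def count(i, j):
        """Parse trees deriving s[i:j] from E."""
        if i >= j:
            return 0
        total = 0
        if all(ch in "ab01" for ch in s[i:j]):
            total += 1  # E -> I, and I derives s[i:j] in exactly one way
        if s[i] == "(" and s[j - 1] == ")":
            total += count(i + 1, j - 1)  # E -> (E)
        for k in range(i + 1, j - 1):
            if s[k] in "+*":
                # E -> E + E or E -> E * E, with the root operator at position k
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(s))
```

For `"a + b * a"` the count is 2: one tree with + at the root and one with * at the root.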
Disambiguating the grammar G
• The precedence of the operators * and + needs to be defined
• The new productions that take care of this are:
– E -> E + T | T
– T -> T * F | F
– F -> (E) | I
– I -> I0 | I1 | Ia | Ib| a | b | 0 | 1
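A minimal recursive-descent sketch in Python shows how these productions fix precedence and associativity. Note the swapped-in technique: the left-recursive productions E -> E + T and T -> T * F are implemented as loops, which is a standard parser-writing device and not something taken from the slides. The function `parse` and its fully parenthesized output format are illustrative.

```python
def parse(expr):
    """Return expr fully parenthesized according to the unambiguous grammar."""
    tokens = expr.replace(" ", "")
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def parse_E():  # E -> E + T | T, written as T (+ T)*
        nonlocal pos
        node = parse_T()
        while peek() == "+":
            pos += 1
            right = parse_T()
            node = f"({node}+{right})"
        return node

    def parse_T():  # T -> T * F | F, written as F (* F)*
        nonlocal pos
        node = parse_F()
        while peek() == "*":
            pos += 1
            right = parse_F()
            node = f"({node}*{right})"
        return node

    def parse_F():  # F -> (E) | I
        nonlocal pos
        if peek() == "(":
            pos += 1
            node = parse_E()
            if peek() != ")":
                raise ValueError("expected ')'")
            pos += 1
            return node
        start = pos
        while peek() is not None and peek() in "ab01":
            pos += 1  # I is a non-empty run of the symbols a, b, 0, 1
        if start == pos:
            raise ValueError(f"unexpected input at position {pos}")
        return tokens[start:pos]

    result = parse_E()
    if pos != len(tokens):
        raise ValueError(f"unexpected input at position {pos}")
    return result
```

With this grammar `a + b * a` has a single reading, in which * binds tighter than +, and repeated + groups to the left.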
Normal forms
• Chomsky Normal Form (CNF, for short)
– All productions are of the form:
• A -> BC or A -> a
– Neither B nor C can be the start symbol, S
CFG to PDA
• Let G = (V, T, P, S) be a cfg
• P = ({q}, T, V ∪ T, δ, q, S) is a PDA that accepts L(G) by empty stack, with δ defined thus:
• For each variable A in V,
– δ(q, ε, A) = {(q, β) | A -> β is a production in G}
• For each terminal symbol a in T,
– δ(q, a, a) = {(q, ε)}
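The construction is short enough to sketch in Python. The code below builds δ from a grammar and simulates the resulting one-state PDA by empty stack, using the grammar S -> 0S0 | 1S1 | ε as test input. All names are illustrative, grammar symbols are assumed to be single characters, and "" stands for ε. The search prunes any ID whose stack holds more terminals than there is input left; this keeps the search finite for this grammar, but termination for arbitrary grammars is not guaranteed by this sketch.

```python
from collections import deque


def cfg_to_pda(productions, terminals):
    """Transition function of the one-state PDA built from a CFG."""
    delta = {}
    for head, bodies in productions.items():
        # Expand a variable on top of the stack without reading input.
        delta[("q", "", head)] = {("q", body) for body in bodies}
    for a in terminals:
        # Match a terminal on top of the stack against the input.
        delta[("q", a, a)] = {("q", "")}
    return delta


def accepts_by_empty_stack(delta, terminals, start_symbol, w):
    """Breadth-first search over IDs (remaining input, stack)."""
    start = (w, start_symbol)
    seen, frontier = {start}, deque([start])
    while frontier:
        rest, stack = frontier.popleft()
        if not rest and not stack:
            return True
        if not stack:
            continue
        top, below = stack[0], stack[1:]
        options = [("", rest)]
        if rest:
            options.append((rest[0], rest[1:]))
        for a, new_rest in options:
            for _, push in delta.get(("q", a, top), ()):
                new_stack = push + below
                pending = sum(ch in terminals for ch in new_stack)
                nxt = (new_rest, new_stack)
                # Each stacked terminal must still be matched by an input symbol.
                if pending <= len(new_rest) and nxt not in seen:
                    seen.add(nxt)
                    frontier.append(nxt)
    return False


GRAMMAR = {"S": ["0S0", "1S1", ""]}
DELTA = cfg_to_pda(GRAMMAR, "01")
```

The simulated PDA mimics a leftmost derivation: expanding a variable corresponds to applying a production, and popping a terminal corresponds to confirming the next input symbol.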
PDA to CFG (1)
• Given PDA, A = (Q, ∑, Λ, δ, q0 , Z0)
• A CFG, G = (V, T, P, S) is constructed thus:
– V = {S} ∪ {[pXq] | p, q ∈ Q, X ∈ Λ}
– The set of productions P includes
• S -> [q0Z0p] for every p ∈ Q
– Further, if (r, Y1Y2…Yk) ∈ δ(q, a, X), where a ∈ ∑ or a = ε and k ≥ 0, then for every choice of states r1, r2, …, rk in Q, P includes the production
• [qXrk] -> a[rY1r1][r1Y2r2]…[rk-1Ykrk]
PDA to CFG (2)
• When k = 0, the production is [qXr] -> a
• See Example 22 in courseware
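The triple construction can be written out mechanically. The Python sketch below is illustrative, not the courseware's Example 22: a variable [pXq] is represented by the tuple (p, X, q), "" stands for ε, "Z" stands for Z0, and the PDA used as input is the two-state PDA for {ww^R} given earlier in these notes.

```python
from itertools import product


def pda_to_cfg(states, delta, q0, z0):
    """Productions (head, body) of the grammar built from a PDA.

    A head is "S" or a triple (p, X, q) standing for [pXq];
    a body is a list of terminals and triples.
    """
    productions = [("S", [(q0, z0, p)]) for p in states]
    for (q, a, X), moves in delta.items():
        prefix = [a] if a else []  # a == "" encodes an ε-move
        for r, pushed in moves:
            k = len(pushed)
            if k == 0:
                # [qXr] -> a
                productions.append(((q, X, r), list(prefix)))
                continue
            # One production for every choice of states r1, ..., rk.
            for rs in product(states, repeat=k):
                chain = (r,) + rs
                body = prefix + [
                    (chain[i], pushed[i], chain[i + 1]) for i in range(k)
                ]
                productions.append(((q, X, rs[-1]), body))
    return productions


STATES = ["q0", "q1"]
DELTA = {}
for a in "01":
    for b in "01Z":
        DELTA[("q0", a, b)] = {("q0", a + b), ("q1", a + b)}
    DELTA[("q1", a, a)] = {("q1", "")}
DELTA[("q0", "", "Z")] = {("q1", "Z")}
DELTA[("q1", "", "Z")] = {("q1", "")}

GRAMMAR = pda_to_cfg(STATES, DELTA, "q0", "Z")
```

The construction produces many variables that derive no terminal string; a practical implementation would remove such useless symbols afterwards.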
Deterministic PDA (DPDA) (1)
• For each q in Q, a in ∑ ∪ {ε}, and X in Λ,
– δ(q, a, X) is of size at most 1
• When δ(q, a, X) is not empty for some a in ∑,
– δ(q, ε, X) is empty
Deterministic PDA (DPDA) (2)
• Acceptance by empty stack and final state are not equivalent
• They become equivalent when an additional condition is satisfied
Prefix language and DPDA
• L is a prefix language
– if for every pair of distinct strings x and y in L, neither x nor y is a prefix of the other
• Theorem
– A language L is accepted by some DPDA by empty stack iff L is a prefix language that is accepted by some DPDA by final state
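For a finite set of strings the prefix condition is easy to test. The Python sketch below uses the illustrative name `is_prefix_language` and relies on lexicographic sorting: every string that extends x sorts immediately after x, so comparing neighbours is enough.

```python
def is_prefix_language(strings):
    """True if no string in the finite set is a proper prefix of another."""
    words = sorted(set(strings))
    # If x is a prefix of any other word, it is a prefix of its successor.
    return not any(nxt.startswith(cur) for cur, nxt in zip(words, words[1:]))
```

The set {00, 0000} fails the test, which illustrates why a language containing both strings cannot be accepted by a DPDA by empty stack: the stack would already be empty after reading 00.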
DPDA language
• Languages accepted by DPDAs by final state are called DPDA languages
• Example of a DPDA language:
– Lwcwr = {wcw^R | w in {0,1}*}
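Because the marker c announces the middle of the string, no guessing is needed and the language can be recognized deterministically. A minimal Python sketch of such a recognizer follows; the function name and the two mode labels are illustrative, and the list `stack` plays the role of the DPDA stack.

```python
def accepts_wcwr(s):
    """Deterministic check that s = w c w^R with w in {0,1}*."""
    stack = []
    mode = "push"  # switches to "pop" once the marker c is read
    for ch in s:
        if mode == "push":
            if ch == "c":
                mode = "pop"
            elif ch in "01":
                stack.append(ch)
            else:
                return False
        else:
            # After c, every symbol must match the current stack top.
            if not stack or stack.pop() != ch:
                return False
    return mode == "pop" and not stack
```

Each input symbol triggers exactly one move, which is the behaviour the DPDA restrictions on δ enforce.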
DPDA and CFLs (1)
• DPDA languages lie strictly between regular languages and context-free languages
• Given a regular language L, we can construct a DPDA that simulates the action of a DFA that accepts L simply by ignoring the stack
• Since the language Lwcwr is not regular, the
inclusion is strict (c is some fixed symbol in ∑)
DPDA and CFLs (2)
• DPDA languages are strictly included in the class of context free languages
– Example: Lwwr = {ww^R | w in {0,1}*} (note that this is not a prefix language)
DPDA languages and ambiguity
• A language accepted by a DPDA (by final state or empty stack) has an unambiguous grammar
• However, not every language that has an unambiguous grammar is accepted by a DPDA
– Example: Lwwr