MELJUN CORTES Automata Theory (Automata9)

CSC 3130: Automata theory and formal languages. Normal forms and parsing. Fall 2008. MELJUN P. CORTES, MBA, MPA, BSCS, ACS






Testing membership and parsing

• Given a grammar

• How can we know if a string x is in its language?

• If so, can we reconstruct a parse tree for x?

S → 0S1 | 1S0S1 | T
T → S | ε


First attempt

• Maybe we can try all possible derivations:

S → 0S1 | 1S0S1 | T
T → S | ε

x = 00111

S ⇒ 0S1 ⇒ 00S11, 01S0S11, 0T1, ...
S ⇒ 1S0S1 ⇒ 10S10S1, ...
S ⇒ T ⇒ S, ε

when do we stop?
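The "try all derivations" idea can be sketched as a breadth-first search over leftmost derivations. This is a minimal Python sketch, not code from the course: the dict encoding of the grammar (keys are variables, every other character is a terminal, "" stands for ε), the name try_all_derivations and the cutoff max_forms are all assumptions. The cutoff is arbitrary and exists only because, as the slide asks, there is no natural point at which to stop.

```python
from collections import deque

# Grammar from the slide: S → 0S1 | 1S0S1 | T,  T → S | ε
# Assumed encoding: keys are variables, every other character is a
# terminal, and "" stands for ε.
GRAMMAR = {
    "S": ["0S1", "1S0S1", "T"],
    "T": ["S", ""],
}


def try_all_derivations(grammar, x, start="S", max_forms=10_000):
    """Breadth-first search over leftmost derivations from `start`.

    Returns True if x is derived, False if no sentential forms are left,
    and None once more than `max_forms` forms have been generated (an
    arbitrary cutoff standing in for "when do we stop?").
    """
    queue = deque([start])
    seen = {start}
    while queue:
        if len(seen) > max_forms:
            return None
        form = queue.popleft()
        if form == x:
            return True
        # position of the leftmost variable, or None if form is all terminals
        i = next((k for k, c in enumerate(form) if c in grammar), None)
        if i is None:
            continue
        for body in grammar[form[i]]:
            new = form[:i] + body + form[i + 1:]
            if new not in seen:
                seen.add(new)
                queue.append(new)
    return False


print(try_all_derivations(GRAMMAR, "01011"))  # True
print(try_all_derivations(GRAMMAR, "00111"))  # None (hit the cutoff)
```

Here 01011 is found quickly. For 00111, which is not in the language, the search never runs out of new sentential forms and stops only at the artificial cutoff.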


Problems

• How do we know when to stop?

S → 0S1 | 1S0S1 | T
T → S | ε

x = 00111

S ⇒ 0S1 ⇒ 00S11, 01S0S11, 0T1, ...
S ⇒ 1S0S1 ⇒ 10S10S1, ...

when do we stop?


Problems

• Idea: Stop derivation when length exceeds |x|

• Not right because of ε-productions: a sentential form can grow longer than x and later shrink back to x

• We might want to eliminate ε-productions too

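S → 0S1 | 1S0S1 | T
T → S | ε

x = 01011

S ⇒ 0S1 ⇒ 01S0S11 ⇒* 01S011 ⇒* 01011

The sentential forms have lengths 1, 3, 7, 6, 5: the form 01S0S11 is longer than |x| = 5, yet the derivation ends at x.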


Problems

• Loops among the variables (S → T → S) might make us go forever

• We might want to eliminate such loops

S → 0S1 | 1S0S1 | T
T → S | ε

x = 00111


Unit productions

• A unit production is a production of the form A1 → A2, where A1 and A2 are both variables

• Example

grammar:
S → 0S1 | 1S0S1 | T
T → S | R | ε
R → 0SR

unit productions: S → T, T → S, T → R


Removal of unit productions

• If there is a cycle of unit productions A1 → A2 → ... → Ak → A1, delete it and replace every occurrence of A2, ..., Ak with A1

• Example

S → 0S1 | 1S0S1 | T
T → S | R | ε
R → 0SR

(unit productions: S → T, T → S, T → R)

becomes

S → 0S1 | 1S0S1 | R | ε
R → 0SR

T is replaced by S in the {S, T} cycle


Removal of unit productions

• For other unit productions, replace every chain A1 → A2 → ... → Ak → α, where α is not a single variable, by productions A1 → α, ..., Ak → α

• Example

S → 0S1 | 1S0S1 | R | ε
R → 0SR

S → R → 0SR is replaced by S → 0SR, R → 0SR:

S → 0S1 | 1S0S1 | 0SR | ε
R → 0SR


Removal of ε-productions

• A variable N is nullable if there is a derivation N ⇒* ε

• How to remove ε-productions (except from S)

Find all nullable variables N1, ..., Nk
For i = 1 to k
    For every production of the form A → αNiβ, add another production A → αβ
    If Ni → ε is a production, remove it
If S is nullable, add the special production S → ε
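A Python sketch of the removal step. It assumes the nullable variables are already known (B, C, D for the example grammar used in this section) and uses a dict encoding (keys are variables, every other character is a terminal, "" stands for ε). Rather than looping over N1, ..., Nk, remove_epsilon (a hypothetical name) tries every subset of nullable occurrences in each body, which adds the same kind of productions A → αβ. As a simplification, it drops every empty body at the end instead of removing Ni → ε round by round.

```python
from itertools import product

# Example grammar of this section. Assumed encoding: keys are variables,
# every other character is a terminal, and "" stands for ε.
GRAMMAR = {
    "S": ["ACD"],
    "A": ["a"],
    "B": [""],
    "C": ["ED", ""],
    "D": ["BC", "b"],
    "E": ["b"],
}
NULLABLE = {"B", "C", "D"}


def remove_epsilon(grammar, nullable, start="S"):
    """Add every variant of each body with some nullable occurrences deleted,
    drop all ε-productions, and add start → ε if the start is nullable."""
    result = {}
    for var, bodies in grammar.items():
        out = []
        for body in bodies:
            # each symbol is kept, or (if nullable) optionally deleted
            choices = [(c, "") if c in nullable else (c,) for c in body]
            for pick in product(*choices):
                new = "".join(pick)
                if new and new not in out:   # never keep an ε-body here
                    out.append(new)
        result[var] = out
    if start in nullable:
        result[start].append("")             # the special production S → ε
    return result


print(remove_epsilon(GRAMMAR, NULLABLE))
# {'S': ['ACD', 'AC', 'AD', 'A'], 'A': ['a'], 'B': [], 'C': ['ED', 'E'],
#  'D': ['BC', 'B', 'C', 'b'], 'E': ['b']}
```

B ends up with no productions at all. The slides do not discuss removing such variables.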


Example

• Find the nullable variables

grammar:
S → ACD
A → a
B → ε
C → ED | ε
D → BC | b
E → b

nullable variables: B, C, D

(This is the step "Find all nullable variables N1, ..., Nk".)


Finding nullable variables

• To find nullable variables, we work backwards
  – First, mark all variables A such that A → ε as nullable
  – Then, as long as there are productions of the form A → A1… Ak where all of A1, …, Ak are marked as nullable, mark A as nullable


Eliminating ε-productions

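S → ACD
A → a
B → ε
C → ED | ε
D → BC | b
E → b

nullable variables: B, C, D

For i = 1 to k
    For every production of the form A → αNiβ, add another production A → αβ
    If Ni → ε is a production, remove it

i = 1 (N1 = B): add D → C; remove B → ε
i = 2 (N2 = C): add S → AD, D → B, D → ε; remove C → ε
i = 3 (N3 = D): add S → AC, S → A, C → E; remove D → ε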


Recap

• After eliminating ε-productions and unit productions, we know that every derivation S ⇒* a1…ak, where a1, …, ak are terminals, doesn't shrink in length and doesn't go into cycles

• Exception: S → ε
  – We will not use this rule at all, except to check if ε ∈ L

• Note: ε-productions must be eliminated before unit productions, because eliminating ε-productions can create new unit productions (such as D → B, D → C and C → E in the example above)


Example: testing membership

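S → 0S1 | 1S0S1 | T
T → S | ε

x = 00111

Eliminate ε-productions and unit productions:

S → ε | 01 | 101 | 0S1 | 10S1 | 1S01 | 1S0S1

Search from S:
S ⇒ 01, 101
S ⇒ 0S1 ⇒ 0011, 01011, 00S11, or forms of length ≥ 6 (and 00S11 leads only to strings of length ≥ 6)
S ⇒ 10S1 ⇒ 10011, then only strings of length ≥ 6
S ⇒ 1S01 ⇒ 10101, then only strings of length ≥ 6
S ⇒ 1S0S1 ⇒ only strings of length ≥ 6

No branch produces 00111, so x is not in the language.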


Algorithm 1 for testing membership

• We can now use the following algorithm to check if a string x is in the language of G

Eliminate all ε-productions and unit productions
If x = ε and S → ε, accept; else delete S → ε
Let X := S
While some new production P can be applied to X
    Apply P to X
    If X = x, accept
    If |X| > |x|, backtrack
If no more productions can be applied to X, reject
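Algorithm 1 can be sketched as a recursive backtracking search. This Python sketch assumes the grammar is already free of unit productions and of ε-productions other than S → ε, and encodes it as a dict (keys are variables, every other character is a terminal, "" stands for ε). It also always rewrites the leftmost variable, an assumption not stated on the slide. This loses nothing, because every string in the language has a leftmost derivation. The grammar is the transformed one from the membership example, and algorithm1 is a hypothetical name.

```python
# Grammar from the membership example after eliminating ε- and unit
# productions: S → ε | 01 | 101 | 0S1 | 10S1 | 1S01 | 1S0S1
# Assumed encoding: keys are variables, other characters are terminals,
# and "" stands for ε.
GRAMMAR = {"S": ["", "01", "101", "0S1", "10S1", "1S01", "1S0S1"]}


def algorithm1(grammar, x, start="S"):
    """Backtracking membership test. Assumes no unit productions and no
    ε-productions other than start → ε."""
    if x == "":
        return "" in grammar[start]          # accept iff S → ε
    # delete S → ε: from now on no step shrinks the sentential form
    rules = {v: [b for b in bodies if b] for v, bodies in grammar.items()}

    def search(form):
        if form == x:
            return True                      # accept
        if len(form) > len(x):
            return False                     # backtrack
        # rewrite the leftmost variable with each of its productions in turn
        i = next((k for k, c in enumerate(form) if c in rules), None)
        if i is None:
            return False                     # no production applies
        return any(search(form[:i] + b + form[i + 1:])
                   for b in rules[form[i]])

    return search(start)


print(algorithm1(GRAMMAR, "00111"))  # False
print(algorithm1(GRAMMAR, "01011"))  # True
```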


Practical limitations of Algorithm 1

• The previous algorithm can be very slow if x is long

• There is a faster algorithm, but it requires that we do some more transformations on the grammar

Example: G = CFG of the Java programming language, x = code for a 200-line Java program; the algorithm might take about 10^200 steps!


Chomsky Normal Form

• A grammar is in Chomsky Normal Form if every production (except possibly S → ε) is of the type A → BC or A → a, where A, B, C are variables and a is a terminal

• Conversion to Chomsky Normal Form is easy once ε-productions and unit productions have been eliminated:

A → BcDE

replace terminals with new variables:
A → BCDE, C → c

break up sequences with new variables:
A → BX1, X1 → CX2, X2 → DE, C → c
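The two conversion steps can be sketched in Python. Assumptions: bodies are tuples of symbols, the dict keys are the variables and every other symbol is a terminal, and the input has no ε-productions (except possibly S → ε) and no unit productions. The new-variable names T_c (playing the role of the slide's C) and X1, X2, ... are hypothetical and must not clash with existing variables.

```python
def to_cnf(grammar):
    """Sketch of CNF conversion for a grammar without ε- or unit productions
    (except possibly S → ε). Bodies are tuples of symbols; the keys are the
    variables, every other symbol is a terminal."""
    cnf = {v: [] for v in grammar}
    term_var = {}        # terminal -> new variable that produces it
    fresh = 0

    def var_for(terminal):
        # hypothetical naming scheme: terminal c gets the variable T_c
        if terminal not in term_var:
            term_var[terminal] = f"T_{terminal}"
            cnf[term_var[terminal]] = [(terminal,)]
        return term_var[terminal]

    for head, bodies in grammar.items():
        for body in bodies:
            if len(body) <= 1:            # A → a (or S → ε) is already allowed
                cnf[head].append(body)
                continue
            # step 1: replace terminals with new variables
            syms = [s if s in grammar else var_for(s) for s in body]
            # step 2: break up A → B1 B2 ... Bn with new variables X1, X2, ...
            left = head
            while len(syms) > 2:
                fresh += 1
                x = f"X{fresh}"
                cnf[x] = []
                cnf[left].append((syms[0], x))
                left, syms = x, syms[1:]
            cnf[left].append(tuple(syms))
    return cnf


# The slide's example A → BcDE; the rules for B, D and E are made up
# so that they count as variables.
EXAMPLE = {"A": [("B", "c", "D", "E")], "B": [("b",)], "D": [("d",)], "E": [("e",)]}
for head, bodies in to_cnf(EXAMPLE).items():
    for body in bodies:
        print(head, "→", " ".join(body))
```

On the slide's example this yields A → B X1, X1 → T_c X2, X2 → D E and T_c → c, the same shape as on the slide.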


Exercise

• Convert this CFG into Chomsky Normal Form:

S → ε | ADDA

A → a

C → c

D → bCb


Algorithm 2 for testing membership

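S → AB | BC
A → BA | a
B → CC | b
C → AB | a

x = baaba

Idea: We generate each substring of x bottom up

length 1: b: B | a: A, C | a: A, C | b: B | a: A, C
length 2: ba: S, A | aa: B | ab: S, C | ba: S, A
length 3: baa: – | aab: B | aba: B
length 4: baab: – | aaba: S, A, C
length 5: baaba: S, A, C

Each entry lists the variables that derive that substring (– means none). Since S derives the whole string, baaba is in the language.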


Parse tree reconstruction

S → AB | BC
A → BA | a
B → CC | b
C → AB | a

x = baaba (same table of substrings as for Algorithm 2)

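Tracing back the derivations, we obtain the parse tree: start from S in the top cell and, for each variable A in cell st, follow a rule A → BC and a split j with B in cell sj and C in cell (j+1)t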
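Tracing back works if the table remembers, for each variable in a cell, how it got there. This Python sketch makes its own assumptions: the grammar is hard-coded as dicts of A → BC and A → a rules, cells are 0-indexed, cyk_parse is a hypothetical name, and only the first derivation found for each variable is kept. As a result, it returns one parse tree even when there are several.

```python
# The slide's CNF grammar, split into A → BC rules and A → a rules
# (an encoding chosen for this sketch):
# S → AB | BC,  A → BA | a,  B → CC | b,  C → AB | a
BINARY = {"S": [("A", "B"), ("B", "C")], "A": [("B", "A")],
          "B": [("C", "C")], "C": [("A", "B")]}
TERMINAL = {"A": ["a"], "B": ["b"], "C": ["a"]}


def cyk_parse(x, start="S"):
    """Fill the table bottom up, remembering for each variable in each cell
    one way it was derived, then trace back from `start` in the top cell.
    Returns a nested tuple (variable, children...) or None."""
    k = len(x)
    if k == 0:
        return None
    back = {}  # back[(s, t)][A] = back-pointer for A ⇒* x[s..t] (0-indexed)
    for i, ch in enumerate(x):
        back[(i, i)] = {a: ch for a, ts in TERMINAL.items() if ch in ts}
    for b in range(2, k + 1):                 # substring length
        for s in range(k - b + 1):
            t = s + b - 1
            cell = back[(s, t)] = {}
            for j in range(s, t):             # split into x[s..j], x[j+1..t]
                left, right = back[(s, j)], back[(j + 1, t)]
                for a, rules in BINARY.items():
                    for bv, cv in rules:
                        if a not in cell and bv in left and cv in right:
                            cell[a] = (j, bv, cv)

    if start not in back[(0, k - 1)]:
        return None

    def build(a, s, t):
        if s == t:
            return (a, back[(s, t)][a])       # leaf: A → x_s
        j, bv, cv = back[(s, t)][a]
        return (a, build(bv, s, j), build(cv, j + 1, t))

    return build(start, 0, k - 1)


print(cyk_parse("baaba"))
# ('S', ('B', 'b'), ('C', ('A', 'a'), ('B', ('C', ('A', 'a'), ('B', 'b')), ('C', 'a'))))
```

For baaba this gives a tree whose root uses S → BC, with B ⇒ b and C deriving aaba.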


Cocke-Younger-Kasami algorithm

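Input: Grammar G in CNF, string x = x1…xk

Cell ij remembers all possible derivations of substring xi…xj, that is, every variable A with A ⇒* xi…xj

For i = 1 to k
    If there is a production A → xi
        Put A in table cell ii
For b = 2 to k
    For s = 1 to k – b + 1
        Set t = s + b – 1
        For j = s to t – 1
            If there is a production A → BC where B is in cell sj and C is in cell (j+1)t
                Put A in cell st

Table cells: the first row, under x1 x2 … xk, holds cells 11, 22, …, kk; the next row holds cells 12, 23, …; the last row holds cell 1k. Cell st covers a substring of length b, split at position j.

Accept x if and only if S is in cell 1k.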
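The pseudocode translates almost line by line into Python. This sketch (cyk is a hypothetical name) keeps the 1-indexed cells of the pseudocode by leaving row and column 0 unused. It encodes the grammar from the Algorithm 2 example as lists of triples and pairs. Both choices are assumptions of the sketch.

```python
# The CNF grammar from the Algorithm 2 example, encoded for this sketch as
# (A, B, C) triples for A → BC and (A, a) pairs for A → a.
BINARY = [("S", "A", "B"), ("S", "B", "C"), ("A", "B", "A"),
          ("B", "C", "C"), ("C", "A", "B")]
TERMINAL = [("A", "a"), ("B", "b"), ("C", "a")]


def cyk(binary, terminal, x, start="S"):
    """Return True iff start ⇒* x, using 1-indexed cells as in the pseudocode:
    cell[s][t] holds every variable that derives x_s ... x_t."""
    k = len(x)
    if k == 0:
        return False       # ε is decided separately, via S → ε
    cell = [[set() for _ in range(k + 1)] for _ in range(k + 1)]
    for i in range(1, k + 1):
        for a, c in terminal:
            if c == x[i - 1]:              # x_i is x[i - 1] in Python
                cell[i][i].add(a)
    for b in range(2, k + 1):              # b = length of the substring
        for s in range(1, k - b + 2):      # s = 1 .. k - b + 1
            t = s + b - 1
            for j in range(s, t):          # j = s .. t - 1
                for a, bv, cv in binary:
                    if bv in cell[s][j] and cv in cell[j + 1][t]:
                        cell[s][t].add(a)
    return start in cell[1][k]


print(cyk(BINARY, TERMINAL, "baaba"))  # True
```

Each of the roughly k²/2 cells tries up to k split points and every rule. The running time therefore grows like k³ times the number of rules, which is polynomial in |x|, unlike Algorithm 1.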