Page 1: MELJUN CORTES Automata Theory (Automata9)

CSC 3130: Automata theory and formal languages

Normal forms and parsing

Fall 2008

MELJUN P. CORTES, MBA, MPA, BSCS, ACS


Testing membership and parsing

• Given a grammar

S → 0S1 | 1S0S1 | T
T → S | ε

• How can we know if a string x is in its language?

• If so, can we reconstruct a parse tree for x?


First attempt

• Maybe we can try all possible derivations:

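S → 0S1 | 1S0S1 | T
T → S | ε

x = 00111

[Derivation tree: S branches to 0S1, 1S0S1 and T; 0S1 branches to 00S11, 01S0S11 and 0T1; 1S0S1 branches to 10S10S1, ...; T branches to S and ε]

When do we stop?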


Problems

• How do we know when to stop?

S → 0S1 | 1S0S1 | T
T → S | ε

x = 00111

[Derivation tree: S branches to 0S1 and 1S0S1, which branch in turn to 00S11, 01S0S11, 0T1, 10S10S1, ...]

When do we stop?


Problems

• Idea: Stop a derivation when its length exceeds |x|

• Not right because of ε-productions: a derivation can grow longer than x and shrink again later

• We might want to eliminate ε-productions too

S → 0S1 | 1S0S1 | T
T → S | ε

x = 01011

S ⇒ 0S1 ⇒ 01S0S11 ⇒ 01S011 ⇒ 01011
lengths: 1, 3, 7, 6, 5


Problems

• Loops among the variables (S → T → S) might make us go forever

• We might want to eliminate such loops

S → 0S1 | 1S0S1 | T
T → S | ε

x = 00111


Unit productions

• A unit production is a production of the form

A1 → A2

where A1 and A2 are both variables

• Example

grammar:

S → 0S1 | 1S0S1 | T
T → S | R | ε
R → 0SR

unit productions:

S → T
T → S
T → R


Removal of unit productions

• If there is a cycle of unit productions

A1 → A2 → ... → Ak → A1

delete the cycle and replace each of A2, ..., Ak with A1 everywhere in the grammar

• Example

S → 0S1 | 1S0S1 | T
T → S | R | ε
R → 0SR

becomes

S → 0S1 | 1S0S1
S → R | ε
R → 0SR

T is replaced by S in the {S, T} cycle


Removal of unit productions

• For other unit productions, replace every chain

A1 → A2 → ... → Ak → α

by productions A1 → α, ..., Ak → α

• Example

S → 0S1 | 1S0S1 | R | ε
R → 0SR

S → R → 0SR is replaced by S → 0SR, R → 0SR

S → 0S1 | 1S0S1 | 0SR | ε
R → 0SR
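The removal of unit productions can be sketched in Python. This is a minimal sketch under stated assumptions, not the exact two-step procedure of the slides: instead of merging cycles first and replacing chains afterwards, it uses the common closure formulation, in which every variable inherits the non-unit productions of all variables it can reach through unit productions. The grammar encoding (a dict from each variable to a set of tuples, with `()` standing for ε) and the name `remove_unit_productions` are assumptions made for illustration.

```python
def remove_unit_productions(grammar):
    """Return an equivalent grammar without productions of the form A -> B."""
    variables = set(grammar)

    def is_unit(body):
        return len(body) == 1 and body[0] in variables

    new_grammar = {}
    for head in grammar:
        # Collect every variable reachable from head through unit productions.
        reachable, stack = {head}, [head]
        while stack:
            var = stack.pop()
            for body in grammar[var]:
                if is_unit(body) and body[0] not in reachable:
                    reachable.add(body[0])
                    stack.append(body[0])
        # head inherits every non-unit production of the reachable variables.
        new_grammar[head] = {
            body for var in reachable for body in grammar[var] if not is_unit(body)
        }
    return new_grammar


# Example grammar from the slides: S -> 0S1 | 1S0S1 | T, T -> S | R | ε, R -> 0SR
GRAMMAR = {
    "S": {("0", "S", "1"), ("1", "S", "0", "S", "1"), ("T",)},
    "T": {("S",), ("R",), ()},
    "R": {("0", "S", "R")},
}

result = remove_unit_productions(GRAMMAR)
for head in sorted(result):
    print(head, "->", sorted(result[head]))
```

For the example grammar this gives S → 0S1 | 1S0S1 | 0SR | ε and R → 0SR, as on the slide. The difference is that T is kept, with the same productions as S, rather than being merged into S.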


Removal of ε-productions

• A variable N is nullable if there is a derivation

N ⇒* ε

• How to remove ε-productions (except from S)

Find all nullable variables N1, ..., Nk
For i = 1 to k
    For every production of the form A → αNiβ,
        add another production A → αβ
    If Ni → ε is a production, remove it
If S is nullable, add the special production S → ε


Example

• Find the nullable variables

grammar:

S → ACD
A → a
B → ε
C → ED | ε
D → BC | b
E → b

nullable variables: B, C, D

Find all nullable variables N1, ..., Nk


Finding nullable variables

• To find nullable variables, we work backwards
– First, mark all variables A s.t. A → ε as nullable
– Then, as long as there are productions of the form

A → A1… Ak

where all of A1, …, Ak are marked as nullable, mark A as nullable
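The marking procedure above can be sketched as a fixed-point loop in Python. This is only an illustrative sketch: the grammar encoding (a dict from each variable to a set of tuples, with `()` standing for ε) and the function name `nullable_variables` are assumptions, not part of the slides.

```python
def nullable_variables(grammar):
    """Return the set of variables that can derive the empty string."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head in nullable:
                continue
            # A body qualifies when all of its symbols are already marked.
            # The empty body () qualifies trivially, which covers A -> ε.
            if any(all(sym in nullable for sym in body) for body in bodies):
                nullable.add(head)
                changed = True
    return nullable


# Grammar from the example slide; () stands for ε.
GRAMMAR = {
    "S": {("A", "C", "D")},
    "A": {("a",)},
    "B": {()},
    "C": {("E", "D"), ()},
    "D": {("B", "C"), ("b",)},
    "E": {("b",)},
}

print(sorted(nullable_variables(GRAMMAR)))  # ['B', 'C', 'D']
```

Terminals never enter the `nullable` set, so any body containing a terminal fails the test, as it should.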


Eliminating ε-productions

S → ACD
A → a
B → ε
C → ED | ε
D → BC | b
E → b

nullable variables: B, C, D

For i = 1 to k
    For every production of the form A → αNiβ,
        add another production A → αβ
    If Ni → ε is a production, remove it

added productions:

D → C
S → AD
D → B
D → ε
S → AC
S → A
C → E
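The elimination step can be sketched in Python as follows. This is a hedged sketch that swaps in a slightly different but commonly used formulation: rather than looping over the nullable variables one at a time, it generates, for each production, every variant obtained by dropping some of its nullable symbols, and then discards empty bodies. The grammar encoding (dict of sets of tuples, `()` for ε), the function name `eliminate_epsilon`, and passing the nullable set in as an argument are assumptions made for illustration.

```python
from itertools import product


def eliminate_epsilon(grammar, start, nullable):
    """Return a grammar without ε-productions, except possibly start -> ε."""
    new_grammar = {head: set() for head in grammar}
    for head, bodies in grammar.items():
        for body in bodies:
            # A nullable symbol may be kept or dropped; any other symbol is kept.
            choices = [((sym,), ()) if sym in nullable else ((sym,),) for sym in body]
            for pick in product(*choices):
                new_body = tuple(sym for part in pick for sym in part)
                if new_body:  # never emit an ε-production here
                    new_grammar[head].add(new_body)
    if start in nullable:
        new_grammar[start].add(())  # the special production S -> ε
    return new_grammar


# Grammar and nullable set from the example slide.
GRAMMAR = {
    "S": {("A", "C", "D")},
    "A": {("a",)},
    "B": {()},
    "C": {("E", "D"), ()},
    "D": {("B", "C"), ("b",)},
    "E": {("b",)},
}
NULLABLE = {"B", "C", "D"}

result = eliminate_epsilon(GRAMMAR, "S", NULLABLE)
for head in sorted(result):
    print(head, "->", sorted(result[head]))
```

On this example the result contains the productions the slide adds (S → AD | AC | A, C → E, D → B | C) and none of the ε-productions. B is left with no productions at all.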


Recap

• After eliminating ε-productions and unit productions, we know that every derivation

S ⇒* a1…ak where a1, …, ak are terminals

doesn’t shrink in length and doesn’t go into cycles

• Exception: S → ε
– We will not use this rule at all, except to check if ε ∈ L

• Note: ε-productions must be eliminated before unit productions


Example: testing membership

S → 0S1 | 1S0S1 | T
T → S | ε

x = 00111

eliminate unit, ε-prod:

S → ε | 01 | 101 | 0S1 | 10S1 | 1S01 | 1S0S1

[Derivation tree from S:
S ⇒ 01, 101
S ⇒ 10S1 ⇒ 10011, strings of length ≥ 6
S ⇒ 1S01 ⇒ 10101, strings of length ≥ 6
S ⇒ 1S0S1 ⇒ only strings of length ≥ 6
S ⇒ 0S1 ⇒ 0011, 01011, 00S11 (⇒ strings of length ≥ 6), otherwise only strings of length ≥ 6]


Algorithm 1 for testing membership

• We can now use the following algorithm to check if a string x is in the language of G

Eliminate all ε-productions and unit productions
If x = ε and S → ε, accept; else delete S → ε
Let X := S
While some new production P can be applied to X
    Apply P to X
    If X = x, accept
    If |X| > |x|, backtrack
If no more productions can be applied to X, reject
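Algorithm 1 can be sketched in Python as a backtracking search over sentential forms. This is a minimal sketch under assumptions: the grammar is already free of ε-productions (other than S → ε) and unit productions, it is encoded as a dict from variables to sets of tuples, only the leftmost variable is expanded (leftmost derivations suffice for context-free grammars), and a `seen` set is added to avoid revisiting forms. The name `is_member` is hypothetical.

```python
def is_member(grammar, start, x):
    """Brute-force membership test; assumes no ε- or unit productions
    other than the special production start -> ε."""
    variables = set(grammar)
    target = tuple(x)
    if not target:
        return () in grammar[start]
    seen = set()
    stack = [(start,)]
    while stack:
        form = stack.pop()
        if form == target:
            return True
        # Locate the leftmost variable; a form of terminals only is a dead end.
        pos = next((i for i, sym in enumerate(form) if sym in variables), None)
        if pos is None:
            continue
        for body in grammar[form[pos]]:
            if not body:
                continue  # start -> ε is only used for the empty string
            new_form = form[:pos] + body + form[pos + 1:]
            # Backtrack (do not explore) once the form is longer than x.
            if len(new_form) <= len(target) and new_form not in seen:
                seen.add(new_form)
                stack.append(new_form)
    return False


# Grammar from the example slide after eliminating unit and ε-productions:
# S -> ε | 01 | 101 | 0S1 | 10S1 | 1S01 | 1S0S1
GRAMMAR = {
    "S": {(), tuple("01"), tuple("101"), tuple("0S1"),
          tuple("10S1"), tuple("1S01"), tuple("1S0S1")},
}

print(is_member(GRAMMAR, "S", "00111"))  # False
print(is_member(GRAMMAR, "S", "01011"))  # True
```

The search terminates because no production shrinks a form and there are only finitely many forms of length at most |x|.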


Practical limitations of Algorithm 1

• The previous algorithm can be very slow if x is long

G = CFG of the Java programming language
x = code for a 200-line Java program

algorithm might take about 10^200 steps!

• There is a faster algorithm, but it requires that we do some more transformations on the grammar


Chomsky Normal Form

• A grammar is in Chomsky Normal Form if every production (except possibly S → ε) is of the type

A → BC or A → a

• Conversion to Chomsky Normal Form is easy:

A → BcDE

replace terminals with new variables:

A → BCDE
C → c

break up sequences with new variables:

A → BX1
X1 → CX2
X2 → DE
C → c
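The two conversion steps can be sketched in Python. This is an illustrative sketch that assumes ε-productions and unit productions have already been eliminated. The names of the new variables (`T_c` for terminal c, and `X1`, `X2`, ... for the helpers) are hypothetical and assumed not to clash with existing variables, and the productions given to B, D and E exist only to make the example self-contained.

```python
def to_cnf(grammar):
    """Convert a grammar without ε- and unit productions to Chomsky Normal Form."""
    variables = set(grammar)
    new_grammar = {head: set() for head in grammar}
    terminal_var = {}
    counter = 0

    def var_for(terminal):
        # Introduce one new variable per terminal, with the production T_c -> c.
        if terminal not in terminal_var:
            terminal_var[terminal] = "T_" + terminal
            new_grammar["T_" + terminal] = {(terminal,)}
        return terminal_var[terminal]

    for head, bodies in grammar.items():
        for body in bodies:
            if len(body) <= 1:
                new_grammar[head].add(body)  # A -> a (or S -> ε) is already fine
                continue
            # Step 1: replace terminals inside longer bodies with new variables.
            symbols = [s if s in variables else var_for(s) for s in body]
            # Step 2: break up sequences longer than two with helper variables.
            current = head
            while len(symbols) > 2:
                counter += 1
                helper = f"X{counter}"
                new_grammar[helper] = set()
                new_grammar[current].add((symbols[0], helper))
                current = helper
                symbols = symbols[1:]
            new_grammar[current].add(tuple(symbols))
    return new_grammar


# The production A -> BcDE from the slide, plus placeholder rules for B, D, E.
GRAMMAR = {
    "A": {("B", "c", "D", "E")},
    "B": {("b",)},
    "D": {("d",)},
    "E": {("e",)},
}

cnf = to_cnf(GRAMMAR)
for head in sorted(cnf):
    print(head, "->", sorted(cnf[head]))
```

For A → BcDE this produces A → B X1, X1 → T_c X2, X2 → D E and T_c → c, which mirrors the slide with T_c playing the role of the new variable C.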


Exercise

• Convert this CFG into Chomsky Normal Form:

S → ε | ADDA

A → a

C → c

D → bCb


Algorithm 2 for testing membership

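S → AB | BC
A → BA | a
B → CC | b
C → AB | a

x = baaba

Idea: We generate each substring of x bottom up

Table of the variables that derive each substring of x (row = substring length, column = start position, – = none):

length 5:  SAC
length 4:  –    SAC
length 3:  –    B    B
length 2:  SA   B    SC   SA
length 1:  B    AC   AC   B    AC
x:         b    a    a    b    a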


Parse tree reconstruction

S → AB | BC
A → BA | a
B → CC | b
C → AB | a

x = baaba

length 5:  SAC
length 4:  –    SAC
length 3:  –    B    B
length 2:  SA   B    SC   SA
length 1:  B    AC   AC   B    AC
x:         b    a    a    b    a

Tracing back the derivations, we obtain the parse tree
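Tracing back can be sketched in Python by storing, for every variable placed in a table cell, one way it was obtained. This is a hedged sketch: the grammar encoding (dict of sets of tuples), the back-pointer layout, the nested-tuple tree format and the names `cyk_parse` and `leaves` are all assumptions made for illustration. Since the grammar may be ambiguous, the function returns one parse tree, not necessarily the one drawn on the slide.

```python
def cyk_parse(grammar, start, x):
    """Return one parse tree for x as nested tuples, or None if x is not derived."""
    k = len(x)
    # back[(s, t)][A] records how A derives x[s..t] (1-based, inclusive):
    # a terminal for single symbols, otherwise (B, C, j) for A -> BC split at j.
    back = {}
    for i in range(1, k + 1):
        back[(i, i)] = {
            head: x[i - 1] for head, bodies in grammar.items() if (x[i - 1],) in bodies
        }
    for b in range(2, k + 1):
        for s in range(1, k - b + 2):
            t = s + b - 1
            cell = {}
            for j in range(s, t):
                for head, bodies in grammar.items():
                    for body in bodies:
                        if (len(body) == 2 and body[0] in back[(s, j)]
                                and body[1] in back[(j + 1, t)]):
                            # Keep the first derivation found for this variable.
                            cell.setdefault(head, (body[0], body[1], j))
            back[(s, t)] = cell

    def build(var, s, t):
        entry = back[(s, t)][var]
        if s == t:
            return (var, entry)
        left, right, j = entry
        return (var, build(left, s, j), build(right, j + 1, t))

    if k == 0 or start not in back[(1, k)]:
        return None
    return build(start, 1, k)


def leaves(tree):
    """Concatenate the terminals at the leaves of a tree built by cyk_parse."""
    if len(tree) == 2:
        return tree[1]
    return leaves(tree[1]) + leaves(tree[2])


GRAMMAR = {
    "S": {("A", "B"), ("B", "C")},
    "A": {("B", "A"), ("a",)},
    "B": {("C", "C"), ("b",)},
    "C": {("A", "B"), ("a",)},
}

tree = cyk_parse(GRAMMAR, "S", "baaba")
print(tree)
print(leaves(tree))  # baaba
```

Reading the leaves of the returned tree from left to right gives back x, which is a quick sanity check on the reconstruction.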


Cocke-Younger-Kasami algorithm

Input: Grammar G in CNF, string x = x1…xk

Cell ij remembers all variables that can derive the substring xi…xj

For i = 1 to k
    If there is a production A → xi
        Put A in table cell ii
For b = 2 to k
    For s = 1 to k – b + 1
        Set t = s + b – 1
        For j = s to t – 1
            If there is a production A → BC where B is in cell sj and C is in cell (j+1)t
                Put A in cell st

(b = length of the substring xs…xt, j = position where it is split)

table cells:

x1  x2  …  xk
11  22  …  kk
12  23  …
…
1k
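The table-filling loop can be sketched in Python. This is a minimal sketch under assumptions: the grammar is in Chomsky Normal Form and encoded as a dict from variables to sets of tuples, cells are keyed by 1-based inclusive pairs (s, t) with t = s + b − 1, and a substring is split into xs…xj and xj+1…xt. The names `cyk_table` and `cyk_accepts` are hypothetical.

```python
def cyk_table(grammar, x):
    """Fill the CYK table: table[(s, t)] is the set of variables deriving x[s..t]."""
    k = len(x)
    table = {}
    for i in range(1, k + 1):
        # Cell ii: variables with a production A -> x_i.
        table[(i, i)] = {
            head for head, bodies in grammar.items() if (x[i - 1],) in bodies
        }
    for b in range(2, k + 1):          # b = length of the substring
        for s in range(1, k - b + 2):  # s = start position
            t = s + b - 1              # t = end position
            cell = set()
            for j in range(s, t):      # split into x[s..j] and x[j+1..t]
                for head, bodies in grammar.items():
                    for body in bodies:
                        if (len(body) == 2 and body[0] in table[(s, j)]
                                and body[1] in table[(j + 1, t)]):
                            cell.add(head)
            table[(s, t)] = cell
    return table


def cyk_accepts(grammar, start, x):
    """Membership test for nonempty x; ε must be checked separately via S -> ε."""
    return start in cyk_table(grammar, x).get((1, len(x)), set())


GRAMMAR = {
    "S": {("A", "B"), ("B", "C")},
    "A": {("B", "A"), ("a",)},
    "B": {("C", "C"), ("b",)},
    "C": {("A", "B"), ("a",)},
}

table = cyk_table(GRAMMAR, "baaba")
print(sorted(table[(1, 5)]))              # ['A', 'C', 'S']
print(cyk_accepts(GRAMMAR, "S", "baaba"))  # True
```

The three nested loops over b, s and j give a running time cubic in the length of x, times the size of the grammar, which is the gain over the brute-force search.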