Module 32

Page 1: Module 32

Module 32

• Chomsky Normal Form (CNF)
  – 4-step process

Page 2: Module 32

Chomsky Normal Form

• A CFG is in Chomsky normal form (CNF) if every production is one of these two types, where A, B, C are variables and a is a terminal:
    A → BC
    A → a
• Key ideas
  – Eliminating λ-productions (e.g. S → λ)
  – Eliminating unit productions (e.g. A → B)
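
The two allowed production shapes can be checked mechanically. The following is a minimal sketch, not part of the original slides; it assumes a grammar is stored as a dict from each variable to a list of right-hand sides (tuples of symbols), and that any symbol that is not a key of the dict is a terminal. The name is_cnf is illustrative.

```python
def is_cnf(productions):
    """Check whether every production has the form A → BC or A → a.

    productions maps each variable to a list of right-hand sides; each
    right-hand side is a tuple of symbols. Any symbol that is not a key
    of the dict is treated as a terminal (an assumption of this sketch).
    """
    variables = set(productions)
    for rhss in productions.values():
        for rhs in rhss:
            two_variables = len(rhs) == 2 and all(s in variables for s in rhs)
            one_terminal = len(rhs) == 1 and rhs[0] not in variables
            if not (two_variables or one_terminal):
                return False
    return True


cnf_grammar = {"S": [("A", "B")], "A": [("a",)], "B": [("b",)]}
# Contains a unit production, a long mixed production, and a λ-production.
not_cnf = {"S": [("A",), ("a", "S", "b")], "A": [()]}
print(is_cnf(cnf_grammar), is_cnf(not_cnf))
```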

Page 3: Module 32

Nullable Variables

• A variable A in a CFG G = (V, Σ, S, P) is defined as nullable if:
  – Base case: P contains the production A → λ
  – Recursive case: P contains the production A → B1B2 … Bn and B1 through Bn are nullable
• No other variables are nullable

Page 4: Module 32

Finding Nullable Variables

• Initialize N_0 to be the set of nullable variables by the base case definition
• i = 0;
• do
  – i = i+1;
  – N_i = N_{i-1} ∪ {A | P contains A → α where α is a string in (N_{i-1})*}
• while N_i ≠ N_{i-1};
• The final N_i is the set of nullable variables
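
The loop above can be sketched in Python as a fixed-point iteration. This is an illustrative sketch, not code from the slides: the dict-of-tuples grammar encoding (with the empty tuple standing for λ) and the name nullable_variables are assumptions. The sample grammar is the one used in the worked example later in this module.

```python
def nullable_variables(productions):
    """Return the set of nullable variables.

    productions maps each variable to a list of right-hand sides, each a
    tuple of symbols; the empty tuple () stands for λ (sketch assumption).
    """
    # N_0: variables with a direct production A → λ (base case).
    nullable = {a for a, rhss in productions.items() if () in rhss}
    while True:
        # N_i = N_{i-1} ∪ {A | some A → α has every symbol of α in N_{i-1}}
        bigger = nullable | {
            a
            for a, rhss in productions.items()
            if any(all(s in nullable for s in rhs) for rhs in rhss)
        }
        if bigger == nullable:  # N_i = N_{i-1}: fixed point reached
            return nullable
        nullable = bigger


grammar = {
    "S": [("A", "A", "C", "D")],
    "A": [("a", "A", "b"), ()],
    "C": [("a", "C"), ("a",)],
    "D": [("a", "D", "a"), ("b", "D", "b"), ()],
}
print(sorted(nullable_variables(grammar)))
```

Terminals never enter the set, so a right-hand side containing a terminal can never make its left-hand side nullable.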

Page 5: Module 32

Eliminating λ-productions

• Given CFG G = (V, Σ, S, P), construct a CFG G1 = (V, Σ, S, P1) as follows.
• Initialize P1 = P
• Find the set N of all nullable variables in G
• For every production A → α in P, add to P1 every production that can be obtained from this one by deleting from α one or more of the occurrences of nullable variables in α
  – Example: A → BBCD where B and C are nullable leads to A → BCD | BCD | BBD | BD | BD | CD | D
• Clean up
  – Delete all λ-productions from P1
  – Delete any duplicate productions
  – Delete any productions of the form A → A
• Thm: L(G1) = L(G) – {λ}
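
As a rough illustration of the construction above, the sketch below enumerates every way of keeping or dropping each nullable occurrence and applies the clean-up rules as it goes. The grammar encoding (dict of tuples, () for λ) and the name eliminate_lambda are assumptions of this sketch; the nullable set is passed in rather than computed, to keep the example short.

```python
from itertools import product


def eliminate_lambda(productions, nullable):
    """Build P1 from P, given the set of nullable variables (sketch)."""
    new = {}
    for a, rhss in productions.items():
        out = []
        for rhs in rhss:
            # Each nullable occurrence may be kept or dropped;
            # every other symbol must be kept.
            choices = [((s,), ()) if s in nullable else ((s,),) for s in rhs]
            for pick in product(*choices):
                cand = tuple(sym for part in pick for sym in part)
                # Clean-up: skip λ-productions, A → A, and duplicates.
                if cand and cand != (a,) and cand not in out:
                    out.append(cand)
        new[a] = out
    return new


# The slide's example: A → BBCD with B and C nullable.
# D and the bodies of B and C are made up here just to complete the grammar.
example = {"A": [("B", "B", "C", "D")], "B": [()], "C": [()], "D": [("d",)]}
result = eliminate_lambda(example, {"B", "C"})
print(["".join(rhs) for rhs in result["A"]])
```

After duplicates are removed, the right-hand sides for A are BBCD, BCD, BBD, BD, CD and D, in some order.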

Page 6: Module 32

A-derivable Variables

• B is A-derivable in a CFG G if and only if A ==>G* B

• Recursive definition: A variable B in a CFG G = (V, Σ, S, P) is defined as A-derivable if:
  – Base case: P contains the production A → B or B = A
  – Recursive case: Variable C is A-derivable and P contains the production C → B
• No other variables are A-derivable
• Easy to make into algorithm
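
One way to turn the recursive definition into an algorithm is a simple graph search over unit productions. The sketch below is illustrative only; the grammar encoding (dict from variable to a list of tuples) and the name derivable_variables are assumptions, and a right-hand side counts as a unit production when it is a single symbol that is itself a variable.

```python
def derivable_variables(productions, a):
    """Return the set of A-derivable variables for the variable a (sketch)."""
    variables = set(productions)
    derivable = {a}   # base case: B = A
    frontier = [a]
    while frontier:
        c = frontier.pop()
        for rhs in productions[c]:
            # C is A-derivable and C → B is a unit production,
            # so B is A-derivable too.
            if len(rhs) == 1 and rhs[0] in variables and rhs[0] not in derivable:
                derivable.add(rhs[0])
                frontier.append(rhs[0])
    return derivable


grammar = {
    "S": [("A", "C"), ("C",)],
    "A": [("a", "A", "b"), ("a", "b")],
    "C": [("a", "C"), ("a",), ("D",)],
    "D": [("b",)],
}
print(sorted(derivable_variables(grammar, "S")))
```

Here S reaches C through S → C and then D through C → D, while A is not S-derivable because it only appears inside a longer right-hand side.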

Page 7: Module 32

Eliminating unit productions

• Given CFG G = (V, Σ, S, P), construct a CFG G1 = (V, Σ, S, P1) as follows.
• Initialize P1 = P
• For each A in V, find the set of A-derivable variables in V
• For each pair (A, B) such that B is A-derivable, and every non-unit production B → α in P, add the production A → α to P1
• Clean up
  – Delete all unit productions from P1
  – Delete any duplicate productions
• Thm: L(G1) = L(G) if G did not have any λ-productions
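
The construction above might be sketched as follows. This is a hedged illustration, not the slides' own code: the grammar encoding (dict from variable to a list of tuples) and the names eliminate_units, is_unit and derivable are assumptions, and the input is assumed to have no λ-productions, as the theorem requires.

```python
def eliminate_units(productions):
    """Build P1 without unit productions (assumes no λ-productions)."""
    variables = set(productions)

    def is_unit(rhs):
        return len(rhs) == 1 and rhs[0] in variables

    def derivable(a):
        # Variables reachable from a through chains of unit productions.
        seen, stack = {a}, [a]
        while stack:
            for rhs in productions[stack.pop()]:
                if is_unit(rhs) and rhs[0] not in seen:
                    seen.add(rhs[0])
                    stack.append(rhs[0])
        return seen

    new = {}
    for a in productions:
        # Keep A's own non-unit productions ...
        out = [rhs for rhs in productions[a] if not is_unit(rhs)]
        # ... and copy the non-unit productions of every A-derivable B.
        for b in sorted(derivable(a) - {a}):
            for rhs in productions[b]:
                if not is_unit(rhs) and rhs not in out:
                    out.append(rhs)
        new[a] = out
    return new


grammar = {
    "S": [("A", "A", "C", "D"), ("A", "C", "D"), ("A", "A", "C"),
          ("C", "D"), ("A", "C"), ("C",)],
    "A": [("a", "A", "b"), ("a", "b")],
    "C": [("a", "C"), ("a",)],
    "D": [("a", "D", "a"), ("b", "D", "b"), ("a", "a"), ("b", "b")],
}
print(["".join(rhs) for rhs in eliminate_units(grammar)["S"]])
```

On this grammar the only unit production is S → C, so S gains C's right-hand sides aC and a, and everything else is unchanged.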

Page 8: Module 32

Making CNF grammar

• First eliminate λ-productions
• Then eliminate unit productions
• For each terminal a in Σ, introduce a variable X_a with production rule X_a → a
• For each production of the form A → α where terminal a appears and |α| > 1, replace a with X_a
• Finally, replace productions of the form A → B_1B_2 … B_n (n > 2) with a series of productions:
  – A → B_1Y_1
  – Y_1 → B_2Y_2
  – ....
  – Y_{n-2} → B_{n-1}B_n
  – Other methods can be used for this last step
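
The last two steps can be sketched in Python as below. This is an illustrative sketch under stated assumptions: the grammar is a dict from variable to a list of tuples, λ- and unit productions have already been removed, and the generated names ("X" plus the terminal, and "Y1", "Y2", ...) are assumed not to clash with existing variables. The name to_cnf is hypothetical.

```python
def to_cnf(productions, terminals):
    """Add X_a variables and shorten long productions (sketch)."""
    new = {a: [] for a in productions}
    # Introduce X_a → a for each terminal a.
    for t in sorted(terminals):
        new["X" + t] = [(t,)]
    counter = 0
    for a, rhss in productions.items():
        for rhs in rhss:
            # Replace terminals by X_a in right-hand sides of length > 1.
            if len(rhs) > 1:
                rhs = tuple("X" + s if s in terminals else s for s in rhs)
            # A → B_1 B_2 … B_n becomes A → B_1 Y, Y → B_2 Y', ...
            head = a
            while len(rhs) > 2:
                counter += 1
                y = "Y" + str(counter)  # fresh variable (assumed unused)
                new[head].append((rhs[0], y))
                new[y] = []
                head, rhs = y, rhs[1:]
            new[head].append(rhs)
    return new


grammar = {
    "S": [("A", "A", "C", "D"), ("A", "C", "D"), ("A", "A", "C"),
          ("C", "D"), ("A", "C"), ("a", "C"), ("a",)],
    "A": [("a", "A", "b"), ("a", "b")],
    "C": [("a", "C"), ("a",)],
    "D": [("a", "D", "a"), ("b", "D", "b"), ("a", "a"), ("b", "b")],
}
cnf = to_cnf(grammar, {"a", "b"})
for head, rhss in cnf.items():
    print(head, "→", " | ".join(" ".join(rhs) for rhs in rhss))
```

This sketch creates a fresh Y variable for every long production; sharing helper variables between productions with a common suffix is one of the other methods that would give a smaller grammar.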

Page 9: Module 32

Example: Eliminate λ-productions

• S → AACD
• A → aAb | λ
• C → aC | a
• D → aDa | bDb | λ
• Nullable variables: A & D
• New grammar
  – S → AACD | ACD | AAC | CD | AC | C
  – A → aAb | ab
  – C → aC | a
  – D → aDa | bDb | aa | bb

Page 10: Module 32

Example: Eliminate unit productions

• S → AACD | ACD | AAC | CD | AC | C
• A → aAb | ab
• C → aC | a
• D → aDa | bDb | aa | bb
• New grammar:
  – S → AACD | ACD | AAC | CD | AC | aC | a
  – A → aAb | ab
  – C → aC | a
  – D → aDa | bDb | aa | bb

Page 11: Module 32

Example: Add Xa and Xb

• S → AACD | ACD | AAC | CD | AC | aC | a
• A → aAb | ab
• C → aC | a
• D → aDa | bDb | aa | bb
• New grammar
  – S → AACD | ACD | AAC | CD | AC | XaC | a
  – A → XaAXb | XaXb
  – C → XaC | a
  – D → XaDXa | XbDXb | XaXa | XbXb
  – Xa → a
  – Xb → b

Page 12: Module 32

Example: Shorten long productions

• S → AACD | ACD | AAC | CD | AC | XaC | a

• A → XaAXb | XaXb

• C → XaC | a

• D → XaDXa | XbDXb | XaXa | XbXb

• Xa → a

• Xb → b

• Example replacement
  – S → AACD becomes S → AT1, T1 → AT2, T2 → CD

Page 13: Module 32

Observations

• Consider a derivation from a CNF grammar G that begins:
    S ==>G* ABCD

• How short can the final derived terminal string be?

• Why?

Page 14: Module 32

Observation 2

• A path in a parse tree has length x if it contains x variables

• Consider a parse tree T for a string x and a CNF grammar G with m variables.
  – Suppose the longest path in T has length k. How long can this string x be?
  – Suppose string x has length 2^m. How short can the longest path in T be?