CYK Algorithm & CFL reachability

CYK Algorithm & CFL reachability By - Lohit Krishnan Chetas Mahajan


Transcript of CYK Algorithm & CFL reachability

Page 1: CYK Algorithm & CFL reachability

CYK Algorithm & CFL reachability

By - Lohit Krishnan

Chetas Mahajan

Page 2: CYK Algorithm & CFL reachability

Outline

• CYK Algorithm
  – Background
  – Problem statement
  – Intuition
  – Terminology
  – Formal description and example

Page 3: CYK Algorithm & CFL reachability

Background

• Named after Cocke, Younger and Kasami.

• Some fascinating qualities:
  – It shows that deciding whether s ∈ L(G) is in P for any CFG G in Chomsky normal form (CNF).
  – It uses dynamic programming (a “table-filling algorithm”) to solve this decision problem, in O(n³ · |G|) time for a string of length n.

Page 4: CYK Algorithm & CFL reachability

Problem Statement

• Given the CFG G:
S -> AB | BC
A -> BA | a
B -> CC | b
C -> AB | a

• Let L be the language generated by G.

• Is the string “baaba” a member of L?

• How many substrings of “baaba” are members of L?

• How many distinct substrings of the given string are members of L?

• How many non-empty substrings of the given string are not members of L?

• How many substrings of the given string are generated only by B (and by no other non-terminal)?

Page 5: CYK Algorithm & CFL reachability

Problem Statement

• Given a context-free grammar G and a string w
  – G = (V, Σ, P, S) where
    • V is a finite set of variables
    • Σ (the alphabet) is a finite set of terminal symbols
    • P is a finite set of rules
    • S is the start symbol (a distinguished element of V)
    • V and Σ are assumed to be disjoint
  – G is used to generate the strings of the language L(G)

• Is w ∈ L(G)? (Membership Problem)
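As a rough illustration of the tuple G = (V, Σ, P, S), the sketch below stores the example grammar from the earlier slide in a small Python class. The class name `Grammar`, its field names and the `is_cnf` helper are assumptions made for this example and do not come from the slides. The CNF check is deliberately simple: it ignores the optional S -> ε rule that some definitions of Chomsky normal form allow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    """Hypothetical container for G = (V, Sigma, P, S)."""
    variables: frozenset   # V
    terminals: frozenset   # Sigma
    rules: tuple           # P, as pairs (head, body); body is a tuple of symbols
    start: str             # S

    def is_cnf(self) -> bool:
        """True if every rule is X -> YZ (two variables) or X -> a (one terminal)."""
        for head, body in self.rules:
            if head not in self.variables:
                return False
            if len(body) == 1 and body[0] in self.terminals:
                continue
            if len(body) == 2 and all(sym in self.variables for sym in body):
                continue
            return False
        return True


# The example grammar: S -> AB | BC, A -> BA | a, B -> CC | b, C -> AB | a
G = Grammar(
    variables=frozenset("SABC"),
    terminals=frozenset("ab"),
    rules=(
        ("S", ("A", "B")), ("S", ("B", "C")),
        ("A", ("B", "A")), ("A", ("a",)),
        ("B", ("C", "C")), ("B", ("b",)),
        ("C", ("A", "B")), ("C", ("a",)),
    ),
    start="S",
)

print(G.variables.isdisjoint(G.terminals), G.is_cnf())
```

Keeping V and Σ as separate sets makes the disjointness assumption from the slide easy to check directly.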

Page 6: CYK Algorithm & CFL reachability

Terminology

• Let n be the length of the string w.
• Partition the given string using n+1 lines.
• Number those lines from 0 to n.
• Now, we define
  – xij as the substring of w which lies between the lines i and j (here i < j).
  – Tij as the set of non-terminals which generate the string xij.

Page 7: CYK Algorithm & CFL reachability

Terminology

• Grammar:
S -> AB | BC
A -> BA | a
B -> CC | b
C -> AB | a

• String to be checked is “baaba”, with partition lines numbered 0 to 5:
0 b 1 a 2 a 3 b 4 a 5

• x13 = aa

• x35 = ba

• x05 = baaba

• T23 = non-terminals generating x23 (i.e. “a”).

• T23 = { A, C }

• Build a table T of Tij, 0 ≤ i ≤ n-1; 1 ≤ j ≤ n; i < j
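The notation maps directly onto string slicing: with partition lines numbered 0 to n, xij is the slice of w from index i up to (but not including) index j. The short sketch below checks the examples from this slide. The helper name `x` and the dictionary `TERMINAL_RULES` are assumptions introduced for the illustration.

```python
w = "baaba"
n = len(w)


def x(i: int, j: int) -> str:
    """Substring of w between partition lines i and j (requires i < j)."""
    assert 0 <= i < j <= n
    return w[i:j]


# Non-terminals with a rule X -> terminal, read off the example grammar:
# A -> a, B -> b, C -> a
TERMINAL_RULES = {"a": {"A", "C"}, "b": {"B"}}

print(x(1, 3), x(3, 5), x(0, 5))

# T23 is the set of non-terminals generating x23, a single letter here.
T23 = TERMINAL_RULES[x(2, 3)]

# The table has one cell per pair i < j, i.e. n(n+1)/2 cells in total.
cells = [(i, j) for i in range(n) for j in range(i + 1, n + 1)]
print(len(cells))
```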

Page 8: CYK Algorithm & CFL reachability

Intuition of the algorithm

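• The sets Tij are the subproblems of the dynamic program.

• To decide membership, we check whether the start symbol belongs to T0n.

• Formulation of the DP:

  – Base case (substrings of length 1): $T_{i,i+1} = \{ X \mid X \to a \in P,\ a = x_{i,i+1} \}$

  – Combining two sets of non-terminals: $T(T_1 T_2) = \{ X \mid X \to t_1 t_2 \in P,\ t_1 \in T_1,\ t_2 \in T_2 \}$

  – Recurrence for $j - i \ge 2$: $T_{ij} = \bigcup_{k=i+1}^{j-1} T(T_{ik} T_{kj})$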
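The recurrence can be sketched in a few lines of Python. This is a minimal illustration, not a tuned implementation: it assumes a non-empty input string and a grammar already in CNF, split into two hypothetical lists, `unary` for rules X -> a and `binary` for rules X -> YZ. The function name `cyk_table` is likewise an assumption.

```python
def cyk_table(w, unary, binary):
    """Return a dict T where T[(i, j)] is the set of non-terminals deriving w[i:j].

    unary:  list of (X, a) pairs for rules X -> a
    binary: list of (X, Y, Z) triples for rules X -> Y Z
    """
    n = len(w)
    T = {}
    # Base case: substrings of length 1.
    for i in range(n):
        T[(i, i + 1)] = {head for head, term in unary if term == w[i]}
    # Longer substrings, shortest first, so every T[(i, k)] and T[(k, j)] is ready.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            cell = set()
            for k in range(i + 1, j):          # split point between lines i and j
                for head, left, right in binary:
                    if left in T[(i, k)] and right in T[(k, j)]:
                        cell.add(head)
            T[(i, j)] = cell
    return T


# Example grammar: S -> AB | BC, A -> BA | a, B -> CC | b, C -> AB | a
UNARY = [("A", "a"), ("B", "b"), ("C", "a")]
BINARY = [("S", "A", "B"), ("S", "B", "C"), ("A", "B", "A"),
          ("B", "C", "C"), ("C", "A", "B")]

T = cyk_table("baaba", UNARY, BINARY)
print("S" in T[(0, 5)])   # membership test: is the start symbol in T0n?
```

The three nested loops over length, start position and split point give the cubic dependence on the string length; the innermost loop over rules contributes the grammar-size factor.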

Page 9: CYK Algorithm & CFL reachability

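Table layout for the string b a a b a (partition lines 0 1 2 3 4 5). Row r holds the cells for the substrings of length r:

Row 1: T(0,1) T(1,2) T(2,3) T(3,4) T(4,5) for b, a, a, b, a

Row 2: T(0,2) T(1,3) T(2,4) T(3,5) for ba, aa, ab, ba

Row 3: T(0,3) T(1,4) T(2,5) for baa, aab, aba

Row 4: T(0,4) T(1,5) for baab, aaba

Row 5: T(0,5) for baaba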

Page 10: CYK Algorithm & CFL reachability

Grammar:
S -> AB | BC
A -> BA | a
B -> CC | b
C -> AB | a

String: b a a b a (partition lines 0 1 2 3 4 5)

Row 1 (length 1): T(0,1) = {B}, T(1,2) = {A, C}, T(2,3) = {A, C}, T(3,4) = {B}, T(4,5) = {A, C}

Pages 11 to 19: CYK Algorithm & CFL reachability

The remaining cells are filled one per slide, row by row, using the same grammar (“-” marks an empty set):

• Page 11: T(0,2) = {A, S} for “ba” (A -> BA, S -> BC)

• Page 12: T(1,3) = {B} for “aa” (B -> CC)

• Page 13: T(2,4) = {S, C} for “ab” (S -> AB, C -> AB)

• Page 14: T(3,5) = {A, S} for “ba” (A -> BA, S -> BC)

• Page 15: T(0,3) = - for “baa”

• Page 16: T(1,4) = {B} for “aab” (B -> CC with k = 2)

• Page 17: T(2,5) = {B} for “aba” (B -> CC with k = 4)

• Page 18: T(0,4) = - for “baab”

• Page 19: T(1,5) = {S, C, A} for “aaba” (S -> AB, C -> AB, A -> BA)

Page 20: CYK Algorithm & CFL reachability

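Grammar:
S -> AB | BC
A -> BA | a
B -> CC | b
C -> AB | a

Completed table for b a a b a (partition lines 0 1 2 3 4 5; “-” marks an empty set):

Length 1: T(0,1) = {B}, T(1,2) = {A, C}, T(2,3) = {A, C}, T(3,4) = {B}, T(4,5) = {A, C}

Length 2: T(0,2) = {A, S}, T(1,3) = {B}, T(2,4) = {S, C}, T(3,5) = {A, S}

Length 3: T(0,3) = -, T(1,4) = {B}, T(2,5) = {B}

Length 4: T(0,4) = -, T(1,5) = {S, C, A}

Length 5: T(0,5) = {S, C, A}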


Page 28: CYK Algorithm & CFL reachability

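Answers

• Is the string “baaba” a member of L?

Yes: the start symbol S is in T(0,5).

• How many substrings of “baaba” are members of L?

5: the cells containing S are T(0,2), T(2,4), T(3,5), T(1,5) and T(0,5).

• How many distinct substrings of the given string are members of L?

4: ba, ab, aaba and baaba (ba occurs twice).

• How many non-empty substrings of the given string are not members of L?

15 - 5 = 10 (a string of length 5 has 5·6/2 = 15 non-empty substrings, counted by position).

• How many substrings of the given string are generated only by B?

5: the cells equal to {B} are T(0,1), T(3,4), T(1,3), T(1,4) and T(2,5).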
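All five answers can be read mechanically off the filled table. The sketch below recomputes the table with a compact CYK routine and then counts cells; the names `cyk_table`, `UNARY`, `BINARY` and the result variables are assumptions made for this illustration, and substrings are counted by position, as on the slides.

```python
def cyk_table(w, unary, binary):
    """T[(i, j)] = set of non-terminals deriving w[i:j]; grammar assumed in CNF."""
    n = len(w)
    T = {(i, i + 1): {h for h, t in unary if t == w[i]} for i in range(n)}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            T[(i, j)] = {h for k in range(i + 1, j) for h, l, r in binary
                         if l in T[(i, k)] and r in T[(k, j)]}
    return T


# Example grammar: S -> AB | BC, A -> BA | a, B -> CC | b, C -> AB | a
UNARY = [("A", "a"), ("B", "b"), ("C", "a")]
BINARY = [("S", "A", "B"), ("S", "B", "C"), ("A", "B", "A"),
          ("B", "C", "C"), ("C", "A", "B")]

w = "baaba"
T = cyk_table(w, UNARY, BINARY)

# Cells whose substring is in L are exactly those containing the start symbol.
in_L = [(i, j) for (i, j), cell in T.items() if "S" in cell]

is_member = "S" in T[(0, len(w))]               # question 1
count_in_L = len(in_L)                          # question 2
distinct_in_L = {w[i:j] for i, j in in_L}       # question 3
count_not_in_L = len(T) - count_in_L            # question 4: one cell per substring
only_B = [(i, j) for (i, j), cell in T.items() if cell == {"B"}]  # question 5

print(is_member, count_in_L, len(distinct_in_L), count_not_in_L, len(only_B))
```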


Page 30: CYK Algorithm & CFL reachability

CFL Reachability

Page 31: CYK Algorithm & CFL reachability

Outline

• CFL reachability
  – Motivation
  – Problem definition
  – Variants of the CFL reachability problem
  – Relation with other problems
  – Algorithm
  – Example

Page 32: CYK Algorithm & CFL reachability

Motivation

“Program Analysis via Graph Reachability” by Thomas Reps

Page 33: CYK Algorithm & CFL reachability

Motivation

• Program analysis extracts information about a program without actually running it.

• Classical data-flow analysis associates a set of “dataflow facts” with each program point.

• Program analysis → graph reachability problem (GRP): many program-analysis problems can be recast as graph reachability problems.

• GRP is a special case of the CFL reachability problem.

Page 34: CYK Algorithm & CFL reachability

Problem Definition

• Let L be a context-free language over an alphabet Σ, and let G be a graph whose edges are labeled with members of Σ.

• Each path in G defines a word over Σ, namely the word obtained by concatenating, in order, the labels of the edges on the path. A path in G is an L-path if its word is a member of L.

Page 35: CYK Algorithm & CFL reachability

Variants of CFL Reachability Problem

1. The all-pairs L-path problem.
2. The single-source L-path problem.
3. The single-target L-path problem.
4. The single-source/single-target L-path problem.

• Other variants: the multi-source L-path problem, the multi-target L-path problem, and the multi-source/multi-target L-path problem.

Page 36: CYK Algorithm & CFL reachability

Example

• Let L be the language that consists of strings of matched parentheses and square brackets, with zero or more e’s inside.

• The example graph has only one L-path; its word is [(e[])eee[e]]
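To make the example language concrete, here is a small stack-based membership check for words over the labels `(`, `)`, `[`, `]` and `e`. It is only a sketch of the language itself, not of the reachability algorithm. The function name `is_matched` is an assumption, and the sketch also assumes that the empty word counts as a member.

```python
def is_matched(word: str) -> bool:
    """True if the brackets in word are properly matched; e's may appear anywhere."""
    opener_for = {")": "(", "]": "["}
    stack = []
    for ch in word:
        if ch in "([":
            stack.append(ch)                 # remember the open bracket
        elif ch in opener_for:
            # A closing bracket must match the most recent unmatched opener.
            if not stack or stack.pop() != opener_for[ch]:
                return False
        elif ch != "e":
            return False                     # symbol outside the alphabet
    return not stack                         # every opener must be closed


print(is_matched("[(e[])eee[e]]"))
print(is_matched("[(e])"))
```

A path in the graph is an L-path exactly when the concatenation of its edge labels passes this check.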

Page 37: CYK Algorithm & CFL reachability

Relation with other problems

• Ordinary graph reachability problem
  – Label every edge with e, and take L = e*.

• CFL recognition problem
  – “Given a string w and a context-free language L, is w ∈ L?”
  – Create a linear graph s → ... → t that has |w| edges, and label the i-th edge with the i-th letter of w.
  – There is an L-path from s to t iff w ∈ L.
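The reduction from recognition to reachability is mostly a matter of building the right graph. The sketch below constructs the linear graph for a word w as a list of labeled edges, with nodes numbered 0 to |w| so that s = 0 and t = |w|. The edge representation `(source, label, target)` and the helper names are assumptions chosen for this example.

```python
def linear_graph(w: str):
    """Edges (i, w[i], i + 1): the i-th edge carries the i-th letter of w."""
    return [(i, ch, i + 1) for i, ch in enumerate(w)]


def path_word(edges, source: int, target: int) -> str:
    """Word spelled by the unique path source -> target in a linear graph."""
    label_from = {u: label for u, label, _ in edges}
    return "".join(label_from[u] for u in range(source, target))


edges = linear_graph("baaba")
print(edges)
print(path_word(edges, 0, 5))   # the s -> t path spells w itself
print(path_word(edges, 1, 3))   # a sub-path spells the substring between lines 1 and 3
```

Because every path in this graph spells a substring of w, the all-pairs L-path problem on it answers the membership question for every substring at once, which is what the CYK table records.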

Page 38: CYK Algorithm & CFL reachability

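Algorithm

• Normalize the grammar so that the right-hand side of each production has at most two symbols (either terminals or non-terminals).

• Add edges to the graph until no more can be added, following these rules, where A ∈ N and B, C ∈ (N ∪ T):
  – For a production A -> ε, add an edge labelled A from every node to itself.
  – For a production A -> B, an edge i → j labelled B gives a new edge i → j labelled A.
  – For a production A -> B C, an edge i → k labelled B followed by an edge k → j labelled C gives a new edge i → j labelled A.

• The solution is read off the edges labelled with the start symbol of the grammar.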
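One common way to organise the edge-adding step is a worklist: every new edge is queued once and, when taken off the queue, combined with the edges next to it. The sketch below follows that idea under stated assumptions: the grammar is already normalized and has no ε-productions, productions are passed as two hypothetical lists (`unary` for A -> B, `binary` for A -> B C), and edges are `(source, label, target)` triples. It illustrates the technique and is not necessarily the exact procedure from the slides.

```python
from collections import defaultdict, deque


def cfl_reach(edges, unary, binary):
    """Saturate the edge set: return all edges (u, X, v) derivable from the input.

    unary:  list of (A, B) for productions A -> B
    binary: list of (A, B, C) for productions A -> B C
    B and C may be terminals or non-terminals.
    """
    found = set()
    work = deque()
    out_edges = defaultdict(set)   # u -> {(label, v)}
    in_edges = defaultdict(set)    # v -> {(label, u)}

    def add(u, label, v):
        if (u, label, v) not in found:
            found.add((u, label, v))
            out_edges[u].add((label, v))
            in_edges[v].add((label, u))
            work.append((u, label, v))

    for u, label, v in edges:
        add(u, label, v)

    while work:
        u, x, v = work.popleft()
        for a, b in unary:
            if b == x:
                add(u, a, v)
        for a, b, c in binary:
            if b == x:                                  # edge is the left part
                for label, t in list(out_edges[v]):
                    if label == c:
                        add(u, a, t)
            if c == x:                                  # edge is the right part
                for label, s in list(in_edges[u]):
                    if label == b:
                        add(s, a, v)
    return found


# Example grammar, with terminal rules treated as unary productions.
UNARY = [("A", "a"), ("B", "b"), ("C", "a")]
BINARY = [("S", "A", "B"), ("S", "B", "C"), ("A", "B", "A"),
          ("B", "C", "C"), ("C", "A", "B")]

# Linear graph for the word "baaba".
graph = [(i, ch, i + 1) for i, ch in enumerate("baaba")]
s_pairs = sorted((u, v) for u, x, v in cfl_reach(graph, UNARY, BINARY) if x == "S")
print(s_pairs)
```

On the linear graph, the pairs connected by an S-labelled edge are the same (i, j) pairs whose CYK cell contains S, so the two halves of the talk agree on this input.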

Page 39: CYK Algorithm & CFL reachability

Example

• Grammar:
S -> AB | BC
A -> BA | a
B -> CC | b
C -> AB | a

• Graph G: (figure; edge labels b, a, a, b)

• All-pairs L-path problem.

Page 40: CYK Algorithm & CFL reachability

Questions?