Solving systems of regular expression equationspolaris.cs.uiuc.edu/~padua/cs426/cs426-5.pdf · •...

27
1 of 27 CS426 Compiler Construction Fall 2006 Solving systems of regular expression equations

Transcript of Solving systems of regular expression equationspolaris.cs.uiuc.edu/~padua/cs426/cs426-5.pdf · •...

Page 1: Solving systems of regular expression equationspolaris.cs.uiuc.edu/~padua/cs426/cs426-5.pdf · • Definition: A Context Free Grammar (N, Σ, P, S) is said to be in Chomsky normal

1 of 27CS426 Compiler Construction

Fall 2006

Solving systems of regular expression equations

Page 2: Solving systems of regular expression equationspolaris.cs.uiuc.edu/~padua/cs426/cs426-5.pdf · • Definition: A Context Free Grammar (N, Σ, P, S) is said to be in Chomsky normal

2 of 27CS426 Compiler Construction

Fall 2006

Consider the following DFSM

For each transition from state Q to state R under input a, we create a production of the form Q→Ra and for each final state, P, a production of the form P → ε. Productions lead to equations in the obvious way.

So, we have the equations

A1=1A1+0A2

A2=0A3+1a1

A3=0A3+1A1+ε

A1 A3A2

1 0 0 0

1

1

Page 3: Solving systems of regular expression equationspolaris.cs.uiuc.edu/~padua/cs426/cs426-5.pdf · • Definition: A Context Free Grammar (N, Σ, P, S) is said to be in Chomsky normal

3 of 27CS426 Compiler Construction

Fall 2006

These equations can be solved as follows:

First solve for A3

A3=0*1A1+0*

Then replace A3 in the equation for A2

A2= 00*1A1+00*+1A1

finally replacing A2 in the equation for A1 we have

A1 = 1A1+000*1A1+000*+01A1

= (1+000*1+01)*000*

the obvious answer !

Page 4: Solving systems of regular expression equationspolaris.cs.uiuc.edu/~padua/cs426/cs426-5.pdf · • Definition: A Context Free Grammar (N, Σ, P, S) is said to be in Chomsky normal

4 of 27CS426 Compiler Construction

Fall 2006

Parsing

• In our compiler model, the parser accepts tokens and verifies that the string can be generated by the grammar.

• Three types of parsers-- tabular -- top-down-- bottom up

Lexical analyzer parser

rest of front end

Symboltable

source program

token

get nexttoken

parsetree

Page 5: Solving systems of regular expression equationspolaris.cs.uiuc.edu/~padua/cs426/cs426-5.pdf · • Definition: A Context Free Grammar (N, Σ, P, S) is said to be in Chomsky normal

Backtrack parsing

• Backtrack parsing simulates a nondeterministic machine.

• There are both top-down and bottom up forms of backtrack parsing.

• Both forms may take an exponential amount of time to parse.

• Let us only discuss the top down version.

• The name top-down parsing comes from the idea that we attempt to produce a parse tree for the input string starting from the top (root) and working down to the leaves.

Page 6: Solving systems of regular expression equationspolaris.cs.uiuc.edu/~padua/cs426/cs426-5.pdf · • Definition: A Context Free Grammar (N, Σ, P, S) is said to be in Chomsky normal

• For top-down backtrack parsing, we begin with a tree containing one node labeled S, the start symbol of the grammar.

• We assume that the alternate productions for every nonterminal have been ordered. that is, if A → α1 | α2 | ... | αn are all the A productions in the grammar, we assign some ordering to the αi’s (the alternates for A).

Page 7: Solving systems of regular expression equationspolaris.cs.uiuc.edu/~padua/cs426/cs426-5.pdf · • Definition: A Context Free Grammar (N, Σ, P, S) is said to be in Chomsky normal

• Recursively proceed as follows-- If the active node is labeled by a nonterminal, say A,

then choose the first alternate, say X1 ... Xk, for A and create k direct descendants for A labeled X1, X2, ..., Xk. Make X1 the active node. If k = 0, then make the node immediately to the right of A active.

-- If the active node is labeled by a terminal, say a, then compare the current input symbol with a. If they match, then make active the node inmmediately to the right of of a and move the input pointer one symbol to the right. If a does not match the current input symbol, go back to the node where the previous production was applied, adjust the input pointer if necessary and try the next alternate. If no alternate is possible, go back to the next previous node, and so forth.

7 of 27CS426 Compiler Construction

Fall 2006

Page 8: Solving systems of regular expression equationspolaris.cs.uiuc.edu/~padua/cs426/cs426-5.pdf · • Definition: A Context Free Grammar (N, Σ, P, S) is said to be in Chomsky normal

• Cosider the grammar S→aSbS | aS | c and the input string aacbc. The sequence of parsing tress would be as follows:

S

a S b S

S

a S b S

a S b S

S

a S b S

S

a S b S

a S b S

no matchbacktrack

S

a S b S

a S b S

a Sno matchbacktrack

S

a S b S

a S b S

c

S

a S b S

a S b S

c

S

a S b S

a S b S

c c

S

a S b S

a S b S

c c

no matchbacktrack

Page 9: Solving systems of regular expression equationspolaris.cs.uiuc.edu/~padua/cs426/cs426-5.pdf · • Definition: A Context Free Grammar (N, Σ, P, S) is said to be in Chomsky normal

9 of 27CS426 Compiler Construction

Fall 2006

S

a S b S

a Sc

S

a S b S

a S

c

S

a S b S

a S

c

S

a S b S

a S

c

Page 10: Solving systems of regular expression equationspolaris.cs.uiuc.edu/~padua/cs426/cs426-5.pdf · • Definition: A Context Free Grammar (N, Σ, P, S) is said to be in Chomsky normal

10 of 27CS426 Compiler Construction

Fall 2006

The Cocke-Younger-Kasami Algorithm

• Tabular methods include: Cocke-Younger-Kasami algorithm and Earley’s algorithm.

• These are essentially dynamic programming methods and are discussed mainly because of their simplicity.

• However, they are highly inefficent (n3 time and n2 space). They are the parsing equivalent to “bubble sort”.

Page 11: Solving systems of regular expression equationspolaris.cs.uiuc.edu/~padua/cs426/cs426-5.pdf · • Definition: A Context Free Grammar (N, Σ, P, S) is said to be in Chomsky normal

• Definition: A Context Free Grammar (N, Σ, P, S) is said to be in Chomsky normal form (CNF) if each production in P is one of the forms (1) A → BC with A, B, C in N(2) A → a with a in Σ, or(3) if ε is in L(G) then S → ε is a production and S does not appear on the right hand side of any production.

• Theorem: Let L be a context free language (i.e. a language that can be generated by a context free grammar). Then L = L(G) for some CFG G in Chomsky Normal Form.

Page 12: Solving systems of regular expression equationspolaris.cs.uiuc.edu/~padua/cs426/cs426-5.pdf · • Definition: A Context Free Grammar (N, Σ, P, S) is said to be in Chomsky normal

12 of 27CS426 Compiler Construction

Fall 2006

Page 13: Solving systems of regular expression equationspolaris.cs.uiuc.edu/~padua/cs426/cs426-5.pdf · • Definition: A Context Free Grammar (N, Σ, P, S) is said to be in Chomsky normal

13 of 27CS426 Compiler Construction

Fall 2006

• Example: The GFG, G, defined by

S → aAB | BAA → BBB | aB → AS | b

can be converted to the equivalent CNF grammar, G’S → a’ <AB> | BAA → B <BB> | aB → AS | b<AB> → AB<BB> → BBa’ → a

where G=(N,{a,b},P,S) and G’=(N’,{a,b},P’,S). Here P and P’ correspond to the two sets of productions above and N’ ={S, A, B, <AB>, <BB>, a’}.

Page 14: Solving systems of regular expression equationspolaris.cs.uiuc.edu/~padua/cs426/cs426-5.pdf · • Definition: A Context Free Grammar (N, Σ, P, S) is said to be in Chomsky normal

• let w = a1a2...an be an input string which is to be parsed according to G. We assume that each ai is in Σ for 1 ≤ i ≤ n. The essence of the algorithm is the construction of a parse table T, whose elements we denote tij for 1 ≤ i ≤ n and 1 ≤ j ≤ n-i+1. Each tij will have a value which is a

subset of N. A will be in tij iff A →+ aiai+1...ai+j-1, that is, if A derives the the j input symbols beginning at position i. As a special case, the input string w is in L(G) iff S is in t1n.

Page 15: Solving systems of regular expression equationspolaris.cs.uiuc.edu/~padua/cs426/cs426-5.pdf · • Definition: A Context Free Grammar (N, Σ, P, S) is said to be in Chomsky normal

15 of 27CS426 Compiler Construction

Fall 2006

Page 16: Solving systems of regular expression equationspolaris.cs.uiuc.edu/~padua/cs426/cs426-5.pdf · • Definition: A Context Free Grammar (N, Σ, P, S) is said to be in Chomsky normal

16 of 27CS426 Compiler Construction

Fall 2006

Page 17: Solving systems of regular expression equationspolaris.cs.uiuc.edu/~padua/cs426/cs426-5.pdf · • Definition: A Context Free Grammar (N, Σ, P, S) is said to be in Chomsky normal

17 of 27CS426 Compiler Construction

Fall 2006

Top Down parsing.

A non-backtracking form of top-down parser is called a predictive parser (section 2.4 of the textbook).

The main idea is that at each point it is possible to know by looking at the next input token which production should be applied. For most programming languages, the fact that each type of statement is initiated by a different keyword (e.g. for, while, if) facilitates the development of top-down predictive parsers.

Page 18: Solving systems of regular expression equationspolaris.cs.uiuc.edu/~padua/cs426/cs426-5.pdf · • Definition: A Context Free Grammar (N, Σ, P, S) is said to be in Chomsky normal

18 of 27CS426 Compiler Construction

Fall 2006

• Consider the following grammartype → simple | * id | array [ simple ] of typesymple → integer | char | num : num

• Here the tokens are in bold.

• Starting from type, it is always clear which production to apply. -- Thus, if the next token is integer, char, or num,

the production type → simple should be appied.-- If it is *, the the production type → * id should be

applied .-- Finally, if it is array,

type → array [ simple ] of type should be applied

• Clearly, no need for backtracking ever arises.

Page 19: Solving systems of regular expression equationspolaris.cs.uiuc.edu/~padua/cs426/cs426-5.pdf · • Definition: A Context Free Grammar (N, Σ, P, S) is said to be in Chomsky normal

19 of 27CS426 Compiler Construction

Fall 2006

• Predictive parsing relies on the information about what first symbols can be generated by the right hand side of a production.

• Thus,-- FIRST(simple) = {integer, char, num}-- FIRST(* id) = { * }-- FIRST(array [ simple ] of type) = {array}

Page 20: Solving systems of regular expression equationspolaris.cs.uiuc.edu/~padua/cs426/cs426-5.pdf · • Definition: A Context Free Grammar (N, Σ, P, S) is said to be in Chomsky normal

20 of 27CS426 Compiler Construction

Fall 2006

Page 21: Solving systems of regular expression equationspolaris.cs.uiuc.edu/~padua/cs426/cs426-5.pdf · • Definition: A Context Free Grammar (N, Σ, P, S) is said to be in Chomsky normal

Problems with predictive parsing.

Consider the grammar

S → S aS → b

By just looking at the next token it is impossible to determine how many times the first production shoudl be applied.

For example, if the input is baaaaa then the first production must be applied five times before the second one is.

Grammars like this are said to be left recursive.

Left recursion can be elimninated by changing the grammar. In this case to:

S → b S’S’ → a S’ | ε

21 of 27CS426 Compiler Construction

Fall 2006

Page 22: Solving systems of regular expression equationspolaris.cs.uiuc.edu/~padua/cs426/cs426-5.pdf · • Definition: A Context Free Grammar (N, Σ, P, S) is said to be in Chomsky normal

A second problem is that it may not be possible to determine by just looking at the next token which production to apply.

Consider the following grammar:S → i E t S | i E t S e S | a E → b

By just looking at the next token it is impossibel to determine whether the first or the second production should be applied.

Again, changing the grammar can solve this problem:

S → i E t S S’ | a

S’ → e S | ε

E → b

22 of 27CS426 Compiler Construction

Fall 2006

Page 23: Solving systems of regular expression equationspolaris.cs.uiuc.edu/~padua/cs426/cs426-5.pdf · • Definition: A Context Free Grammar (N, Σ, P, S) is said to be in Chomsky normal

23 of 27CS426 Compiler Construction

Fall 2006

Example

Consider the grammar:A → id = EE → E + TE → E - TE → TE → -TT → T * FT → T /FT → FF → (E)F → id F → number

After eliminating the left recursion, we obtain

Page 24: Solving systems of regular expression equationspolaris.cs.uiuc.edu/~padua/cs426/cs426-5.pdf · • Definition: A Context Free Grammar (N, Σ, P, S) is said to be in Chomsky normal

24 of 27CS426 Compiler Construction

Fall 2006

A → id = EE → T E’E → -T E’ E’→ -T E’E’→ +T E’T → F T’T’ → * F T’T → / F T’F → (E)F → id F → number

From where we can create the recursive descent parser.

Page 25: Solving systems of regular expression equationspolaris.cs.uiuc.edu/~padua/cs426/cs426-5.pdf · • Definition: A Context Free Grammar (N, Σ, P, S) is said to be in Chomsky normal

procedure A(){match(id);t=idnum;match(‘=’);expression();emit(“fstore”);emitln(t);

}procedure E(){

if ( lookahead == “-” ) {match(“-”);T();emit(“fneg”);

}else T();E’();

}procedure E’(){

if ( lookahead == “-” ) {match(“-”);T();emit(“fsub”);E’();

}elseif ( lookahead == “+” ) {

match(“+”);T();emit(“fadd”);E’();

}

}

Page 26: Solving systems of regular expression equationspolaris.cs.uiuc.edu/~padua/cs426/cs426-5.pdf · • Definition: A Context Free Grammar (N, Σ, P, S) is said to be in Chomsky normal

procedure T(){F();T’();

}procedure T’(){

if ( lookahead == “*” ) {match(“*”);F();emitln(“fmul”);T’();

}elseif ( lookahead == “/” ) {

match(“/”);F();emitln(“fdiv”);T’();

}

}

26 of 27CS426 Compiler Construction

Fall 2006

Page 27: Solving systems of regular expression equationspolaris.cs.uiuc.edu/~padua/cs426/cs426-5.pdf · • Definition: A Context Free Grammar (N, Σ, P, S) is said to be in Chomsky normal

procedure F(){if ( lookahead == num ) {

emit(“ldc”);emitln(value);match (num);

}elseif lookahead == “(” {

match(“(”);E();match(“)”);

}else {

match(id);emit(“fload”);emitln(idnum);

}}

27 of 27CS426 Compiler Construction

Fall 2006