Solving systems of regular expression equationspolaris.cs.uiuc.edu/~padua/cs426/cs426-5.pdf · •...

1 of 27CS426 Compiler Construction

Fall 2006

Solving systems of regular expression equations


Fall 2006

Consider the following DFSM

For each transition from state Q to state R under input a, we create a production of the form Q→Ra and for each final state, P, a production of the form P → ε. Productions lead to equations in the obvious way.

So, we have the equations

A1=1A1+0A2

A2=0A3+1a1

A3=0A3+1A1+ε

A1 A3A2

1 0 0 0

1

1


Fall 2006

These equations can be solved as follows:

First solve for A3

A3=0*1A1+0*

Then replace A3 in the equation for A2

A2= 00*1A1+00*+1A1

finally replacing A2 in the equation for A1 we have

A1 = 1A1+000*1A1+000*+01A1

= (1+000*1+01)*000*

the obvious answer !


Fall 2006

Parsing

• In our compiler model, the parser accepts tokens and verifies that the string can be generated by the grammar.

• Three types of parsers-- tabular -- top-down-- bottom up

Lexical analyzer parser

rest of front end

Symboltable

source program

token

get nexttoken

parsetree

Backtrack parsing

• Backtrack parsing simulates a nondeterministic machine.

• There are both top-down and bottom up forms of backtrack parsing.

• Both forms may take an exponential amount of time to parse.

• Let us only discuss the top down version.

• The name top-down parsing comes from the idea that we attempt to produce a parse tree for the input string starting from the top (root) and working down to the leaves.

• For top-down backtrack parsing, we begin with a tree containing one node labeled S, the start symbol of the grammar.

• We assume that the alternate productions for every nonterminal have been ordered. that is, if A → α1 | α2 | ... | αn are all the A productions in the grammar, we assign some ordering to the αi’s (the alternates for A).

• Recursively proceed as follows-- If the active node is labeled by a nonterminal, say A,

then choose the first alternate, say X1 ... Xk, for A and create k direct descendants for A labeled X1, X2, ..., Xk. Make X1 the active node. If k = 0, then make the node immediately to the right of A active.

-- If the active node is labeled by a terminal, say a, then compare the current input symbol with a. If they match, then make active the node inmmediately to the right of of a and move the input pointer one symbol to the right. If a does not match the current input symbol, go back to the node where the previous production was applied, adjust the input pointer if necessary and try the next alternate. If no alternate is possible, go back to the next previous node, and so forth.


Fall 2006

• Cosider the grammar S→aSbS | aS | c and the input string aacbc. The sequence of parsing tress would be as follows:

S

a S b S

S

a S b S

a S b S

S

a S b S

S

a S b S

a S b S

no matchbacktrack

S

a S b S

a S b S

a Sno matchbacktrack

S

a S b S

a S b S

c

S

a S b S

a S b S

c

S

a S b S

a S b S

c c

S

a S b S

a S b S

c c

no matchbacktrack


Fall 2006

S

a S b S

a Sc

S

a S b S

a S

c

S

a S b S

a S

c

S

a S b S

a S

c


Fall 2006

The Cocke-Younger-Kasami Algorithm

• Tabular methods include: Cocke-Younger-Kasami algorithm and Earley’s algorithm.

• These are essentially dynamic programming methods and are discussed mainly because of their simplicity.

• However, they are highly inefficent (n3 time and n2 space). They are the parsing equivalent to “bubble sort”.

• Definition: A Context Free Grammar (N, Σ, P, S) is said to be in Chomsky normal form (CNF) if each production in P is one of the forms (1) A → BC with A, B, C in N(2) A → a with a in Σ, or(3) if ε is in L(G) then S → ε is a production and S does not appear on the right hand side of any production.

• Theorem: Let L be a context free language (i.e. a language that can be generated by a context free grammar). Then L = L(G) for some CFG G in Chomsky Normal Form.


Fall 2006


Fall 2006

• Example: The GFG, G, defined by

S → aAB | BAA → BBB | aB → AS | b

can be converted to the equivalent CNF grammar, G’S → a’ <AB> | BAA → B <BB> | aB → AS | b<AB> → AB<BB> → BBa’ → a

where G=(N,{a,b},P,S) and G’=(N’,{a,b},P’,S). Here P and P’ correspond to the two sets of productions above and N’ ={S, A, B, <AB>, <BB>, a’}.

• let w = a1a2...an be an input string which is to be parsed according to G. We assume that each ai is in Σ for 1 ≤ i ≤ n. The essence of the algorithm is the construction of a parse table T, whose elements we denote tij for 1 ≤ i ≤ n and 1 ≤ j ≤ n-i+1. Each tij will have a value which is a

subset of N. A will be in tij iff A →+ aiai+1...ai+j-1, that is, if A derives the the j input symbols beginning at position i. As a special case, the input string w is in L(G) iff S is in t1n.


Fall 2006


Fall 2006

Top Down parsing.

A non-backtracking form of top-down parser is called a predictive parser (section 2.4 of the textbook).

The main idea is that at each point it is possible to know by looking at the next input token which production should be applied. For most programming languages, the fact that each type of statement is initiated by a different keyword (e.g. for, while, if) facilitates the development of top-down predictive parsers.


Fall 2006

• Consider the following grammartype → simple | * id | array [ simple ] of typesymple → integer | char | num : num

• Here the tokens are in bold.

• Starting from type, it is always clear which production to apply. -- Thus, if the next token is integer, char, or num,

the production type → simple should be appied.-- If it is *, the the production type → * id should be

applied .-- Finally, if it is array,

type → array [ simple ] of type should be applied

• Clearly, no need for backtracking ever arises.


Fall 2006

• Predictive parsing relies on the information about what first symbols can be generated by the right hand side of a production.

• Thus,-- FIRST(simple) = {integer, char, num}-- FIRST(* id) = { * }-- FIRST(array [ simple ] of type) = {array}


Fall 2006

Problems with predictive parsing.

Consider the grammar

S → S aS → b

By just looking at the next token it is impossible to determine how many times the first production shoudl be applied.

For example, if the input is baaaaa then the first production must be applied five times before the second one is.

Grammars like this are said to be left recursive.

Left recursion can be elimninated by changing the grammar. In this case to:

S → b S’S’ → a S’ | ε


Fall 2006

A second problem is that it may not be possible to determine by just looking at the next token which production to apply.

Consider the following grammar:S → i E t S | i E t S e S | a E → b

By just looking at the next token it is impossibel to determine whether the first or the second production should be applied.

Again, changing the grammar can solve this problem:

S → i E t S S’ | a

S’ → e S | ε

E → b


Fall 2006


Fall 2006

Example

Consider the grammar:A → id = EE → E + TE → E - TE → TE → -TT → T * FT → T /FT → FF → (E)F → id F → number

After eliminating the left recursion, we obtain


Fall 2006

A → id = EE → T E’E → -T E’ E’→ -T E’E’→ +T E’T → F T’T’ → * F T’T → / F T’F → (E)F → id F → number

From where we can create the recursive descent parser.

procedure A(){match(id);t=idnum;match(‘=’);expression();emit(“fstore”);emitln(t);

}procedure E(){

if ( lookahead == “-” ) {match(“-”);T();emit(“fneg”);

}else T();E’();

}procedure E’(){

if ( lookahead == “-” ) {match(“-”);T();emit(“fsub”);E’();

}elseif ( lookahead == “+” ) {

match(“+”);T();emit(“fadd”);E’();

}

}

procedure T(){F();T’();

}procedure T’(){

if ( lookahead == “*” ) {match(“*”);F();emitln(“fmul”);T’();

}elseif ( lookahead == “/” ) {

match(“/”);F();emitln(“fdiv”);T’();

}

}


Fall 2006

procedure F(){if ( lookahead == num ) {

emit(“ldc”);emitln(value);match (num);

}elseif lookahead == “(” {

match(“(”);E();match(“)”);

}else {

match(id);emit(“fload”);emitln(idnum);

}}


Fall 2006

Solving systems of regular expression equationspolaris.cs.uiuc.edu/~padua/cs426/cs426-5.pdf · •...

Documents

Transcript of Solving systems of regular expression equationspolaris.cs.uiuc.edu/~padua/cs426/cs426-5.pdf · •...