Compiler --- Theory and Implementation - Southeast …cse.seu.edu.cn/PersonalPage/seu_zzz/compiler...

Compiler --- Top-Down Parsing

Zhang Zhizheng

School of Computer Science and Engineering, Software College, Southeast University

2013/11/12


Problems in the Top-Down Approach

Left recursion → infinite loop. Remedy: eliminating left recursion.

Backtracking → inefficient. Methods: prediction

1. Lifting common factors

2. Eliminating ambiguity

Solution: rewrite the grammar. E.g.

stmt → if expr then stmt | if expr then stmt else stmt | other

==>

stmt → matched-stmt | unmatched-stmt

matched-stmt → if expr then matched-stmt else matched-stmt | other

unmatched-stmt → if expr then stmt | if expr then matched-stmt else unmatched-stmt

A case without left recursion, common left factors, or ambiguity

E.g. Consider the following grammar, and parse the string id+id*id#

1. E → TE`      2. E` → +TE`

3. E` → ε       4. T → FT`

5. T` → *FT`    6. T` → ε

7. F → id       8. F → (E)

1) FIRST & FOLLOW

FIRST:

• If α is any string of grammar symbols, let FIRST(α) be the set of terminals that begin the strings derived from α.

• If α ⇒* ε, then ε is also in FIRST(α).

• That is: for all α ∈ V*, FIRST(α) = {a | α ⇒* a…, a ∈ VT}

FOLLOW:

• For a non-terminal A, let FOLLOW(A) be the set of terminals a that can appear immediately to the right of A in some sentential form.

• That is: FOLLOW(A) = {a | S ⇒* …Aa…, a ∈ VT}

• If S ⇒* …A, then # ∈ FOLLOW(A).

2) Computing FIRST(α)

(1) To compute FIRST(X) for all grammar symbols X, start from FIRST(X) = {} for every non-terminal and apply the following rules until no FIRST set changes:

• If X is a terminal, then FIRST(X) is {X}.

• If X → ε is a production, then add ε to FIRST(X).

• If X → a… is a production, then add a to FIRST(X).

• If X is a non-terminal and X → Y1Y2…Yk is a production, Yj ∈ (VN ∪ VT), 1 ≤ j ≤ k, then

{ j=1;

while (j ≤ k) {

FIRST(X) = FIRST(X) ∪ (FIRST(Yj) − {ε})

if (ε ∉ FIRST(Yj)) break

j = j+1

}

if (j > k) // every Yj can derive ε

FIRST(X) = FIRST(X) ∪ {ε}

}

(2) To compute FIRST(α) for any string α = X1X2…Xn, Xi ∈ (VN ∪ VT), 1 ≤ i ≤ n

{ i=1; FIRST(α)={}; // initialise

while (i ≤ n) {

FIRST(α) = FIRST(α) ∪ (FIRST(Xi) − {ε})

if (ε ∉ FIRST(Xi)) break

i = i+1

}

if (i > n) // every Xi can derive ε

FIRST(α) = FIRST(α) ∪ {ε}

}
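The two procedures above can be sketched in Python as a fixed-point computation. This is a minimal sketch, not the course's reference implementation: the dictionary layout, the names `first_of_string` and `compute_first`, the use of an empty list for an ε-production, and the spelling `E'` for E` are all assumptions made for illustration.

```python
EPS = "ε"

# Expression grammar from the slides; an empty right-hand side stands for ε.
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["id"], ["(", "E", ")"]],
}


def first_of_string(symbols, first):
    """FIRST of a symbol string, given the FIRST sets of the nonterminals."""
    result = set()
    for sym in symbols:
        sym_first = first.get(sym, {sym})  # a terminal's FIRST is itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result                  # this symbol cannot vanish: stop
    result.add(EPS)                        # every symbol can derive ε
    return result


def compute_first(grammar):
    """Apply the FIRST rules repeatedly until no set grows any further."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, bodies in grammar.items():
            for body in bodies:
                new = first_of_string(body, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


FIRST = compute_first(GRAMMAR)
```

`first_of_string` plays the role of procedure (2) and can be called on any right-hand side once `FIRST` has been computed, for example `first_of_string(["T'", "E'"], FIRST)`.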

3) Computing FOLLOW(A)

(1) Place # in FOLLOW(S), where S is the start symbol and # is the input right end-marker.

(2) If there is a production A → αBβ in G, then add FIRST(β) − {ε} to FOLLOW(B).

(3) If there is a production A → αB, or A → αBβ where FIRST(β) contains ε, then add FOLLOW(A) to FOLLOW(B).

Apply (2) and (3) until no FOLLOW set changes.

Construct FIRST and FOLLOW for each non-terminal of the grammar:

1. E → TE`      2. E` → +TE`

3. E` → ε       4. T → FT`

5. T` → *FT`    6. T` → ε

7. F → id       8. F → (E)

Answer:

FIRST(E) = FIRST(T) = FIRST(F) = {(, id}

FIRST(E`) = {+, ε}

FIRST(T`) = {*, ε}

FOLLOW(E) = FOLLOW(E`) = {), #}

FOLLOW(T) = FOLLOW(T`) = {+, ), #}

FOLLOW(F) = {*, +, ), #}
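The three FOLLOW rules can be checked against this answer with a short Python sketch. It takes the FIRST sets from the answer as given (writing the identifier token as `id` and E` as `E'`); the function names and the data layout are assumptions for illustration only.

```python
EPS, END = "ε", "#"

GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["id"], ["(", "E", ")"]],
}
# FIRST sets copied from the answer above.
FIRST = {
    "E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
    "E'": {"+", EPS}, "T'": {"*", EPS},
}


def first_of_string(symbols):
    """FIRST of a symbol string; the empty string yields {ε}."""
    result = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def compute_follow(grammar, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)                       # rule (1)
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                for i, sym in enumerate(body):
                    if sym not in grammar:
                        continue                 # FOLLOW is for nonterminals only
                    beta_first = first_of_string(body[i + 1:])
                    new = beta_first - {EPS}     # rule (2)
                    if EPS in beta_first:        # rule (3): β is empty or nullable
                        new |= follow[head]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


FOLLOW = compute_follow(GRAMMAR, "E")
```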

4) Construction of Predictive Parsing Tables

Main Idea: Suppose A → α is a production with a in FIRST(α). Then the parser will expand A by α when the current input symbol is a. If α ⇒* ε, we should again expand A by α if the current input symbol is in FOLLOW(A), or if the # on the input has been reached and # is in FOLLOW(A).

– Input. Grammar G.

– Output. Parsing table M.

Method.

1. For each production A → α, do steps 2 and 3.

2. For each terminal a in FIRST(α), add A → α to M[A,a].

3. If ε is in FIRST(α), add A → α to M[A,b] for each terminal b in FOLLOW(A). If ε is in FIRST(α) and # is in FOLLOW(A), add A → α to M[A,#].

4. Make each undefined entry of M be error.
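A hedged Python sketch of this method follows, applied to the expression grammar. The FIRST and FOLLOW sets are hard-coded from the earlier answer rather than computed, and the representation is an assumption: a table cell is a list of right-hand sides, an empty list stands for ε, and a missing key stands for an error entry.

```python
EPS, END = "ε", "#"

PRODUCTIONS = [
    ("E", ["T", "E'"]), ("E'", ["+", "T", "E'"]), ("E'", []),
    ("T", ["F", "T'"]), ("T'", ["*", "F", "T'"]), ("T'", []),
    ("F", ["id"]), ("F", ["(", "E", ")"]),
]
FIRST = {
    "E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
    "E'": {"+", EPS}, "T'": {"*", EPS},
}
FOLLOW = {
    "E": {")", END}, "E'": {")", END},
    "T": {"+", ")", END}, "T'": {"+", ")", END},
    "F": {"*", "+", ")", END},
}


def first_of_string(symbols):
    """FIRST of a right-hand side; the empty right-hand side yields {ε}."""
    result = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def build_table(productions):
    table = {}
    for head, body in productions:               # step 1
        body_first = first_of_string(body)
        targets = body_first - {EPS}             # step 2
        if EPS in body_first:                    # step 3 (FOLLOW may contain #)
            targets |= FOLLOW[head]
        for terminal in targets:
            table.setdefault((head, terminal), []).append(body)
    return table                                 # step 4: missing key = error


M = build_table(PRODUCTIONS)
```

Keeping a list in every cell, instead of a single production, makes multiply-defined entries visible rather than silently overwriting them.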

Parsing table M

      id           +            *            (            )            #
E     E → TE`                                E → TE`
E`                 E` → +TE`                              E` → ε       E` → ε
T     T → FT`                                T → FT`
T`                 T` → ε       T` → *FT`                 T` → ε       T` → ε
F     F → id                                 F → (E)

Predictive parsing program, driven by parsing table M

Input: id+id*id#

Stack: E#

Please write down the steps of the analysis!

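LL(1) Algorithm

X: the symbol on top of the stack;

a: the current input symbol

If X = a = #, the parser halts and announces successful completion of parsing;

If X = a ≠ #, the parser pops X off the stack and advances the input pointer to the next input symbol;

If X is a non-terminal, the program consults entry M[X,a] of the parsing table M. This entry will be either an X-production of the grammar or an error entry. For a production X → Y1Y2…Yk, the parser replaces X on top of the stack by Yk…Y2Y1 (with Y1 on top); for an error entry, it reports a syntax error.

If X is a terminal and X ≠ a, the parser reports a syntax error.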
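The driver loop can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the table for the expression grammar is hard-coded, an empty list stands for an ε-production, the name `ll1_parse` is hypothetical, and errors are reported by raising Python's built-in `SyntaxError`.

```python
END = "#"

# Parsing table for the expression grammar; [] is an ε-production.
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", END): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "*"): ["*", "F", "T'"],
    ("T'", "+"): [], ("T'", ")"): [], ("T'", END): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {head for head, _ in TABLE}


def ll1_parse(tokens, start="E"):
    """Return the productions used, in leftmost-derivation order."""
    tokens = list(tokens) + [END]
    stack = [END, start]              # the top of the stack is the list's end
    pos, used = 0, []
    while True:
        top, look = stack[-1], tokens[pos]
        if top == END and look == END:
            return used                           # X = a = #: accept
        if top not in NONTERMINALS:
            if top != look:
                raise SyntaxError(f"expected {top!r}, found {look!r}")
            stack.pop()                           # X = a != #: match
            pos += 1
        else:
            body = TABLE.get((top, look))
            if body is None:
                raise SyntaxError(f"no entry M[{top}, {look}]")
            stack.pop()
            stack.extend(reversed(body))          # leftmost symbol ends on top
            used.append((top, body))


steps = ll1_parse(["id", "+", "id", "*", "id"])
```

The list `steps` records one (non-terminal, right-hand side) pair per expansion, which corresponds to the sequence of rules a hand-written analysis of id+id*id# would list.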

Usability of LL(1)

E.g. Consider the following grammar and construct a predictive parsing table for it.

S → iEtSS` | a

S` → eS | ε

E → b

      a           b           e                   i               t     #
S     S → a                                       S → iEtSS`
S`                            S` → eS, S` → ε                           S` → ε
E                 E → b

Entry M[S`, e] contains two productions, so it is multiply defined.

Definition

A grammar whose parsing table has no multiply-defined entries is said to be LL(1).

The first “L” stands for scanning the input from left to right.

The second “L” stands for producing a leftmost derivation.

“1” means using one input symbol of look-ahead at each step to make parsing action decisions.
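The definition suggests a mechanical test: build the table and look for cells holding more than one production. The sketch below does this for the grammar S → iEtSS` | a, S` → eS | ε, E → b. The FIRST and FOLLOW sets are worked out by hand from the rules given earlier (they are not listed in the slides), and all names are illustrative assumptions.

```python
EPS, END = "ε", "#"

PRODUCTIONS = [
    ("S", ["i", "E", "t", "S", "S'"]), ("S", ["a"]),
    ("S'", ["e", "S"]), ("S'", []),
    ("E", ["b"]),
]
# Assumption: derived by hand with the FIRST/FOLLOW rules, not from the slides.
FIRST = {"S": {"i", "a"}, "S'": {"e", EPS}, "E": {"b"}}
FOLLOW = {"S": {"e", END}, "S'": {"e", END}, "E": {"t"}}


def first_of_string(symbols):
    """FIRST of a right-hand side; the empty right-hand side yields {ε}."""
    result = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def build_table(productions):
    """Predictive table whose cells are lists of right-hand sides."""
    table = {}
    for head, body in productions:
        body_first = first_of_string(body)
        targets = body_first - {EPS}
        if EPS in body_first:
            targets |= FOLLOW[head]
        for terminal in targets:
            table.setdefault((head, terminal), []).append(body)
    return table


def multiply_defined(table):
    """Cells with more than one production; empty iff the grammar is LL(1)."""
    return {cell: bodies for cell, bodies in table.items() if len(bodies) > 1}


M = build_table(PRODUCTIONS)
CONFLICTS = multiply_defined(M)
```

For this grammar the only conflicting cell is M[S`, e], which holds both S` → eS and S` → ε, so the grammar is not LL(1).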

(1) No ambiguous grammar can be LL(1).

(2) No left-recursive grammar can be LL(1).

(3) A grammar G is LL(1) if and only if, whenever A → α | β are two distinct productions of G:

I. For no terminal a do both α and β derive strings beginning with a.

II. At most one of α and β can derive the empty string.

III. If β ⇒* ε, then α does not derive any string beginning with a terminal in FOLLOW(A).

Forms of left recursion

A grammar is left-recursive if it contains productions of the following kinds.

• P → Pα | β    Immediate recursion

or

• P → Aa, A → Pb    Indirect recursion

Eliminate Left Recursions

The Main Idea of Algorithm

(1) Elimination of immediate left recursion

P → Pα | β

=> P ⇒* βα* (P derives β followed by zero or more α)

=> P → βP’, P’ → αP’ | ε

(2) Elimination of indirect left recursion

Convert it into immediate left recursion first, following a fixed order of the non-terminals, then eliminate the resulting immediate left recursion.

Algorithm:

– (1) Arrange the non-terminals in G in some order as P1, P2, …, Pn, and do step 2 for each of them in that order.

– (2) for (i=1; i<=n; i++)

{ for (k=1; k<=i-1; k++)

{ replace each production of the form Pi → Pk γ

by Pi → δ1γ | δ2γ | … | δrγ,

where Pk → δ1 | δ2 | … | δr are all the current Pk productions

}

// eliminate the immediate left recursion:

change Pi → Piα1 | Piα2 | … | Piαm | β1 | β2 | … | βn into

Pi → β1Pi` | β2Pi` | … | βnPi`

Pi` → α1Pi` | α2Pi` | … | αmPi` | ε

}

– (3) Simplify the grammar.

E.g. Eliminate all left recursion in the following grammar:

(1) S → Qc | c   (2) Q → Rb | b   (3) R → Sa | a

Answer: 1) Arrange the non-terminals in the order R, Q, S.

2) For R: no action.

For Q: Q → Rb | b becomes Q → Sab | ab | b

For S: S → Qc | c becomes S → Sabc | abc | bc | c;

then we get S → (abc | bc | c)S`

S` → abcS` | ε

3) Because R and Q are not reachable from S, delete them.

So the grammar is:

S → (abc | bc | c)S`

S` → abcS` | ε
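The worked example can be reproduced with a compact Python sketch of the elimination algorithm. The function names, the list-of-lists grammar layout, the use of an empty list for ε, and the naming of the new non-terminal by appending `'` are assumptions for illustration; the sketch also assumes the grammar has no ε-productions or cycles to begin with.

```python
def eliminate_immediate(nt, bodies):
    """Rewrite P -> P a | b as P -> b P', P' -> a P' | ε."""
    recursive = [b[1:] for b in bodies if b and b[0] == nt]   # the α parts
    others = [b for b in bodies if not b or b[0] != nt]       # the β parts
    if not recursive:
        return {nt: bodies}
    new = nt + "'"
    return {
        nt: [beta + [new] for beta in others],
        new: [alpha + [new] for alpha in recursive] + [[]],   # [] is ε
    }


def eliminate_left_recursion(grammar, order):
    grammar = {nt: [list(b) for b in bodies] for nt, bodies in grammar.items()}
    result = {}
    for i, pi in enumerate(order):
        for pk in order[:i]:
            expanded = []
            for body in grammar[pi]:
                if body and body[0] == pk:
                    # substitute every current Pk production for the leading Pk
                    expanded += [delta + body[1:] for delta in grammar[pk]]
                else:
                    expanded.append(body)
            grammar[pi] = expanded
        rewritten = eliminate_immediate(pi, grammar[pi])
        grammar.update(rewritten)
        result.update(rewritten)
    return result


def remove_unreachable(grammar, start):
    """Step (3): keep only the non-terminals reachable from the start symbol."""
    seen, todo = set(), [start]
    while todo:
        nt = todo.pop()
        if nt in seen:
            continue
        seen.add(nt)
        todo += [s for body in grammar[nt] for s in body if s in grammar]
    return {nt: bodies for nt, bodies in grammar.items() if nt in seen}


G = {
    "S": [["Q", "c"], ["c"]],
    "Q": [["R", "b"], ["b"]],
    "R": [["S", "a"], ["a"]],
}
RESULT = remove_unreachable(eliminate_left_recursion(G, ["R", "Q", "S"]), "S")
```

With the order R, Q, S the sketch yields S → abcS` | bcS` | cS` and S` → abcS` | ε, which is the answer above with the parenthesised alternatives written out. A different order of non-terminals can give a different but equivalent grammar.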

Lift the Common Factor

If the grammar contains productions like A → αβ1 | αβ2 | … | αβn,

change them into A → αA`

A` → β1 | β2 | … | βn
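A single pass of this transformation can be sketched in Python. The helper names are hypothetical, an empty list stands for ε, and the sketch only factors alternatives that share their first symbol, lifting the longest prefix common to that whole group; the new non-terminal may itself need another pass in general.

```python
def common_prefix(bodies):
    """Longest sequence of symbols shared by the start of every body."""
    prefix = []
    for column in zip(*bodies):
        if len(set(column)) != 1:
            break
        prefix.append(column[0])
    return prefix


def left_factor(nt, bodies):
    """One pass of left factoring for the productions of non-terminal nt."""
    groups = {}
    for body in bodies:
        groups.setdefault(body[0] if body else None, []).append(body)
    result, fresh = {nt: []}, nt
    for first, group in groups.items():
        if first is None or len(group) == 1:
            result[nt] += group                  # nothing to factor here
            continue
        alpha = common_prefix(group)
        fresh += "'"                             # new non-terminal A'
        result[nt].append(alpha + [fresh])       # A  -> α A'
        result[fresh] = [body[len(alpha):] for body in group]  # A' -> β1 | ...
    return result


# The if-then-else productions used earlier in the slides.
STMT = [
    ["if", "expr", "then", "stmt"],
    ["if", "expr", "then", "stmt", "else", "stmt"],
    ["other"],
]
FACTORED = left_factor("stmt", STMT)
```

Here the common factor is `if expr then stmt`, so the result has the same shape as the grammar S → iEtSS` | a, S` → eS | ε: factoring removes the common prefix but not the ambiguity.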

A Survey

A. Left recursion is a fatal flaw

B. To improve top-down parsing by prediction (LL(1)):

① Eliminate ambiguities manually

② Eliminate left recursion

③ Lift maximal common factors

④ Construct a predictive parsing table

Assignments

Exercises: CH4 1~3.

Practice: Implement a parser for your C language based on the idea of LL(1).

Input: a sequence of tokens output by your scanner.

Output: a sequence of labels of the rules used in derivations, or an error report.