
CS415 Compilers

LR Parsing & Error Recovery

These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

cs415, spring 14 Lecture 13 2

Review: LR(k) items

The LR(1) table construction algorithm uses LR(1) items to represent valid configurations of an LR(1) parser

An LR(k) item is a pair [P, δ], where
    P is a production A→β with a • at some position in the rhs
    δ is a lookahead string of length ≤ k (words or EOF)

The • in an item indicates the position of the top of the stack

LR(1):
[A→•βγ,a] means that the input seen so far is consistent with the use of A→βγ immediately after the symbol on top of the stack

[A→β•γ,a] means that the input seen so far is consistent with the use of A→βγ at this point in the parse, and that the parser has already recognized β.

[A→βγ•,a] means that the parser has seen βγ, and that a lookahead symbol of a is consistent with reducing to A.


Review - Computing Closures

Closure(s) adds all the items implied by items already in s
•  Any item [A→β•Bδ,a] implies [B→•τ,x] for each production B→τ with B on the lhs, and each x ∈ FIRST(δa) (for LR(1) items)

The algorithm

Closure( s )
    while ( s is still changing )
        ∀ items [A → β•Bδ,a] ∈ s
            ∀ productions B → τ ∈ P
                ∀ b ∈ FIRST(δa)        // δ might be ε
                    if [B → •τ,b] ∉ s
                        then add [B → •τ,b] to s

→  Classic fixed-point method
→  Halts because s ⊂ ITEMS

Closure “fills out” a state


Review - Computing Gotos

Goto(s,x) computes the state that the parser would reach if it recognized an x while in state s
•  Goto( { [A→β•Xδ,a] }, X ) produces [A→βX•δ,a] (easy part)
•  Should also include closure( [A→βX•δ,a] ) (fill out the state)

The algorithm

Goto( s, X )
    new ← Ø
    ∀ items [A→β•Xδ,a] ∈ s
        new ← new ∪ [A→βX•δ,a]
    return closure(new)

→  Not a fixed-point method!
→  Straightforward computation
→  Uses closure ( )

Goto() moves forward


Review - Building the Canonical Collection

Start from s0 = closure( [S’→•S,EOF] )
Repeatedly construct new states, until all are found

The algorithm

cc0 ← closure ( [S’→ •S, EOF] )
CC ← { cc0 }
while ( new sets are still being added to CC )
    for each unmarked set ccj ∈ CC
        mark ccj as processed
        for each x following a • in an item in ccj
            temp ← goto(ccj, x)
            if temp ∉ CC
                then CC ← CC ∪ { temp }
            record transitions from ccj to temp on x

→  Fixed-point computation (worklist version)
→  Loop adds to CC
→  CC ⊆ 2^ITEMS, so CC is finite
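The worklist construction, together with goto, can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the course's code: items are encoded as `(lhs, rhs, dot, lookahead)`, `-` stands for the minus sign, and the helper `first` only works because the example grammar is ε-free and not left-recursive. State numbers depend on the order in which symbols are tried, so they need not match the numbering used in the slides.

```python
GRAMMAR = {
    "Goal":   [("Expr",)],
    "Expr":   [("Term", "-", "Expr"), ("Term",)],
    "Term":   [("Factor", "*", "Term"), ("Factor",)],
    "Factor": [("ident",)],
}


def first(symbol):
    """FIRST of one symbol; assumes an ε-free grammar without left recursion."""
    if symbol not in GRAMMAR:
        return {symbol}
    return set().union(*(first(rhs[0]) for rhs in GRAMMAR[symbol]))


def closure(items):
    s = set(items)
    work = list(items)
    while work:
        lhs, rhs, dot, la = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            delta_a = rhs[dot + 1:] + (la,)
            for tau in GRAMMAR[rhs[dot]]:
                for b in first(delta_a[0]):
                    item = (rhs[dot], tau, 0, b)
                    if item not in s:
                        s.add(item)
                        work.append(item)
    return frozenset(s)


def goto(state, x):
    """Move the • over x in every item that allows it, then fill out the state."""
    moved = {(lhs, rhs, dot + 1, la)
             for (lhs, rhs, dot, la) in state
             if dot < len(rhs) and rhs[dot] == x}
    return closure(moved)


def canonical_collection(start_item):
    states = [closure({start_item})]   # position in this list = state number
    transitions = {}                   # (state number, symbol) -> state number
    worklist = [0]                     # unmarked sets
    while worklist:
        j = worklist.pop(0)            # mark cc_j as processed
        symbols = sorted({rhs[dot] for (_, rhs, dot, _) in states[j] if dot < len(rhs)})
        for x in symbols:
            temp = goto(states[j], x)
            if temp not in states:
                states.append(temp)
                worklist.append(len(states) - 1)
            transitions[(j, x)] = states.index(temp)
    return states, transitions


states, transitions = canonical_collection(("Goal", ("Expr",), 0, "EOF"))
```

For the example grammar this yields nine sets and thirteen recorded transitions, which agrees with states s0 to s8 of the worked example.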


Review – LR(1) Table Construction

High-level overview
1  Build the canonical collection of sets of LR(1) Items, I
   a  Begin in an appropriate state, s0
      ♦  Assume: S’ →S, and S’ is the unique start symbol that does not occur on any RHS of a production (extended CFG - ECFG)
      ♦  [S’ →•S,EOF], along with any equivalent items
      ♦  Derive equivalent items as closure( s0 )
   b  Repeatedly compute, for each sk, and each X, goto(sk,X)
      ♦  If the set is not already in the collection, add it
      ♦  Record all the transitions created by goto( )
   This eventually reaches a fixed point

2  Fill in the table from the collection of sets of LR(1) items

The canonical collection completely encodes the transition diagram for the handle-finding DFA


Review: Example (building the collection)

Initialization Step

s0 ← closure( { [Goal → •Expr , EOF] } )
   = { [Goal → • Expr , EOF],
       [Expr → • Term – Expr , EOF], [Expr → • Term , EOF],
       [Term → • Factor * Term , –], [Term → • Factor , –],
       [Term → • Factor * Term , EOF], [Term → • Factor , EOF],
       [Factor → • ident , *], [Factor → • ident , –], [Factor → • ident , EOF] }

S ← { s0 }

Grammar
1: Goal → Expr
2: Expr → Term – Expr
3: Expr → Term
4: Term → Factor * Term
5: Term → Factor
6: Factor → ident

Symbol   FIRST
Goal     { ident }
Expr     { ident }
Term     { ident }
Factor   { ident }
–        { – }
*        { * }
ident    { ident }


Example (building the collection)

Iteration 1
s1 ← goto(s0 , Expr) = { [Goal → Expr •, EOF] }
s2 ← goto(s0 , Term) = { [Expr → Term • – Expr , EOF], [Expr → Term •, EOF] }
s3 ← goto(s0 , Factor) = { [Term → Factor • * Term , EOF], [Term → Factor • * Term , –],
                           [Term → Factor •, EOF], [Term → Factor •, –] }
s4 ← goto(s0 , ident ) = { [Factor → ident •, EOF], [Factor → ident •, –], [Factor → ident •, *] }





Iteration 2

s5 ← goto(s2 , – ) = { [Expr → Term – • Expr , EOF],
                       [Expr → • Term – Expr , EOF], [Expr → • Term , EOF],
                       [Term → • Factor * Term , –], [Term → • Factor , –],
                       [Term → • Factor * Term , EOF], [Term → • Factor , EOF],
                       [Factor → • ident , *], [Factor → • ident , –], [Factor → • ident , EOF] }

s6 ← goto(s3 , * ) = { [Term → Factor * • Term , EOF], [Term → Factor * • Term , –],
                       [Term → • Factor * Term , EOF], [Term → • Factor * Term , –],
                       [Term → • Factor , EOF], [Term → • Factor , –],
                       [Factor → • ident , EOF], [Factor → • ident , –], [Factor → • ident , *] }


Iteration 3
s7 ← goto(s5 , Expr ) = { [Expr → Term – Expr •, EOF] }
s8 ← goto(s6 , Term ) = { [Term → Factor * Term •, EOF], [Term → Factor * Term •, –] }

The remaining gotos lead back to existing states:
s2 ← goto(s5, Term), s3 ← goto(s5, Factor), s4 ← goto(s5, ident), s3 ← goto(s6, Factor), s4 ← goto(s6, ident)


Example (Summary)

S0 : { [Goal → • Expr , EOF], [Expr → • Term – Expr , EOF], [Expr → • Term , EOF], [Term → • Factor * Term , EOF], [Term → • Factor * Term , –], [Term → • Factor , EOF], [Term → • Factor , –], [Factor → • ident , EOF], [Factor → • ident , –], [Factor → • ident , *] }

S1 : { [Goal → Expr •, EOF] }

S2 : { [Expr → Term • – Expr , EOF], [Expr → Term •, EOF] }

S3 : { [Term → Factor • * Term , EOF],[Term → Factor • * Term , –], [Term → Factor •, EOF], [Term → Factor •, –] }

S4 : { [Factor → ident •, EOF],[Factor → ident •, –], [Factor → ident •, *] }

S5 : { [Expr → Term – • Expr , EOF], [Expr → • Term – Expr , EOF], [Expr → • Term , EOF], [Term → • Factor * Term , –], [Term → • Factor , –], [Term → • Factor * Term , EOF], [Term → • Factor , EOF], [Factor → • ident , *], [Factor → • ident , –], [Factor → • ident , EOF] }


Example (Summary)

S6 : { [Term → Factor * • Term , EOF], [Term → Factor * • Term , –], [Term → • Factor * Term , EOF], [Term → • Factor * Term , –], [Term → • Factor , EOF], [Term → • Factor , –], [Factor → • ident , EOF], [Factor → • ident , –], [Factor → • ident , *] }

S7: { [Expr → Term – Expr •, EOF] }

S8 : { [Term → Factor * Term •, EOF], [Term → Factor * Term •, –] }


Example (DFA)

[Figure: DFA over states s0 to s8, with transitions labeled ident, expr, term, factor, - and *]

The State Transition Table

State   Ident   -    *    Expr   Term   Factor
0       4                 1      2      3
1
2               5
3                    6
4
5       4                 7      2      3
6       4                        8      3
7
8


Filling in the ACTION and GOTO Tables

The algorithm

∀ set sx ∈ S
    ∀ item i ∈ sx
        if i is [A→β•aδ,b] and goto(sx,a) = sk, a ∈ T
            then ACTION[x,a] ← “shift k”
        else if i is [S’→S•,EOF]
            then ACTION[x,EOF] ← “accept”
        else if i is [A→β•,a]
            then ACTION[x,a] ← “reduce A→β”
    ∀ n ∈ NT
        if goto(sx,n) = sk
            then GOTO[x,n] ← k

Many items generate no table entry
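The table-filling step can be tried on the example without rebuilding the collection. The sketch below is an assumption-laden illustration rather than the course's code: `TRANSITIONS` is typed in from the state transition table, `COMPLETE_ITEMS` lists only the items with the • at the right end (taken from the example summary, as production number and lookahead), and the names `fill_tables`, `TERMINALS`, `ACTION`, and `GOTO` are hypothetical. Items with the • before a nonterminal produce no ACTION entry, which is why they are not needed here.

```python
TERMINALS = {"ident", "-", "*", "EOF"}

# (state, symbol) -> state, copied from the state transition table of the example.
TRANSITIONS = {
    (0, "ident"): 4, (0, "Expr"): 1, (0, "Term"): 2, (0, "Factor"): 3,
    (2, "-"): 5,
    (3, "*"): 6,
    (5, "ident"): 4, (5, "Expr"): 7, (5, "Term"): 2, (5, "Factor"): 3,
    (6, "ident"): 4, (6, "Term"): 8, (6, "Factor"): 3,
}

# state -> (production number, lookahead) for every item of the form [A -> β., a].
COMPLETE_ITEMS = {
    1: [(1, "EOF")],
    2: [(3, "EOF")],
    3: [(5, "EOF"), (5, "-")],
    4: [(6, "EOF"), (6, "-"), (6, "*")],
    7: [(2, "EOF")],
    8: [(4, "EOF"), (4, "-")],
}


def fill_tables(transitions, complete_items, goal_production=1):
    action, goto_table = {}, {}
    # Transitions on terminals become shifts; transitions on nonterminals fill GOTO.
    for (state, symbol), target in transitions.items():
        if symbol in TERMINALS:
            action[(state, symbol)] = ("shift", target)
        else:
            goto_table[(state, symbol)] = target
    # Complete items become reductions, except the goal item, which accepts.
    for state, items in complete_items.items():
        for production, lookahead in items:
            if production == goal_production and lookahead == "EOF":
                action[(state, lookahead)] = ("accept",)
            else:
                action[(state, lookahead)] = ("reduce", production)
    return action, goto_table


ACTION, GOTO = fill_tables(TRANSITIONS, COMPLETE_ITEMS)
```

This sketch overwrites an entry silently if two items claim the same cell; for the example grammar that never happens.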


Example (Filling in the tables)

The algorithm produces LR(1) parse table

         ACTION                        GOTO
State    Ident   -      *      EOF     Expr   Term   Factor
0        s 4                           1      2      3
1                              acc
2                s 5           r 3
3                r 5    s 6    r 5
4                r 6    r 6    r 6
5        s 4                           7      2      3
6        s 4                                  8      3
7                              r 2
8                r 4           r 4

Plugs into the skeleton LR(1) parser
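To see the table at work, here is a minimal table-driven parser in Python. It is a sketch under stated assumptions: the tables are typed in from the example above, the names `PRODUCTIONS`, `ACTION`, `GOTO`, and `parse` are hypothetical, `-` stands for the minus sign, and the stack holds only state numbers (a textbook skeleton parser typically also keeps the grammar symbols).

```python
# production number -> (left-hand side, length of right-hand side)
PRODUCTIONS = {
    1: ("Goal", 1), 2: ("Expr", 3), 3: ("Expr", 1),
    4: ("Term", 3), 5: ("Term", 1), 6: ("Factor", 1),
}

ACTION = {
    (0, "ident"): ("s", 4),
    (1, "EOF"): ("acc",),
    (2, "-"): ("s", 5), (2, "EOF"): ("r", 3),
    (3, "-"): ("r", 5), (3, "*"): ("s", 6), (3, "EOF"): ("r", 5),
    (4, "-"): ("r", 6), (4, "*"): ("r", 6), (4, "EOF"): ("r", 6),
    (5, "ident"): ("s", 4),
    (6, "ident"): ("s", 4),
    (7, "EOF"): ("r", 2),
    (8, "-"): ("r", 4), (8, "EOF"): ("r", 4),
}

GOTO = {
    (0, "Expr"): 1, (0, "Term"): 2, (0, "Factor"): 3,
    (5, "Expr"): 7, (5, "Term"): 2, (5, "Factor"): 3,
    (6, "Term"): 8, (6, "Factor"): 3,
}


def parse(tokens):
    """Return the production numbers in reduction order, or None on a syntax error."""
    stack = [0]                          # stack of state numbers
    words = list(tokens) + ["EOF"]
    pos = 0
    reductions = []
    while True:
        entry = ACTION.get((stack[-1], words[pos]))
        if entry is None:                # empty table cell: syntax error
            return None
        if entry[0] == "s":              # shift: push the new state, advance the input
            stack.append(entry[1])
            pos += 1
        elif entry[0] == "r":            # reduce: pop |rhs| states, then take the goto
            lhs, length = PRODUCTIONS[entry[1]]
            del stack[-length:]
            stack.append(GOTO[(stack[-1], lhs)])
            reductions.append(entry[1])
        else:                            # accept
            return reductions


result = parse(["ident", "-", "ident", "*", "ident"])
```

The returned list is the sequence of reductions, which is a rightmost derivation in reverse.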

Remember the state transition table? Its transitions on terminals become the shift entries of ACTION, and its transitions on nonterminals become the GOTO entries.

cs415, spring 14

An Example for Table Filling Practice

Lecture 15 20

A Parse Table Filling Example

For pdf lecture notes readers, see attached LR(1) parse table example file


What can go wrong?

What if set s contains [A→β•aγ,b] and [B→β•,a] ?
•  First item generates “shift”, second generates “reduce”
•  Both define ACTION[s,a]; the parser cannot do both actions
•  This is a fundamental ambiguity, called a shift/reduce conflict
•  Modify the grammar to eliminate it (if-then-else)

What if set s contains [A→γ•, a] and [B→γ•, a] ?
•  Each generates “reduce”, but with a different production
•  Both define ACTION[s,a]; the parser cannot do both reductions
•  This fundamental ambiguity is called a reduce/reduce conflict
•  Modify the grammar to eliminate it

In either case, the grammar is not LR(1)

EaC includes a worked example
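A table filler can detect both situations by checking whether a cell is already taken before writing to it. The following Python sketch is illustrative only: `add_action`, the tuple encoding of entries, and the state numbers are assumptions, and the two scenarios simply replay the item pairs described above.

```python
def add_action(action, conflicts, state, symbol, entry):
    """Set ACTION[state, symbol] = entry, or log a conflict if the cell holds something else."""
    old = action.get((state, symbol))
    if old is None or old == entry:
        action[(state, symbol)] = entry
    elif old[0] == "reduce" and entry[0] == "reduce":
        conflicts.append((state, symbol, "reduce/reduce"))
    else:
        # Two different shifts cannot collide because goto is a function,
        # so any other clash involves one shift and one reduce.
        conflicts.append((state, symbol, "shift/reduce"))


action, conflicts = {}, []

# State 0 holds [A→β•aγ,b] and [B→β•,a]: both items want ACTION[0, a].
add_action(action, conflicts, 0, "a", ("shift", 7))        # 7 is a made-up target state
add_action(action, conflicts, 0, "a", ("reduce", "B→β"))

# State 1 holds [A→γ•,a] and [B→γ•,a]: two different reductions on a.
add_action(action, conflicts, 1, "a", ("reduce", "A→γ"))
add_action(action, conflicts, 1, "a", ("reduce", "B→γ"))
```

A non-empty `conflicts` list means the grammar is not LR(1) for this construction.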


Shrinking the Tables

Three options:
•  Combine terminals such as number & identifier, + & -, * & /
   →  Directly removes a column, may remove a row
   →  For expression grammar, 198 (vs. 384) table entries

•  Combine rows or columns (table compression)
   →  Implement identical rows once & remap states
   →  Requires extra indirection on each lookup
   →  Use separate mapping for ACTION & for GOTO

•  Use another construction algorithm
   →  Both LALR(1) and SLR(1) produce smaller tables
   →  Implementations are readily available
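The second option, storing identical rows once and remapping states, can be sketched in Python. This is an illustration under stated assumptions: the ACTION entries are typed in from the small example table of these slides, and `compress_rows`, `lookup`, and `COLUMNS` are hypothetical names. A real implementation would likely compress GOTO with a separate mapping, as noted above.

```python
COLUMNS = ["ident", "-", "*", "EOF"]

ACTION = {
    (0, "ident"): ("s", 4),
    (1, "EOF"): ("acc",),
    (2, "-"): ("s", 5), (2, "EOF"): ("r", 3),
    (3, "-"): ("r", 5), (3, "*"): ("s", 6), (3, "EOF"): ("r", 5),
    (4, "-"): ("r", 6), (4, "*"): ("r", 6), (4, "EOF"): ("r", 6),
    (5, "ident"): ("s", 4),
    (6, "ident"): ("s", 4),
    (7, "EOF"): ("r", 2),
    (8, "-"): ("r", 4), (8, "EOF"): ("r", 4),
}


def compress_rows(table, num_states, columns):
    """Store each distinct row once and remember which row every state uses."""
    unique_rows = []     # distinct rows, in order of first appearance
    row_of_state = []    # state number -> index into unique_rows
    for state in range(num_states):
        row = tuple(table.get((state, c)) for c in columns)
        if row not in unique_rows:
            unique_rows.append(row)
        row_of_state.append(unique_rows.index(row))
    return unique_rows, row_of_state


def lookup(unique_rows, row_of_state, columns, state, symbol):
    """One extra indirection per lookup: state -> shared row -> entry."""
    return unique_rows[row_of_state[state]][columns.index(symbol)]


unique_rows, row_of_state = compress_rows(ACTION, 9, COLUMNS)
```

In this small table, states 0, 5, and 6 share one row (each only shifts ident to state 4), so nine rows shrink to seven.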


LR(0) versus SLR(1) versus LR(1)

LR(0) ? -- set of LR(0) items as states
LR(1) ? -- set of LR(1) items as states, different states compared to LR(0)
SLR(1) ? -- LR(0) items and canonical sets, same as LR(0)
SLR(1): add FOLLOW(A) to each LR(0) item [A→γ•] as its second component: [A→γ•, a], ∀a ∈ FOLLOW(A)


LR(0) versus SLR(1) versus LR(1)

Example: LR(0) ? LR(1) ? SLR(1) ?

S’ → S
S → S ; a | a


LR(0) versus LR(1) versus SLR(1)

LR(0) States

s0 = Closure({[S’ → .S]}) = {[S’ → .S], [S → .S;a], [S → .a]}
s1 = Closure( GoTo (s0, S)) = {[S’ → S.], [S → S.;a]}
s2 = Closure( GoTo (s0, a)) = {[S → a.]}
s3 = Closure( GoTo (s1, ;)) = {[S → S;.a]}
s4 = Closure( GoTo (s3, a)) = {[S → S;a.]}

LR(1) States

s0 = Closure({[S’ → .S, eof]}) = {[S’ → .S, eof], [S → .S;a, eof], [S → .S;a, ;], [S → .a, eof], [S → .a, ;]}
s1 = Closure( GoTo (s0, S)) = {[S’ → S., eof], [S → S.;a, eof], [S → S.;a, ;]}
s2 = Closure( GoTo (s0, a)) = {[S → a., eof], [S → a., ;]}
s3 = Closure( GoTo (s1, ;)) = {[S → S;.a, eof], [S → S;.a, ;]}
s4 = Closure( GoTo (s3, a)) = {[S → S;a., eof], [S → S;a., ;]}

Grammar is not LR(0), but LR(1) and SLR(1): LR(0) state s1 contains both the complete item [S’ → S.] and the shift item [S → S.;a], and one symbol of lookahead (eof versus ;) separates them.


LALR(1) versus LR(1)

LALR(1) ? LR(1) items, State -> Grouped LR(1) states
LALR(1): Merge two sets of LR(1) items (states), if they have the same core.
core of set of LR(1) items: set of LR(0) items derived by ignoring the lookahead symbols

FACT: collapsing LR(1) states into LALR(1) states cannot introduce shift/reduce conflicts


LALR(1) versus LR(1)

s0 = Closure({[S’ → .S, eof]})
s1 = Closure( GoTo (s0, a)) = {[S → a . Ad, eof], [S → a . Be, eof], [A → .c, d], [B → .c, e]}
s2 = Closure( GoTo (s0, b)) = {[S → b . Ae, eof], [S → b . Bd, eof], [A → .c, e], [B → .c, d]}
s3 = Closure( GoTo (s1, c)) = {[A → c. , d], [B → c. , e]}
s4 = Closure( GoTo (s2, c)) = {[A → c. , e], [B → c. , d]}

There are other states that are not listed here!

Grammar is LR(1), but not LALR(1), since collapsing s3 and s4 (same core) will introduce reduce-reduce conflict.
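The effect of merging s3 and s4 can be checked mechanically. The Python sketch below is illustrative: the item encoding `(lhs, rhs, dot, lookahead)` and the helper names `core`, `merge_by_core`, and `reduce_reduce_conflicts` are assumptions, while the two item sets are copied from s3 and s4 above.

```python
from collections import defaultdict

# s3 and s4 from the example; the • sits after c in every item (dot = 1).
s3 = {("A", ("c",), 1, "d"), ("B", ("c",), 1, "e")}
s4 = {("A", ("c",), 1, "e"), ("B", ("c",), 1, "d")}


def core(state):
    """The LR(0) items obtained by dropping the lookahead symbols."""
    return frozenset((lhs, rhs, dot) for lhs, rhs, dot, _ in state)


def merge_by_core(states):
    """LALR(1) merging: union all LR(1) states that share a core."""
    merged = defaultdict(set)
    for state in states:
        merged[core(state)] |= state
    return list(merged.values())


def reduce_reduce_conflicts(state):
    """Lookaheads on which two different productions want to reduce."""
    by_lookahead = defaultdict(set)
    for lhs, rhs, dot, la in state:
        if dot == len(rhs):                  # complete item
            by_lookahead[la].add((lhs, rhs))
    return {la: prods for la, prods in by_lookahead.items() if len(prods) > 1}


merged = merge_by_core([s3, s4])
conflicts = reduce_reduce_conflicts(merged[0])
```

Each state is conflict-free on its own, but the merged state wants to reduce both A → c and B → c on d and again on e.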


Hierarchy of Context-Free Grammars

The inclusion hierarchy for context-free grammars

•  Operator precedence includes some ambiguous grammars

•  LL(1) is a subset of LR(1) (but not, in general, of SLR(1) or LALR(1))

[Figure: nested-set diagram of the grammar classes: Context-free grammars, Unambiguous CFGs, Operator Precedence, Floyd-Evans Parsable, LR(k), LR(1), LALR(1), SLR(1), LR(0), LL(k), LL(1)]

Ref Book: Michael Sipser, “Introduction to the Theory of Computation”, 3rd Edition


Error Recovery in Shift-Reduce Parsers

The problem: parser encounters an invalid token
Goal: Want to parse the rest of the file
Basic idea (panic mode):
→  Assume something went wrong while trying to find handle for nonterminal A
→  Pretend handle for A has been found; pop “handle”, skip over input to find terminal that can follow A

Restarting the parser (panic mode):
→  find a restartable state on the stack (has transition for nonterminal A)
→  move to a consistent place in the input (token that can follow A)
→  perform (error) reduction (for nonterminal A)
→  print an informative message
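The four restart steps can be sketched on top of a table-driven parser. This Python sketch rests on several assumptions that are not in the slides: it reuses the small expression tables from the earlier example, it fixes the recovery nonterminal A to Term with FOLLOW(Term) = { -, EOF }, and the names `parse_with_recovery`, `sync_nonterminal`, and `follow` are hypothetical. It is one possible reading of panic mode, not the only one.

```python
PRODUCTIONS = {1: ("Goal", 1), 2: ("Expr", 3), 3: ("Expr", 1),
               4: ("Term", 3), 5: ("Term", 1), 6: ("Factor", 1)}
ACTION = {
    (0, "ident"): ("s", 4), (1, "EOF"): ("acc",),
    (2, "-"): ("s", 5), (2, "EOF"): ("r", 3),
    (3, "-"): ("r", 5), (3, "*"): ("s", 6), (3, "EOF"): ("r", 5),
    (4, "-"): ("r", 6), (4, "*"): ("r", 6), (4, "EOF"): ("r", 6),
    (5, "ident"): ("s", 4), (6, "ident"): ("s", 4),
    (7, "EOF"): ("r", 2), (8, "-"): ("r", 4), (8, "EOF"): ("r", 4),
}
GOTO = {(0, "Expr"): 1, (0, "Term"): 2, (0, "Factor"): 3,
        (5, "Expr"): 7, (5, "Term"): 2, (5, "Factor"): 3,
        (6, "Term"): 8, (6, "Factor"): 3}


def parse_with_recovery(tokens, sync_nonterminal="Term", follow=("-", "EOF")):
    """Parse to the end of the input and return the list of error messages."""
    stack, words, pos, errors = [0], list(tokens) + ["EOF"], 0, []
    while True:
        entry = ACTION.get((stack[-1], words[pos]))
        if entry is None:
            # 4. print (here: collect) an informative message
            errors.append(f"syntax error at token {pos}: {words[pos]!r}")
            # 1. find a restartable state: one with a transition on the nonterminal
            while (stack[-1], sync_nonterminal) not in GOTO:
                stack.pop()
            # 2. skip input up to a token that can follow the nonterminal
            while words[pos] not in follow:
                pos += 1
            # 3. error reduction: pretend the nonterminal has been recognized
            stack.append(GOTO[(stack[-1], sync_nonterminal)])
        elif entry[0] == "s":
            stack.append(entry[1])
            pos += 1
        elif entry[0] == "r":
            lhs, length = PRODUCTIONS[entry[1]]
            del stack[-length:]
            stack.append(GOTO[(stack[-1], lhs)])
        else:
            return errors


errors = parse_with_recovery(["ident", "ident", "-", "ident"])
```

The pop loop always stops because state 0 has a transition on Term, and the skip loop always stops because EOF is in the follow set. In this table every state reached by the error reduction has an action for both follow tokens, so the parser makes progress after each recovery.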


Error Recovery in YACC

Yacc’s (bison’s) error mechanism (note: version dependent!)
•  designated token error
•  used in error productions of the form A → error α    // basic case
•  α specifies synchronization points

When error is discovered
•  pops stack until it finds state where it can shift the error token
•  resumes parsing to match α
   special cases:
   →  α = w, where w is string of terminals: skip input until w has been read
   →  α = ε : skip input until state transition on input token is defined
•  error productions can have actions


Error Recovery in YACC

cmpdstmt: BEG stmt_list END
stmt_list : stmt
          | stmt_list ';' stmt
          | error { yyerror("\n***Error: illegal statement\n"); }

This should
•  throw out the erroneous statement
•  synchronize at “;” or “end” (implicit: α = ε)
•  write the message “***Error: illegal statement” to stderr

Example:
    begin a & 5 | hello ; a := 3 end
            ↑           ↑
    first ↑: ***Error: illegal statement
    second ↑: resume parsing


Project #2 (see “lex & yacc”, Levine et al., O’Reilly)

→  You do have to (slightly) change the scanner (scan.l)

→  How to specify and use attributes in YACC?

•  Define attributes as types in attr.h
       typedef struct info_node {int a; int b;} infonode;
•  Include type attribute name in %union in parse.y
       %union {tokentype token; infonode myinfo; … }
•  Assign attributes in parse.y to
   –  Terminals: %token <token> ID ICONST
   –  Non-terminals: %type <myinfo> block variables procdecls cmpdstmt
•  Accessing attribute values in parse.y
   –  use $$, $1, $2 … etc. notation:
       block : variables procdecls {$2.b = $1.b + 1;} cmpdstmt { $$.a = $1.a + $2.a + $4.b;}
      (the mid-rule action itself counts as $3, so cmpdstmt is $4)


YACC : parse.y

%{
/* Will be included verbatim in parse.tab.c */
#include <stdio.h>
#include "attr.h"
int yylex();
void yyerror(char * s);
#include "symtab.h"
%}

/* List and assign attributes */
%union {tokentype token; }

%token PROG PERIOD PROC VAR ARRAY RANGE OF
%token INT REAL DOUBLE WRITELN THEN ELSE IF
%token BEG END ASG NOT
%token EQ NEQ LT LEQ GEQ GT OR EXOR AND DIV
%token <token> ID CCONST ICONST RCONST

%start program

%%
/* Rules with semantic actions */
program : PROG ID ';' block PERIOD { }
        ;

block   : BEG ID ASG ICONST END { }
        ;
%%
/* Main program and "helper" functions; may contain initialization
   code of global structures. Will be included verbatim in parse.tab.c */
void yyerror(char* s) {
    fprintf(stderr,"%s\n",s);
}

int main() {
    printf("1\t");
    yyparse();
    return 1;
}


Project #2 : Things to do

•  Learn/Review the C programming language

•  Add error productions (syntax errors)

•  Define and assign attributes to non-terminals

•  Implement single-level symbol table

•  Perform type checking and produce required error messages; note: actions may occur at “any” location on right-hand side (implicit use of marker productions)


Next two classes

Ad-hoc, syntax-directed translation schemes, type checking

Read EaC: Chapters 4.1 - 4.4
