
  • CS415 Compilers

    LR Parsing & Error Recovery

    These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

  • cs415, spring 14 Lecture 13

    2

    Review: LR(k) items

    The LR(1) table construction algorithm uses LR(1) items to represent valid configurations of an LR(1) parser.

    An LR(k) item is a pair [P, δ], where
    •  P is a production A → β with a • at some position in the rhs
    •  δ is a lookahead string of length ≤ k (words or EOF)

    The • in an item indicates the position of the top of the stack. For LR(1):
    •  [A → •βγ, a] means that the input seen so far is consistent with the use of A → βγ immediately after the symbol on top of the stack.
    •  [A → β•γ, a] means that the input seen so far is consistent with the use of A → βγ at this point in the parse, and that the parser has already recognized β.
    •  [A → βγ•, a] means that the parser has seen βγ, and that a lookahead symbol of a is consistent with reducing to A.

  • cs415, spring 14 Lecture 14

    3

    Review - Computing Closures

    Closure(s) adds all the items implied by items already in s
    •  Any item [A → β•Bδ, a] implies [B → •τ, x] for each production with B on the lhs, and each x ∈ FIRST(δa) (for LR(1) items)

    The algorithm

    Closure( s )
      while ( s is still changing )
        ∀ items [A → β•Bδ, a] ∈ s
          ∀ productions B → τ ∈ P
            ∀ b ∈ FIRST(δa)    // δ might be ε
              if [B → •τ, b] ∉ s
                then add [B → •τ, b] to s

    →  Classic fixed-point method
    →  Halts because s ⊂ ITEMS

    Closure “fills out” a state
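The closure computation can be sketched in Python on the expression grammar used later in these slides. This is a minimal sketch, not the course's implementation: the item encoding (production index, dot position, lookahead), the names `GRAMMAR`, `FIRST`, `first_of`, and `closure`, and the ASCII "-" for the minus token are all assumptions made for illustration. It also assumes the grammar has no ε-productions, so FIRST(δa) depends only on the first symbol of δ.

```python
# Expression grammar from the worked example; index i holds production i + 1.
GRAMMAR = [
    ("Goal", ("Expr",)),
    ("Expr", ("Term", "-", "Expr")),
    ("Expr", ("Term",)),
    ("Term", ("Factor", "*", "Term")),
    ("Term", ("Factor",)),
    ("Factor", ("ident",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}

# FIRST sets of the nonterminals; in this grammar each one starts with ident.
FIRST = {nt: {"ident"} for nt in NONTERMINALS}


def first_of(symbols, lookahead):
    """FIRST(δa), assuming no ε-productions (true for this grammar)."""
    if not symbols:
        return {lookahead}
    head = symbols[0]
    return FIRST.get(head, {head})  # a terminal is its own FIRST set


def closure(items):
    """Items are (production index, dot position, lookahead) triples."""
    s = set(items)
    changed = True
    while changed:  # fixed-point loop: stop when a full pass adds nothing
        changed = False
        for prod, dot, la in list(s):
            rhs = GRAMMAR[prod][1]
            if dot == len(rhs) or rhs[dot] not in NONTERMINALS:
                continue  # no nonterminal B immediately after the dot
            for b in first_of(rhs[dot + 1:], la):
                for p, (lhs, _) in enumerate(GRAMMAR):
                    if lhs == rhs[dot] and (p, 0, b) not in s:
                        s.add((p, 0, b))
                        changed = True
    return s


s0 = closure({(0, 0, "EOF")})
print(len(s0))  # 10 items
```

Starting from the single item [Goal → •Expr, EOF], the loop should arrive at the ten items listed for s0 in the example that follows.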

  • cs415, spring 14 Lecture 14

    4

    Review - Computing Gotos

    Goto(s, X) computes the state that the parser would reach if it recognized an X while in state s
    •  Goto( { [A → β•Xδ, a] }, X ) produces [A → βX•δ, a] (easy part)
    •  Should also include closure( [A → βX•δ, a] ) (fill out the state)

    The algorithm

    Goto( s, X )
      new ← ∅
      ∀ items [A → β•Xδ, a] ∈ s
        new ← new ∪ { [A → βX•δ, a] }
      return closure(new)

    →  Not a fixed-point method!
    →  Straightforward computation
    →  Uses closure( )

    Goto() moves forward
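A possible Python rendering of Goto, again on the expression grammar. The item encoding and all names are illustrative assumptions. The `closure` helper is written as a worklist and hard-codes the fact that, in this particular grammar, FIRST(δa) is always a single symbol (no ε-productions, and every nonterminal starts with ident).

```python
GRAMMAR = [
    ("Goal", ("Expr",)),
    ("Expr", ("Term", "-", "Expr")),
    ("Expr", ("Term",)),
    ("Term", ("Factor", "*", "Term")),
    ("Term", ("Factor",)),
    ("Factor", ("ident",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """Worklist closure over (production index, dot, lookahead) items."""
    s, work = set(items), list(items)
    while work:
        prod, dot, la = work.pop()
        rhs = GRAMMAR[prod][1]
        if dot == len(rhs) or rhs[dot] not in NONTERMINALS:
            continue
        rest = rhs[dot + 1:]
        # FIRST(δa) for this grammar: a if δ is empty, else FIRST of δ's head.
        b = la if not rest else ("ident" if rest[0] in NONTERMINALS else rest[0])
        for p, (lhs, _) in enumerate(GRAMMAR):
            if lhs == rhs[dot] and (p, 0, b) not in s:
                s.add((p, 0, b))
                work.append((p, 0, b))
    return frozenset(s)


def goto(s, x):
    """Advance the dot over x in every item that allows it, then close."""
    moved = {
        (p, dot + 1, la)
        for p, dot, la in s
        if dot < len(GRAMMAR[p][1]) and GRAMMAR[p][1][dot] == x
    }
    return closure(moved)


s0 = closure({(0, 0, "EOF")})
s2 = goto(s0, "Term")   # two items: the dot sits before "-" or at the end
s5 = goto(s2, "-")      # the dot now precedes Expr, so closure refills the state
print(len(s2), len(s5))
```

Note that goto itself is a single pass; the only iteration happens inside closure.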

  • cs415, spring 14 Lecture 14

    5

    Review - Building the Canonical Collection

    Start from s0 = closure( [S’ → •S, EOF] )
    Repeatedly construct new states, until all are found

    The algorithm

    cc0 ← closure( [S’ → •S, EOF] )
    CC ← { cc0 }
    while ( new sets are still being added to CC )
      for each unmarked set ccj ∈ CC
        mark ccj as processed
        for each x following a • in an item in ccj
          temp ← goto(ccj, x)
          if temp ∉ CC
            then CC ← CC ∪ { temp }
          record transitions from ccj to temp on x

    →  Fixed-point computation (worklist version)
    →  Loop adds to CC
    →  CC ⊆ 2^ITEMS, so CC is finite
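The worklist construction might look as follows in Python. This is a sketch under the same assumptions as before (expression grammar, no ε-productions, hypothetical names such as `build_collection` and `SYMBOLS`). States are numbered in order of discovery; trying symbols in the fixed order given by `SYMBOLS` is a choice made here so that the numbering lines up with s0 through s8 in the worked example.

```python
GRAMMAR = [
    ("Goal", ("Expr",)),
    ("Expr", ("Term", "-", "Expr")),
    ("Expr", ("Term",)),
    ("Term", ("Factor", "*", "Term")),
    ("Term", ("Factor",)),
    ("Factor", ("ident",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}
SYMBOLS = ["Expr", "Term", "Factor", "ident", "-", "*"]


def closure(items):
    s, work = set(items), list(items)
    while work:
        prod, dot, la = work.pop()
        rhs = GRAMMAR[prod][1]
        if dot == len(rhs) or rhs[dot] not in NONTERMINALS:
            continue
        rest = rhs[dot + 1:]
        # FIRST(δa) is a single symbol in this grammar (no ε-productions).
        b = la if not rest else ("ident" if rest[0] in NONTERMINALS else rest[0])
        for p, (lhs, _) in enumerate(GRAMMAR):
            if lhs == rhs[dot] and (p, 0, b) not in s:
                s.add((p, 0, b))
                work.append((p, 0, b))
    return frozenset(s)


def goto(s, x):
    return closure({(p, d + 1, la) for p, d, la in s
                    if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == x})


def build_collection():
    cc = [closure({(0, 0, "EOF")})]   # list index = state number
    transitions = {}                  # (state number, symbol) -> state number
    unmarked = [0]
    while unmarked:                   # halts: only finitely many item sets exist
        j = unmarked.pop(0)
        for x in SYMBOLS:
            temp = goto(cc[j], x)
            if not temp:
                continue              # x does not follow a dot in cc[j]
            if temp not in cc:
                cc.append(temp)
                unmarked.append(len(cc) - 1)
            transitions[(j, x)] = cc.index(temp)
    return cc, transitions


cc, transitions = build_collection()
print(len(cc), len(transitions))  # 9 states, 13 transitions
```

The `transitions` dictionary is the state transition table of the handle-finding DFA; empty goto sets simply produce no entry.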

  • cs415, spring 14 Lecture 14

    6

    Review – LR(1) Table Construction

    High-level overview

    1  Build the canonical collection of sets of LR(1) items, I
       a  Begin in an appropriate state, s0
          ♦  Assume: S’ → S, and S’ is a unique start symbol that does not occur on any RHS of a production (extended CFG - ECFG)
          ♦  [S’ → •S, EOF], along with any equivalent items
          ♦  Derive equivalent items as closure( s0 )
       b  Repeatedly compute, for each sk, and each X, goto(sk, X)
          ♦  If the set is not already in the collection, add it
          ♦  Record all the transitions created by goto( )
          This eventually reaches a fixed point

    2  Fill in the table from the collection of sets of LR(1) items

    The canonical collection completely encodes the transition diagram for the handle-finding DFA

  • cs415, spring 14 Lecture 14

    7

    Review: Example (building the collection)

    1: Goal → Expr
    2: Expr → Term – Expr
    3: Expr → Term
    4: Term → Factor * Term
    5: Term → Factor
    6: Factor → ident

    Symbol   FIRST
    Goal     { ident }
    Expr     { ident }
    Term     { ident }
    Factor   { ident }
    –        { – }
    *        { * }
    ident    { ident }

    Initialization Step

    s0 ← closure( { [Goal → •Expr , EOF] } )
       = { [Goal → •Expr , EOF], [Expr → • Term – Expr , EOF], [Expr → • Term , EOF],
           [Term → • Factor * Term , –], [Term → • Factor , –],
           [Term → • Factor * Term , EOF], [Term → • Factor , EOF],
           [Factor → • ident , *], [Factor → • ident , –], [Factor → • ident , EOF] }

    S ← { s0 }

  • cs415, spring 14 Lecture 14

    8

    Example (building the collection)

    Iteration 1
    s1 ← goto(s0 , Expr)
    s2 ← goto(s0 , Term)
    s3 ← goto(s0 , Factor)
    s4 ← goto(s0 , ident )

    s0 ← closure( { [Goal → •Expr , EOF] } )
       = { [Goal → • Expr , EOF], [Expr → • Term – Expr , EOF], [Expr → • Term , EOF],
           [Term → • Factor * Term , EOF], [Term → • Factor * Term , –],
           [Term → • Factor , EOF], [Term → • Factor , –],
           [Factor → • ident , EOF], [Factor → • ident , –], [Factor → • ident , *] }

  • cs415, spring 14 Lecture 14

    9

    Example (building the collection)

    Iteration 1
    s1 ← goto(s0 , Expr) = { [Goal → Expr •, EOF] }
    s2 ← goto(s0 , Term) = { [Expr → Term • – Expr , EOF], [Expr → Term •, EOF] }
    s3 ← goto(s0 , Factor) = { [Term → Factor • * Term , EOF], [Term → Factor • * Term , –], [Term → Factor •, EOF], [Term → Factor •, –] }
    s4 ← goto(s0 , ident ) = { [Factor → ident •, EOF], [Factor → ident •, –], [Factor → ident •, *] }

  • cs415, spring 14 Lecture 14

    10

    Example (building the collection)

    Iteration 2
    s5 ← goto(s2 , – )
    s6 ← goto(s3 , * )

  • cs415, spring 14 Lecture 14

    11

    Example (building the collection)

    Iteration 2
    s5 ← goto(s2 , – ) = { [Expr → Term – • Expr , EOF], [Expr → • Term – Expr , EOF], [Expr → • Term , EOF],
                           [Term → • Factor * Term , –], [Term → • Factor , –],
                           [Term → • Factor * Term , EOF], [Term → • Factor , EOF],
                           [Factor → • ident , *], [Factor → • ident , –], [Factor → • ident , EOF] }
    s6 ← goto(s3 , * ) = … see next page

  • cs415, spring 14 Lecture 14

    12

    Example (building the collection)

    Iteration 2
    s6 ← goto(s3 , * ) = { [Term → Factor * • Term , EOF], [Term → Factor * • Term , –],
                           [Term → • Factor * Term , EOF], [Term → • Factor * Term , –],
                           [Term → • Factor , EOF], [Term → • Factor , –],
                           [Factor → • ident , EOF], [Factor → • ident , –], [Factor → • ident , *] }

    Iteration 3
    s7 ← goto(s5 , Expr ) = { ? }
    s8 ← goto(s6 , Term ) = { ? }
    Transitions into existing states: s2 ← goto(s5 , Term), s3 ← goto(s5 , Factor), s4 ← goto(s5 , ident), s3 ← goto(s6 , Factor), s4 ← goto(s6 , ident)

  • cs415, spring 14 Lecture 14

    13

    Example (building the collection)

    Iteration 3
    s7 ← goto(s5 , Expr ) = { [Expr → Term – Expr •, EOF] }
    s8 ← goto(s6 , Term ) = { [Term → Factor * Term •, EOF], [Term → Factor * Term •, –] }

  • cs415, spring 14 Lecture 14

    14

    Example (Summary)

    S0 : { [Goal → • Expr , EOF], [Expr → • Term – Expr , EOF], [Expr → • Term , EOF], [Term → • Factor * Term , EOF], [Term → • Factor * Term , –], [Term → • Factor , EOF], [Term → • Factor , –], [Factor → • ident , EOF], [Factor → • ident , –], [Factor → • ident , *] }

    S1 : { [Goal → Expr •, EOF] }

    S2 : { [Expr → Term • – Expr , EOF], [Expr → Term •, EOF] }

    S3 : { [Term → Factor • * Term , EOF], [Term → Factor • * Term , –], [Term → Factor •, EOF], [Term → Factor •, –] }

    S4 : { [Factor → ident •, EOF], [Factor → ident •, –], [Factor → ident •, *] }

    S5 : { [Expr → Term – • Expr , EOF], [Expr → • Term – Expr , EOF], [Expr → • Term , EOF], [Term → • Factor * Term , –], [Term → • Factor , –], [Term → • Factor * Term , EOF], [Term → • Factor , EOF], [Factor → • ident , *], [Factor → • ident , –], [Factor → • ident , EOF] }

  • cs415, spring 14 Lecture 14

    15

    Example (Summary)

    S6 : { [Term → Factor * • Term , EOF], [Term → Factor * • Term , –], [Term → • Factor * Term , EOF], [Term → • Factor * Term , –], [Term → • Factor , EOF], [Term → • Factor , –], [Factor → • ident , EOF], [Factor → • ident , –], [Factor → • ident , *] }

    S7 : { [Expr → Term – Expr •, EOF] }

    S8 : { [Term → Factor * Term •, EOF], [Term → Factor * Term •, –] }

  • cs415, spring 14 Lecture 14

    16

    Example (DFA)

    [Figure: transition diagram of the handle-finding DFA over states s0 through s8; edges are labeled ident, –, *, expr, term, and factor, as listed in the table below]

    The State Transition Table

    State   Ident   –   *   Expr   Term   Factor
    0       4               1      2      3
    1
    2               5
    3                   6
    4
    5       4               7      2      3
    6       4                      8      3
    7
    8

  • cs415, spring 14 Lecture 14

    18

    Filling in the ACTION and GOTO Tables

    The algorithm

    ∀ set sx ∈ S
      ∀ item i ∈ sx
        if i is [A → β•aδ, b] and goto(sx, a) = sk, a ∈ T
          then ACTION[x, a] ← “shift k”
        else if i is [S’ → S•, EOF]
          then ACTION[x, EOF] ← “accept”
        else if i is [A → β•, a]
          then ACTION[x, a] ← “reduce A → β”
      ∀ n ∈ NT
        if goto(sx, n) = sk
          then GOTO[x, n] ← k

    Many items generate no table entry
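The table-filling loop can be tried end to end on the expression grammar. The sketch below rebuilds the canonical collection so that it runs on its own, then fills ACTION and GOTO. The encodings and names (`fill_tables`, tuples such as `("shift", 5)`) are assumptions for illustration. For brevity a later entry silently overwrites an earlier one, so this version does not report conflicts.

```python
GRAMMAR = [
    ("Goal", ("Expr",)),
    ("Expr", ("Term", "-", "Expr")),
    ("Expr", ("Term",)),
    ("Term", ("Factor", "*", "Term")),
    ("Term", ("Factor",)),
    ("Factor", ("ident",)),
]
NT = {lhs for lhs, _ in GRAMMAR}
SYMBOLS = ["Expr", "Term", "Factor", "ident", "-", "*"]


def closure(items):
    s, work = set(items), list(items)
    while work:
        prod, dot, la = work.pop()
        rhs = GRAMMAR[prod][1]
        if dot == len(rhs) or rhs[dot] not in NT:
            continue
        rest = rhs[dot + 1:]
        # FIRST(δa) is a single symbol in this grammar (no ε-productions).
        b = la if not rest else ("ident" if rest[0] in NT else rest[0])
        for p, (lhs, _) in enumerate(GRAMMAR):
            if lhs == rhs[dot] and (p, 0, b) not in s:
                s.add((p, 0, b))
                work.append((p, 0, b))
    return frozenset(s)


def goto(s, x):
    return closure({(p, d + 1, la) for p, d, la in s
                    if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == x})


def build_collection():
    cc, transitions, unmarked = [closure({(0, 0, "EOF")})], {}, [0]
    while unmarked:
        j = unmarked.pop(0)
        for x in SYMBOLS:
            temp = goto(cc[j], x)
            if temp:
                if temp not in cc:
                    cc.append(temp)
                    unmarked.append(len(cc) - 1)
                transitions[(j, x)] = cc.index(temp)
    return cc, transitions


def fill_tables(cc, transitions):
    action, goto_table = {}, {}
    for x, state in enumerate(cc):
        for p, dot, la in state:
            rhs = GRAMMAR[p][1]
            if dot < len(rhs):
                if rhs[dot] not in NT:            # dot before a terminal: shift
                    action[(x, rhs[dot])] = ("shift", transitions[(x, rhs[dot])])
            elif p == 0:                          # [Goal -> Expr •, EOF]
                action[(x, "EOF")] = ("accept",)
            else:                                 # complete item: reduce on la
                action[(x, la)] = ("reduce", p + 1)  # 1-based production number
        for n in NT:
            if (x, n) in transitions:
                goto_table[(x, n)] = transitions[(x, n)]
    return action, goto_table


ACTION, GOTO = fill_tables(*build_collection())
print(ACTION[(2, "-")], ACTION[(2, "EOF")])  # ('shift', 5) ('reduce', 3)
```

Items with the dot in front of a nonterminal fall through the first branch without producing an entry, which is the "many items generate no table entry" case.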

  • cs415, spring 14 Lecture 14

    19

    Example (Filling in the tables)

    The algorithm produces the LR(1) parse table

             ACTION                      GOTO
    State    Ident   –     *     EOF     Expr   Term   Factor
    0        s 4                         1      2      3
    1                            acc
    2                s 5         r 3
    3                r 5   s 6   r 5
    4                r 6   r 6   r 6
    5        s 4                         7      2      3
    6        s 4                                8      3
    7                            r 2
    8                r 4         r 4

    Plugs into the skeleton LR(1) parser

    Remember the state transition table?

    State   Ident   –   *   Expr   Term   Factor
    0       4               1      2      3
    1
    2               5
    3                   6
    4
    5       4               7      2      3
    6       4                      8      3
    7
    8
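To see how the ACTION and GOTO tables plug into a table-driven driver, here is a small Python sketch of a skeleton LR(1) parser with the tables for the expression grammar entered by hand. The function name `parse`, the table encoding, and the choice to keep only state numbers on the stack are assumptions; the driver records the sequence of reductions instead of building a tree.

```python
# production number -> (left-hand side, length of right-hand side)
PRODUCTIONS = {
    1: ("Goal", 1), 2: ("Expr", 3), 3: ("Expr", 1),
    4: ("Term", 3), 5: ("Term", 1), 6: ("Factor", 1),
}
ACTION = {
    (0, "ident"): ("s", 4),
    (1, "EOF"): ("acc",),
    (2, "-"): ("s", 5), (2, "EOF"): ("r", 3),
    (3, "-"): ("r", 5), (3, "*"): ("s", 6), (3, "EOF"): ("r", 5),
    (4, "-"): ("r", 6), (4, "*"): ("r", 6), (4, "EOF"): ("r", 6),
    (5, "ident"): ("s", 4),
    (6, "ident"): ("s", 4),
    (7, "EOF"): ("r", 2),
    (8, "-"): ("r", 4), (8, "EOF"): ("r", 4),
}
GOTO = {
    (0, "Expr"): 1, (0, "Term"): 2, (0, "Factor"): 3,
    (5, "Expr"): 7, (5, "Term"): 2, (5, "Factor"): 3,
    (6, "Term"): 8, (6, "Factor"): 3,
}


def parse(tokens):
    """Return the production numbers in reduction order, or raise SyntaxError."""
    stack = [0]                       # stack of state numbers only
    words = list(tokens) + ["EOF"]
    pos = 0
    reductions = []
    while True:
        state, word = stack[-1], words[pos]
        entry = ACTION.get((state, word))
        if entry is None:             # empty table cell: syntax error
            raise SyntaxError(f"unexpected {word!r} in state {state}")
        if entry[0] == "s":           # shift: push the new state, consume word
            stack.append(entry[1])
            pos += 1
        elif entry[0] == "r":         # reduce: pop |rhs| states, then GOTO on lhs
            lhs, length = PRODUCTIONS[entry[1]]
            del stack[-length:]
            stack.append(GOTO[(stack[-1], lhs)])
            reductions.append(entry[1])
        else:                         # accept
            return reductions


print(parse(["ident", "-", "ident", "*", "ident"]))
# [6, 5, 6, 6, 5, 4, 3, 2]
```

Read backwards, the reduction sequence is a rightmost derivation of the input.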

  • cs415, spring 14 Lecture 15

    20

    A Parse Table Filling Example

    An Example for Table Filling Practice

    For PDF lecture notes readers, see the attached LR(1) parse table example file

  • cs415, spring 14 Lecture 14

    21

    What can go wrong?

    What if set s contains [A → β•aγ, b] and [B → β•, a] ?
    •  First item generates “shift”, second generates “reduce”
    •  Both define ACTION[s, a]; cannot do both actions
    •  This is a fundamental ambiguity, called a shift/reduce error (conflict)
    •  Modify the grammar to eliminate it (if-then-else)

    What if set s contains [A → γ•, a] and [B → γ•, a] ?
    •  Each generates “reduce”, but with a different production
    •  Both define ACTION[s, a]; cannot do both reductions
    •  This fundamental ambiguity is called a reduce/reduce error (conflict)
    •  Modify the grammar to eliminate it

    In either case, the grammar is not LR(1)

    EaC includes a worked example
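Both kinds of conflict can be detected by looking at one state at a time. The following Python sketch is illustrative only: the item encoding (lhs, rhs, dot position, lookahead), the function name `find_conflicts`, and the sample if-then-else state are assumptions, not material from EaC.

```python
def find_conflicts(state, terminals):
    """Report ACTION cells of one state that would receive two entries."""
    shifts = set()    # terminals on which some item wants to shift
    reduces = {}      # lookahead -> set of productions that want to reduce
    for lhs, rhs, dot, la in state:
        if dot < len(rhs):
            if rhs[dot] in terminals:
                shifts.add(rhs[dot])
        else:
            reduces.setdefault(la, set()).add((lhs, rhs))
    conflicts = []
    for la in sorted(reduces):
        if la in shifts:
            conflicts.append(("shift/reduce", la))
        if len(reduces[la]) > 1:
            conflicts.append(("reduce/reduce", la))
    return conflicts


# Dangling else: after "if Expr then Stmt" the parser can shift else or reduce.
dangling_else = [
    ("Stmt", ("if", "Expr", "then", "Stmt", "else", "Stmt"), 4, "EOF"),
    ("Stmt", ("if", "Expr", "then", "Stmt"), 4, "else"),
]
# Two complete items with the same lookahead but different productions.
two_reductions = [("A", ("c",), 1, "d"), ("B", ("c",), 1, "d")]
# State s2 of the expression grammar, which is conflict free.
s2 = [("Expr", ("Term", "-", "Expr"), 1, "EOF"), ("Expr", ("Term",), 1, "EOF")]

print(find_conflicts(dangling_else, {"if", "then", "else"}))
print(find_conflicts(two_reductions, {"c", "d"}))
print(find_conflicts(s2, {"-", "*", "ident"}))
```

A grammar is LR(1) exactly when no state of its canonical collection yields a conflict of either kind.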

  • cs415, spring 14 Lecture 14

    22

    Shrinking the Tables

    Three options:
    •  Combine terminals such as number & identifier, + & -, * & /
       →  Directly removes a column, may remove a row
       →  For expression grammar, 198 (vs. 384) table entries
    •  Combine rows or columns (table compression)
       →  Implement identical rows once & remap states
       →  Requires extra indirection on each lookup
       →  Use separate mapping for ACTION & for GOTO
    •  Use another construction algorithm
       →  Both LALR(1) and SLR(1) produce smaller tables
       →  Implementations are readily available

  • cs415, spring 14 Lecture 14

    23

    LR(0) versus SLR(1) versus LR(1)

    LR(0) ? -- set of LR(0) items as states
    LR(1) ? -- set of LR(1) items as states, different states compared to LR(0)
    SLR(1) ? -- LR(0) items and canonical sets, same as LR(0)

    SLR(1): add FOLLOW(A) to each LR(0) item [A → γ•] as its second component: [A → γ•, a], ∀ a ∈ FOLLOW(A)

  • cs415, spring 14 Lecture 15

    24

    LR(0) versus SLR(1) versus LR(1)

    Example: LR(0) ? LR(1) ? SLR(1) ?

    S’ → S
    S → S ; a | a

  • cs415, spring 14 Lecture 15

    25

    LR(0) versus LR(1) versus SLR(1)

    LR(0) States

    s0 = Closure( { [S’ → •S] } ) = { [S’ → •S], [S → •S ; a], [S → •a] }
    s1 = Closure( GoTo(s0, S) ) = { [S’ → S•], [S → S• ; a] }
    s2 = Closure( GoTo(s0, a) ) = { [S → a•] }
    s3 = Closure( GoTo(s1, ;) ) = { [S → S ; •a] }
    s4 = Closure( GoTo(s3, a) ) = { [S → S ; a•] }

    Grammar is not LR(0), but LR(1) and SLR(1): s1 holds both the complete item [S’ → S•] and the shift item [S → S• ; a], and the lookahead (eof versus ;) is what separates the two actions

    LR(1) States

    s0 = Closure( { [S’ → •S, eof] } ) = { [S’ → •S, eof], [S → •S ; a, eof], [S → •S ; a, ;], [S → •a, eof], [S → •a, ;] }
    s1 = Closure( GoTo(s0, S) ) = { [S’ → S•, eof], [S → S• ; a, eof], [S → S• ; a, ;] }
    s2 = Closure( GoTo(s0, a) ) = { [S → a•, eof], [S → a•, ;] }
    s3 = Closure( GoTo(s1, ;) ) = { [S → S ; •a, eof], [S → S ; •a, ;] }
    s4 = Closure( GoTo(s3, a) ) = { [S → S ; a•, eof], [S → S ; a•, ;] }
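The difference between LR(0) and SLR(1) on this grammar can be checked mechanically. The Python sketch below computes FOLLOW sets (assuming no ε-productions, which holds here) and compares the two views of LR(0) state s1. All names (`first_sets`, `follow_sets`, `slr_actions`, `is_lr0_adequate`) and the (production index, dot) item encoding are illustrative assumptions.

```python
GRAMMAR = [("S'", ("S",)), ("S", ("S", ";", "a")), ("S", ("a",))]
NONTERMINALS = {"S'", "S"}


def first_sets():
    first = {nt: set() for nt in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            head = rhs[0]
            new = first[head] if head in NONTERMINALS else {head}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


def follow_sets():
    first = first_sets()
    follow = {nt: set() for nt in NONTERMINALS}
    follow["S'"].add("eof")
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            for i, sym in enumerate(rhs):
                if sym not in NONTERMINALS:
                    continue
                if i + 1 < len(rhs):      # something follows sym in this rhs
                    nxt = rhs[i + 1]
                    new = first[nxt] if nxt in NONTERMINALS else {nxt}
                else:                     # sym ends the rhs: inherit FOLLOW(lhs)
                    new = follow[lhs]
                if not new <= follow[sym]:
                    follow[sym] |= new
                    changed = True
    return follow


FOLLOW = follow_sets()


def is_lr0_adequate(state):
    """LR(0) reduces without lookahead, so a complete item must be alone."""
    complete = [item for item in state if item[1] == len(GRAMMAR[item[0]][1])]
    return not complete or len(state) == 1


def slr_actions(state):
    """SLR(1): a complete item [A -> γ•] reduces only on FOLLOW(A)."""
    actions = {}
    for p, dot in state:
        lhs, rhs = GRAMMAR[p]
        if dot < len(rhs):
            if rhs[dot] not in NONTERMINALS:
                actions.setdefault(rhs[dot], set()).add("shift")
        else:
            for a in FOLLOW[lhs]:
                actions.setdefault(a, set()).add(f"reduce {p}")
    return actions


s1 = {(0, 1), (1, 1)}          # { [S' -> S•], [S -> S• ; a] }
print(is_lr0_adequate(s1))     # False
print(slr_actions(s1))         # one action for ';' and one for 'eof'
```

Because FOLLOW(S’) contains only eof, the reduction in s1 never competes with the shift on ";", which is why the grammar is SLR(1) even though s1 is inadequate for LR(0).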

  • cs415, spring 14 Lecture 14

    26

    LALR(1) versus LR(1)

    LALR(1) ? LR(1) items, State -> Grouped LR(1) states

    LALR(1): Merge two sets of LR(1) items (states) if they have the same core.
    Core of a set of LR(1) items: the set of LR(0) items derived by ignoring the lookahead symbols

    FACT: collapsing LR(1) states into LALR(1) states cannot introduce shift/reduce conflicts

  • cs415, spring 14 Lecture 15

    27

    LALR(1) versus LR(1)

    Grammar implied by the items below: S → a A d | a B e | b A e | b B d, A → c, B → c

    s0 = Closure( { [S’ → •S, eof] } )
    s1 = Closure( GoTo(s0, a) ) = { [S → a • A d, eof], [S → a • B e, eof], [A → •c, d], [B → •c, e] }
    s2 = Closure( GoTo(s0, b) ) = { [S → b • A e, eof], [S → b • B d, eof], [A → •c, e], [B → •c, d] }
    s3 = Closure( GoTo(s1, c) ) = { [A → c•, d], [B → c•, e] }
    s4 = Closure( GoTo(s2, c) ) = { [A → c•, e], [B → c•, d] }

    There are other states that are not listed here!

    Grammar is LR(1), but not LALR(1), since collapsing s3 and s4 (same core) will introduce a reduce-reduce conflict.
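The effect of merging s3 and s4 can be reproduced with a few lines of Python. This is a sketch with hypothetical helper names (`core`, `merge_by_core`, `reduce_reduce_conflicts`); items are encoded as (lhs, rhs, dot position, lookahead).

```python
def core(state):
    """Drop the lookaheads: the set of LR(0) items underlying a state."""
    return frozenset((lhs, rhs, dot) for lhs, rhs, dot, _ in state)


def merge_by_core(states):
    """LALR(1)-style merge: union all states that share a core."""
    merged = {}
    for state in states:
        merged.setdefault(core(state), set()).update(state)
    return list(merged.values())


def reduce_reduce_conflicts(state):
    """Lookaheads on which two different productions would be reduced."""
    by_lookahead = {}
    for lhs, rhs, dot, la in state:
        if dot == len(rhs):
            by_lookahead.setdefault(la, set()).add((lhs, rhs))
    return sorted(la for la, prods in by_lookahead.items() if len(prods) > 1)


s3 = {("A", ("c",), 1, "d"), ("B", ("c",), 1, "e")}
s4 = {("A", ("c",), 1, "e"), ("B", ("c",), 1, "d")}

print(reduce_reduce_conflicts(s3), reduce_reduce_conflicts(s4))  # [] []
merged = merge_by_core([s3, s4])
print(len(merged), reduce_reduce_conflicts(merged[0]))           # 1 ['d', 'e']
```

Each LR(1) state is conflict free on its own; only the merged state asks for both A → c and B → c on the same lookahead.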

  • cs415, spring 14 Lecture 16

    28

    Hierarchy of Context-Free Grammars

    The inclusion hierarchy for context-free grammars

    •  Operator precedence includes some ambiguous grammars

    •  LL(1) is a subset of LR(1): every LL(1) grammar is LR(1), though not necessarily SLR(1) or LALR(1)

    [Figure: inclusion diagram with regions labeled Context-free grammars, Unambiguous CFGs, Operator Precedence, Floyd-Evans Parsable, LR(k), LR(1), LALR(1), SLR(1), LR(0), LL(k), and LL(1)]

    Ref Book: Michael Sipser, “Introduction to the Theory of Computation”, 3rd Edition

  • cs415, spring 14 Lecture 15

    29

    Error Recovery in Shift-Reduce Parsers

    The problem: parser encounters an invalid token
    Goal: Want to parse the rest of the file

    Basic idea (panic mode):
    →  Assume something went wrong while trying to find handle for nonterminal A
    →  Pretend handle for A has been found; pop “handle”, skip over input to find terminal that can follow A

    Restarting the parser (panic mode):
    →  find a restartable state on the stack (has transition for nonterminal A)
    →  move to a consistent place in the input (token that can follow A)
    →  perform (error) reduction (for nonterminal A)
    →  print an informative message
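The restart steps can be sketched in Python against the GOTO table of the expression grammar from the earlier example. Everything here is an illustrative assumption: the function name `panic_recover`, the FOLLOW sets (worked out by hand for that grammar), and leaving the choice of the nonterminal A to the caller. Real parser generators choose the recovery nonterminal differently.

```python
GOTO = {
    (0, "Expr"): 1, (0, "Term"): 2, (0, "Factor"): 3,
    (5, "Expr"): 7, (5, "Term"): 2, (5, "Factor"): 3,
    (6, "Term"): 8, (6, "Factor"): 3,
}
# FOLLOW sets of the expression grammar, derived by hand for this sketch.
FOLLOW = {
    "Expr": {"EOF"},
    "Term": {"-", "EOF"},
    "Factor": {"*", "-", "EOF"},
}


def panic_recover(stack, words, pos, nonterminal):
    """Return (stack, input position) after an error reduction to nonterminal.

    stack holds state numbers; words must end with "EOF".
    """
    stack = list(stack)
    # 1. Find a restartable state: one with a GOTO entry for the nonterminal.
    while stack and (stack[-1], nonterminal) not in GOTO:
        stack.pop()
    if not stack:
        raise SyntaxError("no restartable state on the stack")
    # 2. Skip input until a token that can follow the nonterminal.
    #    EOF is in every FOLLOW set above, so the scan always stops.
    while words[pos] not in FOLLOW[nonterminal]:
        pos += 1
    # 3. Pretend the handle was found: take the GOTO transition.
    stack.append(GOTO[(stack[-1], nonterminal)])
    # 4. Report the problem.
    print(f"***Error: skipped input, resuming at {words[pos]!r}")
    return stack, pos


# "ident - ident ident * ident": the second ident in a row is the invalid token.
words = ["ident", "-", "ident", "ident", "*", "ident", "EOF"]
stack, pos = panic_recover([0, 2, 5, 4], words, 3, "Term")
print(stack, pos)  # [0, 2, 5, 2] 6
```

After recovery the parser is in state 2 with EOF as the next word, so the normal driver can continue with the reductions for Expr → Term and Expr → Term – Expr.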

  • cs415, spring 14 Lecture 15

    30

    Error Recovery in YACC

    Yacc’s (bison’s) error mechanism (note: version dependent!)
    •  designated token error
    •  used in error productions of the form A → error α   // basic case
    •  α specifies synchronization points

    When error is discovered
    •  pops stack until it finds state where it can shift the error token
    •  resumes parsing to match α; special cases:
       →  α = w, where w is string of terminals: skip input until w has been read
       →  α = ε : skip input until state transition on input token is defined
    •  error productions can have actions

  • cs415, spring 14 Lecture 15

    31

    Error Recovery in YACC

    cmpdstmt  : BEG stmt_list END
    stmt_list : stmt
              | stmt_list ';' stmt
              | error { yyerror("\n***Error: illegal statement\n"); }

    This should
    •  throw out the erroneous statement
    •  synchronize at “;” or “end” (implicit: α = ε)
    •  write the message “***Error: illegal statement” to stderr

    Example: begin a & 5 | hello ; a := 3 end
    ↑ ***Error: illegal statement   (error detected in the first statement)
    ↑ resume parsing                (after synchronizing at “;”)

  • cs415, spring 14 Lecture 16

    32

    Project #2 (see “lex & yacc”, Levine et al., O’Reilly)

    →  You do have to (slightly) change the scanner (scan.l)

    →  How to specify and use attributes in YACC?
       §  Define attributes as types in attr.h
             typedef struct info_node {int a; int b;} infonode;
       §  Include type attribute name in %union in parse.y
             %union {tokentype token; infonode myinfo; … }
       §  Assign attributes in parse.y to
          –  Terminals: %token <token> ID ICONST
          –  Non-terminals: %type <myinfo> block variables procdecls cmpdstmt
       §  Accessing attribute values in parse.y
          –  use $$, $1, $2 … etc. notation:
             block : variables procdecls {$2.b = $1.b + 1;} cmpdstmt { $$.a = $1.a + $2.a + $3.b;}

  • cs415, spring 14 Lecture 16

    33

    YACC : parse.y

    %{
    /* Will be included verbatim in parse.tab.c */
    #include <stdio.h>
    #include "attr.h"
    int yylex();
    void yyerror(char * s);
    #include "symtab.h"
    %}

    /* List and assign attributes */
    %union {tokentype token; }

    %token PROG PERIOD PROC VAR ARRAY RANGE OF
    %token INT REAL DOUBLE WRITELN THEN ELSE IF
    %token BEG END ASG NOT
    %token EQ NEQ LT LEQ GEQ GT OR EXOR AND DIV
    %token ID CCONST ICONST RCONST

    %start program

    %%
    /* Rules with semantic actions */
    program : PROG ID ';' block PERIOD { }
            ;

    block   : BEG ID ASG ICONST END { }
            ;

    %%
    /* Main program and "helper" functions; may contain initialization code
       of global structures. Will be included verbatim in parse.tab.c */
    void yyerror(char* s) {
      fprintf(stderr,"%s\n",s);
    }

    int main() {
      printf("1\t");
      yyparse();
      return 1;
    }

  • cs415, spring 14 Lecture 16

    34

    Project #2 : Things to do

    •  Learn/Review the C programming language

    •  Add error productions (syntax errors)

    •  Define and assign attributes to non-terminals

    •  Implement single-level symbol table

    •  Perform type checking and produce required error messages; note: actions may occur at “any” location on right-hand side (implicit use of marker productions)

  • cs415, spring 14 Lecture 15

    35

    Next two classes

    Ad-hoc, syntax-directed translation schemes, type checking

    Read EaC: Chapters 4.1-4.4