Syntax Analysis (Part 2) - HS-KARLSRUHEsuma0002/CS4212/syntax2.pdf · The Characteristic Finite...

42
Syntax Analysis (Part 2) Martin Sulzmann Martin Sulzmann Syntax Analysis (Part 2) 1 / 42

Transcript of Syntax Analysis (Part 2) - HS-KARLSRUHEsuma0002/CS4212/syntax2.pdf · The Characteristic Finite...

Page 1: Syntax Analysis (Part 2) - HS-KARLSRUHEsuma0002/CS4212/syntax2.pdf · The Characteristic Finite State Machine (CFSM) for a grammar is a DFA which recognizes viable prefixes of right-sentential

Syntax Analysis (Part 2)

Martin Sulzmann

Martin Sulzmann Syntax Analysis (Part 2) 1 / 42

Page 2: Syntax Analysis (Part 2) - HS-KARLSRUHEsuma0002/CS4212/syntax2.pdf · The Characteristic Finite State Machine (CFSM) for a grammar is a DFA which recognizes viable prefixes of right-sentential

Bottom-Up Parsing

Idea

Build right-most derivation.

Scan input and seek for matching right hand sides.

Terminology

If S →∗ β then β is a sentential form of G.

Right sentential form = right most reduction of non-terminal symbols.

Handle = Right hand side α of a production A → α where α is partof a right sentential form S ⇒∗ β.

Martin Sulzmann Syntax Analysis (Part 2) 2 / 42

Page 3: Syntax Analysis (Part 2) - HS-KARLSRUHEsuma0002/CS4212/syntax2.pdf · The Characteristic Finite State Machine (CFSM) for a grammar is a DFA which recognizes viable prefixes of right-sentential

Example

ConsiderS → aABe (1)A → Abc (2) | b (3)B → d (4)

The sentence abbcde can be reduced to S:

Rule Sentential form

3 abbcde

2 aAbcde

4 aAde

1 aABe

S

S ⇒ aABe

⇒ aAde

⇒ aAbcde

⇒ abbcde

right-most derivation

Marked forms are the handles.

Martin Sulzmann Syntax Analysis (Part 2) 3 / 42

Page 4: Syntax Analysis (Part 2) - HS-KARLSRUHEsuma0002/CS4212/syntax2.pdf · The Characteristic Finite State Machine (CFSM) for a grammar is a DFA which recognizes viable prefixes of right-sentential

Another Example

ConsiderE → E + E | E ∗ E | (E ) | id

Right-most derivation:E ⇒ E + E ⇒ E + E ∗ E ⇒ E + E ∗ id3 ⇒ E + id2 ∗ id3 ⇒ id1 + id2 ∗ id3Consider how the input string is “reduced”:

right-sentential form Handle reducing production

id1 + id2 ∗ id3 id1 E → id

E + id2 ∗ id3 id2 E → id

E + E ∗ id3 id3 E → id

E + E ∗ E E ∗ E E → E ∗ EE + E E + E E → E + E

E

Martin Sulzmann Syntax Analysis (Part 2) 4 / 42

Page 5: Syntax Analysis (Part 2) - HS-KARLSRUHEsuma0002/CS4212/syntax2.pdf · The Characteristic Finite State Machine (CFSM) for a grammar is a DFA which recognizes viable prefixes of right-sentential

Handle Pruning (=Shift/Reduce Parsing)

Goal

Scan input, recognize handles and replace (if match found) by left-handside until we reach the start symbol.

Shift-Reduce Approach

Make use of stack to keep track of “partial” handles.

Shift:

Seek for handle.“Shift” input symbols on some stack.

Reduce:

Match for handle detected.Replace right hand by left hand side.For A → β:

“pop” |β| symbols

“pus” A onto stack

Martin Sulzmann Syntax Analysis (Part 2) 5 / 42

Page 6: Syntax Analysis (Part 2) - HS-KARLSRUHEsuma0002/CS4212/syntax2.pdf · The Characteristic Finite State Machine (CFSM) for a grammar is a DFA which recognizes viable prefixes of right-sentential

Example

ConsiderE → E + E | E ∗ E | (E ) | id

Stack Input Action

$ (id ∗ id)$ shift$( id ∗ id)$ shift$(id ∗id)$ reduce E → id

$(E ∗id)$ shift$(E∗ id)$ shift$(E ∗ id )$ reduce E → id

$(E ∗ E )$ reduce E → E ∗ E$(E )$ shift$(E ) $ reduce E → (E )$E $ accept

E ⇒ (E ) ⇒ (E ∗ E ) ⇒ (E ∗ id) ⇒ (id ∗ id)Martin Sulzmann Syntax Analysis (Part 2) 6 / 42

Page 7: Syntax Analysis (Part 2) - HS-KARLSRUHEsuma0002/CS4212/syntax2.pdf · The Characteristic Finite State Machine (CFSM) for a grammar is a DFA which recognizes viable prefixes of right-sentential

LR Parsing

Approach

Instance of shift-reduce parser where we employ FSA to recognize handles.

Method

Build a right-most derivation in reverse w ⇐∗ S .

Maintain stack of viable prefixes, i.e. forms α such that S ⇒∗ αw forsome w ∈ V ∗

t .

Shift action: Move token from input w to stack.

Reduce action: Apply grammar rule (in reverse) to top section ofstack.

Martin Sulzmann Syntax Analysis (Part 2) 7 / 42

Page 8: Syntax Analysis (Part 2) - HS-KARLSRUHEsuma0002/CS4212/syntax2.pdf · The Characteristic Finite State Machine (CFSM) for a grammar is a DFA which recognizes viable prefixes of right-sentential

Skeleton LR Parser

Algorithm

push s0

token:= next_token()

repeat

s:= top of stack

if action[s,token] = ‘‘shift si’’ then

push si

token:= next_token()

else if action[s,token] = ‘‘reduce A->beta’’ then

pop |beta| states

s’:= top of stack

push goto[s’,A]

else if action[s,token] = ‘‘accept’’ then done

else error

Martin Sulzmann Syntax Analysis (Part 2) 8 / 42

Page 9: Syntax Analysis (Part 2) - HS-KARLSRUHEsuma0002/CS4212/syntax2.pdf · The Characteristic Finite State Machine (CFSM) for a grammar is a DFA which recognizes viable prefixes of right-sentential

Example

S → E

E → T + E | TT → F ∗ T | FF → id

state action goto

id + ∗ $ E T F

0 s4 1 2 31 acc

2 s5 r33 r5 s6 r54 r6 r6 r65 s4 7 2 36 s4 8 37 r28 r4 r4

Stack Input Action$ 0 id ∗ id + id$ s4$ 0 4 ∗id + id$ r6$ 0 3 ∗id + id$ s6$ 0 3 6 id + id$ s4$ 0 3 6 4 +id$ r6$ 0 3 6 3 +id$ r5$ 0 3 6 8 +id$ r4$ 0 2 +id$ s5$ 0 2 5 id$ s4$ 0 2 5 4 $ r6$ 0 2 5 3 $ r5$ 0 2 5 2 $ r3$ 0 2 5 7 $ r2$ 0 1 $ acc

Martin Sulzmann Syntax Analysis (Part 2) 9 / 42

Page 10: Syntax Analysis (Part 2) - HS-KARLSRUHEsuma0002/CS4212/syntax2.pdf · The Characteristic Finite State Machine (CFSM) for a grammar is a DFA which recognizes viable prefixes of right-sentential

LR(k) Grammars

Idea

A grammar G is LR(k) if given a right-most derivation

S = γ0 ⇒R γ1 ⇒R . . . ⇒R γn = w

we can, for each right-sentential form in the derivation:

1 isolate the handle of each right-sentential form, and

2 determine the production by which to reduce

by scanning γi from left to right, going at most k symbols beyond theright end of the handle of γi .

Martin Sulzmann Syntax Analysis (Part 2) 10 / 42

Page 11: Syntax Analysis (Part 2) - HS-KARLSRUHEsuma0002/CS4212/syntax2.pdf · The Characteristic Finite State Machine (CFSM) for a grammar is a DFA which recognizes viable prefixes of right-sentential

LR(k) Items

LR(k) Items

The table construction algorithm uses sets of LR(k) items orconfigurations to represent the possible states in a parse.

A LR(k) item is a pair [α, β] where

α is a production from G with a • at some position in the rhs,marking how much of the rhs of a production has already been seen

β is a lookahead string containing k symbols (terminals or $)

As we will see, the two cases of interest are k=0 and k=1.

Martin Sulzmann Syntax Analysis (Part 2) 11 / 42

Page 12: Syntax Analysis (Part 2) - HS-KARLSRUHEsuma0002/CS4212/syntax2.pdf · The Characteristic Finite State Machine (CFSM) for a grammar is a DFA which recognizes viable prefixes of right-sentential

Example

The • indicates how much of an item we have seen at a given state in theparse:

[A → •XYZ ] indicates that the parser is looking for a string that can bederived from XYZ .

[A → XY • Z ] indicates that the parser has seen a string derived from XY

and is looking for one derivable from Z .

LR(0) items (no lookahead):

A → XYZ generates 4 LR(0) items:

1. [A → •XYZ ] 2. [A → X • YZ ]3. [A → XY • Z ] 4. [A → XYZ•]

Martin Sulzmann Syntax Analysis (Part 2) 12 / 42

Page 13: Syntax Analysis (Part 2) - HS-KARLSRUHEsuma0002/CS4212/syntax2.pdf · The Characteristic Finite State Machine (CFSM) for a grammar is a DFA which recognizes viable prefixes of right-sentential

The CFSM

The Characteristic Finite State Machine (CFSM) for a grammar is a DFAwhich recognizes viable prefixes of right-sentential forms (Recall: A viableprefix is any prefix that does not extend beyond the handle).

It accepts when a handle has been discovered and needs to be reduced.

To construct the CFSM we need two functions:

closure0(I) to build its states

goto0(I,X) to determine its transitions

Martin Sulzmann Syntax Analysis (Part 2) 13 / 42

Page 14: Syntax Analysis (Part 2) - HS-KARLSRUHEsuma0002/CS4212/syntax2.pdf · The Characteristic Finite State Machine (CFSM) for a grammar is a DFA which recognizes viable prefixes of right-sentential

closure0

Given an item [A → α • Bβ], its closure contains the item itself and anyother item that can generate legal substrings to follow α.

Thus, if the parser has viable prefix α on the stack, the input shouldreduce to Bβ (or γ for other item [B → •γ] in the closure).

closure0(I):

repeat

if [A → α • Bβ] ∈ I and B → γ is a rule

then add [B → •γ] to I

until no more items can be added to I

return I

Martin Sulzmann Syntax Analysis (Part 2) 14 / 42

Page 15: Syntax Analysis (Part 2) - HS-KARLSRUHEsuma0002/CS4212/syntax2.pdf · The Characteristic Finite State Machine (CFSM) for a grammar is a DFA which recognizes viable prefixes of right-sentential

goto0

goto0(I ,X ) is the closure of the set of all items [A → αX • β] such that[A → α • Xβ] ∈ I .

If I is the set of valid items fro some viable prefix γ, then goto0(I ,X ) isthe set of valid items for the viable prefix γX .

goto0(I ,X ) represents the state after recognizing X in state I .

goto0(I,X):

let J be the set of items [A → αX • β]such that [A → α • Xβ] ∈ I

return closure0(J)

Martin Sulzmann Syntax Analysis (Part 2) 15 / 42

Page 16: Syntax Analysis (Part 2) - HS-KARLSRUHEsuma0002/CS4212/syntax2.pdf · The Characteristic Finite State Machine (CFSM) for a grammar is a DFA which recognizes viable prefixes of right-sentential

Example

ConsiderP → CP | ǫC → agBe | aeB → acB | a

Assume I = {[C → ag • Be]}. Then (I omit [ and ] for simplicity)

closure0(I ) =

C → ag • Be,B → •acB,B → •a

Assume I = {[P → •], [P → •CP]}. Then

goto0(I ,C ) =

P → C • P ,P → •CP ,P → •,C → •agBe,C → •ae

Martin Sulzmann Syntax Analysis (Part 2) 16 / 42

Page 17: Syntax Analysis (Part 2) - HS-KARLSRUHEsuma0002/CS4212/syntax2.pdf · The Characteristic Finite State Machine (CFSM) for a grammar is a DFA which recognizes viable prefixes of right-sentential

Constructing the LR(0) Item Sets

We introduce a new start symbol S ′ where S ′ → S (some presentationsassume S ′ → S$.)

States = {closure0({S ′ → •S})}while we can keep making changes

for each I ∈ States and symbol X

I’ = goto0(I,X)

if I 6∈ States then States = States ∪ {I ′}

Martin Sulzmann Syntax Analysis (Part 2) 17 / 42

Page 18: Syntax Analysis (Part 2) - HS-KARLSRUHEsuma0002/CS4212/syntax2.pdf · The Characteristic Finite State Machine (CFSM) for a grammar is a DFA which recognizes viable prefixes of right-sentential

LR(0) Example

Grammar:S → E (1)E → E + T (2) | T (3)T → id (4) | (E ) (5)

CFSM:

GFED@ABC?>=<89:;8

//GFED@ABC0

E

��

T

11

(

""id //GFED@ABC?>=<89:;4 GFED@ABC5

E

��

T

OO

id

oo (

ss

GFED@ABC?>=<89:;1+ //GFED@ABC2

id

OO

T

��

(

??��������� GFED@ABC6

+oo

)

��GFED@ABC?>=<89:;3 GFED@ABC?>=<89:;7

LR(0) items:I0 : S → •E I4 : T → id•

E → •E + T I5 : T → (•E )E → •T E → •E + T

T → •id E → •TT → •(E ) T → •id

I1 : S → E• T → •(E )E → E • +T I6 : T → (E•)

I2 : E → E + •T E → E • +T

T → •id I7 : T → (E )•T → •(E ) I8 : E → T•

I3 : E → E + T•

Martin Sulzmann Syntax Analysis (Part 2) 18 / 42

Page 19: Syntax Analysis (Part 2) - HS-KARLSRUHEsuma0002/CS4212/syntax2.pdf · The Characteristic Finite State Machine (CFSM) for a grammar is a DFA which recognizes viable prefixes of right-sentential

Constructing the Parsing Table

We have constructed LR(0) items and CFSM (state i of the CFSM isconstructed from Ii ).

If A → α • aβ ∈ Ii and goto0(Ii ,a)=Ij then action(i ,a)=shift j .

If A → α• ∈ I then action(I ,a) = reduce(A → α) for all a.

If S ′ → S• ∈ I then action(I ,$) = accept.

If goto0(Ii ,A) = Ij (where A ∈ Vn) then goto(i ,A)=j .

Set undefined entries in action and goto to “error”.

Set initial state of parser to closure0([S ′ → •S ]).

Martin Sulzmann Syntax Analysis (Part 2) 19 / 42

Page 20: Syntax Analysis (Part 2) - HS-KARLSRUHEsuma0002/CS4212/syntax2.pdf · The Characteristic Finite State Machine (CFSM) for a grammar is a DFA which recognizes viable prefixes of right-sentential

LR(0) Example

GFED@ABC?>=<89:;8

//GFED@ABC0

E

��

T

11

(

""id //GFED@ABC?>=<89:;4 GFED@ABC5

E

��

T

OO

id

oo (

ss

GFED@ABC?>=<89:;1+ //GFED@ABC2

id

OO

T

��

(

??��������� GFED@ABC6

+oo

)

��GFED@ABC?>=<89:;3 GFED@ABC?>=<89:;7

state action goto

id ( ) + $ S E T

0 s4 s5 1 81 acc

2 s4 s5 33 r2 r2 r2 r2 r24 r4 r4 r4 r4 r45 s4 s5 6 86 s7 s27 r5 r5 r5 r5 r58 r3 r3 r3 r3 r3

Martin Sulzmann Syntax Analysis (Part 2) 20 / 42

Page 21: Syntax Analysis (Part 2) - HS-KARLSRUHEsuma0002/CS4212/syntax2.pdf · The Characteristic Finite State Machine (CFSM) for a grammar is a DFA which recognizes viable prefixes of right-sentential

Another Example

ConsiderS → A | Bc | DA → a | bB → a

D → bc

Is this grammar LR(0)?

Martin Sulzmann Syntax Analysis (Part 2) 21 / 42

Page 22: Syntax Analysis (Part 2) - HS-KARLSRUHEsuma0002/CS4212/syntax2.pdf · The Characteristic Finite State Machine (CFSM) for a grammar is a DFA which recognizes viable prefixes of right-sentential

Another Example (2)

Grammar:S → A | Bc | DA → a | bB → a

D → bc

LR(0) items:I0 : S ′ → •S I3 : S → B • c

S → •A I4 : S → Bc•S → •Bc I5 : S → D•S → •D I6 : A → a•A → •a B → a•A → •b I7 : A → b•B → •a D → b • cD → •bc . . .

I1 : S ′ → S•I2 : S → A•

Martin Sulzmann Syntax Analysis (Part 2) 22 / 42

Page 23: Syntax Analysis (Part 2) - HS-KARLSRUHEsuma0002/CS4212/syntax2.pdf · The Characteristic Finite State Machine (CFSM) for a grammar is a DFA which recognizes viable prefixes of right-sentential

Another Example (3)

Grammar:S → A | Bc | DA → a | bB → a

D → bc

LR(0) items:I0 : S ′ → •S I3 : S → B • c

S → •A I4 : S → Bc•S → •Bc I5 : S → D•S → •D I6 : A → a• reduceA → •a B → a• reduce conflict!A → •b I7 : A → b•B → •a D → b • cD → •bc . . .

I1 : S ′ → S•I2 : S → A•

Martin Sulzmann Syntax Analysis (Part 2) 23 / 42

Page 24: Syntax Analysis (Part 2) - HS-KARLSRUHEsuma0002/CS4212/syntax2.pdf · The Characteristic Finite State Machine (CFSM) for a grammar is a DFA which recognizes viable prefixes of right-sentential

Another Example (4)

Grammar:S → A | Bc | DA → a | bB → a

D → bc

LR(0) items:I0 : S ′ → •S I3 : S → B • c

S → •A I4 : S → Bc•S → •Bc I5 : S → D•S → •D I6 : A → a• reduceA → •a B → a• reduce conflict!A → •b I7 : A → b• reduceB → •a D → b • c shift conflict!D → •bc . . .

I1 : S ′ → S•I2 : S → A•

Above grammar is not LR(0).

Martin Sulzmann Syntax Analysis (Part 2) 24 / 42

Page 25: Syntax Analysis (Part 2) - HS-KARLSRUHEsuma0002/CS4212/syntax2.pdf · The Characteristic Finite State Machine (CFSM) for a grammar is a DFA which recognizes viable prefixes of right-sentential

LR Parsing

Short Summary

If LR(0) parsing tables contains multiply-defined action entries then G isnot LR(0).

There are two possible types of conflicts:

shift-reduce

reduce-reduce

Conflicts can be resolved by looking ahead.

Martin Sulzmann Syntax Analysis (Part 2) 25 / 42

Page 26: Syntax Analysis (Part 2) - HS-KARLSRUHEsuma0002/CS4212/syntax2.pdf · The Characteristic Finite State Machine (CFSM) for a grammar is a DFA which recognizes viable prefixes of right-sentential

SLR(1): Simple Lookahead LR

SLR(1)

Add lookaheads after building LR(0) item sets. We construct the SLR(1)parsing table as follows:

If A → α • aβ ∈ I (a ∈ Vt) and goto0(I ,a)=Ii thenaction(I ,a)=shift i .

If A → α• ∈ I then action(I ,a) = reduce(A → α) for alla ∈ Follow(A).

If S ′ → S• ∈ I then action(I ,$) = accept.

If goto0(Ii ,A) = Ij (where A ∈ Vn) then goto(i ,A)=j .

Set undefined entries in action and goto to “error”.

Set initial state of parser to closure0([S ′ → •S ]).

Martin Sulzmann Syntax Analysis (Part 2) 26 / 42

Page 27: Syntax Analysis (Part 2) - HS-KARLSRUHEsuma0002/CS4212/syntax2.pdf · The Characteristic Finite State Machine (CFSM) for a grammar is a DFA which recognizes viable prefixes of right-sentential

SLR(1) but not LR(0) Example

S → A (1) | Bc (2) | bc (3) Follow(S) = {$}A → a (4) | b (5) Follow(A) = {$}B → a (6) Follow(B) = {c}

I0 : S ′ → •S I2 S → A•S → •A I3 S → B • cS → •Bc I4 S → Bc•S → •bc I5 A → a•A → •a B → a•A → •b I6 A → b•B → •a S → b • c

I1 : S ′ → S• I7 S → Bc•

GFED@ABC?>=<89:;1 GFED@ABC?>=<89:;2

// GFED@ABC?>=<89:;6

c

��

GFED@ABC0b

oo

S

OOA

@@���������

B //

a

��

GFED@ABC3

c

��GFED@ABC?>=<89:;7 GFED@ABC?>=<89:;5 GFED@ABC?>=<89:;4

Martin Sulzmann Syntax Analysis (Part 2) 27 / 42

Page 28: Syntax Analysis (Part 2) - HS-KARLSRUHEsuma0002/CS4212/syntax2.pdf · The Characteristic Finite State Machine (CFSM) for a grammar is a DFA which recognizes viable prefixes of right-sentential

SLR(1) Parsing Table

S → A (1) | Bc (2) | bc (3) Follow(S) = {$}A → a (4) | b (5) Follow(A) = {$}B → a (6) Follow(B) = {c}

state action goto

a b c $ S A B

0 s5 s6 1 2 31 acc

2 r13 s44 r25 r6 r46 s77 r2

Martin Sulzmann Syntax Analysis (Part 2) 28 / 42

Page 29: Syntax Analysis (Part 2) - HS-KARLSRUHEsuma0002/CS4212/syntax2.pdf · The Characteristic Finite State Machine (CFSM) for a grammar is a DFA which recognizes viable prefixes of right-sentential

Limitations of SLR(1)

ConsiderS ′ → S

S → L = R | RL → ∗R | idR → L

LR(0) items:I0 : S ′ → •S I1 : S ′ → S•

S → •L = R I2 : S → L• = R shift!S → •R R → L• reduce! (=∈ Follow(R))L → • ∗ RL → •idR → •id

Martin Sulzmann Syntax Analysis (Part 2) 29 / 42

Page 30: Syntax Analysis (Part 2) - HS-KARLSRUHEsuma0002/CS4212/syntax2.pdf · The Characteristic Finite State Machine (CFSM) for a grammar is a DFA which recognizes viable prefixes of right-sentential

Limitations of SLR(1) (cont’d)

Examples shows that it can be insufficient to consider Follow sets.

Assume we are parsing id = id .

In item 2, the shift action is the only correct choice (to reduce L to R getsus nowhere).

Note that item R → L• came via S ′ ⇒ S ⇒ R ⇒ L (each symbol can onlybe “followed” by $).

Hence, we need to make an effort to remember the actual right context.

Martin Sulzmann Syntax Analysis (Part 2) 30 / 42

Page 31: Syntax Analysis (Part 2) - HS-KARLSRUHEsuma0002/CS4212/syntax2.pdf · The Characteristic Finite State Machine (CFSM) for a grammar is a DFA which recognizes viable prefixes of right-sentential

LR(1) Items Revisited

Idea

We can incorporate the extra right-context information into items.

An LR(1) item is of the form [A → α • β, t] where t ∈ Vt ∪ {$}.

An item [A → α•, a] calls for the reduction A → α only if the next inputsymbol is a.

Martin Sulzmann Syntax Analysis (Part 2) 31 / 42

Page 32: Syntax Analysis (Part 2) - HS-KARLSRUHEsuma0002/CS4212/syntax2.pdf · The Characteristic Finite State Machine (CFSM) for a grammar is a DFA which recognizes viable prefixes of right-sentential

LR(1) Items Revisited (cont’d)

closure1 with Context

closure1(I):

while new items can be added

if [A → α • Bγ, c] ∈ I and B → δ is a rule

then for each b ∈ First(γc)add [B → •δ, b] to I

goto1 with Context

goto1(I,X):

let J be the set of items [A → αX • β, a]such that [A → α • Xβ, a] ∈ I

return closure0(J)

Initial State

[S ′ → •S , $]Martin Sulzmann Syntax Analysis (Part 2) 32 / 42

Page 33: Syntax Analysis (Part 2) - HS-KARLSRUHEsuma0002/CS4212/syntax2.pdf · The Characteristic Finite State Machine (CFSM) for a grammar is a DFA which recognizes viable prefixes of right-sentential

LR(1) Parsing Table

Algorithm

For each set Ii of LR(1) items:

If [A → α • aβ] ∈ Ii and goto1(Ii ,a)=Ij then action(i ,a)=shift j .

If [A → α•, b] ∈ Ii (A 6= S ′) then action(i ,b)= reduce(A → α).

If [S ′ → S•, $] ∈ Ii then action(i ,$)= accept.

If goto1(Ii ,A) = Ij (where A ∈ Vn) then goto(i ,A)=j .

All remaining entries are set to “error”.

Initial state contains [S ′ → •S , $].

Martin Sulzmann Syntax Analysis (Part 2) 33 / 42

Page 34: Syntax Analysis (Part 2) - HS-KARLSRUHEsuma0002/CS4212/syntax2.pdf · The Characteristic Finite State Machine (CFSM) for a grammar is a DFA which recognizes viable prefixes of right-sentential

Recall Example

ConsiderS ′ → S

S → L = R | RL → ∗R | idR → L

LR(1) items:I0 : [S ′ → •S , $] I1 : [S ′ → S•, $]

[S → •L = R , $] I2 : [S → L• = R , $] no =[S → •R , $] [R → L•, $]

[L → • ∗ R ,=] = due to First(= R)

[L → •id ,=][R → •L, $]

Martin Sulzmann Syntax Analysis (Part 2) 34 / 42

Page 35: Syntax Analysis (Part 2) - HS-KARLSRUHEsuma0002/CS4212/syntax2.pdf · The Characteristic Finite State Machine (CFSM) for a grammar is a DFA which recognizes viable prefixes of right-sentential

Another Example

Grammar:S′ → S (0)S → CC (1)C → cC (2) | d (3)

state action goto

c d $ S C

0 s3 s4 1 21 acc

2 s6 s7 53 s3 s4 84 r3 r35 r16 s6 s7 97 r38 r2 r29 r2

LR(1) items:I0 : S′ → •S, $ I4 : C → d•, c | d

S → •CC , $ I5 : S → CC•, $C → •cC , c | d I6 : C → c • C , $C → •d, c | d C → •cC , $

I1 : S′ → S•, $ C → •d, $I2 : S → C • C , $ I7 : C → d•, $

C → •cC , $ I8 : C → cC•, c | dC → •d, $ I9 : C → cC•, $

I3 : C → c • C , c | dC → •cC , c | dC → •d, c | d

Martin Sulzmann Syntax Analysis (Part 2) 35 / 42

Page 36: Syntax Analysis (Part 2) - HS-KARLSRUHEsuma0002/CS4212/syntax2.pdf · The Characteristic Finite State Machine (CFSM) for a grammar is a DFA which recognizes viable prefixes of right-sentential

Another Example (2)

Grammar:S′ → S (0)S → CC (1)C → cC (2) | d (3)

state action goto

c d $ S C

0 s3 s4 1 21 acc

2 s6 s7 53 s3 s4 84 r3 r35 r16 s6 s7 97 r38 r2 r29 r2

LR(1) items:I0 : S′ → •S, $ I4 : C → d•, c | d

S → •CC , $ I5 : S → CC•, $C → •cC , c | d I6 : C → c • C , $C → •d, c | d C → •cC , $

I1 : S′ → S•, $ C → •d, $I2 : S → C • C , $ I7 : C → d•, $

C → •cC , $ I8 : C → cC•, c | dC → •d, $ I9 : C → cC•, $

I3 : C → c • C , c | dC → •cC , c | dC → •d, c | d

I3 and I6 have the same “core”!

Martin Sulzmann Syntax Analysis (Part 2) 36 / 42

Page 37: Syntax Analysis (Part 2) - HS-KARLSRUHEsuma0002/CS4212/syntax2.pdf · The Characteristic Finite State Machine (CFSM) for a grammar is a DFA which recognizes viable prefixes of right-sentential

LALR: Lookahead LR

Idea

Define the core of a set of LR(1) items to be the set of LR(0) itemsderived by ignoring the lookahead symbols.

E.g. the two sets

{[C → c • C , c | d ], [C → •cC , c | d ], [C → •d , c | d ]}

{[C → c • C , $], [C → •cC , $], [C → •d , $]}

have the same core.

Key idea: If two sets of LR(1) items, Ii and Ij , have the same core, we canmerge the states that represent them in the action and goto tables.

Martin Sulzmann Syntax Analysis (Part 2) 37 / 42

Page 38: Syntax Analysis (Part 2) - HS-KARLSRUHEsuma0002/CS4212/syntax2.pdf · The Characteristic Finite State Machine (CFSM) for a grammar is a DFA which recognizes viable prefixes of right-sentential

LALR(1) algorithm

Algorithm

Construct LR(1) items.

Find all cores among sets of LR(1) items, replace these sets by theirunion, update goto function (incrementally).

For each IiIf [A → α • aβ, b] ∈ Ii and goto1(Ii ,a)=Ij then action(i ,a)=shift j .If [A → α•, b] ∈ Ii (A 6= S ′) then action(i ,b)= reduce(A → α).If [S ′ → S•, $] ∈ Ii then action(i ,$)= accept.

If goto1(Ii ,A) = Ij (where A ∈ Vn) then goto(i ,A)=j .

All remaining entries are set to “error”.Initial state = closure1({[S ′ → •S , $]}).More efficient LALR construction possible, see textbook.

Martin Sulzmann Syntax Analysis (Part 2) 38 / 42

Page 39: Syntax Analysis (Part 2) - HS-KARLSRUHEsuma0002/CS4212/syntax2.pdf · The Characteristic Finite State Machine (CFSM) for a grammar is a DFA which recognizes viable prefixes of right-sentential

Recall Example

Grammar:S′ → S (0)S → CC (1)C → cC (2) | d (3)

state action goto

c d $ S C

0 s3 s4 1 21 acc

2 s6 s7 53 s3 s4 84 r3 r35 r16 s6 s7 97 r38 r2 r29 r2

LR(1) items:I0 : S′ → •S, $ I4 : C → d•, c | d

S → •CC , $ I5 : S → CC•, $C → •cC , c | d I6 : C → c • C , $C → •d, c | d C → •cC , $

I1 : S′ → S•, $ C → •d, $I2 : S → C • C , $ I7 : C → d•, $

C → •cC , $ I8 : C → cC•, c | dC → •d, $ I9 : C → cC•, $

I3 : C → c • C , c | dC → •cC , c | dC → •d, c | d

Martin Sulzmann Syntax Analysis (Part 2) 39 / 42

Page 40: Syntax Analysis (Part 2) - HS-KARLSRUHEsuma0002/CS4212/syntax2.pdf · The Characteristic Finite State Machine (CFSM) for a grammar is a DFA which recognizes viable prefixes of right-sentential

Recall Example (2)

Grammar:S′ → S (0)S → CC (1)C → cC (2) | d (3)

state action goto

c d $ S C

0 s3 s4 1 21 acc

2 s6 s7 53 s3 s4 84 r3 r35 r16 s6 s7 97 r38 r2 r29 r2

LR(1) items:I0 : S′ → •S, $ I4 : C → d•, c | d

S → •CC , $ I5 : S → CC•, $C → •cC , c | d I6 : C → c • C , $C → •d, c | d C → •cC , $

I1 : S′ → S•, $ C → •d, $I2 : S → C • C , $ I7 : C → d•, $

C → •cC , $ I8 : C → cC•, c | dC → •d, $ I9 : C → cC•, $

I3 : C → c • C , c | dC → •cC , c | dC → •d, c | d

Merged states:I36 : C → c • C , c | d | $

C → •cC , c | d | $C → •d, c | d | $

I47 : C → d•, c | d | $I89 : C → cC•, c | d | $

Martin Sulzmann Syntax Analysis (Part 2) 40 / 42

Page 41: Syntax Analysis (Part 2) - HS-KARLSRUHEsuma0002/CS4212/syntax2.pdf · The Characteristic Finite State Machine (CFSM) for a grammar is a DFA which recognizes viable prefixes of right-sentential

Recall Example (3)

Grammar:S′ → S (0)S → CC (1)C → cC (2) | d (3)

Updated table:state action goto

c d $ S C

0 s36 s47 1 21 acc

2 s36 s47 536 s36 s47 8947 r3 r3 r35 r189 r2 r2 r2

LR(1) items:I0 : S′ → •S, $ I4 : C → d•, c | d

S → •CC , $ I5 : S → CC•, $C → •cC , c | d I6 : C → c • C , $C → •d, c | d C → •cC , $

I1 : S′ → S•, $ C → •d, $I2 : S → C • C , $ I7 : C → d•, $

C → •cC , $ I8 : C → cC•, c | dC → •d, $ I9 : C → cC•, $

I3 : C → c • C , c | dC → •cC , c | dC → •d, c | d

Merged states:I36 : C → c • C , c | d | $

C → •cC , c | d | $C → •d, c | d | $

I47 : C → d•, c | d | $I89 : C → cC•, c | d | $

Martin Sulzmann Syntax Analysis (Part 2) 41 / 42

Page 42: Syntax Analysis (Part 2) - HS-KARLSRUHEsuma0002/CS4212/syntax2.pdf · The Characteristic Finite State Machine (CFSM) for a grammar is a DFA which recognizes viable prefixes of right-sentential

Summary

LR more powerful than LL.

ANTLR is a “pretty cool” Java-based LL parser.

LALR good enough in practice (see yacc).

Precedence can also be used to resolve shift/reduce conflicts.E.g. higher precedence ⇒ shift, left associative ⇒ reduce.

Error recovery important (but no time!), please consult text book.

Martin Sulzmann Syntax Analysis (Part 2) 42 / 42