LL and LR Parsing
Lecture 6
February 5, 2018

kjleach.eecs.umich.edu/c18/l6.pdf

Context-free Grammars

A context-free grammar consists of

• A set of non-terminals N
  • Written in uppercase throughout these notes
• A set of terminals T, composed of tokens
  • Lowercase or punctuation throughout these notes
• A start symbol S (a non-terminal)
• A set of productions (rewrite rules)

Assuming E ∈ N, each production has the form
E → ε, or
E → Y1 Y2 ... Yn where Yi ∈ N ∪ T

Compiler Construction 2/49


Context-free?

Production rules hint at expressiveness!

Regular             A → aB, C → ε
Context-free        A → α
Context-sensitive   αAβ → αγβ
Type-0              α → β

α, β, γ ∈ {N ∪ T}*

“What just happened? We must be missing some context...”


Parsing and Context-free Grammars

• Lexical Analysis
  • Regular expressions specify a regular language containing strings of characters (lexemes) that correspond to a token
• Parsing
  • Context-free grammars specify a context-free language containing strings of tokens that correspond to a grammatical rule (production)


Generativeness

• Regular expressions and context-free grammars are generative
  • You can generate every string in the language using the regex or grammar!


Generating Strings

• Consider the regex: ab*a
  • You can generate aa, aba, abba, abbba, ...
• Consider the context-free grammar:

E → ( E ) E
  | ε

• You can generate ε, (), (()), (())(), ...
• Generating strings with a grammar can be thought of as creating a parse tree!
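The same grammar can also be read in the other direction, as a recognizer. The following is a minimal sketch in C, not taken from the slides; the names parse_E and in_language are made up for illustration. Each call to parse_E mirrors one use of a production of E → ( E ) E | ε.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: consumes the prefix of s derivable from E and
   returns a pointer just past it, or NULL on a syntax error. */
static const char *parse_E(const char *s) {
    assert(s != NULL);
    if (*s != '(') {
        return s;                  /* E → ε: consume nothing */
    }
    s = parse_E(s + 1);            /* E → ( E ) E: the inner E */
    if (s == NULL || *s != ')') {
        return NULL;               /* missing closing parenthesis */
    }
    return parse_E(s + 1);         /* the trailing E */
}

/* Returns 1 if the whole string is in L(E), 0 otherwise. */
int in_language(const char *s) {
    const char *end = parse_E(s);
    return end != NULL && *end == '\0';
}
```

For example, in_language("(())()") returns 1 and in_language("(()") returns 0. The call tree of parse_E has the same shape as the parse tree of the input.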


Language membership

• We care about whether an input string of tokens is syntactically correct (i.e., obeys our language’s grammar)
• So far, we have looked at theoretical implications of grammars

L(G) = { a1 ... an | S →* a1 ... an }

For an input string x, is x ∈ L(G)?

Parsing part 1: We need a yes/no answer!


Language membership

S → a B
  | b C
B → b b C
C → c c

What strings are in this language? (Hint: there are only two!)
If my input string is “dabc”, we ask: can the grammar generate this string? (No)

• N.B. From a theoretical perspective it doesn’t matter how we decide; that’s the job of the parsing algorithm!


Parsing Algorithms

• LL (top down)
  • Reads input from left to right and uses leftmost derivations to construct a parse tree
• LR (bottom up)
  • Reads input from left to right and uses rightmost derivations (traced in reverse) to construct a parse tree
• Both algorithms are driven by the input grammar and the input to be parsed.


Parsing Algorithm Intuition

• You start with a sequence of tokens, t1 t2 t3 t4 t5
  • and also a grammar!
• Two general approaches to constructing the parse tree
  • top-down parsing is when you predict the grammatical rule used to produce the tokens seen so far
  • bottom-up parsing is when you consider tokens one at a time until you match a grammatical rule


Top Down Parsing

S → a B c
B → C x B
B → ε
C → d
  | a B c

Input string: “adxdxc”

The parse tree grows from the root S downward, one expansion per step:

1. S → a B c      (tree fringe: a B c)
2. B → C x B      (tree fringe: a C x B c)
3. C → d          (tree fringe: a d x B c)
4. B → C x B      (tree fringe: a d x C x B c)
5. C → d          (tree fringe: a d x d x B c)
6. B → ε          (tree fringe: a d x d x c)


Bottom-up Parsing

S → a B c
B → C x B
B → ε
C → d
  | a B c

Input string: “adxdxc”

Tokens right now, after each step:

a          (read a)
ad         (read d)
aC         (C → d)
aCx        (read x)
aCxd       (read d)
aCxC       (C → d)
aCxCx      (read x)
aCxCxε     (ε before c)
aCxCxB     (B → ε)
aCxB       (B → C x B)
aB         (B → C x B)
aBc        (read c)
S          (S → a B c)


LL(k) parsing

An LL parser reads tokens from left to right and constructs a top-down leftmost derivation. LL(k) parsing predicts which production rule to use from k tokens of lookahead. LL(1) parsing is a special case using one token of lookahead. LL(1) parsing is fast and easy, but does not work if the grammar is ambiguous, left-recursive, or non-left-factored.


General LL(1) Algorithm

• Process 1 token at a time
• Consider a ‘current’ non-terminal symbol, start with S
• While input is not empty
  • Given the next token (t) and the ‘current’ non-terminal N, choose a rule R of the form N → α
  • For each element X in rule R from left to right
    • If X is a non-terminal, ‘expand’ X by recursing! Set ‘current’ to X and consider the same token t.
    • If X is a terminal, check whether t matches. If it matches, consume t from input and loop; otherwise report an error.
• Note the need for particular types of grammars! What if we have a rule S → Sα?


Recursive Descent Parsing

• Recursive Descent Parsing can parse LL(k) grammars with backtracking
  • We can use RDP to parse LL(1) grammars by recursing through the rules of the grammar based upon the next available token
• Intuition: Construct mutually-recursive functions that consume tokens according to the grammar rules!
• TL;DR “Try all productions exhaustively, backtrack”


Recursive Descent Parsing

E → T + E | T
T → ( E ) | int | int * T

Input: int * int

1. Try E0 → T1 + E2
2. Try T1 → ( E3 )
   • Nope! Token ‘int’ does not match ‘(’ in T1 → ( E3 )
3. Try T1 → int. Match!
   • But the next token ‘*’ does not match ‘+’ from E0
4. Try T1 → int * T2
   • Matches ‘int’, but ‘+’ from E0 remains unmatched
5. Exhausted choices for T1, so we backtrack to E0


Recursive Descent Parsing (2)

E → T + E | T
T → ( E ) | int | int * T

Input: int * int

6. Try E0 → T1
7. Exhaustively try T1 → α productions
   • Succeed with T1 → int * T2 and T2 → int

E → T → int * T → int * int
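The exhaustive search in steps 1 through 7 can be sketched in C as follows. This is one possible realization, not the lecture's code: instead of undoing choices explicitly, each function returns the set of all positions at which its non-terminal could stop, which amounts to trying every production. The names posset, match, from_all and accepts, and the single-character token encoding ('i' for int), are assumptions made for this sketch.

```c
#include <assert.h>
#include <string.h>

typedef unsigned long posset;      /* bit p set: a parse may stop at index p */
enum { MAX_TOKENS = 30 };

static const char *toks;           /* token string; 'i' stands for int */

static posset parse_E(size_t pos);
static posset parse_T(size_t pos);

/* Advance every position in `from` whose next token is c. */
static posset match(char c, posset from) {
    posset out = 0;
    for (size_t p = 0; toks[p] != '\0'; p++)
        if ((from >> p & 1UL) && toks[p] == c) out |= 1UL << (p + 1);
    return out;
}

/* Run a non-terminal from every position in `from`. */
static posset from_all(posset (*nt)(size_t), posset from) {
    posset out = 0;
    for (size_t p = 0; p <= MAX_TOKENS; p++)
        if (from >> p & 1UL) out |= nt(p);
    return out;
}

static posset parse_T(size_t pos) {
    posset here  = 1UL << pos;
    posset paren = match(')', from_all(parse_E, match('(', here))); /* T → ( E ) */
    posset lone  = match('i', here);                                /* T → int */
    posset prod  = from_all(parse_T, match('*', lone));             /* T → int * T */
    return paren | lone | prod;
}

static posset parse_E(size_t pos) {
    posset t   = parse_T(pos);
    posset sum = from_all(parse_E, match('+', t));                  /* E → T + E */
    return sum | t;                                                 /* E → T */
}

/* Returns 1 if some sequence of choices consumes the whole input. */
int accepts(const char *tokens) {
    size_t n = strlen(tokens);
    assert(n <= MAX_TOKENS);
    toks = tokens;
    return (int)(parse_E(0) >> n & 1UL);
}
```

With this encoding, accepts("i*i") corresponds to the input int * int above. The sketch terminates because the grammar is not left-recursive: every recursive call happens only after at least one token has been matched or from a strictly smaller alternative.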


Recursive Descent Parsing

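S → a B
  | b C
B → b b C
C → c c

#include <stdio.h>
#include <stdlib.h>

/* The helpers below are assumed; the slide leaves them unspecified. */
static const char *input;                    /* remaining input */
char next_char(void) { return *input; }      /* peek at the next character */
void error(void) { fprintf(stderr, "parse error\n"); exit(1); }
void consume(char c) { if (*input == c) input++; else error(); }

void B(void);
void C(void);

/* S → a B | b C */
void S(void) {
    if (next_char() == 'a')      { consume('a'); B(); }
    else if (next_char() == 'b') { consume('b'); C(); }
    else                         { error(); }
}

/* B → b b C */
void B(void) {
    if (next_char() == 'b') { consume('b'); consume('b'); C(); }
    else                    { error(); }
}

/* C → c c */
void C(void) {
    if (next_char() == 'c') { consume('c'); consume('c'); }
    else                    { error(); }
}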


Recursive Descent Parsing

T → line_number \n B
  | ε
B → if \n T
  | else \n T
  | class \n C
  | string \n C
C → text \n T

That’s right, subsequent assignments PA3 through PA6 provide inputs that can be parsed through recursive descent!


Recursive Descent Parsing

Observations

• At any given moment, the fringe of the parse tree is: t1 t2 ... tk A ...
  • Try all productions for A: if A → BC is a production, the new fringe is t1 t2 ... tk B C ...
  • Backtrack when the fringe does not match the input string


What Could Go Wrong?


Recursive Descent Failure

S → S a

void S() {
    S();    /* recurses before consuming any input, so it never returns */
    if (next_char() == 'a') { consume('a'); }
}


Eliminating Left Recursion

• Left-recursive grammars have some non-terminal S such that

S →+ S α

Recursive Descent (and LL(k)) parsers cannot parse left-recursive grammars!


Eliminating Left Recursion

Consider the left-recursive grammar:

S → S α | β

S generates all strings starting with β followed by any number (possibly zero) of α

Rewrite using right-recursion

S → β T
T → α T | ε


Concrete Left Recursion Elimination

S → 1 | S 0

Can be rewritten as

S → 1 T
T → 0 T | ε
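Once rewritten, the grammar can be handled by plain recursive descent, because every recursive call is preceded by consuming a token. A minimal sketch in C under that assumption; the function names parse_S, parse_T and matches_S are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* T → 0 T | ε */
static const char *parse_T(const char *s) {
    if (*s == '0') return parse_T(s + 1);   /* consume one 0, then recurse */
    return s;                               /* T → ε */
}

/* S → 1 T */
static const char *parse_S(const char *s) {
    assert(s != NULL);
    if (*s != '1') return NULL;             /* every string must start with 1 */
    return parse_T(s + 1);
}

/* Returns 1 if the whole string is derivable from S, 0 otherwise. */
int matches_S(const char *s) {
    const char *end = parse_S(s);
    return end != NULL && *end == '\0';
}
```

Here matches_S("1000") returns 1 while matches_S("01") returns 0. A direct transcription of the original S → 1 | S 0 would call itself before reading anything and never terminate.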


More Left Recursion Elimination

In general

S → S α1 | ... | S αn | β1 | ... | βm

All strings derived from S start with one of β1, ..., βm and continue with zero or more instances of α1, ..., αn.

Rewrite as

S → β1 T | ... | βm T
T → α1 T | ... | αn T | ε


Recursive Descent Summary

• Simple and general parsing strategy
  • Left-recursion must be eliminated first!
  • There’s an algorithm for that
• Requires significant backtracking
  • Backtracking is avoidable for some grammars!


LL(1) Predictive Parsing

• LL(1) parsing assumes that for each non-terminal and token there is only one production that could lead to success
  • This sounds deterministic! We can use a table-based approach like with lexing
  • One dimension for current non-terminal to expand
  • One dimension for next token seen on the input
  • Each table entry contains one production


Predictive Parsing and Left Factoring

S → a B
  | b C
B → b b C
C → c c

vs.

E → T + E | T
T → int | int * T | ( E )

• First grammar: Easy! One token → One rule
• Second grammar: Hard! Two T productions start with ‘int’
• We must left-factor before using LL(1) predictive parsing


Left Factoring

E → T + E | T
T → int | int * T | ( E )

Factor out the common prefixes of production rules

E → T X
X → + E | ε
T → ( E ) | int Y
Y → * T | ε
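After left factoring, one token of lookahead is enough to pick a production, so a recursive descent parser needs no backtracking. The sketch below illustrates this in C; it is not from the slides, and the names expect, parse and the 'i' encoding for the int token are assumptions.

```c
#include <assert.h>
#include <stddef.h>

static const char *p;      /* next unread token; 'i' stands for int */
static int ok;             /* cleared on the first syntax error */

static void E(void);
static void T(void);

/* Consume token c, or record an error if the next token differs. */
static void expect(char c) { if (ok && *p == c) p++; else ok = 0; }

static void X(void) {                  /* X → + E | ε */
    if (ok && *p == '+') { expect('+'); E(); }
}

static void Y(void) {                  /* Y → * T | ε */
    if (ok && *p == '*') { expect('*'); T(); }
}

static void T(void) {                  /* T → ( E ) | int Y */
    if (!ok) return;
    if (*p == '(')      { expect('('); E(); expect(')'); }
    else if (*p == 'i') { expect('i'); Y(); }
    else                { ok = 0; }
}

static void E(void) { T(); X(); }      /* E → T X */

/* Returns 1 if tokens form a complete E, 0 otherwise. */
int parse(const char *tokens) {
    assert(tokens != NULL);
    p = tokens;
    ok = 1;
    E();
    return ok && *p == '\0';
}
```

Each function inspects only *p before committing to a production. In this sketch the ε alternatives of X and Y are taken whenever the lookahead is not '+' or '*'; any token that should not follow is then caught by the caller or by the final end-of-input check.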


Parse Tables!

• Parse tables are a fast implementation of LL(1) parsers
  • N.B. LL(1) grammars represent a subset of context-free grammars
  • Restrict ambiguities in resolving rules to make a table possible!
• Table T is 2-dimensional: T[A][t] = A → Y1 Y2 ... Ym means “when you are expanding non-terminal A and see token t, start considering A → Y1 Y2 ... Ym”


Parse Tables!

E → T X
X → + E | ε
T → ( E ) | int Y
Y → * T | ε

LL(1) Parsing Table ($ means end of input)

        int       *       +       (         )       $
T       int Y                     ( E )
E       T X                       T X
X                         + E               ε       ε
Y                 * T     ε                 ε       ε

• T[E][int] = T X
  • Interpretation: “If I’m considering nonterminal E and I see ‘int’, follow production E → T X”
• T[Y][+] = ε
  • Interpretation: “If I’m considering nonterminal Y and I see a ‘+’, get rid of the Y”
• Blank entries indicate errors! Consider T[E][*]
  • Interpretation: “There is no way to derive a string starting with * from non-terminal E.”


Using Parse Tables

• Much like recursive descent
• For each non-terminal S
  • Look at next token a
  • Choose production shown in T[S][a]
• We use a stack to track pending non-terminals
• Reject when we encounter an error state (a blank)
• Accept when we encounter an end-of-input


LL(1) Predictive Parsing with Table

push($);    // we succeed if we get to the end
push(S);    // start symbol
do {
    X = pop();
    if (X == $) {
        // accept only if the input is exhausted as well
        if (next_token() == $) { accept(); } else { error(); }
    } else if (is_terminal(X)) {
        if (X == next_token()) { consume(next_token()); }
        else { error(); }
    } else {
        // X is a non-terminal
        if (T[X][next_token()] == "X → Y1 Y2 ... Ym") {
            // push the right-hand side in reverse so Y1 ends up on top
            push(Ym); ... push(Y2); push(Y1);
        } else { error(); }
    }
} while (X != $);
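The pseudocode above can be made concrete for the grammar E → T X, X → + E | ε, T → ( E ) | int Y, Y → * T | ε. The following C sketch is one possible rendering, under these assumptions: tokens are single characters with 'i' standing for int, the table is written as a function named table rather than a 2-D array, and ll1_parse and STACK_MAX are invented names.

```c
#include <assert.h>
#include <string.h>

enum { STACK_MAX = 256 };

/* T[A][t]: right-hand side to push, "" for ε, NULL for a blank (error). */
static const char *table(char nonterminal, char t) {
    switch (nonterminal) {
    case 'E': return (t == 'i' || t == '(') ? "TX" : NULL;
    case 'T': return t == 'i' ? "iY" : t == '(' ? "(E)" : NULL;
    case 'X': return t == '+' ? "+E" : (t == ')' || t == '$') ? "" : NULL;
    case 'Y': return t == '*' ? "*T"
                   : (t == '+' || t == ')' || t == '$') ? "" : NULL;
    default:  return NULL;
    }
}

static int is_nonterminal(char x) {
    return x != '\0' && strchr("EXTY", x) != NULL;
}

/* tokens: e.g. "i*i"; the end marker $ is supplied by the parser itself. */
int ll1_parse(const char *tokens) {
    char stack[STACK_MAX];
    size_t top = 0;
    stack[top++] = '$';
    stack[top++] = 'E';                        /* start symbol */
    for (;;) {
        char x = stack[--top];
        char t = *tokens ? *tokens : '$';      /* next token, or $ at the end */
        if (x == '$') return t == '$';         /* accept only at end of input */
        if (!is_nonterminal(x)) {              /* terminal: must match input */
            if (x != t) return 0;
            tokens++;
            continue;
        }
        const char *rhs = table(x, t);
        if (rhs == NULL) return 0;             /* blank entry: reject */
        size_t n = strlen(rhs);
        assert(top + n <= STACK_MAX);
        while (n > 0) stack[top++] = rhs[--n]; /* push Ym ... Y1, Y1 on top */
    }
}
```

Calling ll1_parse("i*i") walks through the stack contents E $, T X $, int Y X $, Y X $, * T X $, T X $, int Y X $, Y X $, X $, $ and then accepts.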


Stack          Input          Action
E $            int * int $    T X
T X $          int * int $    int Y
int Y X $      int * int $    consume
Y X $          * int $        * T
* T X $        * int $        consume
T X $          int $          int Y
int Y X $      int $          consume
Y X $          $              ε
X $            $              ε
$              $              ACCEPT


LL(1) Languages

• LL(1) languages can be LL(1) parsed
  • Formally, a language Q is LL(1) if there exists an LL(1) table such that the LL(1) parsing algorithm using that table accepts exactly the strings in Q.
• No table entry can be multiply defined
  • This restricts the grammar!
• Once we construct the table
  1. The parsing algorithm is simple and fast
  2. No backtracking is necessary
• Wouldn’t it be nice to generate a parsing table from a CFG?


FIRST and FOLLOW sets

• FIRST(α) is the set of all terminal symbols that can begin some derivation starting with α

α → ... → a β

FIRST(α) = { a ∈ T | α →* a β } ∪ { ε | α →* ε }

Example:

S → a | b S c

FIRST(S) = {a, b}


Example FIRST sets

S → a S e | S T T
T → R S e | Q
R → x S x | ε
Q → S T | ε

FIRST(S) = ?
FIRST(T) = ?
FIRST(R) = ?
FIRST(Q) = ?


FOLLOW sets

• FOLLOW(A) is the set of terminals (including $) that can follow a non-terminal A

FOLLOW(A) = { a ∈ T | S →+ ... A a ... } ∪ { $ | S →+ ... A }

• Compute FIRST sets for all non-terminals
• Add $ to FOLLOW(S) (the start symbol always ends with end-of-input)
• For all productions Y → ... X A1 ... An
  • For i = 1, ..., n: add FIRST(Ai) − {ε} to FOLLOW(X). Stop if ε ∉ FIRST(Ai).
  • If the loop never stopped (every Ai can derive ε, or n = 0), add FOLLOW(Y) to FOLLOW(X)
• Repeat until no FOLLOW set changes


Example FOLLOW Set

E → T X
X → + E | ε
T → ( E ) | int Y
Y → * T | ε

FOLLOW(“+”) = { int, ( }
FOLLOW(“(”) = { int, ( }
FOLLOW(X) = { $, ) }
FOLLOW(Y) = { +, ), $ }
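The FIRST and FOLLOW rules can be run mechanically as a fixed-point iteration. The C sketch below does this for the grammar above; it is an illustration rather than the course's reference code. Sets are stored as bitmasks, 'i' stands for int, and the names TERMS, EPS, first_of and compute_sets are assumptions introduced here.

```c
#include <assert.h>
#include <string.h>

static const char *TERMS = "i*+()$";          /* 'i' stands for int */
static const unsigned EPS = 1u << 6;          /* marks ε inside a FIRST set */

struct prod { char lhs; const char *rhs; };   /* rhs "" means ε */
static const struct prod G[] = {
    {'E', "TX"}, {'X', "+E"}, {'X', ""},
    {'T', "(E)"}, {'T', "iY"}, {'Y', "*T"}, {'Y', ""},
};
enum { NPROD = sizeof G / sizeof G[0] };

static unsigned first[128], follow[128];      /* indexed by non-terminal char */

static unsigned bit(char t) {
    const char *p = (t != '\0') ? strchr(TERMS, t) : NULL;
    assert(p != NULL);                        /* t must be a known terminal */
    return 1u << (p - TERMS);
}

static int is_nt(char c) { return c >= 'A' && c <= 'Z'; }

/* FIRST of a symbol string, using the current FIRST estimates. */
static unsigned first_of(const char *s) {
    unsigned out = 0;
    for (; *s; s++) {
        unsigned f = is_nt(*s) ? first[(int)*s] : bit(*s);
        out |= f & ~EPS;
        if (!(f & EPS)) return out;           /* this symbol cannot vanish */
    }
    return out | EPS;                         /* every symbol can derive ε */
}

void compute_sets(void) {
    int changed = 1;
    memset(first, 0, sizeof first);
    memset(follow, 0, sizeof follow);
    follow['E'] = bit('$');                   /* $ follows the start symbol */
    while (changed) {                         /* iterate to a fixed point */
        changed = 0;
        for (int k = 0; k < NPROD; k++) {
            char a = G[k].lhs;
            unsigned f = first[(int)a] | first_of(G[k].rhs);
            if (f != first[(int)a]) { first[(int)a] = f; changed = 1; }
            for (const char *x = G[k].rhs; *x; x++) {
                if (!is_nt(*x)) continue;
                unsigned rest = first_of(x + 1);   /* what can come after x */
                unsigned w = follow[(int)*x] | (rest & ~EPS);
                if (rest & EPS) w |= follow[(int)a];
                if (w != follow[(int)*x]) { follow[(int)*x] = w; changed = 1; }
            }
        }
    }
}
```

After compute_sets(), follow['X'] holds the bits for $ and ), and follow['Y'] holds +, ) and $, which agrees with the FOLLOW(X) and FOLLOW(Y) listed above. FOLLOW is only tracked for non-terminals in this sketch.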


Back to Parsing Tables

• Recall: We want to build an LL(1) Parsing Table

For each production A → α in G do:
• For each terminal b ∈ FIRST(α) do
  • T[A][b] = α
• If α →* ε, for each b ∈ FOLLOW(A) do
  • T[A][b] = α
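The construction loop can be sketched in C as follows, using the grammar E → T X, X → + E | ε, T → ( E ) | int Y, Y → * T | ε. To keep the block short, the FIRST and FOLLOW sets are written in by hand as strings instead of being computed; FOLLOW(E) = {$, )} and FOLLOW(T) = {+, $, )} were worked out by hand for this sketch. All identifiers (RULES, put, build_table, entry) and the 'i' encoding for int are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static const char *TERMS = "i*+()$";   /* column order; 'i' stands for int */
static const char *NTS   = "TEXY";     /* row order */

struct rule {
    char lhs;
    const char *rhs;      /* "" means ε */
    const char *first;    /* FIRST(rhs) without ε */
    int nullable;         /* 1 if rhs →* ε */
    const char *follow;   /* FOLLOW(lhs) */
};

static const struct rule RULES[] = {
    {'E', "TX",  "i(", 0, "$)"},
    {'X', "+E",  "+",  0, "$)"},
    {'X', "",    "",   1, "$)"},
    {'T', "(E)", "(",  0, "+$)"},
    {'T', "iY",  "i",  0, "+$)"},
    {'Y', "*T",  "*",  0, "+$)"},
    {'Y', "",    "",   1, "+$)"},
};

static const char *table[4][6];        /* NULL marks a blank (error) entry */

static const char **cell(char a, char b) {
    const char *row = strchr(NTS, a), *col = strchr(TERMS, b);
    assert(a != '\0' && b != '\0' && row != NULL && col != NULL);
    return &table[row - NTS][col - TERMS];
}

/* Store rhs in T[a][b]; returns 0 if a different rule is already there. */
static int put(char a, char b, const char *rhs) {
    const char **c = cell(a, b);
    if (*c != NULL && strcmp(*c, rhs) != 0) return 0;
    *c = rhs;
    return 1;
}

/* Returns 1 if no entry is multiply defined, i.e. the grammar is LL(1). */
int build_table(void) {
    int ok = 1;
    for (size_t r = 0; r < 4; r++)
        for (size_t c = 0; c < 6; c++) table[r][c] = NULL;
    for (size_t k = 0; k < sizeof RULES / sizeof RULES[0]; k++) {
        const struct rule *p = &RULES[k];
        for (const char *b = p->first; *b; b++)
            ok &= put(p->lhs, *b, p->rhs);          /* b ∈ FIRST(α) */
        if (p->nullable)
            for (const char *b = p->follow; *b; b++)
                ok &= put(p->lhs, *b, p->rhs);      /* b ∈ FOLLOW(A) */
    }
    return ok;
}

const char *entry(char a, char b) { return *cell(a, b); }
```

After build_table(), entry('Y', '*') is "*T", entry('Y', '+') is the empty string (ε), and entry('E', '*') is NULL (a blank). A return value of 0 from build_table would signal a multiply defined entry.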


Parsing Table

E → T X
X → + E | ε
T → ( E ) | int Y
Y → * T | ε

Where do we put Y → * T?

• Well, FIRST(* T) = {*}, thus column * of row Y gets * T

Where do we put Y → ε?

• Well, FOLLOW(Y) = {$, +, )}, thus columns $, +, and ) in row Y get Y → ε

        int       *       +       (         )       $
T       int Y                     ( E )
E       T X                       T X
X                         + E               ε       ε
Y                 * T     ε                 ε       ε


Notes on LL(1) Parsing Tables

• If any entry is multiply defined then G is not LL(1). Possible causes:
  • G is ambiguous
  • G is left-recursive
  • G is not left-factored


Ambiguity in parse tables

E → E + T
E → T
T → T * F
T → F
F → id
F → ( E )

For the E productions, we need FIRST(T) = {(, id} and FIRST(E) = {(, id}

But now, which rule (E → E + T or E → T) gets put in T[E][(] and T[E][id]?

        +       *       (       )       id      $
E                       ?               ?
T
F


Simple Parsing Strategies

• Recursive Descent Parsing
  • Backtracking is annoying, BUT super useful for PA3-6
• Predictive Parsing a.k.a. LL(k)
  • Predict production from k tokens of lookahead
  • Build LL(1) table
  • Parsing is now fast and easy!
• Next up, LR Parsing, a more powerful strategy for parsing non-LL(1) grammars
