Parsing - acm.sjtu.edu.cn · Grammar for Balanced Braces B -> ε B -> ‘{’ B ‘}’ production...


Parsing

Xiao Jia

2013/03/15


• RE (regular expression) for balanced braces?

– {}

– {{}}

– {{{{}}}}

– …


Grammar for Balanced Braces

B -> ε

B -> ‘{’ B ‘}’


-- OR --

B -> ε | ‘{’ B ‘}’


L[[ B ]] = ?
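One way to explore the question above is to run the grammar as a recognizer. The following is a minimal sketch and not part of the original slides: the function name `derives_b` is hypothetical, and each branch mirrors one of the two productions.

```python
def derives_b(s: str) -> bool:
    """Return True if s can be derived from B using B -> ε | '{' B '}'."""
    # B -> ε : the empty string is derivable.
    if s == "":
        return True
    # B -> '{' B '}' : strip one enclosing pair and recurse on the middle.
    if len(s) >= 2 and s[0] == "{" and s[-1] == "}":
        return derives_b(s[1:-1])
    return False


if __name__ == "__main__":
    for text in ["", "{}", "{{}}", "{}{}", "{{}"]:
        print(repr(text), derives_b(text))
```

Read this way, L[[ B ]] = { {^n }^n | n ≥ 0 }: n opening braces followed by n closing braces. A string such as {}{} is rejected, because this grammar has no rule that places two B's side by side.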

Grammar for Balanced Braces

B -> ε

B -> ‘{’ B ‘}’

• production rule: each line of the grammar, such as B -> ‘{’ B ‘}’

• head: the nonterminal to the left of the arrow (here, B)

• body: the sequence of symbols to the right of the arrow (here, ε or ‘{’ B ‘}’)

• terminal: a symbol that appears literally in strings of the language (here, ‘{’ and ‘}’)

• nonterminal: a symbol that is defined by production rules (here, B)

Example

list -> list ‘+’ digit

list -> list ‘-’ digit

list -> digit

digit -> ‘0’ | ‘1’ | ‘2’ | … | ‘9’

9-5+2

3-1

7


Terminals: ‘+’ ‘-’ ‘0’ ‘1’ ‘2’ … ‘9’

Nonterminals: list, digit


Start symbol: list
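To make the example concrete, here is a hedged sketch of a recognizer for the strings this grammar generates (such as 9-5+2, 3-1, and 7). The name `derives_list` is hypothetical. Because the list productions are left-recursive, the sketch does not recurse on them directly; it uses the observation that every derivation ends in one digit and that each further application of a list rule appends one operator and one digit.

```python
DIGITS = set("0123456789")   # bodies of the digit productions
OPERATORS = set("+-")        # the remaining terminals


def derives_list(s: str) -> bool:
    """Return True if s is derivable from the start symbol `list`."""
    # list -> digit : the leftmost symbol of any derived string is a digit.
    if not s or s[0] not in DIGITS:
        return False
    # list -> list '+' digit | list '-' digit : each application appends
    # exactly one operator followed by one digit.
    rest = s[1:]
    if len(rest) % 2 != 0:
        return False
    for i in range(0, len(rest), 2):
        if rest[i] not in OPERATORS or rest[i + 1] not in DIGITS:
            return False
    return True


if __name__ == "__main__":
    for text in ["9-5+2", "3-1", "7", "12", "9-"]:
        print(text, derives_list(text))
```

Note that 12 is rejected: digit derives a single character, and the grammar offers no way to place two digits next to each other.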


Context-free Grammar (CFG)

1. a set of terminals (tokens) T

2. a set of nonterminals N

3. a set of production rules P

4. a start symbol S ∈ N

G = <T, N, P, S>
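The four components can be written down directly as a data structure. The sketch below is illustrative only: the class `Grammar`, its field names, and the method `is_well_formed` are hypothetical, and the check that T and N are disjoint is a standard assumption that the slide does not state explicitly. The instance encodes the list/digit grammar from the earlier example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    terminals: frozenset       # T
    nonterminals: frozenset    # N
    productions: tuple         # P, as (head, body) pairs; body is a tuple of symbols
    start: str                 # S

    def is_well_formed(self) -> bool:
        """Check S ∈ N, T ∩ N = ∅, every head in N, every body symbol in T ∪ N."""
        symbols = self.terminals | self.nonterminals
        return (
            self.start in self.nonterminals
            and not (self.terminals & self.nonterminals)
            and all(
                head in self.nonterminals and all(sym in symbols for sym in body)
                for head, body in self.productions
            )
        )


DIGITS = "0123456789"
LIST_GRAMMAR = Grammar(
    terminals=frozenset("+-" + DIGITS),
    nonterminals=frozenset({"list", "digit"}),
    productions=(
        ("list", ("list", "+", "digit")),
        ("list", ("list", "-", "digit")),
        ("list", ("digit",)),
    ) + tuple(("digit", (d,)) for d in DIGITS),
    start="list",
)

if __name__ == "__main__":
    print(LIST_GRAMMAR.is_well_formed())
```

An empty body, written `()`, would stand for ε in this encoding.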


Languages

Chomsky Hierarchy (each class contains every class listed after it):

• Type-0: Recursively enumerable
• Type-1: Context-sensitive
• Type-2: Context-free (Parsing)
• Type-3: Regular (Lexical Analysis)

Also labeled in the diagram: Undecidable

Example

{ a^n b^n | n ≥ 1 }

ab

aabb

{ a^n b^n c^n | n ≥ 1 }

abc

aabbcc

…
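The two languages can be contrasted with small membership tests. This is a sketch under stated assumptions: the function names are hypothetical, and the grammar S -> ‘a’ S ‘b’ | ‘a’ ‘b’ used for the first language is one possible choice that does not appear in the slides. The second language is a standard example of a language that no context-free grammar generates, so its test simply counts.

```python
def in_anbn(s: str) -> bool:
    """Membership in { a^n b^n | n >= 1 }, following S -> 'a' S 'b' | 'a' 'b'."""
    # S -> 'a' 'b'
    if s == "ab":
        return True
    # S -> 'a' S 'b' : strip the outer pair and recurse.
    if len(s) > 2 and s[0] == "a" and s[-1] == "b":
        return in_anbn(s[1:-1])
    return False


def in_anbncn(s: str) -> bool:
    """Membership in { a^n b^n c^n | n >= 1 }, decided by counting."""
    n = len(s) // 3
    return n >= 1 and s == "a" * n + "b" * n + "c" * n


if __name__ == "__main__":
    print(in_anbn("aabb"), in_anbn("abab"))
    print(in_anbncn("aabbcc"), in_anbncn("aabbc"))
```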

Languages

Chomsky Hierarchy (each class contains every class listed after it):

• Type-0: Recursively enumerable (Turing machine)
• recursive
• Type-1: Context-sensitive (Semantic Analysis)
• Type-2: Context-free (Parsing)
• visibly pushdown
• Type-3: Regular (Lexical Analysis)

Also labeled in the diagram: Undecidable

Example

Grammar:

1. S -> S + S

2. S -> 1

3. S -> a

String:

1 + 1 + a


S -> S + S      (1)
  -> 1 + S      (2)
  -> 1 + S + S  (1)
  -> 1 + 1 + S  (2)
  -> 1 + 1 + a  (3)


A derivation is a sequence of rule applications that transforms the start symbol into the string


How to determine the next nonterminal to rewrite

• Leftmost derivation:

– always rewrite the leftmost nonterminal

• Rightmost derivation:

– always rewrite the rightmost nonterminal


Grammar:

1. S -> S + S
2. S -> 1
3. S -> a

String:

1 + 1 + a

Leftmost derivation:

S -> S + S      (1)
  -> 1 + S      (2)
  -> 1 + S + S  (1)
  -> 1 + 1 + S  (2)
  -> 1 + 1 + a  (3)

Rightmost derivation:

S -> S + S      (1)
  -> S + a      (3)
  -> S + S + a  (1)
  -> S + 1 + a  (2)
  -> 1 + 1 + a  (2)
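Both derivations can be replayed mechanically from their rule numbers. The sketch below is an illustration rather than anything from the slides: the names `RULES`, `NONTERMINALS`, and `derive` are hypothetical. At each step it picks the leftmost or rightmost nonterminal in the current sentential form and replaces it with the body of the chosen rule.

```python
RULES = {
    1: ("S", ["S", "+", "S"]),
    2: ("S", ["1"]),
    3: ("S", ["a"]),
}
NONTERMINALS = {"S"}


def derive(rule_numbers, leftmost=True, start="S"):
    """Replay a derivation and return every sentential form as a string."""
    form = [start]
    steps = [" ".join(form)]
    for number in rule_numbers:
        head, body = RULES[number]
        # Positions of all nonterminals in the current sentential form.
        positions = [i for i, sym in enumerate(form) if sym in NONTERMINALS]
        if not positions:
            raise ValueError("no nonterminal left to rewrite")
        target = positions[0] if leftmost else positions[-1]
        if form[target] != head:
            raise ValueError(f"rule {number} does not apply at position {target}")
        form = form[:target] + body + form[target + 1:]
        steps.append(" ".join(form))
    return steps


if __name__ == "__main__":
    print(derive([1, 2, 1, 2, 3], leftmost=True))
    print(derive([1, 3, 1, 2, 2], leftmost=False))
```

The two rule sequences are the ones annotated in the derivations above; both end in the same string, 1 + 1 + a.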


Ambiguity

• G = a grammar

• L = the language generated by G

• G is ambiguous if some string s ∈ L has two or more parse trees (equivalently, two or more leftmost derivations)
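For the grammar S -> S + S | 1 | a used in the earlier example, ambiguity can be checked by counting parse trees. The following sketch is specific to that grammar and is not from the slides; the name `count_parse_trees` is hypothetical. It tries every ‘+’ as the top-level operator and multiplies the counts for the two sides.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count distinct parse trees for tokens under S -> S + S | 1 | a."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of ways S derives tokens[i:j].
        total = 0
        if j - i == 1 and tokens[i] in ("1", "a"):
            total += 1  # S -> 1 or S -> a
        for k in range(i + 1, j - 1):
            if tokens[k] == "+":
                # S -> S + S with the top-level '+' at position k.
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


if __name__ == "__main__":
    print(count_parse_trees("1 + 1 + a".split()))
```

A count above 1 for any string shows that the grammar is ambiguous. For 1 + 1 + a the two trees correspond to the groupings (1 + 1) + a and 1 + (1 + a).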


Derivations & Parse Trees

• A derivation imposes a hierarchical structure on the string that is derived

S -> S + S      (1)
  -> 1 + S      (2)
  -> 1 + S + S  (1)
  -> 1 + 1 + S  (2)
  -> 1 + 1 + a  (3)

      S
    / | \
   S  +  S
   |   / | \
   1  S  +  S
      |     |
      1     a


parse tree

-- OR --

concrete syntax tree


Parse Tree vs. Syntax Tree

• Parse tree, or concrete syntax tree

• Syntax tree, or abstract syntax tree

Example: 2 * (3 + 4)

Parse tree (keeps syntactic details such as the parentheses):

        E
      / | \
     T  *  T
     |   / | \
     2  (  E  )
         / | \
        T  +  T
        |     |
        3     4

Syntax tree:

      OP(*)
     /     \
NUM(2)     OP(+)
          /     \
     NUM(3)     NUM(4)


Parse Tree vs. Syntax Tree

• Nodes in a parse tree exactly correspond to terminals and nonterminals in the grammar

• Nodes in an AST correspond to semantic structures in the meaning of the language
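The difference between the two kinds of node can be shown by converting one tree into the other. The sketch below assumes a hypothetical encoding that is not in the slides: a parse tree node is a tuple whose first element is the nonterminal (E or T) and whose remaining elements are its children, with terminals as plain strings. The function `to_ast` drops the E and T wrappers and the parentheses, keeping only OP and NUM nodes.

```python
def to_ast(node):
    """Collapse a parse tree for the E/T grammar into a nested-tuple AST."""
    label, *children = node
    if label == "E":
        # E -> T op T : keep the operator, drop the E wrapper.
        left, op, right = children
        return ("OP", op, to_ast(left), to_ast(right))
    if label == "T":
        if len(children) == 3:
            # T -> '(' E ')' : the parentheses are syntactic detail only.
            return to_ast(children[1])
        # T -> integer
        return ("NUM", int(children[0]))
    raise ValueError(f"unexpected node label: {label!r}")


# Parse tree for 2 * (3 + 4).
PARSE_TREE = (
    "E",
    ("T", "2"),
    "*",
    ("T", "(", ("E", ("T", "3"), "+", ("T", "4")), ")"),
)

if __name__ == "__main__":
    print(to_ast(PARSE_TREE))
```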

[Diagram: Parser and Semantic Analyzer stages, with labels: token stream, parse tree, AST ?]

Semantic Actions

E -> T ‘+’ T

| T ‘*’ T

T -> integer

| ‘(’ E ‘)’


Semantic Actions

E -> T ‘+’ T      { $$ = new Op(‘+’, $1, $3); }
   | T ‘*’ T      { $$ = new Op(‘*’, $1, $3); }

T -> integer      { $$ = new Num($1); }
   | ‘(’ E ‘)’    { $$ = $2; }
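The actions above are written in the style of a parser generator, where $$ is the value of the head and $1, $2, $3 are the values of the body symbols. As a rough illustration of the same idea, here is a hand-written recursive-descent parser in Python that builds the AST while parsing, so no separate parse tree is materialized. This swaps in a different parsing technique from what a generated parser would use. The class names `Op` and `Num` come from the slide; their fields, `tokenize`, and `parse` are assumptions made for this sketch.

```python
import re
from dataclasses import dataclass


@dataclass
class Num:
    value: int


@dataclass
class Op:
    operator: str
    left: object
    right: object


def tokenize(text):
    """Split input into integers, operators, and parentheses.

    Characters that match none of these are silently skipped in this sketch.
    """
    return re.findall(r"\d+|[+*()]", text)


def parse(text):
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = peek()
        if token is None:
            raise SyntaxError("unexpected end of input")
        pos += 1
        return token

    def parse_e():
        # E -> T '+' T | T '*' T
        left = parse_t()
        operator = advance()
        if operator not in ("+", "*"):
            raise SyntaxError(f"expected '+' or '*', got {operator!r}")
        right = parse_t()
        return Op(operator, left, right)      # $$ = new Op(op, $1, $3)

    def parse_t():
        # T -> integer | '(' E ')'
        token = advance()
        if token == "(":
            inner = parse_e()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return inner                      # $$ = $2
        if token.isdigit():
            return Num(int(token))            # $$ = new Num($1)
        raise SyntaxError(f"unexpected token {token!r}")

    tree = parse_e()
    if peek() is not None:
        raise SyntaxError(f"unexpected trailing token {peek()!r}")
    return tree


if __name__ == "__main__":
    print(parse("2 * (3 + 4)"))
```

Under this grammar a lone integer such as 2 is not an E, so `parse("2")` raises `SyntaxError`.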


Questions?

• Readings:

– “Dragon Book”, Ch. 1 – Ch. 4

– Parsing Techniques: A Practical Guide, by Dick Grune and Ceriel Jacobs
