COSE312: Compilers Lecture 5 — Lexical Analysis (4) Hakjoo Oh 2015 Fall Hakjoo Oh COSE312 2015 Fall, Lecture 5 September 14, 2015 1 / 29


COSE312: Compilers

Lecture 5 — Lexical Analysis (4)

Hakjoo Oh

2015 Fall


Part 3: Automation

Transform the lexical specification into an executable string recognizer.


From NFA to DFA

Transform an NFA (N, Σ, δ_N, n_0, N_A) into an equivalent DFA (D, Σ, δ_D, d_0, D_A).

Running example:

[Figure: the running-example NFA with states 0 to 9, start state 0, and transitions labeled a, b, c, and ε.]


ε-Closures

ε-closure(I): the set of states reachable from I without consuming any symbols.

[Figure: the running-example NFA.]

ε-closure({1}) = {1, 2, 3, 4, 6, 9}
ε-closure({1, 5}) = {1, 2, 3, 4, 6, 9} ∪ {3, 4, 5, 6, 8, 9}


Subset Construction

Input: an NFA (N, Σ, δ_N, n_0, N_A).

Output: a DFA (D, Σ, δ_D, d_0, D_A).

Key idea: the DFA simulates the NFA by considering every possibility at once. A DFA state d ∈ D is a set of NFA states, i.e., d ⊆ N.


Running Example (1/5)

The initial DFA state d_0 = ε-closure({0}) = {0}.

[Figure: the DFA so far, consisting of the start state {0} only.]


Running Example (2/5)

For the initial state S, consider every x ∈ Σ and compute the corresponding next state:

ε-closure(⋃_{s∈S} δ(s, x)).

For S = {0}:

ε-closure(⋃_{s∈{0}} δ(s, a)) = {1, 2, 3, 4, 6, 9}
ε-closure(⋃_{s∈{0}} δ(s, b)) = ∅
ε-closure(⋃_{s∈{0}} δ(s, c)) = ∅

[Figure: the DFA so far. {0} goes to {1, 2, 3, 4, 6, 9} on a.]
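The next-state computation above can be sketched in Python. This is only an illustration under stated assumptions: the transition table `DELTA` is reconstructed from the ε-closures and traces shown in these slides (the NFA diagram itself did not survive), and the names `DELTA`, `delta`, `eps_closure`, and `step` are hypothetical helpers, not part of the lecture.

```python
EPS = "ε"  # marker used here for ε-transitions

# Assumed transition relation of the running-example NFA, reconstructed
# from the closures and traces in the slides.
DELTA = {
    (0, "a"): {1},
    (1, EPS): {2},
    (2, EPS): {3, 9},
    (3, EPS): {4, 6},
    (4, "b"): {5},
    (5, EPS): {8},
    (6, "c"): {7},
    (7, EPS): {8},
    (8, EPS): {3, 9},
}


def delta(s, x):
    """NFA transition function: the states reachable from s on symbol x."""
    return DELTA.get((s, x), set())


def eps_closure(states):
    """All states reachable from `states` using only ε-transitions."""
    closure = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        for t in delta(s, EPS):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)


def step(S, x):
    """Next DFA state: ε-closure of the union of δ(s, x) over all s in S."""
    moved = set()
    for s in S:
        moved |= delta(s, x)
    return eps_closure(moved)


d0 = eps_closure({0})
print(sorted(step(d0, "a")))  # [1, 2, 3, 4, 6, 9]
print(sorted(step(d0, "b")))  # []
```

Representing a DFA state as a `frozenset` keeps it hashable, so it can later serve as a dictionary key in a transition table.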


Running Example (3/5)

For the state {1, 2, 3, 4, 6, 9}, compute the next states:

ε-closure(⋃_{s∈{1,2,3,4,6,9}} δ(s, a)) = ∅
ε-closure(⋃_{s∈{1,2,3,4,6,9}} δ(s, b)) = {3, 4, 5, 6, 8, 9}
ε-closure(⋃_{s∈{1,2,3,4,6,9}} δ(s, c)) = {3, 4, 6, 7, 8, 9}

[Figure: the DFA so far. {0} goes to {1, 2, 3, 4, 6, 9} on a; {1, 2, 3, 4, 6, 9} goes to {3, 4, 5, 6, 8, 9} on b and to {3, 4, 6, 7, 8, 9} on c.]


Running Example (4/5)

Compute the next states of {3, 4, 5, 6, 8, 9}:

ε-closure(⋃_{s∈{3,4,5,6,8,9}} δ(s, a)) = ∅
ε-closure(⋃_{s∈{3,4,5,6,8,9}} δ(s, b)) = {3, 4, 5, 6, 8, 9}
ε-closure(⋃_{s∈{3,4,5,6,8,9}} δ(s, c)) = {3, 4, 6, 7, 8, 9}

[Figure: the DFA so far. In addition to the earlier transitions, {3, 4, 5, 6, 8, 9} goes to itself on b and to {3, 4, 6, 7, 8, 9} on c.]


Running Example (5/5)

Compute the next states of {3, 4, 6, 7, 8, 9}:

ε-closure(⋃_{s∈{3,4,6,7,8,9}} δ(s, a)) = ∅
ε-closure(⋃_{s∈{3,4,6,7,8,9}} δ(s, b)) = {3, 4, 5, 6, 8, 9}
ε-closure(⋃_{s∈{3,4,6,7,8,9}} δ(s, c)) = {3, 4, 6, 7, 8, 9}

[Figure: the completed DFA. In addition to the earlier transitions, {3, 4, 6, 7, 8, 9} goes to {3, 4, 5, 6, 8, 9} on b and to itself on c.]


Subset Construction Algorithm
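A minimal Python sketch of a worklist-style subset construction, written to match the vocabulary of the worked example that follows (the set D of discovered states, the worklist W, and the table δ_D). It is not the lecture's own pseudocode. The transition table `DELTA` and the accepting set `N_A = {9}` are assumptions reconstructed from the closures and results shown in the slides, and all function names are hypothetical.

```python
EPS = "ε"
SIGMA = ["a", "b", "c"]

# Assumed NFA, reconstructed from the closures and traces in the slides.
DELTA = {
    (0, "a"): {1},
    (1, EPS): {2},
    (2, EPS): {3, 9},
    (3, EPS): {4, 6},
    (4, "b"): {5},
    (5, EPS): {8},
    (6, "c"): {7},
    (7, EPS): {8},
    (8, EPS): {3, 9},
}
N_A = {9}  # assumed accepting NFA state


def eps_closure(states):
    """All states reachable from `states` using only ε-transitions."""
    closure, stack = set(states), list(states)
    while stack:
        for t in DELTA.get((stack.pop(), EPS), set()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)


def subset_construction(n0):
    d0 = eps_closure({n0})
    D = {d0}        # DFA states discovered so far
    W = [d0]        # worklist: states whose outgoing transitions are pending
    delta_D = {}    # DFA transition table
    while W:
        q = W.pop()
        for c in SIGMA:
            moved = set()
            for s in q:
                moved |= DELTA.get((s, c), set())
            t = eps_closure(moved)
            delta_D[(q, c)] = t
            # As in the worked example, the empty set is recorded in the
            # table but not added to D as a state.
            if t and t not in D:
                D.add(t)
                W.append(t)
    # A DFA state accepts if it contains an accepting NFA state.
    D_A = {d for d in D if d & N_A}
    return D, delta_D, d0, D_A


D, delta_D, d0, D_A = subset_construction(0)
for d in sorted(D, key=sorted):
    print(sorted(d), [sorted(delta_D[(d, c)]) for c in SIGMA])
```

The order in which states leave the worklist does not affect the resulting DFA, only the order in which table rows are filled in.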


Running Example (1/5)

[Figure: the running-example NFA.]

The initial state d_0 = ε-closure({0}) = {0}. Initialize D and W:

D = {{0}}, W = {{0}}


Running Example (2/5)

Choose q = {0} from W. For all c ∈ Σ, update δ_D:

state               a                   b                   c
{0}                 {1, 2, 3, 4, 6, 9}  ∅                   ∅

Update D and W:

D = {{0}, {1, 2, 3, 4, 6, 9}}
W = {{1, 2, 3, 4, 6, 9}}


Running Example (3/5)

Choose q = {1, 2, 3, 4, 6, 9} from W. For all c ∈ Σ, update δ_D:

state               a                   b                   c
{0}                 {1, 2, 3, 4, 6, 9}  ∅                   ∅
{1, 2, 3, 4, 6, 9}  ∅                   {3, 4, 5, 6, 8, 9}  {3, 4, 6, 7, 8, 9}

Update D and W:

D = {{0}, {1, 2, 3, 4, 6, 9}, {3, 4, 5, 6, 8, 9}, {3, 4, 6, 7, 8, 9}}
W = {{3, 4, 5, 6, 8, 9}, {3, 4, 6, 7, 8, 9}}


Running Example (4/5)

Choose q = {3, 4, 5, 6, 8, 9} from W. For all c ∈ Σ, update δ_D:

state               a                   b                   c
{0}                 {1, 2, 3, 4, 6, 9}  ∅                   ∅
{1, 2, 3, 4, 6, 9}  ∅                   {3, 4, 5, 6, 8, 9}  {3, 4, 6, 7, 8, 9}
{3, 4, 5, 6, 8, 9}  ∅                   {3, 4, 5, 6, 8, 9}  {3, 4, 6, 7, 8, 9}

Update D and W (no new states are discovered):

D = {{0}, {1, 2, 3, 4, 6, 9}, {3, 4, 5, 6, 8, 9}, {3, 4, 6, 7, 8, 9}}
W = {{3, 4, 6, 7, 8, 9}}


Running Example (5/5)

Choose q = {3, 4, 6, 7, 8, 9} from W. For all c ∈ Σ, update δ_D:

state               a                   b                   c
{0}                 {1, 2, 3, 4, 6, 9}  ∅                   ∅
{1, 2, 3, 4, 6, 9}  ∅                   {3, 4, 5, 6, 8, 9}  {3, 4, 6, 7, 8, 9}
{3, 4, 5, 6, 8, 9}  ∅                   {3, 4, 5, 6, 8, 9}  {3, 4, 6, 7, 8, 9}
{3, 4, 6, 7, 8, 9}  ∅                   {3, 4, 5, 6, 8, 9}  {3, 4, 6, 7, 8, 9}

Update D and W (no new states are discovered):

D = {{0}, {1, 2, 3, 4, 6, 9}, {3, 4, 5, 6, 8, 9}, {3, 4, 6, 7, 8, 9}}
W = ∅

The while loop terminates. The accepting states are the DFA states that contain an accepting NFA state:

D_A = {{1, 2, 3, 4, 6, 9}, {3, 4, 5, 6, 8, 9}, {3, 4, 6, 7, 8, 9}}
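To see the resulting DFA at work, the final table can be transcribed into a small Python recognizer. This is a sketch for illustration only: the names `S0` to `S3`, `DELTA_D`, and `accepts` are hypothetical, and the ∅ entries of the table are modeled as missing transitions, so the input is rejected as soon as one is hit.

```python
# DFA states, named by the sets of NFA states they stand for.
S0 = frozenset({0})
S1 = frozenset({1, 2, 3, 4, 6, 9})
S2 = frozenset({3, 4, 5, 6, 8, 9})
S3 = frozenset({3, 4, 6, 7, 8, 9})

# δ_D transcribed from the final table; ∅ entries are simply omitted.
DELTA_D = {
    (S0, "a"): S1,
    (S1, "b"): S2, (S1, "c"): S3,
    (S2, "b"): S2, (S2, "c"): S3,
    (S3, "b"): S2, (S3, "c"): S3,
}
ACCEPTING = {S1, S2, S3}  # D_A from the worked example


def accepts(word):
    """Run the DFA on `word`; reject as soon as no transition exists."""
    state = S0
    for ch in word:
        state = DELTA_D.get((state, ch))
        if state is None:
            return False
    return state in ACCEPTING


for w in ["a", "abcb", "", "ba"]:
    print(repr(w), accepts(w))
```

Unlike the NFA, the DFA needs only one table lookup per input character and never has to track several possible states at once.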


Algorithm for computing ε-Closures

The definition

    ε-closure(I) is the set of states reachable from I without consuming any symbols

is neither formal nor constructive.

To be formal and constructive:

1. define ε-closure(I) by an inductive definition,
2. compute the set by fixed point computation.


Inductive Definition

Let I be a set of NFA states. The ε-closure, T = ε-closure(I), is the smallest set that satisfies the two conditions:

1. I ⊆ T.
2. If S ⊆ T, then ⋃_{s∈S} δ(s, ε) ⊆ T.

Alternatively, T = ε-closure(I) is the smallest set that satisfies the two conditions:

1. I ⊆ T.
2. ⋃_{s∈T} δ(s, ε) ⊆ T.

Alternatively, T = ε-closure(I) is the smallest set such that

I ∪ ⋃_{s∈T} δ(s, ε) ⊆ T.

The inductively defined set can be computed by formulating it as the least fixed point of a function F and then computing that least fixed point by fixed point iteration.


Least Fixed Point

F: a function defined over sets, e.g.,
- F1(X) = X ∪ {1, 2, 3}
- F2(X) = X = {1, 2}

A set X is a (pre-)fixed point of F if

X ⊇ F(X).

fix F: the least fixed point of F, i.e.,
- fix F ⊇ F(fix F)
- X ⊇ F(X) ⟹ X ⊇ fix F

fix F can be computed by the algorithm:

T = ∅
repeat
    T′ = T
    T = T′ ∪ F(T′)
until T = T′
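The iteration above translates almost line by line into Python. The sketch below is illustrative only: `fix` is a hypothetical helper name, and the loop is guaranteed to stop only when F is monotone and the sets involved stay within a finite universe.

```python
def fix(F):
    """Iterate T = T' ∪ F(T') from the empty set until nothing changes."""
    T = frozenset()
    while True:
        T_prev = T
        T = T_prev | F(T_prev)
        if T == T_prev:
            return T


# F1 from the slide: F1(X) = X ∪ {1, 2, 3}
F1 = lambda X: X | {1, 2, 3}

print(sorted(fix(F1)))  # [1, 2, 3]
```

For `F1`, the first pass yields {1, 2, 3} and the second pass changes nothing, so the loop stops after two iterations.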


Computing ε-Closures

To compute T = ε-closure(I),

1. define a function F such that T = fix F, and
2. compute fix F by fixed point iteration.


Computing ε-Closures

1. The inductive definition:

T = ε-closure(I) is the smallest set such that

I ∪ ⋃_{s∈T} δ(s, ε) ⊆ T

can be restated as:

T = ε-closure(I) is the smallest set such that

T ⊇ F(T)

where

F(X) = I ∪ (⋃_{s∈X} δ(s, ε)).

Thus, T = fix F.


Computing ε-Closures

2. Compute fix F via the fixed point iteration algorithm:

T = ∅
repeat
    T′ = T
    T = T′ ∪ F(T′)
until T = T′

Example: ε-closure({1})

Iteration  T′                  T
1          ∅                   {1}
2          {1}                 {1, 2}
3          {1, 2}              {1, 2, 3, 9}
4          {1, 2, 3, 9}        {1, 2, 3, 4, 6, 9}
5          {1, 2, 3, 4, 6, 9}  {1, 2, 3, 4, 6, 9}
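The trace in the table can be reproduced with a short Python sketch that builds F(X) = I ∪ ⋃_{s∈X} δ(s, ε) for I = {1} and records each (T′, T) pair. The ε-edges in `EPS_EDGES` are an assumption reconstructed from the closures and traces in these slides, and `make_F` and `fix_trace` are hypothetical helper names.

```python
# Assumed ε-transitions of the running-example NFA: state -> ε-successors.
EPS_EDGES = {
    1: {2},
    2: {3, 9},
    3: {4, 6},
    5: {8},
    7: {8},
    8: {3, 9},
}


def make_F(I):
    """Build F(X) = I ∪ ⋃_{s∈X} δ(s, ε) for a fixed initial set I."""
    def F(X):
        out = set(I)
        for s in X:
            out |= EPS_EDGES.get(s, set())
        return frozenset(out)
    return F


def fix_trace(F):
    """Naive fixed point iteration that also records every (T', T) pair."""
    T = frozenset()
    rows = []
    while True:
        T_prev = T
        T = T_prev | F(T_prev)
        rows.append((T_prev, T))
        if T == T_prev:
            return T, rows


closure, rows = fix_trace(make_F({1}))
for i, (t_prev, t) in enumerate(rows, start=1):
    print(i, sorted(t_prev), sorted(t))
```

The final row repeats the previous value of T, which is exactly the condition that ends the loop.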


cf) Computer Science is full of fixed points

Every inductively defined set is defined by fixed points.

The set N = {0, 1, 2, 3, ...} of natural numbers can be defined by a least fixed point

N = fix F.

What is F?

Let G = (N, →) be a graph, where N is the set of nodes and (→) ⊆ N × N denotes edges. Let I ⊆ N be a set of initial nodes. The set R_I of all nodes reachable from I can be defined by a least fixed point:

R_I = fix F.

What is F?
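One way to explore the two questions is to pick a candidate F and iterate it. The Python sketch below is a suggestion, not the lecture's answer: the candidates `F_nat` and `F_reach` are assumptions, the graph `EDGES` is a made-up example, and `iterate` is a hypothetical helper that applies a bounded number of steps, since the iteration for the natural numbers never stabilizes.

```python
def iterate(F, steps):
    """Apply T = T ∪ F(T) a fixed number of times, starting from ∅."""
    T = frozenset()
    for _ in range(steps):
        T = T | F(T)
    return T


# Candidate for the naturals: F(X) = {0} ∪ {n + 1 | n ∈ X}.
F_nat = lambda X: frozenset({0}) | {n + 1 for n in X}
print(sorted(iterate(F_nat, 4)))  # [0, 1, 2, 3]

# Candidate for reachability: F(X) = I ∪ {m | n ∈ X and n → m}.
EDGES = {("p", "q"), ("q", "r"), ("s", "p")}  # hypothetical example graph
I = {"p"}
F_reach = lambda X: frozenset(I) | {m for (n, m) in EDGES if n in X}
print(sorted(iterate(F_reach, 5)))  # ['p', 'q', 'r']
```

With `F_nat`, each step adds exactly one new number, so the least fixed point is only reached in the limit. With `F_reach` on a finite graph, the set stops growing after finitely many steps, which is why the same scheme works for ε-closures.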


Efficient Fixed Point Computation via Worklist Algorithm

Recall the fixed point algorithm:

T = ∅
repeat
    T′ = T
    T = T′ ∪ F(T′)
until T = T′

and the computation of ε-closure({1}):

Iteration  T′                  T
1          ∅                   {1}
2          {1}                 {1, 2}
3          {1, 2}              {1, 2, 3, 9}
4          {1, 2, 3, 9}        {1, 2, 3, 4, 6, 9}
5          {1, 2, 3, 4, 6, 9}  {1, 2, 3, 4, 6, 9}


Efficient Fixed Point Computation via Worklist Algorithm

The algorithm involves many redundant computations: each iteration recomputes δ(s, ε) for every state s that is already in T′.

The first iteration:

F(∅) = {1} ∪ (⋃_{s∈∅} δ(s, ε))
     = {1}

The second iteration:

F({1}) = {1} ∪ (⋃_{s∈{1}} δ(s, ε))
       = {1} ∪ δ(1, ε)

The third iteration:

F({1, 2}) = {1} ∪ (⋃_{s∈{1,2}} δ(s, ε))
          = {1} ∪ δ(1, ε) ∪ δ(2, ε)

The fourth iteration:

F({1, 2, 3, 9}) = {1} ∪ (⋃_{s∈{1,2,3,9}} δ(s, ε))
                = {1} ∪ δ(1, ε) ∪ δ(2, ε) ∪ δ(3, ε) ∪ δ(9, ε)


Efficient Fixed Point Computation via Worklist Algorithm

The fifth iteration:

F({1, 2, 3, 4, 6, 9}) = {1} ∪ (⋃_{s∈{1,2,3,4,6,9}} δ(s, ε))
                      = {1} ∪ δ(1, ε) ∪ δ(2, ε) ∪ δ(3, ε) ∪ δ(4, ε) ∪ δ(6, ε) ∪ δ(9, ε)


Efficient Fixed Point Computation via Worklist Algorithm

The worklist algorithm can compute fixed points with less redundancy.


T and W are initially T = W = {1}.

Choose 1 and compute δ(1, ε) = {2}. Add 2 to T and W:

T = {1, 2}, W = {2}

Choose 2 and compute δ(2, ε) = {3, 9}. Add them to T and W:

T = {1, 2, 3, 9}, W = {3, 9}

Choose 3 and compute δ(3, ε) = {4, 6}. Add them to T and W:

T = {1, 2, 3, 4, 6, 9}, W = {4, 6, 9}

Choose 4 and compute δ(4, ε) = ∅. Nothing is added.

T = {1, 2, 3, 4, 6, 9}, W = {6, 9}

Choose 6 and compute δ(6, ε) = ∅. Nothing is added.

T = {1, 2, 3, 4, 6, 9}, W = {9}

Choose 9 and compute δ(9, ε) = ∅. Nothing is added.

T = {1, 2, 3, 4, 6, 9}, W = {}

The worklist is empty and the algorithm terminates.
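The steps above can be sketched as a worklist loop in Python. This is a minimal illustration rather than the lecture's own pseudocode: `EPS_EDGES` holds assumed ε-transitions reconstructed from the slides, `eps_closure_worklist` is a hypothetical name, and `min(W)` is used only so that states are chosen in the same order as in the trace. Any choice from W gives the same final T.

```python
# Assumed ε-transitions of the running-example NFA: state -> ε-successors.
EPS_EDGES = {
    1: {2},
    2: {3, 9},
    3: {4, 6},
    5: {8},
    7: {8},
    8: {3, 9},
}


def eps_closure_worklist(I):
    """Compute ε-closure(I), looking up δ(s, ε) once per reached state."""
    T = set(I)   # states known to be in the closure
    W = set(I)   # states whose ε-successors have not been examined yet
    trace = []
    while W:
        s = min(W)  # arbitrary choice; min matches the order of the trace
        W.remove(s)
        new = EPS_EDGES.get(s, set()) - T  # only states not seen before
        T |= new
        W |= new
        trace.append((s, sorted(T), sorted(W)))
    return T, trace


T, trace = eps_closure_worklist({1})
for chosen, t, w in trace:
    print(f"choose {chosen}: T = {t}, W = {w}")
```

Each state enters W at most once, so δ(s, ε) is computed six times here, compared with the repeated lookups of the naive iteration.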


Summary

Subset construction:

Goal: convert an NFA to an equivalent DFA

Key idea: simulate the NFA by considering every possibility at once

ε-closures:

defined by least fixed points

computed by fixed point algorithms:
- naive iterative algorithm
- worklist algorithm
