Lecture4 lexical analysis2


Transcript of Lecture4 lexical analysis2

Page 1: Lecture4 lexical analysis2

Lexical Analysis II

Lecture 4


Page 2: Lecture4 lexical analysis2

Finite Automata

• Regular Expression = specification

• Finite Automata = implementation

• A finite automaton consists of

– An input alphabet (Σ)

– A finite set of states (S)

– A start state (say n)

– A set of accepting states (F ⊆ S)

– A set of transitions

[Diagram: S0 (start state) --a--> S1 (accepting state)]

8-Jan-15 2CS 346 Lecture 4

Page 3: Lecture4 lexical analysis2

Finite Automata

• If end of input and in accepting state → ACCEPT

• Otherwise → REJECT

• Example: An FA that accepts only “1”

• Input “1” → Accept, but

• Input “0” or “10” → Reject

[Diagram: S0 (start) --1--> S1 (accepting)]
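The accept/reject rule above can be sketched as a small table-driven loop. This is a minimal illustration, not code from the lecture: the function name `run_dfa`, the dictionary encoding of transitions, and the choice to reject when no transition is defined are all assumptions.

```python
def run_dfa(transitions, start, accepting, text):
    """Return True if the automaton ends in an accepting state after reading text."""
    state = start
    for symbol in text:
        key = (state, symbol)
        if key not in transitions:
            return False  # assumption: a missing transition means the input is rejected
        state = transitions[key]
    # End of input: accept only if we stopped in an accepting state.
    return state in accepting


# The automaton from the slide: S0 --1--> S1, with S1 accepting.
ONLY_ONE = {("S0", "1"): "S1"}

for text in ["1", "0", "10"]:
    verdict = "ACCEPT" if run_dfa(ONLY_ONE, "S0", {"S1"}, text) else "REJECT"
    print(repr(text), verdict)
```

Only the input “1” is accepted. “0” has no transition out of S0, and “10” gets stuck in S1 on the second symbol.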


Page 4: Lecture4 lexical analysis2

Finite Automata

• Language of an FA ≡ set of all accepted strings

– e.g.: Any number of 1’s followed by a ‘0’

– Alphabet: {0, 1}

[Diagram: S0 (start) loops on 1; S0 --0--> S1 (accepting)]

• Another kind of transition: ε-moves

– Control can move from S0 to S1 without consuming input, so any input that takes control to S0 can also take it to S1

[Diagram: S0 --ε--> S1]


Page 5: Lecture4 lexical analysis2

Non-deterministic Finite Automata (NFA)

• Can have multiple transitions for one input in a given state

• Can have ε-moves

• Example: NFA for the RE (a | b)*abb

[Diagram: S0 --ε--> S1 --a--> S2 --b--> S3 --b--> S4 (accepting); S1 loops on a | b]
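One way to read "multiple transitions for one input" is that the NFA accepts if at least one sequence of choices ends in an accepting state. The sketch below tries every choice by backtracking on the (a | b)*abb automaton. The dictionary encoding, the `EPS` marker and the function name `accepts` are assumptions made for illustration, and the edges are as read from the slide's diagram.

```python
EPS = "ε"  # marker for ε-moves (an assumption of this sketch)

# NFA for (a | b)*abb: S1 loops on a and b, and may also start matching "abb" on an a.
NFA = {
    ("S0", EPS): {"S1"},
    ("S1", "a"): {"S1", "S2"},  # two choices on the same input
    ("S1", "b"): {"S1"},
    ("S2", "b"): {"S3"},
    ("S3", "b"): {"S4"},
}
ACCEPTING = {"S4"}


def accepts(state, text, pos=0, on_path=frozenset()):
    """Explore every choice the NFA could make; accept if any choice succeeds."""
    if (state, pos) in on_path:
        return False  # already exploring this configuration (guards against ε-cycles)
    on_path = on_path | {(state, pos)}
    if pos == len(text) and state in ACCEPTING:
        return True
    # Choice 1: take an ε-move without consuming input.
    for nxt in NFA.get((state, EPS), ()):
        if accepts(nxt, text, pos, on_path):
            return True
    # Choice 2: consume the next symbol along any matching transition.
    if pos < len(text):
        for nxt in NFA.get((state, text[pos]), ()):
            if accepts(nxt, text, pos + 1, on_path):
                return True
    return False


for text in ["abb", "aabb", "babb", "ab", "abba"]:
    print(repr(text), accepts("S0", text))
```

Backtracking can take exponential time in the worst case, which is one reason to convert the NFA to a DFA before running it.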


Page 6: Lecture4 lexical analysis2

Deterministic Finite Automata (DFA)

• A DFA is a special case of an NFA:

– One transition per input for a state

– No ε-moves.

[Diagram: S1 --a--> S2 --b--> S3 --c--> S4]

• NFAs and DFAs recognize the same set of languages (regular languages)

• DFAs are faster to execute

– There are no choices to consider

• NFAs are, in general, smaller. An NFA can choose.

– Sometimes exponentially smaller

• There is a space-time trade-off between DFAs and NFAs

Page 7: Lecture4 lexical analysis2

Motivation: Automatic Lexical Analyser Construction

To convert a specification into code:

• Write down the RE for the input language.

• Convert the RE to an NFA (Thompson’s construction)

• Build the DFA that simulates the NFA (subset construction)

• Shrink the DFA (Hopcroft’s algorithm)

(for the curious: there is a full cycle - DFA to RE construction is all pairs, all paths)

Lexical analyser generators:

• lex or flex work along these lines.

• Algorithms are well-known and understood.

• Key issue is the interface to parser.


Page 8: Lecture4 lexical analysis2

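RE to NFA using Thompson’s construction

[Diagram: NFA for ε: S0 --ε--> S1]

[Diagram: NFA for a: S0 --a--> S1]

[Diagram: NFA for ab (concatenation): S0 --a--> S1 --ε--> S2 --b--> S3]

[Diagram: NFA for a | b (union): a new start state S0 with ε-moves into the NFAs for a and for b, whose accepting states have ε-moves into a new accepting state S5]

[Diagram: NFA for a* (iteration): S0 --ε--> S1 --a--> S2 --ε--> S3, plus ε-moves S2 --ε--> S1 (repeat) and S0 --ε--> S3 (skip)]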
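The four rules above (single symbol, concatenation, union, iteration) can be sketched as a small builder in which every fragment is a (start, accept) pair. This is an illustrative sketch under assumptions, not the lecture's code: the class `NFABuilder`, its method names, the edge-list representation, and the helper `accepts` are all hypothetical, and state numbers are assigned in creation order, so they differ from the numbering in the slides.

```python
from itertools import count

EPS = None  # label used for ε-moves (an assumption of this sketch)


class NFABuilder:
    def __init__(self):
        self.edges = []      # list of (source, label, target)
        self._ids = count()  # fresh state numbers 0, 1, 2, ...

    def _new_state(self):
        return next(self._ids)

    def symbol(self, label):
        """Two-state NFA for one symbol (or for ε when label is EPS)."""
        s, t = self._new_state(), self._new_state()
        self.edges.append((s, label, t))
        return s, t

    def concat(self, first, second):
        """Link the accept state of first to the start of second with an ε-move."""
        self.edges.append((first[1], EPS, second[0]))
        return first[0], second[1]

    def union(self, left, right):
        """New start and accept states, with ε-moves into and out of both branches."""
        s, t = self._new_state(), self._new_state()
        for frag in (left, right):
            self.edges.append((s, EPS, frag[0]))
            self.edges.append((frag[1], EPS, t))
        return s, t

    def star(self, inner):
        """New start and accept states, with ε-moves to enter, leave, repeat or skip."""
        s, t = self._new_state(), self._new_state()
        self.edges += [
            (s, EPS, inner[0]),         # enter the loop body
            (inner[1], EPS, t),         # leave after a pass
            (inner[1], EPS, inner[0]),  # repeat
            (s, EPS, t),                # skip: zero repetitions
        ]
        return s, t


def accepts(edges, start, accept, text):
    """Simulate the NFA by tracking the set of states it could be in."""
    def closure(states):
        seen, stack = set(states), list(states)
        while stack:
            s = stack.pop()
            for src, label, dst in edges:
                if src == s and label is EPS and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return seen

    current = closure({start})
    for ch in text:
        current = closure({dst for src, label, dst in edges
                           if src in current and label == ch})
    return accept in current


b = NFABuilder()
start, accept = b.concat(b.symbol("a"), b.star(b.union(b.symbol("b"), b.symbol("c"))))
print(len(b.edges), "edges")
print(accepts(b.edges, start, accept, "abcb"))
```

For a (b | c)* this builds ten states, the same count as the hand-drawn construction, because each rule adds at most two new states.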


Page 9: Lecture4 lexical analysis2

Example: Construct the NFA of a (b|c)*

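First: NFAs for a, b, c

[Diagram: three two-state NFAs: S0 --a--> S1, S0 --b--> S1, S0 --c--> S1]

Second: NFA for b | c

[Diagram: the NFAs for b and c joined by the union rule, with ε-moves from a new start state and into a new accepting state]

Third: NFA for (b | c)*

[Diagram: the NFA for b | c wrapped by the iteration rule, with ε-moves to repeat or skip it]

Fourth: NFA for a(b | c)*

[Diagram: S0 --a--> S1 --ε--> S2 --ε--> S3; S3 --ε--> S4 --b--> S5 --ε--> S8; S3 --ε--> S6 --c--> S7 --ε--> S8; S8 --ε--> S3; S8 --ε--> S9; S2 --ε--> S9 (accepting)]

Of course, a human would design a simpler one… But we can automate production of the complex one...

[Diagram: the simpler automaton: S0 --a--> S1 (accepting), with S1 looping on b | c]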


Page 10: Lecture4 lexical analysis2

NFA to DFA: two key functions

• move(si,a): the (union of the) set of states to which there is a transition on input symbol a from state si

• ε-closure(si): the (union of the) set of states reachable by ε from si.

Example (see the diagram below):

• ε-closure(3)={3,4,7}; ε-closure({3,10})={3,4,7,10};

• move(ε-closure({3,10}),a)={8};

[Diagram: states 3, 4, 7, 8 and 10; 4 and 7 are reachable from 3 by ε-moves, and 8 is reached on a]

The Algorithm:

• start with the ε-closure of s0 from NFA.

• Do for each unmarked state until there are no unmarked states:

– for each symbol take their ε-closure(move(state,symbol))
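The two key functions, move and ε-closure, can be sketched directly over a transition dictionary. The slide's diagram is not fully recoverable, so the edges below (3 --ε--> 4 --ε--> 7 --a--> 8, with state 10 having no outgoing moves) are a hypothetical fragment chosen to be consistent with the stated example. The function names and the encoding are also assumptions.

```python
EPS = "ε"  # marker for ε-moves (an assumption of this sketch)

# Hypothetical NFA fragment consistent with the example on the slide.
NFA = {
    (3, EPS): {4},
    (4, EPS): {7},
    (7, "a"): {8},
}


def epsilon_closure(nfa, states):
    """All states reachable from `states` using only ε-moves, including the states themselves."""
    closure = set(states)
    worklist = list(states)
    while worklist:
        state = worklist.pop()
        for target in nfa.get((state, EPS), ()):
            if target not in closure:
                closure.add(target)
                worklist.append(target)
    return closure


def move(nfa, states, symbol):
    """All states reachable from any state in `states` by one transition on `symbol`."""
    result = set()
    for state in states:
        result |= nfa.get((state, symbol), set())
    return result


print(sorted(epsilon_closure(NFA, {3})))                      # [3, 4, 7]
print(sorted(epsilon_closure(NFA, {3, 10})))                  # [3, 4, 7, 10]
print(sorted(move(NFA, epsilon_closure(NFA, {3, 10}), "a")))  # [8]
```

Both functions take and return sets, which is why they compose directly as ε-closure(move(T, a)) in the algorithm.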


Page 11: Lecture4 lexical analysis2

NFA to DFA with subset construction

Initially, ε-closure(s0) is the only state in Dstates and it is unmarked.
while there is an unmarked state T in Dstates
    mark T
    for each input symbol a
        U := ε-closure(move(T,a))
        if U is not in Dstates then add U as unmarked to Dstates
        Dtable[T,a] := U

• Dstates (set of states for DFA) and Dtable form the DFA.

• Each state of the DFA corresponds to a set of NFA states that the NFA could be in after reading some sequence of input symbols.

• This is a fixed-point computation.

It sounds more complex than it actually is!
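The loop above can be sketched in a few lines of Python. The example runs it on the five-state NFA for (a | b)*abb from the NFA slide, with states numbered 0 to 4. The edges are as read from that diagram, and the dictionary encoding, the function names, and the choice to omit an empty target set instead of adding a dead state are assumptions of this sketch.

```python
from string import ascii_uppercase

EPS = "ε"  # marker for ε-moves (an assumption of this sketch)

# NFA for (a | b)*abb: 0 --ε--> 1, 1 loops on a and b, 1 --a--> 2 --b--> 3 --b--> 4.
NFA = {
    (0, EPS): {1},
    (1, "a"): {1, 2},
    (1, "b"): {1},
    (2, "b"): {3},
    (3, "b"): {4},
}
ALPHABET = ["a", "b"]
NFA_START, NFA_ACCEPT = 0, 4


def epsilon_closure(states):
    closure, worklist = set(states), list(states)
    while worklist:
        for target in NFA.get((worklist.pop(), EPS), ()):
            if target not in closure:
                closure.add(target)
                worklist.append(target)
    return frozenset(closure)  # hashable, so it can be a dictionary key


def move(states, symbol):
    return {t for s in states for t in NFA.get((s, symbol), ())}


def subset_construction():
    start = epsilon_closure({NFA_START})
    dstates = [start]   # DFA states in discovery order
    dtable = {}         # (DFA state, symbol) -> DFA state
    unmarked = [start]
    while unmarked:
        T = unmarked.pop(0)  # taking T off the list marks it
        for a in ALPHABET:
            U = epsilon_closure(move(T, a))
            if not U:
                continue  # assumption: no transition instead of an explicit dead state
            if U not in dstates:
                dstates.append(U)
                unmarked.append(U)
            dtable[T, a] = U
    return dstates, dtable


dstates, dtable = subset_construction()
name = {T: ascii_uppercase[i] for i, T in enumerate(dstates)}
for T in dstates:
    final = " (final)" if NFA_ACCEPT in T else ""
    row = "  ".join(f"{a}->{name[dtable[T, a]]}" for a in ALPHABET if (T, a) in dtable)
    print(f"{name[T]}{final} = {sorted(T)}: {row}")
```

On this NFA the loop discovers five DFA states. The loop stops once a pass adds no new set to Dstates, which is the fixed point mentioned above.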


Page 12: Lecture4 lexical analysis2

Example: NFA for (a | b)*abb

[Diagram: NFA with states 0 to 10: 0 --ε--> 1, 0 --ε--> 7, 1 --ε--> 2, 1 --ε--> 4, 2 --a--> 3, 4 --b--> 5, 3 --ε--> 6, 5 --ε--> 6, 6 --ε--> 1, 6 --ε--> 7, 7 --a--> 8, 8 --b--> 9, 9 --b--> 10 (accepting)]

• A=ε-closure(0)={0,1,2,4,7}

• for each input symbol (that is, a and b):

– B=ε-closure(move(A,a))=ε-closure({3,8})={1,2,3,4,6,7,8}

– C=ε-closure(move(A,b))=ε-closure({5})={1,2,4,5,6,7}

– Dtable[A,a]=B; Dtable[A,b]=C

• B and C are unmarked. Repeating the above we end up with:

– C={1,2,4,5,6,7}; D={1,2,4,5,6,7,9}; E={1,2,4,5,6,7,10}; and

– Dtable[B,a]=B; Dtable[B,b]=D; Dtable[C,a]=B; Dtable[C,b]=C; Dtable[D,a]=B; Dtable[D,b]=E; Dtable[E,a]=B; Dtable[E,b]=C; no more unmarked sets at this point!


Page 13: Lecture4 lexical analysis2

Result of applying subset construction

Transition table:

state      a  b
A          B  C
B          B  D
C          B  C
D          B  E
E (final)  B  C

[Diagram: DFA with states A (start), B, C, D and E (accepting), with the transitions listed in the table]


Page 14: Lecture4 lexical analysis2

Another NFA version of the same RE

[Diagram: N0 --ε--> N1 --a--> N2 --b--> N3 --b--> N4 (accepting); N1 loops on a|b]

Apply the subset construction algorithm:

Iteration  State  Contains  ε-closure(move(s,a))  ε-closure(move(s,b))
0          A      N0,N1     N1,N2                 N1
1          B      N1,N2     N1,N2                 N1,N3
           C      N1        N1,N2                 N1
2          D      N1,N3     N1,N2                 N1,N4
3          E      N1,N4     N1,N2                 N1

Note:

• iteration 3 adds nothing new, so the algorithm stops.

• state E contains N4 (final state)


Page 15: Lecture4 lexical analysis2

DFA Minimisation: the problem

[Diagram: the DFA for (a | b)*abb with states A, B, C, D and E, as produced by the subset construction]

• Problem: can we minimize the number of states?

• Answer: yes, if we can find groups of states where, for each input symbol, every state of such a group will have transitions to the same group.


Page 16: Lecture4 lexical analysis2

DFA minimisation: the algorithm
(Hopcroft’s algorithm: simple version)

Divide the states of the DFA into two groups: those containing final states and those containing non-final states.
while there are group changes
    for each group
        for each input symbol
            if for any two states of the group and a given input symbol, their transitions do not lead to the same group, these states must belong to different groups.

For the curious, there is an alternative approach: create a graph in which there is an edge between each pair of states which cannot coexist in a group because of the conflict above. Then use a graph colouring algorithm to find the minimum number of colours needed so that any two nodes connected by an edge do not have the same colour (we’ll examine graph colouring algorithms later on, in register allocation)
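The partition-refinement loop above can be sketched as follows. This is a simplified variant, not the worklist-based Hopcroft algorithm: in each pass it splits a group by the tuple of target groups over all input symbols at once, rather than one symbol at a time. The function name `minimise` and the dictionary encoding are assumptions, and the example DFA is the five-state automaton for (a | b)*abb from the subset construction.

```python
def minimise(states, alphabet, delta, finals):
    """Refine {finals, non-finals} until no group can be split; return the groups."""
    finals = frozenset(finals)
    groups = [g for g in (finals, frozenset(states) - finals) if g]

    def group_of(state):
        # Index of the group currently containing `state` (None for a missing transition).
        for i, g in enumerate(groups):
            if state in g:
                return i
        return None

    changed = True
    while changed:
        changed = False
        new_groups = []
        for g in groups:
            # States stay together only if every symbol leads them to the same group.
            buckets = {}
            for s in sorted(g):
                signature = tuple(group_of(delta.get((s, a))) for a in alphabet)
                buckets.setdefault(signature, set()).add(s)
            new_groups.extend(frozenset(b) for b in buckets.values())
            if len(buckets) > 1:
                changed = True
        groups = new_groups
    return groups


# DFA for (a | b)*abb from the subset construction; E is the final state.
DELTA = {
    ("A", "a"): "B", ("A", "b"): "C",
    ("B", "a"): "B", ("B", "b"): "D",
    ("C", "a"): "B", ("C", "b"): "C",
    ("D", "a"): "B", ("D", "b"): "E",
    ("E", "a"): "B", ("E", "b"): "C",
}

for group in minimise("ABCDE", "ab", DELTA, {"E"}):
    print(sorted(group))
```

On this DFA only A and C end up in the same group, so the minimised automaton has four states.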


Page 17: Lecture4 lexical analysis2

How does it work? Recall (a | b)* abb

In each iteration, we consider any non-single-member groups and we consider the partitioning criterion for all pairs of states in the group.

Iteration  Current groups        Split on a  Split on b
0          {E}, {A,B,C,D}        None        {A,B,C}, {D}
1          {E}, {D}, {A,B,C}     None        {A,C}, {B}
2          {E}, {D}, {B}, {A,C}  None        None

[Diagram: original DFA with states A, B, C, D and E]

[Diagram: minimised DFA with states {A,C}, B, D and E: {A,C} --a--> B, {A,C} --b--> {A,C}, B --a--> B, B --b--> D, D --a--> B, D --b--> E, E --a--> B, E --b--> {A,C}]


Page 18: Lecture4 lexical analysis2

From NFA to minimized DFA: recall a (b | c)*

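[Diagram: NFA for a (b | c)*: S0 --a--> S1 --ε--> S2 --ε--> S3; S3 --ε--> S4 --b--> S5 --ε--> S8; S3 --ε--> S6 --c--> S7 --ε--> S8; S8 --ε--> S3; S8 --ε--> S9; S2 --ε--> S9 (accepting)]

DFA state  NFA states         ε-closure(move(s,a))  ε-closure(move(s,b))  ε-closure(move(s,c))
A          S0                 S1,S2,S3,S4,S6,S9     None                  None
B          S1,S2,S3,S4,S6,S9  None                  S5,S8,S9,S3,S4,S6     S7,S8,S9,S3,S4,S6
C          S5,S8,S9,S3,S4,S6  None                  S5,S8,S9,S3,S4,S6     S7,S8,S9,S3,S4,S6
D          S7,S8,S9,S3,S4,S6  None                  S5,S8,S9,S3,S4,S6     S7,S8,S9,S3,S4,S6

[Diagram: resulting DFA: A --a--> B; B --b--> C; B --c--> D; C --b--> C; C --c--> D; D --b--> C; D --c--> D]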


Page 19: Lecture4 lexical analysis2

DFA minimisation: recall a (b | c)*

Apply the minimisation algorithm to produce the minimal DFA:

Iteration  Current groups  Split on a  Split on b  Split on c
0          {B,C,D}, {A}    None        None        None

[Diagram: DFA before minimisation, with states A, B, C and D]

[Diagram: minimal DFA: A --a--> {B,C,D}, with {B,C,D} looping on b | c]

Remember, last time, we said that a human could construct a simpler automaton than Thompson’s construction? Well, algorithms can produce the same DFA!

