Deterministic Finite Automata

COMP2600 — Formal Methods for Software Engineering

Katya Lebedeva

Australian National University

Semester 2, 2016

Slides by Ranald Clouston and Katya Lebedeva.

A Deterministic Finite State Automaton (DFA) is a quintuple

(Σ,S,s0,F,δ)

where

1. Σ is a finite set of input symbols (the alphabet)

2. S is a finite set of states

3. s0 is the start (or initial) state: s0 ∈ S

4. F is a set of final (or accepting) states: F ⊆ S

5. δ : S×Σ → S is a transition function
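The quintuple can be written down directly as a data structure. The following is a minimal sketch, assuming states and symbols are represented as strings; the names DFA, is_well_formed and toy are illustrative and do not come from the slides. The check mirrors the side conditions of the definition: s0 ∈ S, F ⊆ S, and δ is defined for every pair in S×Σ.

```python
from typing import NamedTuple

class DFA(NamedTuple):
    alphabet: frozenset   # Σ
    states: frozenset     # S
    start: str            # s0
    final: frozenset      # F
    delta: dict           # δ, keyed by (state, symbol)

def is_well_formed(m: DFA) -> bool:
    """Check s0 ∈ S, F ⊆ S, and that δ is a total function S×Σ → S."""
    expected_keys = {(s, a) for s in m.states for a in m.alphabet}
    return (
        m.start in m.states
        and m.final <= m.states
        and set(m.delta) == expected_keys
        and set(m.delta.values()) <= m.states
    )

# A hypothetical two-state DFA over {0,1} (an assumption for illustration).
toy = DFA(
    alphabet=frozenset("01"),
    states=frozenset({"p", "q"}),
    start="p",
    final=frozenset({"q"}),
    delta={("p", "0"): "p", ("p", "1"): "q", ("q", "0"): "p", ("q", "1"): "q"},
)
print(is_well_formed(toy))  # True
```

Requiring exactly one entry per (state, symbol) pair is what makes the automaton deterministic.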


Example

[State diagram: start state S0, final state S2; edges S0 -0-> S1, S0 -1-> S0, S1 -0-> S1, S1 -1-> S2, S2 -0-> S1, S2 -1-> S0]

• alphabet - {0,1}

• states - {S0,S1,S2}

• initial state - S0

• final states - {S2}

• transition function

        0     1
  S0    S1    S0
  S1    S1    S2
  S2    S1    S0

Note that the actual names of states are irrelevant.


Transition Function

as a diagram

[State diagram of the example DFA: start state S0, final state S2]

as a transition table

        0     1
  S0    S1    S0
  S1    S1    S2
  S2    S1    S0

δ(S0,0) = S1 δ(S0,1) = S0

δ(S1,0) = S1 δ(S1,1) = S2

δ(S2,0) = S1 δ(S2,1) = S0
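In a program, the transition table is naturally a lookup table. A minimal sketch of the table above, assuming states and symbols are strings (the names DELTA and delta are illustrative):

```python
# Transition table of the example DFA, keyed by (state, symbol).
DELTA = {
    ("S0", "0"): "S1", ("S0", "1"): "S0",
    ("S1", "0"): "S1", ("S1", "1"): "S2",
    ("S2", "0"): "S1", ("S2", "1"): "S0",
}

def delta(state: str, symbol: str) -> str:
    """Single-step transition function: look up the next state."""
    return DELTA[(state, symbol)]

print(delta("S1", "1"))  # S2
```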


Extended Transition Function

We want to extend δ to multiple transitions:

δ : S×Σ → S

δ∗ : S×Σ∗ → S

[State diagram of the example DFA: start state S0, final state S2]

Input 0101 takes the DFA from S0 to S2. Hence δ∗(S0,0101) = S2.

Input 1011 takes the DFA from S1 to S0. Hence δ∗(S1,1011) = S0.


δ∗(s,w) = q denotes that starting at state s, the input string w will take the

automaton to state q.

In other words:

If

w = a1a2 . . .an

and

δ(s,a1) = p1

δ(p1,a2) = p2

. . .

δ(pn−1,an) = q

then

δ∗(s,a1a2 . . .an) = q


Definition by induction

Basis: δ∗(s,ε) = s (1)

Induction: δ∗(s,aα) = δ∗(δ(s,a),α) (2)

Note that a ∈ Σ (i.e. a is a symbol) and α is a string.

Thus δ∗ is defined by breaking the input string into a single symbol followed

by a string. We exploit the fact that strings are defined inductively!

Since we informally think of δ∗ as operating on a whole string, it should not matter where we split the input string. Indeed, we can show that for any state s and any strings α and β the following holds:

δ∗(s,αβ) = δ∗(δ∗(s,α),β)
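The inductive definition translates line by line into a recursive function. This is a sketch for the example DFA only, assuming strings over {0,1} are Python strings; the name delta_star is illustrative.

```python
# Transition table of the example DFA, keyed by (state, symbol).
DELTA = {
    ("S0", "0"): "S1", ("S0", "1"): "S0",
    ("S1", "0"): "S1", ("S1", "1"): "S2",
    ("S2", "0"): "S1", ("S2", "1"): "S0",
}

def delta_star(state: str, word: str) -> str:
    # Basis (1): the empty string leaves the state unchanged.
    if word == "":
        return state
    # Induction (2): consume the first symbol a, then process the rest.
    a, rest = word[0], word[1:]
    return delta_star(DELTA[(state, a)], rest)

print(delta_star("S0", "0101"))  # S2
print(delta_star("S1", "1011"))  # S0
```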


Append Theorem

δ∗(s,αβ) = δ∗(δ∗(s,α),β)

Proof:

By induction on the length of α.

Base case: α = ε

LHS = δ∗(s,εβ) = δ∗(s,β)

RHS = δ∗(δ∗(s,ε),β)

= δ∗(s,β) = LHS (by (1))


Inductive case:

Suppose δ∗(s,αβ) = δ∗(δ∗(s,α),β) holds for every state s (IH). We prove the claim for aα, where a ∈ Σ.

LHS = δ∗(s,(aα)β)

= δ∗(s,a(αβ))

= δ∗(δ(s,a),αβ) (by (2))

= δ∗(δ∗(δ(s,a),α),β) (by IH)

RHS = δ∗(δ∗(s,aα),β)

= δ∗(δ∗(δ(s,a),α),β) (by (2))

= LHS

Corollary: when β is a single symbol b

δ∗(s,αb) = δ(δ∗(s,α),b)
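The theorem can be sanity-checked by brute force on the example DFA. The sketch below is a test, not a proof: it only tries strings up to an assumed length bound (max_len, chosen arbitrarily), and the helper names are illustrative.

```python
from itertools import product

DELTA = {
    ("S0", "0"): "S1", ("S0", "1"): "S0",
    ("S1", "0"): "S1", ("S1", "1"): "S2",
    ("S2", "0"): "S1", ("S2", "1"): "S0",
}
STATES = ["S0", "S1", "S2"]

def delta_star(state, word):
    # Same recursion as definitions (1) and (2).
    return state if word == "" else delta_star(DELTA[(state, word[0])], word[1:])

def words(max_len):
    """All strings over {0,1} of length 0..max_len."""
    for n in range(max_len + 1):
        for symbols in product("01", repeat=n):
            yield "".join(symbols)

def append_theorem_holds(max_len=4):
    # Compare both sides of the theorem for every state and pair of strings.
    return all(
        delta_star(s, x + y) == delta_star(delta_star(s, x), y)
        for s in STATES
        for x in words(max_len)
        for y in words(max_len)
    )

print(append_theorem_holds())  # True
```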


Example:

[State diagram of the example DFA: start state S0, final state S2]

δ∗(S1,1011) = δ∗(δ(S1,1),011)

= δ∗(S2,011)

= δ∗(S1,11)

= δ∗(S2,1)

= δ∗(S0,ε)

= S0


Language accepted by a DFA

We say a DFA accepts a string if, starting from the start state and reading the whole string, it ends in one of the final states. More precisely, let A = (Σ,S,s0,F,δ) be a DFA and w be a string in Σ∗.

We say w is accepted by A if

δ∗(s0,w) ∈ F

The language accepted by A is the set of all strings accepted by A:

L(A) = {w ∈ Σ∗ | δ∗(s0,w) ∈ F}

That is, w ∈ L(A) iff δ∗(s0,w) ∈ F .
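Membership in L(A) is easy to test mechanically. A minimal sketch for the example DFA, computing δ∗ with a loop instead of recursion; the names START, FINAL and accepts are illustrative.

```python
DELTA = {
    ("S0", "0"): "S1", ("S0", "1"): "S0",
    ("S1", "0"): "S1", ("S1", "1"): "S2",
    ("S2", "0"): "S1", ("S2", "1"): "S0",
}
START = "S0"
FINAL = {"S2"}

def accepts(word: str) -> bool:
    """Return True iff δ∗(s0, word) ∈ F."""
    state = START
    for symbol in word:
        state = DELTA[(state, symbol)]
    return state in FINAL

print(accepts("0011101"))  # True
print(accepts("100"))      # False
```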


A1: [State diagram of the example DFA, now named A1: start state S0, final state S2]

0011101 takes the machine from state S0 through the sequence of states S1, S1, S2, S0, S0, S1 to the final state S2:

δ∗(S0,0011101) = δ∗(S1,011101) = δ∗(S1,11101) = . . . = δ∗(S1,1) = S2


A1: [State diagram of the example DFA A1: start state S0, final state S2]

Strings accepted by A1: 01, 001, 101, 0001, 0101, 00101101 . . .

Strings not accepted by A1: ε, 0, 1, 00, 10, 11, 100 . . .

What distinguishes the strings that A1 accepts from those it does not?

How do we justify our guess at this answer?


To prove that a machine M accepts the language that is characterised by a

predicate P:

1. prove that any string satisfying P is accepted by M

2. prove that any string accepted by M satisfies P

Our claim: A1 accepts exactly the strings that end in 01.

We need to prove the following:

1. If a string ends in 01, then it is accepted by A1. I.e.:

for all α ∈ Σ∗, δ∗(S0,α01) ∈ F

2. If a string is accepted by A1, then it ends in 01. I.e.:

for all w ∈ Σ∗, if δ∗(S0,w) ∈ F then there exists α ∈ Σ∗ such that w = α01


Part 1: for all α ∈ Σ∗, δ∗(S0,α01) = S2

By the “append” theorem,

δ∗(S0,α01) = δ∗(δ∗(S0,α),01)

A1: [State diagram of the example DFA A1: start state S0, final state S2]

Observe that δ∗(s,01) = S2 holds for all s ∈ S.

Indeed, there exist only three possible cases:

δ∗(S0,01) = δ∗(S1,1) = S2

δ∗(S1,01) = δ∗(S1,1) = S2

δ∗(S2,01) = δ∗(S1,1) = S2

Hence,

δ∗(S0,α01) = δ∗(δ∗(S0,α),01) = S2


Part 2: for all w ∈ Σ∗, if δ∗(S0,w) = S2 then there exists α ∈ Σ∗ such that w = α01

We argue by cases on the length of w.

• w = ε

δ∗(S0,ε) = S0 and S0 is not equal to S2. Hence, the antecedent is false and the claim is true by the rules of propositional logic.

• |w| = 1 (w is a single symbol)

The antecedent is false because there is no single-step path from S0 to S2. Hence, the claim is true.

• |w|> 1. There are paths from S0 to S2 of two or more steps.

Suppose δ∗(S0,αxy) = S2.

By the corollary to the “append” theorem,

δ(δ∗(S0,αx),y) = S2.

[State diagram of the example DFA: start state S0, final state S2]

By the definition of δ, y must be 1 and δ∗(S0,αx) must be S1. Similarly, since δ(δ∗(S0,α),x) = S1, x is 0 by the definition of δ. Hence w = α01. □
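Both directions of the claim can also be checked experimentally. The sketch below compares A1's verdict with the predicate "ends in 01" on every string up to an assumed length bound (max_len); it supports the proof above but does not replace it. The function names are illustrative.

```python
from itertools import product

DELTA = {
    ("S0", "0"): "S1", ("S0", "1"): "S0",
    ("S1", "0"): "S1", ("S1", "1"): "S2",
    ("S2", "0"): "S1", ("S2", "1"): "S0",
}

def accepts(word: str) -> bool:
    """Run A1 from S0 and test whether it stops in the final state S2."""
    state = "S0"
    for symbol in word:
        state = DELTA[(state, symbol)]
    return state == "S2"

def claim_holds(max_len: int = 10) -> bool:
    """Check 'accepted iff ends in 01' for all strings of length <= max_len."""
    for n in range(max_len + 1):
        for symbols in product("01", repeat=n):
            w = "".join(symbols)
            if accepts(w) != w.endswith("01"):
                return False
    return True

print(claim_holds())  # True
```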


Limitations of DFAs

What class of languages can be recognised by DFAs?

A very important example:

Consider this language: L = { a^n b^n | n ∈ N }

That is, L = {ε, ab, aabb, aaabbb, a^4 b^4, a^5 b^5, ...}

This language cannot be recognised by any finite state automaton!

Intuitively, this is because a DFA has only finitely many states, so it cannot remember how many ‘a’s it has seen.


Proof by contradiction:

Suppose A is a DFA that accepts L. That is, L = L(A). Write S0 for the start state of A.

Each of the following expressions denotes a state of A:

δ∗(S0,a), δ∗(S0,aa), δ∗(S0,a^3), . . .

Since this list is infinite and the number of states in A is finite, some of these expressions must denote the same state.

Choose distinct i and j such that δ∗(S0,a^i) = δ∗(S0,a^j).

What we have done here is pick two initial string fragments that the automaton will not be able to distinguish in terms of what is allowed for the rest of the string.


Since a^i b^i is accepted, we know

δ∗(S0,a^i b^i) ∈ F

By the “append” theorem

δ∗(S0,a^i b^i) = δ∗(δ∗(S0,a^i),b^i) ∈ F

Since δ∗(S0,a^i) = δ∗(S0,a^j)

δ∗(δ∗(S0,a^j),b^i) ∈ F

and, by the “append” theorem again,

δ∗(δ∗(S0,a^j),b^i) = δ∗(S0,a^j b^i) ∈ F

Thus a^j b^i is accepted by A.

But a^j b^i is not in L because i ≠ j, contradicting the initial assumption.
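The argument is constructive enough to run against a concrete machine. The sketch below uses a hypothetical four-state DFA over {a, b} (an assumption for illustration, not an automaton from the slides) that accepts ε and every a^m b^n with m, n ≥ 1. It searches for the colliding exponents i and j and shows that the machine then accepts a string outside L.

```python
# Hypothetical DFA; "dead" is a non-accepting trap state.
DELTA = {
    ("q0", "a"): "q1",     ("q0", "b"): "dead",
    ("q1", "a"): "q1",     ("q1", "b"): "q2",
    ("q2", "a"): "dead",   ("q2", "b"): "q2",
    ("dead", "a"): "dead", ("dead", "b"): "dead",
}
START, FINAL = "q0", {"q0", "q2"}

def run(word, state=START):
    """Compute δ∗(state, word) iteratively."""
    for symbol in word:
        state = DELTA[(state, symbol)]
    return state

def find_collision():
    """Return (i, j) with i < j and δ∗(s0, a^i) = δ∗(s0, a^j)."""
    seen = {}                      # state -> number of a's read on first visit
    state, n = START, 0
    while state not in seen:       # must stop: only finitely many states
        seen[state] = n
        state = DELTA[(state, "a")]
        n += 1
    return seen[state], n

i, j = find_collision()
print(i, j)                                # 1 2
print(run("a" * i + "b" * i) in FINAL)     # True: a^i b^i is accepted
print(run("a" * j + "b" * i) in FINAL)     # True: a^j b^i is accepted too
```

Any DFA that accepts every a^i b^i is caught this way, because the loop in find_collision must terminate after at most as many steps as there are states.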


Pigeon-Hole Principle

We used the pigeon-hole principle in the preceding proof:

If we have more pigeons than pigeon-holes, then at least two pigeons must

be sharing a hole.

In our example we have infinitely many pigeons (the strings a, aa, a^3, . . .) crammed into finitely many holes (states of the automaton).

This is a useful technique to prove that a certain infinite language cannot be recognised by a finite state automaton.

But be careful: there are infinite languages that can be recognised by DFAs,

as we have seen.


Equivalent Automata

Two automata are said to be equivalent if they accept the same language.

A2: [State diagram of A2: states S0 to S4 over the alphabet {a,b}]

A3: [State diagram of A3: states S0 to S3 over the alphabet {a,b}]

Can we simplify a DFA? Is there an equivalent DFA with fewer states?


Equivalent States

Two states Sj and Sk of a DFA are equivalent if, for all input strings w,

δ∗(Sj,w) ∈ F if and only if δ∗(Sk,w) ∈ F

Note that δ∗(Sj,w) and δ∗(Sk,w) may be different states - we only care that both are in, or not in, F.

In the following example, S2 is equivalent to S4.

A2: [State diagram of A2: states S0 to S4 over the alphabet {a,b}]


Algorithm for Finding Equivalent States

There is an iterative algorithm to compute a list of equivalence classes of

states.

• The working data structure for the algorithm is a list of groups of states

• Each group contains states that appear to be equivalent, given the tests

we have done so far

• On each iteration, we test one of the groups with a symbol from the alphabet.

• If we notice different behaviour, we split the group.


Informal Description of the Algorithm

• Initialise: Divide the states into two groups: a group for non-final states

and a group for final states.

We assume that the states in the same group are equivalent, and we try

to distinguish them.

• Loop: Pick a group {s1, ..., sj} and a symbol x. If the states δ(si,x) for i = 1, ..., j are all in the same group, then the group {s1, ..., sj} is not (yet) split. Otherwise, split the group so that two states stay together only if their x-successors lie in the same group.

• Terminate when no group can be split by any symbol of the alphabet.
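The informal description can be sketched in code as follows. The function names and the five-state DFA used to exercise them are assumptions for illustration; in particular, the transition table is hypothetical and is not a transcription of A2.

```python
def split_once(groups, alphabet, delta):
    """Return a refined list of groups, or None if no group can be split."""
    for group in groups:
        for x in alphabet:
            # Bucket the states by the index of the group their x-successor is in.
            buckets = {}
            for s in group:
                target = next(i for i, g in enumerate(groups) if delta[(s, x)] in g)
                buckets.setdefault(target, []).append(s)
            if len(buckets) > 1:   # different behaviour observed: split
                rest = [g for g in groups if g is not group]
                return rest + list(buckets.values())
    return None

def equivalence_classes(states, alphabet, delta, final):
    # Initialise: one group of non-final states, one of final states.
    non_final = [s for s in states if s not in final]
    is_final = [s for s in states if s in final]
    groups = [g for g in (non_final, is_final) if g]
    while True:
        refined = split_once(groups, alphabet, delta)
        if refined is None:        # nothing left to split: terminate
            return groups
        groups = refined

# Hypothetical DFA with final states s2 and s4.
STATES = ["s0", "s1", "s2", "s3", "s4"]
DELTA = {
    ("s0", "a"): "s1", ("s0", "b"): "s3",
    ("s1", "a"): "s1", ("s1", "b"): "s2",
    ("s2", "a"): "s4", ("s2", "b"): "s4",
    ("s3", "a"): "s4", ("s3", "b"): "s3",
    ("s4", "a"): "s2", ("s4", "b"): "s2",
}
classes = equivalence_classes(STATES, "ab", DELTA, {"s2", "s4"})
print(sorted(classes))  # [['s0'], ['s1'], ['s2', 's4'], ['s3']]
```

Each call to split_once restarts the scan from the first group, which is simple rather than efficient; the loop still terminates because every split increases the number of groups, and there can be at most one group per state.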


A2: [State diagram of A2: states S0 to S4 over the alphabet {a,b}]

[[s0,s1,s3], [s2,s4]]

  ↓ a

[[s0,s1], [s3], [s2,s4]]

  ↓ b

[[s0], [s1], [s3], [s2,s4]]

  ↓ a

[[s0], [s1], [s3], [s2,s4]]

  ↓ b

[[s0], [s1], [s3], [s2,s4]]


Elimination of States

Suppose A = (Σ,S,s0,F,δ) is a DFA with state Sk equivalent to state Sj (and Sk is not s0).

We can eliminate Sk from this automaton by defining a new automaton A′ = (Σ,S′,s0,F′,δ′) as follows:

• S′ is S without Sk

• F′ is F without Sk

• for every state s ∈ S′ and symbol a ∈ Σ:

δ′(s,a) = Sj if δ(s,a) = Sk

δ′(s,a) = δ(s,a) otherwise.
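A minimal sketch of this construction, assuming the transition function is stored as a dictionary keyed by (state, symbol). The function name eliminate and the five-state DFA are illustrative assumptions; the DFA is hypothetical, with s4 equivalent to s2.

```python
def eliminate(states, final, delta, keep, drop):
    """Remove state `drop`, redirecting every transition into it to `keep`."""
    new_states = [s for s in states if s != drop]
    new_final = {s for s in final if s != drop}
    new_delta = {
        (s, x): (keep if target == drop else target)
        for (s, x), target in delta.items()
        if s != drop               # transitions leaving `drop` disappear
    }
    return new_states, new_final, new_delta

# Hypothetical DFA in which s2 and s4 are equivalent.
STATES = ["s0", "s1", "s2", "s3", "s4"]
FINAL = {"s2", "s4"}
DELTA = {
    ("s0", "a"): "s1", ("s0", "b"): "s3",
    ("s1", "a"): "s1", ("s1", "b"): "s2",
    ("s2", "a"): "s4", ("s2", "b"): "s4",
    ("s3", "a"): "s4", ("s3", "b"): "s3",
    ("s4", "a"): "s2", ("s4", "b"): "s2",
}
new_states, new_final, new_delta = eliminate(STATES, FINAL, DELTA, keep="s2", drop="s4")
print(new_states)               # ['s0', 's1', 's2', 's3']
print(new_delta[("s3", "a")])   # s2
```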


Example

A2: [State diagram of A2: states S0 to S4 over the alphabet {a,b}]

Since S2 ≡ S4 in A2, we eliminate one of them: S4.

• New set of states is {S0,S1,S2,S3}

• New set of final states is {S2}

• New transition function is:

[State diagram of the resulting DFA: states S0 to S3; every transition that led to S4 now leads to S2]


Minimisation of a DFA

Consider the DFA below:

[State diagram: DFA with states S1, S2 and S3 over the alphabet {a,b}; no path from the start state reaches S3]

None of S1, S2, S3 are equivalent...

But S3 is inaccessible from the start state, so it can be deleted (along with the transitions leaving it).

Deleting equivalent and inaccessible states will give a minimal DFA.
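Inaccessible states can be found with a simple graph search from the start state. The sketch below assumes a dictionary-based transition table; the function name reachable and the three-state DFA are hypothetical and only loosely modelled on the diagram above, whose exact transitions are not reproduced here.

```python
def reachable(start, alphabet, delta):
    """Return the set of states reachable from `start`."""
    seen = {start}
    frontier = [start]
    while frontier:
        s = frontier.pop()
        for x in alphabet:
            t = delta[(s, x)]
            if t not in seen:      # first time we reach t
                seen.add(t)
                frontier.append(t)
    return seen

# Hypothetical DFA: nothing leads into S3, so it is inaccessible from S1.
DELTA = {
    ("S1", "a"): "S1", ("S1", "b"): "S2",
    ("S2", "a"): "S2", ("S2", "b"): "S2",
    ("S3", "a"): "S1", ("S3", "b"): "S2",
}
live = reachable("S1", "ab", DELTA)
print(sorted(live))  # ['S1', 'S2']

# Keep only the transitions that leave an accessible state.
pruned = {(s, x): t for (s, x), t in DELTA.items() if s in live}
print(len(pruned))   # 4
```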

