Notes to Closure Properties

Simplification of the automata design

L.∅ = ∅.L = ∅

{λ}.L = L.{λ} = L

(L∗)∗ = L∗

(L1 ∪ L2)∗ = L1∗(L2.L1∗)∗ = L2∗(L1.L2∗)∗

(L1.L2)^R = L2^R.L1^R

∂w(L1 ∪ L2) = ∂w(L1) ∪ ∂w(L2)

∂w(Σ∗ − L) = Σ∗ − ∂w(L)

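Proof of non-regularity
L = {w | w ∈ {0, 1}∗, |w|_0 = |w|_1} is not regular, since L ∩ {0^i 1^j | i, j ≥ 0} = {0^i 1^i | i ≥ 0} is not regular (pumping lemma). The language {0^i 1^j | i, j ≥ 0} = L(0∗1∗) is regular and regular languages are closed under intersection, so a regular L would make {0^i 1^i | i ≥ 0} regular as well.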

Automata and Grammars Regular Expressions,Kleene Theorem, Subst., Homom. 4 August 9, 2019 75 / 75 - 95

Regular Expressions (RegE)

Definition 4.1 (Regular Expression (RegE), value of a RegE L(α))
Regular expressions α, β ∈ RegE(Σ) over a finite non-empty alphabet Σ = {x1, x2, ..., xn} and their values L(α) are defined by induction.

Basis:

expression α    for                 value L(α) ≡ [α]
∅               empty expression    L(∅) = {} ≡ ∅
λ               empty string        L(λ) = {λ}
a               a ∈ Σ               L(a) = {a}

Induction:

expression      value                       remark
α + β           L(α + β) = L(α) ∪ L(β)
αβ              L(αβ) = L(α)L(β)            . may be used
α∗              L(α∗) = L(α)∗
(α)             L((α)) = L(α)               brackets do not change the value

The class of regular expressions over Σ, RegE(Σ), is the smallest class closed under the operations above.
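The inductive definition can be read as a recursive evaluator. The following is a minimal Python sketch, not part of the lecture: it assumes a hypothetical tuple encoding of expressions (the tags "empty", "lam", "sym", "alt", "cat", "star" and the function name `lang` are illustrative) and truncates L(α) to words of length at most n so that the result stays finite.

```python
def lang(expr, n):
    """Return all words of L(expr) with length <= n (a finite truncation)."""
    kind = expr[0]
    if kind == "empty":                 # L(∅) = {}
        return set()
    if kind == "lam":                   # L(λ) = {λ}
        return {""}
    if kind == "sym":                   # L(a) = {a}
        return {expr[1]} if n >= 1 else set()
    if kind == "alt":                   # L(α + β) = L(α) ∪ L(β)
        return lang(expr[1], n) | lang(expr[2], n)
    if kind == "cat":                   # L(αβ) = L(α)L(β)
        left, right = lang(expr[1], n), lang(expr[2], n)
        return {u + v for u in left for v in right if len(u + v) <= n}
    if kind == "star":                  # L(α*) = L(α)*
        base = lang(expr[1], n) - {""}  # λ adds nothing new under iteration
        result, frontier = {""}, {""}
        while frontier:
            frontier = {u + v for u in frontier for v in base
                        if len(u + v) <= n} - result
            result |= frontier
        return result
    raise ValueError(f"unknown node {kind!r}")


# The expression (λ + 1)(01)*(λ + 0) in the assumed encoding.
zero, one, lam = ("sym", "0"), ("sym", "1"), ("lam",)
alternating = ("cat",
               ("cat", ("alt", lam, one), ("star", ("cat", zero, one))),
               ("alt", lam, zero))
print(sorted(lang(alternating, 3), key=lambda w: (len(w), w)))
```

Truncating by length is a design choice made only to keep the sets finite; the definition itself places no bound on word length.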


Examples, Precedence

Example 4.1 (Regular Expressions)
The language of alternating 0's and 1's may be written:

either (01)∗ + (10)∗ + 1(01)∗ + 0(10)∗

or (λ + 1)(01)∗(λ + 0).

The language L((0∗10∗10∗1)∗0∗) = {w | w ∈ {0, 1}∗, |w|_1 = 3k, k ≥ 0}.

Definition 4.2 (Precedence)
The star ∗ is the operator with the highest precedence, then comes concatenation (.), and the union + has the lowest precedence.

Theorem 4.1 (RegE and DFA, Kleene theorem (a variant))
Any language recognizable by a DFA can be expressed by a regular expression.
Any language of a regular expression can be recognized by a λ-NFA (and therefore also by a DFA).


Example

[Figure: automaton with states 1 and 2; a loop labelled 1 at state 1, an edge labelled 0 from state 1 to state 2, and a loop labelled 0,1 at state 2.]

Result: R^(2)_12 = 1∗0(0 + 1)∗

expression   by direct substitution                              simplified
R^(0)_11     λ + 1
R^(0)_12     0
R^(0)_21     ∅
R^(0)_22     (λ + 0 + 1)
R^(1)_11     λ + 1 + (λ + 1)(λ + 1)∗(λ + 1)                      1∗
R^(1)_12     0 + (λ + 1)(λ + 1)∗0                                1∗0
R^(1)_21     ∅ + ∅(λ + 1)∗(λ + 1)                                ∅
R^(1)_22     λ + 0 + 1 + ∅(λ + 1)∗0                              λ + 0 + 1
R^(2)_11     1∗ + 1∗0(λ + 0 + 1)∗∅                               1∗
R^(2)_12     1∗0 + 1∗0(λ + 0 + 1)∗(λ + 0 + 1)                    1∗0(0 + 1)∗
R^(2)_21     ∅ + (λ + 0 + 1)(λ + 0 + 1)∗∅                        ∅
R^(2)_22     λ + 0 + 1 + (λ + 0 + 1)(λ + 0 + 1)∗(λ + 0 + 1)      (0 + 1)∗


From a DFA to RegE

Let us have a DFA A with n states, Q_A = {1, ..., n}.
Let R^(k)_ij be a regular expression with L(R^(k)_ij) = {w | δ∗≤k(i, w) = j}, the set of words that take A from state i to state j along a path on which no intermediate state has an index higher than k.
We construct R^(k)_ij iteratively for k = 0, ..., n.

k = 0, i ≠ j: R^(0)_ij = a1 + a2 + ... + am, where a1, a2, ..., am are the symbols on edges from i to j (R^(0)_ij = ∅ for m = 0, R^(0)_ij = a for m = 1).

k = 0, i = j: loops, R^(0)_ii = λ + a1 + a2 + ... + am, where a1, a2, ..., am are the symbols on loops at i.

Induction. We have R^(k)_ij for all i, j ∈ Q. We construct R^(k+1)_ij:

R^(k+1)_ij = R^(k)_ij + R^(k)_i(k+1) (R^(k)_(k+1)(k+1))∗ R^(k)_(k+1)j

[Figure: states i, k+1 and j; edge i → j labelled R^(k)_ij, edge i → k+1 labelled R^(k)_i(k+1), loop at k+1 labelled R^(k)_(k+1)(k+1), edge k+1 → j labelled R^(k)_(k+1)j.]

Paths from i to j that do not pass through (k + 1) are already in R^(k)_ij.

Paths from i to j through (k + 1), with possible loops, can be expressed as R^(k)_i(k+1) (R^(k)_(k+1)(k+1))∗ R^(k)_(k+1)j.

Finally RegE = ⊕_{j ∈ F_A} R^(n)_1j, the union over all accepting states j.
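The iteration over k can be sketched in a few lines of Python. This is an illustration under assumptions, not the lecture's implementation: expressions are built as strings in Python `re` syntax so that the result can be tested, `None` encodes ∅, the empty string encodes λ, and the helper names `union`, `concat`, `star`, `dfa_to_regex` are hypothetical. Apart from dropping ∅ and λ where possible, no algebraic simplification is attempted, so the output is longer than the hand-simplified expressions.

```python
import re


def union(a, b):                      # None encodes ∅, "" encodes λ
    if a is None:
        return b
    if b is None:
        return a
    if a == b:
        return a
    if a == "":
        return f"(?:{b})?"            # λ + b
    if b == "":
        return f"(?:{a})?"            # a + λ
    return f"(?:{a}|{b})"


def concat(a, b):
    return None if a is None or b is None else a + b


def star(a):
    return "" if not a else f"(?:{a})*"   # ∅* = λ* = λ


def dfa_to_regex(n, delta, start, accepting):
    """States are 1..n; delta maps (state, symbol) -> state."""
    # Basis k = 0: symbols on direct edges, plus λ on the diagonal.
    R = {(i, j): None for i in range(1, n + 1) for j in range(1, n + 1)}
    for (i, a), j in delta.items():
        R[i, j] = union(R[i, j], re.escape(a))
    for i in range(1, n + 1):
        R[i, i] = union(R[i, i], "")
    # Induction: additionally allow state k as an intermediate state.
    for k in range(1, n + 1):
        R = {(i, j): union(R[i, j],
                           concat(concat(R[i, k], star(R[k, k])), R[k, j]))
             for (i, j) in R}
    result = None
    for j in accepting:
        result = union(result, R[start, j])
    return result


# Two-state automaton: loop 1 at state 1, edge 0 from 1 to 2, loop 0,1 at 2.
# Taking state 1 as initial and state 2 as accepting is an assumption here.
delta = {(1, "1"): 1, (1, "0"): 2, (2, "0"): 2, (2, "1"): 2}
regex = dfa_to_regex(2, delta, 1, [2])
print(regex)
```

For this automaton the resulting expression should describe the same language as 1∗0(0 + 1)∗, that is, the words containing at least one 0.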


DFA to RegE by Successive State Elimination

The previous method may generate up to 4^n symbols.
The following algorithm sometimes avoids the duplication.
We allow regular expressions to annotate the edges of the graph (a transformation of the automaton).

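State s selected for elimination.

[Figure: predecessors q1, q2, ..., qk with edges Q1, Q2, ..., Qk into s; s has a loop S and edges P1, ..., Pm to successors p1, ..., pm; the direct edges qi → pj are labelled Rij (R11, ..., R1m, R21, ..., R2m, ..., Rk1, ..., Rkm).]

After s is eliminated.

[Figure: only q1, ..., qk and p1, ..., pm remain; each direct edge qi → pj is labelled Rij + Qi S∗ Pj (R11 + Q1S∗P1, ..., R1m + Q1S∗Pm, R21 + Q2S∗P1, ..., R2m + Q2S∗Pm, ..., Rk1 + QkS∗P1, ..., Rkm + QkS∗Pm).]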


A RegE from a DFA

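For every accepting state q ∈ F we eliminate all states p ∈ Q \ {q, q0}.

For q ≠ q0 we take

RegE(q) = (R + SU∗T)∗SU∗.

[Figure: two states q0 and q; loop R at q0, edge S from q0 to q, loop U at q, edge T from q back to q0.]

For q = q0 we take

RegE(q) = R∗.

[Figure: single state q0 with loop R.]

And take the union (addition) over all accepting states: RegE(DFA) = ⊕_{q ∈ F} RegE(q).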


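Example

Example 4.2
An automaton (nondeterministic in state A) that accepts words with 1 at the second-last or the third-last position.

[Figure, four stages over states A, B, C, D:]
1. The original automaton: loop 0,1 at A; edge A → B labelled 1; edge B → C labelled 0,1; edge C → D labelled 0,1.
2. We replace the edge labels by RegE: loop 0+1 at A; A → B labelled 1; B → C labelled 0+1; C → D labelled 0+1.
3. We eliminate B: loop 0+1 at A; A → C labelled 1(0+1); C → D labelled 0+1.
4. We eliminate C: loop 0+1 at A; A → D labelled 1(0+1)(0+1).

We get RegE: (0 + 1)∗1(0 + 1) + (0 + 1)∗1(0 + 1)(0 + 1).

[Elimination Order] We start with the nodes that are neither accepting nor initial, q ∉ F, q ≠ q0.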
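State elimination on the automaton of Example 4.2 can be sketched as follows. This is an illustrative Python sketch under assumptions: edge labels are strings in Python `re` syntax, every label passed in must already be safe to concatenate (a single symbol, a character class, or a parenthesised group), a missing edge stands for ∅, and the names `alt`, `star`, `eliminate`, `regex_for` are hypothetical. Accepting states C and D are read off from the two summands of the resulting RegE.

```python
import re
from itertools import product


def alt(a, b):
    """Union of two edge labels; None encodes a missing edge (∅)."""
    if a is None or b is None:
        return a if b is None else b
    return f"(?:{a}|{b})"


def star(a):
    return f"(?:{a})*" if a else ""


def eliminate(edges, s):
    """Remove state s from a graph whose edges (p, q) carry regex labels."""
    s_star = star(edges.get((s, s)))
    preds = [q for (q, t) in edges if t == s and q != s]
    succs = [p for (t, p) in edges if t == s and p != s]
    out = {e: r for e, r in edges.items() if s not in e}
    for q, p in product(preds, succs):
        # New label R_qp + Q_q S* P_p on the direct edge q -> p.
        out[q, p] = alt(out.get((q, p)), edges[q, s] + s_star + edges[s, p])
    return out


def regex_for(edges, states, q0, q):
    """RegE(q): eliminate everything except q0 and q, then read off the result."""
    for s in states:
        if s not in (q0, q):
            edges = eliminate(edges, s)
    R = edges.get((q0, q0))
    if q == q0:
        return star(R)                                  # R*
    S, U, T = edges.get((q0, q)), edges.get((q, q)), edges.get((q, q0))
    if S is None:
        return None                                     # q cannot be reached
    back = None if T is None else S + star(U) + T       # S U* T
    return star(alt(R, back)) + S + star(U)             # (R + S U* T)* S U*


edges = {("A", "A"): "[01]", ("A", "B"): "1",
         ("B", "C"): "[01]", ("C", "D"): "[01]"}
regex = alt(regex_for(edges, "ABCD", "A", "C"),
            regex_for(edges, "ABCD", "A", "D"))
print(regex)
```

The elimination is restarted from the original graph for each accepting state, which mirrors the slide; sharing the common eliminations between accepting states would be a possible optimisation.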


From a RegE to λ–NFA

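By induction on the structure of R. In each step we construct a λ-NFA E that recognizes the same language, L(R) = L(E), with three additional properties:

1. Exactly one accepting state.
2. No edges into the initial state.
3. No edges from the accepting state.

Basis:

Empty string λ: [Figure: a λ-edge from the initial state to the accepting state.]
Empty set ∅: [Figure: an initial and an accepting state with no edge between them.]
A single symbol a: [Figure: an edge labelled a from the initial state to the accepting state.]

Induction:

Addition R + S: [Figure: a new initial state with λ-edges into the automata for R and S, and λ-edges from their accepting states to a new accepting state.]
Concatenation RS: [Figure: the automaton for R followed by the automaton for S, joined by a λ-edge.]
Iteration R∗: [Figure: the automaton for R with four λ-edges: new initial state into R, R to the new accepting state, back from the end of R to its start, and from the new initial state directly to the new accepting state.]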
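The construction can be sketched in Python as below. This is a hedged illustration: the tuple encoding of expressions ("empty", "lam", "sym", "alt", "cat", "star") and the names `build`, `closure`, `accepts` are assumptions, and for uniformity the sketch creates a fresh initial and accepting state in every case, including concatenation, which uses a few more λ-edges than the construction described above but keeps the three properties.

```python
from itertools import count


def build(expr, edges, fresh):
    """Return (initial, accepting) of a λ-NFA for expr.

    edges collects triples (p, label, q); label None is a λ-transition.
    """
    s, f = next(fresh), next(fresh)
    kind = expr[0]
    if kind == "lam":
        edges.append((s, None, f))
    elif kind == "sym":
        edges.append((s, expr[1], f))
    elif kind == "alt":
        for sub in expr[1:]:
            s1, f1 = build(sub, edges, fresh)
            edges += [(s, None, s1), (f1, None, f)]
    elif kind == "cat":
        s1, f1 = build(expr[1], edges, fresh)
        s2, f2 = build(expr[2], edges, fresh)
        edges += [(s, None, s1), (f1, None, s2), (f2, None, f)]
    elif kind == "star":
        s1, f1 = build(expr[1], edges, fresh)
        edges += [(s, None, s1), (f1, None, f), (s, None, f), (f1, None, s1)]
    elif kind != "empty":             # "empty": no edge, f stays unreachable
        raise ValueError(f"unknown node {kind!r}")
    return s, f


def closure(states, edges):
    """λ-closure of a set of states."""
    seen, stack = set(states), list(states)
    while stack:
        p = stack.pop()
        for a, label, b in edges:
            if a == p and label is None and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen


def accepts(expr, word):
    edges = []
    start, accept = build(expr, edges, count())
    current = closure({start}, edges)
    for ch in word:
        step = {b for a, label, b in edges if a in current and label == ch}
        current = closure(step, edges)
    return accept in current


# (0 + 1)*1 : words over {0, 1} ending in 1.
ends_in_one = ("cat", ("star", ("alt", ("sym", "0"), ("sym", "1"))), ("sym", "1"))
print(accepts(ends_in_one, "0101"), accepts(ends_in_one, "10"))
```

Each node of the expression contributes two states and at most four edges, so the automaton has size linear in the size of the expression.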

Pattern search in the text

Static text: we create indexes rather than using RegE.
RegE are useful in dynamic text (like news).

Example 4.3 (Search for streets in addresses on the web)
Street identification: Street|St\.|Avenue|Ave\.|Road|Rd\.
the name before: '[A-Z][a-z]*( [A-Z][a-z]*)*'
house number: [0-9]+[A-Z]?

all together: '[0-9]+[A-Z]? [A-Z][a-z]*( [A-Z][a-z]*)* (Street|St\.|Avenue|Ave\.|Road|Rd\.)'

We are missing:
Boulevard, Place, Way
Streets without any identifier (almost all Czech streets)
Street names with numbers
...
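The combined pattern can be tried with Python's `re` module. This is a small sketch, assuming the street identifier is grouped in parentheses and separated from the name by a space; the sample sentences and the constant name `STREET` are made up for illustration.

```python
import re

STREET = re.compile(
    r"[0-9]+[A-Z]? "                           # house number, optional letter
    r"[A-Z][a-z]*(?: [A-Z][a-z]*)* "           # one or more capitalised words
    r"(?:Street|St\.|Avenue|Ave\.|Road|Rd\.)"  # street identifier
)

for text in ["Send it to 123A Main Street, Springfield",
             "Visit 12 Oak Rd. today",
             "5 Sunset Boulevard"]:
    match = STREET.search(text)
    print(text, "->", match.group(0) if match else None)
```

The last sample is not matched, which illustrates the first item of the "We are missing" list.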


Converting Among Representations

Converting NFA to DFA
λ-closure in O(n^3): search n states multiplied by n^2 arcs for λ-transitions.
Subset construction: a DFA with possibly 2^n states. For each state, O(n^3) time to compute the transitions. Together O(n^3 2^n).

[Figure: conversion diagram between λ-NFA, NFA, DFA and RegE; the arrows are labelled with the costs O(n), O(n^3 2^n) and O(n^3 4^n).]

Converting DFA to NFA
Just modify the transition table by putting set-brackets around the states and, in the case of a λ-NFA, adding a column for λ. O(n).

Automaton to Regular Expression Conversion
O(n^3 4^n)

RegE to Automaton Conversion
λ-NFA in time O(n).
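The λ-closure and the subset construction can be sketched as follows. This is a minimal Python illustration under assumptions: the representation (a dictionary `delta` from (state, symbol) to a set of states and a dictionary `lam` of λ-successors) and the names `eclose`, `subset_construction` are hypothetical, and only the reachable subsets are built, which is often far fewer than 2^n.

```python
def eclose(states, lam):
    """λ-closure of a set of states; lam maps state -> set of λ-successors."""
    seen, stack = set(states), list(states)
    while stack:
        for q in lam.get(stack.pop(), ()):
            if q not in seen:
                seen.add(q)
                stack.append(q)
    return frozenset(seen)


def subset_construction(alphabet, delta, lam, start, accepting):
    """Build the reachable part of the subset DFA of a λ-NFA."""
    start_set = eclose({start}, lam)
    dfa, todo = {}, [start_set]
    while todo:
        S = todo.pop()
        if S in dfa:
            continue
        dfa[S] = {}
        for a in alphabet:
            moved = set().union(*(delta.get((q, a), ()) for q in S))
            T = eclose(moved, lam)
            dfa[S][a] = T
            todo.append(T)
    final = {S for S in dfa if S & set(accepting)}
    return start_set, dfa, final


# NFA for "1 at the second-last or third-last position" (no λ-transitions).
delta = {("A", "0"): {"A"}, ("A", "1"): {"A", "B"},
         ("B", "0"): {"C"}, ("B", "1"): {"C"},
         ("C", "0"): {"D"}, ("C", "1"): {"D"}}
start, dfa, final = subset_construction("01", delta, {}, "A", {"C", "D"})
print(len(dfa), "reachable DFA states")
```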


String Substitution, (String) Homomorphism

Definition 4.3 (String Substitution, (String) Homomorphism)
We have a finite alphabet Σ. For each x ∈ Σ we have σ(x), a language over the alphabet Y_x. Further, we define:

σ(λ) = {λ}
σ(u.v) = σ(u).σ(v)

The mapping σ : Σ∗ → P(Y∗), where Y = ⋃_{x∈Σ} Y_x, is a substitution.
σ(L) = ⋃_{w∈L} σ(w)

An ε-free (λ-free) substitution is a substitution where no σ(x) contains λ.
For w = a1 ... an ∈ Σ^n, σ(w) = σ(a1) ... σ(an).

Example 4.4 (substitution)
σ(0) = {a^i b^j | i, j ≥ 0}, σ(1) = {cd}
σ(010) = {a^i b^j c d a^k b^l | i, j, k, l ≥ 0}
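For finite languages the definition translates directly into code. The sketch below is illustrative only: the function names are hypothetical, and because σ(0) in Example 4.4 is infinite, it is truncated here to i, j ≤ 1, which is an assumption made purely to keep the sets finite.

```python
def substitute(word, sigma):
    """σ(a1...an) = σ(a1)...σ(an); sigma maps symbols to finite sets of strings."""
    result = {""}                       # σ(λ) = {λ}
    for x in word:
        result = {u + v for u in result for v in sigma[x]}
    return result


def substitute_language(words, sigma):
    """σ(L) = union of σ(w) over all w in L (L finite here)."""
    return set().union(*(substitute(w, sigma) for w in words))


# Example 4.4 with σ(0) truncated to exponents i, j <= 1.
sigma = {"0": {"", "a", "b", "ab"}, "1": {"cd"}}
print(sorted(substitute("010", sigma), key=lambda w: (len(w), w)))
```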


(String) Homomorphism

Definition 4.4 ((String) Homomorphism)
A homomorphism h is a special case of a substitution where each h(x) is a single word: h(x) = w_x for all x ∈ Σ. If w_x ≠ λ for all x, it is an ε-free (λ-free) homomorphism.
Inverse homomorphism: h−1(L) = {w | h(w) ∈ L}.

Example 4.5 (homomorphism)
The function h defined by h(0) = ab and h(1) = λ is a homomorphism. For example, h(0011) = abab.
For L = 10∗1 we get h(L) = (ab)∗.

Theorem (Closure under homomorphism)
If the language L and the languages σ(x) for all x ∈ Σ are regular, then so are σ(L), h(L) and h−1(L).
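Both directions of the definition can be tried on Example 4.5. The following Python sketch is an illustration under assumptions: the names `apply_h` and `inverse_h` are hypothetical, the target language is passed as a predicate, and the inverse image is enumerated by brute force up to a length bound, so it only samples the (generally infinite) set h−1(L).

```python
import re
from itertools import product


def apply_h(h, word):
    """h(a1...an) = h(a1)...h(an); h maps each symbol to one string."""
    return "".join(h[x] for x in word)


def inverse_h(h, in_language, max_len):
    """Words w with |w| <= max_len and h(w) in L (L given as a predicate)."""
    words = ("".join(t) for n in range(max_len + 1)
             for t in product(sorted(h), repeat=n))
    return {w for w in words if in_language(apply_h(h, w))}


h = {"0": "ab", "1": ""}                                     # Example 4.5
print(apply_h(h, "0011"))
image = {apply_h(h, "1" + "0" * k + "1") for k in range(4)}  # part of h(10*1)
print(sorted(image, key=len))
preimage = inverse_h(h, lambda v: re.fullmatch(r"(?:ab)*", v) is not None, 2)
print(sorted(preimage, key=lambda w: (len(w), w)))
```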


Homomorphism preserves regularity

Theorem 4.2
If L is a regular language over alphabet Σ, and h is a homomorphism on Σ, then h(L) is also regular.

Proof.
Let L = L(R) for some regular expression R. Let h(E) denote the expression obtained from E by replacing each symbol a with the string h(a). The proof is by structural induction on the sub-expressions E of R: we claim L(h(E)) = h(L(E)).

Basis: h({λ}) = {λ}, h(∅) = ∅. If E = a then L(E) = {a}, so h(L(E)) = {h(a)}. Thus, L(h(E)) = {h(a)}.

Induction:

Union: L(h(F + G)) = L(h(F) + h(G)) = L(h(F)) ∪ L(h(G)) and h(L(F + G)) = h(L(F) ∪ L(G)) = h(L(F)) ∪ h(L(G)). The right-hand sides are equal by the inductive hypothesis, therefore the left-hand sides are equal too.
Concatenation, closure: the proofs are similar.


Inverse Homomorphism

Definition 4.5 (Inverse homomorphism)
Suppose h is a homomorphism from some alphabet Σ to strings in another alphabet T. Then h−1(L), 'h inverse of L', is the set of strings w in Σ∗ such that h(w) is in L.

Example 4.6
Let L = (00 + 1)∗, h(a) = 01 and h(b) = 10.
We claim h−1(L) = (ba)∗.

Proof: h((ba)∗) ⊆ L is easy to see. Any other w generates an isolated 0 (4 cases to consider).

[Figure: h maps L to h(L); in the other direction, h maps h−1(L) into L. Caption: A homomorphism applied in the forward and inverse direction.]


Inverse Homomorphism DFA

Theorem 4.3
If h is a homomorphism from alphabet Σ to alphabet T, and L is a regular language over T, then h−1(L) is also a regular language.

Proof.
The proof starts with a DFA A for L. We construct a DFA for h−1(L).

For A = (Q, T, δ, q0, F) we define B = (Q, Σ, δ_B, q0, F), where δ_B(q, a) = δ∗(q, h(a)) (δ∗ operates on strings).
By induction on |w|, δ_B∗(q0, w) = δ∗(q0, h(w)).
Therefore, B accepts exactly those strings w that are in h−1(L).

[Figure: the input w passes through h, the result h(w) is fed to A from its start state, and A accepts or rejects. Caption: The DFA for h−1(L) applies h to its input, and then simulates the DFA for L.]
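The construction of B is a one-liner once δ∗ is available. The Python sketch below applies it to L = (00 + 1)∗ with h(a) = 01, h(b) = 10; the three-state DFA for L (states "e", "z", "x") and the function names are assumptions introduced for illustration.

```python
def run(delta, q, word):
    """Extended transition function δ*(q, word)."""
    for ch in word:
        q = delta[q, ch]
    return q


def inverse_hom_dfa(states, delta, h):
    """δ_B(q, a) = δ*(q, h(a)); states, start and accepting set stay unchanged."""
    return {(q, a): run(delta, q, h[a]) for q in states for a in h}


# Assumed DFA for (00 + 1)*: "e" = no pending 0 (initial, accepting),
# "z" = one pending 0, "x" = dead state.
states = ["e", "z", "x"]
delta_A = {("e", "1"): "e", ("e", "0"): "z",
           ("z", "0"): "e", ("z", "1"): "x",
           ("x", "0"): "x", ("x", "1"): "x"}
h = {"a": "01", "b": "10"}
delta_B = inverse_hom_dfa(states, delta_A, h)

for w in ["", "ba", "baba", "ab", "bab"]:
    print(repr(w), run(delta_B, "e", w) == "e")
```

B has the same number of states as A; only the alphabet and the transition function change.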


Visit every state example

Example 4.7
Suppose A = (Q, Σ, δ, q0, F) is a DFA. Let L be the language of all strings w in Σ∗ such that δ∗(q0, w) is in F and for every state q ∈ Q there is some prefix x_q of w such that δ∗(q0, x_q) = q.
This language is regular.

M: M = L(A), the language accepted by the DFA A in the usual way.
T: We define a new alphabet T of triples {[paq]; p, q ∈ Q, a ∈ Σ, δ(p, a) = q}.
h: We define the homomorphism h([paq]) = a for all p, q, a.
L1: The language L1 = h−1(M) is regular since M is regular (DFA and inverse homomorphism).
    h−1(101) includes 2^3 = 8 strings, like [p1p][q0q][p1p] ∈ {[p1p], [q1q]}{[p0q], [q0q]}{[p1p], [q1q]}.
    We construct L from L1 in the steps below.

[Figure: DFA with states p and q; loop 1 at p, edge 0 from p to q, loop 0,1 at q.]

L2: Enforce the start at q0. Define E1 = ⋃_{a∈Σ, q∈Q} {[q0 a q]} = {[q0 a1 q0], [q0 a2 q1], ..., [q0 am qn]}.
    Then L2 = L1 ∩ L(E1.T∗).

L3: Adjacent states must be equal. Define the non-matching pairs E2 = ⋃_{q≠r; p,q,r,s∈Q; a,b∈Σ} {[paq][rbs]}.
    Define L3 = L2 − L(T∗.E2.T∗).
    Every string in L3 ends in an accepting state, since we started from M, the language of accepting computations of the DFA A.

L4: All states. For each state q ∈ Q, let E_q be the regular expression that is the sum of all the symbols in T in which q appears in neither the first nor the last position. We subtract L(E_q∗) from L3:
    L4 = L3 − ⋃_{q∈Q} L(E_q∗).

L: Remove the states, leave the symbols. L = h(L4). We conclude that L is regular.

In brief:

operation                   language   meaning
                            M          M = L(A)
Inverse homomorphism        L1         h−1(M) ⊆ {[qap]}∗
Intersection with a RL      L2         + start at q0
Difference with a RL        L3         + adjacent states equal
Difference with a RL        L4         + all states on the path
Homomorphism                L          h([qap]) = a


Decision Properties of Regular Languages

Lemma (Testing Emptiness of Regular Languages)
For finite automata, emptiness is a question of graph reachability: is any final state reachable from the initial one? Reachability is O(n^2).

Lemma
For a regular expression, we can convert it to a λ-NFA in O(n) time and then check reachability.

It can also be done by direct inspection:
Basis: ∅ denotes the empty language; λ and a are not empty.
Induction:

R = R1 + R2 is empty iff both L(R1) and L(R2) are empty.
R = R1R2 is empty iff either L(R1) or L(R2) is empty.
R = R1∗ is never empty, it includes λ.
R = (R1) is empty iff R1 is empty, since they are the same language.
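Both emptiness tests are short enough to sketch. The Python below is illustrative: the tuple encoding of expressions ("empty", "lam", "sym", "alt", "cat", "star") and the adjacency-dictionary representation of the automaton graph are assumptions, as are the function names.

```python
def regex_is_empty(expr):
    """Decide L(expr) = ∅ by direct inspection of the expression."""
    kind = expr[0]
    if kind == "empty":
        return True
    if kind in ("lam", "sym"):
        return False
    if kind == "alt":                   # empty iff both parts are empty
        return regex_is_empty(expr[1]) and regex_is_empty(expr[2])
    if kind == "cat":                   # empty iff either part is empty
        return regex_is_empty(expr[1]) or regex_is_empty(expr[2])
    if kind == "star":                  # always contains λ
        return False
    raise ValueError(f"unknown node {kind!r}")


def automaton_is_empty(successors, start, accepting):
    """Decide emptiness by reachability; successors maps state -> iterable of states."""
    seen, stack = {start}, [start]
    while stack:
        for q in successors.get(stack.pop(), ()):
            if q not in seen:
                seen.add(q)
                stack.append(q)
    return not (seen & set(accepting))


a, empty = ("sym", "a"), ("empty",)
print(regex_is_empty(("cat", a, empty)), regex_is_empty(("star", empty)))
print(automaton_is_empty({1: [2], 3: [4]}, 1, {4}))
```

Brackets need no case of their own in this encoding because the tree structure already records the grouping.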


Testing Membership in a Regular Language

Given a string w with |w| = n and a regular language L, is w ∈ L?

DFA: run the automaton. With |w| = n, a suitable representation and constant-time transitions, it is O(n).
NFA with s states: running time O(ns^2). Each input symbol can be processed by taking the previous set of states, which numbers at most s states, and following the transitions from each of them to at most s states.
λ-NFA: first compute the λ-closure. Then, for each symbol, process it and compute the λ-closure of the result.
For a regular expression of size s, we convert it to a λ-NFA with at most 2s states and then simulate it, taking O(ns^2).
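The NFA case can be sketched as a set simulation. This Python illustration assumes a transition dictionary from (state, symbol) to a set of states; the function name and the sample automaton (1 at the second-last or third-last position) are chosen for the example only.

```python
def nfa_accepts(delta, start, accepting, word):
    """Simulate an NFA without λ-transitions on word."""
    current = {start}
    for ch in word:
        # At most s current states, each with at most s successors:
        # O(s^2) work per input symbol, O(n s^2) in total.
        current = {q for p in current for q in delta.get((p, ch), ())}
    return bool(current & set(accepting))


delta = {("A", "0"): {"A"}, ("A", "1"): {"A", "B"},
         ("B", "0"): {"C"}, ("B", "1"): {"C"},
         ("C", "0"): {"D"}, ("C", "1"): {"D"}}
for w in ["10", "0100", "1000", ""]:
    print(repr(w), nfa_accepts(delta, "A", {"C", "D"}, w))
```

A DFA is the special case in which every set has exactly one element, which gives the O(n) bound.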

