Finite State Machines for Strings over Infinite Alphabets


F. Neven, T. Schwentick and V. Vianu

Automata Seminar - Spring 2007, Tamar Aizikowitz

ACM Transactions on Computational Logic, Vol. V No. N, 01/03

2 of 47

Finite Machine for Infinite Alphabet?

Finite automaton: transitions are based on the current state and the input value, so δ is defined on Q × Σ

Infinite alphabet ⇒ infinite transition function?

Solution:
  Store a finite number of values
  Transitions are based on the stored values
  New values can be stored during the computation

3 of 47

Register Automata

Suggested by Kaminski and Francez, 1994

Finite automata + a finite number of registers

Registers store values from the alphabet

Register operations:
  Compare register value with current value
  Store current value in register

Transitions specify the change of state, whether the value is stored, and the movement of the head.

4 of 47

Infinite Alphabets - Definitions

D: an infinite set (e.g. a set of data values)

D-string: w = d1…dn s.t. di ∈ D

dom(w) = {1,…,|w|}

valw(i) = di for i ∈ dom(w)

⊳,⊲ ∉ D delimit the input string

2-way automata work on w = ⊳v⊲

dom+(w) = {0,…,|w|+1} where valw(0) = ⊳ and valw(|w|+1) = ⊲

5 of 47

Nondeterministic 2-Way k-Register Automata (2N-RA)

A = ⟨D, Q, q0, τ0, δ, F⟩
  D: infinite alphabet
  Q, q0, F: as usual
  τ0: {1,…,k} → D ∪ {⊳,⊲}: initial register assignment
  δ: transition function

Two types of transitions:
  (i,q) → (p,d): current value = value of register i
  q → (p,i,d): store current value in register i

d ∈ {stay, right, left}: movement direction of the head

6 of 47

Configurations

Configuration: γ = [j, q, τ]
  j: head position
  q: current state
  τ: register assignment

Initial configuration: γ0 = [1, q0, τ0]

Accepting configuration: γf = [j, qf, τ], qf ∈ F

7 of 47

Computations

[j, q, τ] ⊢ [j’, q’, τ’] iff:

(1) (i,q) → (q’,d) ∈ δ, j’ = j+d, valw(j) = τ(i) and τ’ = τ

or

(2) q → (q’,i,d) ∈ δ, j’ = j+d and τ’ = τ except that τ’(i) = valw(j)

Here j+d reads stay, right, left as 0, +1, −1.

Note: a type 2 transition is relevant only if no type 1 transition applies (why?)

w is accepted by A iff there exists an accepting configuration γf s.t. γ0 ⊢* γf

8 of 47

Variants

Deterministic: at most one transition applies to each configuration.

One way: no left moves in transition function.

xC-RA: notation for the various models, where x ∈ {1,2} and C ∈ {D,N}

9 of 47

Example 1: 1N-RA

L1 = {d1…dn | ∃i,j : i ≠ j ∧ di = dj}

L1 contains all words where some value appears more than once

Construction idea:
  Read the input string from left to right
  “Guess” i and store its value in a register
  Look for the stored value in the remaining input

10 of 47

Example 1: Continued…

A = ⟨D, {q0, q1, qf}, q0, ⟨#,#⟩, δ, {qf}⟩

Initial register assignment ⟨#,#⟩:
  Register 1: “trash” register
  Register 2: register for storing the repeating value

q0 (look for i):
  Go right: q0 → (1,q0)
  Guess i, store value, move to q1: q0 → (2,q1)

q1 (look for j):
  Go right: q1 → (1,q1)
  If found value, move to qf: (2,q1) → qf

qf: Accepting configuration reached!

11 of 47

Example 1: Concluded

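Example of a run on w = 1 3 2 3 4

Step  Value read  State  Registers  Transition taken
1     1           q0     ⟨#,#⟩      q0 → (1,q0)
2     3           q0     ⟨1,#⟩      q0 → (2,q1)
3     2           q1     ⟨1,3⟩      q1 → (1,q1)
4     3           q1     ⟨2,3⟩      (2,q1) → qf
5     4           qf     ⟨2,3⟩      (accepting state reached)

w ACCEPTED!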
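The run above can be replayed mechanically. The following is a minimal Python sketch of the Example 1 automaton, not the authors' formal construction. It assumes the slide's reading that a store (type 2) transition is tried only when no compare (type 1) transition applies, that every move goes right, and that a word is accepted as soon as qf is reached. The names `COMPARE`, `STORE` and `accepts` are illustrative, and the marker `"#"` is assumed not to occur in the input.

```python
from collections import deque

# Compare transitions (type 1): (register, state) -> set of next states.
COMPARE = {(2, "q1"): {"qf"}}
# Store transitions (type 2): state -> set of (register, next state).
STORE = {"q0": {(1, "q0"), (2, "q1")}, "q1": {(1, "q1")}}
ACCEPTING = {"qf"}
BLANK = "#"  # initial register content; assumed absent from the input


def accepts(word):
    """Breadth-first search over configurations (position, state, registers)."""
    start = (0, "q0", (BLANK, BLANK))
    seen, queue = {start}, deque([start])
    while queue:
        pos, state, regs = queue.popleft()
        if state in ACCEPTING:
            return True
        if pos >= len(word):
            continue  # input exhausted on this branch
        value = word[pos]
        successors = []
        # Type 1: the current value equals the content of register i.
        for i, stored in enumerate(regs, start=1):
            if stored == value:
                for nxt in COMPARE.get((i, state), ()):
                    successors.append((pos + 1, nxt, regs))
        # Type 2: considered only when no type 1 transition applied.
        if not successors:
            for i, nxt in STORE.get(state, ()):
                new_regs = tuple(value if r == i else old
                                 for r, old in enumerate(regs, start=1))
                successors.append((pos + 1, nxt, new_regs))
        for cfg in successors:
            if cfg not in seen:
                seen.add(cfg)
                queue.append(cfg)
    return False
```

Calling `accepts([1, 3, 2, 3, 4])` explores, among others, the branch shown in the run; the breadth-first search plays the role of the nondeterministic "guess".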

12 of 47

Example 2: 2N-RA

L2 = {d1…dn | ∀i,j : i ≠ j → di ≠ dj}

L2 contains all words with distinct values

Construction idea: Scan the symbols from left to right. For each symbol:
  Store its value in a register
  Look for the stored value in the remaining input
  If found, reject
  Else proceed to check the next symbol (how?)

13 of 47

Example 2: Continued…

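A = ⟨D, {q0, q1, q2, q3, qrej, qacc}, q0, ⟨⊳,#,⊲⟩, δ, {qacc}⟩

Register 1 initially holds the left delimiter ⊳ and is later reused as scratch space, register 2 holds the value di being checked, and register 3 holds the right delimiter ⊲.

(1,q0) → (q1,right)      start on ⊳, move to the first symbol

q1 → (2,q2,right)        store the current value di in register 2

q2 → (1,q2,right)        a different value dj: keep scanning right

(2,q2) → (qrej,stay)     di found again: reject

(3,q2) → (q3,left)       ⊲ reached: turn around

q3 → (1,q3,left)         scan left

(2,q3) → (q1,right)      back at di: proceed to the next symbol

(3,q1) → (qacc,stay)     ⊲ reached in q1: all symbols checked, accept

[Figure: input tape ⊳ … di … dj … ⊲ with the register contents at each stage]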
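The eight transitions of Example 2 can be executed directly. The sketch below is a hedged illustration in Python, not part of the original slides. It assumes the head starts on the left delimiter, that a compare transition takes priority over a store transition, and that a configuration with no applicable transition rejects. The delimiter and blank markers are hypothetical placeholder strings, and `all_distinct` is an illustrative name.

```python
LEFT_END, RIGHT_END, BLANK = "LEFT_END", "RIGHT_END", "BLANK"
MOVE = {"right": 1, "left": -1, "stay": 0}

# Compare transitions: (register, state) -> (next state, move).
COMPARE = {
    (1, "q0"): ("q1", "right"),
    (2, "q2"): ("qrej", "stay"),
    (3, "q2"): ("q3", "left"),
    (2, "q3"): ("q1", "right"),
    (3, "q1"): ("qacc", "stay"),
}
# Store transitions: state -> (register, next state, move).
STORE = {
    "q1": (2, "q2", "right"),
    "q2": (1, "q2", "right"),
    "q3": (1, "q3", "left"),
}


def all_distinct(word):
    """Run the two-way automaton on the delimited word; True iff it reaches qacc."""
    tape = [LEFT_END, *word, RIGHT_END]
    pos, state = 0, "q0"  # assumption: the head starts on the left delimiter
    regs = [LEFT_END, BLANK, RIGHT_END]
    while state not in ("qacc", "qrej"):
        value = tape[pos]
        # Prefer a compare transition whose register holds the current value.
        hit = next((COMPARE[(i, state)] for i in (1, 2, 3)
                    if regs[i - 1] == value and (i, state) in COMPARE), None)
        if hit is not None:
            state, move = hit
        elif state in STORE:
            i, state, move = STORE[state]
            regs[i - 1] = value
        else:
            return False  # stuck: no transition applies
        pos += MOVE[move]
    return state == "qacc"
```

Each round trip (q2 to the right end, q3 back to the stored value) checks one symbol against everything to its right, so the head makes one sweep per input position.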

14 of 47

Logic

Variants of first order and monadic second order logic over D-strings.

w is represented by a logical structure:
  Domain dom(w) with the natural ordering <
  Value function val: dom(w) → D, instantiated by valw

Atomic formulae:
  x = y, x < y
  val(x) = val(y)
  val(x) = d for d ∈ D ∪ {⊳,⊲}

15 of 47

FO* and MSO*

The logic FO*:
  Atomic formulae
  Boolean connectives
  First order quantification over dom+(w)

The logic MSO*:
  FO*
  Quantification over unary predicates on dom+(w)

16 of 47

FO* and MSO* Definability

L(φ) := {w ∈ D* | w ⊨ φ}

For example…

What φ defines L1?

  ∃x∃y (x ≠ y ∧ val(x) = val(y))

What φ defines L2?

  ∀x∀y (x ≠ y → val(x) ≠ val(y))
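To see the two sentences at work, they can be evaluated by brute force over all pairs of positions. This is only an illustrative Python sketch: the quantifiers range over the positions of the word itself (delimiters are ignored), and the function names `in_L1` and `in_L2` are not from the source.

```python
from itertools import product


def in_L1(word):
    """Evaluate: exists x, y with x != y and val(x) = val(y)."""
    positions = range(len(word))
    return any(x != y and word[x] == word[y]
               for x, y in product(positions, repeat=2))


def in_L2(word):
    """Evaluate: for all x, y, x != y implies val(x) != val(y)."""
    positions = range(len(word))
    return all(x == y or word[x] != word[y]
               for x, y in product(positions, repeat=2))
```

Since the second sentence is the negation of the first, `in_L2(w)` equals `not in_L1(w)` for every word `w`.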

17 of 47

RAs vs. MSO*

Theorem 3.1: 2D-RA ⊈ MSO*

Proof: Consider the language L of strings u#v where the number of unique symbols appearing in u equals the number of unique symbols appearing in v.

Part 1: There exists a 2D-RA which accepts L.

Part 2: L is not MSO* definable.

⇒ 2D-RA ⊈ MSO*

18 of 47

Proof: Preliminaries

Nu / Nv = the set of unique symbols in u / v

L = {u#v | |Nu| = |Nv|}

lmow(d) = leftmost occurrence of d in w

Nu = {a1,…,an} and Nv = {b1,…,bm} s.t. for every i < j, lmou(ai) < lmou(aj) and lmov(bi) < lmov(bj).

Note: u#v ∈ L iff n = m

19 of 47

Proof: Part 1 (L is 2D-RA)

Question: How can we build a 2D-RA for L?

Basic concept:
  Visit lmou(a1), lmov(b1), lmou(a2), … in order
  If lmou(an) and lmov(bm) are reached simultaneously, accept
  Else reject

How can we visit the lmo-s in order?
  Finding lmou(a1), lmov(b1) is easy… (how?)

20 of 47

Proof: Part 1 Concluded

Assume ai is stored in a register. Compute lmou(ai+1) as follows:

Move the head to lmou(ai):
  Go left until ⊳
  Go right until ai (leftmost occurrence)

For positions lmou(ai)+j (starting with j=1), test whether the position is lmou(ai+1):
  Store the value and proceed to move left
  If the value is encountered, then check the next position (j++)
  Else, if ⊳ is reached, then lmou(ai+1) = lmou(ai)+j

Similar for the bi-s… ⇒ Language accepted
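The order in which the leftmost occurrences are visited can be sketched as follows. This Python fragment mirrors only the scanning logic (store a candidate value, look left for an earlier occurrence); it uses list indices freely, which a register automaton cannot do, so it is an illustration under that simplifying assumption rather than the automaton itself. The names `next_lmo` and `same_number_of_values` are hypothetical.

```python
def next_lmo(w, pos):
    """Position of the next leftmost occurrence after `pos`, or None.

    `pos` is the current leftmost occurrence, or -1 before the first one.
    """
    for cand in range(pos + 1, len(w)):
        stored = w[cand]  # "store the value"
        # "move left": reaching the left end without a match means cand is an lmo
        if all(w[left] != stored for left in range(cand - 1, -1, -1)):
            return cand
    return None


def same_number_of_values(u, v):
    """Alternate between u and v, advancing one leftmost occurrence at a time."""
    pu, pv = -1, -1
    while True:
        pu, pv = next_lmo(u, pu), next_lmo(v, pv)
        if pu is None or pv is None:
            # accept only if both sides run out simultaneously
            return pu is None and pv is None
```

For instance, `same_number_of_values("abca", "xyzz")` compares three distinct values on each side.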

21 of 47

Proof: Part 2 (L not MSO*)

Assume by contradiction that φ* is an MSO* sentence s.t. u#v ⊨ φ* iff |Nu| = |Nv|.

Let C be the set of D-symbols appearing in φ*.

w is admissible iff:
  w is of the form u#v
  w contains no symbols from C
  Nu ∩ Nv = ∅
  Each D-symbol occurs at most once in u or v

22 of 47

Proof: Part 2 Continued…

Let φ be obtained from φ* by replacing:
  val(x) = val(y) by x = y
  val(x) = d by false if d ≠ #

For every admissible string w = d1…dn#e1…em:
  a^n#a^m ⊨ φ
  ⟺ d1…dn#e1…em ⊨ φ      (letters don’t matter in φ)
  ⟺ d1…dn#e1…em ⊨ φ*     (w has no letters from C)
  ⟺ n = m                 (because all letters are different)

φ is MSO

23 of 47


For every n, there exists an admissible string d1…dn#e1…en (why?)

⇒ For every n, a^n#a^n ⊨ φ

Note: φ is in MSO (no value comparisons)

Define a formula for the form a^n#a^m:

ψ := ∃x(val(x)=# ∧ ∀y(val(y)=a ∨ (val(y)=# ∧ y=x)))

L’ = {a^n#a^n | n ∈ ℕ} is MSO definable by φ ∧ ψ ⇒ L’ is regular. Contradiction!

24 of 47

2N-RA vs. FO*

Theorem 3.7 (weak version): FO* ⊈ 2N-RA

Proof: Define a language L ⊆ D* s.t.:

Part 1: No 2N-RA can accept L.

Part 2: L is FO* definable.

⇒ FO* ⊈ 2N-RA

25 of 47

Proof: Part 1 (L not 2N-RA)

Based on communication complexity methodology:
  The input string is divided between two parties, I and II
  The parties can send messages according to a pre-defined protocol
  The string is accepted if both parties accept
  Each party has unlimited computational power
  The restriction is only on the form of the messages

26 of 47


We consider strings of the form u#v
  u,v encode sets of subsets of D
  L = {u#v | u,v represent the same set of sets}

Claim: L cannot be accepted by 2N-RAs

Assume by contradiction that there exists a 2N-RA A s.t. L(A) = L

We simulate A by defining an appropriate protocol…

27 of 47


Define the communication protocol as follows:
  I is given u while II is given v
  I simulates A until A tries to cross # to the right, then sends the configuration information to II
  II simulates A until A tries to cross # to the left, then sends the configuration information to I
  And so on, until one of the parties reaches an accepting configuration or gets stuck

If A exists, such a protocol will accept L

28 of 47


It remains to define an appropriate protocol…
  Restrict u#v to at most N data values
  Assume A has |Q| states and k registers
  M := |Q|·N^k different messages are needed

Each message needs to be sent no more than once in each direction (why?)

At most M^(2M) different possible series of messages (dialogs) need to be considered

29 of 47


M^(2M) is only singly exponential in N (M is polynomial in N)

The number of sets of sets of N values is 2^(2^N)

For large N, there exist u,v s.t.:
  u#u and v#v are accepted by the same dialog
  u,v represent different sets of sets

⇒ u#v is also accepted

⇒ No such protocol can accept L

⇒ No 2N-RA can accept L
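The counting argument compares the dialog bound M^(2M), with M = |Q|·N^k, against the 2^(2^N) sets of subsets of N values. A small Python sketch makes the gap concrete by comparing base-2 logarithms. The sample values |Q| = 10 and k = 2 in the usage note are arbitrary assumptions chosen for illustration, and the function names are hypothetical.

```python
import math


def dialogs_log2(num_states, k, n):
    """log2 of the dialog bound M^(2M), where M = |Q| * N^k."""
    m = num_states * n ** k
    return 2 * m * math.log2(m)


def sets_of_sets_log2(n):
    """log2 of 2^(2^N), the number of sets of subsets of N values."""
    return 2 ** n


def crossover(num_states, k, limit=64):
    """Smallest N such that sets of sets outnumber the dialog bound for N..limit."""
    if sets_of_sets_log2(limit) <= dialogs_log2(num_states, k, limit):
        return None  # no crossover within the limit
    n = limit
    while n > 1 and sets_of_sets_log2(n - 1) > dialogs_log2(num_states, k, n - 1):
        n -= 1
    return n
```

With the assumed values, `crossover(10, 2)` returns 16: from N = 16 on there are more sets of sets than dialogs, so two different sets of sets must share a dialog.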

30 of 47

Proof: Part 2 (L is FO*)

We show that L is FO* definable…

First we define an encoding for u,v:
  Assume $ not in D
  u,v are of the form $ d1^1…dn^1 $ d1^2…dn^2 $ … $ d1^m…dn^m $
  Each block d1^j…dn^j represents a subset of D-values

Goal: Define a formula verifying that every subset in u appears in v and vice versa.

31 of 47


We start with some smaller formulae…

w is of the form u#v:

  form := ∃x(val(x) = # ∧ ∀y(val(y) = # → y = x))

x is in the interval [y,z]:

  x ∈ [y,z] := y < x ∧ x < z

The interval [y,z] represents a subset:

  subs(y,z) := val(y)=$ ∧ val(z)=$ ∧ y < z ∧ ∀x(x ∈ [y,z] → val(x) ≠ # ∧ val(x) ≠ $)

32 of 47


Some more…

The subset [y,z] is a subset of [y’,z’]:

  [y,z] ⊆ [y’,z’] := ∀x(x ∈ [y,z] → ∃x’(x’ ∈ [y’,z’] ∧ val(x) = val(x’)))

The subset [y,z] equals the subset [y’,z’]:

  [y,z] = [y’,z’] := [y,z] ⊆ [y’,z’] ∧ [y’,z’] ⊆ [y,z]

The subset [y,z] is in u:

  [y,z] ∈ u := subs(y,z) ∧ ∀x(val(x)=# → z < x)

The subset [y,z] is in v:

  [y,z] ∈ v := subs(y,z) ∧ ∀x(val(x)=# → x < y)

33 of 47


Two last formulae…

Every subset in u appears in v:

  usubv := ∀y∀z([y,z] ∈ u → ∃y’∃z’([y’,z’] ∈ v ∧ [y,z] = [y’,z’]))

vsubu is defined similarly

And now to put it all together…

  φ := form ∧ usubv ∧ vsubu

It follows that w ⊨ φ iff w ∈ L

⇒ L is FO* definable.
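What the formula checks can be mirrored by decoding both halves of the word into sets of sets and comparing them. The Python sketch below is an illustration only: it assumes the word is given as a list of symbols with `"$"` and `"#"` as the separators, treats a half that does not start and end with `"$"` as not in L, and uses the hypothetical names `decode` and `in_L`.

```python
def decode(part):
    """Turn ['$', d, d, '$', d, '$'] into a set of frozensets, or None if malformed."""
    if not part or part[0] != "$" or part[-1] != "$":
        return None
    subsets, current = set(), []
    for symbol in part[1:]:
        if symbol == "$":
            subsets.add(frozenset(current))  # one block between two $ signs
            current = []
        else:
            current.append(symbol)
    return subsets


def in_L(word):
    """True iff word = u#v and u, v encode the same set of subsets."""
    if word.count("#") != 1:  # corresponds to the formula `form`
        return False
    cut = word.index("#")
    u, v = decode(word[:cut]), decode(word[cut + 1:])
    # corresponds to usubv and vsubu holding together
    return u is not None and v is not None and u == v
```

For example, `in_L(list("$ab$c$#$c$ba$"))` holds because both halves encode {{a,b},{c}}, regardless of the order of blocks or of values inside a block.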

34 of 47

Decision Problems

Kaminski and Francez showed that emptiness for 1N-RAs is decidable

And what of universality?

We will show that universality for 1N-RA is undecidable by reduction from a known undecidable problem, PCP.

35 of 47

Post Correspondence Problem

Introduced by Emil Post in 1946

Input: A sequence of pairs (x1,y1),…,(xn,yn) s.t. xi,yi ∈ {a,b}* for i=1,…,n

Solution: A sequence of indices α1,…,αm ∈ {1,…,n} s.t. xα1…xαm = yα1…yαm

Output: Does the given input instance have a solution?

36 of 47

PCP Example

Input:

Index      1    2    3    4
x values   a    ba   aa   bb
y values   aa   ab   b    b

Solution:

Indices    1    1    3    2    4
x words    a    a    aa   ba   bb
y words    aa   aa   b    ab   b
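The example solution can be checked by concatenating the chosen words. The Python sketch below encodes the instance from the table; the bounded search is only an illustration of how solutions can be enumerated and is not a decision procedure, since PCP has none. The names `PAIRS`, `is_solution` and `bounded_search` are illustrative.

```python
from itertools import product

# The example instance: index -> (x word, y word).
PAIRS = {1: ("a", "aa"), 2: ("ba", "ab"), 3: ("aa", "b"), 4: ("bb", "b")}


def is_solution(indices, pairs=PAIRS):
    """True iff the x words and y words chosen by `indices` concatenate equally."""
    if not indices:
        return False
    x_word = "".join(pairs[i][0] for i in indices)
    y_word = "".join(pairs[i][1] for i in indices)
    return x_word == y_word


def bounded_search(max_len, pairs=PAIRS):
    """Try all index sequences up to max_len; None if none of them works."""
    for length in range(1, max_len + 1):
        for indices in product(sorted(pairs), repeat=length):
            if is_solution(indices, pairs):
                return list(indices)
    return None
```

Here `is_solution([1, 1, 3, 2, 4])` holds because both rows spell `aaaababb`.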

37 of 47

PCP Undecidability

PCP is known to be undecidable.

Proof sketch: Reduction from Lu:
  Given a Turing Machine M and a word w
  Define a PCP instance P based on M and w s.t. P has a solution iff M accepts w
  A solution for P encodes a run of M on w
  The x-series is always ‘one step ahead’ of the y-series
  The y-series can ‘catch up’ only if the computation in the x-series reaches an accepting state

38 of 47

PCP Undecidability Continued

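Add instance pairs of the following forms (the two words of each pair are listed in the order they appear on the slide):

Purpose               First word   Second word
Start computation     #            #q0w#
Encode transitions    qi a         b qj
                      a qi b       qj a c
Copy symbols          a            a
qacc ‘eats’ symbols   qacc #       #
                      a qacc       qacc
                      qacc a       qacc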

39 of 47

Undecidability of Universality

Theorem 5.1: It is undecidable whether a given 1N-RA is universal.

Proof: For a given PCP instance P, construct a 1N-RA A s.t. A accepts an input string iff it does not represent a solution for P.
  P has no solution iff A is universal
  ⇒ Decidability of universality leads to decidability of PCP
  ⇒ Universality of 1N-RA is undecidable

40 of 47

PCP Encoding

Assume w.l.o.g. that Sym = {1,…,n,a,b,#,$} ⊆ D

Candidate: a string u#v s.t.:
  u encodes xα1, …, xαm
  v encodes yβ1, …, yβl

The candidate is a solution if:
  l = m
  αi = βi (matching pairs)
  xα1…xαm = yα1…yαm

41 of 47

PCP Encoding Continued

xαj encoding: $ γ αj δ1 a1 … δk ak
  $ acts as a separator
  γ represents j by a unique value
  αj ∈ {1,…,n}
  The δi encode positions in the word
  γ and δ values appear only once in u / v
  xαj = a1…ak

yβj is encoded similarly

42 of 47

PCP Encoding Example

Index      1    2    3    4
x values   a    ba   aa   bb
y values   aa   ab   b    b

γ          1    2    3    4    5
α          1    1    3    2    4
x words    a    a    aa   ba   bb
y words    aa   aa   b    ab   b

u = $ 1 1 1 a $ 2 1 2 a $ 3 3 3 a 4 a $ 4 2 5 b 6 a $ 5 4 7 b 8 b

#

v = $ 1 1 1 a 2 a $ 2 1 3 a 4 a $ 3 3 5 b $ 4 2 6 a 7 b $ 5 4 8 b

43 of 47

PCP Encoding Continued

u#v is syntactically correct if:
  γ-projection of u = γ-projection of v
  δ-projection of u = δ-projection of v

u#v represents a solution if:
  u#v is syntactically correct
  For each γ, the number to the right of γ is the same in u and in v
  For each δ, the symbol to the right of δ is the same in u and in v
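The encoding and the two checks can be sketched in Python as follows. This is an illustration under stated assumptions, not the construction of the automaton A: γ and δ values are modelled as tagged tuples such as `("g", 1)` and `("d", 3)` so that they are distinct data values, the instance `PAIRS` is the running example, and the sketch assumes each block spells the correct word for its index. All function names are hypothetical.

```python
from itertools import count

PAIRS = {1: ("a", "aa"), 2: ("ba", "ab"), 3: ("aa", "b"), 4: ("bb", "b")}


def encode(indices, side):
    """Encode x words (side=0) or y words (side=1) as $ γ α δ1 a1 … δk ak blocks."""
    out, delta = [], count(1)
    for gamma, alpha in enumerate(indices, start=1):
        out += ["$", ("g", gamma), alpha]
        for letter in PAIRS[alpha][side]:
            out += [("d", next(delta)), letter]
    return out


def parse(tokens):
    """Return (γ list, {γ: α}, δ list, {δ: letter}) for one side."""
    gammas, alpha_of, deltas, letter_of = [], {}, [], {}
    i = 0
    while i < len(tokens):
        assert tokens[i] == "$"
        gamma, alpha = tokens[i + 1], tokens[i + 2]
        gammas.append(gamma)
        alpha_of[gamma] = alpha
        i += 3
        while i < len(tokens) and tokens[i] != "$":
            deltas.append(tokens[i])
            letter_of[tokens[i]] = tokens[i + 1]
            i += 2
    return gammas, alpha_of, deltas, letter_of


def represents_solution(u, v):
    gu, au, du, lu = parse(u)
    gv, av, dv, lv = parse(v)
    syntactically_correct = gu == gv and du == dv  # equal γ- and δ-projections
    return syntactically_correct and au == av and lu == lv
```

For the solution 1, 1, 3, 2, 4, `represents_solution(encode(s, 0), encode(s, 1))` holds; for the index sequence 1, 2 the δ-projections already differ in length.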

44 of 47

Construction of A

Assume the values of Sym are stored in the initial register assignment

A works as follows:
  “Guesses” why w is not a valid solution
  Checks whether w meets the chosen criterion
  If yes, accepts
  Else rejects

w has an accepting computation ⟺ w meets some criterion for being “wrong” ⟺ w is not a solution for the PCP instance

45 of 47

When is w “wrong”

w is of the wrong form:
  w ≠ u#v
  u or v ∉ ($γαδ…)*
  xi ≠ a1…ak or yj ≠ a1…ak in u or v

γ-projections are wrong:
  First / last γ in u ≠ first / last γ in v
  Two γ’s are the same in u / v
  γ1 and γ2 are successors in u but not in v

46 of 47

When is w “wrong” Concluded

δ-projections are wrong:
  Similar to γ-projections

w does not represent a solution:
  The α-value for some γ in u is different from the corresponding β-value in v
  The a- / b-value for some δ in u is different from the corresponding a- / b-value in v

47 of 47

Equivalence and Inclusion

Corollary 5.2: Equivalence of 1N-RAs is undecidable.

Proof:
  Assume equivalence were decidable
  Build an automaton A_D* that accepts every possible input word
  Universality is then decidable by checking equivalence to A_D*
  Contradiction!

Corollary: Inclusion is also undecidable.
