Finite State Machines for Strings over Infinite Alphabets
F. Neven, T. Schwentick and V. Vianu
Automata Seminar - Spring 2007
Tamar Aizikowitz
ACM Transactions on Computational Logic, Vol. V No. N, 01/03
Finite Machine for Infinite Alphabet?
Finite automaton: transitions are based on the current state and the input value
  δ is defined on Q × Σ
Infinite alphabet ⇒ infinite transition function?
Solution:
  Store a finite number of values
  Transitions are based on the stored values
  New values can be stored during the computation
Register Automata
Suggested by Kaminski and Francez, 1994
Finite automata + a finite number of registers
Registers store values from the alphabet
Register operations:
  Compare a register value with the current value
  Store the current value in a register
Transitions specify the change of state, whether the current value is stored, and the movement of the head.
Infinite Alphabets - Definitions
D: an infinite set (e.g. a set of data values)
D-string: w = d1…dn s.t. di ∈ D
dom(w) = {1,…,|w|}
valw(i) = di for i ∈ dom(w)
⊳, ⊲ ∉ D delimit the input string
2-way automata work on the delimited string ⊳w⊲
dom+(w) = {0,…,|w|+1}, where valw(0) = ⊳ and valw(|w|+1) = ⊲
Nondeterministic 2-Way k-Register Automata (2N-RA)
A = ⟨D, Q, q0, τ0, δ, F⟩
D – infinite alphabet
Q, q0, F – as usual
τ0: {1,…,k} → D ∪ {⊳,⊲} – initial register assignment
δ – transition function
Two types of transitions:
  (i,q) → (p,d) – the current value equals the value of register i
  q → (p,i,d) – store the current value in register i
d ∈ {stay, right, left} – movement direction of the head
Configurations
Configuration: γ = [j, q, τ]
  j – head position
  q – current state
  τ – register assignment
Initial configuration: γ0 = [1, q0, τ0]
Accepting configuration: γf = [j, qf, τ], qf ∈ F
Computations
[j, q, τ] ⊢ [j', q', τ'] iff:
(1) (i,q) → (q',d) ∈ δ, j' = j+d, valw(j) = τ(i) and τ' = τ
or
(2) q → (q',i,d) ∈ δ, j' = j+d and τ' = τ except that τ'(i) = valw(j)
Here j+d stands for j, j+1 or j-1 when d is stay, right or left.
Note: a type 2 transition is relevant only if no type 1 transition applies (why?)
w is accepted by A iff there exists γf s.t. γ0 ⊢* γf
Variants
Deterministic: at most one transition applies to each configuration.
One way: no left moves in transition function.
xC-RA: notation for the various models, where x ∈ {1,2} and C ∈ {D,N}
Example 1: 1N-RA
L1 = {d1…dn | ∃ i, j : i ≠ j ∧ di = dj}
contains all words in which some value appears more than once
Construction idea:
  Read the input string from left to right
  “Guess” i and store its value in a register
  Look for the stored value in the remaining input
Example 1: Continued…
A = ⟨D, {q0, q1, qf}, q0, <#,#>, δ, {qf}⟩
Register 1: “trash” register
Register 2: register for storing the repeating value
q0 – look for i:
  Go right: q0 → (1,q0)
  Guess i, store the value, move to q1: q0 → (2,q1)
q1 – look for j:
  Go right: q1 → (1,q1)
  If the value is found, move to qf: (2,q1) → qf
qf: accepting configuration reached!
Example 1: Concluded
Example of a run on w = 1 3 2 3 4:
Head reads   State   Registers <1,2>   Transition taken
1            q0      <#,#>             q0 → (1,q0)
3            q0      <1,#>             q0 → (2,q1)
2            q1      <1,3>             q1 → (1,q1)
3            q1      <2,3>             (2,q1) → qf
4            qf      <2,3>             w ACCEPTED!
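The transition relation from the Computations slide and the automaton of Example 1 can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: the function `accepts`, the tuple layout of the transitions and the `start` parameter are hypothetical names, the directions of the Example 1 transitions (not shown in the slide notation) are assumed to be `right`, and the sketch applies rules (1) and (2) literally without enforcing the side note about type 2 transitions.

```python
from collections import deque

LEFT_END, RIGHT_END = "⊳", "⊲"
MOVE = {"stay": 0, "right": 1, "left": -1}


def accepts(word, q0, tau0, compare, store, final, start=1):
    """Breadth-first search over configurations [j, q, tau].

    compare: tuples (i, q, p, d) for transitions (i, q) -> (p, d)
    store:   tuples (q, p, i, d) for transitions q -> (p, i, d)
    Registers are numbered from 1, as on the slides.
    """
    tape = [LEFT_END, *word, RIGHT_END]
    first = (start, q0, tuple(tau0))
    seen, queue = {first}, deque([first])
    while queue:
        j, q, tau = queue.popleft()
        if q in final:
            return True
        current = tape[j]
        successors = []
        for i, q_from, p, d in compare:
            # Rule (1): current value equals register i, registers unchanged.
            if q_from == q and tau[i - 1] == current:
                successors.append((j + MOVE[d], p, tau))
        for q_from, p, i, d in store:
            # Rule (2): overwrite register i with the current value.
            if q_from == q:
                new_tau = tau[:i - 1] + (current,) + tau[i:]
                successors.append((j + MOVE[d], p, new_tau))
        for conf in successors:
            # Drop moves that would leave the delimited tape.
            if 0 <= conf[0] < len(tape) and conf not in seen:
                seen.add(conf)
                queue.append(conf)
    return False


# Example 1: register 1 is the "trash" register, register 2 holds the guess.
# Directions are an assumption; the slide notation does not show them.
EX1_STORE = {("q0", "q0", 1, "right"), ("q0", "q1", 2, "right"),
             ("q1", "q1", 1, "right")}
EX1_COMPARE = {(2, "q1", "qf", "right")}


def in_L1(word):
    return accepts(word, "q0", ("#", "#"), EX1_COMPARE, EX1_STORE, {"qf"})
```

The search terminates because registers can only hold initial values or values that occur on the tape, so the set of reachable configurations is finite.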
Example 2: 2N-RA
L2 = {d1…dn | ∀ i, j : i ≠ j → di ≠ dj}
contains all words with pairwise distinct values
Construction idea: scan the symbols from left to right. For each symbol:
  Store its value in a register
  Look for the stored value in the remaining input
  If found, reject
  Else proceed to check the next symbol (how?)
Example 2: Continued…
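A = ⟨D, {q0, q1, q2, q3, qrej, qacc}, q0, <⊳,#,⊲>, δ, {qacc}⟩
Registers: 1 holds ⊳ initially and then serves as “trash”, 2 holds the value di being checked, 3 holds ⊲
(1,q0) → (q1,right)      at ⊳: move right to the first symbol
q1 → (2,q2,right)        store di in register 2
q2 → (1,q2,right)        skip a value different from di
(2,q2) → (qrej,stay)     di found again: reject
(3,q2) → (q3,left)       ⊲ reached: di does not repeat, turn back
q3 → (1,q3,left)         move left over the other values
(2,q3) → (q1,right)      back at di: proceed to the next symbol
(3,q1) → (qacc,stay)     ⊲ reached in q1: all symbols checked, accept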
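The head movements of the Example 2 automaton can be traced with a short Python sketch. It is not a general 2N-RA simulator: it hard-codes the scan-right, walk-back-left strategy, assumes ⊳ is the left and ⊲ the right delimiter, and the function name `all_distinct_two_way` is hypothetical. The comments name the transition each step corresponds to.

```python
LEFT_END, RIGHT_END = "⊳", "⊲"


def all_distinct_two_way(word):
    """Follow the head movements of the Example 2 automaton on ⊳ word ⊲."""
    tape = [LEFT_END, *word, RIGHT_END]
    head = 1                              # (1,q0) -> (q1,right): step off ⊳
    while tape[head] != RIGHT_END:        # (3,q1) -> (qacc,stay) ends the loop
        stored = tape[head]               # q1 -> (2,q2,right)
        head += 1
        while tape[head] != RIGHT_END:    # q2 -> (1,q2,right)
            if tape[head] == stored:      # (2,q2) -> (qrej,stay)
                return False
            head += 1
        head -= 1                         # (3,q2) -> (q3,left)
        while tape[head] != stored:       # q3 -> (1,q3,left)
            head -= 1
        head += 1                         # (2,q3) -> (q1,right)
    return True
```

Walking left until the stored value is met again lands on the position it was stored from, because any later occurrence would already have caused a rejection.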
Logic
Variants of first-order and monadic second-order logic over D-strings.
w is represented by a logical structure:
  Domain dom(w) with the natural ordering <
  Value function val: dom(w) → D, instantiated by valw
Atomic formulae:
  x = y, x < y
  val(x) = val(y)
  val(x) = d for d ∈ D ∪ {⊳,⊲}
FO* and MSO*
The logic FO*:
  Atomic formulae
  Boolean connectives
  First-order quantification over dom+(w)
The logic MSO*:
  FO*
  Quantification over unary predicates on dom+(w)
FO* and MSO* Definability
L(φ) := {w ∈ D* | w ⊨ φ}
For example…
What φ defines L1?
  ∃x∃y (x ≠ y ∧ val(x) = val(y))
What φ defines L2?
  ∀x∀y (x ≠ y → val(x) ≠ val(y))
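The two sentences can be evaluated by brute force over positions, which makes the quantifier structure concrete. This is a minimal sketch: the function names are hypothetical, positions are 0-based, and the variables range over dom(w) only (the delimiter positions carry values outside D and would not change the result).

```python
def satisfies_phi_L1(w):
    """∃x∃y (x ≠ y ∧ val(x) = val(y)), with x, y ranging over positions of w."""
    dom = range(len(w))
    return any(x != y and w[x] == w[y] for x in dom for y in dom)


def satisfies_phi_L2(w):
    """∀x∀y (x ≠ y → val(x) ≠ val(y)); the implication is written as a disjunction."""
    dom = range(len(w))
    return all(x == y or w[x] != w[y] for x in dom for y in dom)
```

On every input the two functions return opposite results, reflecting that the sentence for L2 is the negation of the sentence for L1.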
RAs vs. MSO*
Theorem 3.1: 2D-RA ⊈ MSO*
Proof: Consider the language L of strings u#v where the number of distinct symbols appearing in u equals the number of distinct symbols appearing in v.
Part 1: There exists a 2D-RA which accepts L.
Part 2: L is not MSO* definable.
⇒ 2D-RA ⊈ MSO*
Proof: Preliminaries
Nu / Nv = the set of distinct symbols in u / v
L = {u#v | |Nu| = |Nv|}
lmow(d) = the position of the leftmost occurrence of d in w
Nu = {a1,…,an} and Nv = {b1,…,bm} s.t. for every i < j, lmou(ai) < lmou(aj) and lmov(bi) < lmov(bj).
Note: u#v ∈ L iff n = m
Proof: Part 1 (L is 2D-RA)
Question: How can we build a 2D-RA for L?
Basic concept:
  Visit lmou(a1), lmov(b1), lmou(a2), … in order
  If lmou(an) and lmov(bm) are reached simultaneously, accept
  Else reject
How can we visit the lmo-s in order?
  Finding lmou(a1), lmov(b1) is easy… (how?)
Proof: Part 1 Concluded
Assume ai is stored in a register. Compute lmou(ai+1) as follows:
Move the head to lmou(ai):
  Go left until ⊳
  Go right until ai (its leftmost occurrence)
For positions lmou(ai)+j (starting with j=1), test whether the position is lmou(ai+1):
  Store the value and proceed to move left
  If the value is encountered, then check the next position (j++)
  Else, if ⊳ is reached, then lmou(ai+1) = lmou(ai)+j
Similar for the bi-s… ⇒ the language is accepted
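The idea of stepping through leftmost occurrences in u and v in lockstep can be illustrated in Python. This is a sketch of the strategy rather than of the automaton itself: list slicing stands in for the leftward scan with a stored value, and the helper names `is_leftmost`, `next_lmo` and `same_number_of_values` are hypothetical.

```python
def is_leftmost(w, pos):
    """True if no position to the left of pos carries the same value."""
    return w[pos] not in w[:pos]


def next_lmo(w, pos):
    """First leftmost-occurrence position after pos, or None if there is none."""
    for p in range(pos + 1, len(w)):
        if is_leftmost(w, p):
            return p
    return None


def same_number_of_values(u, v):
    """Advance through the lmo-s of u and v together; accept if both run out at once."""
    pu, pv = next_lmo(u, -1), next_lmo(v, -1)
    while pu is not None and pv is not None:
        pu, pv = next_lmo(u, pu), next_lmo(v, pv)
    return pu is None and pv is None
```

Only the current pair of positions is remembered at any time, which mirrors the fact that the automaton needs a fixed number of registers regardless of how many distinct values occur.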
Proof: Part 2 (L not MSO*)
Assume by contradiction that φ* is an MSO* sentence s.t. u#v ⊨ φ* iff |Nu| = |Nv|.
Let C be the set of D-symbols appearing in φ*.
w is admissible iff:
  w is of the form u#v
  w contains no symbols from C
  Nu ∩ Nv = ∅
  Each D-symbol occurs at most once in u or v
Proof: Part 2 Continued…
Let φ be obtained from φ* by replacing:
  val(x) = val(y) by x = y
  val(x) = d by false if d ≠ #
For every admissible string w = d1…dn#e1…em:
  a^n#a^m ⊨ φ
  ⇔ d1…dn#e1…em ⊨ φ     (letters don't matter in φ)
  ⇔ d1…dn#e1…em ⊨ φ*    (w has no letters from C)
  ⇔ n = m                (because all letters are different)
φ is MSO
Proof: Part 2 Concluded
For every n, there exists an admissible string d1…dn#e1…en (why?)
⇒ For every n, a^n#a^n ⊨ φ
Note: φ is in MSO (no value comparisons)
Define a formula for the form a^n#a^m:
  ψ := ∃x(val(x)=# ∧ ∀y(val(y)=a ∨ (val(y)=# ∧ y=x)))
⇒ L' = {a^n#a^n | n ∈ ℕ} is MSO definable by φ ∧ ψ
⇒ L' is regular
⇒ Contradiction!
2N-RA vs. FO*
Theorem 3.7 (weak version): FO* ⊈ 2N-RA
Proof: Define a language L ⊆ D* s.t.:
Part 1: No 2N-RA can accept L.
Part 2: L is FO* definable.
⇒ FO* ⊈ 2N-RA
Proof: Part 1 (L not 2N-RA)
Based on communication complexity methodology:
  The input string is divided between two parties, I and II
  The parties can send messages according to a pre-defined protocol
  The string is accepted if both parties accept
  Each party has unlimited computational power
  The only restriction is on the form of the messages
Proof: Part 1 Continued…
We consider strings of the form u#v
u, v encode sets of subsets of D
L = {u#v | u, v represent the same set of sets}
Claim: L cannot be accepted by 2N-RAs
Assume by contradiction that there exists a 2N-RA A s.t. L(A) = L
We simulate A by defining an appropriate protocol…
Proof: Part 1 Continued…
Define the communication protocol as follows:
  I is given u while II is given v
  I simulates A until A tries to cross # to the right, then sends the configuration information to II
  II simulates A until A tries to cross # to the left, then sends the configuration information to I
  And so on, until one of the parties reaches an accepting configuration or gets stuck
If A exists, such a protocol will accept L
Proof: Part 1 Continued…
It remains to define an appropriate protocol…
Restrict u#v to at most N data values
Assume A has |Q| states and k registers
⇒ M := |Q|·N^k different messages are needed
Each message needs to be sent no more than once in each direction (why?)
⇒ At most M^(2M) different possible series of messages (dialogues) need to be considered
Proof: Part 1 Concluded
M^(2M) is only singly exponential in N (for fixed |Q| and k)
The number of sets of sets of N values is 2^(2^N)
For large N, there exist u, v s.t.:
  u#u and v#v are accepted by the same dialogue
  u, v represent different sets of sets
⇒ u#v is also accepted
⇒ No such protocol can accept L
⇒ No 2N-RA can accept L
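The counting step can be checked numerically. The sketch below assumes the dialogue bound reads M^(2M) with M = |Q|·N^k and looks for the first N at which the number of sets of sets, 2^(2^N), exceeds it; the function name and the sample parameters are hypothetical. Both sides are compared through their base-2 logarithms to avoid building huge integers.

```python
import math


def first_n_with_more_sets_than_dialogues(num_states, k, limit=200):
    """Smallest N <= limit with 2^(2^N) > M^(2M), where M = |Q| * N^k.

    Taking log2 of both sides, the test becomes 2^N > 2 * M * log2(M).
    """
    for n in range(1, limit + 1):
        m = num_states * n ** k
        if 2 ** n > 2 * m * math.log2(m):
            return n
    return None
```

For example, with 2 states and 1 register the bound is first exceeded at N = 7, and with 10 states and 2 registers at N = 16; from that point there are more sets of sets than dialogues, so two different ones must share a dialogue.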
Proof: Part 2 (L is FO*)
We show that L is FO* definable…
First we define an encoding for u, v:
  Assume $ not in D
  u, v are of the form $ d_{1,1} … d_{n,1} $ d_{1,2} … d_{n,2} $ … $ d_{1,m} … d_{n,m} $
  Each block d_{1,j} … d_{n,j} represents a subset of D-values
Goal: Define a formula verifying that every subset in u appears in v and vice versa.
Proof: Part 2 Continued…
We start with some smaller formulae…
w is of the form u#v:
  form := ∃x(val(x) = # ∧ ∀y(val(y) = # → y = x))
x is in the interval [y,z]:
  x ∈ [y,z] := y < x ∧ x < z
The interval [y,z] represents a subset:
  subs(y,z) := val(y) = $ ∧ val(z) = $ ∧ y < z ∧ ∀x(x ∈ [y,z] → val(x) ≠ # ∧ val(x) ≠ $)
Proof: Part 2 Continued…
Some more…
The subset [y,z] is a subset of [y',z']:
  [y,z] ⊆ [y',z'] := ∀x(x ∈ [y,z] → ∃x'(x' ∈ [y',z'] ∧ val(x) = val(x')))
The subset [y,z] equals the subset [y',z']:
  [y,z] = [y',z'] := [y,z] ⊆ [y',z'] ∧ [y',z'] ⊆ [y,z]
The subset [y,z] is in u:
  [y,z] ∈ u := subs(y,z) ∧ ∀x(val(x) = # → z < x)
The subset [y,z] is in v:
  [y,z] ∈ v := subs(y,z) ∧ ∀x(val(x) = # → x < y)
Proof: Part 2 Concluded
Two last formulae…
Every subset in u appears in v:
  usubv := ∀y∀z([y,z] ∈ u → ∃y'∃z'([y',z'] ∈ v ∧ [y,z] = [y',z']))
vsubu is defined similarly
And now to put it all together…
  φ := form ∧ usubv ∧ vsubu
It follows that w ⊨ φ iff w ∈ L
⇒ L is FO* definable.
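The formulae of this proof can be evaluated directly by quantifying over positions. The following Python sketch does so by brute force; the function name `in_L_by_formula` and the nested helper names are hypothetical, positions are 0-based, and w is taken to be a list of symbols in which `"#"` and `"$"` play the roles they have on the slides.

```python
def in_L_by_formula(w):
    """Evaluate φ := form ∧ usubv ∧ vsubu on w by brute force over positions."""
    dom = range(len(w))
    hashes = [x for x in dom if w[x] == "#"]
    if len(hashes) != 1:                      # form: exactly one #
        return False
    h = hashes[0]

    def subs(y, z):                           # the interval [y,z] represents a subset
        return (w[y] == "$" and w[z] == "$" and y < z
                and all(w[x] not in ("#", "$") for x in range(y + 1, z)))

    def subset(y, z, y2, z2):                 # [y,z] ⊆ [y2,z2]
        return all(any(w[x] == w[x2] for x2 in range(y2 + 1, z2))
                   for x in range(y + 1, z))

    def equal(y, z, y2, z2):                  # [y,z] = [y2,z2]
        return subset(y, z, y2, z2) and subset(y2, z2, y, z)

    blocks = [(y, z) for y in dom for z in dom if subs(y, z)]
    in_u = [(y, z) for y, z in blocks if z < h]   # [y,z] ∈ u
    in_v = [(y, z) for y, z in blocks if h < y]   # [y,z] ∈ v

    def every_block_appears(src, dst):        # usubv / vsubu
        return all(any(equal(y, z, y2, z2) for y2, z2 in dst)
                   for y, z in src)

    return every_block_appears(in_u, in_v) and every_block_appears(in_v, in_u)
```

As with φ itself, the check compares blocks as sets, so order and repetition inside a block and among blocks do not matter.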
Decision Problems
Kaminski and Francez showed that emptiness for 1N-RAs is decidable
And what of universality?
We will show that universality for 1N-RA is undecidable by reduction from a known undecidable problem, PCP.
Post Correspondence Problem
Introduced by Emil Post in 1946
Input: A sequence of pairs (x1,y1),…,(xn,yn) s.t. xi, yi ∈ {a,b}* for i = 1,…,n
Solution: A non-empty sequence of indices α1,…,αm ∈ {1,…,n} s.t. xα1…xαm = yα1…yαm
Output: Does the given input instance have a solution?
PCP Example
Input:
Index      1    2    3    4
x values   a    ba   aa   bb
y values   aa   ab   b    b
Solution:
Indices    1    1    3    2    4
x series   a    a    aa   ba   bb
y series   aa   aa   b    ab   b
Both series spell aaaababb.
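Checking a proposed solution is straightforward, in contrast to deciding whether one exists. A minimal Python sketch, with the hypothetical names `is_pcp_solution` and `EXAMPLE`, using the instance from this slide:

```python
def is_pcp_solution(pairs, indices):
    """pairs: list of (x_i, y_i); indices: 1-based sequence α1,…,αm."""
    if not indices:
        return False                     # a solution must be non-empty
    x_series = "".join(pairs[i - 1][0] for i in indices)
    y_series = "".join(pairs[i - 1][1] for i in indices)
    return x_series == y_series


# The instance shown on the slide.
EXAMPLE = [("a", "aa"), ("ba", "ab"), ("aa", "b"), ("bb", "b")]
```

For the slide's solution, `is_pcp_solution(EXAMPLE, [1, 1, 3, 2, 4])` holds, while the single index `[1]` is not a solution.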
PCP Undecidability
PCP is known to be undecidable.
Proof sketch: Reduction from Lu:
  Given a Turing machine M and a word w
  Define a PCP instance P based on M and w s.t. P has a solution iff M accepts w
  A solution for P encodes a run of M on w
  The x-series is always ‘one step ahead’ of the y-series
  The y-series can ‘catch up’ only if the computation in the x-series reaches an accepting state
PCP Undecidability Continued
Add instance pairs of the following forms (first string / second string):
Start computation:     # / #q0w#
Encode transitions:    qia / bqj        aqib / qjac
Copy symbols:          a / a
qacc ‘eats’ symbols:   qacc# / #        aqacc / qacc        qacca / qacc
Undecidability of Universality
Theorem 5.1: It is undecidable whether a given 1N-RA is universal.
Proof: For a given PCP instance P, construct a 1N-RA A s.t. A accepts an input string iff it does not represent a solution for P.
⇒ P has no solution iff A is universal
⇒ Decidability of universality leads to decidability of PCP
⇒ Universality of 1N-RA is undecidable
PCP Encoding
Assume w.l.o.g. that Sym = {1,…,n,a,b,#,$} ⊆ D
Candidate: a string u#v s.t.:
  u encodes xα1, …, xαm
  v encodes yβ1, …, yβl
A candidate is a solution if:
  l = m
  αi = βi for every i (matching pairs)
  xα1…xαm = yα1…yαm
PCP Encoding Continued
xαj encoding: $ γ αj δ1 a1 … δk ak
  $ acts as a separator
  γ represents j by a unique value
  αj ∈ {1,…,n}
  The δi encode positions in the word
  γ and δ values appear only once in u / v
  xαj = a1…ak
yβj is encoded similarly
PCP Encoding Example
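Index      1    2    3    4
x values   a    ba   aa   bb
y values   aa   ab   b    b
Solution, with γ numbering the chosen pairs and α giving their indices:
γ          1    2    3    4    5
α          1    1    3    2    4
x series   a    a    aa   ba   bb
y series   aa   aa   b    ab   b
Encoding u#v:
u = $ 1 1 1 a $ 2 1 2 a $ 3 3 3 a 4 a $ 4 2 5 b 6 a $ 5 4 7 b 8 b
#
v = $ 1 1 1 a 2 a $ 2 1 3 a 4 a $ 3 3 5 b $ 4 2 6 a 7 b $ 5 4 8 b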
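The encoding of this example can be reproduced mechanically. The Python sketch below is illustrative only: the function `encode` and the constant names are hypothetical, and, as on the slide, plain numerals stand in for the unique γ and δ values.

```python
def encode(words, indices):
    """Build $ γ α δ1 a1 … δk ak for each chosen word, as a space-separated string."""
    tokens, delta = [], 0
    for gamma, alpha in enumerate(indices, start=1):
        tokens += ["$", str(gamma), str(alpha)]
        for letter in words[alpha - 1]:
            delta += 1                    # δ numbers letter positions across the whole series
            tokens += [str(delta), letter]
    return " ".join(tokens)


X_WORDS = ["a", "ba", "aa", "bb"]
Y_WORDS = ["aa", "ab", "b", "b"]
SOLUTION = [1, 1, 3, 2, 4]
```

Calling `encode(X_WORDS, SOLUTION)` and `encode(Y_WORDS, SOLUTION)` yields the u and v parts of the example.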
PCP Encoding Continued
u#v is syntactically correct if:
  γ-projection of u = γ-projection of v
  δ-projection of u = δ-projection of v
u#v represents a solution if:
  u#v is syntactically correct
  For each γ, the number to the right of γ is the same in u and in v
  For each δ, the symbol to the right of δ is the same in u and in v
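These conditions can be written as a small check over an already parsed encoding. The sketch assumes a hypothetical block representation (γ, α, list of (δ, letter) pairs) and that γ and δ values are unique within each of u and v; it does not verify that the blocks spell the words of a particular PCP instance.

```python
def represents_solution(u_blocks, v_blocks):
    """Each block is (γ, α, [(δ, letter), …]); γ and δ values are assumed unique."""
    def gammas(blocks):
        return [g for g, _, _ in blocks]

    def cells(blocks):
        return [cell for _, _, block_cells in blocks for cell in block_cells]

    u_cells, v_cells = cells(u_blocks), cells(v_blocks)
    # Syntactic correctness: equal γ-projections and equal δ-projections.
    syntactically_correct = (
        gammas(u_blocks) == gammas(v_blocks)
        and [d for d, _ in u_cells] == [d for d, _ in v_cells]
    )
    if not syntactically_correct:
        return False
    # Same number to the right of each γ, same symbol to the right of each δ.
    same_numbers = ({g: a for g, a, _ in u_blocks}
                    == {g: a for g, a, _ in v_blocks})
    same_symbols = dict(u_cells) == dict(v_cells)
    return same_numbers and same_symbols


# The example encoding, parsed by hand.
U_BLOCKS = [(1, 1, [(1, "a")]), (2, 1, [(2, "a")]), (3, 3, [(3, "a"), (4, "a")]),
            (4, 2, [(5, "b"), (6, "a")]), (5, 4, [(7, "b"), (8, "b")])]
V_BLOCKS = [(1, 1, [(1, "a"), (2, "a")]), (2, 1, [(3, "a"), (4, "a")]),
            (3, 3, [(5, "b")]), (4, 2, [(6, "a"), (7, "b")]), (5, 4, [(8, "b")])]
```

The block boundaries differ between u and v, yet the check succeeds because only the γ- and δ-indexed values are compared.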
Construction of A
Assume the values of Sym are stored in the initial register assignment
A works as follows:
  “Guesses” why w is not a valid solution
  Checks whether w meets the chosen criterion
  If yes, accepts
  Else rejects
w has an accepting computation ⇔ w meets some criterion for being “wrong” ⇔ w is not a solution for the PCP instance
When is w “wrong”?
w is of the wrong form:
  w ≠ u#v
  u or v ∉ ($γαδ…)*
  xi ≠ a1…ak or yj ≠ a1…ak in u or v
γ-projections are wrong:
  First / last γ in u ≠ first / last γ in v
  Two γ's are the same in u / v
  γ1 and γ2 are successors in u but not in v
When is w “wrong” Concluded
δ-projections are wrong:
  Similar to γ-projections
w does not represent a solution:
  The α-value for some γ in u is different from the corresponding β-value in v
  The a- / b-value for some δ in u is different from the corresponding a- / b-value in v
Equivalence and Inclusion
Corollary 5.2: Equivalence of 1N-RAs is undecidable.
Proof:
  Assume equivalence were decidable
  Build an automaton A_D* that accepts every possible input word
  Universality is then decidable by checking equivalence to A_D*
  ⇒ Contradiction!
Corollary: Inclusion is also undecidable.