Finite State Machines for Strings over Infinite Alphabets
F. Neven, T. Schwentick and V. Vianu
Automata Seminar - Spring 2007, Tamar Aizikowitz
ACM Transactions on Computational Logic, Vol. V No. N, 01/03
Finite Machine for Infinite Alphabet?
Finite automaton:
- Transitions are based on the current state and the input value
- δ is defined on Q × Σ
Infinite alphabet ⇒ infinite transition function?
Solution:
- Store a finite number of values
- Transitions are based on stored values
- New values can be stored during the computation
Register Automata
- Suggested by Kaminski and Francez, 1994
- Finite automata + a finite number of registers
- Registers store values from the alphabet
- Register operations:
  - Compare a register value with the current value
  - Store the current value in a register
- Transitions specify the change of state, whether the value is stored, and the movement of the head.
Infinite Alphabets - Definitions
- D: an infinite set (e.g. a set of data values)
- D-string: w = d1…dn s.t. di ∈ D
- dom(w) = {1,…,|w|}
- valw(i) = di for i ∈ dom(w)
- ⊳,⊲ ∉ D delimit the input string
- 2-way automata work on w = ⊳v⊲
- dom+(w) = {0,…,|w|+1}, where valw(0) = ⊳ and valw(|w|+1) = ⊲
Nondeterministic 2-Way k-Register Automata (2N-RA)
A = ⟨D, Q, q0, τ0, δ, F⟩
- D – infinite alphabet
- Q, q0, F – as usual
- τ0: {1,…,k} → D ∪ {⊳,⊲} – initial register assignment
- δ – transition function
Two types of transitions:
- (i,q) → (p,d) – applies when the current value equals the value in register i
- q → (p,i,d) – store the current value in register i
- d ∈ {stay, right, left} – movement direction of the head
Configurations
- Configuration: γ = [j, q, τ], where j is the head position, q the current state and τ the register assignment
- Initial configuration: γ0 = [1, q0, τ0]
- Accepting configuration: γf = [j, qf, τ], qf ∈ F
Computations
[j, q, τ] ⊢ [j’, q’, τ’] iff:
(1) (i,q) → (q’,d) ∈ δ, j’ = j+d, valw(j) = τ(i) and τ = τ’
or
(2) q → (q’,i,d) ∈ δ, j’ = j+d and τ’ = τ with τ(i) ← valw(j)
Note: A type 2 transition is relevant only if no type 1 transition applies (why?)
w is accepted by A iff there exists an accepting configuration γf s.t. γ0 ⊢* γf
Variants
- Deterministic: at most one transition applies to each configuration.
- One-way: no left moves in the transition function.
- xC-RA: notation for the various models, where x ∈ {1,2} and C ∈ {D,N}
Example 1: 1N-RA
L1 = {d1…dn | ∃i,j : i ≠ j ∧ di = dj}
L1 contains all words in which some value appears more than once
Construction idea:
- Read the input string from left to right
- “Guess” i and store its value in a register
- Look for the stored value in the remaining input
Example 1: Continued…
A = ⟨D, {q0, q1, qf}, q0, <#,#>, δ, {qf}⟩
Registers: register 1 is a “trash” register; register 2 stores the repeating value
Transitions are abbreviated here: the head always moves right.
- q0 – look for i:
  - Go right: q0 → (1,q0)
  - Guess i, store the value, move to q1: q0 → (2,q1)
- q1 – look for j:
  - Go right: q1 → (1,q1)
  - If the value is found, move to qf: (2,q1) → qf
- qf: accepting configuration reached!
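Example 1: Concluded
Example of a run on w = 13234:
Step | Head reads | State | Registers | Transition applied
1    | 1          | q0    | <#,#>     | q0 → (1,q0)
2    | 3          | q0    | <1,#>     | q0 → (2,q1)
3    | 2          | q1    | <1,3>     | q1 → (1,q1)
4    | 3          | q1    | <2,3>     | (2,q1) → qf
5    | 4          | qf    | <2,3>     | accepting state reached
w ACCEPTED!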
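The run above can be reproduced with a small simulator. The following is a minimal sketch, not the authors' construction: the dictionaries `COMPARE` and `STORE` and the function `accepts` are hypothetical names, positions are 0-based inside the code, and it assumes the rule stated on the Computations slide that a store (type 2) transition is considered only when no compare (type 1) transition applies. Nondeterminism is handled by a breadth-first search over configurations.

```python
from collections import deque

# Hypothetical encoding of the slide's automaton for L1.
# Compare (type 1): (register, state) -> next state, fires if register == current value.
# Store (type 2):   state -> [(register, next state)], writes current value to register.
# In this one-way example every transition moves the head right.
COMPARE = {(2, "q1"): "qf"}
STORE = {"q0": [(1, "q0"), (2, "q1")], "q1": [(1, "q1")]}
ACCEPTING = {"qf"}
INITIAL_REGISTERS = ("#", "#")  # register 1 = trash, register 2 = guessed value


def accepts(word):
    """Breadth-first search over configurations (position, state, registers)."""
    start = (0, "q0", INITIAL_REGISTERS)
    seen = {start}
    queue = deque([start])
    while queue:
        pos, state, regs = queue.popleft()
        if state in ACCEPTING:
            return True
        if pos >= len(word):
            continue  # head ran off the input on this branch
        value = word[pos]
        # Type 1: compare transitions whose register holds the current value.
        successors = [
            (pos + 1, target, regs)
            for (reg, src), target in COMPARE.items()
            if src == state and regs[reg - 1] == value
        ]
        # Type 2: store transitions, tried only if no type 1 transition applies.
        if not successors:
            for reg, target in STORE.get(state, []):
                new_regs = regs[: reg - 1] + (value,) + regs[reg:]
                successors.append((pos + 1, target, new_regs))
        for config in successors:
            if config not in seen:
                seen.add(config)
                queue.append(config)
    return False


print(accepts("13234"))  # the slide's example word
print(accepts("1234"))   # all values distinct
```

The search explores every guess for i, so a word is accepted exactly when some branch reaches qf.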
Example 2: 2N-RA
L2 = {d1…dn | ∀i,j : i ≠ j → di ≠ dj}
L2 contains all words whose values are pairwise distinct
Construction idea: scan the symbols from left to right. For each symbol:
- Store its value in a register
- Look for the stored value in the remaining input
- If found, reject
- Else proceed to check the next symbol (how?)
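Example 2: Continued…
A = ⟨D, {q0, q1, q2, q3, qrej, qacc}, q0, <⊳,#,⊲>, δ, {qacc}⟩
(1,q0) → (q1,right)    leave the left delimiter
q1 → (2,q2,right)      store di in register 2
q2 → (1,q2,right)      scan right (register 1 is overwritten)
(2,q2) → (qrej,stay)   di found again: reject
(3,q2) → (q3,left)     right delimiter reached: turn around
q3 → (1,q3,left)       scan left
(2,q3) → (q1,right)    back at di: move on to the next symbol
(3,q1) → (qacc,stay)   right delimiter reached in q1: accept
[Figure: the tape ⊳ … di … dj … ⊲ and the register contents during a run]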
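The transition table for L2 is deterministic, so it can be stepped directly. The sketch below is illustrative only: `all_distinct`, `COMPARE`, `STORE` and `max_steps` are hypothetical names, the head is assumed to start on the left delimiter ⊳ with registers 1 and 3 holding ⊳ and ⊲, and a store transition is tried only when no compare transition applies.

```python
LEFT, RIGHT = "⊳", "⊲"  # delimiters; assumed not to occur in the input
MOVE = {"left": -1, "stay": 0, "right": 1}

# Compare (type 1): (register, state) -> (next state, direction)
COMPARE = {
    (1, "q0"): ("q1", "right"),
    (2, "q2"): ("qrej", "stay"),
    (3, "q2"): ("q3", "left"),
    (2, "q3"): ("q1", "right"),
    (3, "q1"): ("qacc", "stay"),
}
# Store (type 2): state -> (next state, register, direction)
STORE = {
    "q1": ("q2", 2, "right"),
    "q2": ("q2", 1, "right"),
    "q3": ("q3", 1, "left"),
}


def all_distinct(word, max_steps=100_000):
    """Run the two-way automaton on ⊳ word ⊲; True iff it reaches qacc."""
    tape = [LEFT, *word, RIGHT]
    # Assumption: the head starts on the left delimiter (position 0).
    pos, state, regs = 0, "q0", [LEFT, "#", RIGHT]
    for _ in range(max_steps):
        if state == "qacc":
            return True
        value = tape[pos]
        applicable = [
            target
            for (reg, src), target in COMPARE.items()
            if src == state and regs[reg - 1] == value
        ]
        if applicable:
            state, direction = applicable[0]
        elif state in STORE:
            state, reg, direction = STORE[state]
            regs[reg - 1] = value  # overwrite the chosen register
        else:
            return False  # no transition applies (e.g. in qrej): reject
        pos += MOVE[direction]
    raise RuntimeError("step budget exhausted")


print(all_distinct("1234"))
print(all_distinct("13234"))
```

Each symbol triggers one sweep to the right delimiter and back, so the number of steps grows quadratically with the input length.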
Logic
Variants of first-order and monadic second-order logic over D-strings.
w is represented by a logical structure:
- Domain dom(w) with the natural ordering <
- Value function val: dom(w) → D, instantiated by valw
Atomic formulae:
- x = y, x < y
- val(x) = val(y)
- val(x) = d for d ∈ D ∪ {⊳,⊲}
FO* and MSO*
The logic FO*:
- Atomic formulae
- Boolean connectives
- First-order quantification over dom+(w)
The logic MSO*:
- FO*
- Quantification over unary predicates on dom+(w)
FO* and MSO* Definability
L(φ) := {w ∈ D* | w ⊨ φ}
For example…
What φ defines L1?
∃x∃y(x ≠ y ∧ val(x) = val(y))
What φ defines L2?
∀x∀y(x ≠ y → val(x) ≠ val(y))
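The two sentences can be evaluated by brute force over all pairs of positions. This is a small sketch with hypothetical function names; it quantifies over dom(w) only, which gives the same answer here because the two delimiters differ from each other and from every data value.

```python
from itertools import product


def satisfies_phi_L1(w):
    """∃x∃y (x ≠ y ∧ val(x) = val(y)): some value occurs twice."""
    positions = range(len(w))
    return any(x != y and w[x] == w[y] for x, y in product(positions, repeat=2))


def satisfies_phi_L2(w):
    """∀x∀y (x ≠ y → val(x) ≠ val(y)): all values are distinct."""
    positions = range(len(w))
    return all(x == y or w[x] != w[y] for x, y in product(positions, repeat=2))


for word in ["13234", "1234"]:
    print(word, satisfies_phi_L1(word), satisfies_phi_L2(word))
```

For every word exactly one of the two functions returns True, since the second sentence is the negation of the first.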
RAs vs. MSO*
Theorem 3.1: 2D-RA ⊈ MSO*
Proof: Consider the language L of strings u#v where the number of distinct symbols appearing in u equals the number of distinct symbols appearing in v.
- Part 1: There exists a 2D-RA which accepts L.
- Part 2: L is not MSO* definable.
⇒ 2D-RA ⊈ MSO*
Proof: Preliminaries
- Nu / Nv = the set of distinct symbols in u / v
- L = {u#v | |Nu| = |Nv|}
- lmow(d) = the leftmost occurrence of d in w
- Nu = {a1,…,an} and Nv = {b1,…,bm} s.t. for every i < j, lmou(ai) < lmou(aj) and lmov(bi) < lmov(bj).
- Note: u#v ∈ L iff n = m
Proof: Part 1 (L is 2D-RA)
Question: How can we build a 2D-RA for L?
Basic concept:
- Visit lmou(a1), lmov(b1), lmou(a2), … in order
- If lmou(an) and lmov(bm) are reached simultaneously, accept
- Else reject
How can we visit the lmo-s in order?
- Finding lmou(a1), lmov(b1) is easy… (how?)
Proof: Part 1 Concluded
Assume ai is stored in a register. Compute lmou(ai+1) as follows:
- Move the head to lmou(ai):
  - Go left until ⊳
  - Go right until ai (leftmost occurrence)
- For positions lmou(ai)+j (starting with j=1), test whether the position is lmou(ai+1):
  - Store the value and proceed to move left
  - If the value is encountered, then check the next position (j++)
  - Else, if ⊳ is reached, then lmou(ai+1) = lmou(ai)+j
- Similar for the bi-s…
⇒ Language accepted
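The leftmost-occurrence walk can be sketched in ordinary code. This is only an illustration of the idea: the function names are hypothetical, and Python indices stand in for the head position, so the sketch skips the bookkeeping a real 2D-RA needs to find its way back with finitely many registers.

```python
def is_leftmost(word, pos):
    """Mimic the leftward scan: True iff word[pos] does not occur left of pos."""
    stored = word[pos]  # "store the value"
    for left in range(pos - 1, -1, -1):  # "move left"
        if word[left] == stored:
            return False  # value encountered: not a leftmost occurrence
    return True  # left end reached without meeting the value


def next_lmo(word, pos):
    """Next leftmost occurrence strictly after pos, or None (use pos=-1 to start)."""
    for candidate in range(pos + 1, len(word)):
        if is_leftmost(word, candidate):
            return candidate
    return None


def same_number_of_distinct_symbols(u, v):
    """Visit lmo(a1), lmo(b1), lmo(a2), ... in lockstep."""
    pu, pv = next_lmo(u, -1), next_lmo(v, -1)
    while pu is not None and pv is not None:
        pu, pv = next_lmo(u, pu), next_lmo(v, pv)
    return pu is None and pv is None  # both sides ran out simultaneously


print(same_number_of_distinct_symbols("abca", "xyzz"))
print(same_number_of_distinct_symbols("aab", "x"))
```

No counter is ever kept: equality of |Nu| and |Nv| is detected purely by both walks ending at the same round.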
Proof: Part 2 (L not MSO*)
- Assume by contradiction that φ* is an MSO* sentence s.t. u#v ⊨ φ* iff |Nu| = |Nv|.
- Let C be the set of D-symbols appearing in φ*.
- w is admissible iff:
  - w is of the form u#v
  - w contains no symbols from C
  - Nu ∩ Nv = ∅
  - Each D-symbol occurs at most once in u or v
Proof: Part 2 Continued…
Let φ be obtained from φ* by replacing:
- val(x) = val(y) by x = y
- val(x) = d by false if d ≠ #
For every admissible string w = d1…dn#e1…em:
a^n#a^m ⊨ φ
⇔ d1…dn#e1…em ⊨ φ    (letters don’t matter in φ)
⇔ d1…dn#e1…em ⊨ φ*   (w has no letters from C)
⇔ n = m               (because all letters are different)
φ is MSO
Proof: Part 2 Concluded
- For every n, there exists an admissible string d1…dn#e1…en (why?)
- For every n, a^n#a^n ⊨ φ
- Note: φ is in MSO (no value comparisons)
- Define a formula for the form a^n#a^m:
  ψ := ∃x(val(x)=# ∧ ∀y(val(y)=a ∨ (val(y)=# ∧ y=x)))
- L’ = {a^n#a^n | n ∈ ℕ} is MSO definable by φ ∧ ψ
⇒ L’ is regular
⇒ Contradiction!
2N-RA vs. FO*
Theorem 3.7 (weak version): FO* ⊈ 2N-RA
Proof: Define a language L ⊆ D* s.t.:
- Part 1: No 2N-RA can accept L.
- Part 2: L is FO* definable.
⇒ FO* ⊈ 2N-RA
Proof: Part 1 (L not 2N-RA)
Based on communication complexity methodology:
- The input string is divided between two parties, I and II
- The parties can send messages according to a pre-defined protocol
- The string is accepted if both parties accept
- Each party has unlimited computational power
- The only restriction is on the form of the messages
Proof: Part 1 Continued…
- We consider strings of the form u#v
- u,v encode sets of subsets of D
- L = {u#v | u,v represent the same set of sets}
- Claim: L cannot be accepted by 2N-RAs
- Assume by contradiction that there exists a 2N-RA A s.t. L(A) = L
- We simulate A by defining an appropriate protocol…
Proof: Part 1 Continued…
Define the communication protocol as follows:
- I is given u while II is given v
- I simulates A until A tries to cross # to the right, then sends the configuration information to II
- II simulates A until A tries to cross # to the left, then sends the configuration information to I
- And so on, until one of the parties reaches an accepting configuration or gets stuck.
If A exists, such a protocol will accept L
Proof: Part 1 Continued…
It remains to define an appropriate protocol…
- Restrict u#v to at most N data values
- Assume A has |Q| states and k registers
- M := |Q|·N^k different messages are needed
- Each message needs to be sent no more than once in each direction (why?)
- At most M^(2M) different possible series of messages (dialogues) need to be considered
Proof: Part 1 Concluded
- M^(2M) is exponential in N (more precisely, in a polynomial of N, since M = |Q|·N^k)
- The number of sets of sets of N values is 2^(2^N)
- For large N, there exist u,v s.t.:
  - u#u and v#v are accepted by the same dialogue
  - u,v represent different sets of sets
⇒ u#v is also accepted
⇒ No such protocol can accept L
⇒ No 2N-RA can accept L
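The counting argument can be checked numerically. The sketch below assumes the reading M = |Q|·N^k for the number of messages, M^(2M) for the bound on dialogues and 2^(2^N) for the number of sets of sets; the function name and the sample parameters are hypothetical. It uses exact integers and compares bit lengths so that 2^(2^N) never has to be built.

```python
def first_n_where_sets_outnumber_dialogues(num_states, num_registers, limit=40):
    """Smallest N <= limit with M^(2M) < 2^(2^N), where M = |Q| * N^k."""
    for n in range(1, limit + 1):
        m = num_states * n ** num_registers
        dialogues = m ** (2 * m)
        # For a positive integer x: x < 2^e  iff  x.bit_length() <= e.
        if dialogues.bit_length() <= 2 ** n:
            return n
    return None


print(first_n_where_sets_outnumber_dialogues(num_states=2, num_registers=1))
print(first_n_where_sets_outnumber_dialogues(num_states=10, num_registers=2))
```

From that N on, the pigeonhole principle yields two different sets of sets whose strings u#u and v#v share one accepting dialogue.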
Proof: Part 2 (L is FO*)
We show that L is FO* definable…
First we define an encoding for u,v:
- Assume $ not in D
- u,v are of the form $ d1^1…dn^1 $ d1^2…dn^2 $ … $ d1^m…dn^m $
- Each d1^j…dn^j represents a subset of D-values
Goal: Define a formula verifying that every subset in u appears in v and vice versa.
Proof: Part 2 Continued…
We start with some smaller formulae…
- w is of the form u#v:
  form := ∃x(val(x) = # ∧ ∀y(val(y) = # → y = x))
- x is in the interval [y,z]:
  x ∈ [y,z] := y < x ∧ x < z
- The interval [y,z] represents a subset:
  subs(y,z) := val(y) = $ ∧ val(z) = $ ∧ y < z ∧ ∀x(x ∈ [y,z] → (val(x) ≠ # ∧ val(x) ≠ $))
Proof: Part 2 Continued…
Some more…
- The subset [y,z] is a subset of [y’,z’]:
  [y,z] ⊆ [y’,z’] := ∀x(x ∈ [y,z] → ∃x’(x’ ∈ [y’,z’] ∧ val(x) = val(x’)))
- The subset [y,z] equals the subset [y’,z’]:
  [y,z] = [y’,z’] := [y,z] ⊆ [y’,z’] ∧ [y’,z’] ⊆ [y,z]
- The subset [y,z] is in u:
  [y,z] ∈ u := subs(y,z) ∧ ∀x(val(x) = # → z < x)
- The subset [y,z] is in v:
  [y,z] ∈ v := subs(y,z) ∧ ∀x(val(x) = # → x < y)
Proof: Part 2 Concluded
Two last formulae…
- Every subset in u appears in v:
  usubv := ∀y∀z([y,z] ∈ u → ∃y’∃z’([y’,z’] ∈ v ∧ [y,z] = [y’,z’]))
- vsubu is defined similarly
And now to put it all together…
φ := form ∧ usubv ∧ vsubu
It follows that w ⊨ φ iff w ∈ L
⇒ L is FO* definable.
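What φ checks can be mirrored on concrete strings. This is a sketch under simplifying assumptions: data values are single characters, both halves are well-formed $-delimited sequences of non-empty blocks, and the names `decode` and `in_L` are hypothetical. It decides membership directly and is not an evaluation of the formula itself.

```python
def decode(part):
    """Turn '$d…d$d…d$' into a set of frozensets of data values."""
    blocks = part.strip("$").split("$")
    return {frozenset(block) for block in blocks}


def in_L(word):
    """True iff word = u#v and u, v encode the same set of sets."""
    if word.count("#") != 1:  # the formula 'form'
        return False
    u, v = word.split("#")
    return decode(u) == decode(v)  # usubv ∧ vsubu


print(in_L("$ab$c$#$c$ba$ab$"))
print(in_L("$ab$c$#$ab$"))
```

Order and repetition are ignored both inside a block and among blocks, which is exactly why u and v denote sets of sets rather than sequences.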
Decision Problems
Kaminski and Francez showed that emptiness for 1N-RAs is decidable
And what of universality?
We will show that universality for 1N-RA is undecidable by reduction from a known undecidable problem, PCP.
Post Correspondence Problem
- Introduced by Emil Post in 1946
- Input: A sequence of pairs (x1,y1),…,(xn,yn) s.t. xi,yi ∈ {a,b}* for i=1,…,n
- Solution: A sequence of indices α1,…,αm ∈ {1,…,n} s.t. xα1…xαm = yα1…yαm
- Output: Does the given input instance have a solution?
PCP Example
Input:
Index    | 1  | 2  | 3  | 4
x values | a  | ba | aa | bb
y values | aa | ab | b  | b
Solution:
Indices  | 1  | 1  | 3  | 2  | 4
x values | a  | a  | aa | ba | bb
y values | aa | aa | b  | ab | b
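The example solution is easy to verify mechanically. In this small sketch the dictionary `PAIRS` restates the instance from the table and `is_solution` is a hypothetical helper name.

```python
# index -> (x value, y value), as in the example table
PAIRS = {1: ("a", "aa"), 2: ("ba", "ab"), 3: ("aa", "b"), 4: ("bb", "b")}


def is_solution(indices, pairs=PAIRS):
    """True iff the non-empty index sequence makes both concatenations equal."""
    if not indices:
        return False
    x_word = "".join(pairs[i][0] for i in indices)
    y_word = "".join(pairs[i][1] for i in indices)
    return x_word == y_word


print(is_solution([1, 1, 3, 2, 4]))
print(is_solution([1, 2]))
```

Checking a given sequence is trivial; the undecidability lies in deciding whether any such sequence exists.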
PCP Undecidability
PCP is known to be undecidable.
Proof sketch: Reduction from Lu:
- Given a Turing Machine M and a word w
- Define a PCP instance P based on M and w s.t. P has a solution iff M accepts w
- A solution for P encodes a run of M on w
- The x-series is always ‘one step ahead’ of the y-series
- The y-series can ‘catch up’ only if the computation in the x-series reaches an accepting state
PCP Undecidability Continued
Add instance pairs of the following forms (each pair written as top / bottom):
- Start computation: # / #q0w#
- Encode transitions: qia / bqj and aqib / qjac
- Copy symbols: a / a
- qacc ‘eats’ symbols: aqacc / qacc and qacca / qacc, then qacc# / #
Undecidability of Universality
Theorem 5.1: It is undecidable whether a given 1N-RA is universal.
Proof:
- For a given PCP instance P, construct a 1N-RA A s.t. A accepts an input string iff it does not represent a solution for P.
- P has no solution iff A is universal
⇒ Decidability of universality leads to decidability of PCP
⇒ Universality of 1N-RA is undecidable
PCP Encoding
Assume w.l.o.g. that Sym = {1,…,n,a,b,#,$} ⊆ D
Candidate: a string u#v s.t.:
- u encodes xα1,…,xαm
- v encodes yβ1,…,yβl
Candidate is a solution if:
- l = m and αi = βi (matching pairs)
- xα1…xαm = yα1…yαm
PCP Encoding Continued
xαj encoding: $ γ αj δ1 a1 … δk ak
- $ acts as a separator
- γ represents j by a unique value
- αj ∈ {1,…,n}
- δi encode positions in the word
- γ and δ values appear only once in u / v
- xαj = a1…ak
yβj is encoded similarly
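PCP Encoding Example
Index    | 1  | 2  | 3  | 4
x values | a  | ba | aa | bb
y values | aa | ab | b  | b
Solution (γ = position in the solution, α = index of the pair):
γ        | 1  | 2  | 3  | 4  | 5
α        | 1  | 1  | 3  | 2  | 4
x values | a  | a  | aa | ba | bb
y values | aa | aa | b  | ab | b
Encoding, each block written as $ γ α δ1 a1 … δk ak:
u = $ 1 1 1 a  $ 2 1 2 a  $ 3 3 3 a 4 a  $ 4 2 5 b 6 a  $ 5 4 7 b 8 b
#
v = $ 1 1 1 a 2 a  $ 2 1 3 a 4 a  $ 3 3 5 b  $ 4 2 6 a 7 b  $ 5 4 8 b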
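The example encoding can be generated from the index sequence. In this sketch `encode` and `render` are hypothetical helpers, and γ and δ values are tagged tuples so they cannot be confused with the pair indices, a role that colour presumably played on the original slide.

```python
PAIRS = {1: ("a", "aa"), 2: ("ba", "ab"), 3: ("aa", "b"), 4: ("bb", "b")}


def encode(indexed_words):
    """Encode [(α, word), ...] as a token list of blocks $ γ α δ1 a1 … δk ak."""
    tokens, delta = [], 0
    for gamma, (alpha, word) in enumerate(indexed_words, start=1):
        tokens += ["$", ("γ", gamma), alpha]
        for letter in word:
            delta += 1  # δ numbers the letters across the whole half
            tokens += [("δ", delta), letter]
    return tokens


def render(tokens):
    """Print tagged γ/δ values by their number, as in the example above."""
    return " ".join(str(t[1]) if isinstance(t, tuple) else str(t) for t in tokens)


solution = [1, 1, 3, 2, 4]
u = encode([(i, PAIRS[i][0]) for i in solution])
v = encode([(i, PAIRS[i][1]) for i in solution])
print(render(u))
print("#")
print(render(v))
```

Both halves use the same γ values 1 to 5 and the same δ values 1 to 8, even though the block boundaries fall in different places.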
PCP Encoding Continued
u#v is syntactically correct if:
- γ-projection of u = γ-projection of v
- δ-projection of u = δ-projection of v
u#v represents a solution if:
- u#v is syntactically correct
- For each γ, the number to the right of γ is the same in u and in v
- For each δ, the symbol to the right of δ is the same in u and in v
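These criteria translate into a direct check on token lists. The sketch uses a made-up two-pair instance, (ab, a) and (a, ba), solved by the index sequence 1, 2; the strings "g1", "d1", … stand for unique γ and δ values, and `parse` and `represents_solution` are hypothetical names that assume each half is well-formed.

```python
def parse(tokens):
    """Return ([(γ, α), ...], [(δ, letter), ...]) for a well-formed half."""
    gammas, deltas, i = [], [], 0
    while i < len(tokens):
        if tokens[i] != "$":
            raise ValueError("block must start with $")
        gammas.append((tokens[i + 1], tokens[i + 2]))
        i += 3
        while i < len(tokens) and tokens[i] != "$":
            deltas.append((tokens[i], tokens[i + 1]))
            i += 2
    return gammas, deltas


def represents_solution(u, v):
    gu, du = parse(u)
    gv, dv = parse(v)
    same_projections = (
        [g for g, _ in gu] == [g for g, _ in gv]      # γ-projections agree
        and [d for d, _ in du] == [d for d, _ in dv]  # δ-projections agree
    )
    # Same number after every γ and same symbol after every δ.
    return same_projections and gu == gv and du == dv


u = ["$", "g1", 1, "d1", "a", "d2", "b", "$", "g2", 2, "d3", "a"]
v = ["$", "g1", 1, "d1", "a", "$", "g2", 2, "d2", "b", "d3", "a"]
print(represents_solution(u, v))
```

The automaton A accepts the complement: it guesses one way in which such a check could fail and verifies that single violation.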
Construction of A
Assume the values of Sym are stored in the initial register assignment
A works as follows:
- “Guesses” why w is not a valid solution
- Checks whether w meets the chosen criterion
- If yes, accepts
- Else rejects
w has an accepting computation ⇔ w meets some criterion for being “wrong” ⇔ w is not a solution for the PCP instance
When is w “wrong”
w is of the wrong form:
- w ≠ u#v
- u or v ∉ ($γαδ…)*
- xi ≠ a1…ak or yj ≠ a1…ak in u or v
γ-projections are wrong:
- First / last γ in u ≠ first / last γ in v
- Two γ’s are the same in u / v
- γ1 and γ2 are successors in u but not in v
When is w “wrong” Concluded
δ-projections are wrong:
- Similar to γ-projections
w does not represent a solution:
- The α-value for some γ in u is different from the corresponding β-value in v
- The a- / b-value for some δ in u is different from the corresponding a- / b-value in v
Equivalence and Inclusion
Corollary 5.2: Equivalence of 1N-RAs is undecidable.
Proof:
- Assume equivalence were decidable
- Build an automaton A_D* that accepts every possible input word
- Universality is then decidable by checking equivalence to A_D*
⇒ Contradiction!
Corollary: Inclusion is also undecidable.