The Complexity of Tree Transducer Output Languages

www.kmonos.net

The Complexity of Tree Transducer Output Languages

NII Logic Seminar 2009/04/15

Kazuhiro Inaba @ NIIJoint work with Sebastian Maneth @ NICTA&UNSW

www.kmonos.net

2

Table of ContentsOverview

The Problem StatementKey Lemma: “Garbage-Free Form”Results and Applications

The Proof

(Other Topics in my Thesis…?)

www.kmonos.net

3

PreliminariesΣ, Δ, Γ, …: Ranked Finite Set

Each Symbol in Σ, Δ is associated with a natural number called “rank” of the symbol

TΣ = The set of Trees over ΣTΣ ::= σ(TΣ, …, TΣ) with rank(σ) = k

Example:Σ={a(2),b(1),c(0)}, a(b(c), c)∈ TΣ

www.kmonos.net

4

Preliminariesτ ⊆ TΣ×TΔ is called a (tree-to-tree)

translation

For τ1∈ TΣ×TΓ, τ2∈ TΓ×TΔ,the sequential composition is defined as follows: τ1 ； τ2 = {(s, t) | (s, r)∈τ1, (r, t)∈τ2 }

www.kmonos.net

“Complexity of Output Languages”

Given…A tree-to-tree translation τ ⊆ TΣ×TΔ

How complex is the set range(τ) ⊆ TΔ ?

(i.e., for a tree t ∈ TΔ, how is it computationally hard to determine whether t ∈range(τ) or not ?)

www.kmonos.net

Classic Results τ: Program of the Turing-Machine

Undecidable

τ: Nondeterministic Finite-State String Transduction range(τ) is regular!

The membership of range(τ) is solved in O(n) time, O(1) space

So for τ∈ Finitely Many Compositionsof Nondet-FST

b/bb

a/aa/b/b

www.kmonos.net

7

Our Result

What is Macro Tree Transducers? A relatively powerful (yet terminating)

model of tree translation The formal definition will be explained

soon…

τ: Composition of Nondeterministic Macro Tree Transducers the membership problem of range(τ)

is NP-complete and in DSPACE(n).

www.kmonos.net

8

Actually, we’ve shownthe “Garbage-Free Form”

For any composition τ = τ1;τ2; … ;τn ⊆ TΣ×TΔ of MTTs, there exists an equivalent one τ = ρ0 ; ρ1 ; … ; ρ2n ,such thatdom(ρ0)=TΣ and range(ρ0) is

regularFor any (s0,t)∈τ, there’s s1,..,s2n s.t.

(si,si+1) ∈ρi, (s2n,t)∈ρ2n, |si+1|≦ 2|si+2|, |s2n|≦2|t|

www.kmonos.net

9

Actually, we’ve shownthe “Garbage-Free Form”

In each step of translation (except the first step), the size of the tree always becomes larger(Ignoring the constant factor “2”)

τ1 τ2 τk

ts0 s1

s2Sk-1

ρ0 ρ1 ρ2n

ts1 s2 S2ns0

www.kmonos.net

10

(Possible)Applications in Practice

Compositions of MTTs are known to be a good model for XML translationsMTT* can represent good subclasses of

XML-QL, XSLT, XQuery, …

Query Optimization by GFFVerification by the range(MTT*)

membership

www.kmonos.net

11

Corollaries of GFF(Applications in Theory)

range(τ1;τ2; … ;τn ) is in NP and in DSPACE(n)

Higher-order context free languages are context-sensitive languagesDetails are in the next slide…

www.kmonos.net

12

Chomsky Hierarchy[Chomsky1959]

Systems of Finite Description of String Languages

Type-0 GrammarG=<N, Σ, S, R> where

S ∈ NR is a set of rewrite rules of the form

α → β with α,β ∈ (N∪Σ)*[G] = {w∈Σ* | S→*w}

www.kmonos.net

13

Chomsky Hierarchy and Complexity

Type-0 (Computable Language) Type-1 (Context-Sensitive Language)

αAβ → αγβ A∈N, γ≠empty Context-senstive iff recognizable in NSPACE(n)

[Landweber63, Kuroda64] Type-2 (Context-Free)

A→α Proper Subclass of PTIME [?]

Type-3 (Regular) A→sB or A→s A,B∈N, s∈Σ Regular iff recognizable in DSPACE(1) [Folklore]

www.kmonos.net

14

Context-Free Grammar= Level-0 Grammar

Example of Type-2 Grammar:S → 0 S 0, S→1 S 1, S→0, S→1Palindromes over 0s and 1s of odd

lengthA → α

Can be regarded as a nullary (nondeterministic) function whose output is strings

www.kmonos.net

15

Natural Extension:Macro Grammar

[Fischer68]= Level-1 GrammarS→A(0), A(x) → A(xx), A(x)→x

Sequence of 0s of length 2n

Nonterminals can be parameterized by strings, i.e., they’re nondeterministic function from strings to strings

[Aho68] Level-1 languages are context-sensitive(=NSPACE(n))

www.kmonos.net

16

Natural Extenstion:Higher-order grammar

[Damm82]Level-n Grammar is n-th order

grammar, in which the nonterminals are parameterized by at most (n-1)-th order entitiesB(f,x) → B(λy.f(f(y)), xx), B(f,x)→f(1x1)S → B(λy.yy, 0)

2n Repetition of “102^n1” = Simply-Typed λ Caluculus + Strings as

the Base Type + Nondeterminism

www.kmonos.net

17

Complexity?Are Level-2, Level-3, … languages

context-sensitive (=NSPACE(n)), or do they go beyond?

[Damm82] Decidable[Maneth02] Call-by-Value Level-n

Languages are in DSPACE(n)[This Work] Call-by-Name Level-n

Languages are in DSPACE(n)!

www.kmonos.net

18

Level-n Languages and MTTs [Damm82]

For any G : CbN Level-n grammar, there exists n+1 composition of CbN MTTs s.t. [G] = range(τ1;τ2; … ;τn )Intuition: MTTs can carry out

substitutionHence, giving a complexity

upperbound for range(MTT*) also gives the upperbound for Level-n languages

www.kmonos.net

Context-Free Language

Level-1 Language＝ Macro Language

Level-2 Language

Level-n Language

range(MTT*)

CFG where nonterminals are parameterized by other nonterminals[Aho68] ⊆NSPACE(n)[Rounds73] NP-complete

Context-Sensitive Language

[Kuroda64]= NSPACE(n)

[CYK, Earley, …] ⊆PTIME

Level-n Language [Damm82][Our Work]⊆ DSPACE(n)NP-complete

Level-3 Language

www.kmonos.net

20

Brief Introduction toMacro Tree Transducers (MTTs)

www.kmonos.net

MTT

An MTT M = (Q, start, Σ, Δ, R) is a set of first-order functions of type Tree(Σ) × Tree(Δ)k Tree(Δ)

Defined by mutual induction on the1st parameter (the input tree) Application in the right-hand side is restricted only to the

direct children of the current node Can take output tree fragments via other parameters, but

not allowed to inspect or decompose them

start( A(x1) ) → double( x1, double(x1, E) )

double( A(x1), y1) → double( x1, double(x1, y1) )double( B, y1 ) → F( y1, y1 )double( B, y1 ) → G( y1, y1 )

RHS ::= F( RHS, … , RHS ) | q(xi, RHS, …, RHS) | yi

Nondeterminism!

www.kmonos.net

22

Evaluation StrategyTwo Evaluation Strategies

(analogous to the λ-calculus…)Call-by-Value (IO : Inside-Out)Call-by-Name (OI : Outside-In)

www.kmonos.net

Example (IO / call-by-value)

start(A(B)) double( B, double(B, E) ) double( B, F(E, E) )

F( F(E,E), F(E,E) ) or G( F(E,E), F(E,E) ) or double( B, G(E, E) )

F( G(E, E), G(E, E) ) or G( G(E, E), G(E, E) )


www.kmonos.net

Example (OI / call-by-name)

start(A(B)) double( B, double(B, E) ) F( double(B, E), double(B, E) ) F( F(E,E), double(B, E) ) F( F(E,E), F(E,E) )

F( F(E,E), G(E,E) ) !! F( G(E,E), double(B, E) ) F( G(E,E), F(E,E) ) !!

F( G(E,E), G(E,E) ) G( double(B, E), double(B, E) ) G( F(E,E), double(B, E) ) G( F(E,E), F(E,E) )

G( F(E,E), G(E,E) ) !! G( G(E,E), double(B, E) ) G( G(E,E), F(E,E) ) !!

G( G(E,E), G(E,E) )


www.kmonos.net

25

Evaluation StrategyTwo Strategies

(analogous to the λ-calculus…)Call-by-Value (IO : Inside-Out)Call-by-Name (OI : Outside-In)

Today, we consider OI evaluation only.Why?

MTTIO* = MTTOI* (even though MTTIO ≠ MTTOI)

MTTOI has better compositionality (as shown later)

www.kmonos.net

26

Main Result:DSPACE(n) Membership of range(MTT*) ── or, how to apply the GFF to the range complexity

www.kmonos.net

≪Approach: Generate & Test≫ Guess the input s0and all the intermediate trees s1, …, sn-1 Check whether

(s,s1)∈τ1, (s1,s2)∈τ2, …, (sn-1, t) ∈τn If it is, then t is in the output language! Otherwise, try another s, s2, …, sn-1

τ1 τ2 τn

ts0 s1 s2 Sn-1∈ ?∈ ? ∈ ?

www.kmonos.net

In order to carry out the algorithm in DSPACE(|t|) … The sizes |s0|, |s1|, |s2|, …, |sn| must be linearly bounded

by |t| i.e., there must be a constant c independent from t

such that |s| ≦ c|t| Each step to test the “translation membership” of τi

must be done in DSPACE(n)

“Garbage Free-Form” assures this property!

≪Approach: Generate & Test ≫

τ1 τ2 τn

ts0 s1 s2 Sn-1∈ ?∈ ? ∈ ?

www.kmonos.net

29

[Review] “Garbage-Free Form”

For any composition τ = τ1;τ2; … ;τn ⊆ TΣ×TΔ of MTTs, there exists an equivalent one τ = ρ0 ; ρ1 ; … ; ρ2n ,such thatdom(ρ0)=TΣ and range(ρ0) is

regularFor any (s0,t)∈τ, there’s s1,..,s2n s.t.

(si,si+1) ∈ρi, (s2n,t)∈ρ2n, |si+1|≦ 2|si+2|, |s2n|≦2|t|

www.kmonos.net

30

[Review] “Garbage-Free Form”

In each step of translation (except the first step), the size of the tree always becomes larger(Ignoring the constant factor “2”)

τ1 τ2 τk

ts0 s1

s2Sk-1

ρ0 ρ1 ρ2n

ts1 s2 S2ns0

www.kmonos.net

31

“Translation Membership”Let τ be a MTT.

For trees s and t, can we check whether (s,t)∈τ or not within O(|s|+|t|) space?

The answer is Yes, but it is hard to prove it directly.Let us consider subclasses of MTTs…

www.kmonos.net

32

In Search of a Good Subclass…

[Engelfriet&Vogler85] MTT ⊆ T ; LMTT

Hence, range(MTT*) = range((T∪LMTT)*)T : MTTs without accumulating params.LMTT: “Linear” MTTs, each input

variable occurs at most once in rhs.T and LMTT is weak enough to show

the DSPACE(n) translation memshipBut, too weak to have the garbage-

free form

www.kmonos.net

33

Our Idea : Path-linear MTT Introduce a new

class “PLMTT” T∪LMTT ⊆ PLMTT

Still weak enough to show the DSPACE(n) translation memshp

Strong enough to have the GFF, as will be shown

“Path-linear” = on a path of nested application, each variable occurs linearly

Linear: f(x1, g(x2), h(x3))

Path-linear(but not linear): f(x1, g(x2), h(x2))

Not path-linear: f(x1, g(x1), h(x2))

www.kmonos.net

34

Linear Space“Translation Membership”

Lemma: Let τ ∈ PLMTT. For trees s and t, we can check whether (s, t)∈τ or not within O(|s|+|t|) space.

Proof: Basically, try all nondeterministic computation by backtracking. (Some clever stack-sharing is required)Path-linearity assures the length of the

backtracking stack is O(|s|).

www.kmonos.net

35

Key Theorem:the Garbage-Free Form of (PL)MTT*

www.kmonos.net

36

Garbage-Free = No-deletion In each step of translation (except

the first step), the size of the tree always becomes larger

τ1 τ2 τk

ts0 s1

s2Sk-1

ρ0 ρ1 ρ2n

ts1 s2 S2ns0

www.kmonos.net

Proof Sketch ofthe Garbage-Free Form

“Factor out” the deletion

τ1 ; τ2 == τ1 ; (D ; ρ2)

== (τ1 ; D) ; ρ2

== τ’1 ; ρ2

Decompose τ2

to ‘deleting part’ Dand ‘nondeleting’ τ’2

Associativity

Compose τ1 with D(Right-Compositionality

of OI MTTs)

www.kmonos.net

Three Types of Deletion “Input-Deletion”

E.g., f( A(x1, x2) ) B( f(x1) ) Discarding the “x2” subtree!

“Skipping” E.g., f( A(x1) ) f(x1) No output is generated at the unary node A.

“Erasure” E.g., f( L, y1 ) y1 (Mainly at leaf nodes) No new output symbol is

generated at the node.

www.kmonos.net

39

Three Types of Deletion If there is no input-deletion,

skipping, and erasure during the computation,|in| ≦ 2|out|

Intuition: No input-deletion visits all nodes No skipping outputs at least one node for each

input unary node No erasure outputs at least one node for each input

leaf node For any tree, 2*(#unary + #leaf) ≧ #nodes

(cf., the number of matches in a knockout tournament)

www.kmonos.net

40

How to eliminate“erasing” rules

Look-ahead + Inline-ExpansionDecompose τ into τ = E ; τne

E : Bottom-up translation that annotates each input tree with

τne: τ without erasing rules

www.kmonos.net

41

How to eliminate“erasing” rules : τ = E ; τneExample

A

B

C

B

C

C

τ: f( C, y1 ) → y1

g( B(x1,x2) ) → X( f(x1, Y) )

Applying f to the 1st child invokes

erasure…

A

B

C

B

C

CApplying f to the 1st

child invokes erasure…

E

www.kmonos.net

42

How to eliminate“erasing” rules : τ = E ; τne

τne: f( C, y1 ) → y1

g( B (x1,x2) ) → X( Y )Applying f to the 1st


A

B

C

B

C

C

Applying f to the 1st child invokes

erasure…

A

B

C

B

C

CApplying f to the 1st


E

www.kmonos.net

43

How to eliminate“erasing” rules : τ = E ; τneMore complex case

τ: f( C, y1 ) → y1

h( B(x1,x2), y1 ) → f(x1, y1)A

B

C

B

C

C

f(1st) erasureh(2nd) erasure

A

B

C

B

C

C f(1st) erasureE

h(1st) erasure

www.kmonos.net

44

Problem!“Inline-Expansion” is not always

possible for nondeterministic MTTsτ: f( C, y1, y2 ) y1

f( C, y1, y2 ) y2

g( B (x1,x2) ) h( x1, f(x1, Y, Z) )

h( C, y1 ) X( y1, y1 )τne: g( B (x1,x2) ) h( x1, Y ) g( B (x1,x2) ) h( x1, Z ) h( C, y1 ) X( y1, y1 )

er…

er…er…

DifferentTranslation!

www.kmonos.net

45

Solution!Extend MTTs with “inline-

nondeterminism”τ: f( C, y1, y2 ) y1

f( C, y1, y2 ) y2

g( B (x1,x2) ) h( x1, f(x1, Y, Z) )

h( C, y1 ) X( y1, y1 )

τne: g( B (x1,x2) ) h( x1, +(Y,Z) ) h( C, y1 ) X( y1, y1 )

er…

er…SameTranslation!

www.kmonos.net

Solution:“MTT with Choice and

Failure”MTT + inline-nondeterminism

+ inline-partiality

Same expressiveness: MTT = MTTCFMuch more flexible syntax

Allows inline-expansion for free!

RHS ::= F( RHS, … , RHS ) | q(xi, RHS, …, RHS) | yi

| +(RHS, RHS) | θ

www.kmonos.net

47

Note on Path-linearity Inline-Expansion

…does not preserve linearity.…does preserve path-linearity.

This is the reason for using PLMTTs.

τ: f( C, y1 ) X( y1, y1 ) g( B(x1,x2) ) f( x1, h(x2, Y) )

τne: g( B(x1,x2) ) X( h(x2, Y), h(x2, Y) )

www.kmonos.net

48

How to eliminate“Input-Deletion”: τne = I ; τnei

Exhaustively trying all deletion by using nondeterminism

A

B

C

B

C

CI

A0 A1

B00

A1

B10

CA0

B01

C

B11

C

or or or

or…

www.kmonos.net

49

How to eliminate“Input-Deletion”: τne = I ; τnei

Exampleτne: f( B(x1,x2) ) g( x1, g(x1, Y) ) f( B(x1,x2) ) g( x1, g(x2, Y) ) f( B(x1,x2) ) g( x2, g(x1, Y) )

τnei: f( B10(x1) ) g( x1, g(x1, Y) ) f( B10(x1) ) g( x1, θ ) f( B10(x1) ) g( x2, g(x1, Y) ) …

www.kmonos.net

50

How to eliminate“Skipping”: τnei = S ; τneis

Same as for the “input-deletion”Try all possible deletion

nondeterminisitically…A

B

C

BS

A

CBBCABB

BA

CB

oror or...

www.kmonos.net

[Review] Proof Sketch ofthe Garbage-Free Form

“Factor out” the deletion

τ1 ; τ2 == τ1 ; (E;I;S ; ρ2)

== (τ1 ; E;I;S) ; ρ2

== τ’1 ; ρ2

Decompose τ2

to ‘deleting part’ E;I;Sand ‘nondeleting’ τ’2

Associativity

Compose τ1 with D(Right-Compositionality of OI MTTs with linear top-down transducers)

www.kmonos.net

Summary Membership problem of range(MTT*) is in

DSPACE(n) Note: Almost the same proof shows that it is in

NP Note: NP-hardness was already known

[Rounds73]Proof is by

Garbage-Free FormFor GFF, we needed a class that is robust

w.r.t. the inline-expansionTranslation Membership

For TrnMem, we needed a class with a certain kind of linearity

“Path-linear MTT with Choice and Failure”!

www.kmonos.net

53

Related Workrange(T*) is in DSPACE(n)

　　　　　　　　　　　　　　　　　　 [Baker1978]T : the class of translations realizable

by MTTs with no accumulating params.

range(DtMTT*) is in DSPACE(n) [Maneth2002]DtMTT : that of deterministic total

MTTs

www.kmonos.net

54

Other Topics in my Thesis PTIME “Translation Membership” for IO

MTTs “Multi-Return Macro Tree Transducer”

Nondet IO MTTs have poor compositionality. I’ve proposed a slight extension of MTTs with

better compositionality: DT ; MRMTT ; DtT ⊆ MRMTT

A proof that the translation called “twist” cannot be expressed by a single MTT No general techniques; tough work Solves several open conjectures on the

strictness of inclusion between two classes

The Complexity of Tree Transducer Output Languages

Documents

Transcript of The Complexity of Tree Transducer Output Languages