The Complexity of Tree Transducer Output Languages

54
www.kmonos.net The Complexity of Tree Transducer Output Languages NII Logic Seminar 2009/04/15 Kazuhiro Inaba @ NII Joint work with Sebastian Maneth @ NICTA&UNSW

description

The Complexity of Tree Transducer Output Languages. NII Logic Seminar 2009/04/15 Kazuhiro Inaba @ NII Joint work with Sebastian Maneth @ NICTA&UNSW. Table of Contents. Overvie w The Problem Statement Key Lemma: “Garbage-Free Form” Results and Applications The Proof - PowerPoint PPT Presentation

Transcript of The Complexity of Tree Transducer Output Languages

Page 1: The Complexity of   Tree Transducer   Output Languages

www.kmonos.net

The Complexity of Tree Transducer Output Languages

NII Logic Seminar 2009/04/15

Kazuhiro Inaba @ NIIJoint work with Sebastian Maneth @ NICTA&UNSW

Page 2: The Complexity of   Tree Transducer   Output Languages

www.kmonos.net

2

Table of ContentsOverview

The Problem StatementKey Lemma: “Garbage-Free Form”Results and Applications

The Proof

(Other Topics in my Thesis…?)

Page 3: The Complexity of   Tree Transducer   Output Languages

www.kmonos.net

3

PreliminariesΣ, Δ, Γ, …: Ranked Finite Set

Each Symbol in Σ, Δ is associated with a natural number called “rank” of the symbol

TΣ = The set of Trees over ΣTΣ ::= σ(TΣ, …, TΣ) with rank(σ) = k

Example:Σ={a(2),b(1),c(0)}, a(b(c), c)∈ TΣ

Page 4: The Complexity of   Tree Transducer   Output Languages

www.kmonos.net

4

Preliminariesτ ⊆ TΣ×TΔ is called a (tree-to-tree)

translation

For τ1∈ TΣ×TΓ, τ2∈ TΓ×TΔ,the sequential composition is defined as follows: τ1 ; τ2 = {(s, t) | (s, r)∈τ1, (r, t)∈τ2 }

Page 5: The Complexity of   Tree Transducer   Output Languages

www.kmonos.net

“Complexity of Output Languages”

Given…A tree-to-tree translation τ ⊆ TΣ×TΔ

How complex is the set range(τ) ⊆ TΔ ?

(i.e., for a tree t ∈ TΔ, how is it computationally hard to determine whether t ∈range(τ) or not ?)

Page 6: The Complexity of   Tree Transducer   Output Languages

www.kmonos.net

Classic Results τ: Program of the Turing-Machine

Undecidable

τ: Nondeterministic Finite-State String Transduction range(τ) is regular!

The membership of range(τ) is solved in O(n) time, O(1) space

So for τ∈ Finitely Many Compositionsof Nondet-FST

b/bb

a/aa/b/b

Page 7: The Complexity of   Tree Transducer   Output Languages

www.kmonos.net

7

Our Result

What is Macro Tree Transducers? A relatively powerful (yet terminating)

model of tree translation The formal definition will be explained

soon…

τ: Composition of Nondeterministic Macro Tree Transducers the membership problem of range(τ)

is NP-complete and in DSPACE(n).

Page 8: The Complexity of   Tree Transducer   Output Languages

www.kmonos.net

8

Actually, we’ve shownthe “Garbage-Free Form”

For any composition τ = τ1;τ2; … ;τn ⊆ TΣ×TΔ of MTTs, there exists an equivalent one τ = ρ0 ; ρ1 ; … ; ρ2n ,such thatdom(ρ0)=TΣ and range(ρ0) is

regularFor any (s0,t)∈τ, there’s s1,..,s2n s.t.

(si,si+1) ∈ρi, (s2n,t)∈ρ2n, |si+1|≦ 2|si+2|, |s2n|≦2|t|

Page 9: The Complexity of   Tree Transducer   Output Languages

www.kmonos.net

9

Actually, we’ve shownthe “Garbage-Free Form”

In each step of translation (except the first step), the size of the tree always becomes larger(Ignoring the constant factor “2”)

τ1 τ2 τk

ts0 s1

s2Sk-1

ρ0 ρ1 ρ2n

ts1 s2 S2ns0

Page 10: The Complexity of   Tree Transducer   Output Languages

www.kmonos.net

10

(Possible)Applications in Practice

Compositions of MTTs are known to be a good model for XML translationsMTT* can represent good subclasses of

XML-QL, XSLT, XQuery, …

Query Optimization by GFFVerification by the range(MTT*)

membership

Page 11: The Complexity of   Tree Transducer   Output Languages

www.kmonos.net

11

Corollaries of GFF(Applications in Theory)

range(τ1;τ2; … ;τn ) is in NP and in DSPACE(n)

Higher-order context free languages are context-sensitive languagesDetails are in the next slide…

Page 12: The Complexity of   Tree Transducer   Output Languages

www.kmonos.net

12

Chomsky Hierarchy[Chomsky1959]

Systems of Finite Description of String Languages

Type-0 GrammarG=<N, Σ, S, R> where

S ∈ NR is a set of rewrite rules of the form

α → β with α,β ∈ (N∪Σ)*[G] = {w∈Σ* | S→*w}

Page 13: The Complexity of   Tree Transducer   Output Languages

www.kmonos.net

13

Chomsky Hierarchy and Complexity

Type-0 (Computable Language) Type-1 (Context-Sensitive Language)

αAβ → αγβ A∈N, γ≠empty Context-senstive iff recognizable in NSPACE(n)

[Landweber63, Kuroda64] Type-2 (Context-Free)

A→α Proper Subclass of PTIME [?]

Type-3 (Regular) A→sB or A→s A,B∈N, s∈Σ Regular iff recognizable in DSPACE(1) [Folklore]

Page 14: The Complexity of   Tree Transducer   Output Languages

www.kmonos.net

14

Context-Free Grammar= Level-0 Grammar

Example of Type-2 Grammar:S → 0 S 0, S→1 S 1, S→0, S→1Palindromes over 0s and 1s of odd

lengthA → α

Can be regarded as a nullary (nondeterministic) function whose output is strings

Page 15: The Complexity of   Tree Transducer   Output Languages

www.kmonos.net

15

Natural Extension:Macro Grammar

[Fischer68]= Level-1 GrammarS→A(0), A(x) → A(xx), A(x)→x

Sequence of 0s of length 2n

Nonterminals can be parameterized by strings, i.e., they’re nondeterministic function from strings to strings

[Aho68] Level-1 languages are context-sensitive(=NSPACE(n))

Page 16: The Complexity of   Tree Transducer   Output Languages

www.kmonos.net

16

Natural Extenstion:Higher-order grammar

[Damm82]Level-n Grammar is n-th order

grammar, in which the nonterminals are parameterized by at most (n-1)-th order entitiesB(f,x) → B(λy.f(f(y)), xx), B(f,x)→f(1x1)S → B(λy.yy, 0)

2n Repetition of “102^n1” = Simply-Typed λ Caluculus + Strings as

the Base Type + Nondeterminism

Page 17: The Complexity of   Tree Transducer   Output Languages

www.kmonos.net

17

Complexity?Are Level-2, Level-3, … languages

context-sensitive (=NSPACE(n)), or do they go beyond?

[Damm82] Decidable[Maneth02] Call-by-Value Level-n

Languages are in DSPACE(n)[This Work] Call-by-Name Level-n

Languages are in DSPACE(n)!

Page 18: The Complexity of   Tree Transducer   Output Languages

www.kmonos.net

18

Level-n Languages and MTTs [Damm82]

For any G : CbN Level-n grammar, there exists n+1 composition of CbN MTTs s.t. [G] = range(τ1;τ2; … ;τn )Intuition: MTTs can carry out

substitutionHence, giving a complexity

upperbound for range(MTT*) also gives the upperbound for Level-n languages

Page 19: The Complexity of   Tree Transducer   Output Languages

www.kmonos.net

Context-Free Language

Level-1 Language= Macro Language

Level-2 Language

Level-n Language

range(MTT*)

CFG where nonterminals are parameterized by other nonterminals[Aho68] ⊆NSPACE(n)[Rounds73] NP-complete

Context-Sensitive Language

[Kuroda64]= NSPACE(n)

[CYK, Earley, …] ⊆PTIME

Level-n Language [Damm82][Our Work]⊆ DSPACE(n)NP-complete

Level-3 Language

Page 20: The Complexity of   Tree Transducer   Output Languages

www.kmonos.net

20

Brief Introduction toMacro Tree Transducers (MTTs)

Page 21: The Complexity of   Tree Transducer   Output Languages

www.kmonos.net

MTT

An MTT M = (Q, start, Σ, Δ, R) is a set of first-order functions of type Tree(Σ) × Tree(Δ)k Tree(Δ)

Defined by mutual induction on the1st parameter (the input tree) Application in the right-hand side is restricted only to the

direct children of the current node Can take output tree fragments via other parameters, but

not allowed to inspect or decompose them

start( A(x1) ) → double( x1, double(x1, E) )

double( A(x1), y1) → double( x1, double(x1, y1) )double( B, y1 ) → F( y1, y1 )double( B, y1 ) → G( y1, y1 )

RHS ::= F( RHS, … , RHS ) | q(xi, RHS, …, RHS) | yi

Nondeterminism!

Page 22: The Complexity of   Tree Transducer   Output Languages

www.kmonos.net

22

Evaluation StrategyTwo Evaluation Strategies

(analogous to the λ-calculus…)Call-by-Value (IO : Inside-Out)Call-by-Name (OI : Outside-In)

Page 23: The Complexity of   Tree Transducer   Output Languages

www.kmonos.net

Example (IO / call-by-value)

start(A(B)) double( B, double(B, E) ) double( B, F(E, E) )

F( F(E,E), F(E,E) ) or G( F(E,E), F(E,E) ) or double( B, G(E, E) )

F( G(E, E), G(E, E) ) or G( G(E, E), G(E, E) )

double( A(x1), y1) → double( x1, double(x2, y1) )double( B, y1 ) → F( y1, y1 )double( B, y1 ) → G( y1, y1 )

Page 24: The Complexity of   Tree Transducer   Output Languages

www.kmonos.net

Example (OI / call-by-name)

start(A(B)) double( B, double(B, E) ) F( double(B, E), double(B, E) ) F( F(E,E), double(B, E) ) F( F(E,E), F(E,E) )

F( F(E,E), G(E,E) ) !! F( G(E,E), double(B, E) ) F( G(E,E), F(E,E) ) !!

F( G(E,E), G(E,E) ) G( double(B, E), double(B, E) ) G( F(E,E), double(B, E) ) G( F(E,E), F(E,E) )

G( F(E,E), G(E,E) ) !! G( G(E,E), double(B, E) ) G( G(E,E), F(E,E) ) !!

G( G(E,E), G(E,E) )

double( A(x1), y1) → double( x1, double(x2, y1) )double( B, y1 ) → F( y1, y1 )double( B, y1 ) → G( y1, y1 )

Page 25: The Complexity of   Tree Transducer   Output Languages

www.kmonos.net

25

Evaluation StrategyTwo Strategies

(analogous to the λ-calculus…)Call-by-Value (IO : Inside-Out)Call-by-Name (OI : Outside-In)

Today, we consider OI evaluation only.Why?

MTTIO* = MTTOI* (even though MTTIO ≠ MTTOI)

MTTOI has better compositionality (as shown later)

Page 26: The Complexity of   Tree Transducer   Output Languages

www.kmonos.net

26

Main Result:DSPACE(n) Membership of range(MTT*) ── or, how to apply the GFF to the range complexity

Page 27: The Complexity of   Tree Transducer   Output Languages

www.kmonos.net

≪Approach: Generate & Test≫ Guess the input s0and all the intermediate trees s1, …, sn-1 Check whether

(s,s1)∈τ1, (s1,s2)∈τ2, …, (sn-1, t) ∈τn If it is, then t is in the output language! Otherwise, try another s, s2, …, sn-1

τ1 τ2 τn

ts0 s1 s2 Sn-1∈ ?∈ ? ∈ ?

Page 28: The Complexity of   Tree Transducer   Output Languages

www.kmonos.net

In order to carry out the algorithm in DSPACE(|t|) … The sizes |s0|, |s1|, |s2|, …, |sn| must be linearly bounded

by |t| i.e., there must be a constant c independent from t

such that |s| ≦ c|t| Each step to test the “translation membership” of τi

must be done in DSPACE(n)

“Garbage Free-Form” assures this property!

≪Approach: Generate & Test ≫

τ1 τ2 τn

ts0 s1 s2 Sn-1∈ ?∈ ? ∈ ?

Page 29: The Complexity of   Tree Transducer   Output Languages

www.kmonos.net

29

[Review] “Garbage-Free Form”

For any composition τ = τ1;τ2; … ;τn ⊆ TΣ×TΔ of MTTs, there exists an equivalent one τ = ρ0 ; ρ1 ; … ; ρ2n ,such thatdom(ρ0)=TΣ and range(ρ0) is

regularFor any (s0,t)∈τ, there’s s1,..,s2n s.t.

(si,si+1) ∈ρi, (s2n,t)∈ρ2n, |si+1|≦ 2|si+2|, |s2n|≦2|t|

Page 30: The Complexity of   Tree Transducer   Output Languages

www.kmonos.net

30

[Review] “Garbage-Free Form”

In each step of translation (except the first step), the size of the tree always becomes larger(Ignoring the constant factor “2”)

τ1 τ2 τk

ts0 s1

s2Sk-1

ρ0 ρ1 ρ2n

ts1 s2 S2ns0

Page 31: The Complexity of   Tree Transducer   Output Languages

www.kmonos.net

31

“Translation Membership”Let τ be a MTT.

For trees s and t, can we check whether (s,t)∈τ or not within O(|s|+|t|) space?

The answer is Yes, but it is hard to prove it directly.Let us consider subclasses of MTTs…

Page 32: The Complexity of   Tree Transducer   Output Languages

www.kmonos.net

32

In Search of a Good Subclass…

[Engelfriet&Vogler85] MTT ⊆ T ; LMTT

Hence, range(MTT*) = range((T∪LMTT)*)T : MTTs without accumulating params.LMTT: “Linear” MTTs, each input

variable occurs at most once in rhs.T and LMTT is weak enough to show

the DSPACE(n) translation memshipBut, too weak to have the garbage-

free form

Page 33: The Complexity of   Tree Transducer   Output Languages

www.kmonos.net

33

Our Idea : Path-linear MTT Introduce a new

class “PLMTT” T∪LMTT ⊆ PLMTT

Still weak enough to show the DSPACE(n) translation memshp

Strong enough to have the GFF, as will be shown

“Path-linear” = on a path of nested application, each variable occurs linearly

Linear: f(x1, g(x2), h(x3))

Path-linear(but not linear): f(x1, g(x2), h(x2))

Not path-linear: f(x1, g(x1), h(x2))

Page 34: The Complexity of   Tree Transducer   Output Languages

www.kmonos.net

34

Linear Space“Translation Membership”

Lemma: Let τ ∈ PLMTT. For trees s and t, we can check whether (s, t)∈τ or not within O(|s|+|t|) space.

Proof: Basically, try all nondeterministic computation by backtracking. (Some clever stack-sharing is required)Path-linearity assures the length of the

backtracking stack is O(|s|).

Page 35: The Complexity of   Tree Transducer   Output Languages

www.kmonos.net

35

Key Theorem:the Garbage-Free Form of (PL)MTT*

Page 36: The Complexity of   Tree Transducer   Output Languages

www.kmonos.net

36

Garbage-Free = No-deletion In each step of translation (except

the first step), the size of the tree always becomes larger

τ1 τ2 τk

ts0 s1

s2Sk-1

ρ0 ρ1 ρ2n

ts1 s2 S2ns0

Page 37: The Complexity of   Tree Transducer   Output Languages

www.kmonos.net

Proof Sketch ofthe Garbage-Free Form

“Factor out” the deletion

τ1 ; τ2 == τ1 ; (D ; ρ2)

== (τ1 ; D) ; ρ2

== τ’1 ; ρ2

Decompose τ2

to ‘deleting part’ Dand ‘nondeleting’ τ’2

Associativity

Compose τ1 with D(Right-Compositionality

of OI MTTs)

Page 38: The Complexity of   Tree Transducer   Output Languages

www.kmonos.net

Three Types of Deletion “Input-Deletion”

E.g., f( A(x1, x2) ) B( f(x1) ) Discarding the “x2” subtree!

“Skipping” E.g., f( A(x1) ) f(x1) No output is generated at the unary node A.

“Erasure” E.g., f( L, y1 ) y1 (Mainly at leaf nodes) No new output symbol is

generated at the node.

Page 39: The Complexity of   Tree Transducer   Output Languages

www.kmonos.net

39

Three Types of Deletion If there is no input-deletion,

skipping, and erasure during the computation,|in| ≦ 2|out|

Intuition: No input-deletion visits all nodes No skipping outputs at least one node for each

input unary node No erasure outputs at least one node for each input

leaf node For any tree, 2*(#unary + #leaf) ≧ #nodes

(cf., the number of matches in a knockout tournament)

Page 40: The Complexity of   Tree Transducer   Output Languages

www.kmonos.net

40

How to eliminate“erasing” rules

Look-ahead + Inline-ExpansionDecompose τ into τ = E ; τne

E : Bottom-up translation that annotates each input tree with

τne: τ without erasing rules

Page 41: The Complexity of   Tree Transducer   Output Languages

www.kmonos.net

41

How to eliminate“erasing” rules : τ = E ; τneExample

A

B

C

B

C

C

τ: f( C, y1 ) → y1

g( B(x1,x2) ) → X( f(x1, Y) )

Applying f to the 1st child invokes

erasure…

A

B

C

B

C

CApplying f to the 1st

child invokes erasure…

E

Page 42: The Complexity of   Tree Transducer   Output Languages

www.kmonos.net

42

How to eliminate“erasing” rules : τ = E ; τne

τne: f( C, y1 ) → y1

g( B (x1,x2) ) → X( Y )Applying f to the 1st

child invokes erasure…

A

B

C

B

C

C

Applying f to the 1st child invokes

erasure…

A

B

C

B

C

CApplying f to the 1st

child invokes erasure…

E

Page 43: The Complexity of   Tree Transducer   Output Languages

www.kmonos.net

43

How to eliminate“erasing” rules : τ = E ; τneMore complex case

τ: f( C, y1 ) → y1

h( B(x1,x2), y1 ) → f(x1, y1)A

B

C

B

C

C

f(1st) erasureh(2nd) erasure

A

B

C

B

C

C f(1st) erasureE

h(1st) erasure

Page 44: The Complexity of   Tree Transducer   Output Languages

www.kmonos.net

44

Problem!“Inline-Expansion” is not always

possible for nondeterministic MTTsτ: f( C, y1, y2 ) y1

f( C, y1, y2 ) y2

g( B (x1,x2) ) h( x1, f(x1, Y, Z) )

h( C, y1 ) X( y1, y1 )τne: g( B (x1,x2) ) h( x1, Y ) g( B (x1,x2) ) h( x1, Z ) h( C, y1 ) X( y1, y1 )

er…

er…er…

DifferentTranslation!

Page 45: The Complexity of   Tree Transducer   Output Languages

www.kmonos.net

45

Solution!Extend MTTs with “inline-

nondeterminism”τ: f( C, y1, y2 ) y1

f( C, y1, y2 ) y2

g( B (x1,x2) ) h( x1, f(x1, Y, Z) )

h( C, y1 ) X( y1, y1 )

τne: g( B (x1,x2) ) h( x1, +(Y,Z) ) h( C, y1 ) X( y1, y1 )

er…

er…SameTranslation!

Page 46: The Complexity of   Tree Transducer   Output Languages

www.kmonos.net

Solution:“MTT with Choice and

Failure”MTT + inline-nondeterminism

+ inline-partiality

Same expressiveness: MTT = MTTCFMuch more flexible syntax

Allows inline-expansion for free!

RHS ::= F( RHS, … , RHS ) | q(xi, RHS, …, RHS) | yi

| +(RHS, RHS) | θ

Page 47: The Complexity of   Tree Transducer   Output Languages

www.kmonos.net

47

Note on Path-linearity Inline-Expansion

…does not preserve linearity.…does preserve path-linearity.

This is the reason for using PLMTTs.

τ: f( C, y1 ) X( y1, y1 ) g( B(x1,x2) ) f( x1, h(x2, Y) )

τne: g( B(x1,x2) ) X( h(x2, Y), h(x2, Y) )

Page 48: The Complexity of   Tree Transducer   Output Languages

www.kmonos.net

48

How to eliminate“Input-Deletion”: τne = I ; τnei

Exhaustively trying all deletion by using nondeterminism

A

B

C

B

C

CI

A0 A1

B00

A1

B10

CA0

B01

C

B11

C

or or or

or…

Page 49: The Complexity of   Tree Transducer   Output Languages

www.kmonos.net

49

How to eliminate“Input-Deletion”: τne = I ; τnei

Exampleτne: f( B(x1,x2) ) g( x1, g(x1, Y) ) f( B(x1,x2) ) g( x1, g(x2, Y) ) f( B(x1,x2) ) g( x2, g(x1, Y) )

τnei: f( B10(x1) ) g( x1, g(x1, Y) ) f( B10(x1) ) g( x1, θ ) f( B10(x1) ) g( x2, g(x1, Y) ) …

Page 50: The Complexity of   Tree Transducer   Output Languages

www.kmonos.net

50

How to eliminate“Skipping”: τnei = S ; τneis

Same as for the “input-deletion”Try all possible deletion

nondeterminisitically…A

B

C

BS

A

CBBCABB

BA

CB

oror or...

Page 51: The Complexity of   Tree Transducer   Output Languages

www.kmonos.net

[Review] Proof Sketch ofthe Garbage-Free Form

“Factor out” the deletion

τ1 ; τ2 == τ1 ; (E;I;S ; ρ2)

== (τ1 ; E;I;S) ; ρ2

== τ’1 ; ρ2

Decompose τ2

to ‘deleting part’ E;I;Sand ‘nondeleting’ τ’2

Associativity

Compose τ1 with D(Right-Compositionality of OI MTTs with linear top-down transducers)

Page 52: The Complexity of   Tree Transducer   Output Languages

www.kmonos.net

Summary Membership problem of range(MTT*) is in

DSPACE(n) Note: Almost the same proof shows that it is in

NP Note: NP-hardness was already known

[Rounds73]Proof is by

Garbage-Free FormFor GFF, we needed a class that is robust

w.r.t. the inline-expansionTranslation Membership

For TrnMem, we needed a class with a certain kind of linearity

“Path-linear MTT with Choice and Failure”!

Page 53: The Complexity of   Tree Transducer   Output Languages

www.kmonos.net

53

Related Workrange(T*) is in DSPACE(n)

                   [Baker1978]T : the class of translations realizable

by MTTs with no accumulating params.

range(DtMTT*) is in DSPACE(n) [Maneth2002]DtMTT : that of deterministic total

MTTs

Page 54: The Complexity of   Tree Transducer   Output Languages

www.kmonos.net

54

Other Topics in my Thesis PTIME “Translation Membership” for IO

MTTs “Multi-Return Macro Tree Transducer”

Nondet IO MTTs have poor compositionality. I’ve proposed a slight extension of MTTs with

better compositionality: DT ; MRMTT ; DtT ⊆ MRMTT

A proof that the translation called “twist” cannot be expressed by a single MTT No general techniques; tough work Solves several open conjectures on the

strictness of inclusion between two classes