Queries on TreesAutomata, logic, and XML [Nev02b, Nev02a] Automata for XML – a survey [Sch07]...
Transcript of Queries on TreesAutomata, logic, and XML [Nev02b, Nev02a] Automata for XML – a survey [Sch07]...
Queries on Trees
Jerome Champavere Emmanuel Filiot Olivier GauwinEdouard Gilbert S lawek Staworko
INRIA Lille, Mostrare
2008
PhDs+S lawek (Mostrare) Queries on Trees 2008 1 / 128
Framework
n-ary queries on unranked labeled finite ordered trees
PhDs+S lawek (Mostrare) Queries on Trees 2008 2 / 128
Treest = a
c c d
finite alphabet: Σ = {a, b, c , d , e}
t is the structure (D, ch∗, ns∗, label) with:
D = {ǫ, 1, 2, 3}: prefix-closed finite subset of N
ch∗ = reflexive-transitive closure of ch, defined by:
ch(π1, π2) ⇔ π2 = π1 ·i for some i ∈ N
ns∗ = reflexive-transitive closure of ns, defined by:
ns(π1, π2) ⇔ π1 = π ·i and π2 = π ·(i + 1) for some π, i ∈ N∗ × N
label : D→ Σ. Can also be seen as a partition (labela)a∈Σ of D.
label(1·3) = d labeld(1·3)
PhDs+S lawek (Mostrare) Queries on Trees 2008 3 / 128
Queries
n-ary queries
q(t) ⊆ Dnt
n=0: Boolean queries
q(t) = ∅ or q(t) = {()}
q defines Lq = {t | q(t) = {()}}
Questions
expressivity
complexity of:◮ model-checking: x ∈ q(t)◮ satisfiability: ∃t, q(t) 6= ∅
PhDs+S lawek (Mostrare) Queries on Trees 2008 4 / 128
Existing material
Surveys
Logics over unranked trees: an overview [Lib06]
Automata, logic, and XML [Nev02b, Nev02a]
Automata for XML – a survey [Sch07]
Effective Characterizations of Tree Logics [Boj08a]
Tree-walking automata [Boj08b]
Books
Finite Model Theory [EF99, Lib04]
Foundations of Databases [AHV95]
PhDs+S lawek (Mostrare) Queries on Trees 2008 5 / 128
Outline
1 Classical logics (FO, MSO)
2 Queries by Tree AutomataTree-walking automataSchema Languages & Tree Automata
3 Conjunctive Queries over TreesDefinition, results and acyclic fragmentTwigs and Tree Patterns
4 Monadic Datalog
5 µ-calculus
6 XPath
7 Temporal Logics
PhDs+S lawek (Mostrare) Queries on Trees 2008 6 / 128
Part I
Classical Logics, Automata
PhDs+S lawek (Mostrare) Queries on Trees 2008 7 / 128
Outline
1 Classical logics (FO, MSO)
2 Queries by Tree AutomataTree-walking automataSchema Languages & Tree Automata
PhDs+S lawek (Mostrare) Queries on Trees 2008 8 / 128
FO
Well-formed formulas based on:
predicates from the structure: ch∗, ns∗, (labela)a∈Σ
Boolean connectives: ∧,¬
FO variables: x , y ...
quantifiers on FO variables: ∃x
PhDs+S lawek (Mostrare) Queries on Trees 2008 9 / 128
Querying using FO
We use free variables:
q(x) = ∃y .∃z .(ch∗(x , y) ∧ ch∗(y , z) ∧ labela(z))
This way we can define queries of any arity.
PhDs+S lawek (Mostrare) Queries on Trees 2008 10 / 128
FO: Available predicates
Why ch∗ and ns∗?
because ch and ns are definable from ch∗ and ns∗ in FO...
... but the converse is false
So in the following, we suppose that ch and ns are also available.
Also definable in FO:
unary predicates: root , leaf , lc (lastchild)
binary predicates: fc (firstchild)
PhDs+S lawek (Mostrare) Queries on Trees 2008 11 / 128
FO: Complexity
Model-checking
PSPACE-complete (combined complexity). [Sto74, Var82]
Remark: PSPACE-hardness is even true for the quantifiedpropositional logic [GO99].
Satisfiability
non-elementary on trees
PhDs+S lawek (Mostrare) Queries on Trees 2008 12 / 128
FO: Restrictions on the number of variables
FOk = FO formulas using only k variables
Variables might be reused
q(x) = ∃y .∃z .(ch∗(x , y) ∧ ch∗(y , z) ∧ labela(z)) /∈ FO2
but is equivalent toq′(x) = ∃y .(ch∗(x , y) ∧ ∃x .(ch∗(y , x) ∧ labela(x))) ∈ FO2
Theorem ([Imm82, Var95, GO99])
The model-checking problem for FOk (with k ≥ 2) is P-complete on anystructure.
PhDs+S lawek (Mostrare) Queries on Trees 2008 13 / 128
FO: Restrictions on the number of variables
FO2 = FO formulas using only 2 variables
In FO2, one cannot define ch and ns from ch∗ and ns∗ anymore. So ch
and ns are added to the signature.
Complexity
Model-checking in FO2 can be done in O(|t|2.|q|) [Imm82].
Expressivity
FO is strictly more expressive than FO2.
example of Boolean query: trees where the leaf language is (ab)∗.
Links between FO2 and XPath will be shown in Part 3.
PhDs+S lawek (Mostrare) Queries on Trees 2008 14 / 128
ExpressivityA B A ( B
A B A ⊆ B
A B A * B
FO
FO2
PhDs+S lawek (Mostrare) Queries on Trees 2008 15 / 128
FO: Restrictions on the number of variables
Data values
predicate ∼:x ∼ y if x and y are two attribute nodes that have the same value
in XPath semantics: add tests of the form/bib//book/@type = //collection/@style
Decidability
FO2 [∼,ch,ns ] is decidable [BDM+06].
FO2 [∼,ch,ns,ch∗,ns∗ ]: open question
FO3[∼,ch,ns ] is undecidable (even on strings) [BMS+06].
PhDs+S lawek (Mostrare) Queries on Trees 2008 16 / 128
FO: Restrictions on the number of variables
FOkn = FO formulas using (k bound variables) + (n free variables)
We assume here that the n free variables are never quantified.
Some results on trees
FO2 = FO32 [Mar05a] (his result is stronger)
FO3 = FO33 [Mar05b]
FOn = FO3n
◮ translate into a FO0 formula on alphabet Σ× Bn,◮ FO0 = FO3
0 (consequence of [Mar05b], Th. 3)◮ backward translation: label(f ,~b)(x) becomes labelf (x)
∧
~bi =1 x = xi
PhDs+S lawek (Mostrare) Queries on Trees 2008 17 / 128
MSO
MSO = FO + quantification over monadic predicates
“monadic predicates” also seen as “sets”X (x) x ∈ X
φodd(x , y)
= “y is a descendant of x and the path between them is of odd length”= ∃X .∃Y . (∀z .(X (z)⇔ ¬Y (z))) ∧
(∀z .(X (z) ∨ Y (z)⇒ ch∗(x , z) ∧ ch∗(z , y))) ∧(X (x) ∧ Y (y)) ∧(∀z .∀v .(ch∗(x , z) ∧ ch(z , v) ∧ ch∗(v , y)⇒
(X (z)⇒ Y (v) ∧ Y (z)⇒ X (v))))
Expressivity
MSO is strictly more expressive than FO (see φodd).
PhDs+S lawek (Mostrare) Queries on Trees 2008 18 / 128
ExpressivityA B A ( B
A B A ⊆ B
A B A * B
MSO
FO
FO2
PhDs+S lawek (Mostrare) Queries on Trees 2008 19 / 128
MSO: Complexity
Model-checking
combined complexity: PSPACEc [Sto74, Var82]
data complexity: linear (by translation to automaton)
Satisfiability
non-elementary on trees
PhDs+S lawek (Mostrare) Queries on Trees 2008 20 / 128
Deciding membership to FO
Theorem ([BS05])
Given a regular tree language L, one can decide if L is definable inFOch,(labela)a∈Σ
.
Open decision problem
Given a regular tree language L, is it possible to decide if L is definable inFO?
In other words, FO-definability is known to be decidable for unorderedtrees, but unknown for ordered trees.
Automata for FO
For a definition of automata recognizing exactly FO-definable languages,see [Boj04, Chapter 2].
PhDs+S lawek (Mostrare) Queries on Trees 2008 21 / 128
Outline
1 Classical logics (FO, MSO)
2 Queries by Tree AutomataTree-walking automataSchema Languages & Tree Automata
PhDs+S lawek (Mostrare) Queries on Trees 2008 22 / 128
Tree Automata for Queries
Branching & Stepwise Tree automata
Query automata
Tree-walking automata (TWA)
Schema languages
PhDs+S lawek (Mostrare) Queries on Trees 2008 23 / 128
Branching & Stepwise Tree Automata I
Automata over Σ× {0, 1}n
◮ Canonical languages◮ Same expressive power as MSO
Automata with selecting states◮ Boolean values into the states◮ Existential run-based queries [NPTT05]◮ Selecting tree automata [FGK03]
Stepwise tree automata [CNT04]
PhDs+S lawek (Mostrare) Queries on Trees 2008 24 / 128
Branching & Stepwise Tree Automata II
Decision problems
Membership PTIME
Non-emptiness PTIME
From MSO to tree automata: non-elementary size◮ Upper bound [TW68]◮ Lower bound [FG02]
PhDs+S lawek (Mostrare) Queries on Trees 2008 25 / 128
Query Automata [NS99, NS02]
Two-way deterministic tree automata [Mor94] over (un)ranked treesextended with a selection function
Equivalent to MSO
Decision problems
Non-emptiness EXPTIME
Containment EXPTIME
Equivalence EXPTIME
PhDs+S lawek (Mostrare) Queries on Trees 2008 26 / 128
Outline
1 Classical logics (FO, MSO)
2 Queries by Tree AutomataTree-walking automataSchema Languages & Tree Automata
PhDs+S lawek (Mostrare) Queries on Trees 2008 27 / 128
Context
Most work is done on ranked trees
PhDs+S lawek (Mostrare) Queries on Trees 2008 28 / 128
Context
Most work is done on ranked trees
Still some definitions on unranked cases, but few results
PhDs+S lawek (Mostrare) Queries on Trees 2008 28 / 128
Context
Most work is done on ranked trees
Still some definitions on unranked cases, but few results
Thus we will work on trees of rank 2
PhDs+S lawek (Mostrare) Queries on Trees 2008 28 / 128
Context
Most work is done on ranked trees
Still some definitions on unranked cases, but few results
Thus we will work on trees of rank 2
Structure: label, ch1, ch2
PhDs+S lawek (Mostrare) Queries on Trees 2008 28 / 128
Tree-walking automataon ranked trees
A tree-walking automaton (TWA)[AU71]:
◮ a tree is accepted whenever the “accept” action is used
PhDs+S lawek (Mostrare) Queries on Trees 2008 29 / 128
Tree-walking automataon ranked trees
A tree-walking automaton (TWA)[AU71]:◮ the automaton is located in some node (at first, the root) of the tree
and in a given state
◮ a tree is accepted whenever the “accept” action is used
PhDs+S lawek (Mostrare) Queries on Trees 2008 29 / 128
Tree-walking automataon ranked trees
A tree-walking automaton (TWA)[AU71]:◮ the automaton is located in some node (at first, the root) of the tree
and in a given state◮ if some conditions are verified (label, being a leaf, being the left child
of one’s parent), decide of an action
◮ a tree is accepted whenever the “accept” action is used
PhDs+S lawek (Mostrare) Queries on Trees 2008 29 / 128
Tree-walking automataon ranked trees
A tree-walking automaton (TWA)[AU71]:◮ the automaton is located in some node (at first, the root) of the tree
and in a given state◮ if some conditions are verified (label, being a leaf, being the left child
of one’s parent), decide of an action◮ actions: accept, reject, move to parent with state q, move to left child
with state q′, . . .◮ a tree is accepted whenever the “accept” action is used
PhDs+S lawek (Mostrare) Queries on Trees 2008 29 / 128
Tree-walking automataon ranked trees
A tree-walking automaton (TWA)[AU71]:◮ the automaton is located in some node (at first, the root) of the tree
and in a given state◮ if some conditions are verified (label, being a leaf, being the left child
of one’s parent), decide of an action◮ actions: accept, reject, move to parent with state q, move to left child
with state q′, . . .◮ a tree is accepted whenever the “accept” action is used
PhDs+S lawek (Mostrare) Queries on Trees 2008 29 / 128
Tree-walking automataon ranked trees
A tree-walking automaton (TWA)[AU71]:◮ the automaton is located in some node (at first, the root) of the tree
and in a given state◮ if some conditions are verified (label, being a leaf, being the left child
of one’s parent), decide of an action◮ actions: accept, reject, move to parent with state q, move to left child
with state q′, . . .◮ a tree is accepted whenever the “accept” action is used a tree can be
rejected by looping, the “reject” action is not necessary
Expressiveness:
PhDs+S lawek (Mostrare) Queries on Trees 2008 29 / 128
Tree-walking automataon ranked trees
A tree-walking automaton (TWA)[AU71]:◮ the automaton is located in some node (at first, the root) of the tree
and in a given state◮ if some conditions are verified (label, being a leaf, being the left child
of one’s parent), decide of an action◮ actions: accept, reject, move to parent with state q, move to left child
with state q′, . . .◮ a tree is accepted whenever the “accept” action is used a tree can be
rejected by looping, the “reject” action is not necessary
Expressiveness:◮ any tree-walking automaton can be represented as a branching
automaton, but with exponential blowup
PhDs+S lawek (Mostrare) Queries on Trees 2008 29 / 128
Tree-walking automataon ranked trees
A tree-walking automaton (TWA)[AU71]:◮ the automaton is located in some node (at first, the root) of the tree
and in a given state◮ if some conditions are verified (label, being a leaf, being the left child
of one’s parent), decide of an action◮ actions: accept, reject, move to parent with state q, move to left child
with state q′, . . .◮ a tree is accepted whenever the “accept” action is used a tree can be
rejected by looping, the “reject” action is not necessary
Expressiveness:◮ any tree-walking automaton can be represented as a branching
automaton, but with exponential blowup◮ but the opposite is false: TWA are not as expressive as MSO [BC05]
PhDs+S lawek (Mostrare) Queries on Trees 2008 29 / 128
Expressiveness (ranked case)A B A ( B
A B A ⊆ B
A B A * B MSO
TWA
FO
[BC04]
PhDs+S lawek (Mostrare) Queries on Trees 2008 30 / 128
Deterministic TWA and their expressiveness
Formulae recognised by deterministic TWA are stable by negation
Formulae recognised by non-deterministic ones are not
PhDs+S lawek (Mostrare) Queries on Trees 2008 31 / 128
Deterministic TWA and their expressiveness
Formulae recognised by deterministic TWA are stable by negation
Formulae recognised by non-deterministic ones are not
PhDs+S lawek (Mostrare) Queries on Trees 2008 31 / 128
Deterministic TWA and their expressiveness
Formulae recognised by deterministic TWA are stable by negation
Formulae recognised by non-deterministic ones are not ⇒deterministic TWA are strictly less expressive than non-deterministicones [MSS06]
FO ⊆ TWA ( MSO [BC04]
PhDs+S lawek (Mostrare) Queries on Trees 2008 31 / 128
Deterministic TWA and their expressiveness
Formulae recognised by deterministic TWA are stable by negation
Formulae recognised by non-deterministic ones are not ⇒deterministic TWA are strictly less expressive than non-deterministicones [MSS06]
FO ⊆ TWA ( MSO [BC04]
FO 6⊆ DTWA 6⊆ FO [BC05]
PhDs+S lawek (Mostrare) Queries on Trees 2008 31 / 128
Expressiveness (ranked case)A B A ( B
A B A ⊆ B
A B A * B MSO
TWA
detTWA
FO
[MSS06]
[BC05]
[BC04]
[BC05]
PhDs+S lawek (Mostrare) Queries on Trees 2008 32 / 128
Pebble tree-walking automata and stack discipline[EH99]
Add a finite number of pebble marked {1, . . . , n} to the automaton
PhDs+S lawek (Mostrare) Queries on Trees 2008 33 / 128
Pebble tree-walking automata and stack discipline[EH99]
Add a finite number of pebble marked {1, . . . , n} to the automaton
New tests: is there a pebble on current node?
PhDs+S lawek (Mostrare) Queries on Trees 2008 33 / 128
Pebble tree-walking automata and stack discipline[EH99]
Add a finite number of pebble marked {1, . . . , n} to the automaton
New tests: is there a pebble on current node?
New actions: add a pebble to current position, remove a pebble fromthe current (or any) state
PhDs+S lawek (Mostrare) Queries on Trees 2008 33 / 128
Pebble tree-walking automata and stack discipline[EH99]
Add a finite number of pebble marked {1, . . . , n} to the automaton
New tests: is there a pebble on current node?
New actions: add a pebble to current position, remove a pebble fromthe current (or any) state
Stack discipline: if pebble 1 to i already can only add pebble i + 1 orremove pebble i
PhDs+S lawek (Mostrare) Queries on Trees 2008 33 / 128
Expressiveness of pebble TWA
Expressiveness increases with number of pebble [BSSS06]
∀n ∈ N PTWAn ( PTWAn+1
detPTWA ⊆ PTWA[EH99]
it is not known if detPTWA = PTWAbut there is no c s.t. PTWAk ⊆ detPTWAck
Expressiveness without stack discipline
MSO 6⊆ TWAno stack
TWAno stack emptiness is undecidable
PhDs+S lawek (Mostrare) Queries on Trees 2008 34 / 128
Expressiveness (ranked case)A B A ( B
A B A ⊆ B
A B A * B MSO
TWA
detTWA
FO
TWApebble
detTWApebble
[BC04]
[BC05]
[BC04]
[BSSS06]
[BC05]
PhDs+S lawek (Mostrare) Queries on Trees 2008 35 / 128
Unbounded pebble TWA
We now allow an unbounded number of pebble (with stack discipline)
Expressiveness
Unbounded pebble TWA emptiness is undecidableInvisible pebble TWA = MSO
PhDs+S lawek (Mostrare) Queries on Trees 2008 36 / 128
Unbounded pebble TWA
We now allow an unbounded number of pebble (with stack discipline)
We can consider invisible pebble: only the top pebble presence can betested[EHS07]
Expressiveness
Unbounded pebble TWA emptiness is undecidableInvisible pebble TWA = MSO
PhDs+S lawek (Mostrare) Queries on Trees 2008 36 / 128
Alternating tree-walking automata
Two players ∀,∃
PhDs+S lawek (Mostrare) Queries on Trees 2008 37 / 128
Alternating tree-walking automata
Two players ∀,∃
Each state belongs to a player: Q = Q∀ ⊎ Q∃
PhDs+S lawek (Mostrare) Queries on Trees 2008 37 / 128
Alternating tree-walking automata
Two players ∀,∃
Each state belongs to a player: Q = Q∀ ⊎ Q∃
If q ∈ Q∀, then ∀ plays the next move in a given set of rules,otherwise, ∃ does
PhDs+S lawek (Mostrare) Queries on Trees 2008 37 / 128
Alternating tree-walking automata
Two players ∀,∃
Each state belongs to a player: Q = Q∀ ⊎ Q∃
If q ∈ Q∀, then ∀ plays the next move in a given set of rules,otherwise, ∃ does
A tree is accepted if ∃ wins, rejected if ∀ does
PhDs+S lawek (Mostrare) Queries on Trees 2008 37 / 128
Alternating tree-walking automata
Two players ∀,∃
Each state belongs to a player: Q = Q∀ ⊎ Q∃
If q ∈ Q∀, then ∀ plays the next move in a given set of rules,otherwise, ∃ does
A tree is accepted if ∃ wins, rejected if ∀ does
∃ wins if an accept rule is played by someone or if ∀ has no possiblemove, otherwise ∀ wins
PhDs+S lawek (Mostrare) Queries on Trees 2008 37 / 128
Alternating tree-walking automata
Two players ∀,∃
Each state belongs to a player: Q = Q∀ ⊎ Q∃
If q ∈ Q∀, then ∀ plays the next move in a given set of rules,otherwise, ∃ does
A tree is accepted if ∃ wins, rejected if ∀ does
∃ wins if an accept rule is played by someone or if ∀ has no possiblemove, otherwise ∀ wins
PhDs+S lawek (Mostrare) Queries on Trees 2008 37 / 128
Alternating tree-walking automata
Two players ∀,∃
Each state belongs to a player: Q = Q∀ ⊎ Q∃
If q ∈ Q∀, then ∀ plays the next move in a given set of rules,otherwise, ∃ does
A tree is accepted if ∃ wins, rejected if ∀ does
∃ wins if an accept rule is played by someone or if ∀ has no possiblemove, otherwise ∀ wins
Expressiveness
alternating TWA = MSO
PhDs+S lawek (Mostrare) Queries on Trees 2008 37 / 128
Caterpillar expressions [BKW00]
PhDs+S lawek (Mostrare) Queries on Trees 2008 38 / 128
Caterpillar expressions [BKW00]
Caterpillar expressions describe runs of tree-walking automata
PhDs+S lawek (Mostrare) Queries on Trees 2008 39 / 128
Caterpillar expressions [BKW00]
Caterpillar expressions describe runs of tree-walking automata
Caterpillar alphabet on Σ◮ Commands letter goleft, goright and goparent
PhDs+S lawek (Mostrare) Queries on Trees 2008 39 / 128
Caterpillar expressions [BKW00]
Caterpillar expressions describe runs of tree-walking automata
Caterpillar alphabet on Σ◮ Commands letter goleft, goright and goparent◮ Tests letter leaf, isleft, isright and labels a ∈ Σ
PhDs+S lawek (Mostrare) Queries on Trees 2008 39 / 128
Caterpillar expressions [BKW00]
Caterpillar expressions describe runs of tree-walking automata
Caterpillar alphabet on Σ◮ Commands letter goleft, goright and goparent◮ Tests letter leaf, isleft, isright and labels a ∈ Σ
Caterpillar words describe paths in TWA: isleft a goleft bdescribes paths going from a left child labeled a to its left childlabeled b
PhDs+S lawek (Mostrare) Queries on Trees 2008 39 / 128
Caterpillar expressions [BKW00]
Caterpillar expressions describe runs of tree-walking automata
Caterpillar alphabet on Σ◮ Commands letter goleft, goright and goparent◮ Tests letter leaf, isleft, isright and labels a ∈ Σ
Caterpillar words describe paths in TWA: isleft a goleft bdescribes paths going from a left child labeled a to its left childlabeled b
Caterpillar expressions: regular expressions on caterpillar alphabet
PhDs+S lawek (Mostrare) Queries on Trees 2008 39 / 128
Cutting caterpillar expressions
New letters:◮ test 〈c〉 (nest) where c is a caterpillar expression: true if c applied to
current node selects at least one path
PhDs+S lawek (Mostrare) Queries on Trees 2008 40 / 128
Cutting caterpillar expressions
New letters:◮ test 〈c〉 (nest) where c is a caterpillar expression: true if c applied to
current node selects at least one path◮ command cut transform the whole tree into the subtree of the current
node — local transform, does not apply outside nests
PhDs+S lawek (Mostrare) Queries on Trees 2008 40 / 128
Cutting caterpillar expressions
New letters:◮ test 〈c〉 (nest) where c is a caterpillar expression: true if c applied to
current node selects at least one path◮ command cut transform the whole tree into the subtree of the current
node — local transform, does not apply outside nests
Expressiveness: if nesting is forbidden under scope of negation,posCAT = PTWA
PhDs+S lawek (Mostrare) Queries on Trees 2008 40 / 128
Cutting caterpillar expressions
New letters:◮ test 〈c〉 (nest) where c is a caterpillar expression: true if c applied to
current node selects at least one path◮ command cut transform the whole tree into the subtree of the current
node — local transform, does not apply outside nests
Expressiveness: if nesting is forbidden under scope of negation,posCAT = PTWA
Expressiveness: if nesting is allowed under the scope of a negation, asexpressive as nested TWA (not defined here)
PhDs+S lawek (Mostrare) Queries on Trees 2008 40 / 128
Expressiveness (ranked case)A B A ( B
A B A ⊆ B
A B A * B MSO
TWA
detTWA
FO
TWApebble
detTWApebble
nested TWA
[BC04]
[BC05]
[BC04]
[BSSS06]
[BC05]
PhDs+S lawek (Mostrare) Queries on Trees 2008 41 / 128
FO: Extensions
Notation: z = (z1, . . . , zn)
Adding Transitive Closure: TC n
TCn[ϕ(x , y)](u, v)
iff
∃k , ∃(wi )i∈[1..k], ϕ(u, w1) ∧ ϕ(w1, w2) ∧ . . . ∧ ϕ(wk , v)
By TCn, we mean “parameter-free” transitive closure, i.e., x and y areexactly the free variable of ϕ.We write TCn
p for the non-parameter-free transitive closure (i.e., ϕ canhave extra free variables).
PhDs+S lawek (Mostrare) Queries on Trees 2008 42 / 128
FO: Extensions
FO + TC 1p = nested TWA [tCS08]
FO + TC 1 is often written FO∗, and FO + TC 1p is written FO(MTC ).
FO + TC 1 ⊆ FO + TC 1p : it is unknown whether it is strict.
FO + TC 1 ⊆ MSO
because TC 1[ϕ(x , y)](u, v)⇔∀X . (u ∈ X ∧ ∀(x , y). (x ∈ X ∧ ϕ(x , y)⇒ y ∈ X )⇒ v ∈ X )
FO ( FO + TC 1 ( MSO
Transitive closure is not expressible in FO [Fag75].
Adding TC 1 to FO is not enough to reach MSO [tCS08].
For properties of FO + TC 1 see [Kep06].
PhDs+S lawek (Mostrare) Queries on Trees 2008 43 / 128
Expressiveness (ranked case)A B A ( B
A B A ⊆ B
A B A * B MSO
TWA
detTWA
FO
TWApebble
detTWApebble
FO + TC1
FO + TC∗
[BC04]
[BC05]
[BC04]
[BSSS06]
[tCS08]
[TK06]
[BC05]
PhDs+S lawek (Mostrare) Queries on Trees 2008 44 / 128
FO: Extensions
FO + TC 2
FO + TC 2 * MSO
(cf next slide)
MSO ⊆ FO + TC 2?
This is an open question. It could be the case that MSO * FO + TC k , forall k .
PhDs+S lawek (Mostrare) Queries on Trees 2008 45 / 128
FO: Extensions
FO + TC 2 * MSO
For instance L = {f (X , X ) | X ∈ TΣ} is defined by:
ϕ = labelf (ǫ) ∧∃u1.∃v1.fc(ǫ, u1) ∧ ns(u1, v1) ∧ samelabel(u1, v1) ∧¬(∃w .ns(v1,w)) ∧∀u2. ch∗(u1, u2)⇒ ∃v2. TC 2[ψ(x , y)](u1, u2, v1, v2)
where ψ encodes a step isomorphism:ψ(x , y) = samelabel(x2, y2) ∧
(fc(x1, x2) ∧ fc(y1, y2)) ∨ (ns(x1, x2) ∧ ns(y1, y2))
with:samelabel(x , y) =
∨
a∈Σ labela(x) ∧ labela(y)
PhDs+S lawek (Mostrare) Queries on Trees 2008 46 / 128
Expressiveness (ranked case)A B A ( B
A B A ⊆ B
A B A * B MSO
TWA
detTWA
FO
TWApebble
detTWApebble
FO + TC1
FO + TC2
FO + TC∗
[BC04]
[BC05]
[BC04]
[BSSS06]
[tCS08]
[TK06]
[BC05]
tree isomorphism
PhDs+S lawek (Mostrare) Queries on Trees 2008 47 / 128
FO: Extensions
FO + detTC 1 = detTWApebble [EH06]
Deterministic Transitive Closure of ϕ = TC on the functional part of ϕ
FO + detTC 1 ⊆ FO + TC 1
becausedetTC 1[ϕ(x , y)](u, v)⇔ TC 1[ϕ(x , y) ∧ ∀z .ϕ(x , z)⇒ z = y ](u, v)
FO + detTC 1 ( FO + TC 1?
Open question (see [Kep06]).
For some properties of FO + detTC 1 (linear order, even...) see[Kep06, EI95].
PhDs+S lawek (Mostrare) Queries on Trees 2008 48 / 128
FO: Extensions
FO + posTC 1 = TWApebble [EH06]
formulas of FO + TC 1 with TC 1 operators under an even number ofnegations
FO + detTC 1 ⊆ FO + posTC 1 ⊆ FO + TC 1
inclusions due to TWA characterisations
whether these 2 inclusions are strict is still open
FO + TC 1 ( MSO
separation language based on the branching structure [tCS08]
PhDs+S lawek (Mostrare) Queries on Trees 2008 49 / 128
Expressiveness (ranked case)A B A ( B
A B A ⊆ B
A B A * B
FO + TC∗
FO + TC2MSO
FO + TC1 [tCS08]= nestedTWA
FO + posTC1 [EH06]= TWApebble
FO + detTC1 [EH06]= detTWApebble
TWA
detTWA
FO
[TK06]
[BSSS06]
[BSSS06]
[tCS08]
[BC04]
[TK06]
[BC05]
tree isomorphism
[Kep06, BSSS06]
[BC05]
PhDs+S lawek (Mostrare) Queries on Trees 2008 50 / 128
Outline
1 Classical logics (FO, MSO)
2 Queries by Tree AutomataTree-walking automataSchema Languages & Tree Automata
PhDs+S lawek (Mostrare) Queries on Trees 2008 51 / 128
XML Schema Languages
Describe a set of XML documents
Theoretical framework: no data, only structure
Closer to tree grammars [MLM01] than to tree automata
Tree automata: reference model for the expressiveness
PhDs+S lawek (Mostrare) Queries on Trees 2008 52 / 128
Document Type Definitions (DTDs)“Standard” DTDs
Local tree languages
a
b
c d
L = {
a′
b
d c
, }
a(qb) → qa
a′(qb) → qa′
b(qcqd) → qb
b(qdqc) → qb
c(ǫ) → qc
d(ǫ) → qd
a
b
d c
∈ L!⇒
◮ Restriction: no competiting states
Deterministic content models◮ One-unambiguous regular expressions [BKW98]◮ ab + ac: which a to match depends on the next symbol
Polynomial complexity for other usual decision problems(membership, emptiness, containment), except intersection [MNS04]
Lack of expressivity
PhDs+S lawek (Mostrare) Queries on Trees 2008 53 / 128
Extended DTDs (EDTDs) [MNSB06, Sch07] I
Alphabet extended with types (each type is associated to a uniquesymbol)
a
b1
c d
L = {
a′
b2
d c
, }
a(qb1 ) → qa
a′(qb2 ) → qa′
b1(qcqd) → qb1
b2(qdqc) → qb2
c(ǫ) → qc
d(ǫ) → qd
Typing problem:◮ Valid assignment of types to the elements w.r.t. EDTD◮ (Consistent) combination of unary queries
As expressive as (parallel) unranked tree automata of [BKWM01],thus equivalent to regular tree languages
◮ Examples of such schema languages: Relax NG [CM01], XDuce [HP03]◮ Restricted EDTDs: single-type, restrained-competition
PhDs+S lawek (Mostrare) Queries on Trees 2008 54 / 128
Extended DTDs (EDTDs) [MNSB06, Sch07] II
Single-type EDTDs
a(qb1 + qb2 ) → qa
b1(ǫ) → qb1
b2(ǫ) → qb2
a1(qb1 ) → qa1
a2(qb2 ) → qa2
b1(ǫ) → qb1
b2(ǫ) → qb2
◮ Element Declaration Consistent constraint (W3C XML Schemas)◮ Unique top-down typing◮ Validation with deterministic tree-walking automata
Restrained-competition EDTDs
a(qb1 · qb2 ) → qa
b1(ǫ) → qb1
b2(ǫ) → qb2
◮ Unique top-down left-to-right typing◮ Validation with deterministic top-down tree automata
PhDs+S lawek (Mostrare) Queries on Trees 2008 55 / 128
Expressiveness of Schemas
EDTD = MSO = UTA
Localtree languages
tree languages(Homogeneous) regular
Path-closedtree languages
EDTDSingle-type
DTD
EDTDRestrained-competition
DTWA
DetTATop-down
PhDs+S lawek (Mostrare) Queries on Trees 2008 56 / 128
Part II
Conjunctive Queries, Monadic Datalog
PhDs+S lawek (Mostrare) Queries on Trees 2008 57 / 128
Outline
3 Conjunctive Queries over TreesDefinition, results and acyclic fragmentTwigs and Tree Patterns
4 Monadic Datalog
PhDs+S lawek (Mostrare) Queries on Trees 2008 58 / 128
Conjunctive Queries
... seen as FO formulas
∃x . φ(x , y) where φ is a conjunction of atomic predicates.For instance:
∃x∃y∃w R1(x) ∧ R2(x , y) ∧ R3(x ,w , z)
... seen as rules
answer(z)← R1(x),R2(x , y),R3(x ,w , z)
... seen as terms of the Projection/Join algebra
πZ ( R1(X ) ⋊⋉ R2(X ,Y ) ⋊⋉ R3(X ,W ,Z ) )
These 3 formalisms are equivalent (see [AHV95]).PhDs+S lawek (Mostrare) Queries on Trees 2008 59 / 128
Conjunctive Queries over Trees
XPath axis X : ch, ch∗, ch+, ns, ns∗, ns
+, following and their inverse
following = (ch∗)−1 ◦ ns+ ◦ ch∗
Example
∃x∃y ch+(x , y) ∧ ch+(x , z) ∧ following(x , z)
x
y z/ -
ch+ch+
following
PhDs+S lawek (Mostrare) Queries on Trees 2008 60 / 128
Boolean Queries
Theorem ([GKS04])
Evaluation of Boolean CQ over X is NP-complete, even on a fixed tree.
Tractable fragments
X underbar property
Acyclic conjunctive queries
Twigs
PhDs+S lawek (Mostrare) Queries on Trees 2008 61 / 128
X property
R: a binary relation on the domain Dt of a tree t
a total order < on Dt
Definition
The relation R satisfies the X property wrt < if ∀n1, n2, n3, n4 st n1 < n2
and n3 < n4:
R
�
-
R
R
Rn1
n2
n3
n4
A set of relations R1, . . . ,Rn satisfies X wrt < if every Ri does.
PhDs+S lawek (Mostrare) Queries on Trees 2008 62 / 128
X property: Example
{ch+, ch∗} for the preorder <pre (ch+(x , y)⇒ x <pre y)
{ch, ns, ns+, ns∗} for <bflr
but not following for <pre1
2
3 4
5
6 7>
U
following
following R
�
2
3
4
6
-following
following
PhDs+S lawek (Mostrare) Queries on Trees 2008 63 / 128
Dichotomy Result
Theorem (Gottlob, Koch, Schulz, 2004)
For all F ⊆ X , CQ[F ] Boolean queries can be evaluated in PTIME iffthere is a total order < such that F satisfies the X property wrt <.
Question: generelization to n-ary queries? Which complexity measure?→ polynomial in the number of answers.
PhDs+S lawek (Mostrare) Queries on Trees 2008 64 / 128
Acyclic Conjunctive Queries (ACQ)
Acyclic: the query graph is acyclic
∃x∃y∃z , ns(x , y) ∧ ch∗(y , z)
x
y z
ns ch∗
PhDs+S lawek (Mostrare) Queries on Trees 2008 65 / 128
Expressiveness
[GKS04]
CQ[X ] (⋃
ACQ[X ] ⊆ FO[X ]exponential
x
y z/ ^
-
ch∗ch∗
following
-
x
u
y
v
z
ch∗ ch∗
ch+
ns+=
? ?
-
[Mar05b], over unranked trees,
⋃
ACQ[FO2] = FOnary
PhDs+S lawek (Mostrare) Queries on Trees 2008 66 / 128
ACQ Evaluation
Yannakakis algorithm: O(|q|.|db|.|q(db)|)
∃x R(x , y) ∧ R ′(x , z)
on trees t with predicates X : O(|q|.|t|2.|q(t)|)
∃x ch∗(x , y) ∧ ns∗(x , z)
PhDs+S lawek (Mostrare) Queries on Trees 2008 67 / 128
Outline
3 Conjunctive Queries over TreesDefinition, results and acyclic fragmentTwigs and Tree Patterns
4 Monadic Datalog
PhDs+S lawek (Mostrare) Queries on Trees 2008 68 / 128
Twigs: Testing containment [MS02]
Tree pattern
(unordered and unranked) tree labeled with elements from Σ ∪ {∗}child and descendant edgesn distinguished querying nodes (n-ary query)unary tree patterns (n = 1) equivalent to XPath(∗, [], //, /)
a
b
∗x1
cx2
a
b c
d c
Ans(p, t) = {(1 · 1, 2), (1 · 1, 2 · 1)}
Containment (problem statement)
p1 ⊆ p2 if and only if Ans(p1, t) ⊆ Ans(p2, t) for every t ∈ TΣ
PhDs+S lawek (Mostrare) Queries on Trees 2008 69 / 128
Booleanize your twigs
Boolean tree patterns
Tree patterns p with no querying nodes (n = 0)
Mod(p) = {t ∈ TΣ|t satisfies p}
Then, p1 ⊆ p2 if and only if Mod(p1) ⊆ Mod(p2)
p :
a
b
∗x1
cx2 pB :
a
b
∗
z1
c
z2
Proposition
For any two n-ary tree patterns p1 and p2: p1 ⊆ p2 ⇔ pB1 ⊆ pB
2
PhDs+S lawek (Mostrare) Queries on Trees 2008 70 / 128
Canonical models of Boolean twigs
p :
a
b
∗
c
a
b
∗
c
a
b
∗
∗
c
a
b
∗
∗
∗
c
. . .
PhDs+S lawek (Mostrare) Queries on Trees 2008 71 / 128
Canonical models of Boolean twigs
p :
a
b
∗
c
a
b
z
c
a
b
z
z
c
a
b
z
z
z
c
. . .
mod(p)
PhDs+S lawek (Mostrare) Queries on Trees 2008 71 / 128
Canonical models of Boolean twigs
p :
a
b
∗
c
a
b
z
c
a
b
z
z
c
a
b
z
z
z
c
. . .
mod(p)
Proposition
For any Boolean tree patterns p1 and p2:
p1 ⊆ p2 ⇔ mod(p1) ⊆ Mod(p2).
PhDs+S lawek (Mostrare) Queries on Trees 2008 71 / 128
Testing containment of Boolean twigs: Outline
unrankeda
b
∗
c
p
rankeda
b
∗
∗
∗
cLp
Up
unrankeda
b
z
z
z
cmod(p)
PhDs+S lawek (Mostrare) Queries on Trees 2008 72 / 128
Testing containment of Boolean twigs: Outline
unrankeda
b
∗
c
p
rankeda
b
∗
∗
∗
cLp
Up
unrankeda
b
z
z
z
cmod(p)
Main idea
p1 ⊆ p2 ⇔ mod(p1) ⊆ Mod(p2)⇔ Up1(Lp1) ⊆ Mod(p2)
⇔ Lp1 ⊆ U−1p1
(Mod(p2))⇔ Ap1 ⊆ Ap2 ,
where:
Ap1 : DFTA defining Lp1
Ap2 : AFTA defining U−1p1
(Mod(p2)) complexity:O(|p1|2|p2|)
PhDs+S lawek (Mostrare) Queries on Trees 2008 72 / 128
Testing containment: Conclusions
Positive results
p1 ⊆ p2 can be decided in time O(|p1||p2|wd), where:
d is the number of //-edges in p1
w is the maximal length of ∗/ ∗ / . . . /∗ in p2
Negative results
Deciding containment is coNP-complete. The result holds even if we:
bound the number of occurrences of ∗
bound the degree of the nodes of tree patterns
PhDs+S lawek (Mostrare) Queries on Trees 2008 73 / 128
Efficient evaluation of tree patterns
TwigStack [BKS02]
Interval representation used with a variant of B-tree index
Two phase approach:1 Find and stack (partial) solutions to leaf-to-root paths2 Join partial solutions
Linear in the size of the input and output
I/O and CPU optimal if only //-edges used
Twig2Stack [CLT+06]
Generalized tree pattern queries
One phase bottom-up approach
May stack elements that are not solutions
In the worst case the whole document may be stored in main memory
HollisticTwigStack [JLH+07] addresses this shortcomingPhDs+S lawek (Mostrare) Queries on Trees 2008 74 / 128
Outline
3 Conjunctive Queries over TreesDefinition, results and acyclic fragmentTwigs and Tree Patterns
4 Monadic Datalog
PhDs+S lawek (Mostrare) Queries on Trees 2008 75 / 128
Overview
Few words on datalog
Least fixed point
Monadic datalog over trees
PhDs+S lawek (Mostrare) Queries on Trees 2008 76 / 128
Datalog in (Very) Few Words
Language used in deductive databases
Extends conjunctive queries with recursion
Example: transitive closure of a graph
TC (x , y) :− Edge(x , y).TC (x , y) :− Edge(x , z),TC (z , y).
Model theoretic point of view:
∀x , y(Edge(x , y)→ TC (x , y))∀x , y , z((Edge(x , z) ∧ TC (z , y))→ TC (x , y))
Remark: no function symbols (finite models), no negation
See chapter 12 of [AHV95] for more details
PhDs+S lawek (Mostrare) Queries on Trees 2008 77 / 128
Least Fixed Point I
P is a fixed point of operator F if F (P) = P
The least fixed point lfp(F ) is the least element of the set of fixedpoints of F w.r.t. inclusion
Every monotone operator F (i.e., P ⊆ Q ⇒ F (P) ⊆ F (Q)) has aleast fixed point (Knaster-Tarski, cited by [Lib04]):
lfp(F ) =⋂
{P|F (P) = P}
Computing the least fixed point (standard closure):
P0 = ∅P i+1 = F (P i )lfp(F ) = P∞ =
⋃∞i=0 P i
Stabilizes after n steps on finite structures, i.e., P∞ = Pn
PhDs+S lawek (Mostrare) Queries on Trees 2008 78 / 128
Least Fixed Point IIDatalog immediate consequence operator TP(from [GK04]):
TP(Q) := Q ∪ {f | ∃φ,∃h :− b1, . . . , bn ∈ Pφ(h) = fφ(b1), . . . , φ(bn) ∈ Q }
Example: program P =
{
TC (x , y) :− Edge(x , y).TC (x , y) :− Edge(x , z),TC (z , y).
}
and
database Q = {Edge(1, 2),Edge(2, 3),Edge(3, 1)}
T 0P = Q = {Edge(1, 2),Edge(2, 3),Edge(3, 1)}
T 1P = T 0
P ∪ {TC (1, 2),TC (2, 3),TC (3, 1)}T 2P = T 1
P ∪ {TC (1, 3),TC (2, 1),TC (3, 2)}T 3P = T 2
P
Finally, lfp(TP)notation
= TωP = T 3
P = T 2P =
{Edge(1, 2),Edge(2, 3),Edge(3, 1),TC (1, 2),TC (2, 3),TC (3, 1), . . .}
PhDs+S lawek (Mostrare) Queries on Trees 2008 79 / 128
Monadic Datalog over Trees
Datalog with unary head predicates
Built-in predicates (for binary trees): root, leaf, (labela)a∈Σ, ch1,ch2
Example of query: select all nodes labeled by a at even height
Q0(x) :− root(x).Q(i+1)mod2(x) :− Qi (y), chk(y , x). (for k ∈ {1, 2})Ans(x) :− Q0(x), labela(x).
The query predicate is Ans
PhDs+S lawek (Mostrare) Queries on Trees 2008 80 / 128
Monadic Datalog over Trees: Complexity
Model Checking
Over ranked as well as unranked trees, monadic datalog hasO(|P| ∗ |dom|) combined complexity (theo. 4.2 of [GK04])
Proved by rewriting of P such that it is ground.
Satisfiability
Monadic datalog (over arbitrary finite structures) is NP-complete w.r.t.combined complexity (prop. 3.4 of [GK04])
Membership: guess a proof tree
Hardness: boolean conjuctive queries
For trees, satisfiability can be reduced to the emptiness problem forcontext-free languages [?]. What about the complexity?
PhDs+S lawek (Mostrare) Queries on Trees 2008 81 / 128
Monadic Datalog over Trees: Expressiveness
Equivalence with MSO
A tree language is definable in monadic datalog exactly if it is definable inMSO (coro. 4.7 of [GK04])
Sketch of proof (for monadic queries):
⇒ Encode the query defined by a monadic datalog program into anMSO formula (prop. 3.3 of [GK04])
⇐ More intricate, different ways of prooving it:1 Using ≡MSO
k -types (theo. 4.4 of [GK04])2 Simulating query automata of Neven & Schwentick [NS02] (Section
4.3 of [GK04])3 Encoding tree automata with selecting states? (next slides)
PhDs+S lawek (Mostrare) Queries on Trees 2008 82 / 128
Encoding a Tree Automaton A into a Monadic DatalogProgram P I
Rq(x) in lfp of P if a run of A can evaluate node x in state q:
a→ q ∈ rules(A)
Rq(x) :− leaf(x), labela(x).
f (q1, q2)→ q ∈ rules(A)
Rq(x) :− Rq1(y),Rq2(z), ch1(x , y), ch2(x , z), labelf (x).
PhDs+S lawek (Mostrare) Queries on Trees 2008 83 / 128
Encoding a Tree Automaton A into a Monadic DatalogProgram P II
L2Fq(x), aka LeadsToFinalq(x), in lfp of P if state q is used in a succesfulrun of A:
q ∈ final(A)
L2Fq(x) :− root(x).
f (q1, q2)→ q ∈ rules(A)
L2Fq1(y) :− L2Fq(x), ch1(x , y), ch2(x , z), labelf (x),Rq2(z).L2Fq2(z) :− L2Fq(x), ch1(x , y), ch2(x , z), labelf (x),Rq1(y).
PhDs+S lawek (Mostrare) Queries on Trees 2008 84 / 128
Encoding a Tree Automaton A into a Monadic DatalogProgram P III
Ans(x) in lfp of P if x is selected by automaton A, i.e., x is evaluated instate q ∈ S , where S ⊆ states(A) is the set of selecting states:
q ∈ S
Ans(x) :− Rq(x), L2Fq(x).
Proposition: Monadic datalog program P with Ans as query predicatesimulates tree automaton with selecting states A
PhDs+S lawek (Mostrare) Queries on Trees 2008 85 / 128
Part III
µ-calculus, Modal Logics (Temporal Logics, XPath...)
PhDs+S lawek (Mostrare) Queries on Trees 2008 86 / 128
Outline
5 µ-calculus
6 XPath
7 Temporal Logics
PhDs+S lawek (Mostrare) Queries on Trees 2008 87 / 128
Structure and formulae
The structure used here is the one used by Barcelo and Libkin. Mostof the results are taken from [BL05a, ABL07].
Tree t with two relations (or more) on position: child ≺ch and nextsibling ≺ns
Formulae of Lµ[≺]:◮ constants a◮ second order variables X◮ ⊤,⊥,¬φ, φ ∨ φ′
◮ ⋄(≺)φ◮ µX .φ where X can only appears positively in φ
PhDs+S lawek (Mostrare) Queries on Trees 2008 88 / 128
Interpretation
Given a tree t, nodes s, s ′ ∈ Domain(t) and a valuationv : X → P(Domain(t))
logic operators are interpreted as usual
(t, v , s) |= a iff t(s) = a
(t, v , s) |= X iff s ∈ v(X )
(t, v , s) |= ⋄(≺)φ iff (t, v , s ′) |= ⋄(≺)φ for some s ′ such that s ≺ s ′
(t, v , s) |= µX .φ iff s ∈ S where S is the least fix point of Fφ, definedby Fφ(P) = {s ′ | (t, v [P/X ], s ′) |= φ}
PhDs+S lawek (Mostrare) Queries on Trees 2008 89 / 128
Interpretation
(t, v , s) |= µX .φ(X ) iff s ∈ S where S is the least fix point of F
Problem: is there a least fix point?
The function P 7→ {s ′ | (t, v [P/X ], s ′) |= φ} is monotonicallyincreasing because X can only appear positively in µX .φ
PhDs+S lawek (Mostrare) Queries on Trees 2008 90 / 128
Interpretation
(t, v , s) |= µX .φ(X ) iff s ∈ S where S is the least fix point of F
Problem: is there a least fix point?
The function P 7→ {s ′ | (t, v [P/X ], s ′) |= φ} is monotonicallyincreasing because X can only appear positively in µX .φ
◮ Fa,F⊤,F⊥,FY are constant
PhDs+S lawek (Mostrare) Queries on Trees 2008 90 / 128
Interpretation
(t, v , s) |= µX .φ(X ) iff s ∈ S where S is the least fix point of F
Problem: is there a least fix point?
The function P 7→ {s ′ | (t, v [P/X ], s ′) |= φ} is monotonicallyincreasing because X can only appear positively in µX .φ
◮ Fa,F⊤,F⊥,FY are constant◮ FX (P) = P is increasing
PhDs+S lawek (Mostrare) Queries on Trees 2008 90 / 128
Interpretation
(t, v , s) |= µX .φ(X ) iff s ∈ S where S is the least fix point of F
Problem: is there a least fix point?
The function P 7→ {s ′ | (t, v [P/X ], s ′) |= φ} is monotonicallyincreasing because X can only appear positively in µX .φ
◮ Fa,F⊤,F⊥,FY are constant◮ FX (P) = P is increasing◮ if Fφ and F ′
φ both are increasing (resp. decreasing), thenFφ∨φ′(P) = Fφ(P) ∪ Fφ′(P) is increasing (resp. decreasing)
PhDs+S lawek (Mostrare) Queries on Trees 2008 90 / 128
Interpretation
(t, v , s) |= µX .φ(X ) iff s ∈ S where S is the least fix point of F
Problem: is there a least fix point?
The function P 7→ {s ′ | (t, v [P/X ], s ′) |= φ} is monotonicallyincreasing because X can only appear positively in µX .φ
◮ Fa,F⊤,F⊥,FY are constant◮ FX (P) = P is increasing◮ if Fφ and F ′
φ both are increasing (resp. decreasing), thenFφ∨φ′(P) = Fφ(P) ∪ Fφ′(P) is increasing (resp. decreasing)
◮ if Fφ is increasing (resp. decreasing) then F⋄(≺)φ is increasing (resp.decreasing)
PhDs+S lawek (Mostrare) Queries on Trees 2008 90 / 128
Interpretation
(t, v , s) |= µX .φ(X ) iff s ∈ S where S is the least fix point of F
Problem: is there a least fix point?
The function P 7→ {s ′ | (t, v [P/X ], s ′) |= φ} is monotonicallyincreasing because X can only appear positively in µX .φ
◮ Fa,F⊤,F⊥,FY are constant◮ FX (P) = P is increasing◮ if Fφ and F ′
φ both are increasing (resp. decreasing), thenFφ∨φ′(P) = Fφ(P) ∪ Fφ′(P) is increasing (resp. decreasing)
◮ if Fφ is increasing (resp. decreasing) then F⋄(≺)φ is increasing (resp.decreasing)
◮ if Fφ is increasing (resp. decreasing), then F¬φ(P) = Fφ(P) isdecreasing (resp. increasing)
PhDs+S lawek (Mostrare) Queries on Trees 2008 90 / 128
Interpretation
(t, v , s) |= µX .φ(X ) iff s ∈ S where S is the least fix point of F
Problem: is there a least fix point?
The function P 7→ {s ′ | (t, v [P/X ], s ′) |= φ} is monotonicallyincreasing because X can only appear positively in µX .φ
◮ Fa,F⊤,F⊥,FY are constant◮ FX (P) = P is increasing◮ if Fφ and F ′
φ both are increasing (resp. decreasing), thenFφ∨φ′(P) = Fφ(P) ∪ Fφ′(P) is increasing (resp. decreasing)
◮ if Fφ is increasing (resp. decreasing) then F⋄(≺)φ is increasing (resp.decreasing)
◮ if Fφ is increasing (resp. decreasing), then F¬φ(P) = Fφ(P) isdecreasing (resp. increasing)
◮ if Fφ is increasing then FµX .φ(P) is increasing
PhDs+S lawek (Mostrare) Queries on Trees 2008 90 / 128
Unary and boolean queries
A formula φ from Lµ can be used as a unary query which selects in t thenodes s such that
(t, ., s) |= φ
PhDs+S lawek (Mostrare) Queries on Trees 2008 91 / 128
Unary and boolean queries
A formula φ from Lµ can be used as a unary query which selects in t thenodes s such that
(t, ., s) |= φ
A formula φ from Lµ can be used as a boolean query which accepts a treet iff
(t, ., ε) |= φ
Example
Selects nodes which are ancestors of a node labelled by a:
µX .(a ∨ ⋄(≺ch)X ).
PhDs+S lawek (Mostrare) Queries on Trees 2008 91 / 128
Expressivenessof boolean queries
Lµ[≺ch,≺ns] cannot express first child...◮ Lµ[≺ch,≺ns,≺fc]◮ Lfull
µ [≺ch,≺ns]: one can use ⋄(≺−)φ, where s ≺− s ′ iff s ′ ≺ s
Lµ[≺ch,≺ns,≺fc] = Lfullµ [≺ch,≺ns] = MSO
PhDs+S lawek (Mostrare) Queries on Trees 2008 92 / 128
Lµ[≺ch,≺ns,≺fc] ⊆ MSO
One can rewrite any Lµ[≺ch,≺ns,≺fc] formula into a MSO query asfollow:
〈a〉(x) = labela(x)
〈µX .φ〉(z) = ∃X (z ∈ X ∧ ∀x ∈ X ⇒ 〈φ〉(x) ∧ (∀Y (∀y ∈ Y ⇒〈φ〉(y))⇒ X ⊆ Y ))
PhDs+S lawek (Mostrare) Queries on Trees 2008 93 / 128
Lµ[≺ch,≺ns,≺fc] ⊆ MSO
One can rewrite any Lµ[≺ch,≺ns,≺fc] formula into a MSO query asfollow:
〈a〉(x) = labela(x)
〈X 〉(x) = x ∈ X
〈µX .φ〉(z) = ∃X (z ∈ X ∧ ∀x ∈ X ⇒ 〈φ〉(x) ∧ (∀Y (∀y ∈ Y ⇒〈φ〉(y))⇒ X ⊆ Y ))
PhDs+S lawek (Mostrare) Queries on Trees 2008 93 / 128
Lµ[≺ch,≺ns,≺fc] ⊆ MSO
One can rewrite any Lµ[≺ch,≺ns,≺fc] formula into a MSO query asfollow:
〈a〉(x) = labela(x)
〈X 〉(x) = x ∈ X
〈⋄(≺ch)φ〉(x) = ∃y | ch(y , x) ∧ 〈φ〉(y), ...
〈µX .φ〉(z) = ∃X (z ∈ X ∧ ∀x ∈ X ⇒ 〈φ〉(x) ∧ (∀Y (∀y ∈ Y ⇒〈φ〉(y))⇒ X ⊆ Y ))
PhDs+S lawek (Mostrare) Queries on Trees 2008 93 / 128
Lµ[≺ch,≺ns,≺fc] ⊆ MSO
One can rewrite any Lµ[≺ch,≺ns,≺fc] formula into a MSO query asfollow:
〈a〉(x) = labela(x)
〈X 〉(x) = x ∈ X
〈⋄(≺ch)φ〉(x) = ∃y | ch(y , x) ∧ 〈φ〉(y), ...
〈µX .φ〉(z) = ∃X (z ∈ X ∧ ∀x ∈ X ⇒ 〈φ〉(x) ∧ (∀Y (∀y ∈ Y ⇒〈φ〉(y))⇒ X ⊆ Y ))
PhDs+S lawek (Mostrare) Queries on Trees 2008 93 / 128
Lµ[≺ch,≺ns,≺fc] ⊆ MSO
One can rewrite any Lµ[≺ch,≺ns,≺fc] formula into a MSO query asfollow:
〈a〉(x) = labela(x)
〈X 〉(x) = x ∈ X
〈⋄(≺ch)φ〉(x) = ∃y | ch(y , x) ∧ 〈φ〉(y), ...
〈µX .φ〉(z) = ∃X (z ∈ X ∧ ∀x ∈ X ⇒ 〈φ〉(x) ∧ (∀Y (∀y ∈ Y ⇒〈φ〉(y))⇒ X ⊆ Y ))
Finally, the whole query will be ∃x root(x) ∧ 〈φ〉(x)
PhDs+S lawek (Mostrare) Queries on Trees 2008 93 / 128
MSO ⊆ Lµ[≺ch,≺ns,≺fc]
Given a MSO query, let A be an equivalent deterministic automaton. Wecan encode A with a Lµ[≺ch,≺ns,≺fc] formula.
Example
On ranked trees, ≺ch1,≺ch2,Automaton Q = {qa, qb} ,QF = {qa}a→ qa, b → qb, f (qa, qb)→ qa, f (qb, qa)→ qb
µXa.a ∨ f ∧ ⋄(≺ch1)Xa ∧ ⋄(≺ch2)(µXb.b ∨ f ∧ ⋄(≺ch1)Xa ∧ ⋄(≺ch2))
PhDs+S lawek (Mostrare) Queries on Trees 2008 94 / 128
Expressivenessof unary queries
Lµ[≺ch,≺ns,≺fc] cannot express root...
we need to use Lfullµ [≺ch,≺ns]
Lfullµ [≺ch,≺ns] = MSO
PhDs+S lawek (Mostrare) Queries on Trees 2008 95 / 128
Expressivenessof unary queries
Lµ[≺ch,≺ns,≺fc] cannot express root...
we need to use Lfullµ [≺ch,≺ns]
Lfullµ [≺ch,≺ns] = MSO
PhDs+S lawek (Mostrare) Queries on Trees 2008 95 / 128
Expressivenessof unary queries
Lµ[≺ch,≺ns,≺fc] cannot express root...
we need to use Lfullµ [≺ch,≺ns]
Lfullµ [≺ch,≺ns] = MSO
Proofs: similar to Boolean queries, but with query automata instead
PhDs+S lawek (Mostrare) Queries on Trees 2008 95 / 128
Complexities
Because the structure of trees are acyclic, model checking ofLfull
µ [≺ch,≺sb] can be computed in O(|φ|2 |t|). Can be reduced for asubclass of Lµ (as expressive as MSO) to O(|φ| |t|)
Satisfiability of Lfullµ [≺ch,≺sb] is EXPTIME (slightly better bounds in
the case of tree than in the general case)
PhDs+S lawek (Mostrare) Queries on Trees 2008 96 / 128
Outline
5 µ-calculus
6 XPath
7 Temporal Logics
PhDs+S lawek (Mostrare) Queries on Trees 2008 97 / 128
First-order modal logicson Unranked Trees
Strong links between:
XPath
Modal Logics (temporal, propositional...)
FO
PhDs+S lawek (Mostrare) Queries on Trees 2008 98 / 128
First-order modal logicson Unranked Trees
Strong links between:
XPath
Modal Logics (temporal, propositional...)
FO
→ remember the first slides about the model and FO
PhDs+S lawek (Mostrare) Queries on Trees 2008 98 / 128
First-order modal logicson Unranked Trees
Strong links between:
XPath
Modal Logics (temporal, propositional...)
FO
→ remember the first slides about the model and FO→ we won’t talk about L-definability (i.e., given an automaton, is itequivalent to a formula of the logic L?). See [Boj08a] for a survey.
PhDs+S lawek (Mostrare) Queries on Trees 2008 98 / 128
Binary vs Unranked Trees
FO-definable queries on binary trees?
“select trees with even number of nodes”
PhDs+S lawek (Mostrare) Queries on Trees 2008 99 / 128
Binary vs Unranked Trees
FO-definable queries on binary trees?
“select trees with even number of nodes” X (always false)
PhDs+S lawek (Mostrare) Queries on Trees 2008 99 / 128
Binary vs Unranked Trees
FO-definable queries on binary trees?
“select trees with even number of nodes” X (always false)
“select trees with even number of a-nodes”
PhDs+S lawek (Mostrare) Queries on Trees 2008 99 / 128
Binary vs Unranked Trees
FO-definable queries on binary trees?
“select trees with even number of nodes” X (always false)
“select trees with even number of a-nodes” x
PhDs+S lawek (Mostrare) Queries on Trees 2008 99 / 128
Binary vs Unranked Trees
FO-definable queries on binary trees?
“select trees with even number of nodes” X (always false)
“select trees with even number of a-nodes” x
“select trees that have a leaf of even depth”
PhDs+S lawek (Mostrare) Queries on Trees 2008 99 / 128
Binary vs Unranked Trees
FO-definable queries on binary trees?
“select trees with even number of nodes” X (always false)
“select trees with even number of a-nodes” x
“select trees that have a leaf of even depth” X (zigzag technic)
PhDs+S lawek (Mostrare) Queries on Trees 2008 99 / 128
Binary vs Unranked Trees
FO-definable queries on binary trees?
“select trees with even number of nodes” X (always false)
“select trees with even number of a-nodes” x
“select trees that have a leaf of even depth” X (zigzag technic)
not clear whether the last query is FO-definable on unranked trees.
PhDs+S lawek (Mostrare) Queries on Trees 2008 99 / 128
XPath 1.0: a W3C recommendation (since 1999)
Example:/descendant :: a[position() > last() ∗ 0.5 or self :: ∗ = 100]Features:
select nodes (monadic queries)
navigation through axis (child... following, preceding)
node test and filters: /ax1::ntst1[f1][f2[f3]]/...
context-sensitive functions (position, last...)
element types (element, attribute, instruction, comments)
arithmetic operators (+,−...)
data operators/comparators (string-length...)
aggregators (count, sum...)
identifiers functions...
type conversion functions...
PhDs+S lawek (Mostrare) Queries on Trees 2008 100 / 128
XPath axes [Shi08]
PhDs+S lawek (Mostrare) Queries on Trees 2008 101 / 128
XPath 1.0
first implementations: exponential time in the size of the query
PTIME combined complexity obtained in [GKP02, GKP03a]:O(|D|2.|Q|4) in time, O(|D|2.|Q|2) in space.
PhDs+S lawek (Mostrare) Queries on Trees 2008 102 / 128
XPath 1.0
first implementations: exponential time in the size of the query
PTIME combined complexity obtained in [GKP02, GKP03a]:O(|D|2.|Q|4) in time, O(|D|2.|Q|2) in space.
Questions:
linear time fragment?
expressiveness? links to other logics?
PhDs+S lawek (Mostrare) Queries on Trees 2008 102 / 128
CoreXPathThe navigational core of XPath
defined by Gottlob, Koch and Pichler [GKP02, GKP03a]
restriction to navigation through axis, filters, and nodetests
locpath ::= axis :: ntst | axis :: ntst[fexpr ] | /locpath | locpath/locpathfexpr ::= locpath | not fexpr | fexpr and fexpr | fexpr or fexpr
axis ::= self | ch | ch+ | ch∗ | ch−1 | ch−1
+ | ch−1∗ | ns+ | ns
−1+
ntst ::= a, a ∈ Σ | ∗
document order axis following and preceding are syntactic sugar:
following :: ntst[fexpr ] ≡ ch−1∗ :: ∗/ns+ :: ∗/ch∗ :: ntst[fexpr ]
preceding :: ntst[fexpr ] ≡ ch−1∗ :: ∗/ns−1
+ :: ∗/ch∗ :: ntst[fexpr ]
PhDs+S lawek (Mostrare) Queries on Trees 2008 103 / 128
CoreXPath complexity [GKP03b]
query evaluation becomes linear: O(|D|.|Q|)
it is P-hard wrt. combined complexity...
... even when t is limited to depth 3 and only axes ch, ch−1, ch∗ areallowed
Positive-CoreXPath is LOGCFL-complete
satisfiability is EXPTIME-complete
PhDs+S lawek (Mostrare) Queries on Trees 2008 104 / 128
CoreXPath expressiveness
CoreXPath ⊆ FO
CoreXPath /ch+ :: a [ch :: b] /ch :: c
variables y z x
φ(x) = ∃y . labela(y) ∧ ∃z . labelc(z) ∧ labelc(x)∧ ch(y , z) ∧ ch(y , x)
PhDs+S lawek (Mostrare) Queries on Trees 2008 105 / 128
CoreXPath expressiveness
CoreXPath ⊆ FO
CoreXPath /ch+ :: a [ch :: b] /ch :: c
variables y z x
φ(x) = ∃y . labela(y) ∧ ∃z . labelc(z) ∧ labelc(x)∧ ch(y , z) ∧ ch(y , x)
FO * CoreXPath
example: select root if the leaf language is (ab)∗.
in fact, CoreXPath = FO21 [Mar05b]
PhDs+S lawek (Mostrare) Queries on Trees 2008 105 / 128
ExpressivenessA B A ( B
A B A ⊆ B
A B A * B
FO1
FO21 = CoreXPath
PhDs+S lawek (Mostrare) Queries on Trees 2008 106 / 128
CondXPath [Mar04]Conditional XPath
CondXPath =CoreXPath+ axis: ns, ns∗, ns
−1, ns−1∗
+ until operator: (axis :: ntst[fexpr ])+ with axis ∈ {ch, ch−1, ns, ns−1}
CondXPath has the same complexity as CoreXPath (for both queryevaluation and satisiability).
PhDs+S lawek (Mostrare) Queries on Trees 2008 107 / 128
CondXPath [Mar04]Conditional XPath
CondXPath =CoreXPath+ axis: ns, ns∗, ns
−1, ns−1∗
+ until operator: (axis :: ntst[fexpr ])+ with axis ∈ {ch, ch−1, ns, ns−1}
CondXPath has the same complexity as CoreXPath (for both queryevaluation and satisiability).
CondXPath ⊆ FO
For instance (ch :: a[ns∗ :: b])+ translates to the FO formula:φ(x , y) =∃z . ns∗(y , z) ∧ labelb(z) ∧¬(∃s. ch∗(x , s) ∧ ch∗(s, y) ∧ (¬labela(s) ∨ ¬∃s ′. ns∗(s, s ′) ∧ labelb(s ′)))
PhDs+S lawek (Mostrare) Queries on Trees 2008 107 / 128
CondXPath [Mar04]Expressiveness
How to prove that FO ⊆ CondXPath?
PhDs+S lawek (Mostrare) Queries on Trees 2008 108 / 128
CondXPath [Mar04]Expressiveness
How to prove that FO ⊆ CondXPath?Marx uses an intermediate logic: Xuntil.
CondXPath FO
Xuntil
previous slide
21
PhDs+S lawek (Mostrare) Queries on Trees 2008 108 / 128
Xuntil
Syntax
ϕ ::≡ a | ⊤ | ¬ϕ | ϕ ∧ ϕ′ | θ(ϕ,ϕ′) (a ∈ Σ, θ ∈ {⇓,⇐,⇒,⇑})
Arrows are interpreted as transitive closures of corresponding axis.
Semantics(t, π) |= a iff labelt
a(π)(t, π) |= ¬ϕ iff (t, π) 6|= ϕ(t, π) |= ϕ ∧ ϕ′ iff (t, π) |= ϕ and (t, π) |= ϕ′
(t, π) |= θ(ϕ,ϕ′) iff there exists π′ s.t. θ+(π, π′) and (t, π′) |= ϕand for all π′′ s.t. πθ+π′′θ+π′, (t, π′′) |= ϕ′
PhDs+S lawek (Mostrare) Queries on Trees 2008 109 / 128
1. From Xuntil to CondXPath
Xuntil → CondXPath
r(a) = self :: ar(¬ϕ) = not r(ϕ)
r(ϕ ∧ ϕ′) = r(ϕ) and r(ϕ′)r(θ(ϕ,ϕ′)) = θ :: ∗[r(ϕ)] or (θ :: ∗[r(ϕ′)])+/θ :: ∗[r(ϕ)]
PhDs+S lawek (Mostrare) Queries on Trees 2008 110 / 128
2. From FO to XuntilSeparation technic
Theorem ([GHR94], adapted in [Mar04])
If every Xuntil formula is separable over trees, then Xuntil is FO-expressive.
“ϕ separable” means:equivalent to a Booleancombination of purepast/present/future/left/rightformula
PhDs+S lawek (Mostrare) Queries on Trees 2008 111 / 128
2. From FO to XuntilSeparation technic
Theorem ([Mar04])
Each Xuntil formula is separable.
Query rewriting... with blowup.
PhDs+S lawek (Mostrare) Queries on Trees 2008 112 / 128
Alternative proof
Theorem ([Mar05b])
any expansion of CoreXPath which is closed under complementationis FO-expressive
CondXPath is closed under complementation
PhDs+S lawek (Mostrare) Queries on Trees 2008 113 / 128
ExpressivenessA B A ( B
A B A ⊆ B
A B A * B
FO1 = CondXPath
FO21 = CoreXPath
PhDs+S lawek (Mostrare) Queries on Trees 2008 114 / 128
RegularXPath≈ [tC06]
RegularXPath =
CoreXPath+ axis: ns, ns∗, ns
−1, ns−1∗
+ transitive closure: (RegularXPath expression)∗
RegularXPath≈ =
RegularXPath+ loop predicate: [loop(ϕ)]t = {π ∈ Dt | (π, π) ∈ [ϕ]t}
Both have PTIME combined complexity for query evaluation.
PhDs+S lawek (Mostrare) Queries on Trees 2008 115 / 128
RegularXPath≈ [tC06]Expressiveness
Theorem
RegularXPath≈ and FO + TC 1 have the same expressive power.
In a preceding section, we saw that FO + TC 1 is strictly less expressivethan MSO [tCS08].
Corollary
The class of binary relations definable in RegularXPath≈ is closed underintersection and complementation.
It is only conjectured that adding loop increases expressivity, i.e., thatRegularXPath ( RegularXPath≈.
PhDs+S lawek (Mostrare) Queries on Trees 2008 116 / 128
ExpressivenessA B A ( B
A B A ⊆ B
A B A * B MSO
FO + TC1 = RegularXPath≈
FO = CondXPath
FO2 = CoreXPath
[BC05, BSSS06]
[tCS08]
PhDs+S lawek (Mostrare) Queries on Trees 2008 117 / 128
RegularXPath variants
µRegularXPath adds a fixed-point operator [tC06] → MSO
RegularXPath(W ) adds a “subtree relativisation operator” [tCS08]
Beware: RegularXPath(W ) = FO + TC 1p , whereas
RegularXPath≈ = FO + TC 1. Remind that it is not known whether theinclusion FO + TC 1 ⊆ FO + TC 1
p is strict.
PhDs+S lawek (Mostrare) Queries on Trees 2008 118 / 128
XPath 2.0
XPath 2.0
adds the following features to XPath:
for loops: for $i in R return S
Boolean intersection (intersect) and complementation (except) onpath expressions
variables: n-ary queries
node comparison tests (is)
PhDs+S lawek (Mostrare) Queries on Trees 2008 119 / 128
CoreXPath 2 [tCM07]
CoreXPath 2
for loops are interpreted as sets of nodes, not sequences
no positional/aggregate: position(), last(), count()
no value comparison operators
PhDs+S lawek (Mostrare) Queries on Trees 2008 120 / 128
CoreXPath 2 [tCM07]
CoreXPath 2
for loops are interpreted as sets of nodes, not sequences
no positional/aggregate: position(), last(), count()
no value comparison operators
adding the last 2 features leads to undecidability.
equivalence of CoreXPath 2 queries is decidable.
of course, CoreXPath 2 is FO-expressive (adding except toCoreXPath is already sufficient).
CoreXPath 2 ↔ FO translations in linear time
PhDs+S lawek (Mostrare) Queries on Trees 2008 120 / 128
Outline
5 µ-calculus
6 XPath
7 Temporal Logics
PhDs+S lawek (Mostrare) Queries on Trees 2008 121 / 128
Preliminaries: Linear Temporal Logic (LTL)
Syntax
ϕ ::≡ a | ¬ϕ | ϕ ∨ ψ | Xϕ | X− ϕ | ϕU ψ | ϕ S ψ
Semantics
Structure: s = s0s1 · · · sn a string over ΣInterpretation: (s, i) |= ϕ (ϕ is satisfied in s at position i)label (s, i) |= a iff si = a (i.e. labels(i) = a)next (s, i) |= Xϕ iff (s, i + 1) |= ϕprev (s, i) |= X− ϕ iff (s, i − 1) |= ϕuntil (s, i) |= ϕU ψ iff ∃j ≥ i . (s, j) |= ψ ∧ ∀k ∈ {i , . . . , j − 1}. (s, k) |= ϕsince (s, i) |= ϕ S ψ iff ∃j ≤ i . (s, j) |= ψ ∧ ∀k ∈ {j + 1, . . . , i}. (s, k) |= ϕ
PhDs+S lawek (Mostrare) Queries on Trees 2008 122 / 128
Querying and expressivity
LTL Boolean queries
QA(ϕ, s) = true⇔ (s, 0) |= ϕ
LTL unary queries
QA(ϕ, s) ={
i ∈ {0, . . . , |s|} : (s, i) |= ϕ}
Kamp’s Theorem.
Over strings, LTL = FO
PhDs+S lawek (Mostrare) Queries on Trees 2008 123 / 128
Tree Temporal Logic TLtree
Syntax
ϕ ::≡ a | ¬ϕ | ϕ ∨ ψ | Xθ ϕ | X−θ ϕ | ϕUθ ψ | ϕ Sθ ψ (θ ∈ {↓,←})
Semantics
(t, π) |= ϕ reads “ϕ is satisfied in t at node π”(t, π) |= a iff labelt(π) = a(t, π) |= X↓ϕ iff ∃π′ such that π ↓ π′ and (t, π′) |= ϕ.etc.
Theorem [Mar05a]
Over unranked ordered trees, TLtree = FO (Boolean and unary queries)
PhDs+S lawek (Mostrare) Queries on Trees 2008 124 / 128
Computational tree logic CTL∗past
Syntax
Node formulas: Φ ::≡ a | ¬Φ | Φ ∨Ψ | E↓ ϕ | E→ ϕ
Path formulas: ϕ = Φ? | ¬ϕ | ϕ ∨ ψ | Xϕ | X− ϕ | ϕU ψ | ϕ S ψ
Semantics
(t, π) |= Φ reads “Φ is satisfied in t at node π”(t, π) |= E↓ϕ iff ∃π1 ↓ · · · ↓ πi−1 ↓ π ↓ πi+1 ↓ · · · ↓ πk such that(π1 · · ·πk , i) |=p ϕ where:(π1 · · ·πk , i) |=p Φ? iff (t, πi ) |= Φ etc.
Theorem [BL05b]
Over unranked ordered trees, CTL∗past = FO (Boolean and unary queries)
PhDs+S lawek (Mostrare) Queries on Trees 2008 125 / 128
Propositional Dynamic Logic for trees PDLtree [ABD+05]
Syntax
Path formulas:
σ ::≡ ← | → | ↓ | ↑ | σ/σ′ | σ ∪ σ′ | σ∗ | ϕ?
Propositions:ϕ ::≡ a | ¬ϕ | ϕ ∨ ψ | Xσ ϕ |
Semantics
σ defines a binary relation [[σ]]t on nodes of t(t, π) |= Xσ ϕ iff ∃π′ such that π [[σ]]t π
′ and (t, π′) |= ϕ
PhDs+S lawek (Mostrare) Queries on Trees 2008 126 / 128
Expressivity of PDLtree [ABD+05]
Theorem
PDLtree is equivalent to Regular XPath.
Theorem
PDLtree restricted to
σ ::≡ ← | → | ↓ | ↑ | σ∗ | σ/ϕ?
is equivalent to Conditional XPath which is equivalent to FO.
Theorem
PDLtree restricted toσ ::≡ ← | → | ↓ | ↑ | σ∗
is equivalent to Core XPath which is equivalent to FO2.
PhDs+S lawek (Mostrare) Queries on Trees 2008 127 / 128
References
PhDs+S lawek (Mostrare) Queries on Trees 2008 128 / 128
[ABD+05] Loredana Afanasiev, Patrick Blackburn, Ioanna Dimitriou,Bertrand Gaiffe, Evan Goris, Maarten Marx, and Maartende Rijke.PDL for ordered trees.Journal of Applied Non-Classical Logics, 15(2):115–135, 2005.
[ABL07] Marcelo Arenas, Pablo Barcelo, and Leonid Libkin.Combining temporal logics for querying XML documents.In Springer-Verlag, editor, Proceedings of InternationalConference on Database Theory, volume 4353 of LectureNotes in Computer Science, pages 359–374, 2007.
[AHV95] Serge Abiteboul, Richard Hull, and Victor Vianu.Foundations of Databases.1995.
[AU71] A. V. Aho and J. D. Ullmann.Translations on a context-free grammar.Information and Control, 19:439–475, 1971.
[BC04] Miko laj Bojanczyk and Thomas Colcombet.
PhDs+S lawek (Mostrare) Queries on Trees 2008 128 / 128
Tree-walking automata cannot be determinized.In 31st International Colloquium on Automata, Languagesand Programming, Lecture Notes in Computer Science, pages246–256. Springer Verlag, 2004.
[BC05] Miko laj Bojanczyk and Thomas Colcombet.Tree-walking automata do not recognize all regular languages.In 37th Annual ACM Symposium on Theory of Computing,pages 234–243, New York, NY, USA, 2005. ACM-Press.
[BDM+06] Miko laj Bojanczyk, Claire David, Anca Muscholl, ThomasSchwentick, and Luc Segoufin.Two-variable logic on data trees and XML reasoning.In Twenty-fifth ACM SIGACT-SIGMOD-SIGART Symposiumon Principles of Database Systems, pages 10–19, 2006.
[BKS02] Nicolas Bruno, Nick Koudas, and Divesh Srivastava.Holistic twig joins: optimal xml pattern matching.In SIGMOD Conference, pages 310–321, 2002.
[BKW98] Anne Bruggemann-Klein and Derick Wood.
PhDs+S lawek (Mostrare) Queries on Trees 2008 128 / 128
One-unambiguous regular languages.Information and Computation, 142(2):182–206, May 1998.
[BKW00] Anne Bruggemann-Klein and Derick Wood.Caterpillars: A context specification technique.Markup Languages, 2(1):81–106, 2000.
[BKWM01] Anne Bruggemann-Klein, Derick Wood, and Makoto Murata.Regular tree and regular hedge languages over unrankedalphabets: Version 1, April 07 2001.
[BL05a] Pablo Barcelo and Leonid Libkin.Temporal logics over unranked trees.In Proceedings of the IEEE Symposium on Logic in ComputerScience, pages 31–40, 2005.
[BL05b] Pablo Barcelo and Leonid Libkin.Temporal logics over unranked trees.In 20th Annual IEEE Symposium on Logic in ComputerScience, pages 31–40. IEEE Comp. Soc. Press, 2005.
PhDs+S lawek (Mostrare) Queries on Trees 2008 128 / 128
[BMS+06] Miko laj Bojanczyk, Anca Muscholl, Thomas Schwentick, LucSegoufin, and Claire David.Two-variable logic on words with data.In 21st Annual IEEE Symposium on Logic in ComputerScience, pages 7–16. IEEE Comp. Soc. Press, 2006.
[Boj04] Miko laj Bojanczyk.Decidable Properties of Tree Languages.PhD thesis, Warsaw University, 2004.
[Boj08a] Miko laj Bojanczyk.Effective characterizations of tree logics, 2008.PODS’08 Keynote.
[Boj08b] Miko laj Bojanczyk.Tree-walking automata.Tutorial at LATA’08, 2008.
[BS05] Michael Benedikt and Luc Segoufin.Regular tree languages definable in FO and FOmod.
PhDs+S lawek (Mostrare) Queries on Trees 2008 128 / 128
In 22nd International Symposium on Theoretical Aspects ofComputer Science, volume 3404 of Lecture Notes inComputer Science, pages 327–339. Springer Verlag, 2005.
[BSSS06] Miko laj Bojanczyk, Mathias Samuelides, Thomas Schwentick,and Luc Segoufin.Expressive power of pebbles automata.In International Colloquium on Automata Languages andProgramming (ICALP’06), Lecture Notes in ComputerScience, pages 157–168. Springer Verlag, 2006.
[CLT+06] Songting Chen, Hua-Gang Li, Jun’ichi Tatemura, Wang-PinHsiung, Divyakant Agrawal, and K. Selcuk Candan.Twig2stack: Bottom-up processing of generalized-tree-patternqueries over xml documents.In VLDB, pages 283–294, 2006.
[CM01] James Clark and Murata Makoto.Relax ng specification, 2001.
[CNT04] Julien Carme, Joachim Niehren, and Marc Tommasi.
PhDs+S lawek (Mostrare) Queries on Trees 2008 128 / 128
Querying unranked trees with stepwise tree automata.In 19th International Conference on Rewriting Techniques andApplications, volume 3091 of Lecture Notes in ComputerScience, pages 105–118. Springer Verlag, 2004.
[EF99] H. Ebbinghaus and J. Flum.Finite Model Theory.Springer Verlag, Berlin, 1999.
[EH99] Joost Engelfriet and Hendrik Jan Hoogeboom.Tree-walking pebble automata.In Juhani Karhumaki, Hermann A. Maurer, Gheorghe Paun,and Grzegorz Rozenberg, editors, Jewels are Forever,Contributions on Theoretical Computer Science in Honor ofArto Salomaa, pages 72–83, London, UK, 1999.Springer-Verlag.
[EH06] Joost Engelfriet and Hendrik Jan Hogeboom.Nested pebbles and transitive closure.
PhDs+S lawek (Mostrare) Queries on Trees 2008 128 / 128
In 23rd Symposium on Theoretical Aspects of ComputerScience, volume 3884 of Lecture Notes in Computer Science,pages 477–488. Springer Verlag, 2006.
[EHS07] Joost Engelfriet, Hendrik Jan Hoogeboom, and Bart Samwel.Xml transformation by tree-walking transducers with invisiblepebbles.In PODS ’07: Proceedings of the twenty-sixth ACMSIGMOD-SIGACT-SIGART symposium on Principles ofdatabase systems, pages 63–72, New York, NY, USA, 2007.ACM.
[EI95] Kousha Etessami and Neil Immerman.Reachability and the power of local ordering.Theoretical Computer Science, 148(2):227–260, 1995.
[Fag75] Ronald Fagin.Monadic generalized spectra.Zeitschrift fur Mathematische Logik und Grundlagen derMathematik, 21:89–96, 1975.
PhDs+S lawek (Mostrare) Queries on Trees 2008 128 / 128
[FG02] Markus Frick and Martin Grohe.The complexity of first-order and monadic second-order logicrevisited.In LICS ’02: Proceedings of the 17th Annual IEEESymposium on Logic in Computer Science, pages 215–224,Washington, DC, USA, 2002.
[FGK03] Markus Frick, Martin Grohe, and Christoph Koch.Query evaluation on compressed trees.In 18th IEEE Symposium on Logic in Computer Science,pages 188–197, 2003.
[GHR94] D.M. Gabbay, I. Hodkinson, and M. Reynolds.Temporal Logic (Volume 1: Mathematical Foundations andComputational Aspects).Oxford Science Publications, 1994.
[GK04] Georg Gottlob and Christoph Koch.Monadic datalog and the expressive power of languages forweb information extraction.
PhDs+S lawek (Mostrare) Queries on Trees 2008 128 / 128
Journal of the ACM, 51(1):74–113, 2004.
[GKP02] Georg Gottlob, Christoph Koch, and Reinhard Pichler.Efficient algorithms for processing xpath queries.In 28th International Conference on Very Large Data Bases,pages 95–106, Hong Kong, 2002.
[GKP03a] G. Gottlob, C. Koch, and R. Pichler.Xpath query evaluation: Improving time and space efficiency.In In Proceedings of the 19th IEEE International Conferenceon Data Engineering (ICDE 03), 2003.
[GKP03b] Georg Gottlob, Christoph Koch, and Reinhard Pichler.The complexity of xpath query evaluation.In 22nd ACM SIGMOD-SIGACT-SIGART Symposium onPrinciples of Database Systems, pages 179–190, 2003.
[GKS04] Georg Gottlob, Christoph Koch, and Klaus U. Schulz.Conjunctive queries over trees.
PhDs+S lawek (Mostrare) Queries on Trees 2008 128 / 128
In Proceedings of the 23rd ACM SIGMOD-SIGACT-SIGARTSymposium on Principles of Database Systems, pages189–200, New York, NY, USA, 2004. ACM-Press.
[GO99] Erich Gradel and Martin Otto.On logics with two variables.Theoretical Computer Science, 224:73–113, 1999.
[HP03] Haruo Hosoya and Benjamin Pierce.Regular expression pattern matching for XML.Journal of Functional Programming, 6(13):961–1004, 2003.
[Imm82] Neil Immerman.Upper and lower bounds for first order expressibility.Journal of Computer and System Science, 25:76–98, 1982.
[JLH+07] Zhewei Jiang, Cheng Luo, Wen-Chi Hou, Qiang Zhu, andDunren Che.Efficient processing of xml twig pattern: A novel one-phaseholistic solution.In DEXA, pages 87–97, 2007.
PhDs+S lawek (Mostrare) Queries on Trees 2008 128 / 128
[Kep06] Stephan Kepser.Properties of binary transitive closure logics over trees.In 11th conference on Formal Grammar, 2006.
[Lib04] Leonid Libkin.Elements of Finite Model Theory.Springer Verlag, 2004.
[Lib06] Leonid Libkin.Logics over unranked trees: an overview.Logical Methods in Computer Science, 3(2):1–31, 2006.
[Mar04] Maarten Marx.Conditional XPath, the first order complete XPath dialect.In ACP SIGACT-SIGMOD-SIGART Symposium on Principlesof Database Systems, pages 13–22. ACM-Press, 2004.
[Mar05a] Maarten Marx.Conditional XPath.ACM Transactions on Database Systems, 30(4):929–959,2005.
PhDs+S lawek (Mostrare) Queries on Trees 2008 128 / 128
[Mar05b] Maarten Marx.First order paths in ordered trees.In International Conference on Database Theory, pages114–128, 2005.
[MLM01] M. Murata, D. Lee, and M. Mani.Taxonomy of XML schema languages using formal languagetheory.In Extreme Markup Languages, Montreal, Canada, 2001.
[MNS04] Wim Martens, Frank Neven, and Thomas Schwentick.Complexity of decision problems for simple regularexpressions.In Mathematical Foundations of Computer Science 2004, 29thInternational Symposium, pages 889–900, 2004.
[MNSB06] Wim Martens, Frank Neven, Thomas Schwentick, andGeert Jan Bex.Expressiveness and complexity of XML schema.
PhDs+S lawek (Mostrare) Queries on Trees 2008 128 / 128
ACM Transactions of Database Systems, 31(3):770–813,2006.
[Mor94] Etsuro Moriya.On two-way tree automata.Inf. Process. Lett., 50(3):117–121, 1994.
[MS02] Gerome Miklau and Dan Suciu.Containment and equivalence for an xpath fragment.In PODS, pages 65–76, 2002.
[MSS06] Anca Muscholl, Mathias Samuelides, and Luc Segoufin.Complementing deterministic tree-walking automata.Information Processing Letters, 99(1):33–39, 2006.
[Nev02a] Frank Neven.Automata, logic, and XML.In Computer Science Logic, Lecture Notes in ComputerScience, pages 2–26. Springer Verlag, 2002.
[Nev02b] Frank Neven.Automata theory for XML researchers.
PhDs+S lawek (Mostrare) Queries on Trees 2008 128 / 128
SIGMOD Rec., 31(3):39–46, 2002.
[NPTT05] Joachim Niehren, Laurent Planque, Jean-Marc Talbot, andSophie Tison.N-ary queries by tree automata.In 10th International Symposium on Database ProgrammingLanguages, volume 3774 of Lecture Notes in ComputerScience, pages 217–231. Springer Verlag, September 2005.
[NS99] Frank Neven and Thomas Schwentick.Query automata.In Proceedings of the Eighteenth ACM Symposium onPrinciples of Database Systems, pages 205–214, 1999.
[NS02] Frank Neven and Thomas Schwentick.Query automata over finite trees.Theoretical Computer Science, 275(1-2):633–674, 2002.
[Sch07] Thomas Schwentick.Automata for XML—a survey.
PhDs+S lawek (Mostrare) Queries on Trees 2008 128 / 128
Journal of Computer and System Science, 73(3):289–315,2007.
[Shi08] John W. Shipman.XSLT Reference.2008.
[Sto74] L. J. Stockmeyer.The Complexity of Decision Problems in Automata Theory.PhD thesis, Department of Electrical Engineering, MIT, 1974.
[tC06] Balder ten Cate.The expressiveness of XPath with transitive closure.In 25st ACM SIGMOD-SIGACT Symposium on Principles ofDatabase Systems. ACM-Press, 2006.
[tCM07] Balder ten Cate and Maarten Marx.Axiomatizing the logical core of XPath 2.0.In International Conference on Database Theory, 2007.
[tCS08] Balder ten Cate and Luc Segoufin.
PhDs+S lawek (Mostrare) Queries on Trees 2008 128 / 128
XPath, transitive closure logic, and nested tree walkingautomata.In 27th ACM SIGACT-SIGMOD-SIGART Symposium onPrinciples of Database Systems, 2008.
[TK06] Hans-Jorg Tiede and Stephan Kepser.Monadic second-order logic and transitive closure logics overtrees.In 13th Workshop on Logic, Language, Information andComputation, volume 165 of Electronical notes in theoreticalcomputer science, pages 189–199. Elsevier, 2006.
[TW68] J. W. Thatcher and J. B. Wright.Generalized finite automata with an application to a decisionproblem of second-order logic.Mathematical System Theory, 2:57–82, 1968.
[Var82] Moshe Y. Vardi.The complexity of relational query languages.In 14th ACM Symposium on Theory of Computing, pages137–146, 1982.
PhDs+S lawek (Mostrare) Queries on Trees 2008 128 / 128
[Var95] Moshe Y. Vardi.On the complexity of bounded-variable queries.In Fourteenth ACM SIGACT-SIGMOD-SIGART Symposiumon Principles of Database Systems, pages 266–276, 1995.
PhDs+S lawek (Mostrare) Queries on Trees 2008 128 / 128