QUERYING TREES AND CHAINSjhellings.nl/files/phd_defense_slides.pdfPeggy FriendOf FriendOf Victor...
Transcript of QUERYING TREES AND CHAINSjhellings.nl/files/phd_defense_slides.pdfPeggy FriendOf FriendOf Victor...
1/36
On Tarski’s Relation Algebra
QUERYING TREES AND CHAINS
and
THE SEMI-JOIN ALGEBRA
Jelle Hellings
2/36
Part I
On Tarski’s Relation Algebra
INTRODUCTION
3/36
Relation algebra: graphs and queries
Alice
Bob
Carol
ParentOfParentOf
Dan
ParentOf
Faythe
ParentOf
Grace
ParentOf
PeggyFriendOf FriendOf
Victor
FriendOf
WorksWith
WendyFriendOf
FriendsOfGgp = π1[ParentOf ◦ ParentOf ◦ ParentOf ] ◦ FriendOf
3/36
Relation algebra: graphs and queries
Alice
Bob
Carol
ParentOfParentOf
Dan
ParentOf
Faythe
ParentOf
Grace
ParentOf
PeggyFriendOf FriendOf
Victor
FriendOf
WorksWith
WendyFriendOf
FriendsOfGgp = π1[ParentOf ◦ ParentOf ◦ ParentOf ] ◦ FriendOf
3/36
Relation algebra: graphs and queries
Alice
Bob
Carol
ParentOfParentOf
Dan
ParentOf
Faythe
ParentOf
Grace
ParentOf
PeggyFriendOf FriendOf
Victor
FriendOf
WorksWith
WendyFriendOf
FriendsOfGgp = π1[ParentOf ◦ ParentOf ◦ ParentOf ] ◦ FriendOf
3/36
Relation algebra: graphs and queries
Alice
Bob
Carol
ParentOfParentOf
Dan
ParentOf
Faythe
ParentOf
Grace
ParentOf
PeggyFriendOf FriendOf
Victor
FriendOf
WorksWith
WendyFriendOf
FriendsOfGgp = π1[ParentOf ◦ ParentOf ◦ ParentOf ] ◦ FriendOf
3/36
Relation algebra: graphs and queries
Alice
Bob
Carol
ParentOfParentOf
Dan
ParentOf
Faythe
ParentOf
Grace
ParentOf
PeggyFriendOf FriendOf
Victor
FriendOf
WorksWith
WendyFriendOf
FriendsOfGgp = π1[ParentOf ◦ ParentOf ◦ ParentOf ] ◦ FriendOf
4/36
Relation algebra: other operators
ChildOf = ParentOf a
AcquaintanceOf = FriendOf ∪WorksWith
WorkFriendOf = FriendOf ∩WorksWith
NonWorkFriend = FriendOf −WorksWith
Parent = π1[ParentOf ]
NonParent = π 1[ParentOf ]
GrandParentOf = ParentOf ◦ ParentOf
AncestorOf = [ParentOf ]+
Also: constants ∅, id, di.
5/36
Graph query languages
∗ ∅ id ∪ ◦ a π π ∩ − di
Tarski’s Relation Algebra, FO[3]
RPQs
2RPQs
Nested RPQs
Navigational XPath, GXPath
6/36
�estions about the relation algebra
�estions
I Are all these operators necessary?I What does each operator add?I When can we replace complex operators by simpler ones?
What is the expressive power of fragments of the relation algebra?
6/36
�estions about the relation algebra
�estions
I Are all these operators necessary?I What does each operator add?I When can we replace complex operators by simpler ones?
What is the expressive power of fragments of the relation algebra?
7/36
Related work and our results
Previous work by Fletcher et al.Relative expressive power of relation algebra fragments on graphs.
This workImprove understanding of the relation algebraI when querying trees or chains;I in comparison with the semi-join algebra.
8/36
Background: when are queries equivalent?
Two types of queries and equivalences
Path-queries. The exact query result is important:
FriendOf ∩WorksWith ≡path FriendOf − (FriendOf −WorksWith).
Boolean-queries. The existence of a query result is important:
FriendOf a ◦ ParentOf ≡bool π1[FriendOf ] ∩ π1[ParentOf ].
DefinitionLanguage L1 is z-subsumed by L2 if every query in L1 isz-equivalent to a query in L2 (denoted by L1 �z L2).
9/36
Background: query fragments
Let F ⊆ {di, a,π ,π ,∩,−, ∗}.I We write N(F): only allows ∅, id, `, ◦, ∪, and all operators in F.I We write N(F) to represent all operators expressible in N(F)
using the following basic rewrite rules:
π1[e] = π j[π 1[e]] = (e ◦ [e]−1) ∩ id = (e ◦ (id ∪ di)) ∩ id;
π2[e] = π j[π 2[e]] = ([e]−1 ◦ e) ∩ id = ((id ∪ di) ◦ e) ∩ id;
π i[e] = id − πi[e];e1 ∩ e2 = e1 − (e1 − e2).
Examples
I N(π ) = N(π ,π ).I N(a,−) = N(a,π ,π ,∩,−).
10/36
Part II
On Tarski’s Relation Algebra
QUERYING TREES AND CHAINS
11/36
Why studying trees
DefinitionA tree is an acyclic graph in which exactly one node, the root, has noincoming edges, and all other nodes have exactly one incoming edge.
AdviserOf AdviserOf
AdviserOf AdviserOf
I Hierarchical relations: taxonomies, corporate structures.I XML and JSON data.I Nested relational data.
12/36
Initial classification
1-subtreereducible
2-subtreereducible
3-subtreereducible
downward non-downward
local
non-local,∗-free
non-∗-free
monotone
non-monotone
monotone
non-monotone
monotone
non-monotone
N(∩,−)
N(π ,∩)
N(π ,π ,∩,−)
N(π ,∩, ∗)
N(π ,π ,∩,−, ∗)
N(a,π ,π ,∩)
N(a,π ,∩)
N(a,π ,π ,∩, ∗)
N(a,π ,∩, ∗)
N(di, a,π )
N(di, a,π ,π )
N(di, a,π , ∗)
N(di, a,π ,π , ∗)
N(a,π ,π ,∩,−)
N(di, a,π ,∩)
N(di, a,π ,π ,∩,−)
N(di, a,π ,∩, ∗)
N
13/36
Recognizing branches: labeled trees
FriendOf ParentOf FriendOf ParentOf
Only select the three on the right
I Not possible in N(∗).I Using projection: π1[FriendOf ] ◦ π1[ParentOf ].I Using converse: FriendOf a ◦ ParentOf .
14/36
Recognizing branches: unlabeled trees
DefinitionA k-subtree-reduction step on tree T consists of removing a subtreerooted at a node n with parent m, given that parent m has at least kother children isomorphic to n.
Theorem
1. N(di,∩) and N(a,−) are closed under 3-subtree-reductions.
2. N(di, a,π ,π , ∗) is closed under 2-subtree-reductions.
3. N(a,π ,π ,∩, ∗) is closed under 1-subtree-reductions.
15/36
Monotonicity
I Without negation you cannot ‘forbid’ structures.I π always provides negation: NonParent = π 1[ParentOf ].I Without negation, only a few Boolean queries are possible.
Theorem
1. On unlabeled chains, we have N(di, a,π ,∩, ∗) �bool N().
2. On unlabeled trees, we have N(a,π ,∩, ∗) �bool N().
TheoremAlready on unlabeled chains, we have N(π ) �bool N(di, a,π ,∩, ∗).
16/36
The Kleene-star
I Is not expressible in first-order logic.I Path queries: always adds expressive power.I Labeled structures: always adds expressive power.
Theorem
1. On unlabeled chains, we have N(di, a,π ,∩, ∗) �bool N().
2. On unlabeled trees, we have N(a,π ,∩, ∗) �bool N().
TheoremLet F ⊆ {di, a,π }. On unlabeled trees, we haveN(F ∪ {∗}) �bool N(F).
17/36
Downward queries: path semantics
N()
N(π )
N(π )
N(∗)
N(π , ∗)
N(π , ∗)
N(∩,−)
N(π ,∩)
N(π ,π ,∩,−)
N(∩,−, ∗)
N(π ,∩, ∗)
N(π ,π ,∩,−, ∗)
I N(π ,π ,∩,−, ∗) is downward.I a and di are not downward.I Downward queries can be expressed
by condition automata.
TheoremLet F ⊆ {π ,π ,∩,−, ∗}. On labeled trees,we have N(F) �path N(F − {∩,−}).
18/36
Downward queries: Boolean semantics
Unlabeled Trees Labeled Trees
N()
N(π )
N(π , ∗)
N(∩,−, ∗)
N(π ,∩, ∗)
N(π ,π ,∩,−)
N(π ,π ,∩,−, ∗)
N()
N(π )
N(π )
N(∗)
N(π , ∗)
N(π , ∗)
N(∩,−)
N(π ,∩)
N(π ,π ,∩,−)
N(∩,−, ∗)
N(π ,∩, ∗)
N(π ,π ,∩,−, ∗)
19/36
Local queries: path semantics
N()
N(π )
N(π )
N(a)
N(a,π )
N(a,π )
N(a,−)
N(∩,−)
N(π ,∩)
N(π ,π ,∩,−)
N(a,π ,∩)
N(a,π ,π ,∩)
N(a,π ,π ,∩,−)
I N(a,π ,π ,∩,−) is local.I ∗ and di are not local.I Local queries can be expressed by
unions of condition tree queries.
TheoremLet {a,π } ⊆ F ⊆ {a,π ,π ,∩}. On labeledtrees, we have N(F) �path N(F − {∩}).
20/36
Local queries: Boolean semantics
Unlabeled Trees Labeled Trees
N()
N(π )
N(a,−)
N(∩,−)
N(a,π ,∩)
N(π ,π ,∩,−)N(a,π ,π ,∩)
N(a,π ,π ,∩,−)
N()
N(π ) = N(a)
N(π )
N(a,−)
N(∩,−)
N(π ,∩)N(a,π ,∩)
N(π ,π ,∩,−)N(a,π ,π ,∩)
N(a,π ,π ,∩,−)
21/36
Conclusion
I We fully established relationships between:I downward fragments;I local fragments.
I On trees: solved most cases not involving ∗.I On chains: solved most cases not involving di or ∗.
Main open problemLet {di,π ,∩} ⊆ F ⊆ {di, a,π ,π ,∩, ∗} and let z ∈ {bool, path}. Withrespect to either labeled trees, unlabeled trees, labeled chains, orunlabeled chains, do we have a collapse N(F ∪ {−}) �z N(F) or not?
22/36
Part III
On Tarski’s Relation Algebra
THE SEMI-JOIN ALGEBRA
23/36
Naive query evaluation: an ine�icient example
Return pairs of (great-grandparent, friend)π1[ParentOf ◦ ParentOf ◦ ParentOf ] ◦ FriendOf .
1. Compute (grandparent, grandchild):X = ParentOf ◦ ParentOf
2. Compute (great-grandparent, great-grandchild):Y = ParentOf ◦ X
3. Throw away the great-grandchildren:Z = π1[Y ]
4. Compute (great-grandparent, friend):Result = Z ◦ FriendOf
24/36
Introducing the semi-join algebra
mi
...
m2
m1
z
A
A
A
njB
...
n2
B
n1
B
Composition: querying for pathsConsider A ◦ B:I yields (begin, end)-nodes connected by AB.I yields i · j node pairs in total.
Checking for existence of pathsConsider the semi-join A n B:I yields the edges in A that connect to B.I yields i node pairs in total.
24/36
Introducing the semi-join algebra
mi
...
m2
m1
z
A
A
A
njB
...
n2
B
n1
B
Composition: querying for pathsConsider A ◦ B:I yields (begin, end)-nodes connected by AB.I yields i · j node pairs in total.
Checking for existence of pathsConsider the semi-join A n B:I yields the edges in A that connect to B.I yields i node pairs in total.
25/36
Optimize query evaluation: using semi-joins
Return pairs of (great-grandparent, friend)π1[ParentOf ◦ ParentOf ◦ ParentOf ] ◦ FriendOf .
1. Compute (grandparent, ???):X = ParentOf n ParentOf
2. Compute (great-grandparent, ???):Y = ParentOf n (X )
3. Throw away ???:Z = π1[Y ]
4. Compute (great-grandparent, friend):Result = Z o FriendOf
π1[ParentOf n (ParentOf n ParentOf )] o FriendOf .
26/36
Projection-equivalence
Requiring the rewrite to guarantee path-equivalence is too strong!
Le�-projection-equivalence. We only need the first projection.
ParentOf ◦ ParentOf ≡π1 ParentOf n ParentOf
π1[ParentOf ◦ ParentOf ] ≡path π1[ParentOf n ParentOf ]
Right-projection-equivalence. We only need the second projection.
ParentOf ◦ ParentOf ≡π2 ParentOf o ParentOf
π2[ParentOf ◦ ParentOf ] ≡path π2[ParentOf o ParentOf ]
27/36
The main result
∗ ∅ id ∪ ◦ a π π di ∩ −Relation algebra
fp ∅ id ∪no
a π π di ∩ −Semi-join algebra
FO[2]
FO[3]
�path�π2�π1
Intersection and di�erence
I Consider (E ◦ E) ∩ E:
G3,3: G4:
I We can allow ∩ and − on basic expressions (edges, ∅, id, di).
27/36
The main result
∗ ∅ id ∪ ◦ a π π di ∩ −Relation algebra
fp ∅ id ∪no
a π π di ∩ −Semi-join algebra
FO[2]
FO[3]
�path�π2�π1
Intersection and di�erence
I Consider (E ◦ E) ∩ E:
G3,3: G4:
I We can allow ∩ and − on basic expressions (edges, ∅, id, di).
28/36
Partial rewriting to the semi-join algebra
Rewrite functions τ (e) ≡path e, τπ1(e) ≡π1 e and τπ2(e) ≡π2 e
Helper functions τ◦1(e; ε) ≡π1 e n ε and τ◦2(e; ε) ≡π2 ε o e
Examplee = π1[((WorksOn ◦WorksOna) ∩ FriendOf ) ◦ EditorOf ] ◦ StudentOf
Rewriting e using τ (e):
τ (e) = τπ2(π1[((W ◦Wa) ∩ F ) ◦ E]) o τ (S)
= π1[τπ1(((W ◦Wa) ∩ F ) ◦ E)] o S
= π1[τ◦1((W ◦Wa) ∩ F ; E)] o S
= π1[(τ (W ◦Wa) ∩ τ (F )) n E o S]
= π1[((τ (W ) ◦ τ (Wa)) ∩ F ) n E o S]
= π1[((W ◦Wa) ∩ F ) n E] o S.
29/36
�ery rewriting and optimization
I Cost of each operator 7.I Input size of each operator 7.I Number of evaluation steps 7.
Cost of each operator
Relation algebra
Semi-join algebra
FO[2]
FO[3]
∗ ∅ id ∪ ◦ a π π di ∩ −
fp ∅ id ∪no
a π π di ∩ −
29/36
�ery rewriting and optimization
I Cost of each operator 3.I Input size of each operator 7.I Number of evaluation steps 7.
Cost of each operator
Relation algebra
Semi-join algebra
FO[2]
FO[3]
∗ ∅ id ∪ ◦ a π π di ∩ −
fp ∅ id ∪no
a π π di ∩ −
30/36
�ery rewriting and optimization
I Cost of each operator 3.I Input size of each operator 7.I Number of evaluation steps 7.
Input size of each operator
m n
ziA B
...
z2
A B
z1
A B
A ◦ B ≡π1 A n B
I A ◦ B yields {(m, n)}.I A n B yields {(m, z1), (m, z2), . . . , (m, zi)}.
I Fix: add projection-steps into algorithms.
30/36
�ery rewriting and optimization
I Cost of each operator 3.I Input size of each operator 3.I Number of evaluation steps 7.
Input size of each operator
m n
ziA B
...
z2
A B
z1
A B
A ◦ B ≡π1 A n B
I A ◦ B yields {(m, n)}.I A n B yields {(m, z1), (m, z2), . . . , (m, zi)}.I Fix: add projection-steps into algorithms.
31/36
�ery rewriting and optimization
I Cost of each operator 3.I Input size of each operator 3.I Number of evaluation steps ∼.
Number of evaluation steps
I Number of evaluation steps can increase.I Complexity of each individual step can significantly decrease.
ExampleConsider e = π1[(` ◦ `) ◦ (` ◦ `)] and τ (e) = π1[` n (` n (` n `)))].We can evaluate e in three steps:
1. compute X = ` ◦ `;
2. compute Y = X ◦ X ;
3. compute π1[Y ].
32/36
Other expensive constructs: id, di, and π
Idea: replace common use cases
I Replace id by equality-selections
(FriendOf ◦WorksWith) ∩ id ≡path σ=(FriendOf ◦WorksWith).
I Replace di by inequality-selections
(FriendOf ◦ FriendOf ) ∩ di ≡path σ,(FriendOf ◦ FriendOf ).
I Replace π by anti-semi-joins
FriendOf ◦ π 1[ParentOf ] ≡path FriendOf n̄ ParentOf .
33/36
Further optimization: filter steps in practical queries
Find friend suggestions for Alice:
1. Compute all friends-of-friends (excluding friends and oneself):
FriendSuggestions = (FriendOf ◦ FriendOf ) − (FriendOf ∪ id).
2. Filter the first column on Alice.
3. Keep the second column.
Incorporate filter-step in the relation algebra
π2[(〈Alice〉 o FriendOf ) o (FriendOf n̄
π2[(〈Alice〉 o FriendOf ) ∪ 〈Alice〉])]
33/36
Further optimization: filter steps in practical queries
Find friend suggestions for Alice:
1. Compute all friends-of-friends (excluding friends and oneself):
FriendSuggestions = (FriendOf ◦ FriendOf ) − (FriendOf ∪ id).
2. Filter the first column on Alice.
3. Keep the second column.
Incorporate filter-step in the relation algebra
π2[(〈Alice〉 o FriendOf ) o (FriendOf n̄
π2[(〈Alice〉 o FriendOf ) ∪ 〈Alice〉])]
34/36
Conclusion
�eries in the relation algebra can be partially rewri�en intosemi-join algebra queries that are easier to evaluate.
Future work
I Implementation in real systems.I Interactions with other optimization techniques.I Elimination of intersection and di�erence.
35/36
Part IV
On Tarski’s Relation Algebra
CONCLUSION
36/36
Conclusion
�estions
I Are all these operators necessary?I What does each operator add?I When can we replace complex operators by simpler ones?
This workImprove understanding of the relation algebraI when querying trees or chains;I in comparison with the semi-join algebra.