QUERYING TREES AND CHAINSjhellings.nl/files/phd_defense_slides.pdfPeggy FriendOf FriendOf Victor...

Post on 13-Sep-2020

0 views 0 download

Transcript of QUERYING TREES AND CHAINSjhellings.nl/files/phd_defense_slides.pdfPeggy FriendOf FriendOf Victor...

1/36

On Tarski’s Relation Algebra

QUERYING TREES AND CHAINS

and

THE SEMI-JOIN ALGEBRA

Jelle Hellings

2/36

Part I

On Tarski’s Relation Algebra

INTRODUCTION

3/36

Relation algebra: graphs and queries

Alice

Bob

Carol

ParentOfParentOf

Dan

ParentOf

Faythe

ParentOf

Grace

ParentOf

PeggyFriendOf FriendOf

Victor

FriendOf

WorksWith

WendyFriendOf

FriendsOfGgp = π1[ParentOf ◦ ParentOf ◦ ParentOf ] ◦ FriendOf

3/36

Relation algebra: graphs and queries

Alice

Bob

Carol

ParentOfParentOf

Dan

ParentOf

Faythe

ParentOf

Grace

ParentOf

PeggyFriendOf FriendOf

Victor

FriendOf

WorksWith

WendyFriendOf

FriendsOfGgp = π1[ParentOf ◦ ParentOf ◦ ParentOf ] ◦ FriendOf

3/36

Relation algebra: graphs and queries

Alice

Bob

Carol

ParentOfParentOf

Dan

ParentOf

Faythe

ParentOf

Grace

ParentOf

PeggyFriendOf FriendOf

Victor

FriendOf

WorksWith

WendyFriendOf

FriendsOfGgp = π1[ParentOf ◦ ParentOf ◦ ParentOf ] ◦ FriendOf

3/36

Relation algebra: graphs and queries

Alice

Bob

Carol

ParentOfParentOf

Dan

ParentOf

Faythe

ParentOf

Grace

ParentOf

PeggyFriendOf FriendOf

Victor

FriendOf

WorksWith

WendyFriendOf

FriendsOfGgp = π1[ParentOf ◦ ParentOf ◦ ParentOf ] ◦ FriendOf

3/36

Relation algebra: graphs and queries

Alice

Bob

Carol

ParentOfParentOf

Dan

ParentOf

Faythe

ParentOf

Grace

ParentOf

PeggyFriendOf FriendOf

Victor

FriendOf

WorksWith

WendyFriendOf

FriendsOfGgp = π1[ParentOf ◦ ParentOf ◦ ParentOf ] ◦ FriendOf

4/36

Relation algebra: other operators

ChildOf = ParentOf a

AcquaintanceOf = FriendOf ∪WorksWith

WorkFriendOf = FriendOf ∩WorksWith

NonWorkFriend = FriendOf −WorksWith

Parent = π1[ParentOf ]

NonParent = π 1[ParentOf ]

GrandParentOf = ParentOf ◦ ParentOf

AncestorOf = [ParentOf ]+

Also: constants ∅, id, di.

5/36

Graph query languages

∗ ∅ id ∪ ◦ a π π ∩ − di

Tarski’s Relation Algebra, FO[3]

RPQs

2RPQs

Nested RPQs

Navigational XPath, GXPath

6/36

�estions about the relation algebra

�estions

I Are all these operators necessary?I What does each operator add?I When can we replace complex operators by simpler ones?

What is the expressive power of fragments of the relation algebra?

6/36

�estions about the relation algebra

�estions

I Are all these operators necessary?I What does each operator add?I When can we replace complex operators by simpler ones?

What is the expressive power of fragments of the relation algebra?

7/36

Related work and our results

Previous work by Fletcher et al.Relative expressive power of relation algebra fragments on graphs.

This workImprove understanding of the relation algebraI when querying trees or chains;I in comparison with the semi-join algebra.

8/36

Background: when are queries equivalent?

Two types of queries and equivalences

Path-queries. The exact query result is important:

FriendOf ∩WorksWith ≡path FriendOf − (FriendOf −WorksWith).

Boolean-queries. The existence of a query result is important:

FriendOf a ◦ ParentOf ≡bool π1[FriendOf ] ∩ π1[ParentOf ].

DefinitionLanguage L1 is z-subsumed by L2 if every query in L1 isz-equivalent to a query in L2 (denoted by L1 �z L2).

9/36

Background: query fragments

Let F ⊆ {di, a,π ,π ,∩,−, ∗}.I We write N(F): only allows ∅, id, `, ◦, ∪, and all operators in F.I We write N(F) to represent all operators expressible in N(F)

using the following basic rewrite rules:

π1[e] = π j[π 1[e]] = (e ◦ [e]−1) ∩ id = (e ◦ (id ∪ di)) ∩ id;

π2[e] = π j[π 2[e]] = ([e]−1 ◦ e) ∩ id = ((id ∪ di) ◦ e) ∩ id;

π i[e] = id − πi[e];e1 ∩ e2 = e1 − (e1 − e2).

Examples

I N(π ) = N(π ,π ).I N(a,−) = N(a,π ,π ,∩,−).

10/36

Part II

On Tarski’s Relation Algebra

QUERYING TREES AND CHAINS

11/36

Why studying trees

DefinitionA tree is an acyclic graph in which exactly one node, the root, has noincoming edges, and all other nodes have exactly one incoming edge.

AdviserOf AdviserOf

AdviserOf AdviserOf

I Hierarchical relations: taxonomies, corporate structures.I XML and JSON data.I Nested relational data.

12/36

Initial classification

1-subtreereducible

2-subtreereducible

3-subtreereducible

downward non-downward

local

non-local,∗-free

non-∗-free

monotone

non-monotone

monotone

non-monotone

monotone

non-monotone

N(∩,−)

N(π ,∩)

N(π ,π ,∩,−)

N(π ,∩, ∗)

N(π ,π ,∩,−, ∗)

N(a,π ,π ,∩)

N(a,π ,∩)

N(a,π ,π ,∩, ∗)

N(a,π ,∩, ∗)

N(di, a,π )

N(di, a,π ,π )

N(di, a,π , ∗)

N(di, a,π ,π , ∗)

N(a,π ,π ,∩,−)

N(di, a,π ,∩)

N(di, a,π ,π ,∩,−)

N(di, a,π ,∩, ∗)

N

13/36

Recognizing branches: labeled trees

FriendOf ParentOf FriendOf ParentOf

Only select the three on the right

I Not possible in N(∗).I Using projection: π1[FriendOf ] ◦ π1[ParentOf ].I Using converse: FriendOf a ◦ ParentOf .

14/36

Recognizing branches: unlabeled trees

DefinitionA k-subtree-reduction step on tree T consists of removing a subtreerooted at a node n with parent m, given that parent m has at least kother children isomorphic to n.

Theorem

1. N(di,∩) and N(a,−) are closed under 3-subtree-reductions.

2. N(di, a,π ,π , ∗) is closed under 2-subtree-reductions.

3. N(a,π ,π ,∩, ∗) is closed under 1-subtree-reductions.

15/36

Monotonicity

I Without negation you cannot ‘forbid’ structures.I π always provides negation: NonParent = π 1[ParentOf ].I Without negation, only a few Boolean queries are possible.

Theorem

1. On unlabeled chains, we have N(di, a,π ,∩, ∗) �bool N().

2. On unlabeled trees, we have N(a,π ,∩, ∗) �bool N().

TheoremAlready on unlabeled chains, we have N(π ) �bool N(di, a,π ,∩, ∗).

16/36

The Kleene-star

I Is not expressible in first-order logic.I Path queries: always adds expressive power.I Labeled structures: always adds expressive power.

Theorem

1. On unlabeled chains, we have N(di, a,π ,∩, ∗) �bool N().

2. On unlabeled trees, we have N(a,π ,∩, ∗) �bool N().

TheoremLet F ⊆ {di, a,π }. On unlabeled trees, we haveN(F ∪ {∗}) �bool N(F).

17/36

Downward queries: path semantics

N()

N(π )

N(π )

N(∗)

N(π , ∗)

N(π , ∗)

N(∩,−)

N(π ,∩)

N(π ,π ,∩,−)

N(∩,−, ∗)

N(π ,∩, ∗)

N(π ,π ,∩,−, ∗)

I N(π ,π ,∩,−, ∗) is downward.I a and di are not downward.I Downward queries can be expressed

by condition automata.

TheoremLet F ⊆ {π ,π ,∩,−, ∗}. On labeled trees,we have N(F) �path N(F − {∩,−}).

18/36

Downward queries: Boolean semantics

Unlabeled Trees Labeled Trees

N()

N(π )

N(π , ∗)

N(∩,−, ∗)

N(π ,∩, ∗)

N(π ,π ,∩,−)

N(π ,π ,∩,−, ∗)

N()

N(π )

N(π )

N(∗)

N(π , ∗)

N(π , ∗)

N(∩,−)

N(π ,∩)

N(π ,π ,∩,−)

N(∩,−, ∗)

N(π ,∩, ∗)

N(π ,π ,∩,−, ∗)

19/36

Local queries: path semantics

N()

N(π )

N(π )

N(a)

N(a,π )

N(a,π )

N(a,−)

N(∩,−)

N(π ,∩)

N(π ,π ,∩,−)

N(a,π ,∩)

N(a,π ,π ,∩)

N(a,π ,π ,∩,−)

I N(a,π ,π ,∩,−) is local.I ∗ and di are not local.I Local queries can be expressed by

unions of condition tree queries.

TheoremLet {a,π } ⊆ F ⊆ {a,π ,π ,∩}. On labeledtrees, we have N(F) �path N(F − {∩}).

20/36

Local queries: Boolean semantics

Unlabeled Trees Labeled Trees

N()

N(π )

N(a,−)

N(∩,−)

N(a,π ,∩)

N(π ,π ,∩,−)N(a,π ,π ,∩)

N(a,π ,π ,∩,−)

N()

N(π ) = N(a)

N(π )

N(a,−)

N(∩,−)

N(π ,∩)N(a,π ,∩)

N(π ,π ,∩,−)N(a,π ,π ,∩)

N(a,π ,π ,∩,−)

21/36

Conclusion

I We fully established relationships between:I downward fragments;I local fragments.

I On trees: solved most cases not involving ∗.I On chains: solved most cases not involving di or ∗.

Main open problemLet {di,π ,∩} ⊆ F ⊆ {di, a,π ,π ,∩, ∗} and let z ∈ {bool, path}. Withrespect to either labeled trees, unlabeled trees, labeled chains, orunlabeled chains, do we have a collapse N(F ∪ {−}) �z N(F) or not?

22/36

Part III

On Tarski’s Relation Algebra

THE SEMI-JOIN ALGEBRA

23/36

Naive query evaluation: an ine�icient example

Return pairs of (great-grandparent, friend)π1[ParentOf ◦ ParentOf ◦ ParentOf ] ◦ FriendOf .

1. Compute (grandparent, grandchild):X = ParentOf ◦ ParentOf

2. Compute (great-grandparent, great-grandchild):Y = ParentOf ◦ X

3. Throw away the great-grandchildren:Z = π1[Y ]

4. Compute (great-grandparent, friend):Result = Z ◦ FriendOf

24/36

Introducing the semi-join algebra

mi

...

m2

m1

z

A

A

A

njB

...

n2

B

n1

B

Composition: querying for pathsConsider A ◦ B:I yields (begin, end)-nodes connected by AB.I yields i · j node pairs in total.

Checking for existence of pathsConsider the semi-join A n B:I yields the edges in A that connect to B.I yields i node pairs in total.

24/36

Introducing the semi-join algebra

mi

...

m2

m1

z

A

A

A

njB

...

n2

B

n1

B

Composition: querying for pathsConsider A ◦ B:I yields (begin, end)-nodes connected by AB.I yields i · j node pairs in total.

Checking for existence of pathsConsider the semi-join A n B:I yields the edges in A that connect to B.I yields i node pairs in total.

25/36

Optimize query evaluation: using semi-joins

Return pairs of (great-grandparent, friend)π1[ParentOf ◦ ParentOf ◦ ParentOf ] ◦ FriendOf .

1. Compute (grandparent, ???):X = ParentOf n ParentOf

2. Compute (great-grandparent, ???):Y = ParentOf n (X )

3. Throw away ???:Z = π1[Y ]

4. Compute (great-grandparent, friend):Result = Z o FriendOf

π1[ParentOf n (ParentOf n ParentOf )] o FriendOf .

26/36

Projection-equivalence

Requiring the rewrite to guarantee path-equivalence is too strong!

Le�-projection-equivalence. We only need the first projection.

ParentOf ◦ ParentOf ≡π1 ParentOf n ParentOf

π1[ParentOf ◦ ParentOf ] ≡path π1[ParentOf n ParentOf ]

Right-projection-equivalence. We only need the second projection.

ParentOf ◦ ParentOf ≡π2 ParentOf o ParentOf

π2[ParentOf ◦ ParentOf ] ≡path π2[ParentOf o ParentOf ]

27/36

The main result

∗ ∅ id ∪ ◦ a π π di ∩ −Relation algebra

fp ∅ id ∪no

a π π di ∩ −Semi-join algebra

FO[2]

FO[3]

�path�π2�π1

Intersection and di�erence

I Consider (E ◦ E) ∩ E:

G3,3: G4:

I We can allow ∩ and − on basic expressions (edges, ∅, id, di).

27/36

The main result

∗ ∅ id ∪ ◦ a π π di ∩ −Relation algebra

fp ∅ id ∪no

a π π di ∩ −Semi-join algebra

FO[2]

FO[3]

�path�π2�π1

Intersection and di�erence

I Consider (E ◦ E) ∩ E:

G3,3: G4:

I We can allow ∩ and − on basic expressions (edges, ∅, id, di).

28/36

Partial rewriting to the semi-join algebra

Rewrite functions τ (e) ≡path e, τπ1(e) ≡π1 e and τπ2(e) ≡π2 e

Helper functions τ◦1(e; ε) ≡π1 e n ε and τ◦2(e; ε) ≡π2 ε o e

Examplee = π1[((WorksOn ◦WorksOna) ∩ FriendOf ) ◦ EditorOf ] ◦ StudentOf

Rewriting e using τ (e):

τ (e) = τπ2(π1[((W ◦Wa) ∩ F ) ◦ E]) o τ (S)

= π1[τπ1(((W ◦Wa) ∩ F ) ◦ E)] o S

= π1[τ◦1((W ◦Wa) ∩ F ; E)] o S

= π1[(τ (W ◦Wa) ∩ τ (F )) n E o S]

= π1[((τ (W ) ◦ τ (Wa)) ∩ F ) n E o S]

= π1[((W ◦Wa) ∩ F ) n E] o S.

29/36

�ery rewriting and optimization

I Cost of each operator 7.I Input size of each operator 7.I Number of evaluation steps 7.

Cost of each operator

Relation algebra

Semi-join algebra

FO[2]

FO[3]

∗ ∅ id ∪ ◦ a π π di ∩ −

fp ∅ id ∪no

a π π di ∩ −

29/36

�ery rewriting and optimization

I Cost of each operator 3.I Input size of each operator 7.I Number of evaluation steps 7.

Cost of each operator

Relation algebra

Semi-join algebra

FO[2]

FO[3]

∗ ∅ id ∪ ◦ a π π di ∩ −

fp ∅ id ∪no

a π π di ∩ −

30/36

�ery rewriting and optimization

I Cost of each operator 3.I Input size of each operator 7.I Number of evaluation steps 7.

Input size of each operator

m n

ziA B

...

z2

A B

z1

A B

A ◦ B ≡π1 A n B

I A ◦ B yields {(m, n)}.I A n B yields {(m, z1), (m, z2), . . . , (m, zi)}.

I Fix: add projection-steps into algorithms.

30/36

�ery rewriting and optimization

I Cost of each operator 3.I Input size of each operator 3.I Number of evaluation steps 7.

Input size of each operator

m n

ziA B

...

z2

A B

z1

A B

A ◦ B ≡π1 A n B

I A ◦ B yields {(m, n)}.I A n B yields {(m, z1), (m, z2), . . . , (m, zi)}.I Fix: add projection-steps into algorithms.

31/36

�ery rewriting and optimization

I Cost of each operator 3.I Input size of each operator 3.I Number of evaluation steps ∼.

Number of evaluation steps

I Number of evaluation steps can increase.I Complexity of each individual step can significantly decrease.

ExampleConsider e = π1[(` ◦ `) ◦ (` ◦ `)] and τ (e) = π1[` n (` n (` n `)))].We can evaluate e in three steps:

1. compute X = ` ◦ `;

2. compute Y = X ◦ X ;

3. compute π1[Y ].

32/36

Other expensive constructs: id, di, and π

Idea: replace common use cases

I Replace id by equality-selections

(FriendOf ◦WorksWith) ∩ id ≡path σ=(FriendOf ◦WorksWith).

I Replace di by inequality-selections

(FriendOf ◦ FriendOf ) ∩ di ≡path σ,(FriendOf ◦ FriendOf ).

I Replace π by anti-semi-joins

FriendOf ◦ π 1[ParentOf ] ≡path FriendOf n̄ ParentOf .

33/36

Further optimization: filter steps in practical queries

Find friend suggestions for Alice:

1. Compute all friends-of-friends (excluding friends and oneself):

FriendSuggestions = (FriendOf ◦ FriendOf ) − (FriendOf ∪ id).

2. Filter the first column on Alice.

3. Keep the second column.

Incorporate filter-step in the relation algebra

π2[(〈Alice〉 o FriendOf ) o (FriendOf n̄

π2[(〈Alice〉 o FriendOf ) ∪ 〈Alice〉])]

33/36

Further optimization: filter steps in practical queries

Find friend suggestions for Alice:

1. Compute all friends-of-friends (excluding friends and oneself):

FriendSuggestions = (FriendOf ◦ FriendOf ) − (FriendOf ∪ id).

2. Filter the first column on Alice.

3. Keep the second column.

Incorporate filter-step in the relation algebra

π2[(〈Alice〉 o FriendOf ) o (FriendOf n̄

π2[(〈Alice〉 o FriendOf ) ∪ 〈Alice〉])]

34/36

Conclusion

�eries in the relation algebra can be partially rewri�en intosemi-join algebra queries that are easier to evaluate.

Future work

I Implementation in real systems.I Interactions with other optimization techniques.I Elimination of intersection and di�erence.

35/36

Part IV

On Tarski’s Relation Algebra

CONCLUSION

36/36

Conclusion

�estions

I Are all these operators necessary?I What does each operator add?I When can we replace complex operators by simpler ones?

This workImprove understanding of the relation algebraI when querying trees or chains;I in comparison with the semi-join algebra.