Coloring Away Communication in Parallel Query Optimization

Waqar Hasan, Rajeev Motwani

Stanford University

Παυλάτος Χρήστος

Parallel plans for SQL queries

The problem is to find optimal parallel plans for SQL queries using a model based on representing the partitioning of data as a color.

Hong and Stonebraker approach

The problem of finding parallel plans is broken into two phases: join ordering and query rewrite (JOQR), followed by parallelization.

SQL Query → JOQR → Parallelization → Parallel Plan

Optimize JOQR phase

JOQR consists of:

Conventional optimization

Query tree annotation and coloring

Partitioning

A partitioning is a pair (a, h), where a is an attribute and h is a function that maps values of a to non-negative integers.

Partitioning example

Suppose we have two tables, Emp (name, number) and Cust (name, number), that are both partitioned across two sites using the function h(number) = number mod 2. Since the tables have the same partitioning, a join on number decomposes as

Emp ⋈ Cust = (Emp0 ⋈ Cust0) ∪ (Emp1 ⋈ Cust1)

This permits Emp ⋈ Cust to be computed in parallel: each site joins its own fragments and no tuples need to move between sites.
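The decomposition can be checked on a toy instance. The following is a minimal sketch, assuming a join on number and a handful of made-up rows; the helper names (partition, join_on_number) are hypothetical and are not part of the original slides.

```python
from collections import defaultdict

def h(number, sites=2):
    """Partitioning function from the example: number mod 2."""
    return number % sites

def partition(table, attr_index, sites=2):
    """Split a table (list of tuples) into per-site fragments by h(attribute)."""
    fragments = defaultdict(list)
    for row in table:
        fragments[h(row[attr_index], sites)].append(row)
    return fragments

def join_on_number(left, right):
    """Naive equi-join on the second column (number)."""
    return [(l, r) for l in left for r in right if l[1] == r[1]]

# Hypothetical sample rows: (name, number)
emp = [("ann", 1), ("bob", 2), ("cy", 3)]
cust = [("dee", 2), ("eli", 3), ("fay", 4)]

emp_parts = partition(emp, 1)
cust_parts = partition(cust, 1)

# Each site joins only its local fragments; no tuple crosses sites.
parallel_result = []
for site in range(2):
    parallel_result.extend(join_on_number(emp_parts[site], cust_parts[site]))

# The union of the per-site joins equals the join of the whole tables.
same = sorted(parallel_result) == sorted(join_on_number(emp, cust))
```

If the two tables were partitioned on different attributes, one of them would first have to be repartitioned, which is the communication cost the coloring model tries to avoid.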

The new approach

We want to choose the partitioning attributes in a query tree to minimize the sum total of communication and computation cost. By regarding partitioning attributes as colors, we model the problem as a query tree coloring.

Some definitions

The color of a node in a query tree is the attribute used for partitioning the node.

An edge between nodes i and j is multicolored if and only if i has a different color from j.

The weight c_e of an edge e represents the cost of repartitioning along that edge.

Query tree Coloring problem

Given a query tree T = (V, E), the weights of the edges and colors for some subset of the nodes, color the remaining nodes so as to minimize the total weight of multicolored edges.
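To make the objective concrete, here is a minimal sketch that scores a coloring and finds a minimum by exhaustive search. It is not the algorithm from the talk: it simply tries every assignment, which is only practical for tiny trees. The node names, weights and colors are made up, and the search assumes at least one node is pre-colored (it only tries colors that already appear in the tree, since a fresh color cannot lower the cost).

```python
from itertools import product

def coloring_cost(edges, colors):
    """Total weight of multicolored edges.

    edges:  dict mapping (i, j) -> weight c_e
    colors: dict mapping node -> color
    """
    return sum(w for (i, j), w in edges.items() if colors[i] != colors[j])

def brute_force_coloring(nodes, edges, precolored):
    """Try every assignment of the pre-colored palette to the uncolored nodes."""
    palette = sorted(set(precolored.values()))
    free = [n for n in nodes if n not in precolored]
    best_cost, best = None, None
    for choice in product(palette, repeat=len(free)):
        colors = dict(precolored)
        colors.update(zip(free, choice))
        cost = coloring_cost(edges, colors)
        if best_cost is None or cost < best_cost:
            best_cost, best = cost, colors
    return best_cost, best

# Hypothetical tree: leaves a, b, c are pre-colored; x and y are interior nodes.
nodes = ["a", "b", "c", "x", "y"]
edges = {("a", "x"): 3, ("b", "x"): 1, ("x", "y"): 2, ("c", "y"): 4}
precolored = {"a": "A", "b": "B", "c": "A"}

best_cost, best = brute_force_coloring(nodes, edges, precolored)
print(best_cost, best["x"], best["y"])
```

In this instance the cheapest choice colors both interior nodes A, so only the light edge to b is multicolored.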

An example

Problem Simplification

(Split) A colored interior node of degree d may be split into d nodes of the same color, with each incident edge connected to a distinct copy.

(Collapse) An uncolored leaf node may be collapsed into its parent. This gives it the same color as its parent.
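The two simplifications can be sketched on an edge-list representation of the tree. This is an illustrative sketch only: the representation, the function names, and the (node, k) naming of the copies are assumptions, not taken from the slides.

```python
def degree(node, edges):
    """Number of edges incident to node."""
    return sum(1 for (i, j, _) in edges if node in (i, j))

def split(node, edges, colors):
    """Split a colored interior node into one same-colored copy per incident edge."""
    new_edges = []
    new_colors = {n: c for n, c in colors.items() if n != node}
    copy = 0
    for (i, j, w) in edges:
        if node in (i, j):
            clone = (node, copy)  # hypothetical naming scheme for the copies
            copy += 1
            new_colors[clone] = colors[node]
            i, j = (clone if i == node else i), (clone if j == node else j)
        new_edges.append((i, j, w))
    return new_edges, new_colors

def collapse(leaf, edges):
    """Collapse an uncolored leaf into its parent: its single edge disappears."""
    return [(i, j, w) for (i, j, w) in edges if leaf not in (i, j)]

def simplify(edges, colors):
    """Apply Split and Collapse until neither applies."""
    changed = True
    while changed:
        changed = False
        nodes = {n for (i, j, _) in edges for n in (i, j)}
        for n in sorted(nodes, key=str):
            d = degree(n, edges)
            if n in colors and d > 1:
                edges, colors = split(n, edges, colors)
                changed = True
                break
            if n not in colors and d == 1:
                edges = collapse(n, edges)
                changed = True
                break
    return edges, colors

# Hypothetical tree: r is a colored interior node, u is an uncolored leaf.
edges = [("a", "x", 3), ("x", "r", 2), ("r", "y", 5), ("y", "b", 1), ("y", "u", 4)]
colors = {"a": "A", "r": "R", "b": "B"}

simple_edges, simple_colors = simplify(edges, colors)
```

After simplification every leaf is colored and every interior node is uncolored; splitting r also separates the tree into two independent subproblems.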

Examples on simplifications

Lemma

Suppose m is a mother node with edges e1, e2, …, ed to leaf children u1, u2, …, ud. Assume that we have numbered the children in order of non-decreasing edge weight, i.e. c_e1 ≤ c_e2 ≤ … ≤ c_ed.

Then, when all pre-colored nodes have distinct colors, there is a minimal coloring that cuts e1, e2, …, e(d-1).

The algorithm

An example

Algorithm for Repeated colors

Decompose the tree

Combining computation and communication costs

We can develop a new model by extending the definition of color to be a triple <p, s, i>, where

p is the partitioning attribute

s is the sort attribute

i is the indexing attribute

The cost of a node

The cost of a node consists of:

The cost of recoloring the outputs of its children so that they have the colors required of its inputs

The cost of executing the strategy itself

Strategy

A strategy specifies a particular algorithm for computing an operator. It requires the inputs to satisfy some constraints and guarantees some properties for its output.

Constraint

We use color patterns to specify such input-output constraints. A constraint has the form:

Input1, …, Inputn → Output

where each Inputj and Output are color patterns.
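One way to picture color patterns and node cost is the sketch below. Everything concrete in it is an assumption made for illustration: the "*" wildcard, the Strategy record, the flat recoloring charge, and the example merge-join constraint are hypothetical and are not taken from the slides.

```python
from typing import NamedTuple, Optional

class Color(NamedTuple):
    p: Optional[str]  # partitioning attribute
    s: Optional[str]  # sort attribute
    i: Optional[str]  # indexing attribute

ANY = "*"  # hypothetical wildcard: matches any value in that position

def matches(pattern, color):
    """True if every non-wildcard position of the pattern equals the color's."""
    return all(want == ANY or want == have for want, have in zip(pattern, color))

class Strategy(NamedTuple):
    name: str
    inputs: tuple      # one color pattern per input
    output: tuple      # color pattern guaranteed for the output
    exec_cost: float   # cost of executing the strategy itself

def node_cost(strategy, child_colors, recolor_cost):
    """Execution cost plus a flat charge for each input that must be recolored."""
    total = strategy.exec_cost
    for pattern, color in zip(strategy.inputs, child_colors):
        if not matches(pattern, color):
            total += recolor_cost
    return total

# Hypothetical constraint for a merge join on "number":
# <number, number, *>, <number, number, *> -> <number, number, *>
merge_join = Strategy(
    name="merge_join",
    inputs=(("number", "number", ANY), ("number", "number", ANY)),
    output=("number", "number", ANY),
    exec_cost=10.0,
)

left = Color(p="number", s="number", i=None)   # already satisfies its pattern
right = Color(p="name", s=None, i=None)        # must be recolored

cost = node_cost(merge_join, [left, right], recolor_cost=5.0)
```

A real cost model would charge a recoloring cost that depends on the data size and on which components of the color change; the flat charge here only shows how the two parts of the node cost add up.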