• Date posted: 19-Dec-2015

### Transcript of Coloring Away Communication in Parallel Query Optimization Waqar Hasan, Rajeev Motwani Stanford...

• Slide 1
• Coloring Away Communication in Parallel Query Optimization. Waqar Hasan, Rajeev Motwani, Stanford University. pavlatos@cslab.ece.ntua.gr
• Slide 2
• Parallel plans for SQL queries: The problem is to find optimal parallel plans for SQL queries, using a model that represents the partitioning of data as a color.
• Slide 3
• Hong and Stonebraker approach: The problem of finding parallel plans is broken into two phases: join ordering and query rewrite (JOQR), followed by parallelization. (Figure: SQL Query → JOQR → Parallelization → Parallel Plan)
• Slide 4
• Optimizing the JOQR phase: (Figure: JOQR, split into conventional optimization and query tree annotation and coloring)
• Slide 5
• Partitioning: A partitioning is a pair $(a, h)$, where $a$ is an attribute and $h$ is a function that maps values of $a$ to non-negative integers.
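The sketch below shows one way to apply a partitioning $(a, h)$ to a table. A row lands at site $h(\text{row}[a])$. It assumes a table is a list of Python dicts, and the function name `partition` is hypothetical. Neither comes from the paper.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def partition(rows: List[Row], attr: str, h: Callable[[Any], int]) -> Dict[int, List[Row]]:
    """Apply the partitioning (attr, h): fragment k holds the rows with h(row[attr]) == k."""
    fragments: Dict[int, List[Row]] = defaultdict(list)
    for row in rows:
        site = h(row[attr])
        if site < 0:
            raise ValueError("h must map attribute values to non-negative integers")
        fragments[site].append(row)
    return dict(fragments)


emp = [{"name": "ann", "number": 1}, {"name": "bob", "number": 2}, {"name": "cy", "number": 3}]
parts = partition(emp, "number", lambda v: v % 2)
print(sorted(parts))                  # [0, 1]
print([r["name"] for r in parts[1]])  # ['ann', 'cy']
```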
• Slide 6
• Partitioning example: Suppose we have two tables, Emp(name, number) and Cust(name, number), both partitioned across two sites using $h(\text{number}) = \text{number} \bmod 2$. Let $\text{Emp}_i$ and $\text{Cust}_i$ be the fragments stored at site $i$. Since the tables have the same partitioning, the equijoin on number satisfies $\text{Emp} \bowtie \text{Cust} = (\text{Emp}_0 \bowtie \text{Cust}_0) \cup (\text{Emp}_1 \bowtie \text{Cust}_1)$. This permits $\text{Emp} \bowtie \text{Cust}$ to be computed in parallel.
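The identity can be checked on toy data. The sketch below assumes both tables are partitioned by number mod 2 and joined on number. The rows and the helper names `equijoin` and `fragment` are made up for illustration. Each site joins only its own fragments, and the union of the per-site results equals the join of the whole tables.

```python
from itertools import product
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def equijoin(left: List[Row], right: List[Row], attr: str) -> List[Tuple[str, str, Any]]:
    """Nested-loop equijoin on attr, returning (left name, right name, join value)."""
    return [(l["name"], r["name"], l[attr]) for l, r in product(left, right) if l[attr] == r[attr]]


def fragment(rows: List[Row], site: int, attr: str = "number", sites: int = 2) -> List[Row]:
    """Rows stored at `site` under h(number) = number mod `sites`."""
    return [row for row in rows if row[attr] % sites == site]


emp = [{"name": "ann", "number": 1}, {"name": "bob", "number": 2}, {"name": "cy", "number": 4}]
cust = [{"name": "acme", "number": 2}, {"name": "zeta", "number": 3}, {"name": "orb", "number": 1}]

whole = equijoin(emp, cust, "number")
# Each site joins only its local fragments; no row has to move between sites.
per_site = [equijoin(fragment(emp, s), fragment(cust, s), "number") for s in range(2)]
union = [t for part in per_site for t in part]
print(sorted(whole) == sorted(union))  # True
```

This works only because matching join values always hash to the same site. If Cust were partitioned on name instead, the per-site joins would miss matches.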
• Slide 7
• The new approach: We want to choose the partitioning attributes in a query tree so as to minimize the total communication and computation cost. By regarding partitioning attributes as colors, we model the problem as query tree coloring.
• Slide 8
• Some definitions: The color of a node in a query tree is the attribute used to partition that node's data. An edge between nodes $i$ and $j$ is multicolored if and only if $i$ and $j$ have different colors. The weight $c_e$ of an edge $e$ represents the repartitioning cost, which is incurred only when $e$ is multicolored.
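These definitions translate directly into a cost function for a given coloring: sum $c_e$ over the multicolored edges. The sketch below does this in Python. The edge-list representation and the node names (`scan_emp`, `join`, and so on) are illustrative assumptions.

```python
from typing import Dict, Hashable, List, Tuple

Edge = Tuple[Hashable, Hashable, float]  # (node i, node j, weight c_e)


def coloring_cost(edges: List[Edge], color: Dict[Hashable, str]) -> float:
    """Sum the weights c_e of multicolored edges (endpoints with different colors)."""
    return sum(w for i, j, w in edges if color[i] != color[j])


edges = [("join", "scan_emp", 5.0), ("join", "scan_cust", 3.0), ("agg", "join", 2.0)]
color = {"scan_emp": "number", "scan_cust": "name", "join": "number", "agg": "name"}
print(coloring_cost(edges, color))  # 5.0
```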
• Slide 9
• Query tree coloring problem: Given a query tree $T = (V, E)$, the edge weights $c_e$, and colors for some subset of the nodes, color the remaining nodes so as to minimize the total weight of multicolored edges.
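The slides develop their own algorithm for this problem. As an independent cross-check, the problem can also be solved exactly on a tree with a standard dynamic program. For each node and each candidate color, the program computes the cheapest cost of that node's subtree. This is a swapped-in technique, not the authors' algorithm. The function and parameter names are hypothetical.

```python
from typing import Dict, List, Tuple

INF = float("inf")


def min_coloring_cost(
    children: Dict[str, List[Tuple[str, float]]],  # node -> [(child, edge weight c_e)]
    root: str,
    fixed: Dict[str, str],  # precolored node -> color (partitioning attribute)
) -> float:
    """Minimum total weight of multicolored edges over all colorings extending `fixed`.

    Only colors used by precolored nodes are tried: recoloring every node of an
    unused color to an existing one never turns a free edge into a multicolored one.
    """
    palette = sorted(set(fixed.values()))
    if not palette:
        return 0.0  # nothing is precolored: give every node the same color

    def solve(v: str) -> Dict[str, float]:
        # best[c] = cheapest cost of v's subtree when v gets color c.
        # A precolored node may only take its own color.
        best = {c: (0.0 if fixed.get(v, c) == c else INF) for c in palette}
        for u, w in children.get(v, []):
            sub = solve(u)
            cut = min(sub.values()) + w  # child colored freely, edge is multicolored
            for c in palette:
                best[c] += min(sub[c], cut)  # or child shares v's color: edge is free
        return best

    return min(solve(root).values())


children = {"agg": [("join", 2.0)], "join": [("emp", 5.0), ("cust", 1.0)]}
fixed = {"agg": "name", "emp": "number", "cust": "name"}
print(min_coloring_cost(children, "agg", fixed))  # 3.0
```

Here the optimum colors `join` with number. That leaves the cust edge (1.0) and the agg edge (2.0) multicolored. The alternative, coloring `join` with name, cuts the emp edge and costs 5.0. The running time is linear in the number of nodes for a fixed palette size.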
• Slide 10
• An example
• Slide 11
• Problem simplification: (Split) A colored interior node of degree $d$ may be split into $d$ nodes of the same color, with each incident edge connected to a distinct copy. (Collapse) An uncolored leaf node may be collapsed into its parent, which gives it the same color as its parent. Neither operation changes the cost of a minimal coloring.
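The sketch below applies the two simplifications to an edge-list tree. It then uses exhaustive search to confirm that the minimum cost is unchanged. The tree, the colors `p`/`q`, and the `name#k` naming of split copies are hypothetical. The brute-force solver is only suitable for tiny examples.

```python
from collections import Counter
from itertools import product
from typing import Dict, List, Tuple

Edge = Tuple[str, str, float]  # (node i, node j, weight c_e)


def min_cost(edges: List[Edge], fixed: Dict[str, str]) -> float:
    """Exhaustive minimum weight of multicolored edges (tiny instances only)."""
    nodes = sorted({n for i, j, _ in edges for n in (i, j)})
    free = [n for n in nodes if n not in fixed]
    palette = sorted(set(fixed.values())) or ["any"]
    best = float("inf")
    for choice in product(palette, repeat=len(free)):
        color = {**fixed, **dict(zip(free, choice))}
        best = min(best, sum(w for i, j, w in edges if color[i] != color[j]))
    return best


def collapse(edges: List[Edge], fixed: Dict[str, str]) -> List[Edge]:
    """(Collapse) Repeatedly drop uncolored leaves; each copies its only neighbor's color."""
    while True:
        degree = Counter(n for i, j, _ in edges for n in (i, j))
        kept = [e for e in edges if not any(degree[n] == 1 and n not in fixed for n in e[:2])]
        if len(kept) == len(edges):
            return kept
        edges = kept


def split(edges: List[Edge], fixed: Dict[str, str]) -> Tuple[List[Edge], Dict[str, str]]:
    """(Split) Give each incident edge of a colored interior node its own same-colored copy."""
    degree = Counter(n for i, j, _ in edges for n in (i, j))
    new_edges: List[Edge] = []
    new_fixed = dict(fixed)
    for k, (i, j, w) in enumerate(edges):
        ends = []
        for n in (i, j):
            if n in fixed and degree[n] > 1:
                copy = f"{n}#{k}"  # hypothetical naming scheme for the copies
                new_fixed[copy] = fixed[n]
                ends.append(copy)
            else:
                ends.append(n)
        new_edges.append((ends[0], ends[1], w))
    return new_edges, new_fixed


edges = [("r", "a", 4.0), ("r", "b", 1.0), ("a", "x", 2.0), ("a", "y", 3.0), ("b", "z", 5.0)]
fixed = {"a": "p", "b": "q", "x": "q", "y": "p"}  # r and z are uncolored

simplified = collapse(edges, fixed)  # removes the uncolored leaf z
split_edges, split_fixed = split(simplified, fixed)
print(min_cost(edges, fixed), min_cost(split_edges, split_fixed))  # 3.0 3.0
```

After splitting, the colored node `a` no longer connects its neighbors. This is why the tree falls apart into independent pieces that can be solved separately.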
• Slide 12
• Examples of simplifications
• Slide 13
• Lemma: Suppose $m$ is a mother with edges $e_1, e_2, \ldots, e_d$ to leaf children $u_1, u_2, \ldots, u_d$. Assume that the children are numbered in order of non-decreasing edge weight, i.e. $c_{e_1} \le c_{e_2} \le \cdots \le c_{e_d}$. Then there is a minimal coloring that cuts (makes multicolored) $e_1, e_2, \ldots, e_{d-1}$.
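The sketch below shows the reduction this licenses. It sorts the mother's child edges by weight, pays for all but the heaviest, and keeps the heaviest edge for further processing. It assumes the leaf children carry distinct colors, since the slides treat repeated colors separately. The function name `reduce_mother` is hypothetical. The check at the end covers only the simplest case, a mother that is the root, where exhaustive search over the mother's color must agree.

```python
from typing import List, Tuple


def reduce_mother(child_edges: List[Tuple[str, float]]) -> Tuple[float, Tuple[str, float]]:
    """Cut every child edge except the heaviest; return (cut cost, kept edge).

    child_edges holds (leaf color, edge weight) pairs with distinct colors.
    """
    ordered = sorted(child_edges, key=lambda e: e[1])  # non-decreasing weight
    cut_cost = sum(w for _, w in ordered[:-1])         # e_1 .. e_{d-1} are cut
    return cut_cost, ordered[-1]                       # e_d may stay uncut


children = [("red", 4.0), ("green", 1.0), ("blue", 2.5)]  # (leaf color, c_e)
cut_cost, kept = reduce_mother(children)

# Exhaustive check for a root mother: try giving m each child's color.
brute = min(sum(w for c, w in children if c != m) for m, _ in children)
print(cut_cost, kept, brute)  # 3.5 ('red', 4.0) 3.5
```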
• Slide 14
• The algorithm
• Slide 15
• An example
• Slide 16
• Algorithm for repeated colors
• Slide 17
• Decompose the tree
• Slide 18
• Combining computation and communication costs: We can develop a new model by extending the definition of a color to a triple whose components are the partitioning attribute P, the sort attribute S, and the indexing attribute.
• Slide 19
• The cost of a node: The cost of a node consists of the cost of recoloring the outputs of its children so that they have the colors required for its inputs, plus the cost of executing the strategy itself.
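The sketch below illustrates this cost decomposition with the extended (partition, sort, index) color. The per-row cost constants in `recolor_cost` are invented placeholders, not figures from the paper. So are the row counts and the execution cost of the example merge join. Only the structure is the point: the recoloring cost summed over inputs, plus the execution cost.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Color:
    """Extended color: partitioning attribute P, sort attribute S, indexing attribute.

    None means the data has no such property (or, in a requirement, that any value will do).
    """
    partition: Optional[str]
    sort: Optional[str]
    index: Optional[str]


def recolor_cost(have: Color, need: Color, rows: int) -> float:
    """Hypothetical per-row cost model for turning color `have` into `need`."""
    moved = need.partition is not None and have.partition != need.partition
    cost = 1.0 * rows if moved else 0.0  # repartition: ship every row
    # Repartitioning destroys any existing sort order and index, so redo them.
    if need.sort is not None and (moved or have.sort != need.sort):
        cost += 0.5 * rows
    if need.index is not None and (moved or have.index != need.index):
        cost += 0.2 * rows
    return cost


def node_cost(inputs: List[Tuple[Color, int]], required: List[Color], exec_cost: float) -> float:
    """Recolor each child's output (color, row count) to its required input color,
    then add the cost of executing the strategy itself."""
    if len(inputs) != len(required):
        raise ValueError("need exactly one required color per input")
    recolor = sum(recolor_cost(have, need, rows) for (have, rows), need in zip(inputs, required))
    return recolor + exec_cost


emp_out = (Color("number", None, None), 1000)
cust_out = (Color("name", None, None), 400)
merge_join_needs = [Color("number", "number", None)] * 2
print(node_cost([emp_out, cust_out], merge_join_needs, exec_cost=300.0))  # 1400.0
```

Emp only needs sorting (500). Cust must be repartitioned and then sorted (400 + 200). Adding the assumed execution cost of 300 gives 1400.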
• Slide 20
• Strategy: A strategy specifies a particular algorithm for computing an operator. It requires its inputs to satisfy certain constraints and guarantees certain properties of its output.
• Slide 21
• Constraint: We use color patterns to specify such input-output constraints. A constraint has the form $\text{Input}_1, \ldots, \text{Input}_n \rightarrow \text{Output}$, where each $\text{Input}_j$ and Output are color patterns.
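The sketch below encodes such constraints in Python. A color is a (partition, sort, index) tuple, and a pattern is the same shape with `None` acting as a wildcard. A strategy applies without recoloring when every child's color matches the corresponding input pattern. Several things are assumptions: the strategy names `hash_join` and `merge_join`, their patterns, and the restriction to wildcards rather than pattern variables.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# A color is (partitioning attribute, sort attribute, indexing attribute);
# None in a concrete color means "no such property".
Color = Tuple[Optional[str], Optional[str], Optional[str]]
ANY = None  # in a pattern, None is a wildcard: that component is unconstrained


def matches(color: Color, pattern: Color) -> bool:
    """A color satisfies a pattern if it agrees on every non-wildcard component."""
    return all(p is ANY or c == p for c, p in zip(color, pattern))


@dataclass(frozen=True)
class Strategy:
    """Constraint Input_1, ..., Input_n -> Output, each side a color pattern."""
    name: str
    inputs: Tuple[Color, ...]
    output: Color


def applicable(strategies: List[Strategy], child_colors: List[Color]) -> List[Strategy]:
    """Strategies whose input patterns are all satisfied without recoloring."""
    return [
        s for s in strategies
        if len(s.inputs) == len(child_colors)
        and all(matches(c, p) for c, p in zip(child_colors, s.inputs))
    ]


strategies = [
    Strategy("hash_join", (("number", ANY, ANY),) * 2, ("number", ANY, ANY)),
    Strategy("merge_join", (("number", "number", ANY),) * 2, ("number", "number", ANY)),
]
children = [("number", "number", None), ("number", None, None)]
print([s.name for s in applicable(strategies, children)])  # ['hash_join']
```

The merge join is rejected because the second input is not sorted on number. An optimizer would then weigh recoloring that input against running the hash join as is.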