Coloring Away Communication in Parallel Query Optimization Waqar Hasan, Rajeev Motwani Stanford...


  • date post

    19-Dec-2015


  • Slide 1
  • Coloring Away Communication in Parallel Query Optimization. Waqar Hasan, Rajeev Motwani, Stanford University. pavlatos@cslab.ece.ntua.gr
  • Slide 2
  • Parallel plans for SQL queries: The problem is to find optimal parallel plans for SQL queries, using a model that represents the partitioning of data as a color.
  • Slide 3
  • Hong and Stonebraker approach: The problem of finding parallel plans has been broken into two phases: join ordering and query rewrite (JOQR), followed by parallelization. (Diagram: SQL Query → JOQR → Parallelization → Parallel Plan.)
  • Slide 4
  • Optimizing the JOQR phase: (Diagram: the JOQR phase consists of conventional optimization and query tree annotation and coloring.)
  • Slide 5
  • Partitioning: A partitioning is a pair (a, h), where a is an attribute and h is a function that maps values of a to non-negative integers.
  • Slide 6
  • Partitioning example: Suppose we have two tables, Emp(name, number) and Cust(name, number), both partitioned across two sites using the function h(number) = number mod 2. Since the tables have the same partitioning, a join on number decomposes as $\text{Emp} \bowtie \text{Cust} = (\text{Emp}_0 \bowtie \text{Cust}_0) \cup (\text{Emp}_1 \bowtie \text{Cust}_1)$, where $\text{Emp}_i$ and $\text{Cust}_i$ are the fragments stored at site i. This permits $\text{Emp} \bowtie \text{Cust}$ to be computed in parallel.
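To make the decomposition concrete, here is a small sketch that partitions both tables with h(number) = number mod 2, joins each site's fragments locally, and checks that the union of the per-site joins equals the full join. The sample rows and the helper names (`partition`, `join_on_number`) are hypothetical, chosen only for illustration; a real system would run the per-site joins concurrently rather than in a loop.

```python
from collections import defaultdict

NUM_SITES = 2


def h(number: int) -> int:
    """Partitioning function from the example: site = number mod 2."""
    return number % NUM_SITES


def partition(rows, attr_index):
    """Split rows across sites by applying h to the partitioning attribute."""
    sites = defaultdict(list)
    for row in rows:
        sites[h(row[attr_index])].append(row)
    return [sites[s] for s in range(NUM_SITES)]


def join_on_number(left, right):
    """Nested-loop equijoin on the 'number' column (index 1 in both tables)."""
    return [(l, r) for l in left for r in right if l[1] == r[1]]


# Hypothetical sample data: (name, number)
emp = [("alice", 1), ("bob", 2), ("carol", 3), ("dave", 4)]
cust = [("acme", 2), ("globex", 3), ("initech", 5), ("umbrella", 4)]

emp_parts = partition(emp, 1)
cust_parts = partition(cust, 1)

# Each site joins only its local fragments; no tuples move between sites.
parallel = [pair for s in range(NUM_SITES)
            for pair in join_on_number(emp_parts[s], cust_parts[s])]
serial = join_on_number(emp, cust)
```

Because matching tuples share the same number, they always land on the same site, so no cross-site pairs are lost.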
  • Slide 7
  • The new approach: We want to choose the partitioning attributes in a query tree so as to minimize the total of communication and computation costs. By regarding partitioning attributes as colors, we model the problem as query tree coloring.
  • Slide 8
  • Some definitions: The color of a node in a query tree is the attribute used to partition that node's data. An edge between nodes i and j is multicolored if and only if i and j have different colors. The weight $c_e$ of an edge e represents its repartitioning cost.
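The definitions translate directly into a cost function: a coloring's communication cost is the sum of $c_e$ over multicolored edges. The sketch below assumes edges are given as `(i, j, weight)` triples and colors as attribute-name strings; the node names and weights in the example are hypothetical.

```python
from typing import Dict, Hashable, Iterable, Tuple

Edge = Tuple[Hashable, Hashable, float]  # (node i, node j, weight c_e)


def is_multicolored(coloring: Dict[Hashable, str], i: Hashable, j: Hashable) -> bool:
    """An edge (i, j) is multicolored iff its endpoints have different colors."""
    return coloring[i] != coloring[j]


def coloring_cost(edges: Iterable[Edge], coloring: Dict[Hashable, str]) -> float:
    """Total repartitioning cost: sum of c_e over all multicolored edges."""
    return sum(w for i, j, w in edges if is_multicolored(coloring, i, j))


# Hypothetical query tree: two scans feed a join, which feeds an aggregate.
edges = [("scanA", "join", 5), ("scanB", "join", 3), ("join", "agg", 2)]
coloring = {"scanA": "x", "scanB": "y", "join": "x", "agg": "z"}
```

Here `scanB`-`join` and `join`-`agg` are multicolored, so `coloring_cost(edges, coloring)` is 3 + 2 = 5.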
  • Slide 9
  • Query tree coloring problem: Given a query tree T = (V, E), the weights of the edges, and colors for some subset of the nodes, color the remaining nodes so as to minimize the total weight of multicolored edges.
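For reference, the problem can be solved exactly on small trees with a standard bottom-up dynamic program; this is a swapped-in technique for checking answers, not the algorithm from the slides. For each node v and candidate color c it records the cheapest cost of v's subtree when v is colored c. As an assumption that keeps the table small, only colors already used by precolored nodes are tried: a group of connected nodes sharing a brand-new color has only multicolored edges leaving it, so recoloring the group to a neighbor's color never raises the cost. The function name `color_tree` and its input format are hypothetical.

```python
from collections import defaultdict


def color_tree(nodes, edges, precolored):
    """Exact minimum-cost completion of a partial coloring on a tree (DP sketch).

    nodes: list of node ids (nodes[0] is used as the root)
    edges: list of (u, v, weight) forming a tree over `nodes`
    precolored: dict mapping some nodes to their fixed color
    Returns (cost, coloring) with a color for every node.
    """
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    # Only colors that already appear are considered (see the note above).
    palette = sorted(set(precolored.values())) or ["any"]
    root = nodes[0]

    # Root the tree; `order` lists every parent before its children.
    parent, order, stack = {root: None}, [], [root]
    while stack:
        v = stack.pop()
        order.append(v)
        for u, _ in adj[v]:
            if u not in parent:
                parent[u] = v
                stack.append(u)

    best = {}    # best[v][c]: min cost inside v's subtree when v has color c
    choice = {}  # choice[(u, c)]: best color for child u when its parent has color c
    for v in reversed(order):  # children are processed before parents
        best[v] = {}
        for c in ([precolored[v]] if v in precolored else palette):
            total = 0
            for u, w in adj[v]:
                if parent[u] != v:
                    continue  # skip the edge to v's own parent
                cu = min(best[u], key=lambda cc: best[u][cc] + (w if cc != c else 0))
                choice[(u, c)] = cu
                total += best[u][cu] + (w if cu != c else 0)
            best[v][c] = total

    # Walk top-down, fixing each child's color from its parent's color.
    coloring = {root: min(best[root], key=best[root].get)}
    for v in order:
        for u, _ in adj[v]:
            if parent[u] == v:
                coloring[u] = choice[(u, coloring[v])]
    return best[root][coloring[root]], coloring


# Hypothetical instance: uncolored node r with three colored neighbors.
cost, coloring = color_tree(
    ["r", "a", "b", "c"],
    [("r", "a", 4), ("r", "b", 3), ("r", "c", 2)],
    {"a": "x", "b": "y", "c": "x"},
)
```

Coloring r with x cuts only the edge to b (weight 3), while y would cut weights 4 + 2, so the program picks x. The running time is O(|V| · k²) for k candidate colors.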
  • Slide 10
  • An example
  • Slide 11
  • Problem simplification: (Split) A colored interior node of degree d may be split into d copies of the same color, with each incident edge connected to a distinct copy. (Collapse) An uncolored leaf node may be collapsed into its parent, which gives it the same color as its parent.
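The two simplifications can be sketched as graph rewrites on an edge list: Split replaces each colored interior node with one copy per incident edge, and Collapse repeatedly removes uncolored leaves (the removed edge is monochrome, so it costs nothing). The function names, the `(node, k)` naming of copies, and the `merged` map (which records whose color each collapsed leaf inherits) are hypothetical choices for this sketch; it assumes original node ids are never themselves such tuples.

```python
from collections import defaultdict
from itertools import count


def degrees(edges):
    """Degree of every node appearing in an edge list of (u, v, w)."""
    deg = defaultdict(int)
    for u, v, _ in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def simplify(edges, colors):
    """Apply Split, then Collapse until no uncolored leaf remains.

    Returns (new_edges, new_colors, merged), where merged maps each collapsed
    leaf to the neighbor whose color it must copy.
    """
    colors = dict(colors)
    deg = degrees(edges)

    # Split: a colored interior node (degree > 1) gets one copy per incident edge.
    fresh = count()
    split_edges = []
    for u, v, w in edges:
        ends = []
        for x in (u, v):
            if x in colors and deg[x] > 1:
                copy = (x, next(fresh))
                colors[copy] = colors[x]
                ends.append(copy)
            else:
                ends.append(x)
        split_edges.append((ends[0], ends[1], w))
    for x in [x for x in colors if deg.get(x, 0) > 1]:
        del colors[x]  # the original interior node is replaced by its copies

    # Collapse: an uncolored leaf merges into its only neighbor.
    merged = {}
    collapsed = True
    while collapsed:
        collapsed = False
        deg = degrees(split_edges)
        for i, (u, v, _) in enumerate(split_edges):
            for leaf, neighbor in ((u, v), (v, u)):
                if leaf not in colors and deg[leaf] == 1:
                    merged[leaf] = neighbor
                    del split_edges[i]
                    collapsed = True
                    break
            if collapsed:
                break
    return split_edges, colors, merged


# Hypothetical tree: colored interior node a, uncolored leaf b, uncolored interior d.
out_edges, out_colors, merged = simplify(
    [("a", "b", 1), ("a", "c", 2), ("a", "d", 3), ("d", "e", 4)],
    {"a": "x", "c": "y", "e": "z"},
)
```

After simplification every colored node touches at most one edge, which is the shape the lemma on the following slide works with.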
  • Slide 12
  • Examples of simplifications
  • Slide 13
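  • Lemma: Suppose m is a mother with edges $e_1, e_2, \ldots, e_d$ to leaf children $u_1, u_2, \ldots, u_d$. Assume that we have numbered the children in order of non-decreasing edge weight, i.e. $c_{e_1} \le c_{e_2} \le \cdots \le c_{e_d}$. Then there is a minimal coloring that cuts $e_1, e_2, \ldots, e_{d-1}$. (When the children's colors are distinct, m can share a color with at most one of them, so at least d - 1 of these edges must be cut anyway; the lemma says the lightest ones can be chosen.)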
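The algorithm slide itself did not survive in this transcript, so the following is only an assumed illustration of how the lemma might be applied repeatedly. It assumes the tree is already simplified (every leaf colored, every interior node uncolored) and that no color repeats, so only edge weights matter. At each mother it cuts the d - 1 lightest child edges; the mother and its remaining heaviest child then behave like a single leaf of that child's color, attached to the mother's parent with weight min(parent edge, heaviest child edge), since exactly one of those two edges must be cut if the colors differ. The names `distinct_color_cost`, `children`, and `weight` are hypothetical.

```python
def distinct_color_cost(children, weight, root):
    """Minimum total weight of multicolored edges, assuming distinct leaf colors.

    children: dict node -> list of child nodes (leaves have no entry)
    weight: dict node -> weight of the edge to its parent
    root: an uncolored interior node
    """
    total = 0

    def reduce(v):
        """Return the effective weight of the edge from v's reduced subtree upward."""
        nonlocal total
        kids = children.get(v, [])
        if not kids:
            return weight[v]  # a colored leaf keeps its own edge weight
        effs = sorted(reduce(u) for u in kids)
        total += sum(effs[:-1])  # the lemma: cut the d - 1 lightest child edges
        heaviest = effs[-1]
        if v == root:
            return heaviest  # the root simply takes the heaviest child's color
        # v plus its heaviest child act as one leaf attached to v's parent.
        return min(weight[v], heaviest)

    reduce(root)
    return total


# Hypothetical instance: root r has leaf a and mother m; m has leaves b and c.
children = {"r": ["m", "a"], "m": ["b", "c"]}
weight = {"m": 5, "a": 1, "b": 2, "c": 3}
```

Checking by hand, coloring m and r with c's color cuts only the edges to b and a, for a total of 3, and the sketch returns the same value. It computes only the cost; recovering the coloring itself would need extra bookkeeping.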
  • Slide 14
  • The algorithm
  • Slide 15
  • An example
  • Slide 16
  • Algorithm for repeated colors
  • Slide 17
  • Decompose the tree
  • Slide 18
  • Combining computation and communication costs: We can develop a richer model by extending the definition of a color to a triple consisting of P, the partitioning attribute; S, the sort attribute; and the indexing attribute.
  • Slide 19
  • The cost of a node: The cost of a node consists of the cost of recoloring the outputs of its children so that they have the colors required for its inputs, plus the cost of executing the strategy itself.
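A minimal sketch of this cost decomposition, with the extended color as a triple of partitioning, sort, and indexing attributes. The per-component cost formulas in `recolor_cost` (linear for repartitioning and index building, n log n for sorting) are purely hypothetical placeholders, not the paper's cost model, and they ignore interactions such as repartitioning destroying an existing sort order. `None` in a required color means that component is unconstrained.

```python
import math
from typing import NamedTuple, Optional, Sequence


class Color(NamedTuple):
    partition: Optional[str]  # partitioning attribute P
    sort: Optional[str]       # sort attribute S
    index: Optional[str]      # indexing attribute


def recolor_cost(have: Color, need: Color, size: float) -> float:
    """Hypothetical cost of converting an input of `size` tuples from `have` to `need`."""
    cost = 0.0
    if need.partition is not None and need.partition != have.partition:
        cost += size  # assumption: repartitioning ships every tuple once
    if need.sort is not None and need.sort != have.sort:
        cost += size * math.log2(max(size, 2))  # assumption: a comparison sort
    if need.index is not None and need.index != have.index:
        cost += size  # assumption: building an index is linear
    return cost


def node_cost(child_colors: Sequence[Color], child_sizes: Sequence[float],
              required: Sequence[Color], strategy_cost: float) -> float:
    """Recoloring cost of all children's outputs plus the strategy's own cost."""
    recolor = sum(recolor_cost(have, need, size)
                  for have, need, size in zip(child_colors, required, child_sizes))
    return recolor + strategy_cost


# Hypothetical merge join that needs both inputs partitioned and sorted on number.
need = Color("number", "number", None)
cost = node_cost(
    child_colors=[Color("number", None, None), Color("name", "number", None)],
    child_sizes=[8, 4],
    required=[need, need],
    strategy_cost=10,
)
```

The first input only needs sorting (8 · log2 8 = 24), the second only repartitioning (4), so the node costs 24 + 4 + 10 = 38 under these made-up formulas.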
  • Slide 20
  • Strategy: A strategy specifies a particular algorithm for computing an operator. It requires its inputs to satisfy certain constraints and guarantees certain properties of its output.
  • Slide 21
  • Constraints: We use color patterns to specify such input-output constraints. A constraint has the form $\text{Input}_1, \ldots, \text{Input}_n \rightarrow \text{Output}$, where each $\text{Input}_j$ and Output are color patterns.
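The slide does not spell out the pattern syntax, so the following sketch assumes one: each pattern is a (partition, sort, index) triple whose components are a concrete attribute, the wildcard `*` (no requirement, or no guarantee in an output), or a variable such as `?a` that must bind to the same attribute everywhere it appears. `apply_constraint` checks whether concrete input colors fit $\text{Input}_1, \ldots, \text{Input}_n$ and, if so, returns the output color the strategy guarantees. All names and the syntax are hypothetical.

```python
from typing import Dict, Optional, Sequence, Tuple

Color = Tuple[Optional[str], Optional[str], Optional[str]]  # (partition, sort, index)
Pattern = Tuple[str, str, str]

WILDCARD = "*"


def match(pattern: Pattern, color: Color, bindings: Dict[str, str]) -> bool:
    """Match one color against one pattern, extending `bindings` in place."""
    for p, c in zip(pattern, color):
        if p == WILDCARD:
            continue
        if p.startswith("?"):
            if c is None:
                return False  # a variable must bind to an actual attribute
            if bindings.setdefault(p, c) != c:
                return False  # same variable, different attribute
        elif p != c:
            return False
    return True


def instantiate(p: str, bindings: Dict[str, str]) -> Optional[str]:
    """Turn one output pattern component into a concrete value (None = no guarantee)."""
    if p == WILDCARD:
        return None
    if p.startswith("?"):
        return bindings.get(p)
    return p


def apply_constraint(inputs: Sequence[Pattern], output: Pattern,
                     colors: Sequence[Color]) -> Optional[Color]:
    """Return the guaranteed output color, or None if the inputs violate the constraint."""
    if len(inputs) != len(colors):
        return None
    bindings: Dict[str, str] = {}
    for pat, col in zip(inputs, colors):
        if not match(pat, col, bindings):
            return None
    return tuple(instantiate(p, bindings) for p in output)


# Hypothetical partitioned hash join: both inputs partitioned on the same attribute.
hash_join_in = [("?a", "*", "*"), ("?a", "*", "*")]
hash_join_out = ("?a", "*", "*")
```

For example, inputs colored `("number", None, None)` and `("number", "name", None)` satisfy the hash-join constraint and yield `("number", None, None)`, while inputs partitioned on different attributes yield `None`.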