Basics of Database Tuning - Simon Fraser University · • Primary index, comparison – Case A > v...

Query Processing

CMPT 454: Database II -- Query Processing 2

Query Processing Stepsσbalance<2500(∏balance(account))∏balance(σbalance<2500(account))Step 1

Step 2Step 3

SELECT balanceFROM accountWHERE balance < 2500

A B+-tree index on balance


Query Cost Measures

• Query processing as an optimization problem– Search space: all possible equivalent relational algebra

expressions all possible query execution plans– Goal: find the most efficient query execution

• Efficiency: I/O or CPU?– CPU processing time is often much smaller than I/O

cost, and is hard to estimate (real systems do consider)– Each I/O access cost may slightly different– Number of block transfers is a measure of the dominant

component of query answering cost– Communication cost in distributed database systems


Selection• Table-scan: read the blocks one by one from disk

– Linear search: scan each file block, • Cost = tS + br * tT, where br is the number of blocks in the file, tS is the average

seek time, and tT is the average block transfer time• Search on a key attribute: an average cost of br / 2• Can be used in any cases

– Binary search: the file is ordered on an attribute, the selection condition is an equality comparison on the attribute

• Cost: ⎡log2(br)⎤ * (tS + tT)• If the attribute is not a key, some extra blocks may need to be read

• Index-scan: using an index in selection– Primary index, equality on key: if a B+-tree is used, (hi + 1) * (tS + tT), where

hi is the height of the tree– Primary index, equality on nonkey: hi * (tT + tS) + tS + tT * b, where b is the

number of blocks containing records with the specified search key– Secondary index, equality on key: (hi + 1) * (tS + tT)– Secondary index, equality on nonkey: (hi + n) * (tT + tS), can be worse than

linear search


Selection Involving Comparisons• Primary index, comparison

– Case A > v or A ≥ v: use the index to find the first tuple having A ≥ v, scan the tuple up to the end of the file, cost hi * (tT + tS) + tS + tT * b

– Case A < v or A ≤ v: scan from the beginning of the file until the condition is violated, the index is not used

• Secondary index, comparison: use the index to find the pointers to the record, retrieve the data blocks– Sort pointers to ensure each block is read once– Can be costly if the selectivity of a query is low (i.e., many tuples satisfy the

condition)• Conjunction σθ1∧ θ2∧. . . θn(r)

– Selection and test using one index on one attribute of composite search key

– Selection by intersection of identifiers• Disjunction σθ1∨ θ2 ∨. . . θn (r) by union of identifiers


Nested-Loop Join• To compute the theta join r θ s

for each tuple tr in r do beginfor each tuple ts in s do begin

test whether pair (tr,ts) satisfies the join condition θif so, add tr • ts to the result

endend– r : the outer relation of the join– s : the inner relation of the join

• No indexes, can be used with any kind of join condition• Cost

– Worst case – only one block of each relation in memory: nr ∗ bs + br– Best case: both relations are in memory: br + bs– If one relation can fit entirely in main memory, use that relation as

the inner relation: br + bs


Block Nested-Loop Join• Idea: once a block is read into main memory, the records in the block

should be utilized as much as possiblefor each block Br of r do begin

for each block Bs of s do beginfor each tuple tr in Br do begin

for each tuple ts in Bs do begincheck if (tr,ts) satisfies the join condition if so, add tr • ts to the result

endend

endend

• Cost– Worst case – only one block for each relation : br ∗ bs + br– Best case: br + bs


Further Improvements• In natural join or equi-join, if the join attributes form a key on the inner

relation, then for each outer relation tuple, the inner loop can stop as soon as the first match is found

• In the block nested-loop algorithm, if M blocks are available, use M-2 blocks for outer relation (why?)– Total cost: ⎡br / (M-2)⎤ ∗ bs + br

• Scan the inner loop alternately forward and backward (similar to the elevator algorithm), reuse the blocks remaining in the buffer– How and why is it good?

• Indexed nested-loop join– An index exists on the inner loop’s join attribute – use the index lookups to

replace file scans– Cost: br (tT + tS) + nr ∗ c, where c is the cost of a single selection on s using

the join condition, nr is the number of records in r• If indices are available on the join attributes of both r and s, use the

relation with fewer tuples as the outer relation


Merge Join• Can be used only for equi-joins and natural joins• Sort both relations on their join attribute (if not already

sorted on the join attributes)• Merge the sorted relations to join them

– Join step is similar to the merge stage of the sort-merge algorithm

– Every pair with same value on join attribute must be matched

• Cost: br + bs block transfers + ⎡br / bb⎤ + ⎡bs / bb⎤seeks + the cost of sorting if relations are unsorted– After sorting, each block needs to be read only once– Suppose all tuples for any given value of the join

attributes fit in memory– Can be further improved by combining the merge

phase of merge-sort with the merge phase of merge-join – merge-join multiple sorted sublists


Hash Join: the Idea


Hash Join• For equi-joins and natural joins only• A hash function h depending only on the join attributes is

used to partition tuples of both relations, – h maps JoinAttrs values to {0, 1, ..., n}– r0, r1, . . ., rn are partitions of r tuples, each tuple tr ∈ r is put in

partition ri where i = h(tr [JoinAttrs])– s0,, s1. . ., sn are partitions of s tuples, each tuple ts ∈s is put in

partition si, where i = h(ts [JoinAttrs])– r tuples in ri need only to be compared with s tuples in si


Setting Parameters of Hash Joins

• Algorithm: relation s: build input, relation r: probe inputPartition the relation s using hashing function h, when partitioning a relation,

one block of memory is reserved as the output buffer for each partitionPartition r similarlyFor each i do

Load si into memory and build an in-memory hash index on it using the join attribute

This hash index uses a different hash function than the earlier one hRead the tuples in ri from the disk one by oneFor each tuple tr locate each matching tuple ts in si using the in-memory hash

indexOutput the concatenation of their attributes

• n and the hash function h is chosen such that each si should fit in memory– Use the smaller input relation as the build relation– The probe relation partitions ri need not fit in memory

• Typically n is chosen as ⎡bs/M⎤ * f where f is a “fudge factor”, typically around 1.2


Recursive Partitioning, Overflow• For number of partitions n is greater than number of pages M of memory,

instead of partitioning n ways, use M – 1 partitions for s• Further partition the M – 1 partitions using a different hash function, use same

partitioning method on r (Rarely required)• Hash table overflow: a partition cannot fit in memory

– Many tuples with same value for join attributes due to bad hash function– Partitioning is said skewed if some partitions have significantly more tuples than

some others• Overflow resolution in build phase

– Partition si is further partitioned using different hash function – Partition ri must be similarly partitioned

• Overflow avoidance– Performs partitioning carefully to avoid overflows during build phase– E.g. partition build relation into many partitions, then combine them

• Both approaches fail with large numbers of duplicates– Fallback option: use block nested loops join on overflowed partitions


Performance Analysis

• Without recursive partitioning: 3 (br + bs) + 4 ∗ nh• For recursive partitioning: 2 (br + bs) ⎡logM–1(bs) –

1⎤ + br + bs– Cost of partitioning s: ⎡logM–1(bs) – 1⎤– Similar cost for partitioning r– best to choose the smaller relation as the build relation

• If the entire build input can be kept in main memory, then do not partition the relations into temporary files– Cost: br + bs


Hybrid Hash Join

• Join the first partitions during partitioning the tables– Partition relation s, keep the first partition s0 in

main memory– Partition relation r, join tuples in r0 with s0 in

main memory– No need to store s0 and r0

• Most useful if M >> sb


Complex Joins

• Join with a conjunctive condition rθ1∧θ 2∧…∧θ n s– Either use nested loops/block nested loops, or– Compute the result of one of the simpler joins rθis

• Final result comprises those tuples in the intermediate result that satisfy the remaining conditions θ1∧...∧θi –1∧θi +1∧...∧θn

• Join with a disjunctive condition rθ1∨θ2∨...∨θns – Either use nested loops/block nested loops, or– Compute as the union of the records in individual joins

rθis• Compute (rθ1s) ∪ (rθ2s) ∪ ... ∪ (rθn s)


Sort-Based Algorithms

Operators Memory cost I/O costDuplicate

elimination, grouping and aggregation

SQRT(B) 3B

Union, intersection and difference SQRT(B(R)+B(S)) 3(B(R)+B(S))

Merge-join SQRT(MAX(B(R), B(S)) 5(B(R)+B(S))Merge-join (improved) SQRT(B(R)+B(S)) 3(B(R)+B(S))


Hash-Based Algorithms

Operators Memory cost I/O costDuplicate

elimination, grouping and aggregation

SQRT(B) 3B

Union, intersection and difference SQRT(MIN(B(R),B(S))) 3(B(R)+B(S))

Simple hash-join SQRT(MIN(B(R),B(S))) 3(B(R)+B(S))Hash-join (improved) SQRT(MIN(B(R),B(S)))


Duplicate Elimination Using Sorting

• Pass-1: sort tuples in sublists• Pass-2: use the available main memory to hold one block

from each sorted sublist, repeatedly copy one to the output and ignore all tuples identical to it

• I/O cost: 3B(R)– B(R) to read each block of R when creating the sorted sublists– B(R) to write each of the sorted sublists to disk– B(R) to read each block from the sublists back to generate the final

results • Memory usage:

– Each sublist can have up to M blocks (why?)– Up to M sublists can be processed in the second pass


ExampleMemory Disk1 2 2 2, 2 52 3 4 4, 4 51 1 2 3, 5

Sublists2, 5, 2, 1, 2, 2 4, 5, 4, 3, 4, 21, 5, 2, 1, 3

Sorting

Memory Disk2 2 2, 2 52 3 4 4, 4 52 3 5

Output “1”

Output“2”

Memory Disk53 4 4, 4 53 5

Memory Disk54 4 4 55

Output“3”

Output“4”

Memory Disk555

Output “5” Answer: 1, 2, 3, 4, 5


Duplicate Elimination Using Hashing

• Hash R to M-1 buckets– Two duplicate tuples will be hashed to the same

bucket• Eliminate duplicates in each bucket

– Assumption: each bucket can fit into main memory

– Memory usage:• I/O cost: 3B(R)

– B(R) in each of the three phases: reading in, writing hashing result, processing each bucket

)(RB


Grouping and Aggregation

• The sort-based method – similar to duplicate elimination– Please study the algorithm and analysis by

yourself• The hash-based method

– Use a hash function depending only on the grouping attributes to hash all tuples to (M-1) buckets

– Scan each bucket once to compute groups


Union Algorithms

• How to compute R∪S?• The sort-based algorithm

– Sort R and S respectively using the same order– Merge the sorted sublists, remove duplicates

• The hash-based method– Hashing S and R into buckets R1, …, RM-1, and S1, …,

SM-1 using the same hash function– Compute Ri∪Si, get the union of the buckets

• Intersection and difference can be computed in a similar way

Basics of Database Tuning - Simon Fraser University · • Primary index, comparison – Case A > v...

Documents

Transcript of Basics of Database Tuning - Simon Fraser University · • Primary index, comparison – Case A > v...