ULDBs: Databases with Uncertainty and Lineage O. Benjelloun, A. Das Sarma, A. Halevy, J. Widom.
Running Example: Crime-Solving
• Saw(witness,car) // may be uncertain
• Drives(person,car) // may be uncertain
• Suspects(person) = πperson(Saw ⋈ Drives)
Model for Uncertainty
1. X-Tuples
– more expressive than or-attributes
2. ‘?’ (Maybe) Annotations
Our Model for Uncertainty
• 1. X-Tuples: uncertainty about value
• 2. ‘?’ (Maybe) Annotations
Saw (witness,car)
(Amy, Honda) ∥ (Amy, Toyota) ∥ (Amy, Mazda)
=
witness car
Amy { Honda, Toyota, Mazda }
Three possible instances
Our Model for Uncertainty
• 1. X-Tuples: uncertainty about value
• 2. ‘?’ (Maybe) Annotations
Saw (witness,car)
(Amy, Honda) ∥ (Sally, Toyota) ∥ (Amy, Mazda)
Three possible instances
Not expressible using or-attributes
Our Model for Uncertainty
• 1. X-Tuples
• 2. ‘?’ (Maybe): uncertainty about presence
Saw (witness,car)
(Amy, Honda) ∥ (Amy, Toyota) ∥ (Amy, Mazda)
(Betty, Acura) ?
Six possible instances
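The instance counts on these slides can be checked by brute force. The following is a minimal sketch, assuming each x-tuple is encoded as a list of alternatives plus a boolean for its ‘?’ annotation; the function name `possible_instances` and this encoding are illustrative choices, not part of Trio.

```python
from itertools import product

def possible_instances(xtuples):
    """Enumerate the possible instances of a relation made of x-tuples.

    xtuples is a list of (alternatives, maybe) pairs (an assumed encoding):
    alternatives is a list of tuples, maybe is True when the x-tuple has a '?'.
    """
    choices = []
    for alternatives, maybe in xtuples:
        options = list(alternatives)
        if maybe:
            options.append(None)  # None stands for "x-tuple absent"
        choices.append(options)
    # Pick exactly one option per x-tuple and drop the absent ones.
    return {frozenset(t for t in pick if t is not None)
            for pick in product(*choices)}

saw = [
    ([("Amy", "Honda"), ("Amy", "Toyota"), ("Amy", "Mazda")], False),
    ([("Betty", "Acura")], True),
]
instances = possible_instances(saw)
print(len(instances))  # 3 choices for Amy x 2 for Betty = 6
```

Dropping the maybe-tuple (`saw[:1]`) leaves only the three instances of the first slide.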
Our Model is Not Closed
Saw (witness,car)
(Cathy, Honda) ∥ (Cathy, Mazda)
Drives (person,car)
(Jimmy, Toyota) ∥ (Jimmy, Mazda)
(Billy, Honda) ∥ (Frank, Honda)
(Hank, Honda)
Suspects = πperson(Saw ⋈ Drives)
Suspects
Jimmy ?
Billy ∥ Frank ?
Hank ?
Does not (and CANNOT) correctly capture the possible instances in the result
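The mismatch can be made concrete by enumerating both sides. This is a sketch under the same assumed encoding (a list of alternatives plus a ‘?’ flag per x-tuple); the helper names are hypothetical. It compares the Suspects instances actually produced by the query with the instances described by the x-tuple table above.

```python
from itertools import product

def possible_instances(xtuples):
    """All possible instances of a list of (alternatives, maybe) x-tuples."""
    choices = [list(alts) + ([None] if maybe else []) for alts, maybe in xtuples]
    return {frozenset(t for t in pick if t is not None)
            for pick in product(*choices)}

saw = [([("Cathy", "Honda"), ("Cathy", "Mazda")], False)]
drives = [
    ([("Jimmy", "Toyota"), ("Jimmy", "Mazda")], False),
    ([("Billy", "Honda"), ("Frank", "Honda")], False),
    ([("Hank", "Honda")], False),
]

def suspects(saw_instance, drives_instance):
    """Project person out of the join of Saw and Drives on car."""
    return frozenset(person
                     for (_, seen) in saw_instance
                     for (person, driven) in drives_instance
                     if seen == driven)

# Results the query really produces, one per combination of base instances.
true_results = {suspects(s, d)
                for s in possible_instances(saw)
                for d in possible_instances(drives)}

# Instances described by the Suspects x-tuples without lineage.
candidate = [(["Jimmy"], True), (["Billy", "Frank"], True), (["Hank"], True)]
candidate_results = possible_instances(candidate)

print(len(true_results), len(candidate_results))
print(frozenset({"Jimmy", "Hank"}) in candidate_results)
print(frozenset({"Jimmy", "Hank"}) in true_results)
```

The table admits instances such as {Jimmy, Hank}, which no combination of Saw and Drives instances can produce, because Jimmy requires the Mazda sighting and Hank requires the Honda sighting.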
Lineage
Lineage to the Rescue
• Lineage
– Captures “where data came from”
– In Trio: A function λ from alternatives to other alternatives (or external sources)
• Model, with lineage, is complete
– proof omitted
Example with Lineage
ID Saw (witness,car)
11 (Cathy, Honda) ∥ (Cathy, Mazda)
ID Drives (person,car)
21 (Jimmy, Toyota) ∥ (Jimmy, Mazda)
22 (Billy, Honda) ∥ (Frank, Honda)
23 (Hank, Honda)
Suspects = πperson(Saw ⋈ Drives)
ID Suspects
31 Jimmy ?
32 Billy ∥ Frank ?
33 Hank ?
λ(31) = (11,2),(21,2)
λ(32,1) = (11,1),(22,1); λ(32,2) = (11,1),(22,2)
λ(33) = (11,1), 23
Correctly captures possible instances in the result
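To see why lineage fixes the problem, the Suspects instances can be derived from each choice of base alternatives: a Suspects alternative is present exactly when everything in its lineage was chosen. The sketch below assumes symbols are encoded as (tuple ID, alternative index) pairs and writes the single-alternative entries 31, 33 and 23 as (31,1), (33,1) and (23,1); the variable names are illustrative.

```python
from itertools import product

# Number of alternatives of each base x-tuple (Saw 11, Drives 21-23).
base_alternatives = {11: 2, 21: 2, 22: 2, 23: 1}

suspects = {(31, 1): "Jimmy", (32, 1): "Billy", (32, 2): "Frank", (33, 1): "Hank"}
lineage = {
    (31, 1): {(11, 2), (21, 2)},
    (32, 1): {(11, 1), (22, 1)},
    (32, 2): {(11, 1), (22, 2)},
    (33, 1): {(11, 1), (23, 1)},  # assumption: "23" on the slide means (23,1)
}

def suspects_instances():
    """Suspects instance for every choice of one alternative per base x-tuple."""
    per_tuple = [[(i, j) for j in range(1, n + 1)]
                 for i, n in base_alternatives.items()]
    results = set()
    for pick in product(*per_tuple):
        chosen = set(pick)
        # Keep an alternative only if its whole lineage was chosen.
        results.add(frozenset(name for alt, name in suspects.items()
                              if lineage[alt] <= chosen))
    return results

results = suspects_instances()
for instance in sorted(results, key=sorted):
    print(sorted(instance))
```

Only four Suspects instances survive: the empty one, {Jimmy}, {Billy, Hank} and {Frank, Hank}.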
Example: What is the result of joining these tables?
ID Saw(Witness, Car)
21 (Amy, Mazda)||(Amy, Toyota)
23 (Betty, Honda)
?
ID Drives(Person, Car)
31 (Jimmy, Mazda)
32 (Jimmy, Toyota)
33 (Billy, Mazda)
34 (Billy, Honda)
What is a legal instance of a ULDB?
• Each tuple t in a ULDB is associated by λ with a set of pairs (i,j) such that the j-th alternative of the i-th tuple was used to derive t
ID Suspects
31 Jimmy ?
32 Billy ∥ Frank ?
33 Hank ?
λ(31) = (11,2),(21,2)
λ(32,1) = (11,1),(22,1); λ(32,2) = (11,1),(22,2)
λ(33) = (11,1), 23
What is a legal instance of a ULDB?
• Let S be the set of all symbols (i.e., pairs (i,j)) in the database
• An instance of D is derived by picking a set S’ ⊆ S such that
– if (i,j) ∈ S’ then for every j’ ≠ j, (i,j’) ∉ S’
– ∀ (i,j) ∈ S’, λ(i,j) ⊆ S’
– if, for some x-tuple ti, there does not exist an (i,j) ∈ S’, then ti is a maybe-tuple and for all (i,j’) ∈ ti, either λ(i,j’) = ∅ or λ(i,j’) ⊈ S’
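The three conditions translate directly into a brute-force checker over subsets of S. This is a sketch, assuming the crime-solving example with symbols encoded as (tuple ID, alternative index) pairs and the slide's "23" read as (23,1); the names `is_legal` and `legal_instances` are hypothetical, and enumerating all subsets is only feasible for toy databases.

```python
from itertools import combinations

# Crime example: alternatives per x-tuple (Saw 11, Drives 21-23, Suspects 31-33).
alternatives = {11: 2, 21: 2, 22: 2, 23: 1, 31: 1, 32: 2, 33: 1}
maybe = {31, 32, 33}  # x-tuples annotated with '?'
lineage = {
    (31, 1): {(11, 2), (21, 2)},
    (32, 1): {(11, 1), (22, 1)},
    (32, 2): {(11, 1), (22, 2)},
    (33, 1): {(11, 1), (23, 1)},
}

def lam(symbol):
    """Lineage of a symbol; base alternatives have empty lineage."""
    return lineage.get(symbol, set())

def is_legal(chosen):
    for i, n in alternatives.items():
        present = [j for j in range(1, n + 1) if (i, j) in chosen]
        if len(present) > 1:      # condition 1: at most one alternative
            return False
        if not present:           # condition 3: absence must be justified
            if i not in maybe:
                return False
            for j in range(1, n + 1):
                if lam((i, j)) and lam((i, j)) <= chosen:
                    return False
    # condition 2: the lineage of every chosen symbol is chosen too
    return all(lam(s) <= chosen for s in chosen)

def legal_instances():
    symbols = [(i, j) for i, n in alternatives.items() for j in range(1, n + 1)]
    found = []
    for size in range(len(symbols) + 1):
        for subset in combinations(symbols, size):
            if is_legal(frozenset(subset)):
                found.append(frozenset(subset))
    return found

legal = legal_instances()
print(len(legal))  # one legal instance per choice of base alternatives
```

Because no base x-tuple here carries a ‘?’, each of the 2 × 2 × 2 × 1 choices of base alternatives fixes the Suspects symbols uniquely.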
Example: What are all legal instances of the following ULDB?
ID Accuses(Witness, Person)
41 (Amy, Jimmy)
42 (Amy, Jimmy)
43 (Amy, Billy)
44 (Betty, Billy)
λ(41,1) = {(21,1),(31,1)}
λ(42,1) = {(21,2),(32,1)}
λ(43,1) = {(21,1),(33,1)}
λ(44,1) = {(23,1),(34,1)}
ID Saw(Witness, Car)
21 (Amy, Mazda)||(Amy, Toyota)
23 (Betty, Honda)
?
ID Drives(Person, Car)
31 (Jimmy, Mazda)
32 (Jimmy, Toyota)
33 (Billy, Mazda)
34 (Billy, Honda)
Well-Behaved Lineage
• In principle, λ may be any function
– λ* is the transitive closure of λ
• However, useful to restrict λ to be well behaved:
– Acyclic: ∀ (i,j), (i,j) ∉ λ*(i,j)
– Deterministic: ∀ (i,j), (i,j’), if j ≠ j’ then either λ(i,j) ≠ λ(i,j’) or λ(i,j) = ∅
– Uniform: ∀ (i,j), (i,j’), B(i,j) = B(i,j’) where B(i,j) = {k | ∃ l, (k,l) ∈ λ(i,j)}
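The three restrictions can be tested mechanically. The sketch below assumes lineage is stored as a dictionary from symbols (tuple ID, alternative index) to sets of symbols, with missing keys meaning empty lineage; all function names are illustrative. It is checked against the crime-solving lineage and a small hypothetical cyclic lineage.

```python
def closure(lineage, symbol):
    """All symbols reachable from `symbol` through lineage (lambda*)."""
    seen, stack = set(), list(lineage.get(symbol, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(lineage.get(current, ()))
    return seen

def is_acyclic(symbols, lineage):
    return all(s not in closure(lineage, s) for s in symbols)

def is_deterministic(symbols, lineage):
    # Two alternatives of one x-tuple may not share a non-empty lineage.
    for (i, j) in symbols:
        for (k, l) in symbols:
            if i == k and j != l:
                first = set(lineage.get((i, j), ()))
                if first and first == set(lineage.get((k, l), ())):
                    return False
    return True

def is_uniform(symbols, lineage):
    # B(i,j): the x-tuple IDs appearing in the lineage of alternative (i,j).
    sources = {}
    for (i, j) in symbols:
        b = frozenset(k for (k, _) in lineage.get((i, j), ()))
        sources.setdefault(i, set()).add(b)
    return all(len(bs) == 1 for bs in sources.values())

def is_well_behaved(symbols, lineage):
    return (is_acyclic(symbols, lineage)
            and is_deterministic(symbols, lineage)
            and is_uniform(symbols, lineage))

crime_symbols = [(11, 1), (11, 2), (21, 1), (21, 2), (22, 1), (22, 2),
                 (23, 1), (31, 1), (32, 1), (32, 2), (33, 1)]
crime_lineage = {
    (31, 1): {(11, 2), (21, 2)},
    (32, 1): {(11, 1), (22, 1)},
    (32, 2): {(11, 1), (22, 2)},
    (33, 1): {(11, 1), (23, 1)},
}
# Hypothetical lineage in which two symbols derive each other.
cyclic_symbols = [(1, 1), (2, 1)]
cyclic_lineage = {(1, 1): {(2, 1)}, (2, 1): {(1, 1)}}

print(is_well_behaved(crime_symbols, crime_lineage))
print(is_acyclic(cyclic_symbols, cyclic_lineage))
```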
Example: Is this ULDB Well-Behaved?
ID A
11 apple
12 pear
ID B
21 red
22 green
λ(11,1) = {(21,1)}
λ(21,1) = {(11,1)}
Example: Is this ULDB Well-Behaved?
ID A
11 apple
12 pear
ID B
21 red || green
22 green
λ(21,1) = {(11,1)} λ(21,2) = {(11,1)}
Example: Is this ULDB Well-Behaved?
ID A
11 apple || peach
12 pear || grape
ID B
21 red || pink
22 green || purple
λ(21,1) = {(11,1)} λ(21,2) = {(11,2)}
λ(22,1) = {(12,1)} λ(21,2) = {(11,2)}
Querying
• How do we query a ULDB?
• What tuples are in the answer?
• How is the lineage of the answer defined?
– for join?
– projection?
– minus?
• Only consider projection, multiset selection, join, multiset union
– why?
Query Evaluation Algorithm
• Given: ULDB D and query Q
• Step 1: Create D’, an ordinary database derived by taking all alternatives of all tuples
ID Saw(Witness, Car)
21 (Amy, Mazda)||(Amy, Toyota)
23 (Betty, Honda)
ID Saw(Witness, Car)
21, 1 (Amy, Mazda)
21, 2 (Amy, Toyota)
23, 1 (Betty, Honda)
Query Evaluation Algorithm
• Step 2: Evaluate the query normally
ID Saw(Witness, Car)
21, 1 (Amy, Mazda)
21, 2 (Amy, Toyota)
23, 1 (Betty, Honda)
ID Accuses(Witness, Person)
41 (Amy, Jimmy)
42 (Amy, Jimmy)
43 (Amy, Billy)
44 (Betty, Billy)
⋈
Query Evaluation Algorithm
• Step 3: Group the tuples in the result by the tuple identifiers (the i values) in their lineage, as derived by the evaluation
• Step 4: For each group of tuple identifiers
– create a maybe-tuple tl with all tuples in the group as alternatives
– set lineage as derived by the evaluation
• Note: all tuples created are maybe-tuples!!
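The four steps can be sketched for a single join. The code below is an illustration under stated assumptions, not Trio's implementation: relations are dictionaries from tuple ID to a list of alternatives, the join is on the car attribute of the Saw and Drives tables from the earlier example, any ‘?’ on the inputs is ignored, and result IDs are numbered from a caller-supplied `first_id`. The function names are hypothetical.

```python
from collections import defaultdict

def flatten(relation):
    """Step 1: one ordinary tuple per alternative, tagged with (i, j)."""
    return [(i, j, alt)
            for i, alts in relation.items()
            for j, alt in enumerate(alts, start=1)]

def join_on_car(saw, drives, first_id):
    groups = defaultdict(list)
    # Step 2: evaluate the join normally over the flattened relations.
    for si, sj, (witness, seen) in flatten(saw):
        for di, dj, (person, driven) in flatten(drives):
            if seen == driven:
                # Step 3: group by the tuple identifiers in the lineage.
                groups[(si, di)].append(
                    ((witness, person), {(si, sj), (di, dj)}))
    # Step 4: one maybe-tuple per group, with lineage per alternative.
    result, lineage = {}, {}
    for new_id, key in enumerate(sorted(groups), start=first_id):
        result[new_id] = [alt for alt, _ in groups[key]]
        for j, (_, lam) in enumerate(groups[key], start=1):
            lineage[(new_id, j)] = lam
    return result, lineage

saw = {21: [("Amy", "Mazda"), ("Amy", "Toyota")], 23: [("Betty", "Honda")]}
drives = {31: [("Jimmy", "Mazda")], 32: [("Jimmy", "Toyota")],
          33: [("Billy", "Mazda")], 34: [("Billy", "Honda")]}

accuses, accuses_lineage = join_on_car(saw, drives, first_id=41)
for tuple_id, alts in accuses.items():
    print(tuple_id, alts, sorted(accuses_lineage[(tuple_id, 1)]))
```

Every x-tuple in `accuses` should be read as carrying a ‘?’, as the note above says.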
Examples
• Complete example from previous slides
• Compute the result of the query:
– (R(A,B) ⋈ S(B,C)) ∪ T(D,E)
ID R(A,B)
11 (1,2) || (1,3)
12 (4,1) || (5,1)
ID S(B,C)
11 (2,4) || (2,5)
12 (1,3) || (2,3)
ID T(D,E)
11 (7,8)
12 (9,10) || (9,11)
Minimality
• ULDBs may contain superfluous information
• Two types of minimality:
– data minimality: ? may be unneeded, entire tuple may be unneeded
– lineage minimality
Data Minimality: Example 1
ID Saw(Witness, Car)
21 (Amy, Mazda)||(Amy, Toyota)
23 (Betty, Honda)
?
ID Drives(Person, Car)
31 (Jimmy, Mazda)
32 (Jimmy, Toyota)
33 (Billy, Mazda)
34 (Billy, Honda)
ID Suspects
31 Jimmy ?
32 Billy ∥ Frank ?
33 Hank ?
λ(31) = (11,2),(21,2)
λ(32,1) = (11,1),(22,1); λ(32,2) = (11,1),(22,2)
λ(33) = (11,1), 23
Which ? is not needed?
Data Minimality: Example 2
• What is unneeded in the result of the following query:
– (Saw ⋈ Car1) ⋈witness (Saw ⋈ Car2)
ID Saw(Witness, Car)
1 (Amy, Mazda)||(Amy, Toyota)
ID Car1(Car)
2 Mazda
ID Car2(Car)
3 Toyota
Data-Minimality: Formally
• An alternative (i,j) is extraneous if removing
it from the relation does not change the set
of possible instances
• A ? on a tuple is extraneous if removing it
does not change the set of possible
instances
Checking for Data-Minimality
• Theorem: Let D be a well-behaved ULDB. An alternative (k,l) is extraneous if and only if there exist (i,j), (i,j’) ∈ λ*(k,l) with j ≠ j’
– Proof?
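The test in the theorem is easy to run once lineage is followed transitively. The sketch below assumes the condition is read over the transitive closure λ*, and uses the query from Data Minimality: Example 2 with hypothetical IDs 4, 5 and 6 for the two intermediate tuples and the final result; the function names are illustrative.

```python
def closure(lineage, symbol):
    """All symbols reachable from `symbol` through lineage (lambda*)."""
    seen, stack = set(), list(lineage.get(symbol, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(lineage.get(current, ()))
    return seen

def is_extraneous(lineage, symbol):
    """True if the closure holds two different alternatives of one x-tuple."""
    chosen = {}
    for (i, j) in closure(lineage, symbol):
        # setdefault returns the alternative already recorded for x-tuple i.
        if chosen.setdefault(i, j) != j:
            return True
    return False

# Saw is x-tuple 1, Car1 is 2, Car2 is 3; IDs 4-6 are assumed for illustration.
lineage = {
    (4, 1): {(1, 1), (2, 1)},  # (Amy, Mazda) in Saw joined with Car1
    (5, 1): {(1, 2), (3, 1)},  # (Amy, Toyota) in Saw joined with Car2
    (6, 1): {(4, 1), (5, 1)},  # the two intermediate tuples joined on witness
}

print(is_extraneous(lineage, (6, 1)))
print(is_extraneous(lineage, (4, 1)))
```

The final alternative depends on both (1,1) and (1,2), which can never be chosen together, so it appears in no possible instance.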
Checking for Data-Minimality
• Let h(t) be the set of base tuples of t
– tuples that are used to derive an alternative in t, which have empty lineage
• Let m(t) be the number of alternatives of t that are not extraneous
• Theorem: Let D be a well-behaved ULDB. A ? on an x-tuple t ∈ D is extraneous if and only if:
– none of the tuples in h(t) have a ?
– m(t) = ∏ t’ ∈ h(t) m(t’)
Test Yourself
• Go back to slides 28-29 and prove what is extraneous, using the characterizations
Tuple Membership Problems
Tuple Membership, Tuple Certainty
• Recall that:
– The tuple membership problem is to determine if a tuple is a member of some instance of the ULDB
– The tuple certainty problem is to determine if a tuple is a member of every instance of the ULDB
• How would you answer tuple membership? Tuple certainty? What is the complexity of these problems?
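As a baseline for thinking about these questions, both problems can be answered by enumerating possible instances. This is a brute-force sketch for a single base relation without lineage, assuming x-tuples are encoded as (alternatives, maybe) pairs; the function names are hypothetical, and the enumeration grows exponentially with the number of x-tuples, so it is a reference point rather than a proposed solution.

```python
from itertools import product

def possible_instances(xtuples):
    """All possible instances of a list of (alternatives, maybe) x-tuples."""
    choices = [list(alts) + ([None] if maybe else []) for alts, maybe in xtuples]
    return {frozenset(t for t in pick if t is not None)
            for pick in product(*choices)}

def is_member(t, xtuples):
    """Tuple membership: t appears in some possible instance."""
    return any(t in instance for instance in possible_instances(xtuples))

def is_certain(t, xtuples):
    """Tuple certainty: t appears in every possible instance."""
    return all(t in instance for instance in possible_instances(xtuples))

drives = [
    ([("Jimmy", "Toyota"), ("Jimmy", "Mazda")], False),
    ([("Billy", "Honda"), ("Frank", "Honda")], False),
    ([("Hank", "Honda")], False),
]

print(is_member(("Jimmy", "Mazda"), drives), is_certain(("Jimmy", "Mazda"), drives))
print(is_member(("Hank", "Honda"), drives), is_certain(("Hank", "Honda"), drives))
```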