ULDBs: Databases with Uncertainty and Lineage O. Benjelloun, A. Das Sarma, A. Halevy, J. Widom.

Page 1

ULDBs: Databases with Uncertainty and Lineage

O. Benjelloun, A. Das Sarma, A. Halevy, J. Widom

Page 2

Running Example: Crime-Solving

• Saw(witness,car) // may be uncertain

• Drives(person,car) // may be uncertain

• Suspects(person) = π_person(Saw ⋈ Drives)

Page 3

Model for Uncertainty

Page 4

Model for Uncertainty

1. X-Tuples

– more expressive than or-attributes

2. ‘?’ (Maybe) Annotations

Page 5

Our Model for Uncertainty

• 1. X-Tuples: uncertainty about value

• 2. ‘?’ (Maybe) Annotations

Saw (witness,car)

(Amy, Honda) ∥ (Amy, Toyota) ∥ (Amy, Mazda)

Equivalent or-attribute form:

witness   car

Amy       { Honda, Toyota, Mazda }

Three possible instances

Page 6

Our Model for Uncertainty

• 1. X-Tuples: uncertainty about value

• 2. ‘?’ (Maybe) Annotations

Saw (witness,car)

(Amy, Honda) ∥ (Sally, Toyota) ∥ (Amy, Mazda)

Three possible instances

Not expressible using or-attributes

Page 7

Our Model for Uncertainty

• 1. X-Tuples

• 2. ‘?’ (Maybe): uncertainty about presence

Saw (witness,car)

(Amy, Honda) ∥ (Amy, Toyota) ∥ (Amy, Mazda)

(Betty, Acura) ?

Six possible instances
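
The count of six can be checked by brute force. The following is a minimal sketch, not Trio's implementation: it assumes a relation is a list of x-tuples, each given as a list of alternatives plus a flag for the ‘?’ annotation. The names `saw` and `possible_instances` are illustrative only.

```python
from itertools import product

# Each x-tuple: (list of alternatives, has_question_mark).
# This encoding is an assumption made for illustration.
saw = [
    ([("Amy", "Honda"), ("Amy", "Toyota"), ("Amy", "Mazda")], False),
    ([("Betty", "Acura")], True),
]

def possible_instances(relation):
    """Enumerate every possible instance of a relation without lineage."""
    choices = []
    for alternatives, is_maybe in relation:
        options = list(alternatives)
        if is_maybe:
            options.append(None)  # None stands for "x-tuple absent"
        choices.append(options)
    # Pick exactly one option per x-tuple and drop the absent ones.
    return [[t for t in pick if t is not None] for pick in product(*choices)]

instances = possible_instances(saw)
print(len(instances))  # 6 = 3 alternatives x (present or absent)
```

Each x-tuple contributes one independent choice, so the number of instances is the product of the per-tuple option counts. This independence is exactly what breaks down once query results are stored in the same model.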

Page 8

Our Model is Not Closed

Saw (witness,car)

(Cathy, Honda) ∥ (Cathy, Mazda)

Drives (person,car)

(Jimmy, Toyota) ∥ (Jimmy, Mazda)

(Billy, Honda) ∥ (Frank, Honda)

(Hank, Honda)

Suspects

Jimmy ?

Billy ∥ Frank ?

Hank ?

Suspects = π_person(Saw ⋈ Drives)

Does not (CANNOT) correctly capture possible instances in the result

Page 9

Lineage

Page 10

Lineage to the Rescue

• Lineage

– Captures “where data came from”

– In Trio: A function λ from alternatives to other alternatives (or external sources)

• Model, with lineage, is complete

– proof omitted

Page 11

Example with Lineage

ID Saw (witness,car)

11 (Cathy, Honda) ∥ (Cathy, Mazda)

ID Drives (person,car)

21 (Jimmy, Toyota) ∥ (Jimmy, Mazda)

22 (Billy, Honda) ∥ (Frank, Honda)

23 (Hank, Honda)

ID Suspects

31 Jimmy ?

32 Billy ∥ Frank ?

33 Hank ?

Suspects = π_person(Saw ⋈ Drives)

λ(31) = (11,2),(21,2)

λ(32,1) = (11,1),(22,1); λ(32,2) = (11,1),(22,2)

λ(33) = (11,1), 23

Correctly captures possible instances in the result

Page 12

Example: What is the result of joining these tables?

ID Saw(Witness, Car)

21 (Amy, Mazda)||(Amy, Toyota)

23 (Betty, Honda) ?

ID Drives(Person, Car)

31 (Jimmy, Mazda)

32 (Jimmy, Toyota)

33 (Billy, Mazda)

34 (Billy, Honda)

Page 13

What is a legal instance of a ULDB?

• Each tuple t in a ULDB is associated by λ with a set of pairs (i,j) such that the j-th alternative of the i-th tuple was used to derive t

ID Suspects

31 Jimmy ?

32 Billy ∥ Frank ?

33 Hank ?

λ(31) = (11,2),(21,2)

λ(32,1) = (11,1),(22,1); λ(32,2) = (11,1),(22,2)

λ(33) = (11,1), 23

Page 14

What is a legal instance of a ULDB?

• Let S be the set of all symbols (i.e., pairs (i,j)) in the database

• An instance of D is derived by picking a set S’ ⊆ S such that

– if (i,j) ∈ S’ then for every j’ ≠ j, (i,j’) ∉ S’

– ∀ (i,j) ∈ S’, λ(i,j) ⊆ S’

– if, for some x-tuple t_i, there does not exist an (i,j) ∈ S’, then t_i is a maybe-tuple and for all (i,j’) ∈ t_i, either λ(i,j’) = ∅ or λ(i,j’) ⊄ S’
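
The three conditions can be applied mechanically. The sketch below is an illustrative brute-force enumeration, not the authors' algorithm. It uses the Saw/Drives/Suspects example from the lineage slides and assumes that λ(31) and λ(33) refer to alternative 1 and that the bare 23 means (23,1). The function names and the dictionary encoding are hypothetical.

```python
from itertools import product

# x-tuple id -> (number of alternatives, has '?'); encoding is an assumption.
xtuples = {
    11: (2, False), 21: (2, False), 22: (2, False), 23: (1, False),
    31: (1, True), 32: (2, True), 33: (1, True),
}
# lineage: symbol (i, j) -> set of symbols; a missing key means empty lineage.
lineage = {
    (31, 1): {(11, 2), (21, 2)},
    (32, 1): {(11, 1), (22, 1)},
    (32, 2): {(11, 1), (22, 2)},
    (33, 1): {(11, 1), (23, 1)},
}

def is_legal(selected, xtuples, lineage):
    """Check the three conditions for a candidate set S' of symbols."""
    ids = [i for i, _ in selected]
    # Condition 1: at most one alternative per x-tuple.
    if len(ids) != len(set(ids)):
        return False
    # Condition 2: the lineage of every selected symbol is also selected.
    if any(not lineage.get(s, set()) <= selected for s in selected):
        return False
    # Condition 3: an absent x-tuple must be a maybe-tuple, and none of its
    # alternatives may have non-empty lineage fully contained in S'.
    for i, (n, is_maybe) in xtuples.items():
        if i in ids:
            continue
        if not is_maybe:
            return False
        for j in range(1, n + 1):
            lam = lineage.get((i, j), set())
            if lam and lam <= selected:
                return False
    return True

def legal_instances(xtuples, lineage):
    """Try every choice of 'one alternative or none' per x-tuple."""
    ids = sorted(xtuples)
    options = [[None] + list(range(1, xtuples[i][0] + 1)) for i in ids]
    found = []
    for pick in product(*options):
        selected = {(i, j) for i, j in zip(ids, pick) if j is not None}
        if is_legal(selected, xtuples, lineage):
            found.append(selected)
    return found

instances = legal_instances(xtuples, lineage)
print(len(instances))  # 8: one per combination of base-tuple choices
```

In this example the derived Suspects tuples are fully determined by the choices made for Saw and Drives, which is why the eight instances correspond one-to-one with the 2 x 2 x 2 x 1 base choices.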

Page 15

Example: What are all legal instances of the following ULDB?

ID Saw(Witness, Car)

21 (Amy, Mazda) || (Amy, Toyota)

23 (Betty, Honda) ?

ID Drives(Person, Car)

31 (Jimmy, Mazda)

32 (Jimmy, Toyota)

33 (Billy, Mazda)

34 (Billy, Honda)

ID Accuses(Witness, Person)

41 (Amy, Jimmy) ?    λ(41,1) = {(21,1),(31,1)}

42 (Amy, Jimmy) ?    λ(42,1) = {(21,2),(32,1)}

43 (Amy, Billy) ?    λ(43,1) = {(21,1),(33,1)}

44 (Betty, Billy) ?  λ(44,1) = {(23,1),(34,1)}

Page 16

Well-Behaved Lineage

• In principle, λ may be any function

– λ* is the transitive closure of λ

• However, useful to restrict λ to be well behaved:

– Acyclic: ∀ (i,j), (i,j) ∉ λ*(i,j)

– Deterministic: ∀ (i,j), (i,j’), if j ≠ j’ then either λ(i,j) ≠ λ(i,j’) or λ(i,j) = ∅

– Uniform: ∀ (i,j), (i,j’), B(i,j) = B(i,j’) where B(i,j) = {k | ∃ l, (k,l) ∈ λ(i,j)}
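
The three properties translate directly into checks over the lineage function. This is a minimal sketch under the assumption that lineage is a dictionary from symbols (i,j) to sets of symbols, with a missing key meaning empty lineage. The function names are hypothetical, and the two test inputs mirror the "apple/pear" examples with one and two alternatives for tuple 21.

```python
def transitive_closure(lineage, s):
    """Return λ*(s): every symbol reachable from s by following lineage."""
    seen = set()
    stack = list(lineage.get(s, ()))
    while stack:
        cur = stack.pop()
        if cur not in seen:
            seen.add(cur)
            stack.extend(lineage.get(cur, ()))
    return seen

def is_acyclic(symbols, lineage):
    """No symbol appears in its own transitive lineage."""
    return all(s not in transitive_closure(lineage, s) for s in symbols)

def is_deterministic(symbols, lineage):
    """Distinct alternatives of one x-tuple never share non-empty lineage."""
    for (i, j) in symbols:
        for (i2, j2) in symbols:
            if i == i2 and j != j2:
                lam = lineage.get((i, j), set())
                if lam and lam == lineage.get((i2, j2), set()):
                    return False
    return True

def is_uniform(symbols, lineage):
    """All alternatives of one x-tuple derive from the same x-tuple ids."""
    def source_ids(s):
        return {k for k, _ in lineage.get(s, ())}
    return all(source_ids(a) == source_ids(b)
               for a in symbols for b in symbols if a[0] == b[0])

# Mutual lineage between (11,1) and (21,1): a cycle.
cyclic_symbols = [(11, 1), (12, 1), (21, 1), (22, 1)]
cyclic_lineage = {(11, 1): {(21, 1)}, (21, 1): {(11, 1)}}

# Two alternatives of tuple 21 with identical lineage: not deterministic.
shared_symbols = [(11, 1), (12, 1), (21, 1), (21, 2), (22, 1)]
shared_lineage = {(21, 1): {(11, 1)}, (21, 2): {(11, 1)}}

print(is_acyclic(cyclic_symbols, cyclic_lineage))        # False
print(is_deterministic(shared_symbols, shared_lineage))  # False
print(is_uniform(shared_symbols, shared_lineage))        # True
```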

Page 17

Example: Is this ULDB Well-Behaved?

ID A

11 apple

12 pear

ID B

21 red

22 green

λ(11,1) = {(21,1)}

λ(21,1) = {(11,1)}

Page 18

Example: Is this ULDB Well-Behaved?

ID A

11 apple

12 pear

ID B

21 red || green

22 green

λ(21,1) = {(11,1)}   λ(21,2) = {(11,1)}

Page 19

Example: Is this ULDB Well-Behaved?

ID A

11 apple || peach

12 pear || grape

ID B

21 red || pink

22 green || purple

λ(21,1) = {(11,1)}   λ(21,2) = {(11,2)}

λ(22,1) = {(12,1)}   λ(21,2) = {(11,2)}

Page 20

Querying

Page 21

Querying

• How do we query a ULDB?

• What tuples are in the answer?

• How is the lineage of the answer defined?

– for join?

– projection?

– minus?

• Only consider projection, multiset selection, join, multiset union

– why?

Page 22

Query Evaluation Algorithm

• Given: ULDB D and query Q

• Step 1: Create D’, an ordinary database derived by taking all alternatives of all tuples

ID Saw(Witness, Car)

21 (Amy, Mazda)||(Amy, Toyota)

23 (Betty, Honda)

ID Saw(Witness, Car)

21, 1 (Amy, Mazda)

21, 2 (Amy, Toyota)

23, 1 (Betty, Honda)

Page 23

Query Evaluation Algorithm

• Step 2: Evaluate the query normally

ID Saw(Witness, Car)

21, 1 (Amy, Mazda)

21, 2 (Amy, Toyota)

23, 1 (Betty, Honda)

ID Accuses(Witness, Person)

41 (Amy, Jimmy)

42 (Amy, Jimmy)

43 (Amy, Billy)

44 (Betty, Billy)

⋈

Page 24

Query Evaluation Algorithm

• Step 3: Group tuples in the result by the tuple identifiers (the i values) in their lineage, as derived by the evaluation

• Step 4: For each group of tuple identifiers

– create a maybe-tuple t_l with all tuples in the group as alternatives

– set lineage as derived by the evaluation

• Note: all tuples created are maybe-tuples!!
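
Steps 1 to 4 can be sketched end to end for a single join. The code below is an illustration under stated assumptions, not Trio's query processor: relations are dictionaries from x-tuple id to a list of alternatives, the query is fixed to joining Saw and Drives on car and keeping (witness, person), and all function names are hypothetical.

```python
from collections import defaultdict

saw = {21: [("Amy", "Mazda"), ("Amy", "Toyota")], 23: [("Betty", "Honda")]}
drives = {31: [("Jimmy", "Mazda")], 32: [("Jimmy", "Toyota")],
          33: [("Billy", "Mazda")], 34: [("Billy", "Honda")]}

def flatten(relation):
    """Step 1: one ordinary row per alternative, tagged with its symbol (i, j)."""
    return [((i, j), alt)
            for i, alts in relation.items()
            for j, alt in enumerate(alts, start=1)]

def join_on_car(saw_rows, drives_rows):
    """Step 2: ordinary join on car; each output row records its lineage."""
    out = []
    for s_sym, (witness, car) in saw_rows:
        for d_sym, (person, d_car) in drives_rows:
            if car == d_car:
                out.append(((witness, person), (s_sym, d_sym)))
    return out

def group_into_xtuples(rows):
    """Steps 3-4: rows whose lineage uses the same x-tuple ids form one maybe-tuple."""
    groups = defaultdict(list)
    for value, lin in rows:
        key = tuple(i for i, _ in lin)  # the i values of the lineage
        groups[key].append((value, set(lin)))
    return list(groups.values())

result = group_into_xtuples(join_on_car(flatten(saw), flatten(drives)))
for alternatives in result:
    print(alternatives, "?")
```

On this input every group holds a single alternative, giving four maybe-tuples whose values and lineage match the Accuses relation in the legal-instances example.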

Page 25

Examples

• Complete example from previous slides

• Compute the result of the query:

– (R(A,B) ⋈ S(B,C)) ∪ T(D,E)

ID R(A,B)

11 (1,2) || (1,3)

12 (4,1) || (5,1)

ID S(B,C)

11 (2,4) || (2,5)

12 (1,3) || (2,3)

ID T(D,E)

11 (7,8)

12 (9,10) || (9,11)

Page 26

Minimality

Page 27

Minimality

• ULDBs may contain superfluous information

• Two types of minimality:

– data minimality: ? may be unneeded, entire tuple may be unneeded

– lineage minimality

Page 28

Data Minimality: Example 1

ID Saw(Witness, Car)

21 (Amy, Mazda) || (Amy, Toyota)

23 (Betty, Honda) ?

ID Drives(Person, Car)

31 (Jimmy, Mazda)

32 (Jimmy, Toyota)

33 (Billy, Mazda)

34 (Billy, Honda)

ID Suspects

31 Jimmy ?

32 Billy ∥ Frank ?

33 Hank ?

λ(31) = (11,2),(21,2)

λ(32,1) = (11,1),(22,1); λ(32,2) = (11,1),(22,2)

λ(33) = (11,1), 23

Which ? is not needed?

Page 29

Data Minimality: Example 2

• What is unneeded in the result of the following query:

– (Saw ⋈ Car1) ⋈_witness (Saw ⋈ Car2)

ID Saw(Witness, Car)

1 (Amy, Mazda) || (Amy, Toyota)

ID Car1(Car)

2 Mazda

ID Car2(Car)

3 Toyota

Page 30

Data-Minimality: Formally

• An alternative (i,j) is extraneous if removing it from the relation does not change the set of possible instances

• A ? on a tuple is extraneous if removing it does not change the set of possible instances

Page 31

Checking for Data-Minimality

• Theorem: Let D be a well-behaved ULDB. An alternative (k,l) is extraneous if and only if there exist (i,j), (i,j’) ∈ λ*(k,l) with j ≠ j’

– Proof?
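
The theorem gives a simple test: follow the lineage of an alternative transitively and look for two different alternatives of the same x-tuple. The sketch below applies it to the query from Data Minimality: Example 2. The intermediate and result tuple ids 4, 5 and 6 are hypothetical (the slides do not number them), and the function names are illustrative.

```python
def transitive_closure(lineage, s):
    """Return λ*(s): every symbol reachable from s by following lineage."""
    seen = set()
    stack = list(lineage.get(s, ()))
    while stack:
        cur = stack.pop()
        if cur not in seen:
            seen.add(cur)
            stack.extend(lineage.get(cur, ()))
    return seen

def is_extraneous_alternative(lineage, s):
    """True if λ*(s) contains two different alternatives of one x-tuple."""
    chosen = {}
    for i, j in transitive_closure(lineage, s):
        if chosen.setdefault(i, j) != j:
            return True
    return False

# Saw is x-tuple 1 with alternatives (Amy, Mazda) and (Amy, Toyota);
# Car1 is tuple 2 and Car2 is tuple 3.
# Ids 4, 5 and 6 below are assumptions made for this illustration.
lineage = {
    (4, 1): {(1, 1), (2, 1)},  # Saw ⋈ Car1 keeps (Amy, Mazda)
    (5, 1): {(1, 2), (3, 1)},  # Saw ⋈ Car2 keeps (Amy, Toyota)
    (6, 1): {(4, 1), (5, 1)},  # join of the two results on witness
}

print(is_extraneous_alternative(lineage, (6, 1)))  # True
print(is_extraneous_alternative(lineage, (4, 1)))  # False
```

Alternative (6,1) can never appear in an instance because it would require Amy to have seen both the Mazda and the Toyota, that is, both (1,1) and (1,2) selected at once.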

Page 32

Checking for Data-Minimality

• Let h(t) be the set of base tuples of t

– tuples that are used to derive an alternative in t, which have empty lineage

• Let m(t) be the number of alternatives of t that are not extraneous

• Theorem: Let D be a well-behaved ULDB. A ? on an x-tuple t ∈ D is extraneous if and only if:

– none of the tuples in h(t) have a ?

– m(t) = ∏_{t’ ∈ h(t)} m(t’)
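
As a rough illustration, the two conditions can be checked as follows. This sketch assumes the dictionary encoding used for lineage (symbol to set of symbols) and for x-tuples (id to number of alternatives and a ‘?’ flag). It also assumes that the theorem is meant for derived tuples, so a ‘?’ on a base tuple is reported as needed. Tuple 41, a projection of Saw tuple 11 onto witness, is a hypothetical addition to the running example.

```python
from math import prod

def transitive_closure(lineage, s):
    """Return λ*(s): every symbol reachable from s by following lineage."""
    seen = set()
    stack = list(lineage.get(s, ()))
    while stack:
        cur = stack.pop()
        if cur not in seen:
            seen.add(cur)
            stack.extend(lineage.get(cur, ()))
    return seen

def is_extraneous_alternative(lineage, s):
    """True if λ*(s) contains two different alternatives of one x-tuple."""
    chosen = {}
    return any(chosen.setdefault(i, j) != j
               for i, j in transitive_closure(lineage, s))

def question_mark_is_extraneous(t, xtuples, lineage):
    """Apply the two conditions of the theorem to x-tuple t."""
    def alternatives(i):
        return [(i, j) for j in range(1, xtuples[i][0] + 1)]

    def m(i):
        # Number of non-extraneous alternatives of x-tuple i.
        return sum(not is_extraneous_alternative(lineage, s)
                   for s in alternatives(i))

    # h(t): ids of x-tuples with empty lineage that t is derived from.
    base_ids = {k for s in alternatives(t)
                for (k, l) in transitive_closure(lineage, s)
                if not lineage.get((k, l))}
    if not base_ids:
        return False  # assumption: a '?' on a base tuple is always needed
    if any(xtuples[k][1] for k in base_ids):
        return False  # some base tuple is itself a maybe-tuple
    return m(t) == prod(m(k) for k in base_ids)

# id -> (number of alternatives, has '?')
xtuples = {11: (2, False), 23: (1, False), 33: (1, True), 41: (2, True)}
lineage = {
    (33, 1): {(11, 1), (23, 1)},  # Hank, from the lineage example
    (41, 1): {(11, 1)},           # hypothetical projection of Saw tuple 11
    (41, 2): {(11, 2)},
}

print(question_mark_is_extraneous(33, xtuples, lineage))  # False
print(question_mark_is_extraneous(41, xtuples, lineage))  # True
```

Tuple 33 has one alternative but its base tuples offer 2 x 1 combinations, so it is absent in some instance and its ‘?’ is needed. Tuple 41 has an alternative for every choice of tuple 11, so it is always present.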

Page 33

Test Yourself

• Go back to slides 28-29 and prove what is extraneous, using the characterizations

Page 34

Tuple Membership Problems

Page 35

Tuple Membership, Tuple Certainty

• Recall that:

– The tuple membership problem is to determine if a tuple is a member in some instance of the ULDB

– The tuple certainty problem is to determine if a tuple is a member in every instance of the ULDB

• How would you answer tuple membership? Tuple certainty? What is the complexity of these problems?