IntegriDB: Verifiable SQL for Outsourced Databases Yupeng Zhang, Jonathan Katz, Charalampos...

31
IntegriDB: Verifiable SQL for Outsourced Databases Yupeng Zhang, Jonathan Katz, Charalampos Papamanthou University of Maryland

Transcript of IntegriDB: Verifiable SQL for Outsourced Databases Yupeng Zhang, Jonathan Katz, Charalampos...

Page 1: IntegriDB: Verifiable SQL for Outsourced Databases Yupeng Zhang, Jonathan Katz, Charalampos Papamanthou University of Maryland.

IntegriDB: Verifiable SQL for Outsourced Databases

Yupeng Zhang, Jonathan Katz, Charalampos Papamanthou

University of Maryland

Page 2: IntegriDB: Verifiable SQL for Outsourced Databases Yupeng Zhang, Jonathan Katz, Charalampos Papamanthou University of Maryland.

What is a verifiable database?digest

δ

result proof

or

database

SQL query

Update

database’

Update

digest δ’

owner server

clientID na ag

e

1 ali 12

2 ba 24

Page 3: IntegriDB: Verifiable SQL for Outsourced Databases Yupeng Zhang, Jonathan Katz, Charalampos Papamanthou University of Maryland.

Our contributions

IntegriDB: A system for Verifiable SQL

• Expressive: multidimensional RANGE, JOIN, SUM, COUNT, MAX, MIN, etc.; limited nesting Validated on native SQL queries from TPC-H benchmark

• Dynamic: Efficient updates

• Scalable: Validated on tables from TPC-H benchmark (6 million rows)

Page 4: IntegriDB: Verifiable SQL for Outsourced Databases Yupeng Zhang, Jonathan Katz, Charalampos Papamanthou University of Maryland.

1. SELECT SUM (l_extendedprice * (1 - l_discount)) AS revenue FROM lineitem, part WHERE2. ( p_partkey = l_partkey3. AND p_brand = ‘Brand#41’4. AND p_container IN (‘SM CASE’, ‘SM BOX’, ‘SM PACK’, ‘SM PKG’)5. AND l_quantity >= 7 AND l_quantity <= 7 + 106. AND p_size BETWEEN 1 AND 57. AND l_shipmode IN (‘AIR’, ‘AIR REG’)8. AND l_shipinstruct = ‘DELIVER IN PERSON’ )9. OR10. ( p_partkey = l_partkey11. AND p_brand = ‘Brand#14’12. AND p_container IN (‘MED BAG’, ‘MED BOX’,‘MED PKG’, ‘MED PACK’)13. AND l_quantity >= 14 AND l_quantity <= 14 + 1014. AND p_size BETWEEN 1 AND 1015. AND l_shipmode IN (‘AIR’, ‘AIR REG’)16. AND l_shipinstruct = ‘DELIVER IN PERSON’ )17. OR18. ( p_partkey = l_partkey19. AND p_brand = ‘Brand#23’20. AND p_container IN (‘LG CASE’, ‘LG BOX’, ‘LG PACK’, ‘LG PKG’)21. AND l_quantity >= 25 AND l_quantity <= 25 + 1022. AND p_size BETWEEN 1 AND 1523. AND l_shipmode IN (‘AIR’, ‘AIR REG’)24. AND l_shipinstruct = ‘DELIVER IN PERSON’ );

Query #19 of the TPC-H benchmarkhttp://www.tpc.org/tpch

Example

Page 5: IntegriDB: Verifiable SQL for Outsourced Databases Yupeng Zhang, Jonathan Katz, Charalampos Papamanthou University of Maryland.

1. SELECT SUM (l_extendedprice * (1 - l_discount)) AS revenue FROM lineitem, part WHERE2. ( p_partkey = l_partkey3. AND p_brand = ‘Brand#41’4. AND p_container IN (‘SM CASE’, ‘SM BOX’, ‘SM PACK’, ‘SM PKG’)5. AND l_quantity >= 7 AND l_quantity <= 7 + 106. AND p_size BETWEEN 1 AND 57. AND l_shipmode IN (‘AIR’, ‘AIR REG’)8. AND l_shipinstruct = ‘DELIVER IN PERSON’ )9. OR10. ( p_partkey = l_partkey11. AND p_brand = ‘Brand#14’12. AND p_container IN (‘MED BAG’, ‘MED BOX’,‘MED PKG’, ‘MED PACK’)13. AND l_quantity >= 14 AND l_quantity <= 14 + 1014. AND p_size BETWEEN 1 AND 1015. AND l_shipmode IN (‘AIR’, ‘AIR REG’)16. AND l_shipinstruct = ‘DELIVER IN PERSON’ )17. OR18. ( p_partkey = l_partkey19. AND p_brand = ‘Brand#23’20. AND p_container IN (‘LG CASE’, ‘LG BOX’, ‘LG PACK’, ‘LG PKG’)21. AND l_quantity >= 25 AND l_quantity <= 25 + 1022. AND p_size BETWEEN 1 AND 1523. AND l_shipmode IN (‘AIR’, ‘AIR REG’)24. AND l_shipinstruct = ‘DELIVER IN PERSON’ );

Query #19 of the TPC-H benchmarkhttp://www.tpc.org/tpch

Example

Executed on tables with 6 million rows (2.8GB)

Proof size: 184.16KBVerification time: 232ms

Page 6: IntegriDB: Verifiable SQL for Outsourced Databases Yupeng Zhang, Jonathan Katz, Charalampos Papamanthou University of Maryland.

Metrics

result proof

or

database

SQL query

Update

Update

digest δ

owner server

clientID na ag

e

1 ali 12

2 ba 24

setuptime

prover time

proof size.“Optimal”: O(|

R|)

verification time

update time

Page 7: IntegriDB: Verifiable SQL for Outsourced Databases Yupeng Zhang, Jonathan Katz, Charalampos Papamanthou University of Maryland.

Prior solutions• Generic verifiable computation systems• Circuit-based VC [PHCM13, BCTV13, CFHKNPZ14]• RAM-based VC [BFRSBW13, BCGTV13, BCTV14]Not practical

• Authenticated data structuresJoin Multidim

rangeFunction

sNested queries

Update

Tree-based [YPPK09]

Signature-based [PZM09]

Multi-range [PPT14] IntegriDB

Page 8: IntegriDB: Verifiable SQL for Outsourced Databases Yupeng Zhang, Jonathan Katz, Charalampos Papamanthou University of Maryland.

Multidimensional range queriesrow_ID student_ID age GPA First_name

1 747 18 3.5 Alice

2 781 24 3.3 Bob

3 715 21 3.0 Cathy

4 721 20 3.7 David

Table T:

Page 9: IntegriDB: Verifiable SQL for Outsourced Databases Yupeng Zhang, Jonathan Katz, Charalampos Papamanthou University of Maryland.

Multidimensional range queriesrow_ID student_ID age GPA First_name

1 747 18 3.5 Alice

2 781 24 3.3 Bob

3 715 21 3.0 Cathy

4 721 20 3.7 David

SELECT * FROM T WHERE (T.age BETWEEN 17 AND 22) AND (T.student_ID > 720)

Table T:

row_ID student_ID age GPA First_name

1 747 18 3.5 Alice

4 721 20 3.7 David

Result:

Page 10: IntegriDB: Verifiable SQL for Outsourced Databases Yupeng Zhang, Jonathan Katz, Charalampos Papamanthou University of Maryland.

Multidimensional range queriesrow_ID student_ID age GPA First_name

1 747 18 3.5 Alice

2 781 24 3.3 Bob

3 715 21 3.0 Cathy

4 721 20 3.7 David

Table T:

18, {1}

20, {4}

21, {3}

24, {2}

(age, row_ID):

18, {1,4}

21, {2,3}

18, {1,2,3,4}

SELECT * FROM T WHERE (T.age BETWEEN 17 AND 22) AND (T.student_ID > 720)

Page 11: IntegriDB: Verifiable SQL for Outsourced Databases Yupeng Zhang, Jonathan Katz, Charalampos Papamanthou University of Maryland.

Multidimensional range queriesrow_ID student_ID age GPA First_name

1 747 18 3.5 Alice

2 781 24 3.3 Bob

3 715 21 3.0 Cathy

4 721 20 3.7 David

Table T:

18, {1}

20, {4}

21, {3}

24, {2}

18, {1,4}

21, {2,3}

18, {1,2,3,4}

Result of age: {1,4} U {3} = {1,3,4}

715, {3}

721, {4}

747, {1}

781, {2}

715, {3,4}

747, {1,2}

715, {1,2,3,4}

Result of student_ID: {4} U {1,2} = {1,2,4}

Final result: {1,3,4} ∩ {1,2,4} = {1, 4}

(student_ID, row_ID):

SELECT * FROM T WHERE (T.age BETWEEN 17 AND 22) AND (T.student_ID > 720)

(age, row_ID):

Page 12: IntegriDB: Verifiable SQL for Outsourced Databases Yupeng Zhang, Jonathan Katz, Charalampos Papamanthou University of Maryland.

Multidimensional range queriesrow_ID student_ID age GPA First_name

1 747 18 3.5 Alice

2 781 24 3.3 Bob

3 715 21 3.0 Cathy

4 721 20 3.7 David

Table T:

18, {1}

20, {4}

21, {3}

24, {2}

18, {1,4}

21, {2,3}

18, {1,2,3,4}

Result of age: {1,4} U {3} = {1,3,4}

715, {3}

721, {4}

747, {1}

781, {2}

715, {3,4}

747, {1,2}

715, {1,2,3,4}

Result of student_ID: {4} U {1,2} = {1,2,4}

Final result: {1,3,4} ∩ {1,2,4} = {1, 4}

(student_ID, row_ID):

SELECT * FROM T WHERE (T.age BETWEEN 17 AND 22) AND (T.student_ID > 720)

(age, row_ID):

Not efficient!!

Page 13: IntegriDB: Verifiable SQL for Outsourced Databases Yupeng Zhang, Jonathan Katz, Charalampos Papamanthou University of Maryland.

Multidimensional range queries

18, {1}

20, {4}

21, {3}

24, {2}

(age, row_ID):

18, {1,4}

21, {2,3}

18, {1,2,3,4}

( )

Bilinear accumulator: ( ) [Nguyen05]x A

x s

acc A g

Final result: {1,3,4} ∩ {1,2,4} = {1, 4}

Page 14: IntegriDB: Verifiable SQL for Outsourced Databases Yupeng Zhang, Jonathan Katz, Charalampos Papamanthou University of Maryland.

Multidimensional range queries

18, acc({1}

)

20, acc({4}

)

21, acc({3

})

24, acc({2

})

(age, row_ID):

18, acc({1,4}

)

21, acc({2,3

})

18, acc({1,2,3,4

})

Final result: {1,3,4} ∩ {1,2,4} = {1, 4}

( )

Bilinear accumulator: ( ) [Nguyen05]x A

x s

acc A g

Page 15: IntegriDB: Verifiable SQL for Outsourced Databases Yupeng Zhang, Jonathan Katz, Charalampos Papamanthou University of Maryland.

Join queries

row_ID student_ID course_ID

1 715 ENEE140

2 779 ENEE150

3 781 ENEE340

Table T’:

SELECT * FROM T JOIN T’ ON T.student_ID = T’.student_ID

student_ID age course_ID

715 21 ENEE140

781 24 ENEE340

Result:

row_ID student_ID age GPA First_name

1 747 18 3.5 Alice

2 781 24 3.3 Bob

3 715 21 3.0 Cathy

4 721 20 3.7 David

Table T:

GPA First_name

3.0 Cathy

3.3 Bob

Page 16: IntegriDB: Verifiable SQL for Outsourced Databases Yupeng Zhang, Jonathan Katz, Charalampos Papamanthou University of Maryland.

Supporting summationWe extend the bilinear accumulator to support summation 1( )

New bilinear accumulator: '( ) x A

x s

acc A g

11 0

2 3 1 3 1 2 1 1 2

( ) ...

1 1 1 1... ( ... )

... ... ... ...

n

nx A

n

n n n n

x s a s a s a

s sx x x x x x x x x x x x

1 0 1 2/ ... ( )na a x x x sum A

Page 17: IntegriDB: Verifiable SQL for Outsourced Databases Yupeng Zhang, Jonathan Katz, Charalampos Papamanthou University of Maryland.

row_ID student_ID age GPA First_name

1 747 18 3.5 Alice

2 781 24 3.3 Bob

3 715 21 3.0 Cathy

4 721 20 3.7 David

Table T:

SQL SUM

SELECT SUM(age) FROM T: 83

(age, age):

18, '({18,20,21,24})acc

18, '({18,20})acc 21, '({21,24})acc

18, '({18})acc 20, '({20})acc 21, '({21})acc 24, '({24})acc

Page 18: IntegriDB: Verifiable SQL for Outsourced Databases Yupeng Zhang, Jonathan Katz, Charalampos Papamanthou University of Maryland.

SQL functions

SELECT MAX(age) FROM A WHERE A.GPA BETWEEN 3.0 AND 3.5 24

SELECT age FROM A WHERE (A.GPA BETWEEN 3.0 AND 3.5) AND (A.age >= 24)

row_ID student_ID age GPA First_name

1 747 18 3.5 Alice

2 781 24 3.3 Bob

3 715 21 3.0 Cathy

4 721 20 3.7 David

Table T:

SELECT COUNT(*) FROM A WHERE A.age BETWEEN 17 AND 22: 2

SELECT SUM(row_ID’) – SUM(row_ID) FROM A WHERE A.age BETWEEN 17 AND 22

row_ID’

2

3

4

5

Page 19: IntegriDB: Verifiable SQL for Outsourced Databases Yupeng Zhang, Jonathan Katz, Charalampos Papamanthou University of Maryland.

Nested queriesrow_ID student_ID age GPA First_nam

e

1 747 18 3.5 Alice

2 781 24 3.3 Bob

3 715 21 3.0 Cathy

4 721 20 3.7 David

row_ID student_ID course_ID

1 715 ENEE140

2 779 ENEE150

3 781 ENEE340

18, {747}

20, {721}

21, {715}

24, {781}

(T.age, T.student_ID):

18, {721,74

7}

21, {715,78

1}

18, {715,721,747,7

81}

1, {715

}

2, {779,78

1}

1, {715,715,78

1}

(T’.row_ID , T’.student_ID):

SELECT COUNT(student_ID) FROM (SELECT * FROM T WHERE age>19) JOIN (SELECT * FROM T’ WHERE row_ID >1) ON T.student_ID = T’.student_ID

3, {781}

2, {779}

Page 20: IntegriDB: Verifiable SQL for Outsourced Databases Yupeng Zhang, Jonathan Katz, Charalampos Papamanthou University of Maryland.

Nested queriesrow_ID student_ID course_ID

1 715 ENEE140

2 779 ENEE150

3 781 ENEE340

Final result: COUNT(({721} U {715,781}) {715,781}∩ )

SELECT COUNT(student_ID) FROM (SELECT * FROM T WHERE age>19) JOIN (SELECT * FROM T’ WHERE row_ID >1) ON T.student_ID = T’.student_ID

18, {747}

20, {721}

21, {715}

24, {781}

(T.age, T.student_ID):

18, {721,74

7}

21, {715,78

1}

18, {715,721,747,7

81}

1, {715

}

2, {779,78

1}

1, {715,715,78

1}

(T’.row_ID , T’.student_ID):

3, {781}

2, {779}

row_ID student_ID age GPA First_name

1 747 18 3.5 Alice

2 781 24 3.3 Bob

3 715 21 3.0 Cathy

4 721 20 3.7 David

Page 21: IntegriDB: Verifiable SQL for Outsourced Databases Yupeng Zhang, Jonathan Katz, Charalampos Papamanthou University of Maryland.

Efficient update

Updates can be done in logarithmic time

See our paper for details

Page 22: IntegriDB: Verifiable SQL for Outsourced Databases Yupeng Zhang, Jonathan Katz, Charalampos Papamanthou University of Maryland.

Implementation

IntegriDB

Client

SQL Client

SQL query

ClientIntegriDB

queryIntegriD

B Server

SQL Server

subqueries

intermediate

results

Result& proof

Resultor

Server

IntegriDBData

Owner

digest ADS

database

Page 23: IntegriDB: Verifiable SQL for Outsourced Databases Yupeng Zhang, Jonathan Katz, Charalampos Papamanthou University of Maryland.

1. SELECT SUM (l_extendedprice * (1 - l_discount)) AS revenue FROM lineitem, part WHERE2. ( p_partkey = l_partkey3. AND p_brand = ‘Brand#41’4. AND p_container IN (‘SM CASE’, ‘SM BOX’, ‘SM PACK’, ‘SM PKG’)5. AND l_quantity >= 7 AND l_quantity <= 7 + 106. AND p_size BETWEEN 1 AND 57. AND l_shipmode IN (‘AIR’, ‘AIR REG’)8. AND l_shipinstruct = ‘DELIVER IN PERSON’ )9. OR10. ( p_partkey = l_partkey11. AND p_brand = ‘Brand#14’12. AND p_container IN (‘MED BAG’, ‘MED BOX’,‘MED PKG’, ‘MED PACK’)13. AND l_quantity >= 14 AND l_quantity <= 14 + 1014. AND p_size BETWEEN 1 AND 1015. AND l_shipmode IN (‘AIR’, ‘AIR REG’)16. AND l_shipinstruct = ‘DELIVER IN PERSON’ )17. OR18. ( p_partkey = l_partkey19. AND p_brand = ‘Brand#23’20. AND p_container IN (‘LG CASE’, ‘LG BOX’, ‘LG PACK’, ‘LG PKG’)21. AND l_quantity >= 25 AND l_quantity <= 25 + 1022. AND p_size BETWEEN 1 AND 1523. AND l_shipmode IN (‘AIR’, ‘AIR REG’)24. AND l_shipinstruct = ‘DELIVER IN PERSON’ );

TPC-H #19

Page 24: IntegriDB: Verifiable SQL for Outsourced Databases Yupeng Zhang, Jonathan Katz, Charalampos Papamanthou University of Maryland.

1. SELECT SUM (l_extendedprice * (1 - l_discount)) AS revenue FROM lineitem, part WHERE2. ( p_partkey = l_partkey3. AND p_brand = ‘Brand#41’4. AND p_container IN (‘SM CASE’, ‘SM BOX’, ‘SM PACK’, ‘SM PKG’)5. AND l_quantity >= 7 AND l_quantity <= 7 + 106. AND p_size BETWEEN 1 AND 57. AND l_shipmode IN (‘AIR’, ‘AIR REG’)8. AND l_shipinstruct = ‘DELIVER IN PERSON’ )9. OR10. ( p_partkey = l_partkey11. AND p_brand = ‘Brand#14’12. AND p_container IN (‘MED BAG’, ‘MED BOX’,‘MED PKG’, ‘MED PACK’)13. AND l_quantity >= 14 AND l_quantity <= 14 + 1014. AND p_size BETWEEN 1 AND 1015. AND l_shipmode IN (‘AIR’, ‘AIR REG’)16. AND l_shipinstruct = ‘DELIVER IN PERSON’ )17. OR18. ( p_partkey = l_partkey19. AND p_brand = ‘Brand#23’20. AND p_container IN (‘LG CASE’, ‘LG BOX’, ‘LG PACK’, ‘LG PKG’)21. AND l_quantity >= 25 AND l_quantity <= 25 + 1022. AND p_size BETWEEN 1 AND 1523. AND l_shipmode IN (‘AIR’, ‘AIR REG’)24. AND l_shipinstruct = ‘DELIVER IN PERSON’ );

TPC-H #19

Page 25: IntegriDB: Verifiable SQL for Outsourced Databases Yupeng Zhang, Jonathan Katz, Charalampos Papamanthou University of Maryland.

1. SELECT SUM (l_extendedprice * (1 - l_discount)) AS revenue FROM lineitem, part WHERE2. ( p_partkey = l_partkey3. AND p_brand = ‘Brand#41’4. AND p_container IN (‘SM CASE’, ‘SM BOX’, ‘SM PACK’, ‘SM PKG’)5. AND l_quantity >= 7 AND l_quantity <= 7 + 106. AND p_size BETWEEN 1 AND 57. AND l_shipmode IN (‘AIR’, ‘AIR REG’)8. AND l_shipinstruct = ‘DELIVER IN PERSON’ )9. OR10. ( p_partkey = l_partkey11. AND p_brand = ‘Brand#14’12. AND p_container IN (‘MED BAG’, ‘MED BOX’,‘MED PKG’, ‘MED PACK’)13. AND l_quantity >= 14 AND l_quantity <= 14 + 1014. AND p_size BETWEEN 1 AND 1015. AND l_shipmode IN (‘AIR’, ‘AIR REG’)16. AND l_shipinstruct = ‘DELIVER IN PERSON’ )17. OR18. ( p_partkey = l_partkey19. AND p_brand = ‘Brand#23’20. AND p_container IN (‘LG CASE’, ‘LG BOX’, ‘LG PACK’, ‘LG PKG’)21. AND l_quantity >= 25 AND l_quantity <= 25 + 1022. AND p_size BETWEEN 1 AND 1523. AND l_shipmode IN (‘AIR’, ‘AIR REG’)24. AND l_shipinstruct = ‘DELIVER IN PERSON’ );

TPC-H #19

Join query

Multi-range on part

Multi-range on lineitem

Page 26: IntegriDB: Verifiable SQL for Outsourced Databases Yupeng Zhang, Jonathan Katz, Charalampos Papamanthou University of Maryland.

Performance on query #19

Setup time

Prover time

Verification time

Proof size

Update time

Digest size

25272.76s

6422.13s 232ms 184.16KB 150ms 256bits

lineitem: 6 million rows x 16 columns part: 200,000 rows x 9 columns

2.8GB 54MB

Disadvantages:• Slow setup time • Slow prover time

Advantages:• Small digest size• Fast verification time• Small proof size• Excellent update time

One-time cost

Page 27: IntegriDB: Verifiable SQL for Outsourced Databases Yupeng Zhang, Jonathan Katz, Charalampos Papamanthou University of Maryland.

Expressiveness

• IntegriDB supports • 12 out of 22 queries in TPC-H benchmark• 94% of the queries in TPC-C benchmark

Page 28: IntegriDB: Verifiable SQL for Outsourced Databases Yupeng Zhang, Jonathan Katz, Charalampos Papamanthou University of Maryland.

Future work

• Support larger class of SQL queries• Join with duplicates in nested queries• Aggregations among columns• Arbitrary nesting

• Improve prover time• Parallelization• Better data structures, precomputation• Faster crypto primitives

Page 29: IntegriDB: Verifiable SQL for Outsourced Databases Yupeng Zhang, Jonathan Katz, Charalampos Papamanthou University of Maryland.

Code available at http://integridb.github.io/

Thank you!!!

Page 30: IntegriDB: Verifiable SQL for Outsourced Databases Yupeng Zhang, Jonathan Katz, Charalampos Papamanthou University of Maryland.

• Problems of generic circuit-based VC• Cannot support updates• Cannot support nested queries• All queries must be merged into one circuit

• Problems of generic RAM-based VC• Compiler of SQL queries introduces high overhead• All queries must be merged into one RAM program

• Our comparison: only 1 query, no compiler included for generic VC

Comparison to generic VC

Page 31: IntegriDB: Verifiable SQL for Outsourced Databases Yupeng Zhang, Jonathan Katz, Charalampos Papamanthou University of Maryland.

Comparison to generic VC

Libsnark(circuit-based)

[BCTV13]

SNARKs for C(RAM-based)

[BCTV14]

IntegriDB

setup time 187.96s 2000s* 13.878s

prover time 47.57s 1000s* 10.420s

verification time 8ms 10ms* 112ms

proof size 288 Bytes 288 Bytes 84 KB

SELECT SUM(c1) FROM T WHERE (c1 BETWEEN a1 AND b1) AND

... AND (c10 BETWEEN a10 AND b10)

Sum on a 10-dimensional range query on a table of 1000x10 table

*estimation