Wavelet and Matrix Mechanism

CompSci 590.03, Lecture 11, Fall 12. Instructor: Ashwin Machanavajjhala.

Announcement

• Project proposal submission deadline is Fri, Oct 12 noon.

Recap: Laplace Mechanism

Thm: If the sensitivity of the query is S, then adding Laplace noise with parameter λ guarantees ε-differential privacy when λ = S/ε.

Sensitivity: the smallest number S(q) such that for any d, d’ differing in one entry, $\lVert q(d) - q(d') \rVert_1 \le S(q)$.

Histogram query:
• Sensitivity = 2
• Variance (error) on each entry = $2\lambda^2 = 2 \cdot 4/\varepsilon^2 = 8/\varepsilon^2$
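A minimal sketch of the histogram case above, assuming the counts are a Python list and using a hypothetical helper `laplace_noise` that draws Laplace(0, λ) as the difference of two i.i.d. exponentials with mean λ. It checks empirically that each released entry has variance close to $2\lambda^2 = 8/\varepsilon^2$ when S = 2.

```python
import random
import statistics


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw from Laplace(0, scale) as the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_histogram(counts, epsilon, sensitivity=2.0, rng=None):
    """Release each histogram entry plus Laplace noise with parameter lambda = S / epsilon."""
    rng = rng or random.Random()
    lam = sensitivity / epsilon
    return [c + laplace_noise(lam, rng) for c in counts]


# Empirical check: per-entry variance should be close to 2 * lambda^2 = 8 / epsilon^2.
rng = random.Random(0)
epsilon = 1.0
errors = [laplace_histogram([10], epsilon, rng=rng)[0] - 10 for _ in range(100_000)]
print(round(statistics.pvariance(errors), 2), 8 / epsilon**2)
```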

Laplace Mechanism is Suboptimal
• Query 1: Number of cancer patients
• Query 2: Number of cancer patients
• If you answer both using the Laplace mechanism:
  – Sensitivity = 2
  – Error in each answer: $2 \cdot 4/\varepsilon^2 = 8/\varepsilon^2$
  – Averaging the two answers gives an error of $4/\varepsilon^2$
• If you answer only the first query and return the same answer for both:
  – Sensitivity = 1
  – Error in the answer: $2/\varepsilon^2$
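The two options on this slide can be compared by simulation. This is a sketch with hypothetical function names and a made-up true count of 50; the empirical variances should land near $4/\varepsilon^2$ (answer both, then average) and $2/\varepsilon^2$ (answer once, reuse).

```python
import random
import statistics


def laplace(scale, rng):
    # Laplace(0, scale) as the difference of two i.i.d. exponentials with mean `scale`.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def answer_both_then_average(count, epsilon, rng):
    # The two identical queries together have sensitivity 2, so each gets Laplace(2/epsilon).
    a1 = count + laplace(2.0 / epsilon, rng)
    a2 = count + laplace(2.0 / epsilon, rng)
    return (a1 + a2) / 2


def answer_once(count, epsilon, rng):
    # One query has sensitivity 1; its noisy answer is returned for both queries.
    return count + laplace(1.0 / epsilon, rng)


rng = random.Random(42)
trials = 100_000
var_avg = statistics.pvariance([answer_both_then_average(50, 1.0, rng) for _ in range(trials)])
var_once = statistics.pvariance([answer_once(50, 1.0, rng) for _ in range(trials)])
print(round(var_avg, 2), round(var_once, 2))  # close to 4/eps^2 and 2/eps^2 with eps = 1
```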

Outline
• Constrained inference
  – Ensure that the returned answers are consistent with each other.
• Query strategy
  – Answer a different set of strategy queries A
  – Answer the original queries using A
  – Universal Histograms
  – Wavelet Mechanism [Xiao et al ICDE 09]
  – Matrix Mechanism [Li et al PODS 10]

Note
• The following solution ideas are useful whenever:
  – You want to answer a set of correlated queries.
  – Queries are based on noisy measurements.
  – Each measurement (e.g., x1 or x1 + x2) has similar variance.

Range Queries
• Given a set of values {v1, v2, …, vn}
• Let xi = number of tuples with value vi.
• Range query: q(j,k) = xj + … + xk

Q: What if we want to answer all range queries?

Range Queries
Q: What if we want to answer all range queries?

Strategy 1: Answer all range queries using the Laplace mechanism.
• Sensitivity = $O(n^2)$
• Error in each range query = $O(n^4/\varepsilon^2)$, so the total error across all $O(n^2)$ range queries is $O(n^6/\varepsilon^2)$.
• May reduce using constrained optimization …
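A sketch of where the $O(n^2)$ sensitivity comes from, assuming neighbouring inputs differ by 1 in a single count $x_i$: that count appears in $i(n-i+1)$ ranges. The function names are hypothetical.

```python
def ranges_containing(i: int, n: int) -> int:
    # q(j, k) contains x_i iff j <= i <= k: i choices for j and n - i + 1 choices for k.
    return i * (n - i + 1)


def strategy1_sensitivity(n: int) -> int:
    """Sensitivity of answering all ranges q(j, k), 1 <= j <= k <= n, directly,
    assuming neighbouring inputs differ by 1 in a single count x_i."""
    return max(ranges_containing(i, n) for i in range(1, n + 1))


def strategy1_errors(n: int, epsilon: float):
    """(variance of one range answer, total variance over all n(n+1)/2 ranges)."""
    per_query = 2 * (strategy1_sensitivity(n) / epsilon) ** 2  # Var[Laplace(S/eps)] = 2 S^2 / eps^2
    return per_query, per_query * (n * (n + 1) // 2)


print(strategy1_sensitivity(8), strategy1_errors(8, 1.0))
```

The sensitivity grows roughly like $n^2/4$, so the per-query variance grows like $n^4$.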

Range Queries
Q: What if we want to answer all range queries?

Strategy 2: Answer all xi queries using the Laplace mechanism, then answer range queries using the noisy xi values.
• $O(1/\varepsilon^2)$ error for each xi
• Error(q(1,n)) = $O(n/\varepsilon^2)$
• Total error on all range queries: $O(n^3/\varepsilon^2)$
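A sketch of the Strategy 2 bookkeeping, assuming each $x_i$ gets independent Laplace(1/ε) noise (variance $2/\varepsilon^2$): a range of length $\ell$ has variance $2\ell/\varepsilon^2$, and the total over all ranges is $n(n+1)(n+2)/(3\varepsilon^2)$. Function names are hypothetical.

```python
def strategy2_range_variance(j: int, k: int, epsilon: float) -> float:
    # Each x_i carries independent Laplace(1/epsilon) noise, variance 2/epsilon^2;
    # q(j, k) sums k - j + 1 noisy counts, so the variances add.
    return (k - j + 1) * 2 / epsilon**2


def strategy2_total_error(n: int, epsilon: float) -> float:
    # Sum over all n(n+1)/2 ranges: grows like n^3 / epsilon^2.
    return sum(strategy2_range_variance(j, k, epsilon)
               for j in range(1, n + 1) for k in range(j, n + 1))


print(strategy2_range_variance(1, 8, 1.0))  # widest range q(1, n) for n = 8 -> 16.0
print(strategy2_total_error(8, 1.0))        # -> 240.0
```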

Universal Histograms for Range Queries
Strategy 3: Answer sufficient statistics using the Laplace mechanism, then answer range queries using the noisy sufficient statistics.

x1 x2 x3 x4 x5 x6 x7 x8

x12 x34 x56 x78

x1234 x5678

x1-8

[Hay et al VLDB 2010]
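A minimal sketch of releasing the tree above, assuming n is a power of two. Nodes are identified by hypothetical (lo, hi) ranges, 1-based and inclusive, so (3, 4) is x34 and (1, 8) is x1-8. The sketch uses the exact number of levels, log₂ n + 1, as the sensitivity.

```python
import math
import random


def laplace(scale, rng):
    # Laplace(0, scale) as the difference of two i.i.d. exponentials with mean `scale`.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dyadic_nodes(n):
    """All tree nodes as (lo, hi): (1,1)..(n,n), then (1,2)..(n-1,n), ..., up to (1,n)."""
    nodes, size = [], 1
    while size <= n:
        nodes.extend((lo, lo + size - 1) for lo in range(1, n + 1, size))
        size *= 2
    return nodes


def noisy_universal_histogram(counts, epsilon, rng=None):
    n = len(counts)
    assert n > 0 and n & (n - 1) == 0, "this sketch assumes n is a power of two"
    rng = rng or random.Random()
    levels = int(math.log2(n)) + 1      # each count lies in exactly one node per level
    scale = levels / epsilon            # Laplace parameter = sensitivity / epsilon
    prefix = [0]
    for c in counts:
        prefix.append(prefix[-1] + c)   # prefix sums give each node's true count in O(1)
    return {(lo, hi): prefix[hi] - prefix[lo - 1] + laplace(scale, rng)
            for lo, hi in dyadic_nodes(n)}
```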

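Universal Histograms for Range Queries
• Sensitivity: ≈ log n (each count appears in one node per level)
• q(2,6) = x2 + x3 + x4 + x5 + x6, with error $2 \cdot 5 \log^2 n/\varepsilon^2$ (five noisy leaves)
  = x2 + x34 + x56, with error $2 \cdot 3 \log^2 n/\varepsilon^2$ (three noisy nodes)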
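The decomposition of q(2,6) into x2 + x34 + x56 can be sketched as a recursive walk over the tree, assuming 1-based inclusive (lo, hi) node ranges. `decompose` and `range_error` are hypothetical names, and the error uses the slide's per-node variance $2\log^2 n/\varepsilon^2$.

```python
import math


def decompose(j, k, lo, hi):
    """Cover the range [j, k] with maximal tree nodes inside the subtree [lo, hi] (1-based, inclusive)."""
    if k < lo or hi < j:
        return []                      # subtree disjoint from the range
    if j <= lo and hi <= k:
        return [(lo, hi)]              # subtree fully inside the range: one noisy answer
    mid = (lo + hi) // 2
    return decompose(j, k, lo, mid) + decompose(j, k, mid + 1, hi)


def range_error(j, k, n, epsilon):
    # Assumes every node carries Laplace(log n / epsilon) noise, i.e. variance 2 log^2 n / epsilon^2.
    return len(decompose(j, k, 1, n)) * 2 * (math.log2(n) / epsilon) ** 2


print(decompose(2, 6, 1, 8))      # [(2, 2), (3, 4), (5, 6)], i.e. x2 + x34 + x56
print(range_error(2, 6, 8, 1.0))  # 3 nodes: 2 * 3 * log^2 8 = 54.0
```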

Universal Histograms for Range Queries
• Every range query can be answered by summing O(log n) different noisy answers (at most 2 log n).
• Maximum error on any range query = $O(\log^3 n/\varepsilon^2)$
• Total error on all range queries = $O(n^2 \log^3 n/\varepsilon^2)$
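A quick check of the O(log n) claim, assuming the same hypothetical `decompose` walk over a tree with n = 2^h leaves: the largest number of nodes needed by any range stays within 2 log₂ n (for n = 8 it is 4, e.g. q(2,7) = x2 + x34 + x56 + x7).

```python
import math


def decompose(j, k, lo, hi):
    # Maximal tree nodes inside subtree [lo, hi] that exactly cover [j, k] (1-based, inclusive).
    if k < lo or hi < j:
        return []
    if j <= lo and hi <= k:
        return [(lo, hi)]
    mid = (lo + hi) // 2
    return decompose(j, k, lo, mid) + decompose(j, k, mid + 1, hi)


def max_nodes_per_range(n):
    return max(len(decompose(j, k, 1, n))
               for j in range(1, n + 1) for k in range(j, n + 1))


for n in (8, 16, 32, 64):
    print(n, max_nodes_per_range(n), 2 * int(math.log2(n)))
```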

Wavelet Mechanism
Step 1: Compute wavelet coefficients: x1, x2, …, xn → C1, C2, …, Cm
Step 2: Add noise to the coefficients: C1 + η1, C2 + η2, …, Cm + ηm
Step 3: Reconstruct the original counts from the noisy coefficients: y1, y2, …, yn
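The three steps can be sketched end to end for the Haar wavelet, assuming n is a power of two. The noise weighting is an assumption taken from the Privelet-style setup in the following slides: coefficient c gets Laplace(λ/W(c)) with W(c0) = n, W(c) = number of leaves under c, and λ = 2ρ/ε with ρ = 1 + log₂ n. All function names are hypothetical.

```python
import math
import random


def laplace(scale, rng):
    # Laplace(0, scale) as the difference of two i.i.d. exponentials with mean `scale`.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def haar_forward(x):
    """Step 1. Returns (c0, coeffs): c0 is the average of x; coeffs maps each internal
    node (lo, hi) (0-based, hi exclusive) to (a - b) / 2, where a and b are the averages
    of its left and right halves."""
    assert len(x) > 0 and len(x) & (len(x) - 1) == 0, "sketch assumes n is a power of two"
    coeffs = {}

    def average(lo, hi):
        if hi - lo == 1:
            return float(x[lo])
        mid = (lo + hi) // 2
        a, b = average(lo, mid), average(mid, hi)
        coeffs[(lo, hi)] = (a - b) / 2
        return (a + b) / 2

    return average(0, len(x)), coeffs


def haar_inverse(c0, coeffs, n):
    """Step 3. Each count is c0 plus the signed coefficients on its root-to-leaf path."""
    y = [c0] * n
    for (lo, hi), c in coeffs.items():
        mid = (lo + hi) // 2
        for i in range(lo, hi):
            y[i] += c if i < mid else -c   # + in the left subtree, - in the right
    return y


def wavelet_mechanism(x, epsilon, rng=None):
    rng = rng or random.Random()
    n = len(x)
    lam = 2 * (1 + math.log2(n)) / epsilon   # assumed: lambda = 2 * rho / epsilon, rho = 1 + log2 n
    c0, coeffs = haar_forward(x)
    # Step 2: Laplace(lambda / W(c)) noise, assuming W(c0) = n and W(c) = leaves under c.
    noisy_c0 = c0 + laplace(lam / n, rng)
    noisy = {(lo, hi): c + laplace(lam / (hi - lo), rng) for (lo, hi), c in coeffs.items()}
    return haar_inverse(noisy_c0, noisy, n)
```

This `haar_inverse` touches every leaf under every coefficient, so it runs in O(n log n); a level-by-level inverse would be linear.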

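Haar Wavelet
For an internal node c:
• Let a = average of the leaves in the left subtree of c
• Let b = average of the leaves in the right subtree of c
• The coefficient of c is (a − b)/2; the base coefficient c0 is the average of all leaves.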
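As a small worked example, assuming the coefficient of an internal node is (a − b)/2 and the base coefficient c0 is the overall average, the hypothetical counts x = (4, 2, 5, 5) give (node names follow the tree labels x12, x34, x1234):

```latex
\begin{aligned}
c_0 &= \tfrac{4 + 2 + 5 + 5}{4} = 4 \\
c_{1234} &= \tfrac{1}{2}\Big(\tfrac{4 + 2}{2} - \tfrac{5 + 5}{2}\Big) = \tfrac{3 - 5}{2} = -1 \\
c_{12} &= \tfrac{1}{2}(4 - 2) = 1 \\
c_{34} &= \tfrac{1}{2}(5 - 5) = 0
\end{aligned}
```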

Haar Wavelet Reconstruction
Each xi is the sum of the coefficients on its root-to-leaf path, each taken with a sign:
• + if xi is in the left subtree of the coefficient
• − if xi is in the right subtree
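Continuing with the hypothetical counts x = (4, 2, 5, 5), whose coefficients (assuming c = (a − b)/2 and c0 = the overall average) are c0 = 4, c1234 = −1, c12 = 1 and c34 = 0, the signed path sums recover every count; c0 is always added:

```latex
\begin{aligned}
x_1 &= c_0 + c_{1234} + c_{12} = 4 - 1 + 1 = 4 \\
x_2 &= c_0 + c_{1234} - c_{12} = 4 - 1 - 1 = 2 \\
x_3 &= c_0 - c_{1234} + c_{34} = 4 + 1 + 0 = 5 \\
x_4 &= c_0 - c_{1234} - c_{34} = 4 + 1 - 0 = 5
\end{aligned}
```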

Haar Wavelet : Range Queries
Range query: number of tuples in a range S = [a,b]

Let α(c) be the number of values in the left subtree of c that are in S

Let β(c) be the number of values in the right subtree of c that are in S

Haar Wavelet : Range Queries

α(c) – β(c) = 0 when no leaves under c are contained in S

α(c) – β(c) = 0 when all leaves under c are contained in S

Only need to consider those coefficients with partial overlap with the range.
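Summing the reconstruction rule over S gives $\sum_{i \in S} x_i = |S|\,c_0 + \sum_c (\alpha(c) - \beta(c))\,c$. A sketch under the same assumptions as before (c = (a − b)/2, 0-based indices, hypothetical function names):

```python
def haar_forward(x):
    # c0 = average of x; coeffs[(lo, hi)] = (a - b) / 2 for each internal node
    # (0-based, hi exclusive), a and b being the averages of its two halves.
    coeffs = {}

    def average(lo, hi):
        if hi - lo == 1:
            return float(x[lo])
        mid = (lo + hi) // 2
        a, b = average(lo, mid), average(mid, hi)
        coeffs[(lo, hi)] = (a - b) / 2
        return (a + b) / 2

    return average(0, len(x)), coeffs


def overlap(lo, hi, a, b):
    # Number of positions in [lo, hi) that also lie in the query range [a, b].
    return max(0, min(hi - 1, b) - max(lo, a) + 1)


def range_query(c0, coeffs, a, b):
    """Answer sum(x[a..b]) as |S| * c0 + sum over c of (alpha(c) - beta(c)) * c."""
    total = (b - a + 1) * c0
    for (lo, hi), c in coeffs.items():
        mid = (lo + hi) // 2
        alpha, beta = overlap(lo, mid, a, b), overlap(mid, hi, a, b)
        if alpha != beta:   # alpha == beta (e.g. no or all leaves of c in S) contributes nothing
            total += (alpha - beta) * c
    return total


c0, coeffs = haar_forward([4, 2, 5, 5])
print(range_query(c0, coeffs, 1, 2))   # x2 + x3 = 2 + 5 -> 7.0
```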

Adding noise to wavelet coefficients
• Associate each coefficient c with a weight W(c)
• level(c) = height of c in the tree
• Generalized sensitivity (ρ)

Adding noise to wavelet coefficients
Theorem: Adding noise drawn from Laplace(λ/W(c)) to each coefficient c guarantees (2ρ/λ)-differential privacy.

Proof:
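A sketch of the usual density-ratio argument, assuming independent Laplace(λ/W(c)) noise on every coefficient, released noisy values ỹ_c, and neighbouring D, D′ that differ in one tuple (two counts change by 1, so the weighted change Σ W(c)|c(D) − c(D′)| is at most 2ρ). The middle step is the triangle inequality:

```latex
\frac{\Pr[\tilde{y} \mid D]}{\Pr[\tilde{y} \mid D']}
  = \prod_{c} \exp\!\left(\frac{W(c)\left(\lvert \tilde{y}_c - c(D') \rvert - \lvert \tilde{y}_c - c(D) \rvert\right)}{\lambda}\right)
  \le \exp\!\left(\frac{\sum_{c} W(c)\,\lvert c(D) - c(D') \rvert}{\lambda}\right)
  \le e^{2\rho/\lambda}
```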

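Generalized Sensitivity of Wavelet Mechanism
Proof:
• If some xi changes by 1, only c0 and the coefficients on one root-to-leaf path change.
• Each such coefficient changes by 1/m, where m is the number of values in its subtree (m = n for c0).
• W(c) = m, so each changed coefficient contributes W(c) · (1/m) = 1 to the weighted change, giving ρ = 1 + log n.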
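The proof can be checked numerically: bump one count by 1, recompute the Haar coefficients, and sum W(c)·|change| with the assumed weights W(c0) = n and W(c) = leaves under c. The sketch below (hypothetical names, c = (a − b)/2) gets 1 + log₂ n for every position.

```python
import math
import random


def haar_forward(x):
    # c0 = average of x; coeffs[(lo, hi)] = (a - b) / 2 per internal node (0-based, hi exclusive).
    coeffs = {}

    def average(lo, hi):
        if hi - lo == 1:
            return float(x[lo])
        mid = (lo + hi) // 2
        a, b = average(lo, mid), average(mid, hi)
        coeffs[(lo, hi)] = (a - b) / 2
        return (a + b) / 2

    return average(0, len(x)), coeffs


def weighted_change(x, i):
    """Sum of W(c) * |change of c| over all coefficients when x[i] increases by 1,
    assuming W(c0) = n and W(c) = number of leaves under c."""
    n = len(x)
    c0, coeffs = haar_forward(x)
    y = list(x)
    y[i] += 1
    d0, dcoeffs = haar_forward(y)
    total = n * abs(d0 - c0)
    for (lo, hi), c in coeffs.items():
        total += (hi - lo) * abs(dcoeffs[(lo, hi)] - c)
    return total


r = random.Random(7)
x = [r.randint(0, 9) for _ in range(8)]
print([round(weighted_change(x, i), 9) for i in range(8)], 1 + math.log2(8))
```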

Error in answering range queries

• A range query depends on at most O(log n) coefficients.
• Error in each coefficient is at most $O(\log^2 n/\varepsilon^2)$.
• Error in a range query is $O(\log^3 n/\varepsilon^2)$.
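Under the assumed noise (Laplace(λ/W(c)), λ = 2(1 + log₂ n)/ε, W(c0) = n, W(c) = leaves under c), the exact variance of a range answer is $|S|^2 \cdot 2(\lambda/n)^2 + \sum_c (\alpha(c) - \beta(c))^2 \cdot 2(\lambda/W(c))^2$. At most two coefficients per level have α ≠ β, each contributing at most λ²/2, which gives the $O(\log^3 n/\varepsilon^2)$ bound. A sketch with a hypothetical function name:

```python
import math


def wavelet_range_variance(n, epsilon, a, b):
    """Exact variance of the wavelet-mechanism answer to the range [a, b] (1-based, inclusive).

    Assumes noise Laplace(lambda / W(c)) with lambda = 2 * (1 + log2 n) / epsilon,
    W(c0) = n and W(c) = number of leaves under c; Var[Laplace(s)] = 2 s^2.
    """
    lam = 2 * (1 + math.log2(n)) / epsilon
    var = (b - a + 1) ** 2 * 2 * (lam / n) ** 2          # base coefficient c0, multiplier |S|
    size = n
    while size >= 2:                                        # internal nodes, one level at a time
        for lo in range(1, n + 1, size):
            mid = lo + size // 2                            # first leaf of the right subtree
            alpha = max(0, min(b, mid - 1) - max(a, lo) + 1)
            beta = max(0, min(b, lo + size - 1) - max(a, mid) + 1)
            var += (alpha - beta) ** 2 * 2 * (lam / size) ** 2
        size //= 2
    return var


n, eps = 16, 1.0
worst = max(wavelet_range_variance(n, eps, a, b) for a in range(1, n + 1) for b in range(a, n + 1))
print(worst)
```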

Summary of Wavelet Mechanism
• Query strategy: use wavelet coefficients
• Can be computed in linear time
• Noise in each range query: $O(\log^3 n/\varepsilon^2)$

Linear Queries
• A set of linear queries can be represented by a matrix.
• X = [x1, x2, x3, x4] is a vector representing the counts of 4 values.
• H4 X represents the following 7 queries:
  – x1 + x2 + x3 + x4
  – x1 + x2
  – x3 + x4
  – x1
  – x2
  – x3
  – x4
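The seven queries can be written as the rows of H4 and answered with one matrix-vector product. A sketch using NumPy, with made-up counts X:

```python
import numpy as np

# Binary index (hierarchical) query matrix H4 for n = 4: one row per query.
H4 = np.array([
    [1, 1, 1, 1],   # x1 + x2 + x3 + x4
    [1, 1, 0, 0],   # x1 + x2
    [0, 0, 1, 1],   # x3 + x4
    [1, 0, 0, 0],   # x1
    [0, 1, 0, 0],   # x2
    [0, 0, 1, 0],   # x3
    [0, 0, 0, 1],   # x4
])

X = np.array([3, 1, 4, 1])   # hypothetical counts of the 4 values
print(H4 @ X)                # [9 4 5 3 1 4 1]
```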

Query Matrices

Three example matrices (n = 4): Identity, Binary Index (H4), Haar Wavelet

Sensitivity of a Query Matrix
• How many queries are affected by a change in a single count?
• Identity: sensitivity = 1; Binary Index: sensitivity = 3; Haar Wavelet: sensitivity = 3
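With entries of magnitude at most 1, "how many queries a single count affects" is the largest column L1 norm of the matrix. A sketch with NumPy; the 4×4 Haar matrix below is an assumed unnormalized Haar basis, not necessarily the exact matrix from the slide.

```python
import numpy as np


def sensitivity(A):
    """L1 sensitivity of query matrix A: the largest column L1 norm, i.e. the total
    change in all answers when one count changes by 1."""
    return np.abs(A).sum(axis=0).max()


I4 = np.eye(4)
H4 = np.array([[1, 1, 1, 1], [1, 1, 0, 0], [0, 0, 1, 1],
               [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]])
# Assumed unnormalized Haar wavelet matrix for n = 4.
Y4 = np.array([[1, 1, 1, 1], [1, 1, -1, -1], [1, -1, 0, 0], [0, 0, 1, -1]])

print(sensitivity(I4), sensitivity(H4), sensitivity(Y4))
```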

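Laplace Mechanism
$\mathcal{L}_W(x) = Wx + (\Delta_W/\varepsilon)\,\tilde{b}$, where $\Delta_W$ is the sensitivity of W and $\tilde{b}$ is a noise vector of i.i.d. Laplace(1) samples.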
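A sketch of the matrix form of the Laplace mechanism with NumPy. `laplace_mechanism` is a hypothetical name; `Generator.laplace(loc, scale, size)` is NumPy's Laplace sampler. For H4 (sensitivity 3) each answer should have variance about 2 · 3² = 18 at ε = 1.

```python
import numpy as np


def laplace_mechanism(W, x, epsilon, rng=None):
    """Return W x + (Delta_W / epsilon) * b with b a vector of i.i.d. Laplace(1) samples."""
    rng = np.random.default_rng() if rng is None else rng
    delta_W = np.abs(W).sum(axis=0).max()          # sensitivity of the query matrix W
    b = rng.laplace(loc=0.0, scale=1.0, size=W.shape[0])
    return W @ x + (delta_W / epsilon) * b


H4 = np.array([[1, 1, 1, 1], [1, 1, 0, 0], [0, 0, 1, 1],
               [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]])
x = np.array([3, 1, 4, 1])
print(laplace_mechanism(H4, x, epsilon=1.0, rng=np.random.default_rng(0)))
```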

Matrix Mechanism
Original data x → noisy representation (noisy answers to the strategy queries A) → reconstructed data x̂ → final query answer Wx̂
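The pipeline can be sketched with NumPy, assuming least-squares reconstruction via the pseudo-inverse (equal to (AᵀA)⁻¹Aᵀ when A has full column rank). `matrix_mechanism` and `all_ranges` are hypothetical helpers; here the workload W is all range queries on 4 values and the strategy A is H4.

```python
import numpy as np


def sensitivity(A):
    # Largest column L1 norm of the strategy matrix.
    return np.abs(A).sum(axis=0).max()


def matrix_mechanism(W, A, x, epsilon, rng=None):
    """Answer workload W through strategy A: noisy A x, least-squares reconstruction, then W."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = A @ x + (sensitivity(A) / epsilon) * rng.laplace(size=A.shape[0])  # noisy representation
    x_hat = np.linalg.pinv(A) @ noisy                                          # reconstructed data
    return W @ x_hat                                                           # final query answers


def all_ranges(n):
    # Hypothetical helper: one 0/1 row per range query q(j, k).
    return np.array([[1.0 if j <= i <= k else 0.0 for i in range(n)]
                     for j in range(n) for k in range(j, n)])


H4 = np.array([[1, 1, 1, 1], [1, 1, 0, 0], [0, 0, 1, 1],
               [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]], dtype=float)
W = all_ranges(4)
x = np.array([3.0, 1.0, 4.0, 1.0])
print(matrix_mechanism(W, H4, x, epsilon=1.0, rng=np.random.default_rng(0)))
```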

Reconstruction
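Assuming A has full column rank and y = Ax + (Δ_A/ε)·b̃ denotes the noisy strategy answers, a least-squares reconstruction (consistent with the (AᵀA)⁻¹ term in the summary) is:

```latex
\hat{x} = \arg\min_{x'} \lVert A x' - y \rVert_2^2 = (A^\top A)^{-1} A^\top y,
\qquad \text{final answer } W\hat{x}
```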

Matrix Mechanism

Error analysis
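A sketch of the error analysis, assuming the least-squares reconstruction x̂ = (AᵀA)⁻¹Aᵀy with y = Ax + (Δ_A/ε)·b, where b has i.i.d. Laplace(1) entries (so Cov(b) = 2I) and A has full column rank:

```latex
\begin{aligned}
\hat{x} - x &= \tfrac{\Delta_A}{\varepsilon}(A^\top A)^{-1}A^\top b \\
\operatorname{Cov}(\hat{x}) &= \tfrac{\Delta_A^2}{\varepsilon^2}(A^\top A)^{-1}A^\top (2I)\,A(A^\top A)^{-1} = \tfrac{2\Delta_A^2}{\varepsilon^2}(A^\top A)^{-1} \\
\mathrm{TotalError}_A(W) &= \sum_i \operatorname{Var}\big[(W\hat{x})_i\big] = \tfrac{2\Delta_A^2}{\varepsilon^2}\,\operatorname{trace}\big(W(A^\top A)^{-1}W^\top\big)
\end{aligned}
```

The strategy enters only through its sensitivity Δ_A and the matrix (AᵀA)⁻¹.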

Extreme strategies
• Strategy A = $I_n$
  – Noisily answer each xi
  – Answer queries using the noisy counts
  – Good when each query hits a few values
• Strategy A = W
  – Add noise to all the query answers
  – Good when the sensitivity is small
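Both extremes can be compared with the total-error formula $\frac{2\Delta_A^2}{\varepsilon^2}\operatorname{trace}(W(A^\top A)^{-1}W^\top)$. This sketch (hypothetical helpers, NumPy) substitutes the pseudo-inverse for (AᵀA)⁻¹ so that a rank-deficient A = W also works, which assumes every row of W lies in the row space of A. For all range queries on 4 values, A = I gives 40 and A = W gives 288 (Δ_W = 6); for the single total-count query, A = I gives 8 and A = W gives 2.

```python
import numpy as np


def total_error(W, A, epsilon=1.0):
    """TotalError_A(W) = (2 Delta_A^2 / epsilon^2) * trace(W (A^T A)^+ W^T)."""
    delta_A = np.abs(A).sum(axis=0).max()
    return 2 * delta_A**2 / epsilon**2 * np.trace(W @ np.linalg.pinv(A.T @ A) @ W.T)


def all_ranges(n):
    # Hypothetical helper: one 0/1 row per range query q(j, k).
    return np.array([[1.0 if j <= i <= k else 0.0 for i in range(n)]
                     for j in range(n) for k in range(j, n)])


n = 4
ranges, total_count = all_ranges(n), np.ones((1, n))
for name, W in [("all ranges", ranges), ("total count", total_count)]:
    print(name, "A = I:", total_error(W, np.eye(n)), "A = W:", total_error(W, W))
```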

Finding the Optimal Strategy
• Find A that minimizes $\mathrm{TotalError}_A(W)$
  – Reduces to solving a semi-definite program with rank constraints
  – $O(n^6)$ running time
• See the paper for approximations and an interesting discussion of the geometry.

Summary
• A linear query workload and strategy can be modeled using matrices.
• Previous techniques for finding a better strategy to answer a batch of queries are subsumed by the matrix mechanism.
• General mechanism to answer queries.
• Noise depends on the sensitivity of the strategy and on $(A^\top A)^{-1}$.

Next Class
• Sparse Vector Technique
  – Answering a workload of “sparse” queries

References
• X. Xiao, G. Wang, J. Gehrke, “Differential Privacy via Wavelet Transform”, ICDE 2009
• C. Li, M. Hay, V. Rastogi, G. Miklau, A. McGregor, “Optimizing Linear Queries under Differential Privacy”, PODS 2010