Preservation of Proximity Privacy in Publishing Numerical Sensitive Data

Preservation of Proximity Privacy in Publishing Numerical Sensitive Data J. Li, Y. Tao, and X. Xiao SIGMOD 08 Presented by Hongwei Tian



Page 1: Preservation of Proximity Privacy in Publishing Numerical Sensitive Data

Preservation of Proximity Privacy in Publishing Numerical Sensitive Data

J. Li, Y. Tao, and X. Xiao

SIGMOD 08

Presented by Hongwei Tian

Page 2: Preservation of Proximity Privacy in Publishing Numerical Sensitive Data

Outline

What is PPDP

Existing Privacy Principles

Proximity Attack

(ε, m)-anonymity

Determine ε and m

Algorithm

Experiments and Conclusion

Page 3: Preservation of Proximity Privacy in Publishing Numerical Sensitive Data

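Privacy-Preserving Data Publishing (PPDP)

A true story in Massachusetts, 1997

GIC (Group Insurance Commission) released "anonymized" medical records; a voter list bought for 20 dollars was enough to re-identify the records of Governor Weld.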

Page 4: Preservation of Proximity Privacy in Publishing Numerical Sensitive Data

PPDP

Page 5: Preservation of Proximity Privacy in Publishing Numerical Sensitive Data

PPDP

Anonymization Techniques

Generalization

Specific value -> general value

Maintains the semantic meaning

78256 -> 7825*, UTSA -> University, 28 -> [20, 30]

Perturbation

One value -> another random value

Huge information loss -> poor utility
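The three generalization examples on this slide can be reproduced with a minimal sketch. The function names, the interval width of 10, and the one-entry taxonomy are illustrative assumptions, not part of the presented work.

```python
from typing import Tuple

# Hypothetical one-entry taxonomy; a real one would be a full value hierarchy.
TAXONOMY = {"UTSA": "University"}


def generalize_zip(zip_code: str, keep: int = 4) -> str:
    """Mask trailing characters of a ZIP code, e.g. '78256' -> '7825*'."""
    return zip_code[:keep] + "*" * (len(zip_code) - keep)


def generalize_number(value: int, width: int = 10) -> Tuple[int, int]:
    """Replace a number by the enclosing interval of the given width."""
    low = (value // width) * width
    return (low, low + width)


def generalize_category(value: str) -> str:
    """Replace a categorical value by its parent; '*' if the parent is unknown."""
    return TAXONOMY.get(value, "*")


print(generalize_zip("78256"))       # 7825*
print(generalize_category("UTSA"))   # University
print(generalize_number(28))         # (20, 30)
```

Each generalized value still covers the original one, which is what "maintains the semantic meaning" refers to; a perturbed value carries no such guarantee.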

Page 6: Preservation of Proximity Privacy in Publishing Numerical Sensitive Data

PPDP

Example of Generalization

Page 7: Preservation of Proximity Privacy in Publishing Numerical Sensitive Data

Some Existing Privacy Principles (Generalization)

SA is categorical

k-anonymity

l-diversity, (α, k)-anonymity, m-invariance, …

(c, k)-safety, Skyline-privacy, …

SA is numerical

(k, e)-anonymity, Variance Control

t-closeness

δ-presence

…

Page 8: Preservation of Proximity Privacy in Publishing Numerical Sensitive Data

Next…

What is PPDP

Existing Privacy Principles

Proximity Attack

(ε, m)-anonymity

Determine ε and m

Algorithm

Experiments and Conclusion

Page 9: Preservation of Proximity Privacy in Publishing Numerical Sensitive Data

Proximity Attack

Page 10: Preservation of Proximity Privacy in Publishing Numerical Sensitive Data

(ε, m)-anonymity

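I(t): private neighborhood of tuple t

Absolute: I(t) = [t.SA − ε, t.SA + ε]

Relative: I(t) = [t.SA·(1 − ε), t.SA·(1 + ε)]

P(t): the risk of proximity breach of tuple t

P(t) = x / |G|, where G is the group (equivalence class) containing t and x is the number of tuples in G whose SA values fall in I(t)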

Page 11: Preservation of Proximity Privacy in Publishing Numerical Sensitive Data

(ε, m)-anonymity

ε = 20

I(t1) = [980, 1020]

x = 3, |G| = 4

P(t1) = 3/4
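The private neighborhood and the breach risk can be computed as in the sketch below. The example table of this slide did not survive in the transcript, so the four SA values are hypothetical, chosen only so that t1.SA = 1000 and three of the four values fall in [980, 1020]; the function names are also assumptions.

```python
def neighborhood(sa, eps, relative=False):
    """Return the private neighborhood I(t) as a (low, high) pair."""
    if relative:
        return (sa * (1 - eps), sa * (1 + eps))
    return (sa - eps, sa + eps)


def breach_risk(sa, group, eps, relative=False):
    """P(t) = x / |G|, with x the number of SA values of G inside I(t)."""
    low, high = neighborhood(sa, eps, relative)
    x = sum(1 for value in group if low <= value <= high)
    return x / len(group)


# Hypothetical SA values for a group G of four tuples; t1.SA = 1000.
group = [1000, 1010, 1015, 5000]
print(neighborhood(1000, 20))        # (980, 1020)
print(breach_risk(1000, group, 20))  # 0.75
```

Note that t itself always counts toward x, so P(t) can never drop below 1/|G|.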

Page 12: Preservation of Proximity Privacy in Publishing Numerical Sensitive Data

(ε, m)-anonymity

Principle

Given a real value ε and an integer m ≥ 1, a generalized table T∗ fulfills absolute (relative) (ε, m)-anonymity if P(t) ≤ 1/m for every tuple t ∈ T.

Larger ε and m mean a stricter privacy requirement.
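A minimal sketch of checking the absolute form of the principle, assuming the generalized table is given simply as a list of groups, each a list of SA values. This representation and the function name are assumptions; the values 10, 20, 25, 30 are those of the (6, 2)-anonymity example table in this deck.

```python
def satisfies_eps_m(groups, eps, m):
    """Return True if P(t) <= 1/m holds for every tuple of every group."""
    for group in groups:
        for sa in group:
            # x = number of tuples whose SA value lies in [sa - eps, sa + eps]
            x = sum(1 for value in group if abs(value - sa) <= eps)
            # x / |G| <= 1/m, written with integers to avoid rounding issues
            if x * m > len(group):
                return False
    return True


print(satisfies_eps_m([[10, 20, 25, 30]], eps=6, m=2))    # False
print(satisfies_eps_m([[20, 30], [10, 25]], eps=6, m=2))  # True
```

Raising either ε (wider neighborhoods) or m (smaller allowed risk) can only turn a passing table into a failing one, which is why larger values are stricter.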

Page 13: Preservation of Proximity Privacy in Publishing Numerical Sensitive Data

(ε, m)-anonymity

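What is the meaning of m?

|G| ≥ m: t.SA always falls in I(t), so x ≥ 1 and P(t) ≥ 1/|G|; P(t) ≤ 1/m therefore requires |G| ≥ m.

The best situation is that, for any two tuples ti and tj in G, ti.SA ∉ I(tj) and tj.SA ∉ I(ti).

Similar to l-diversity when the equivalence class has l tuples with distinct SA values.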

Page 14: Preservation of Proximity Privacy in Publishing Numerical Sensitive Data

(ε, m)-anonymity

How to ensure that tj.SA does not fall in I(ti)?

All tuples in G are sorted in ascending order of their SA values

| j − i | ≥ max{ |left(tj, G)|, |right(ti, G)| }

Page 15: Preservation of Proximity Privacy in Publishing Numerical Sensitive Data

(ε, m)-anonymity

Let maxsize(G) = max over all t ∈ G of max{ |left(t, G)|, |right(t, G)| }

| j − i | ≥ maxsize(G)
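The transcript does not spell out left(t, G) and right(t, G). The sketch below assumes that left(t, G) is the set of tuples with SA values in [t.SA − ε, t.SA] and right(t, G) the set with SA values in [t.SA, t.SA + ε], for the absolute neighborhood. Under that assumption, two tuples that are at least maxsize(G) positions apart in the sorted order lie outside each other's neighborhood.

```python
def left_size(sa, group, eps):
    """|left(t, G)|: tuples with SA in [sa - eps, sa] (assumed definition)."""
    return sum(1 for value in group if sa - eps <= value <= sa)


def right_size(sa, group, eps):
    """|right(t, G)|: tuples with SA in [sa, sa + eps] (assumed definition)."""
    return sum(1 for value in group if sa <= value <= sa + eps)


def maxsize(group, eps):
    """Largest left or right set over all tuples of the group."""
    return max(
        max(left_size(sa, group, eps), right_size(sa, group, eps))
        for sa in group
    )


ordered = sorted([10, 20, 25, 30])
g = maxsize(ordered, eps=6)
print(g)  # 2

# Every pair at least g positions apart differs by more than eps.
far_pairs = [
    (ordered[i], ordered[j])
    for i in range(len(ordered))
    for j in range(i + g, len(ordered))
]
print(all(b - a > 6 for a, b in far_pairs))  # True
```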

Page 16: Preservation of Proximity Privacy in Publishing Numerical Sensitive Data

(ε, m)-anonymity

Page 17: Preservation of Proximity Privacy in Publishing Numerical Sensitive Data

(ε, m)-anonymity

Example: (6, 2)-anonymity

tupleNo   QI   SA
1         q    10
2         q    20
3         q    25
4         q    30

Privacy is breached: P(t3) = 3/4 > 1/m = 1/2

Partitioning is needed

The tuples are already in ascending order of their SA values

g = maxsize(G) = 2

Tuple ti is assigned to bucket Gj with j = (i mod 2) + 1

New P(t3) = 1/2
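The worked example can be replayed with the sketch below, which sorts the SA values, computes g = maxsize(G), and assigns tuple ti to bucket Gj with j = (i mod g) + 1. The left/right definitions inside `maxsize` are an assumption (SA values in [t.SA − ε, t.SA] and [t.SA, t.SA + ε]), and the function names are illustrative.

```python
def maxsize(group, eps):
    """Largest number of SA values in [sa - eps, sa] or [sa, sa + eps]."""
    return max(
        max(
            sum(1 for v in group if sa - eps <= v <= sa),
            sum(1 for v in group if sa <= v <= sa + eps),
        )
        for sa in group
    )


def partition(sa_values, eps):
    """Round-robin assignment of the sorted tuples to g = maxsize(G) buckets."""
    ordered = sorted(sa_values)
    g = maxsize(ordered, eps)
    buckets = [[] for _ in range(g)]
    for i, sa in enumerate(ordered, start=1):  # tuples are numbered from 1
        j = (i % g) + 1                        # bucket index, also from 1
        buckets[j - 1].append(sa)
    return buckets


def breach_risk(sa, group, eps):
    """P(t) = x / |G| for the absolute neighborhood."""
    return sum(1 for v in group if abs(v - sa) <= eps) / len(group)


table = [10, 20, 25, 30]
print(breach_risk(25, table, 6))    # 0.75
buckets = partition(table, 6)
print(buckets)                      # [[20, 30], [10, 25]]
print(breach_risk(25, [10, 25], 6)) # 0.5
```

Neighbors in the sorted order end up in different buckets, so within a bucket consecutive tuples are g positions apart in the original order.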

Page 18: Preservation of Proximity Privacy in Publishing Numerical Sensitive Data

Determine ε and m

Page 19: Preservation of Proximity Privacy in Publishing Numerical Sensitive Data

Algorithm

Step 1: Splitting

Based on Mondrian (ICDE 2006). Splitting uses only the QI-attributes.

Iteratively find the median value of the frequency set on one selected QI-dimension and cut G into G1 and G2, making sure that both G1 and G2 can still be legally partitioned.
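A rough sketch of the splitting step on a single QI dimension. It assumes that a group is "legal" when |G| ≥ m · maxsize(G), which is the condition under which round-robin partitioning into maxsize(G) buckets leaves at least m tuples per bucket; the slides do not state the test explicitly. The (QI, SA) pairs, the single dimension, and the function names are hypothetical, and this is not the Mondrian implementation itself.

```python
def maxsize(sa_values, eps):
    """Largest number of SA values in [sa - eps, sa] or [sa, sa + eps]."""
    return max(
        max(
            sum(1 for v in sa_values if sa - eps <= v <= sa),
            sum(1 for v in sa_values if sa <= v <= sa + eps),
        )
        for sa in sa_values
    )


def is_legal(group, eps, m):
    """Assumed legality test: |G| >= m * maxsize(G)."""
    sa_values = [sa for _, sa in group]
    return len(sa_values) >= m * maxsize(sa_values, eps)


def split(group, eps, m):
    """Recursively cut a list of (qi, sa) tuples at the median QI value."""
    ordered = sorted(group)                  # sort by the QI value
    median = ordered[len(ordered) // 2][0]
    g1 = [t for t in ordered if t[0] < median]
    g2 = [t for t in ordered if t[0] >= median]
    # Stop when a cut is impossible or would leave an illegal group.
    if not g1 or not g2:
        return [ordered]
    if not (is_legal(g1, eps, m) and is_legal(g2, eps, m)):
        return [ordered]
    return split(g1, eps, m) + split(g2, eps, m)


# Hypothetical (QI, SA) tuples.
tuples = [(1, 10), (2, 40), (3, 25), (4, 20), (5, 50), (6, 30)]
for g in split(tuples, eps=6, m=2):
    print(g)
```

With these values the first cut at QI = 4 is accepted, and both halves then refuse further cuts because a one-tuple group could never reach m = 2.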

Page 20: Preservation of Proximity Privacy in Publishing Numerical Sensitive Data

Algorithm

Splitting ((6, 2)-anonymity)

[Figure: splitting example; points labeled 20, 30, 25, 10, 40, 50]

Page 21: Preservation of Proximity Privacy in Publishing Numerical Sensitive Data

Algorithm

Step 2: Partitioning

After step 1 stops, check every G produced by splitting

Release G directly if it satisfies (ε, m)-anonymity

Otherwise, partition G and then release the new buckets

Page 22: Preservation of Proximity Privacy in Publishing Numerical Sensitive Data

Algorithm

Partitioning ((6, 2)-anonymity)

[Figure: partitioning example; points labeled 20, 30, 25, 10, 40, 50]

Page 23: Preservation of Proximity Privacy in Publishing Numerical Sensitive Data

Next…

What is PPDP

Evolution of Privacy Preservation

Proximity Attack

(ε, m)-anonymity

Determine ε and m

Algorithm

Experiments and Conclusion

Page 24: Preservation of Proximity Privacy in Publishing Numerical Sensitive Data

Experiments

Real database SAL (http://ipums.org)

Attributes are Age, Birthplace, Occupation, and Income, with domains [16, 93], [1, 710], [1, 983], and [1k, 100k], respectively.

500K tuples

Compared with a perturbation method (OLAP, SIGMOD 2005)

Page 25: Preservation of Proximity Privacy in Publishing Numerical Sensitive Data

Experiments - Utility

Use count queries with workload = 1000

Page 26: Preservation of Proximity Privacy in Publishing Numerical Sensitive Data

Experiments - Utility

Page 27: Preservation of Proximity Privacy in Publishing Numerical Sensitive Data

Experiments - Efficiency

Page 28: Preservation of Proximity Privacy in Publishing Numerical Sensitive Data

Conclusion

Page 29: Preservation of Proximity Privacy in Publishing Numerical Sensitive Data

Any Questions?