Preservation of Proximity Privacy in Publishing Numerical Sensitive Data
J. Li, Y. Tao, and X. Xiao
SIGMOD '08
Presented by Hongwei Tian
Outline
What is PPDP
Existing Privacy Principles
Proximity Attack
(ε, m)-anonymity
Determine ε and m
Algorithm
Experiments and Conclusion
Privacy Preservation in Data Publishing: a true story in Massachusetts, 1997
The GIC released "anonymized" medical records of state employees; by linking them with a voter registration list bought for 20 dollars, the medical record of Governor Weld was re-identified.
PPDP
Anonymization Techniques
Generalization: replace a specific value with a more general one that maintains its semantic meaning, e.g., 78256 -> 7825*, UTSA -> University, 28 -> [20, 30]
Perturbation: replace one value with another random value; huge information loss -> poor utility
Example of Generalization
Some Existing Privacy Principles (based on generalization)
Categorical SA: k-anonymity, l-diversity, (α, k)-anonymity, m-invariance, (c, k)-safety, Skyline-privacy, …
Numerical SA: (k, e)-anonymity, Variance Control, t-closeness, δ-presence, …
Next…
What is PPDP
Existing Privacy Principles
Proximity Attack
(ε, m)-anonymity
Determine ε and m
Algorithm
Experiments and Conclusion
Proximity Attack
(ε, m)-anonymity
I(t): the private neighborhood of tuple t. Absolute: I(t) = [t.SA − ε, t.SA + ε]; relative: I(t) = [t.SA·(1 − ε), t.SA·(1 + ε)]
P(t): the risk of proximity breach of tuple t. P(t) = x / |G|, where G is the group containing t and x is the number of tuples in G whose SA values fall in I(t)
ε = 20, I(t1) = [980, 1020], x = 3, |G| = 4, so P(t1) = 3/4
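The breach probability on this slide can be reproduced with a short sketch; the function name and the concrete group values (chosen so that three of the four SA values land in [980, 1020]) are my own illustration, not from the paper.

```python
# Absolute proximity-breach risk P(t) = x / |G| for a tuple with sensitive
# value t_sa inside a group whose SA values are group_sa.
def proximity_risk(group_sa, t_sa, eps):
    # I(t) = [t_sa - eps, t_sa + eps]; x counts group members inside I(t)
    x = sum(1 for v in group_sa if t_sa - eps <= v <= t_sa + eps)
    return x / len(group_sa)

# Hypothetical group matching the slide: 3 of 4 values fall in [980, 1020]
print(proximity_risk([1000, 985, 1015, 2000], t_sa=1000, eps=20))  # 0.75
```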
Principle: Given a real value ε and an integer m ≥ 1, a generalized table T* fulfills absolute (relative) (ε, m)-anonymity if
P(t) ≤ 1/m
for every tuple t ∈ T.
Larger ε and larger m mean a stricter privacy requirement.
What is the meaning of m? It requires |G| ≥ m. The best situation: for any two tuples ti and tj in G, tj.SA does not fall in I(ti).
This is similar to l-diversity when the equivalence class has l tuples with distinct SA values.
How to ensure tj.SA does not fall in I(ti)? Sort all tuples in G in ascending order of their SA values; then ti and tj are safe in the same bucket if
|j − i| ≥ max{ |left(tj, G)|, |right(ti, G)| }
Let maxsize(G) = max over all t ∈ G of max{ |left(t, G)|, |right(t, G)| }.
Then it suffices that |j − i| ≥ maxsize(G).
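A sketch of maxsize(G) under an assumed reading of left/right: left(t, G) counts the tuples whose SA values lie in [t.SA − ε, t.SA] and right(t, G) those in [t.SA, t.SA + ε] (the paper's exact boundary conventions may differ); with the group {10, 20, 25, 30} and ε = 6 used in the next slide, this yields maxsize(G) = 2.

```python
# maxsize(G): largest one-sided eps-neighborhood count over the group.
# The left/right boundary conventions are my assumption, not the paper's.
def maxsize(sa_values, eps):
    def left(v):
        return sum(1 for u in sa_values if v - eps <= u <= v)
    def right(v):
        return sum(1 for u in sa_values if v <= u <= v + eps)
    return max(max(left(v), right(v)) for v in sa_values)

print(maxsize([10, 20, 25, 30], eps=6))  # 2
```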
(6, 2)-anonymity example
Privacy is breached: P(t3) = 3/4 > 1/m = 1/2
Partitioning is needed. The tuples are already in ascending order of their SA values; take g = maxsize(G) = 2 buckets and send the i-th tuple to bucket j = (i mod 2) + 1. The new P(t3) = 1/2.
tupleNo QI SA
1 q 10
2 q 20
3 q 25
4 q 30
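The round-robin partitioning of this group can be sketched as follows (helper names are mine): sorting by SA and sending the i-th tuple (1-based) to bucket (i mod g) + 1 places t3 (SA = 25) in the bucket {10, 25}, so its risk drops to 1/2.

```python
# Round-robin partitioning: the i-th tuple in ascending SA order goes to
# bucket j = (i mod g) + 1 (implemented with 0-based lists below).
def partition(sa_values, g):
    buckets = [[] for _ in range(g)]
    for i, v in enumerate(sorted(sa_values), start=1):
        buckets[i % g].append(v)
    return buckets

def risk(bucket, v, eps):
    # P(t) within one bucket: fraction of SA values inside [v - eps, v + eps]
    return sum(1 for u in bucket if v - eps <= u <= v + eps) / len(bucket)

buckets = partition([10, 20, 25, 30], g=2)   # [[20, 30], [10, 25]]
print(risk(buckets[1], 25, eps=6))           # 0.5
```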
Determine ε and m
Algorithm
Step 1: Splitting (Mondrian, ICDE 2006). Splitting is based only on the QI attributes.
Iteratively find the median value of the frequency sets on one selected QI dimension to cut G into G1 and G2, making sure G1 and G2 remain legal for partitioning.
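A minimal sketch of one Mondrian-style median cut; the full algorithm recurses, and its legality test checks that both halves can still be made (ε, m)-anonymous. Here `is_legal` is a caller-supplied placeholder and all names are mine.

```python
# One median split on a chosen QI dimension, Mondrian-style.
def median_split(tuples, dim, is_legal):
    ordered = sorted(tuples, key=lambda t: t[dim])
    mid = len(ordered) // 2
    g1, g2 = ordered[:mid], ordered[mid:]
    if g1 and g2 and is_legal(g1) and is_legal(g2):
        return g1, g2
    return None  # cut rejected: keep G whole

# Toy usage with a hypothetical legality test (group size >= 2)
groups = median_split([{"Age": a} for a in (30, 20, 50, 40)],
                      "Age", lambda g: len(g) >= 2)
print(groups)
```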
Splitting ((6, 2)-anonymity)
[Figure: recursive median splits over tuples with SA values 10, 20, 25, 30, 40, 50]
Step 2: Partitioning. After Step 1 stops, check every group G produced by splitting:
release G directly if it satisfies (ε, m)-anonymity; otherwise, partition G and release the new buckets.
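Step 2 can be sketched end to end under the same assumptions as above (the function name and the left/right boundary conventions are mine, not the paper's): release a group untouched if every tuple already satisfies P(t) ≤ 1/m, otherwise cut it into g = maxsize(G) round-robin buckets.

```python
# Release a group directly if it is (eps, m)-anonymous; otherwise
# partition it into g = maxsize(G) buckets round-robin over the SA order.
def release(sa_values, eps, m):
    def risk(bucket, v):
        return sum(1 for u in bucket if v - eps <= u <= v + eps) / len(bucket)

    if all(risk(sa_values, v) <= 1 / m for v in sa_values):
        return [sa_values]  # already satisfies (eps, m)-anonymity

    # g = maxsize(G) under assumed one-sided neighborhood definitions
    g = max(max(sum(1 for u in sa_values if v - eps <= u <= v),
                sum(1 for u in sa_values if v <= u <= v + eps))
            for v in sa_values)
    buckets = [[] for _ in range(g)]
    for i, v in enumerate(sorted(sa_values), start=1):
        buckets[i % g].append(v)
    return buckets

print(release([10, 20, 25, 30], eps=6, m=2))  # [[20, 30], [10, 25]]
```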
Partitioning ((6, 2)-anonymity)
[Figure: partitioning of the split groups over SA values 10, 20, 25, 30, 40, 50]
Next…
What is PPDP
Evolution of Privacy Preservation
Proximity Attack
(ε, m)-anonymity
Determine ε and m
Algorithm
Experiments and Conclusion
Experiments
Real database: SAL (http://ipums.org)
Attributes: Age, Birthplace, Occupation, and Income, with domains [16, 93], [1, 710], [1, 983], and [1k, 100k], respectively
500K tuples
Compared against a perturbation method (OLAP, SIGMOD 2005)
Experiments - Utility
Count queries, with a workload of 1000 queries
Experiments - Utility
Experiments - Efficiency
Conclusion
Any Questions?