Preservation of Proximity Privacy in Publishing Numerical Sensitive Data
J. Li, Y. Tao, and X. Xiao
SIGMOD '08
Presented by Hongwei Tian
Outline
What is PPDP
Existing Privacy Principles
Proximity Attack
(ε, m)-anonymity
Determine ε and m
Algorithm
Experiments and Conclusion
Privacy Preservation in Data Publishing: a true story in Massachusetts, 1997
The GIC released "anonymized" medical records of state employees; by linking them with a voter registration list bought for 20 dollars, the medical record of Governor Weld was re-identified.
PPDP
Anonymization Techniques
Generalization: replace a specific value with a more general one that maintains its semantic meaning, e.g., 78256 -> 7825*, UTSA -> University, 28 -> [20, 30]
Perturbation: replace one value with another random value; huge information loss -> poor utility
Example of Generalization
Some Existing Privacy Principles (based on generalization)
Categorical SA: k-anonymity, l-diversity, (α, k)-anonymity, m-invariance, (c, k)-safety, Skyline-privacy, …
Numerical SA: (k, e)-anonymity, Variance Control, t-closeness, δ-presence, …
Next…
What is PPDP
Existing Privacy Principles
Proximity Attack
(ε, m)-anonymity
Determine ε and m
Algorithm
Experiments and Conclusion
Proximity Attack
(ε, m)-anonymity
I(t): the private neighborhood of tuple t. Absolute: I(t) = [t.SA − ε, t.SA + ε]; relative: I(t) = [t.SA·(1 − ε), t.SA·(1 + ε)]
P(t): the risk of proximity breach of tuple t. P(t) = x / |G|, where G is the group containing t and x is the number of tuples in G whose SA values fall in I(t)
ε = 20, I(t1) = [980, 1020], x = 3, |G| = 4, so P(t1) = 3/4
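The breach probability on this slide can be reproduced with a short sketch; the function name and the concrete group values (chosen so that three of the four SA values land in [980, 1020]) are my own illustration, not from the paper.

```python
# Absolute proximity-breach risk P(t) = x / |G| for a tuple with sensitive
# value t_sa inside a group whose SA values are group_sa.
def proximity_risk(group_sa, t_sa, eps):
    # I(t) = [t_sa - eps, t_sa + eps]; x counts group members inside I(t)
    x = sum(1 for v in group_sa if t_sa - eps <= v <= t_sa + eps)
    return x / len(group_sa)

# Hypothetical group matching the slide: 3 of 4 values fall in [980, 1020]
print(proximity_risk([1000, 985, 1015, 2000], t_sa=1000, eps=20))  # 0.75
```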
Principle: Given a real value ε and an integer m ≥ 1, a generalized table T* fulfills absolute (relative) (ε, m)-anonymity if
P(t) ≤ 1/m
for every tuple t ∈ T.
Larger ε and larger m mean a stricter privacy requirement.
What is the meaning of m? It requires |G| ≥ m. The best situation: for any two tuples ti and tj in G, tj.SA does not fall in I(ti).
This is similar to l-diversity when the equivalence class has l tuples with distinct SA values.
How to ensure tj.SA does not fall in I(ti)? Sort all tuples in G in ascending order of their SA values; then ti and tj are safe in the same bucket if
|j − i| ≥ max{ |left(tj, G)|, |right(ti, G)| }
Let maxsize(G) = max over all t ∈ G of max{ |left(t, G)|, |right(t, G)| }.
Then it suffices that |j − i| ≥ maxsize(G).
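A sketch of maxsize(G) under an assumed reading of left/right: left(t, G) counts the tuples whose SA values lie in [t.SA − ε, t.SA] and right(t, G) those in [t.SA, t.SA + ε] (the paper's exact boundary conventions may differ); with the group {10, 20, 25, 30} and ε = 6 used in the next slide, this yields maxsize(G) = 2.

```python
# maxsize(G): largest one-sided eps-neighborhood count over the group.
# The left/right boundary conventions are my assumption, not the paper's.
def maxsize(sa_values, eps):
    def left(v):
        return sum(1 for u in sa_values if v - eps <= u <= v)
    def right(v):
        return sum(1 for u in sa_values if v <= u <= v + eps)
    return max(max(left(v), right(v)) for v in sa_values)

print(maxsize([10, 20, 25, 30], eps=6))  # 2
```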
(6, 2)-anonymity example
Privacy is breached: P(t3) = 3/4 > 1/m = 1/2
Partitioning is needed. The tuples are already in ascending order of their SA values; take g = maxsize(G) = 2 buckets and send the i-th tuple to bucket j = (i mod 2) + 1. The new P(t3) = 1/2.
tupleNo QI SA
1 q 10
2 q 20
3 q 25
4 q 30
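The round-robin partitioning of this group can be sketched as follows (helper names are mine): sorting by SA and sending the i-th tuple (1-based) to bucket (i mod g) + 1 places t3 (SA = 25) in the bucket {10, 25}, so its risk drops to 1/2.

```python
# Round-robin partitioning: the i-th tuple in ascending SA order goes to
# bucket j = (i mod g) + 1 (implemented with 0-based lists below).
def partition(sa_values, g):
    buckets = [[] for _ in range(g)]
    for i, v in enumerate(sorted(sa_values), start=1):
        buckets[i % g].append(v)
    return buckets

def risk(bucket, v, eps):
    # P(t) within one bucket: fraction of SA values inside [v - eps, v + eps]
    return sum(1 for u in bucket if v - eps <= u <= v + eps) / len(bucket)

buckets = partition([10, 20, 25, 30], g=2)   # [[20, 30], [10, 25]]
print(risk(buckets[1], 25, eps=6))           # 0.5
```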
Determine ε and m
Algorithm
Step 1: Splitting (Mondrian, ICDE 2006). Splitting is based only on the QI attributes.
Iteratively find the median value of the frequency sets on one selected QI dimension to cut G into G1 and G2, making sure G1 and G2 remain legal for partitioning.
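A minimal sketch of one Mondrian-style median cut; the full algorithm recurses, and its legality test checks that both halves can still be made (ε, m)-anonymous. Here `is_legal` is a caller-supplied placeholder and all names are mine.

```python
# One median split on a chosen QI dimension, Mondrian-style.
def median_split(tuples, dim, is_legal):
    ordered = sorted(tuples, key=lambda t: t[dim])
    mid = len(ordered) // 2
    g1, g2 = ordered[:mid], ordered[mid:]
    if g1 and g2 and is_legal(g1) and is_legal(g2):
        return g1, g2
    return None  # cut rejected: keep G whole

# Toy usage with a hypothetical legality test (group size >= 2)
groups = median_split([{"Age": a} for a in (30, 20, 50, 40)],
                      "Age", lambda g: len(g) >= 2)
print(groups)
```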
Splitting ((6, 2)-anonymity)
[Figure: recursive median splits over tuples with SA values 10, 20, 25, 30, 40, 50]
Step 2: Partitioning. After Step 1 stops, check every group G produced by splitting:
release G directly if it satisfies (ε, m)-anonymity; otherwise, partition G and release the new buckets.
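Step 2 can be sketched end to end under the same assumptions as above (the function name and the left/right boundary conventions are mine, not the paper's): release a group untouched if every tuple already satisfies P(t) ≤ 1/m, otherwise cut it into g = maxsize(G) round-robin buckets.

```python
# Release a group directly if it is (eps, m)-anonymous; otherwise
# partition it into g = maxsize(G) buckets round-robin over the SA order.
def release(sa_values, eps, m):
    def risk(bucket, v):
        return sum(1 for u in bucket if v - eps <= u <= v + eps) / len(bucket)

    if all(risk(sa_values, v) <= 1 / m for v in sa_values):
        return [sa_values]  # already satisfies (eps, m)-anonymity

    # g = maxsize(G) under assumed one-sided neighborhood definitions
    g = max(max(sum(1 for u in sa_values if v - eps <= u <= v),
                sum(1 for u in sa_values if v <= u <= v + eps))
            for v in sa_values)
    buckets = [[] for _ in range(g)]
    for i, v in enumerate(sorted(sa_values), start=1):
        buckets[i % g].append(v)
    return buckets

print(release([10, 20, 25, 30], eps=6, m=2))  # [[20, 30], [10, 25]]
```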
Partitioning ((6, 2)-anonymity)
[Figure: partitioning of the split groups over SA values 10, 20, 25, 30, 40, 50]
Next…
What is PPDP
Evolution of Privacy Preservation
Proximity Attack
(ε, m)-anonymity
Determine ε and m
Algorithm
Experiments and Conclusion
Experiments
Real database: SAL (http://ipums.org)
Attributes: Age, Birthplace, Occupation, and Income, with domains [16, 93], [1, 710], [1, 983], and [1k, 100k], respectively
500K tuples
Compared against a perturbation method (OLAP, SIGMOD 2005)
Experiments - Utility
Count queries, with a workload of 1000 queries
Experiments - Utility
Experiments - Efficiency
Conclusion
Any Questions?