Abstract
Although many anonymization techniques exist for microdata publication, two problems remain: (1) privacy breaches by adversaries with auxiliary knowledge, and (2) large information loss during anonymization. We establish the requirement of presence anonymity and propose a two-step synthesizing process: first learn a model from the original data, then sample from it a published version that has similar statistical characteristics and includes fake records. This approach prevents auxiliary-knowledge attacks while still enabling researchers to reach correct or approximately correct conclusions. Its effectiveness is demonstrated through extensive experiments.
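The two-step process described above can be illustrated with a minimal sketch. The abstract does not specify the model, so everything here is an assumption made for illustration: the attribute names (`age`, `disease`), the toy model (a frequency table for the categorical attribute plus a per-category Gaussian for the numeric one), and the function names `learn_model` and `sample_published`. It is not the authors' method, only one simple instance of "learn a model, then sample fake records from it".

```python
import random
import statistics
from collections import Counter


def learn_model(records):
    """Step 1 (assumed toy model): P(disease) and a Gaussian over age per disease."""
    counts = Counter(r["disease"] for r in records)
    ages_by_disease = {}
    for r in records:
        ages_by_disease.setdefault(r["disease"], []).append(r["age"])
    # Population standard deviation, so single-record groups give sigma = 0.
    age_params = {
        d: (statistics.mean(ages), statistics.pstdev(ages))
        for d, ages in ages_by_disease.items()
    }
    return {"counts": counts, "age_params": age_params}


def sample_published(model, n, seed=0):
    """Step 2: draw n synthetic (fake) records from the learned model."""
    rng = random.Random(seed)
    diseases = list(model["counts"])
    weights = [model["counts"][d] for d in diseases]
    published = []
    for _ in range(n):
        disease = rng.choices(diseases, weights=weights, k=1)[0]
        mu, sigma = model["age_params"][disease]
        published.append({"age": round(rng.gauss(mu, sigma)), "disease": disease})
    return published


# Hypothetical original microdata, invented for this example.
original = [
    {"age": 23, "disease": "flu"},
    {"age": 31, "disease": "flu"},
    {"age": 64, "disease": "diabetes"},
    {"age": 58, "disease": "diabetes"},
]

model = learn_model(original)
published = sample_published(model, n=1000)
```

No published record is copied from the original table, yet aggregate statistics such as the share of each disease stay close to the original. A realistic model would need to capture dependencies among many more attributes than this sketch does.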
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gu, J., Chen, Y., Fu, J., Peng, H., Ye, X. (2010). Synthesizing: Art of Anonymization. In: Bringas, P.G., Hameurlain, A., Quirchmayr, G. (eds) Database and Expert Systems Applications. DEXA 2010. Lecture Notes in Computer Science, vol 6261. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15364-8_33
DOI: https://doi.org/10.1007/978-3-642-15364-8_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15363-1
Online ISBN: 978-3-642-15364-8
eBook Packages: Computer Science, Computer Science (R0)