Abstract
To protect tabular data through cell suppression, efficient algorithms are essential. Gaussian elimination can be used for secondary cell suppression to prevent exact disclosure. A beneficial feature of this method is that all tables created from the same microdata can be handled simultaneously. This paper presents a solution to the issue where suppressed zeros in frequency tables cannot protect each other. In magnitude tables, it outlines how the algorithm can be tailored to provide protection against singleton contributors using their own data for disclosure.
References
Bates, D., Maechler, M., Jagan, M.: Matrix: Sparse and Dense Matrix Classes and Methods (2024). https://CRAN.R-project.org/package=Matrix, R package version 1.7-0
European Commission: Commission regulation (EU) 2017/712 of 20 April 2017 on statistical data and metadata for population and housing censuses. Official Journal of the European Union (2017). https://eur-lex.europa.eu/eli/reg/2017/712/oj
Fischetti, M., Salazar, J.J.: Solving the cell suppression problem on tabular data with linear constraints. Manag. Sci. 47(7), 1008–1027 (2001). http://www.jstor.org/stable/822485
Hundepool, A., et al.: Statistical Disclosure Control. Wiley, Hoboken (2012). https://doi.org/10.1002/9781118348239.ch1
Langsrud, Ø.: Sparse model matrices for multidimensional hierarchical aggregation. R J. 15, 150–166 (2023). https://doi.org/10.32614/RJ-2023-088
Langsrud, Ø.: About the Norwegian Hypercubes for the 2021 Census (2024). https://github.com/statisticsnorway/sdc-census-2021-hypercubes
Langsrud, Ø., Bøvelstad, H.M.: Synthetic decimal numbers as a flexible tool for suppression of post-published tabular data. In: Domingo-Ferrer, J., Laurent, M. (eds.) Privacy in Statistical Databases, pp. 105–115. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-13945-1_8
Langsrud, Ø., Lupp, D.: GaussSuppression: Tabular Data Suppression using Gaussian Elimination (2024). https://CRAN.R-project.org/package=GaussSuppression, R package version 0.8.5
Langsrud, Ø., Lupp, D.: SSBtools: Statistics Norway’s Miscellaneous Tools (2024). https://CRAN.R-project.org/package=SSBtools, R package version 1.5.2
Lupp, D.P., Langsrud, Ø.: Suppression of directly-disclosive cells in frequency tables. In: Joint UNECE/Eurostat Expert Meeting on Statistical Data Confidentiality, Poznań, Poland, 1–3 December 2021 (2021)
Meindl, B.: sdcTable: Methods for Statistical Disclosure Control in Tabular Data (2023). https://CRAN.R-project.org/package=sdcTable, R package version 0.32.6
de Wolf, P.P., Hundepool, A., Giessing, S., Salazar, J.J., Castro, J.: tau-ARGUS user’s manual, version 4.1. Technical report, Statistics Netherlands (2014). https://github.com/sdcTools/tauargus
Acknowledgements
I would like to thank my colleague Vidar Norstein Klungre at Statistics Norway, as well as two anonymous reviewers, for their valuable comments that led to improvements.
Appendix
Figures 5 and 6 illustrate steps from two alternative eliminations starting from the matrix in Fig. 4. These figures are referred to in Sect. 5.
To illustrate the methods with realistic examples, we consider the so-called hypercubes defined for the European Census 2021 [2]. We use data on 1,000,000 synthetic individuals based on Norwegian data, with a synthetic numeric variable and approximately 650,000 unique IDs added for use in magnitude tables [6]. Tables 4 and 5 provide examples of how the methods described in Sects. 4 and 5 affect the number of secondary suppressed cells. Each of the first three table groups, consisting of four, three, and three linked hierarchical tables, respectively, is analyzed using the GaussSuppression package [8]. These tables also report the total execution time, measured on a Linux server, from input microdata to final results.
In Table 4, illustrating the methods in Sect. 4, we chose to primary suppress ones, twos, and threes, while zeros remained unprotected and were considered structural. Thus, these methods address the suppression of ones, which, as described in Sect. 6, is analogous to handling zeros. This approach results in similar examples for frequency and magnitude tables.
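The primary suppression rule used for Table 4 can be sketched in a few lines. This is a minimal illustration only, not the GaussSuppression API; the function name `primary_suppress_small_counts` and the parameter `max_small` are hypothetical names introduced here.

```python
from typing import Iterable, List


def primary_suppress_small_counts(counts: Iterable[int], max_small: int = 3) -> List[bool]:
    """Flag cells whose count lies in 1..max_small as primary suppressed.

    Zeros are deliberately left unflagged, mirroring the choice in the
    text to treat them as structural (hypothetical helper, for illustration).
    """
    return [1 <= c <= max_small for c in counts]


# Example cell counts: zero stays published, ones/twos/threes are flagged.
flags = primary_suppress_small_counts([0, 1, 2, 3, 4, 10])
```

With this rule, the smallest primary suppressed value is one, so ones play the role that zeros play when zeros are suppressed.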
The primary suppression of the magnitude tables in Table 5 is based on the \(p\%\)-rule [12], with \(p=5\). Note that in the GaussSuppression package, virtual cells are created by examining the X-matrix. This may differ somewhat from the tau-ARGUS method described in Sect. 5.1. As mentioned at the beginning of Sect. 5, both rows and columns of the X-matrix can be labeled with singleton contributor IDs. In general, some IDs on rows may have no matching column. Extra primary cells representing these missing IDs may be added so that all singleton contributions are treated as sensitive. The singleton methods used in Table 5 include this step.
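As a rough illustration of the \(p\%\)-rule in its common formulation, a cell is treated as sensitive when the total minus the two largest contributions is less than \(p\%\) of the largest contribution. The sketch below assumes non-negative contributions; the function name `is_sensitive_p_percent` is hypothetical, and the actual implementation in the GaussSuppression package may differ in details.

```python
from typing import Sequence


def is_sensitive_p_percent(contributions: Sequence[float], p: float = 5.0) -> bool:
    """Common form of the p%-rule (illustrative, assumes non-negative values).

    The cell is sensitive if the sum of all contributions except the two
    largest is smaller than p percent of the largest contribution.
    """
    xs = sorted(contributions, reverse=True)
    if not xs:
        return False  # an empty cell has nothing to disclose
    largest = xs[0]
    remainder = sum(xs[2:])  # total minus the two largest contributions
    return remainder < (p / 100.0) * largest


three_small_rest = is_sensitive_p_percent([100, 50, 3])   # remainder 3 < 5
three_large_rest = is_sensitive_p_percent([100, 50, 10])  # remainder 10 >= 5
singleton = is_sensitive_p_percent([80])                   # remainder 0 < 4
```

Under this formulation, a cell with a single positive contributor is always sensitive, which is consistent with treating all singleton contributions as sensitive.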
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Langsrud, Ø. (2024). Secondary Cell Suppression by Gaussian Elimination: An Algorithm Suitable for Handling Issues with Zeros and Singletons. In: Domingo-Ferrer, J., Önen, M. (eds) Privacy in Statistical Databases. PSD 2024. Lecture Notes in Computer Science, vol 14915. Springer, Cham. https://doi.org/10.1007/978-3-031-69651-0_6
DOI: https://doi.org/10.1007/978-3-031-69651-0_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-69650-3
Online ISBN: 978-3-031-69651-0
eBook Packages: Computer Science, Computer Science (R0)