Secondary Cell Suppression by Gaussian Elimination: An Algorithm Suitable for Handling Issues with Zeros and Singletons

  • Conference paper
Privacy in Statistical Databases (PSD 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14915)

Abstract

To protect tabular data through cell suppression, efficient algorithms are essential. Gaussian elimination can be used for secondary cell suppression to prevent exact disclosure. A beneficial feature of this method is that all tables created from the same microdata can be handled simultaneously. This paper presents a solution to the issue where suppressed zeros in frequency tables cannot protect each other. In magnitude tables, it outlines how the algorithm can be tailored to provide protection against singleton contributors using their own data for disclosure.
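The exact-disclosure criterion behind this approach can be sketched in a few lines. The sketch assumes that each cell is described by a column of a dummy matrix over the inner cells, and that a suppressed cell is exactly disclosed when its column is a linear combination of the columns of the published cells. The 2x2 table, the cell names, and the functions `rank`, `exactly_disclosed`, and `disclosed_cells` are hypothetical and only for illustration; this is a rank-based check, not the sequential elimination of the paper and not the API of the GaussSuppression package.

```python
from fractions import Fraction


def rank(vectors):
    """Rank of a list of equal-length vectors, by Gaussian elimination."""
    rows = [[Fraction(v) for v in vec] for vec in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0  # number of pivots found so far
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            if rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def exactly_disclosed(target, published):
    """True if `target` lies in the span of the published cell vectors."""
    return rank(published + [target]) == rank(published)


def disclosed_cells(cells, suppressed):
    """Names of suppressed cells that can be recomputed from published ones."""
    published = [vec for name, vec in cells.items() if name not in suppressed]
    return sorted(n for n in suppressed if exactly_disclosed(cells[n], published))


# Hypothetical 2x2 table with inner cells a, b (row 1) and c, d (row 2).
# Each vector says which inner cells contribute to the cell.
CELLS = {
    "a": [1, 0, 0, 0], "b": [0, 1, 0, 0], "c": [0, 0, 1, 0], "d": [0, 0, 0, 1],
    "row1": [1, 1, 0, 0], "row2": [0, 0, 1, 1],
    "col1": [1, 0, 1, 0], "col2": [0, 1, 0, 1],
    "total": [1, 1, 1, 1],
}

print(disclosed_cells(CELLS, {"a"}))                 # ['a']
print(disclosed_cells(CELLS, {"a", "b"}))            # ['a', 'b']
print(disclosed_cells(CELLS, {"a", "b", "c", "d"}))  # []
```

Suppressing only `a`, or only `a` and `b`, leaves both recoverable from the margins, whereas suppressing all four inner cells protects each of them against exact disclosure. Exact rational arithmetic is used so that the dependence test is not affected by rounding.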

References

  1. Bates, D., Maechler, M., Jagan, M.: Matrix: Sparse and Dense Matrix Classes and Methods (2024). https://CRAN.R-project.org/package=Matrix, R package version 1.7-0

  2. European Commission: Commission regulation (EU) 2017/712 of 20 April 2017 on statistical data and metadata for population and housing censuses. Official Journal of the European Union (2017). https://eur-lex.europa.eu/eli/reg/2017/712/oj

  3. Fischetti, M., Salazar, J.J.: Solving the cell suppression problem on tabular data with linear constraints. Manag. Sci. 47(7), 1008–1027 (2001). http://www.jstor.org/stable/822485

  4. Hundepool, A., et al.: Statistical Disclosure Control. Wiley, Hoboken (2012). https://doi.org/10.1002/9781118348239.ch1

  5. Langsrud, Ø.: Sparse model matrices for multidimensional hierarchical aggregation. R J. 15, 150–166 (2023). https://doi.org/10.32614/RJ-2023-088

  6. Langsrud, Ø.: About the Norwegian Hypercubes for the 2021 Census (2024). https://github.com/statisticsnorway/sdc-census-2021-hypercubes

  7. Langsrud, Ø., Bøvelstad, H.M.: Synthetic decimal numbers as a flexible tool for suppression of post-published tabular data. In: Domingo-Ferrer, J., Laurent, M. (eds.) Privacy in Statistical Databases, pp. 105–115. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-13945-1_8

  8. Langsrud, Ø., Lupp, D.: GaussSuppression: Tabular Data Suppression using Gaussian Elimination (2024). https://CRAN.R-project.org/package=GaussSuppression, R package version 0.8.5

  9. Langsrud, Ø., Lupp, D.: SSBtools: Statistics Norway’s Miscellaneous Tools (2024). https://CRAN.R-project.org/package=SSBtools, R package version 1.5.2

  10. Lupp, D.P., Langsrud, Ø.: Suppression of directly-disclosive cells in frequency tables. In: Joint UNECE/Eurostat Expert Meeting on Statistical Data Confidentiality, Poznań, Poland, 1–3 December 2021 (2021)

  11. Meindl, B.: sdcTable: Methods for Statistical Disclosure Control in Tabular Data (2023). https://CRAN.R-project.org/package=sdcTable, R package version 0.32.6

  12. de Wolf, P.P., Hundepool, A., Giessing, S., Salazar, J.J., Castro, J.: tau-ARGUS user’s manual, version 4.1. Technical report, Statistics Netherlands (2014). https://github.com/sdcTools/tauargus

Acknowledgements

I would like to thank my colleague Vidar Norstein Klungre at Statistics Norway, as well as two anonymous reviewers, for their valuable comments that led to improvements.

Author information

Corresponding author

Correspondence to Øyvind Langsrud.

Appendix

Figures 5 and 6 illustrate steps from two alternative eliminations starting from the matrix in Fig. 4. These figures are referred to in Sect. 5.

Fig. 5. Some Gaussian Elimination steps, including the last one, conducted with the matrix in Fig. 4 as a starting point. The treatment of singletons here leads to secondary suppression marked as \(\circ\) in Table 3.

Fig. 6. Some Gaussian Elimination steps, including the last one, conducted with the matrix in Fig. 4 as a starting point. Singletons are handled by two parallel eliminations, but the figure shows the elimination sequence from only one of these. The result is the secondary suppressed cells marked as \(\#\) in Table 3. This figure also illustrates the singleton method that considers combinations. In that case, the process stops after step 5, and the secondary cells are those marked with \(\triangle\) in Table 3.

Table 4. Synthetic data examples illustrating methods for handling ones, equivalent to methods for handling zeros. Each of the first three hypercube groups, according to the European Census 2021, is considered.

Table 5. Synthetic data examples illustrating methods for handling singletons. Each of the first three hypercube groups, according to the European Census 2021, is considered.

To illustrate the methods with realistic examples, we consider so-called hypercubes, according to the European Census 2021 [2]. We use data on 1,000,000 synthetic individuals based on Norwegian data, with a synthetic numeric variable and approximately 650,000 unique IDs added for use in magnitude tables [6]. Tables 4 and 5 show how the methods described in Sects. 4 and 5 affect the number of secondary suppressed cells. Each of the first three table groups, consisting of four, three, and three linked hierarchical tables, respectively, is analyzed using the GaussSuppression package [8]. The tables also report the total execution time, measured on a Linux server, from input microdata to final results.

In Table 4, illustrating the methods in Sect. 4, we chose to primary suppress ones, twos, and threes, while zeros remained unprotected and were considered structural. Thus, these methods address the suppression of ones, which, as described in Sect. 6, is analogous to handling zeros. This approach results in similar examples for frequency and magnitude tables.
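The primary suppression rule used for Table 4 can be written down directly. The following is a minimal sketch with hypothetical cell names and a hypothetical helper `primary_suppressed`; it only restates the rule in the paragraph above and is not taken from the GaussSuppression package.

```python
def primary_suppressed(count, max_n=3):
    """True for counts 1..max_n. Zeros are published (treated as structural)."""
    return 1 <= count <= max_n


# Hypothetical frequency cells, for illustration only.
counts = {"cell_A": 0, "cell_B": 1, "cell_C": 3, "cell_D": 4}

primary = [name for name, n in counts.items() if primary_suppressed(n)]
print(primary)  # ['cell_B', 'cell_C']
```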

The primary suppression of the magnitude tables in Table 5 is based on the \(p\%\)-rule [12], with \(p=5\). Note that in the GaussSuppression package, virtual cells are created by examining the X-matrix, which may differ somewhat from the Tau-Argus method described in Sect. 5.1. As mentioned at the beginning of Sect. 5, both rows and columns of the X-matrix can be labeled with singleton contributor IDs. In general, there may be IDs on rows without a matching column. Extra primary cells representing these missing IDs may be added so that all singleton contributions are considered sensitive. This is included in the singleton methods used in Table 5.
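As a reminder of what the \(p\%\)-rule tests, here is a minimal sketch of its usual textbook form: a cell is sensitive when the cell total minus the two largest contributions is less than \(p\%\) of the largest contribution. The function name `p_percent_sensitive` and the example contributions are hypothetical, contributions are assumed to be non-negative, and the details of the GaussSuppression implementation may differ.

```python
def p_percent_sensitive(contributions, p=5):
    """Textbook p%-rule: sensitive if (total - x1 - x2) < p% of x1,
    where x1 >= x2 are the two largest (non-negative) contributions."""
    xs = sorted(contributions, reverse=True)
    if not xs:
        return False
    largest = xs[0]
    second = xs[1] if len(xs) > 1 else 0
    remainder = sum(xs) - largest - second
    # Compare without dividing, to avoid rounding at the boundary.
    return 100 * remainder < p * largest


print(p_percent_sensitive([100]))          # True: a singleton cell
print(p_percent_sensitive([100, 50, 4]))   # True: remainder 4 < 5
print(p_percent_sensitive([100, 50, 30]))  # False: remainder 30 >= 5
```

Under this form of the rule, a cell with a single positive contribution is always sensitive, since nothing remains once the largest contribution is removed.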

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Langsrud, Ø. (2024). Secondary Cell Suppression by Gaussian Elimination: An Algorithm Suitable for Handling Issues with Zeros and Singletons. In: Domingo-Ferrer, J., Önen, M. (eds) Privacy in Statistical Databases. PSD 2024. Lecture Notes in Computer Science, vol 14915. Springer, Cham. https://doi.org/10.1007/978-3-031-69651-0_6

  • DOI: https://doi.org/10.1007/978-3-031-69651-0_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-69650-3

  • Online ISBN: 978-3-031-69651-0

  • eBook Packages: Computer Science, Computer Science (R0)
