
Asymptotic Utility of Spectral Anonymization

  • Conference paper
Privacy in Statistical Databases (PSD 2024)

Abstract

In the contemporary data landscape characterized by multi-source data collection and third-party sharing, ensuring individual privacy stands as a critical concern. While various anonymization methods exist, their utility preservation and privacy guarantees remain challenging to quantify. In this work, we address this gap by studying the utility and privacy of the spectral anonymization (SA) algorithm, particularly in an asymptotic framework. Unlike conventional anonymization methods that directly modify the original data, SA operates by perturbing the data in a spectral basis and subsequently reverting them to their original basis. Alongside the original version \(\mathcal {P}\)-SA, employing random permutation transformation, we introduce two novel SA variants: \(\mathcal {J}\)-spectral anonymization and \(\mathcal {O}\)-spectral anonymization, which employ sign-change and orthogonal matrix transformations, respectively. We show how well, under some practical assumptions, these SA algorithms preserve the first and second moments of the original data. Our results reveal, in particular, that the asymptotic efficiency of all three SA algorithms in covariance estimation is exactly 50% when compared to the original data. To assess the applicability of these asymptotic results in practice, we conduct a simulation study with finite data and also evaluate the privacy protection offered by these algorithms using distance-based record linkage. Our research reveals that while no method exhibits clear superiority in finite-sample utility, \(\mathcal {O}\)-SA distinguishes itself for its exceptional privacy preservation, never producing identical records, albeit with increased computational complexity. Conversely, \(\mathcal {P}\)-SA emerges as a computationally efficient alternative, demonstrating unmatched efficiency in mean estimation.
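For intuition, the three SA variants described above can be sketched in code. The following is an illustrative reconstruction based on the description in this paper (thin SVD of the centered data, independent perturbation of each spectral column, reconstruction in the original basis), not the authors' reference implementation; the function name and interface are hypothetical.

```python
import numpy as np

def spectral_anonymize(X, variant="P", seed=None):
    """Sketch of spectral anonymization (SA): perturb the centered data in
    its spectral (SVD) basis and map back to the original basis.

    Hypothetical implementation reconstructed from the paper's description.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    xbar = X.mean(axis=0)
    # centered data: X - 1 xbar' = U D V'
    U, d, Vt = np.linalg.svd(X - xbar, full_matrices=False)
    U0 = np.empty_like(U)
    for k in range(p):  # perturb each spectral column independently
        if variant == "P":      # P-SA: random permutation
            U0[:, k] = rng.permutation(U[:, k])
        elif variant == "J":    # J-SA: random sign changes
            U0[:, k] = rng.choice([-1.0, 1.0], size=n) * U[:, k]
        elif variant == "O":    # O-SA: Haar-uniform orthogonal rotation
            G = rng.standard_normal((n, n))
            Q, R = np.linalg.qr(G)
            U0[:, k] = (Q * np.sign(np.diag(R))) @ U[:, k]
        else:
            raise ValueError(variant)
    return U0 @ np.diag(d) @ Vt + xbar  # back to the original basis
```

Note that \(\mathcal {P}\)-SA preserves the sample mean exactly, since permuting the entries of each spectral column leaves its column sum unchanged; the \(n \times n\) rotation in \(\mathcal {O}\)-SA is also what makes that variant the most expensive computationally.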

References

  1. Awan, J., Kenney, A., Reimherr, M., Slavković, A.: Benefits and pitfalls of the exponential mechanism with applications to Hilbert spaces and functional PCA. In: Proceedings of the 36th International Conference on Machine Learning, vol. 97, pp. 374–384. PMLR (2019)

  2. Bishop, Y.M., Fienberg, S.E., Holland, P.W.: Discrete Multivariate Analysis: Theory and Practice. Springer, New York (2007)

  3. Calviño, A., Aldeguer, P., Domingo-Ferrer, J.: Factor analysis for anonymization. In: 2017 IEEE International Conference on Data Mining Workshops (ICDMW), pp. 984–991 (2017)

  4. Domingo-Ferrer, J., Torra, V.: Disclosure risk assessment in statistical data protection. J. Comput. Appl. Math. 164–165, 285–293 (2004)

  5. Dunsche, M., Kutta, T., Dette, H.: Multivariate mean comparison under differential privacy. In: Domingo-Ferrer, J., Laurent, M. (eds.) PSD 2022. LNCS, vol. 13463, pp. 31–45. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-13945-1_3

  6. Dwork, C.: Differential privacy. In: Bugliesi, M., Preneel, B., Sassone, V., Wegener, I. (eds.) ICALP 2006. LNCS, vol. 4052, pp. 1–12. Springer, Heidelberg (2006). https://doi.org/10.1007/11787006_1

  7. Kollo, T., von Rosen, D.: Advanced Multivariate Statistics with Matrices. Springer, Dordrecht (2005)

  8. Kundu, S., Suthaharan, S.: Privacy-preserving predictive model using factor analysis for neuroscience applications. In: IEEE International Conference on Big Data Security on Cloud (BigDataSecurity), High Performance and Smart Computing (HPSC) and Intelligent Data and Security (IDS), pp. 67–73. IEEE (2019)

  9. Lasko, T.A., Vinterbo, S.A.: Spectral anonymization of data. IEEE Trans. Knowl. Data Eng. 22(3), 437–446 (2009)

  10. Machanavajjhala, A., Kifer, D., Gehrke, J., Venkitasubramaniam, M.: l-diversity. ACM Trans. Knowl. Discov. Data 1, 3-es (2007)

  11. Muralidhar, K., Sarathy, R.: Data shuffling – a new masking approach for numerical data. Manag. Sci. 52(5), 658–670 (2006)

  12. Li, N., Li, T., Venkatasubramanian, S.: t-closeness: privacy beyond k-anonymity and l-diversity. In: Proceedings of the 23rd International Conference on Data Engineering, pp. 106–115 (2007)

  13. Ross, N.: Fundamentals of Stein’s method. Probab. Surv. 8, 210–293 (2011)

  14. Seeman, J., Reimherr, M., Slavković, A.: Exact privacy guarantees for Markov chain implementations of the exponential mechanism with artificial atoms. In: Advances in Neural Information Processing Systems, vol. 34, pp. 13125–13136. Curran Associates, Inc. (2021)

  15. Shlomo, N., De Waal, T.: Protection of micro-data subject to edit constraints against statistical disclosure. J. Off. Stat. 24(2), 229–253 (2008)

  16. Stewart, G.W.: The efficient generation of random orthogonal matrices with an application to condition estimators. SIAM J. Numer. Anal. 17(3), 403–409 (1980)

  17. Sweeney, L.: k-anonymity: a model for protecting privacy. Int. J. Uncertain. Fuzziness Knowl.-Based Syst. 10, 557–570 (2002)

  18. Tyler, D.E.: Radial estimates and the test for sphericity. Biometrika 69(2), 429–436 (1982)

  19. Viana, M.A.: The covariance structure of random permutation matrices. Contemp. Math. 287, 303–326 (2001)

  20. Xiao, H., Ye, Y., Devadas, S.: Local differential privacy in decentralized optimization. arXiv preprint (2019)

Acknowledgments

The study was supported by the Finnish Cultural Foundation (grant 00220801) and the Research Council of Finland (grants 347501 and 353769). The authors would like to thank two anonymous reviewers for their valuable comments.

Author information

Correspondence to Katariina Perkonoja.

Ethics declarations

Disclosure of Interests

The authors have no competing interests to declare that are relevant to the content of this article.

Appendices

A Proofs of Technical Results

Proof of Theorem 1

We begin with the \(\mathcal {P}\)-SA. Denoting \(Y_n := X_{n, \mathcal {P}}\), we have \(\bar{y}_n = \frac{1}{n} Y_n' 1_n = \frac{1}{n} V_n D_n U_{n0}' 1_n + \bar{x}_n\). Observing now that \(U_{n0}' 1_n = U_{n}' 1_n\) and that \(V_n D_n U_{n}' 1_n = 0\), the desired result follows from the central limit theorem.

For the \(\mathcal {J}\)-SA, we first consider the case where the data come from \(\mathcal {N}_p(0, \varLambda )\) for some diagonal matrix \(\varLambda \) with strictly positive and mutually distinct diagonal elements. We then write

$$\begin{aligned} \sqrt{n} \bar{y}_n = \frac{1}{\sqrt{n}} Y_n' 1_n = \frac{1}{\sqrt{n}} V_n D_n U_{n0}' 1_n + \sqrt{n}\bar{x}_n. \end{aligned}$$
(3)

Now, the kth element of \(D_n U_{n0}' 1_n\) equals

$$\begin{aligned} d_{nk} u_{nk}' J_{nk} 1_n = e_k' D_n U_n' J_{nk} 1_n = e_k' V_n' X_n' H_n J_{nk} 1_n. \end{aligned}$$
(4)

We next show that \(X_n' H_n J_{nk} 1_n/\sqrt{n}\) admits a limiting distribution and is, as such, stochastically bounded. To see this, we let the ith diagonal element of \(J_{nk}\) be \(s_{n,i} \sim \textrm{Uniform}\{-1 ,1\}\) and write \(\frac{1}{\sqrt{n}} X_n' H_n J_{nk} 1_n = \frac{1}{\sqrt{n}} \sum _{i = 1}^n x_{n, i} s_{n,i} - \sqrt{n} \bar{x}_n \frac{1}{n} \sum _{i = 1}^n s_{n,i}\). Now, the first term has a limiting distribution (by CLT) and the second term is \(o_p(1)\) (by CLT and Slutsky’s lemma). Thus, fixing the sign of \(V_n\) such that \(V_n \rightarrow _p I_p\), we have, by (4) that \(\frac{1}{\sqrt{n}} d_{nk} u_{nk}' J_{nk} 1_n = \frac{1}{\sqrt{n}} e_k' X_n' H_n J_{nk} 1_n + o_p(1)\). Using the same in (3), we also obtain \(\sqrt{n} \bar{y}_n = \sqrt{n}\bar{x}_n + \frac{1}{\sqrt{n}} D_n U_{n0}' 1_n + o_p(1)\). This implies that the kth element of \(\sqrt{n} \bar{y}_n\) has the expansion \((\sqrt{n} \bar{y}_n)_k = \frac{1}{\sqrt{n}} \sum _{i = 1}^n x_{n, ik} (1 + s_{n,i}) + o_p(1)\). As \(x_{n, ik} (1 + s_{n, i})\), \(i = 1, \ldots , n\), are i.i.d. with mean zero and variance \(2 \textrm{Var}(x_{n, ik})\), and as \(x_{n, ik} (1 + s_{n, i})\) and \(x_{n, i\ell } (1 + s_{n, i})\) are uncorrelated for \(k \ne \ell \), the limiting distribution of \(\sqrt{n} \bar{y}_n\) is \(\mathcal {N}_p(0, 2 \varLambda )\).

Let now \(Z_n = X_n O' + 1_n \mu '\) for some \(p \times p\) orthogonal matrix O and \(\mu \in \mathbb {R}^p\). Then, \( Z_{n, \mathcal {P}} = (X_{n, \mathcal {P}} - 1_n \bar{x}_n')O' + 1_n (O \bar{x}_n + \mu )'\), thus implying that \( \frac{1}{n} Z_{n, \mathcal {P}}' 1_n = O \frac{1}{n} X_{n, \mathcal {P}}' 1_n - O \bar{x}_n + O \bar{x}_n + \mu \) and \( \sqrt{n} \left( \frac{1}{n} Z_{n, \mathcal {P}}' 1_n - \mu \right) = \sqrt{n} O \bar{y}_n \rightsquigarrow \mathcal {N}_p(0, 2 O \varLambda O')\). This completes the proof for \(\mathcal {J}\)-SA.
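The limiting covariance \(2 \varLambda \) of \(\sqrt{n} \bar{y}_n\) under \(\mathcal {J}\)-SA can also be checked by simulation. The following sketch (illustrative only; the sample size, number of replications, and \(\varLambda \) are hypothetical choices) repeatedly applies sign-change anonymization in the spectral basis and estimates the covariance of \(\sqrt{n} \bar{y}_n\).

```python
import numpy as np

# Monte Carlo sanity check (a sketch, not from the paper): under J-SA,
# sqrt(n) * ybar should be approximately N_p(0, 2 * Lambda).
rng = np.random.default_rng(0)
n, p, reps = 400, 2, 2000
lam = np.array([4.0, 1.0])            # diagonal of Lambda (hypothetical)
draws = np.empty((reps, p))
for r in range(reps):
    X = rng.standard_normal((n, p)) * np.sqrt(lam)   # rows ~ N_p(0, Lambda)
    xbar = X.mean(axis=0)
    U, d, Vt = np.linalg.svd(X - xbar, full_matrices=False)
    # independent random sign changes per spectral column
    U0 = rng.choice([-1.0, 1.0], size=(n, p)) * U
    # ybar = (1/n) V D U0' 1_n + xbar
    ybar = (Vt.T * d) @ (U0.T @ np.ones(n)) / n + xbar
    draws[r] = np.sqrt(n) * ybar
emp_cov = np.cov(draws.T)             # should be close to diag(2 * lam)
```

The empirical covariance should be close to \(\textrm{diag}(8, 2)\) here, i.e. twice the variance attained by the sample mean of the original data.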

For \(\mathcal {O}\)-SA, the proof is similar to \(\mathcal {J}\)-SA, and we point out only the differences next. To see that \(X_n' H_n O_{nk} 1_n/\sqrt{n}\) is \(\mathcal {O}_p(1)\), we observe that \(O_{nk} 1_n \sim \sqrt{n} O_{nk} e_1 \sim \sqrt{n} u_n\), where \(u_n\) is uniform on the unit sphere \(\mathbb {S}^{n - 1}\). Moreover, letting \(\chi _n\) denote a random variable obeying a \(\chi \)-distribution with \(n\) degrees of freedom that is independent of \(u_n\), we have \(z_n := \chi _n u_n \sim \mathcal {N}_n(0, I_n)\) and \(b_n := \chi _n/\sqrt{n} \rightarrow _p 1\). Consequently, denoting a generic column of \(X_n\) by \(x_n \sim \mathcal {N}_n(0, \lambda I_n)\), we have \( (1/\sqrt{n}) x_n' H_n O_{nk} 1_n \sim x_n' H_n u_n = x_n' u_n - \bar{x}_n 1_n' u_n = b_n^{-1} (1/\sqrt{n}) x_n' z_n - \sqrt{n} \bar{x}_n \bar{z}_n b_n^{-1} \), showing that \(X_n' H_n O_{nk} 1_n/\sqrt{n}\) is stochastically bounded. Hence,

$$\begin{aligned} (\sqrt{n} \bar{y}_n)_k = ( \sqrt{n}\bar{x}_n )_k + b_n^{-1} \frac{1}{\sqrt{n}} e_k' X_n' z_n + o_p(1) = \frac{1}{\sqrt{n}} \sum _{i = 1}^n x_{n, ik} (1 + z_{n, i}) + o_p(1). \end{aligned}$$

The limiting distribution of this is \(\mathcal {N}(0, 2 \textrm{Var}(x_{n, ik}))\), and the rest of the proof proceeds as for \(\mathcal {J}\)-SA.
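The \(\mathcal {O}\)-SA proof relies on Haar-uniform random orthogonal matrices \(O_{nk}\). One standard way to generate such matrices, in the spirit of Stewart [16], is via the QR decomposition of a Gaussian matrix with a sign correction on the diagonal of \(R\); the sketch below is illustrative (the function name is our own) and also exhibits the fact, used above, that \(O_{nk} 1_n\) is \(\sqrt{n}\) times a uniform point on the unit sphere.

```python
import numpy as np

def haar_orthogonal(n, rng):
    # Haar-uniform n x n orthogonal matrix via QR of a Gaussian matrix;
    # multiplying columns by sign(diag(R)) makes the distribution Haar.
    G = rng.standard_normal((n, n))
    Q, R = np.linalg.qr(G)
    return Q * np.sign(np.diag(R))

rng = np.random.default_rng(0)
n = 100
O = haar_orthogonal(n, rng)
ones = np.ones(n)
# O @ 1_n preserves the norm sqrt(n), so it equals sqrt(n) times a
# uniform point on the unit sphere S^{n-1}
```
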

The following lemma is a direct consequence of Proposition 4.1 in [19].

Lemma 1

Let \(P\) be a uniformly random \(n \times n\) permutation matrix. Then, \(\textrm{E}(P) = (1/n) 1_n 1_n'\), \(\textrm{E}\{ \textrm{tr}(P^2) \} = 2\) and \(\textrm{E}[\{ \textrm{tr}(P) \}^2] = 2\).
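These moments are easy to verify by simulation (an illustrative check, not part of the proof): for the underlying permutation \(\sigma \), \(\textrm{tr}(P)\) counts the fixed points and \(\textrm{tr}(P^2)\) the indices with \(\sigma (\sigma (i)) = i\).

```python
import numpy as np

# Illustrative simulation check of Lemma 1: both E[tr(P^2)] and
# E[{tr(P)}^2] should be close to 2 for a uniformly random permutation.
rng = np.random.default_rng(0)
n, reps = 50, 100_000
idx = np.arange(n)
tr_sq = np.empty(reps)   # realizations of {tr(P)}^2
tr_p2 = np.empty(reps)   # realizations of tr(P^2)
for r in range(reps):
    sigma = rng.permutation(n)
    tr_sq[r] = np.count_nonzero(sigma == idx) ** 2   # fixed points, squared
    tr_p2[r] = np.count_nonzero(sigma[sigma] == idx)  # points of period <= 2
```
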

Proof of Theorem 2

Starting again with the case of \(\mathcal {N}_p(0, \varLambda )\)-distributed data and using the notation of the proof of Theorem 1, we have

$$\begin{aligned} \sqrt{n} (S_{n, \mathcal {P}} - \varLambda ) = \sqrt{n} \left( \frac{1}{n} Y_n' Y_n - \varLambda - \bar{y}_n \bar{y}_n' \right) = \sqrt{n} \left( \frac{1}{n} Y_n' Y_n - \varLambda \right) + o_p(1), \end{aligned}$$

as \(\sqrt{n} \bar{y}_n = \mathcal {O}_p(1)\). From Theorem 1 we have \(V_n D_n U_{n0}' 1_n \bar{x}_n/\sqrt{n} = o_p(1)\), giving

$$\begin{aligned} \begin{aligned} \sqrt{n} (S_{n, \mathcal {P}} - \varLambda ) =& \sqrt{n} \left( \frac{1}{n} V_n D_n U_{n0}' U_{n0} D_n V_n' - \varLambda \right) + o_p(1), \\ =& \sqrt{n} \left( \frac{1}{n} X_n' H_n X_n - \varLambda \right) + \frac{1}{\sqrt{n}} V_n D_n G_n D_n V_n' + o_p(1), \\ =& \sqrt{n} \left( \frac{1}{n} X_n' X_n - \varLambda \right) + \frac{1}{\sqrt{n}} V_n D_n G_n D_n V_n' + o_p(1), \end{aligned} \end{aligned}$$
(5)

where \(G_n \in \mathbb {R}^{p \times p}\) has \(g_{n, kk} = 0\) and \(g_{n, k\ell } = u_{nk}' T_{nk}' T_{n\ell } u_{n \ell }\), where the \(T_{nk}\) are either permutation, sign-change, or orthogonal matrices, depending on the type of SA. We next show that, for \(k \ne \ell \), the quantity \(f_{n, k\ell } := d_{nk} d_{n\ell } g_{n, k\ell }/\sqrt{n}\) is \(\mathcal {O}_p(1)\).

For \(\mathcal {P}\)-SA, this is seen by writing \( \sqrt{n} f_{n, k\ell } = e_k' D_n U_n' P_{nk}' P_{n\ell } U_n D_n e_\ell = e_k' V_n' X_n' H_n P_{nk}' P_{n\ell } H_n X_n V_n e_\ell = e_k' V_n' (X_n' P_{nk}' P_{n\ell } X_n) V_n e_\ell - n e_k' V_n' (\bar{x}_n \bar{x}_n') V_n e_\ell \), giving \( f_{n, k\ell } = n^{-1/2} e_k' V_n' (X_n' P_{nk}' P_{n\ell } X_n) V_n e_\ell + o_p(1) = n^{-1/2} x_{n, k}' P_{nk}' P_{n\ell } x_{n, \ell } + o_p(1) \), where \(x_{n, k}\) is the kth column of \(X_n\). For the second equality, we have used \( \frac{1}{\sqrt{n}} X_n' P_{nk}' P_{n\ell } X_n = \mathcal {O}_p(1) \) which holds for the off-diagonal elements by the independence of \(x_{n, k}\) and \(x_{n, \ell }\). For the diagonal elements, the boundedness is equivalent to \(x_n' P_n x_n/\sqrt{n} = \mathcal {O}_p(1)\) when \(x_n \sim \mathcal {N}_n(0, I_n)\) and \(P_n\) is a uniformly random \(n \times n\) permutation matrix. Since \(\textrm{E}(x_n' P_n x_n) = 0\), by [2, Theorem 14.4-1],

$$\begin{aligned} \frac{1}{\sqrt{n}} x_n' P_n x_n = \frac{1}{\sqrt{n}} \textrm{SD} (x_n' P_n x_n) \cdot \mathcal {O}_p(1). \end{aligned}$$
(6)

Now, \(\textrm{E}(x_n x_n' P_n x_n x_n'|P_n) = \textrm{tr}(P_n) I_n + P_n + P_n'\), giving, using Lemma 1, that \(\textrm{Var}(x_n' P_n x_n) = \textrm{E}[\{ \textrm{tr}(P_n) \}^2] + \textrm{E} \{ \textrm{tr}(P_n^2) \} + n = n + 4\). Plugging in to (6), the stochastic boundedness of \(X_n' P_{nk}' P_{n\ell } X_n/\sqrt{n}\) and \(f_{n, k\ell }\) now follows.

For \(\mathcal {J}\)-SA, the reasoning is similar but uses the facts that \(\frac{1}{n} X_n' J_{nk} J_{n\ell } 1_n = o_p(1)\) and \(\frac{1}{n} 1_n' J_{nk} J_{n\ell } 1_n = o_p(1)\) where the former holds because \(J_{n\ell } J_{nk} X_n \sim X_n\). Furthermore, instead of Lemma 1, we use \(\textrm{tr}(J_n^2) = n\) and \(\textrm{E}[\{ \textrm{tr}(J_n) \}^2] = n\).

For \(\mathcal {O}\)-SA, we similarly have that \(\frac{1}{n} X_n' O_{nk} O_{n\ell } 1_n = o_p(1)\) and \(\frac{1}{n} 1_n' O_{nk} O_{n\ell } 1_n = o_p(1)\), since \(O_{n\ell } O_{nk} X_n \sim X_n\) and since, reasoning as in the proof of Theorem 1, we have \(\frac{1}{n} 1_n' O_{nk} O_{n\ell } 1_n = \frac{1}{n} z_{n1}' z_{n2} + o_p(1)\), where \(z_{n1}, z_{n2} \sim \mathcal {N}_n(0, I_n)\) are independent of each other. Also, instead of Lemma 1, we use for \(\mathcal {O}\)-SA the bounds \(\textrm{tr}(O_n^2) \le n\) and \(\textrm{E}\{ ( O_{n, 11} + \cdots + O_{n,nn} )^2 \} \le \{ [\textrm{E}(O_{n, 11}^2)]^{1/2} + \cdots + [\textrm{E}(O_{n, nn}^2)]^{1/2} \}^2\), together with \(\textrm{E}(O_{n, ii}) = 0\) and \(\textrm{E}(O_{n, ii}^2) = 1/n\).

Thus, for every form of SA, by (5), we have \( \sqrt{n} (S_{n, \mathcal {P}} - \varLambda ) = \sqrt{n} ( \frac{1}{n} X_n' X_n - \varLambda ) + \frac{1}{\sqrt{n}} D_n G_n D_n + o_p(1) \). Writing out the individual elements of \(B_n := \sqrt{n} (S_{n, \mathcal {P}} - \varLambda )\), the diagonal and off-diagonal elements thus read

$$\begin{aligned} b_{n, kk} &= \sqrt{n} \left( \frac{1}{n} x_{n, k}' x_{n, k} - \lambda _k \right) + o_p(1), \end{aligned}$$
(7)
$$\begin{aligned} b_{n, k\ell } &= \frac{1}{\sqrt{n}} x_{n, k}' (I_n + T_{nk}' T_{n\ell }) x_{n, \ell } + o_p(1). \end{aligned}$$
(8)

What remains now is to show that \(b_{n, kk}, b_{n, k\ell }\), \(k, \ell = 1, \ldots , p\), \(k \ne \ell \), have a limiting joint normal distribution with the desired covariance matrix. We begin by computing the covariance matrix. For \(\mathcal {P}\)-SA, direct computation gives \(\textrm{Var}(b_{n, kk}) = 2 \lambda _k^2\) and \(\textrm{Var}(b_{n, k\ell }) = 2 \lambda _k \lambda _\ell \{ 1 + (1/n) \textrm{E} \{ \textrm{tr}(P_{nk}' P_{n\ell }) \} \} = 2 (1 + 1/n) \lambda _k \lambda _\ell = 2 \lambda _k \lambda _\ell + o(1)\), where we have used the fact that \(\textrm{E}(P_{nk}) = 1_n 1_n'/n\). Similarly, we obtain the following non-zero covariances (the rest of the covariances are zero): \(\textrm{Cov}(b_{n, k \ell }, b_{n, \ell k}) = \textrm{Var}(b_{n, k\ell }) = 2 \lambda _k \lambda _\ell + o(1)\). Analogous computation reveals that the same asymptotic variances and covariances are obtained also for \(\mathcal {J}\)-SA and \(\mathcal {O}\)-SA. Consequently, in each case, the limiting covariance matrix of \(\sqrt{n} \textrm{vec} (S_{n, \mathcal {P}} - \varLambda )\) is \((\varLambda ^{1/2} \otimes \varLambda ^{1/2})[2 I_{p^2} + 2 K_{p,p} - 2 \textrm{diag} \{ \textrm{vec}(I_p) \} ](\varLambda ^{1/2} \otimes \varLambda ^{1/2}) \). The form for a general normal distribution now follows with equivariance arguments as in the proof of Theorem 1. That the moments of the limiting distribution actually are the limits of the moments we computed earlier is guaranteed by showing that \(b_{n, kk}\) and \(b_{n, k\ell }\) are uniformly integrable. E.g., for \(b_{n, k\ell }\), this can be done by denoting \(A := (1/\sqrt{n}) x_{n, k}' x_{n, \ell }\) and \(B := (1/\sqrt{n}) x_{n, k}' T_{nk}' T_{n\ell } x_{n, \ell }\), using Young’s inequality to bound \(\textrm{E}\{(A + B)^4\} \le a \textrm{E}(A^4) + b \textrm{E}(B^4)\) for some scalars \(a, b \in \mathbb {R}\) and observing that \(T_{nk} x_{n, k} \sim \mathcal {N}_n(0, I_n)\) for each form of SA.
The boundedness of \(\textrm{E}(A^4)\) and \(\textrm{E}(B^4)\) now follows from the moments of the Gaussian distribution.
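The limiting covariance matrix above can be assembled explicitly. The sketch below (illustrative, with a hypothetical \(\varLambda \)) builds the commutation matrix \(K_{p,p}\) and confirms that the formula reproduces the entries computed above, \(\textrm{Var}(b_{kk}) = 2 \lambda _k^2\), \(\textrm{Var}(b_{k\ell }) = 2 \lambda _k \lambda _\ell \), and \(\textrm{Cov}(b_{k\ell }, b_{\ell k}) = 2 \lambda _k \lambda _\ell \).

```python
import numpy as np

p = 3
lam = np.array([3.0, 2.0, 1.0])   # hypothetical diagonal of Lambda
pp = p * p

# commutation matrix K_{p,p}: K @ vec(A) = vec(A') for column-major vec
K = np.zeros((pp, pp))
for i in range(p):
    for j in range(p):
        K[j * p + i, i * p + j] = 1.0

Lh = np.diag(np.sqrt(lam))        # Lambda^{1/2}
M = 2 * np.eye(pp) + 2 * K - 2 * np.diag(np.eye(p).flatten(order="F"))
# limiting covariance of sqrt(n) * vec(S_n - Lambda)
acov = np.kron(Lh, Lh) @ M @ np.kron(Lh, Lh)

def vec_idx(k, l):
    # position of entry (k, l) in the column-major vec of a p x p matrix
    return k + p * l
```
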

Next, we show that the limiting distribution exists. By the equivariance properties of the multivariate normal, it is sufficient to consider only the case \(\varLambda = I_p\). We begin with \(\mathcal {O}\)-SA. Letting \(Q_{n1}, \ldots , Q_{np}\) be random orthogonal matrices drawn uniformly from the Haar distribution (and independent of all our other random variables), we have \(x_{n,k} \sim Q_{nk} x_{n, k}\). Hence, (7) and (8) can be written as \(b_{n, kk} = \sqrt{n} \{ (1/n) x_{n, k}' x_{n, k} - 1 \} + o_p(1)\) and \(b_{n, k\ell } = (1/\sqrt{n}) x_{n, k}' (Q_{nk}' Q_{n\ell } + R_{nk}' R_{n\ell }) x_{n, \ell } + o_p(1)\), where \(R_{nk} := O_{nk} Q_{nk}\) is a Haar-uniformly distributed orthogonal matrix independent of \(Q_{nk}\). We then decompose \(x_{n,k} = \chi _{nk} u_{nk}\), where \(\chi _{nk} \sim \chi _n\) and \(u_{nk}\) is uniform on the unit sphere, and let \(\chi _{nk1}, \chi _{nk2}\) be independent \(\chi _n\)-random variables. Using these, we write \(b_{n, kk}\) and \(b_{n, k\ell }\) as \( b_{n, kk} = \sqrt{n} \{ (1/n) (\chi _{nk} u_{nk})' (\chi _{nk} u_{nk}) - 1 \} + o_p(1) \) and as

$$\begin{aligned} b_{n, k\ell } &= \frac{\chi _{nk}\chi _{n\ell }}{\chi _{nk1}\chi _{n\ell 1}} \frac{1}{\sqrt{n}} (\chi _{nk1} Q_{nk} u_{nk})' (\chi _{n\ell 1} Q_{n\ell } u_{n\ell }) \\ &+ \frac{\chi _{nk}\chi _{n\ell }}{\chi _{nk2}\chi _{n\ell 2}} \frac{1}{\sqrt{n}} (\chi _{nk2} R_{nk} u_{nk})' (\chi _{n\ell 2} R_{n\ell } u_{n\ell }) + o_p(1). \end{aligned}$$

Now, above \((\chi _{nk}\chi _{n\ell })/(\chi _{nk1}\chi _{n\ell 1}) \rightarrow _p 1\) and \((\chi _{nk}\chi _{n\ell })/(\chi _{nk2}\chi _{n\ell 2}) \rightarrow _p 1\). Moreover, \(\chi _{nk} u_{nk}\), \(\chi _{nk1} Q_{nk} u_{nk}\) and \(\chi _{nk2} R_{nk} u_{nk}\) are independent of each other and all obey the \(\mathcal {N}_n(0, I_n)\)-distribution. Hence, by CLT and Slutsky’s lemma, the desired joint limiting normal distribution for \(b_{n, kk}\) and \(b_{n, k\ell }\), \(k, \ell = 1, \ldots , p\), \(k \ne \ell \), is obtained in the case of \(\mathcal {O}\)-SA.

For \(\mathcal {J}\)-SA, joint limiting normality follows from the multivariate CLT, as \(J_{n1}, \ldots , J_{np}\) do not “mix” the observations, which thus retain their i.i.d. nature.

What is thus left is \(\mathcal {P}\)-SA. In that case, we use the Cramér-Wold device combined with the CLT for local dependency neighbourhoods [13, Theorem 3.6]. We demonstrate the proof below in the case \(p = 2\), the general case following analogously (but with more cluttered notation). That is, the claim holds for \(p = 2\) once we show that \( S_n := a_{11} \sqrt{n} ( \frac{1}{n} x' x - 1 ) + a_{12} \frac{1}{\sqrt{n}} x' (I_n + P) y + a_{22} \sqrt{n} ( \frac{1}{n} y' y - 1 ) \) admits a limiting normal distribution, where \(a_{11}, a_{12}, a_{22}\) are arbitrary constants, \(P\) is a uniformly random \(n \times n\) permutation matrix and \(x, y \sim \mathcal {N}_n(0, I_n)\) are independent of each other and \(P\). Our objective is to use [13, Theorem 3.6], where we will take (using their notation) \(X_i = a_{11} (x_i^2 - 1)/\sqrt{n} + a_{12} x_i (y_i + y_{\delta (i)})/\sqrt{n} + a_{22} (y_i^2 - 1)/\sqrt{n}\), where \(\delta \) is the permutation mapping corresponding to the permutation matrix \(P\), and \(\sigma ^2 = \textrm{Var}(S_n) = 2 a_{11} + 2 a_{12} (1 + 1/n) + 2 a_{22}\).

Inspection of the proof of their Theorem 3.6 reveals that the dependency neighbourhood \(N_i\) can be taken to be random (as they are for us), as long as their maximal degree is almost surely some finite D (as it is for us, since \(x_i (y_i + y_{\delta (i)})\) depends on maximally two other terms), when specific modifications to the proof/result are done: (i) The first term on the RHS of the statement of Theorem 3.6 can be kept as such, but when deriving it, the sums of the form “\(\sum _{j \in N_i}\)” have to be replaced with “\(\sum _{j = 1}^n \mathbb {I}(j \in N_i)\)”, since \(N_i\) are random. (ii) In their formula (3.12), we decompose the term \(\textrm{Var} ( \sum _{i = 1}^n \sum _{j \in N_i} X_i X_j )\) as

$$\begin{aligned} & \textrm{E} \left\{ \textrm{Var} \left( \sum _{i = 1}^n \sum _{j \in N_i} X_i X_j \mid P \right) \right\} + \textrm{Var} \left\{ \textrm{E} \left( \sum _{i = 1}^n \sum _{j \in N_i} X_i X_j \mid P \right) \right\} \\ \le & 14 D^3 \sum _{i = 1}^n \textrm{E}(X_i^4) + \textrm{Var}\{ \textrm{Var}(S_n \mid P) \}, \end{aligned}$$

where the inequality uses the variance bound derived in the proof of [13, Theorem 3.6] (for our conditioned-upon dependency neighborhoods) and the second term uses the fact that \(\textrm{Var}(\sum _{i = 1}^n X_i \mid P) = \textrm{E} ( \sum _{i = 1}^n \sum _{j \in N_i} X_i X_j \mid P )\).

Now, as D is fixed, as \(\sigma ^2\) approaches a non-zero constant when \(n \rightarrow \infty \), and as the \(X_i\) are identically distributed, the desired claim holds once we show that

$$\begin{aligned} \textrm{E} |X_1|^3 = o(1/n), \quad \textrm{E} X_1^4 = o(1/n) \quad \text{ and } \quad \textrm{Var}\{ \textrm{Var}(S_n \mid P) \} = o(1). \end{aligned}$$
(9)

For the third claim in (9), we have \(\textrm{Var}(S_n \mid P) = 2 a_{11} + 2 a_{12} \{ 1 + (1/n) \textrm{tr}(P) \} + 2 a_{22}\). Hence,

$$\begin{aligned} \textrm{Var}\{ \textrm{Var}(S_n \mid P) \} = \frac{4 a_{12}^2}{n^2} \textrm{Var} \{ \textrm{tr}(P) \} \le \frac{4 a_{12}^2}{n^2} \textrm{E} [ \{ \textrm{tr}(P) \}^2 ] = \frac{8 a_{12}^2}{n^2}, \end{aligned}$$

where we use Lemma 1, taking care of the third claim. Then, as \(X_1\) equals \(1/\sqrt{n}\) times a random variable with all moments finite, \( X_1 = (1/\sqrt{n}) \{ a_{11} (x_1^2 - 1) + a_{12} x_1 (y_1 + y_{\delta (1)}) + a_{22} (y_1^2 - 1) \} \), the first two claims in (9) also trivially hold. This concludes the proof in the case \(p = 2\), and the general case is handled analogously.

B Additional Figures

Figure 6 shows the results of the utility simulation study in Sect. 3.1 for Poisson-distributed data. The top (bottom) row of Fig. 6 is analogous to Fig. 2 (Fig. 3).

Fig. 6.

Relative error of the empirical covariance matrices of the sample mean (solid line) and sample covariance (dotted line) compared to their asymptotic covariance matrices for \(\mathcal {P}\)-SA (blue), \(\mathcal {J}\)-SA (orange), and \(\mathcal {O}\)-SA (yellow). In the top row, the original data (green/darkest) are sampled from a Poisson distribution meeting Assumption 1, while in the bottom row the assumption is violated. Sample size is on a logarithmic scale. (Color figure online)

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Perkonoja, K., Virta, J. (2024). Asymptotic Utility of Spectral Anonymization. In: Domingo-Ferrer, J., Önen, M. (eds) Privacy in Statistical Databases. PSD 2024. Lecture Notes in Computer Science, vol 14915. Springer, Cham. https://doi.org/10.1007/978-3-031-69651-0_4

  • DOI: https://doi.org/10.1007/978-3-031-69651-0_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-69650-3

  • Online ISBN: 978-3-031-69651-0
