We will consider the following problem in this paper: Assume that there are \(n\) numerical data \(\{x_1,x_2,\ldots,x_n\}\) (like salaries of \(n\) individuals) stored in a database and some subsums of these numbers are made public or just available for persons not eligible to learn the original data. Our motivating question is: At most how many of these subsums may be disclosed such that none of the numbers \(x_1,x_2,\ldots,x_n\) can be uniquely determined from these sums. These types of problems arise in the cases when certain tasks concerning a database are done by subcontractors who are not eligible to learn the elements of the database, but naturally should be given some data to fulfill there task. In database theory such examples are called statistical databases as they are used for statistical purposes and no individual data are supposed to be obtained using a restricted list of SUM queries. This problem was originally introduced by [1], originally solved by Miller et al. [7] and revisited by Griggs [4, 5]. It was shown in [7] that no more than \({n\choose n/2}\) subsums of a given set of secure data may be disclosed without disclosing at least one of the data, which upper bound is sharp as well. To calculate a subsum, it might need some operations whose number is limited. This is why it is natural to assume that the disclosed subsums of the original elements of the database will contain only a limited number of elements, say at most \(k\). The goal of the present paper is to determine the maximum number of subsums of size at most \(k\) which can be disclosed without making possible to calculate any of the individual data \(x_i\). The maximum is exactly determined for the case when the number of data is much larger than the size restriction \(k\).
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.References
F.Y. Chin, G. Ozsoyoglu, Auditing and inference control in statistical databases, IEEE Transactions on Software Engineering SE-8 (1982) 574–582.
D.E. Denning, Cryptography and Data Security (Addison-Wesley, Sydney, 1982).
D.E. Denning and J. Schlorer, Inference controls for statistical databases, Computer (1983), 69–82.
J.R. Griggs, Concentrating subset sums at \(k\) points, Bulletin of the Institute of Combinatorics and its Applications 20 (1997) 65–74.
J.R. Griggs, Database security and the distribution of subset sums in \({\bf R}^m\), in: “Graph Theory and Combinatorial Biology,” BSMS 7, eds. L. Lovász et al. (Bolyai Society, Budapest, 1999).
J. Littlewood and C. Offord, On the number of real roots of a random algebraic equation III, Mat. Sbornik 12 (1943) 277–285.
M. Miller, I. Roberts and I. Simpson, Application of symnmetric chains to an optimization problem in the security of statistical databases, Bulletin ICA 2 (1991) 47–58.
M. Miller, I. Roberts, I. Simpson, Preventation of relative compromise in statistical databases using Audit Expert, Bulletin ICA 10 (1994) 51–62.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Demetrovics, J., Katona, G.O.H. & Miklós, D. On the security of individual data. Ann Math Artif Intell 46, 98–113 (2006). https://doi.org/10.1007/s10472-005-9014-x
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10472-005-9014-x