Abstract
With more and more electronic information sources becoming widely available, the quality of these often-competing sources has become a pressing concern. We propose a standard for specifying the quality of databases based on the dual concepts of data soundness and data completeness. The relational model of data is extended by associating a quality specification with each relation instance, and by extending its algebra to calculate the quality specifications of derived relation instances. This provides a method for calculating the quality of answers to arbitrary queries from the overall quality specification of the database. We describe practical methods for estimating the initial quality specifications of given databases, and we report on experiments that test the validity of these methods. Finally, we describe how quality estimates are being applied in the Multiplex multidatabase system to resolve cross-database inconsistencies.
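To make the dual concepts concrete, here is a minimal sketch. It assumes one natural reading of them, by analogy with precision and recall. A stored instance D is compared with a hypothetical ideal instance W. Soundness is the fraction of stored tuples that are true, |D ∩ W| / |D|. Completeness is the fraction of true tuples that are stored, |D ∩ W| / |W|. The names `Quality`, `measure`, `product_quality` and `selection_quality` are illustrative and do not come from the paper. The Cartesian-product rule follows directly from set algebra. The selection rule is only an estimate that rests on an assumed uniform spread of errors. The convention that an empty set scores 1.0 is also an assumption.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

Row = Tuple  # a tuple of attribute values

@dataclass(frozen=True)
class Quality:
    soundness: float     # fraction of stored tuples that are true
    completeness: float  # fraction of true tuples that are stored

def measure(stored: FrozenSet[Row], ideal: FrozenSet[Row]) -> Quality:
    """Exact quality of a stored instance against a hypothetical ideal instance.

    Assumed convention: an empty stored (or ideal) set scores 1.0.
    """
    correct = len(stored & ideal)
    s = correct / len(stored) if stored else 1.0
    c = correct / len(ideal) if ideal else 1.0
    return Quality(s, c)

def product_quality(q1: Quality, q2: Quality) -> Quality:
    # (D1 x D2) ∩ (W1 x W2) = (D1 ∩ W1) x (D2 ∩ W2), so both ratios multiply.
    return Quality(q1.soundness * q2.soundness,
                   q1.completeness * q2.completeness)

def selection_quality(q: Quality) -> Quality:
    # Assumption: errors are spread uniformly over the relation, so a
    # selection is estimated to inherit its input's quality (not exact).
    return q

def cross(a: FrozenSet[Row], b: FrozenSet[Row]) -> FrozenSet[Row]:
    """Cartesian product of two sets of tuples (tuples are concatenated)."""
    return frozenset(x + y for x in a for y in b)

# Hypothetical example data.
W1 = frozenset({(1,), (2,), (3,), (4,)})
D1 = frozenset({(1,), (2,), (3,), (9,)})  # (9,) is false; (4,) is missing
W2 = frozenset({("a",), ("b",)})
D2 = frozenset({("a",)})                  # sound but only half complete

q1 = measure(D1, W1)
q2 = measure(D2, W2)
derived = product_quality(q1, q2)
actual = measure(cross(D1, D2), cross(W1, W2))
```

Here `q1` is `Quality(0.75, 0.75)` and `q2` is `Quality(1.0, 0.5)`. The quality derived for the product, `Quality(0.75, 0.375)`, matches the value measured directly on the product instances. This agreement is the kind of calculation that lets answer quality be computed from the quality of the stored relations without access to the ideal instance.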
Copyright information
© 1998 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Motro, A., Rakov, I. (1998). Estimating the quality of databases. In: Andreasen, T., Christiansen, H., Larsen, H.L. (eds) Flexible Query Answering Systems. FQAS 1998. Lecture Notes in Computer Science, vol 1495. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0056011
DOI: https://doi.org/10.1007/BFb0056011
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-65082-9
Online ISBN: 978-3-540-49655-7
eBook Packages: Springer Book Archive