Abstract
Recent work on data quality has focused primarily on data repairing algorithms for improving data consistency and on record matching methods for data deduplication. This paper highlights several other challenging issues that are essential to developing data cleaning systems, namely error correction with performance guarantees, the unification of data repairing and record matching, relative information completeness, and data currency. We provide an overview of recent advances in the study of these issues and advocate the need for a logical framework that treats them uniformly.
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Fan, W., Geerts, F., Ma, S., Tang, N., Yu, W. (2013). Data Quality Problems beyond Consistency and Deduplication. In: Tannen, V., Wong, L., Libkin, L., Fan, W., Tan, WC., Fourman, M. (eds) In Search of Elegance in the Theory and Practice of Computation. Lecture Notes in Computer Science, vol 8000. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41660-6_12
DOI: https://doi.org/10.1007/978-3-642-41660-6_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41659-0
Online ISBN: 978-3-642-41660-6
eBook Packages: Computer Science, Computer Science (R0)