Data Quality Problems beyond Consistency and Deduplication
Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8000)

Abstract

Recent work on data quality has focused primarily on data repairing algorithms for improving data consistency and on record matching methods for data deduplication. This paper highlights several other challenging issues that are essential to developing data cleaning systems: error correction with performance guarantees, unification of data repairing and record matching, relative information completeness, and data currency. We give an overview of recent advances on these issues and argue for a logical framework that treats them uniformly.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Fan, W., Geerts, F., Ma, S., Tang, N., Yu, W. (2013). Data Quality Problems beyond Consistency and Deduplication. In: Tannen, V., Wong, L., Libkin, L., Fan, W., Tan, WC., Fourman, M. (eds) In Search of Elegance in the Theory and Practice of Computation. Lecture Notes in Computer Science, vol 8000. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41660-6_12

  • DOI: https://doi.org/10.1007/978-3-642-41660-6_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-41659-0

  • Online ISBN: 978-3-642-41660-6

  • eBook Packages: Computer Science (R0)
