Towards Reusing Data Cleaning Knowledge
Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 353)

Abstract

Organizations increasingly need to integrate disparate data sources and ever-growing volumes of data, which intensifies the occurrence of data quality problems. Current data cleaning approaches are tailored to data sources that have different schemas but share the same data model (e.g., the relational model), and they depend heavily on a domain expert to specify data cleaning operations. This paper presents a novel, generic data cleaning methodology that assists the domain expert in specifying data cleaning operations by reusing knowledge previously expressed for other data sources, even if those sources have different data models and/or schemas. This is achieved by abstracting data source models and schemas to a level closer to human understanding and by using a vocabulary to describe the structure and semantics of data cleaning operations.
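The idea of reuse across data models can be illustrated with a minimal sketch. It is not the authors' implementation: the names `CleaningOperation`, `relational_accessor`, `document_accessor`, `detect`, and the concept label `Person.email` are all hypothetical, chosen only to show one cleaning rule stated against an abstract concept and then applied to a relational-style source and a document-style source through per-source mappings.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class CleaningOperation:
    """A rule stated against an abstract concept, not a concrete schema (hypothetical)."""
    name: str
    concept: str                     # abstract vocabulary term, e.g. "Person.email"
    is_valid: Callable[[Any], bool]  # predicate over the concept's value


def relational_accessor(column: str) -> Callable[[dict], Any]:
    """Read a concept from a flat row (relational-style record)."""
    return lambda row: row.get(column)


def document_accessor(*path: str) -> Callable[[dict], Any]:
    """Read a concept from a nested document by following a key path."""
    def get(doc: dict) -> Any:
        current: Any = doc
        for key in path:
            if not isinstance(current, dict) or key not in current:
                return None
            current = current[key]
        return current
    return get


def detect(operation: CleaningOperation,
           records: List[dict],
           mapping: Dict[str, Callable[[dict], Any]]) -> List[int]:
    """Return indices of records that violate the operation."""
    accessor = mapping[operation.concept]
    return [i for i, record in enumerate(records)
            if not operation.is_valid(accessor(record))]


# One rule, specified once against the abstract concept.
invalid_email = CleaningOperation(
    name="invalid-email",
    concept="Person.email",
    is_valid=lambda v: isinstance(v, str) and "@" in v,
)

# Two sources with different data models and schemas (illustrative data).
rows = [{"id": 1, "email_addr": "a@x.org"}, {"id": 2, "email_addr": ""}]
docs = [{"contact": {"mail": "b@y.org"}}, {"contact": {}}, {"name": "c"}]

relational_mapping = {"Person.email": relational_accessor("email_addr")}
document_mapping = {"Person.email": document_accessor("contact", "mail")}

row_violations = detect(invalid_email, rows, relational_mapping)
doc_violations = detect(invalid_email, docs, document_mapping)
```

In this sketch, only the mappings differ between sources; the rule itself is written once and reused, which is the kind of reuse the abstract describes at a much higher level.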

Author information

Corresponding author

Correspondence to Ricardo Almeida.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Almeida, R., Maio, P., Oliveira, P., João, B. (2015). Towards Reusing Data Cleaning Knowledge. In: Rocha, A., Correia, A., Costanzo, S., Reis, L. (eds) New Contributions in Information Systems and Technologies. Advances in Intelligent Systems and Computing, vol 353. Springer, Cham. https://doi.org/10.1007/978-3-319-16486-1_14

  • DOI: https://doi.org/10.1007/978-3-319-16486-1_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-16485-4

  • Online ISBN: 978-3-319-16486-1

  • eBook Packages: Computer Science, Computer Science (R0)
