Abstract
Organizations' growing need to integrate multiple disparate data sources, together with ever-increasing data volumes, is making data quality problems more frequent. Current data cleaning approaches are tailored to data sources that have different schemas but share the same data model (e.g., the relational model), and they depend heavily on a domain expert to specify data cleaning operations. This paper presents a novel, generic data cleaning methodology that assists the domain expert in specifying data cleaning operations by reusing knowledge previously expressed for other data sources, even if those sources have different data models and/or schemas. This is achieved by abstracting data source models and schemas to a level closer to human understanding and by using a vocabulary to describe the structure and semantics of data cleaning operations.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Almeida, R., Maio, P., Oliveira, P., João, B. (2015). Towards Reusing Data Cleaning Knowledge. In: Rocha, A., Correia, A., Costanzo, S., Reis, L. (eds) New Contributions in Information Systems and Technologies. Advances in Intelligent Systems and Computing, vol 353. Springer, Cham. https://doi.org/10.1007/978-3-319-16486-1_14
DOI: https://doi.org/10.1007/978-3-319-16486-1_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16485-4
Online ISBN: 978-3-319-16486-1
eBook Packages: Computer Science (R0)