Abstract
We present experimental results on a comparison of incompleteness and inconsistency. Our experiments were conducted on 141 data sets, including 71 incomplete data and 62 inconsistent, created from eight original numerical data sets. We used the Modified Learning from Examples Module version 2 (MLEM2) rule induction algorithm for data mining. Among eight types of data sets combined with three kinds of probabilistic approximations used in experiments, in 12 out of 24 combinations the error rate, computed as a result of ten-fold cross validation, was smaller for inconsistent data (two-tailed test, 5 % significance level). For one data set, combined with all three probabilistic approximations, the error rate was smaller for incomplete data. For remaining nine combinations the difference in performance was statistically insignificant. Thus, we may claim that there is some experimental evidence that incompleteness is generally worse than inconsistency for data mining.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Chmielewski, M.R., Grzymala-Busse, J.W.: Global discretization of continuous attributes as preprocessing for machine learning. Int. J. Approximate Reasoning 15(4), 319–331 (1996)
Clark, P.G., Grzymala-Busse, J.W.: Experiments on probabilistic approximations. In: Proceedings of the 2011 IEEE International Conference on Granular Computing, pp. 144–149 (2011)
Grzymala-Busse, J.W.: A new version of the rule induction system LERS. Fundamenta Informaticae 31, 27–39 (1997)
Grzymala-Busse, J.W.: Rough set strategies to data with missing attribute values. In: Notes of the Workshop on Foundations and New Directions of Data Mining, in conjunction with the Third International Conference on Data Mining, pp. 56–63 (2003)
Grzymala-Busse, J.W.: Data with missing attribute values: Generalization of indiscernibility relation and rule induction. Trans. Rough Sets 1, 78–95 (2004)
Grzymala-Busse, J.W.: Generalized parameterized approximations. In: Proceedings of the 6-th International Conference on Rough Sets and Knowledge Technology, pp. 136–145 (2011)
Grzymala-Busse, J.W., Rzasa, W.: Definability and other properties of approximations for generalized indiscernibility relations. Trans. Rough Sets 11, 14–39 (2010)
Grzymala-Busse, J.W., Wang, A.Y.: Modified algorithms LEM1 and LEM2 for rule induction from data with missing attribute values. In: Proceedings of the 5-th International Workshop on Rough Sets and Soft Computing in conjunction with the Third Joint Conference on Information Sciences, pp. 69–72 (1997)
Pawlak, Z.: Rough sets. Int. J. Comput. Inform. Sci. 11, 341–356 (1982)
Pawlak, Z.: Rough Sets. Theoretical Aspects of Reasoning about Data. Kluwer Academic Publishers, Dordrecht (1991)
Pawlak, Z., Skowron, A.: Rough sets: some extensions. Inf. Sci. 177, 28–40 (2007)
Pawlak, Z., Wong, S.K.M., Ziarko, W.: Rough sets: probabilistic versus deterministic approach. Int. J. Man Mach. Stud. 29, 81–95 (1988)
Stefanowski, J., Tsoukias, A.: Incomplete information tables and rough classification. Comput. Intell. 17(3), 545–566 (2001)
Yao, Y.Y.: Probabilistic rough set approximations. Int. J. Approximate Reasoning 49, 255–271 (2008)
Yao, Y.Y., Wong, S.K.M.: A decision theoretic framework for approximate concepts. Int. J. Man Mach. Stud. 37, 793–809 (1992)
Ziarko, W.: Probabilistic approach to rough sets. Int. J. Approximate Reasoning 49, 272–284 (2008)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Clark, P.G., Gao, C., Grzymala-Busse, J.W. (2016). A Comparison of Mining Incomplete and Inconsistent Data. In: Dregvaite, G., Damasevicius, R. (eds) Information and Software Technologies. ICIST 2016. Communications in Computer and Information Science, vol 639. Springer, Cham. https://doi.org/10.1007/978-3-319-46254-7_33
Download citation
DOI: https://doi.org/10.1007/978-3-319-46254-7_33
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46253-0
Online ISBN: 978-3-319-46254-7
eBook Packages: Computer ScienceComputer Science (R0)