Abstract
In the course of day-to-day work, data sets grow constantly and accumulate a large number of features, yet they remain incomplete and have relatively low information density. Dimensionality reduction and feature selection are the core issues in handling such data sets and, more specifically, in discovering relationships in the data. Dimensionality reduction by reduct generation is an important aspect of classification, where the reduced attribute set has the same classification power as the entire attribute set of an information system. In this paper, multiple reducts are generated by integrating concepts from rough set theory (RST) with relational algebra operations. As a next step, the attributes of the reducts that are relatively better associated and have stronger classification power are selected to generate a single reduct using the classical Apriori algorithm. Different classifiers are built using the single reduct, and their accuracies are compared to measure the effectiveness of the proposed method.
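The notion of a reduct used in the abstract can be illustrated with a minimal sketch. It is not the method of the paper: the decision table, the attribute names `a`, `b`, `c`, `d`, and the helper functions are hypothetical, the reducts are found by brute-force enumeration rather than by relational algebra operations, and the final step only counts how often each attribute occurs across reducts, which corresponds to the first support-counting pass of Apriori and not to the full algorithm.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical toy decision table: condition attributes a, b, c and decision d.
ROWS = [
    {"a": 0, "b": 0, "c": 0, "d": "no"},
    {"a": 0, "b": 1, "c": 1, "d": "yes"},
    {"a": 1, "b": 0, "c": 0, "d": "yes"},
    {"a": 1, "b": 1, "c": 1, "d": "no"},
]
CONDITIONS = ("a", "b", "c")
DECISION = "d"


def positive_region_size(rows, attrs, decision):
    """Count objects whose indiscernibility class under attrs has one decision value."""
    classes = defaultdict(list)
    for row in rows:
        classes[tuple(row[a] for a in attrs)].append(row[decision])
    return sum(len(v) for v in classes.values() if len(set(v)) == 1)


def find_reducts(rows, conditions, decision):
    """Brute-force search for minimal subsets that keep the full positive region."""
    full = positive_region_size(rows, conditions, decision)
    reducts = []
    for size in range(1, len(conditions) + 1):
        for subset in combinations(conditions, size):
            # A superset of a known reduct is not minimal, so skip it.
            if any(set(r) <= set(subset) for r in reducts):
                continue
            if positive_region_size(rows, subset, decision) == full:
                reducts.append(subset)
    return reducts


reducts = find_reducts(ROWS, CONDITIONS, DECISION)
# Support of each single attribute across the reducts (Apriori's first pass only).
support = Counter(attr for reduct in reducts for attr in reduct)

print(reducts)
print(dict(support))
```

In this toy table the decision depends on `a` together with either `b` or `c`, so two reducts exist and `a` has the highest support. How the paper combines such counts into a single reduct is not specified in the abstract and is not reproduced here.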
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Das, A.K., Sil, J. (2010). Dimensionality Reduction and Optimum Feature Selection in Designing Efficient Classifiers. In: Panigrahi, B.K., Das, S., Suganthan, P.N., Dash, S.S. (eds) Swarm, Evolutionary, and Memetic Computing. SEMCCO 2010. Lecture Notes in Computer Science, vol 6466. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17563-3_67
DOI: https://doi.org/10.1007/978-3-642-17563-3_67
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-17562-6
Online ISBN: 978-3-642-17563-3
eBook Packages: Computer Science, Computer Science (R0)