Abstract
Mining useful information from huge datasets requires appropriate tools and techniques to organize and evaluate the data. However, the ultra-high dimensionality of data poses serious challenges in data mining research. The method proposed in this paper introduces a new strategy for dimensionality reduction by attribute clustering based on the dependency graph of the attributes. The information gain of each attribute, an established measure of uncertainty that quantifies the information contained in the system, is calculated and expresses the dependency relationships between the attributes in the graph. The underlying principles make it possible to select an optimum set of attributes, called a reduct, that can classify the dataset as well as the full set of attributes can. The rate of dimension reduction on datasets from the UCI repository is measured and compared with existing methods, and the classification accuracy on the reduced datasets is calculated with various classifiers to measure the effectiveness of the method.
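As a rough illustration of the idea, the sketch below computes information gain between pairs of discrete attributes, draws a directed edge wherever the gain reaches a cutoff, and counts the in- and out-degree of each attribute. The abstract does not give the exact construction, so the edge rule, the `threshold` parameter, the function names, and the toy data are all assumptions made for illustration; this is not the authors' algorithm.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def information_gain(target, given):
    """H(target) - H(target | given) for two equal-length discrete columns."""
    n = len(target)
    groups = {}
    for t, g in zip(target, given):
        groups.setdefault(g, []).append(t)
    conditional = sum(len(ts) / n * entropy(ts) for ts in groups.values())
    return entropy(target) - conditional


def dependency_graph(columns, threshold):
    """Assumed edge rule: a -> b when knowing a gives at least `threshold` bits about b."""
    edges = set()
    for a in columns:
        for b in columns:
            if a != b and information_gain(columns[b], columns[a]) >= threshold:
                edges.add((a, b))
    return edges


def degrees(columns, edges):
    """Count incoming and outgoing edges for every attribute."""
    in_deg = {name: 0 for name in columns}
    out_deg = {name: 0 for name in columns}
    for a, b in edges:
        out_deg[a] += 1
        in_deg[b] += 1
    return in_deg, out_deg


# Hypothetical toy data: "b" duplicates "a", while "c" is independent of both.
columns = {
    "a": [0, 0, 1, 1],
    "b": [0, 0, 1, 1],
    "c": [0, 1, 0, 1],
}
edges = dependency_graph(columns, threshold=0.5)
in_deg, out_deg = degrees(columns, edges)
```

On this toy data, `a` and `b` point at each other and `c` stays isolated, which hints at how degree information could group redundant attributes before one representative per group is kept.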
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Das, A.K., Sil, J., Phadikar, S. (2011). Attribute Clustering and Dimensionality Reduction Based on In/Out Degree of Attributes in Dependency Graph. In: Panigrahi, B.K., Suganthan, P.N., Das, S., Satapathy, S.C. (eds) Swarm, Evolutionary, and Memetic Computing. SEMCCO 2011. Lecture Notes in Computer Science, vol 7076. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-27172-4_46
DOI: https://doi.org/10.1007/978-3-642-27172-4_46
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-27171-7
Online ISBN: 978-3-642-27172-4
eBook Packages: Computer Science, Computer Science (R0)