Abstract
This paper introduces FCF (Fuzzy Clustering-based Filter), a filter that removes redundant features and can thereby improve both the efficacy and the efficiency of data mining algorithms. FCF is based on a fuzzy partitioning of the features into clusters, with the number of clusters estimated automatically from the data. After clustering, FCF selects a subset of features from the resulting clusters. For this selection step, we study four strategies based on the information in the fuzzy partition matrix, and we show that these strategies can be combined for better performance. Empirical results show that FCF is, in general, competitive in classification tasks with a related filter based on the hard partitioning of features.
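To make the selection step concrete, the following is a minimal sketch of how features might be chosen from a fuzzy partition matrix. The abstract does not say what the four strategies are, so both strategies shown here (keeping the highest-membership feature of each cluster, and keeping features with no dominant membership) are assumptions made purely for illustration, as are the function names, the threshold value, and the example matrix. The clustering itself and the estimation of the number of clusters are not shown.

```python
# Illustrative sketch only: the abstract does not specify FCF's four strategies.
# Both strategies below, and all names and values, are hypothetical.

def select_top_membership(u):
    """For each cluster (row of u), pick the feature with the highest membership."""
    return sorted({max(range(len(row)), key=row.__getitem__) for row in u})


def select_ambiguous(u, threshold):
    """Keep features whose largest membership stays below the threshold,
    i.e. features that no single cluster represents well."""
    n_features = len(u[0])
    return [j for j in range(n_features) if max(row[j] for row in u) < threshold]


def combine(*selections):
    """Combine strategies by taking the union of their selected features."""
    return sorted(set().union(*selections))


# Hypothetical fuzzy partition matrix: 2 clusters (rows) x 5 features (columns).
# Each column sums to 1, as in a standard fuzzy partition.
U = [
    [0.90, 0.80, 0.55, 0.10, 0.20],
    [0.10, 0.20, 0.45, 0.90, 0.80],
]

top = select_top_membership(U)
ambiguous = select_ambiguous(U, threshold=0.6)
combined = combine(top, ambiguous)
print(top, ambiguous, combined)
```

In this toy example, features 0 and 3 represent their clusters, feature 2 is kept because it belongs almost equally to both clusters, and features 1 and 4 are dropped as redundant with the cluster representatives. Taking the union is only one possible way to combine strategies.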
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Coletta, L.F.S., Hruschka, E.R., Covoes, T.F., Campello, R.J.G.B. (2010). Fuzzy Clustering-Based Filter. In: Hüllermeier, E., Kruse, R., Hoffmann, F. (eds) Information Processing and Management of Uncertainty in Knowledge-Based Systems. Theory and Methods. IPMU 2010. Communications in Computer and Information Science, vol 80. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14055-6_42
DOI: https://doi.org/10.1007/978-3-642-14055-6_42
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14054-9
Online ISBN: 978-3-642-14055-6
eBook Packages: Computer Science (R0)