Abstract
In the current work, a multiobjective feature selection technique is proposed that utilizes different quality measures to evaluate the goodness of the reduced feature set. Two perspectives are incorporated into the feature selection process: (1) the selected subset of features should not destroy the geometric distribution of the sample space, i.e., the neighborhood topology should be preserved in the reduced feature space; and (2) the selected feature subset should have minimal redundancy and high correlation with the classes. To capture the second goal, several information-theory-based quality measures are utilized, such as normalized mutual information, correlation with the class attribute, information gain and entropy. To capture the first aspect, the concept of shared nearest-neighbor distance is utilized. A multiobjective framework is employed to optimize all these measures, individually and in different combinations, to reduce the feature set. The approach is evaluated on six publicly available data sets with respect to different classifiers, and the results conclusively demonstrate the potency of utilizing both types of objective functions in reducing the feature set. Several performance metrics, such as accuracy, redundancy and Jaccard score, are used to measure the quality of the selected feature subset in comparison with several state-of-the-art techniques. Experimental results on these data sets illustrate that there is no universal model (optimization of a particular set of objective functions) that performs well over all the data sets with respect to all quality measures; in general, however, optimization of all objective functions (the PMCI model) consistently performs well across the data sets.
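As a rough illustration of the two kinds of quality measures the abstract describes (not the authors' exact formulations), the sketch below computes normalized mutual information between a discrete candidate feature and the class labels, and pairwise shared nearest-neighbor overlap counts of the sort used to gauge neighborhood-topology preservation. All function names, and the geometric-mean normalization of NMI, are assumptions for this sketch.

```python
import numpy as np
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a discrete label sequence."""
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two discrete sequences."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))

def normalized_mutual_information(x, y):
    """NMI using geometric-mean normalization (one common convention)."""
    hx, hy = entropy(x), entropy(y)
    if hx == 0.0 or hy == 0.0:
        return 0.0
    return mutual_information(x, y) / np.sqrt(hx * hy)

def shared_nn_overlap(X, k):
    """Pairwise shared nearest-neighbor counts |kNN(i) & kNN(j)|."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # exclude each point from its own kNN list
    knn = np.argsort(d, axis=1)[:, :k]
    n = X.shape[0]
    sim = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(n):
            sim[i, j] = len(set(knn[i]) & set(knn[j]))
    return sim
```

A feature whose values align closely with the class labels yields an NMI near 1, while a subset of features that preserves each point's shared nearest-neighbor overlaps preserves the neighborhood topology of the original space.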






Notes
https://archive.ics.uci.edu/ml/support/Optical+Recognition+of+Handwritten+Digits.
Acknowledgements
No funding was involved in this work. The authors would like to acknowledge the help from the Indian Institute of Technology Patna in conducting this research.
Ethics declarations
Conflict of interest
All authors declare that they have no conflict of interest.
Human and animal rights
We have not performed any experiments involving animals or humans.
Additional information
Communicated by V. Loia.
Cite this article
Saha, S., Kaur, M. Identification of topology-preserving, class-relevant feature subsets using multiobjective optimization. Soft Comput 23, 4717–4733 (2019). https://doi.org/10.1007/s00500-018-3122-0