Abstract
Feature selection is an important step in data processing. Applying it to a multi-label classification problem in an online learning setting gives the problem of online multi-label feature selection. Online feature selection suits practical situations in which the data are not available in advance, the data set is very large, or fast running speed is required. We propose an online multi-label feature selection algorithm that divides the data set into multiple single-label data sets, performs feature selection on each single-label data set, and chooses the final features from among the features selected for the individual labels. Because many data sets are imbalanced, we use the basic idea of cost-sensitive learning to counter the imbalance. Experimental results on various data sets confirm the performance of our algorithm and demonstrate that it effectively improves online classification performance on imbalanced data sets.
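The three steps named in the abstract (split by label, select per label, merge the per-label selections) can be illustrated with a minimal sketch. The abstract does not specify the per-label learner, the cost values, or the merging rule, so everything concrete here is an assumption made for illustration: a truncated perceptron-style update, a fixed misclassification cost per class, and a merge that ranks features by summed absolute weight. The names `OnlineMultiLabelSelector`, `per_label_k`, `final_k`, `pos_cost` and `neg_cost` are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def truncate(w: Sequence[float], k: int) -> List[float]:
    """Keep the k largest-magnitude weights and zero out the rest."""
    if k >= len(w):
        return list(w)
    order = sorted(range(len(w)), key=lambda i: abs(w[i]), reverse=True)
    keep = set(order[:k])
    return [w[i] if i in keep else 0.0 for i in range(len(w))]


class OnlineMultiLabelSelector:
    """Illustrative sketch only; not the authors' published algorithm."""

    def __init__(self, n_features: int, n_labels: int, per_label_k: int,
                 final_k: int, lr: float = 0.1,
                 pos_cost: float = 1.0, neg_cost: float = 1.0) -> None:
        # One sparse linear model per label (one single-label problem each).
        self.w = [[0.0] * n_features for _ in range(n_labels)]
        self.per_label_k = per_label_k
        self.final_k = final_k
        self.lr = lr
        # Assumed cost-sensitive weights: a larger pos_cost makes errors on
        # the (typically rare) positive class move the model further.
        self.pos_cost = pos_cost
        self.neg_cost = neg_cost

    def partial_fit(self, x: Sequence[float], y: Sequence[int]) -> None:
        """Process one instance; y holds one +1/-1 entry per label."""
        for j, yj in enumerate(y):
            w = self.w[j]
            margin = yj * sum(wi * xi for wi, xi in zip(w, x))
            if margin <= 0:  # update only on a mistake
                cost = self.pos_cost if yj > 0 else self.neg_cost
                step = self.lr * cost * yj
                updated = [wi + step * xi for wi, xi in zip(w, x)]
                # Per-label feature selection: keep per_label_k weights.
                self.w[j] = truncate(updated, self.per_label_k)

    def selected_features(self) -> List[int]:
        """Merge per-label selections by summed absolute weight (assumed rule)."""
        n = len(self.w[0])
        score = [sum(abs(w[i]) for w in self.w) for i in range(n)]
        ranked = sorted(range(n), key=lambda i: score[i], reverse=True)
        return sorted(ranked[:self.final_k])


# Toy stream: label 0 depends on feature 0, label 1 on feature 2.
stream = [
    ([1.0, 0.0, 1.0, 0.0], [1, 1]),
    ([1.0, 0.0, -1.0, 0.0], [1, -1]),
    ([-1.0, 0.0, 1.0, 0.0], [-1, 1]),
    ([-1.0, 0.0, -1.0, 0.0], [-1, -1]),
]
selector = OnlineMultiLabelSelector(n_features=4, n_labels=2,
                                    per_label_k=1, final_k=2, pos_cost=2.0)
for x, y in stream:
    selector.partial_fit(x, y)
print(selector.selected_features())
```

Keeping one truncated model per label means each single-label problem selects features independently, and the merge step is the only place where labels interact. Other merge rules (for example, counting how many labels retain a feature) would fit the same structure.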
Acknowledgments
This work is supported by the National Natural Science Foundation of China (NSFC) under grant numbers 61379127 and 61379128.
© 2018 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Liu, J., Guo, Z., Sun, Z., Liu, S., Wang, X. (2018). Online Multi-label Feature Selection on Imbalanced Data Sets. In: Li, J., et al. Wireless Sensor Networks. CWSN 2017. Communications in Computer and Information Science, vol 812. Springer, Singapore. https://doi.org/10.1007/978-981-10-8123-1_15
DOI: https://doi.org/10.1007/978-981-10-8123-1_15
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-8122-4
Online ISBN: 978-981-10-8123-1
eBook Packages: Computer Science (R0)