Abstract
Multi-label classification has become an important and active research topic, in which the classifier must predict a set of labels for each instance simultaneously. The label powerset (LP) method reduces the multi-label classification problem to a single-label multi-class classification problem by treating each distinct combination of labels as a separate class. However, the predictive performance of LP suffers from the imbalanced distribution of labelsets, which degrades the performance of traditional classifiers. In this paper, we study the problem of imbalanced multi-label data classification and propose a novel solution, called CSRankSVM (Cost Sensitive Ranking Support Vector Machine), which assigns a different misclassification cost to each labelset in order to tackle the imbalance in multi-label data. Empirical studies on popular benchmark datasets with various labelset imbalance ratios demonstrate that the proposed CSRankSVM approach can effectively improve classification performance on multi-label datasets.
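To make the LP transformation and the idea of per-labelset costs concrete, the following is a minimal Python sketch. It is not the CSRankSVM method itself: the abstract does not state how the costs are chosen, so the inverse-frequency weighting used here is an assumption for illustration only, and the function names `label_powerset` and `inverse_frequency_costs` are hypothetical.

```python
from collections import Counter


def label_powerset(Y):
    """Map each distinct labelset (a row of a binary indicator matrix)
    to a single class id, in order of first appearance."""
    classes = {}
    y = []
    for row in Y:
        key = tuple(row)
        if key not in classes:
            classes[key] = len(classes)
        y.append(classes[key])
    return y, classes


def inverse_frequency_costs(y):
    """Assumed cost scheme (not taken from the paper): weight each
    labelset class by n / (k * count), so rare labelsets cost more."""
    counts = Counter(y)
    n, k = len(y), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


# Toy indicator matrix: 6 instances, 3 labels, imbalanced labelsets.
Y = [
    [1, 0, 0],
    [1, 0, 0],
    [1, 0, 0],
    [0, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
]

y, classes = label_powerset(Y)
costs = inverse_frequency_costs(y)
print(y)      # single-label class ids, one per instance
print(costs)  # per-labelset misclassification cost
```

In this toy example the frequent labelset `(1, 0, 0)` receives a lower cost than the two labelsets that occur only once. Weights of this kind could, for instance, be passed as the `class_weight` dictionary of a standard multi-class SVM such as scikit-learn's `SVC`; the ranking formulation proposed in the paper is a different model.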
Acknowledgments
This research was supported by the National Natural Science Foundation of China (61502091), the Fundamental Research Funds for the Central Universities (N140403004), and the Postdoctoral Science Foundation of China (2015M570254).
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Cao, P., Liu, X., Zhao, D., Zaiane, O. (2017). Cost Sensitive Ranking Support Vector Machine for Multi-label Data Learning. In: Abraham, A., Haqiq, A., Alimi, A., Mezzour, G., Rokbani, N., Muda, A. (eds) Proceedings of the 16th International Conference on Hybrid Intelligent Systems (HIS 2016). HIS 2016. Advances in Intelligent Systems and Computing, vol 552. Springer, Cham. https://doi.org/10.1007/978-3-319-52941-7_25
DOI: https://doi.org/10.1007/978-3-319-52941-7_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-52940-0
Online ISBN: 978-3-319-52941-7
eBook Packages: Engineering, Engineering (R0)