Abstract
In many multilabel learning applications, instances with labels being fully provided are scarce, while partially labelled data and unlabelled data are more common due to the expensive cost of manual labelling. However, most of existing models are based on the assumption that the fully labelled training data is sufficient. To deal with the partially labelled and unlabelled data effectively, we present a novel semi-supervised multilabel learning approach based on constrained non-negative matrix factorization in this paper. This approach assumes that if two instances are highly similar in terms of their features, they would also be similar in their associated labels set. Specifically, We first define three matrices to measure the similarity of each pair of instances in two different ways. Then, the optimal assignation of labels to the unlabelled instance is determined by minimizing the differentiation between these two similarity sets via a non-negative matrix factorization process. We also present a threshold learning algorithm to determine the classification threshold for each label in our proposed approach. Extensive experiment is conducted on various datasets, and the results demonstrate that our method show significantly better performance than other state-of-the-art approaches. It is especially suitable for the situations with a smaller size of labelled training data, or subset of the training data are partially labelled.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Notes
http://mulan.sourceforge.net/datasets-mlc.html
References
Ashfaq RAR, Wang XZ, Huang JZ, Abbas H, He YL (2017) Fuzziness based semi-supervised learning approach for intrusion detection system. Inf Sci 378:484–497
Belkin M, Niyogi P, Sindhwani V (2006) Manifold regularization: a geometric framework for learning from labeled and unlabeled examples. J Mach Learn Res 7:2399–2434
Boutell MR, Luo J, Shen X, Brown CM (2004) Learning multi-label scene classification. Pattern Recogn 37(9):1757–1771
Cheng W, Hüllermeier E (2009) Combining instance-based learning and logistic regression for multilabel classification. Mach Learn 76(2–3):211–225
Clare A, King RD (2001) Knowledge discovery in multi-label phenotype data. In: European Conference on Principles of Data Mining and Knowledge Discovery, Springer, New York, pp 42–53
Elisseeff A, Weston J (2001) A kernel method for multi-labelled classification. In: Proceedings of the 14th international conference on neural information processing systems: natural and synthetic. MIT Press, pp 681–687
Fan RE, Lin CJ (2007) A study on threshold selection for multi-label classification. Department of Computer Science, National Taiwan University, pp 1–23
Feng X, Jiao Y, Lv C, Zhou D (2016) Label consistent semi-supervised non-negative matrix factorization for maintenance activities identification. Eng Appl Artif Intell 52:161–167
Fu B, Xu G, Wang Z, Cao L (2013) Leveraging supervised label dependency propagation for multi-label learning. 2013 IEEE 13th international conference on data mining (ICDM), IEEE, pp 1061–1066
Fürnkranz J, Hüllermeier E, Mencía EL, Brinker K (2008) Multilabel classification via calibrated label ranking. Mach Learn 73(2):133–153
Ghamrawi N, McCallum A (2005) Collective multi-label classification. In: Proceedings of the 14th ACM international conference on information and knowledge management. ACM, pp 195–200
Guo Y, Gu S (2011) Multi-label classification using conditional dependency networks. In: IJCAI proceedings–international joint conference on artificial intelligence, vol 22, no 1, p 1300
Kimura K, Kudo M, Sun L (2015) Dimension reduction using nonnegative matrix tri-factorization in multi-label classification. In: Proceedings of the international conference on parallel and distributed processing techniques and applications (PDPTA), p 250
Lee DD, Seung HS (1999) Learning the parts of objects by non-negative matrix factorization. Nature 401(6755):788–791
Liu H, Wu Z, Li X, Cai D, Huang TS (2012) Constrained nonnegative matrix factorization for image representation. IEEE Trans Pattern Anal Mach Intell 34(7):1299–1311
Liu Y, Jin R, Yang L (2006) Semi-supervised multi-label learning by constrained non-negative matrix factorization. In: Proceedings of the National Conference on Artificial Intelligence, vol 21, Menlo Park, Cambridge, London; AAAI Press; MIT Press
Qi GJ, Hua XS, Rui Y, Tang J, Mei T, Zhang HJ (2007) Correlative multi-label video annotation. In: Proceedings of the 15th international conference on Multimedia, ACM, pp 17–26
Qian B, Davidson I (2010) Semi-supervised dimension reduction for multi-label classification. AAAI 10:569–574
Read J, Pfahringer B, Holmes G, Frank E (2011) Classifier chains for multi-label classification. Mach Learn 85(3):333–359
Schapire RE, Singer Y (2000) Boostexter: A boosting-based system for text categorization. Mach Learn 39(2):135–168
Tai F, Lin HT (2012) Multilabel classification with principal label space transformation. Neural Comput 24(9):2508–2542
Tsoumakas G, Katakis I, Vlahavas I (2009) Mining multi-label data. In: Maimon O, Rokach L (eds) Data mining and knowledge discovery handbook. Springer, Boston, MA, pp 667–685
Tsoumakas G, Spyromitros-Xioufis E, Vilcek J, Vlahavas I (2011) Mulan: a java library for multi-label learning. J Mach Learn Res 12:2411–2414
Wang D, Gao X, Wang X (2016) Semi-supervised nonnegative matrix factorization via constraint propagation. IEEE Trans Cybern 46(1):233–244
Wang R, Wang XZ, Kwong S, Xu C (2017) Incorporating diversity and informativeness in multiple-instance active learning. IEEE Trans Fuzzy Syst 25(6):1460–1475
Wicker J, Pfahringer B, Kramer S (2012) Multi-label classification using boolean matrix decomposition. In: Proceedings of the 27th annual ACM symposium on applied computing. ACM, pp 179–186
Yu D, Chen N, Jiang F, Fu B, Qin A (2017) Constrained nmf-based semi-supervised learning for social media spammer detection. Knowl-Based Syst 125:64–73
Zhang ML, Zhang K (2010) Multi-label learning by exploiting label dependency. In: Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, pp 999–1008
Zhang ML, Zhou ZH (2007) Ml-knn: A lazy learning approach to multi-label learning. Pattern Recogn 40(7):2038–2048
Zhang ML, Zhou ZH (2014) A review on multi-label learning algorithms. IEEE Trans Knowl Data Eng 26(8):1819–1837
Zhu H, Wang X (2017) A cost-sensitive semi-supervised learning model based on uncertainty. Neurocomputing 251:106–114
Acknowledgements
Many thanks to the Advanced Analytics Institute (AAi) at University of Technology, Sydney for the study provided by the learning and working conditions. This work is supported by the Public Welfare Technology Application Research Project of Zhejiang province, China (2016C33196, 2017C33105), the Natural Science Foundation of Zhejiang Province, China (LY15F020016), and the Science and Technology Project of state administration of press, publication, radio, film and television of China (201309).
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Yu, D., Fu, B., Xu, G. et al. Constrained nonnegative matrix factorization-based semi-supervised multilabel learning. Int. J. Mach. Learn. & Cyber. 10, 1093–1100 (2019). https://doi.org/10.1007/s13042-018-0787-8
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s13042-018-0787-8