Abstract
Clustering is widely exploited in data mining. It has been proved that embedding weak label prior into clustering is effective to promote its performance. Previous researches mainly focus on only one type of prior. However, in many real scenarios, two kinds of weak label prior information, e.g., pairwise constraints and cluster ratio, are easily obtained or already available. How to incorporate them to improve clustering performance is important but rarely studied. We propose a novel constrained Clustering with Weak Label Prior method (CWLP), which is an integrated framework. Within the unified spectral clustering model, the pairwise constraints are employed as a regularizer in spectral embedding and label proportion is added as a constraint in spectral rotation. To approximate a variant of the embedding matrix more precisely, we replace a cluster indicator matrix with its scaled version. Instead of fixing an initial similarity matrix, we propose a new similarity matrix that is more suitable for deriving clustering results. Except for the theoretical convergence and computational complexity analyses, we validate the effectiveness of CWLP through several benchmark datasets, together with its ability to discriminate suspected breast cancer patients from healthy controls. The experimental evaluation illustrates the superiority of our proposed approach.
Similar content being viewed by others
References
Jain A K, Murty M N, Flynn P J. Data clustering: a review. ACM Computing Surveys, 1999, 31(3): 264–323
Voss J, Belkin M, Rademacher L. The hidden convexity of spectral clustering. In: Proceedings of the 30th AAAI Conference on Artificial Intelligence. 2016, 2108–2114
Elhamifar E, Vidal R. Sparse subspace clustering: algorithm, theory, and applications. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35(11): 2765–2781
Kim S, Yoo C D, Nowozin S, Kohli P. Image segmentation using higher-order correlation clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 36(9): 1761–1774
Mittal H, Pandey A C, Saraswat M, Kumar S, Pal R, Modwel G. A comprehensive survey of image segmentation: clustering methods, performance parameters, and benchmark datasets. Multimedia Tools and Applications, 2022, 81(24): 35001–35026
Shi J, Malik J. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000, 22(8): 888–905
Zhang J, Zhou K, Luximon Y, Li P, Iftikhar H. 3D-guided facial shape clustering and analysis. Multimedia Tools and Applications, 2022, 81(6): 8785–8806
Liu H, Liu T, Wu J, Tao D, Fu Y. Spectral ensemble clustering. In: Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2015, 715–724
Hartigan J A, Wong M A. A k-means clustering algorithm. Journal of the Royal Statistical Society Series C: Applied Statistics, 1979, 28(1): 100–108
Kanungo T, Mount D M, Netanyahu N S, Piatko C D, Silverman R, Wu A. An efficient k-means clustering algorithm: analysis and implementation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002, 24(7): 881–892
Johnson S C. Hierarchical clustering schemes. Psychometrika, 1967, 32(3): 241–254
Levin M S. Towards hierarchical clustering. In: Proceedings of the 2nd international conference on Computer Science: theory and applications. 2007, 205–215
Von Luxburg U. A tutorial on spectral clustering. Statistics and Computing, 2007, 17(4): 395–416
Lu C, Yan S, Lin Z. Convex sparse spectral clustering: single-view to multi-view. IEEE Transactions on Image Processing, 2016, 25(6): 2833–2843
Khan A A, Mohanty S K. A fast spectral clustering technique using MST based proximity graph for diversified datasets. Information Sciences, 2022, 609: 1113–1131
Lu C, Feng J, Lin Z, Mei T, Yan S. Subspace clustering by block diagonal representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 41(2): 487–501
Gao H, Nie F, Li X, Huang H. Multi-view subspace clustering. In: Proceedings of 2015 IEEE International Conference on Computer Vision. 2015, 4238–4246
Dong W, Wu X J, Kittler J. Subspace clustering via joint 11,2 and 12,1 norms. Information Sciences, 2022, 612: 675–686
Parmar M, Wang D, Zhang X, Tan A H, Miao C, Jiang J, Zhou Y. Redpc: a residual error-based density peak clustering algorithm. Neurocomputing, 2019, 348: 82–96
Parmar M D, Pang W, Hao D, Jiang J, Liupu W, Wang L, Zhou Y. FREDPC: a feasible residual error-based density peak clustering algorithm with the fragment merging strategy. IEEE Access, 2019, 7: 89789–89804
Liu G, Lin Z, Yan S, Sun J, Yu Y, Ma Y. Robust recovery of subspace structures by low-rank representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35(1): 171–184
Tao Z, Liu H, Li S, Ding Z, Fu Y. Robust spectral ensemble clustering via rank minimization. ACM Transactions on Knowledge Discovery from Data, 2019, 13(1): 1–25
Zhu Y, Kwok J T, Zhou Z H. Multi-label learning with global and local label correlation. IEEE Transactions on Knowledge and Data Engineering, 2018, 30(6): 1081–1094
Saha S, Alok A K, Ekbal A. Brain image segmentation using semisupervised clustering. Expert Systems with Applications, 2016, 52: 50–63
Śmieja M, Myronov O, Tabor J. Semi-supervised discriminative clustering with graph regularization. Knowledge-Based Systems, 2018, 151: 24–36
Nie F, Zhang H, Wang R, Li X. Semi-supervised clustering via pairwise constrained optimal graph. In: Proceedings of the 29th International Joint Conference on Artificial Intelligence. 2021, 437
Zhu Z, Gao Q. Semi-supervised clustering via cannot link relationship for multiview data. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 32(12): 8744–8755
Wang X, Qian B, Davidson I. On constrained spectral clustering and its applications. Data Mining and Knowledge Discovery, 2014, 28(1): 1–30
Chen L M, Xiu B X, Ding Z Y. Multiple weak supervision for short text classification. Applied Intelligence, 2022, 52(8): 9101–9116
Rasti R, Teshnehlab M, Phung S L. Breast cancer diagnosis in DCE-MRI using mixture ensemble of convolutional neural networks. Pattern Recognition, 2017, 72: 381–390
Lai K T, Yu F X, Chen M S, Chang S F. Video event detection by inferring temporal instance labels. In: Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition. 2014, 2243–2250
Poyiadzi R, Santos-Rodriguez R, Twomey N. Label propagation for learning with label proportions. In: Proceedings of the 28th IEEE International Workshop on Machine Learning for Signal Processing. 2018, 1–6
Sun N, Luo T, Zhuge W, Tao H, Hou C, Hu D. Semi-supervised learning with label proportion. IEEE Transactions on Knowledge and Data Engineering, 2023, 35(1): 877–890
Ng A, Jordan M, Weiss Y. On spectral clustering: Analysis and an algorithm. In: Proceedings of the 14th International Conference on Neural Information Processing Systems. 2011
Zhang X, You Q. An improved spectral clustering algorithm based on random walk. Frontiers of Computer Science in China, 2011, 5(3): 268–278
Yu S X, Shi J. Multiclass spectral clustering. In: Proceedings of the 9th IEEE International Conference on Computer Vision. 2003, 313–313
Liu W, He J, Chang S. Large graph construction for scalable semi-supervised learning. In: Proceedings of the 27th International Conference on International Conference on Machine Learning. 2010, 679–686
Huang J, Nie F, Huang H. Spectral rotation versus k-means in spectral clustering. In: Proceedings of the 27th AAAI Conference on Artificial Intelligence. 2013, 431–437
Nie F P, Wang X Q, Jordan M, Huang H. The constrained laplacian rank algorithm for graph-based clustering. In: Proceedings of the 37th AAAI Conference on Artificial Intelligence. 2016, 1969–1976
Pang Y, Xie J, Nie F, Li X. Spectral clustering by joint spectral embedding and spectral rotation. IEEE Transactions on Cybernetics, 2020, 50(1): 247–258
Nie F, Zhang R, Li X. A generalized power iteration method for solving quadratic problem on the stiefel manifold. Science China Information Sciences, 2017, 60(11): 112101
Zhu X, Ghahramani Z. Learning from labeled and unlabeled data with label propagation. Pittsburgh: Carnegie Mellon University, 2002
Hagen L, Kahng A B. New spectral methods for ratio cut partitioning and clustering. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 1992, 11(9): 1074–1085
MacQueen J. Some methods for classification and analysis of multivariate observations. In: Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability. 1967, 281–297
Van Craenendonck T, Dumancic S, Blockeel H. COBRA: a fast and simple method for active clustering with pairwise constraints. In: Proceedings of the 26th International Joint Conference on Artificial Intelligence. 2018
Nie F, Zhu W, Li X. Unsupervised large graph embedding based on balanced and hierarchical k-means. IEEE Transactions on Knowledge and Data Engineering, 2020, 34(4): 2008–2019
Zhu J, Tao L, Yang H, Nie F. Unsupervised optimized bipartite graph embedding. IEEE Transactions on Knowledge and Data Engineering, 2023, 35(3): 3224–3238
Wang J, Ma Z, Nie F, Li X. Efficient discrete clustering with anchor graph. IEEE Transactions on Neural Networks and Learning Systems, 2023, 1–9, doi: https://doi.org/10.1109/TNNLS.2023.3279380
Zhou D, Bousquet O, Lal T N, Weston J, Schölkopf B. Learning with local and global consistency. In: Proceedings of the 16th International Conference on Neural Information Processing Systems. 2003, 321–328
Acknowledgements
This work was partially supported by the National Key R&D Program (No. 2022ZD0114803), and the National Natural Science Foundation of China (Grant Nos. 62136005, 61922087).
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Competing interests The authors declare that they have no competing interests or financial conflicts to disclose.
Additional information
Jing Zhang is a PhD candidate at the National University of Defense Technology, China. She received the BS degree from Sichuan Normal University, China in 2017 and the MS degree from the National University of Defense Technology, China in 2019. Her research interests include machine learning, system science, and data mining.
Ruidong Fan is a PhD candidate at the National University of Defense Technology, China. He received the BS degree from Lanzhou University, China in 2018 and the MS degree from the National University of Defense Technology, China in 2020. His research interests include data mining, optimization, and transfer learning.
Hong Tao received the PhD degree from the National University of Defense Technology, China in 2019. She is currently a Lecturer with the College of Liberal Arts and Science of the same university. Her research interests include machine learning, system science, and data mining.
Jiacheng Jiang received the MS degree from the National University of Defense Technology in 2020, China. He received the BS degree with the same university in 2018. His research interests include data mining and machine learning.
Chenping Hou received the PhD degree from the National University of Defense Technology, China in 2009. He is currently a full Professor with the Department of Systems Science of the same university. He has authored 80+ peer-reviewed papers in journals and conferences, such as the IEEE TPAMI, TNNLS/TNN, IEEE TSMCB/TCB, IEEE TIP, the IJCAI, and AAAI. His current research interests include machine learning, data mining, and computer vision.
Electronic supplementary material
Rights and permissions
About this article
Cite this article
Zhang, J., Fan, R., Tao, H. et al. Constrained clustering with weak label prior. Front. Comput. Sci. 18, 183338 (2024). https://doi.org/10.1007/s11704-023-3355-7
Received:
Accepted:
Published:
DOI: https://doi.org/10.1007/s11704-023-3355-7