Constrained clustering with weak label prior | Frontiers of Computer Science Skip to main content
Log in

Constrained clustering with weak label prior

  • Research Article
  • Published:
Frontiers of Computer Science Aims and scope Submit manuscript

Abstract

Clustering is widely exploited in data mining. It has been proved that embedding weak label prior into clustering is effective to promote its performance. Previous researches mainly focus on only one type of prior. However, in many real scenarios, two kinds of weak label prior information, e.g., pairwise constraints and cluster ratio, are easily obtained or already available. How to incorporate them to improve clustering performance is important but rarely studied. We propose a novel constrained Clustering with Weak Label Prior method (CWLP), which is an integrated framework. Within the unified spectral clustering model, the pairwise constraints are employed as a regularizer in spectral embedding and label proportion is added as a constraint in spectral rotation. To approximate a variant of the embedding matrix more precisely, we replace a cluster indicator matrix with its scaled version. Instead of fixing an initial similarity matrix, we propose a new similarity matrix that is more suitable for deriving clustering results. Except for the theoretical convergence and computational complexity analyses, we validate the effectiveness of CWLP through several benchmark datasets, together with its ability to discriminate suspected breast cancer patients from healthy controls. The experimental evaluation illustrates the superiority of our proposed approach.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
¥17,985 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price includes VAT (Japan)

Instant access to the full article PDF.

Similar content being viewed by others

References

  1. Jain A K, Murty M N, Flynn P J. Data clustering: a review. ACM Computing Surveys, 1999, 31(3): 264–323

    Article  Google Scholar 

  2. Voss J, Belkin M, Rademacher L. The hidden convexity of spectral clustering. In: Proceedings of the 30th AAAI Conference on Artificial Intelligence. 2016, 2108–2114

  3. Elhamifar E, Vidal R. Sparse subspace clustering: algorithm, theory, and applications. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35(11): 2765–2781

    Article  Google Scholar 

  4. Kim S, Yoo C D, Nowozin S, Kohli P. Image segmentation using higher-order correlation clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 36(9): 1761–1774

    Article  Google Scholar 

  5. Mittal H, Pandey A C, Saraswat M, Kumar S, Pal R, Modwel G. A comprehensive survey of image segmentation: clustering methods, performance parameters, and benchmark datasets. Multimedia Tools and Applications, 2022, 81(24): 35001–35026

    Article  Google Scholar 

  6. Shi J, Malik J. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000, 22(8): 888–905

    Article  Google Scholar 

  7. Zhang J, Zhou K, Luximon Y, Li P, Iftikhar H. 3D-guided facial shape clustering and analysis. Multimedia Tools and Applications, 2022, 81(6): 8785–8806

    Article  Google Scholar 

  8. Liu H, Liu T, Wu J, Tao D, Fu Y. Spectral ensemble clustering. In: Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2015, 715–724

  9. Hartigan J A, Wong M A. A k-means clustering algorithm. Journal of the Royal Statistical Society Series C: Applied Statistics, 1979, 28(1): 100–108

    MATH  Google Scholar 

  10. Kanungo T, Mount D M, Netanyahu N S, Piatko C D, Silverman R, Wu A. An efficient k-means clustering algorithm: analysis and implementation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002, 24(7): 881–892

    Article  MATH  Google Scholar 

  11. Johnson S C. Hierarchical clustering schemes. Psychometrika, 1967, 32(3): 241–254

    Article  MATH  Google Scholar 

  12. Levin M S. Towards hierarchical clustering. In: Proceedings of the 2nd international conference on Computer Science: theory and applications. 2007, 205–215

  13. Von Luxburg U. A tutorial on spectral clustering. Statistics and Computing, 2007, 17(4): 395–416

    Article  MathSciNet  Google Scholar 

  14. Lu C, Yan S, Lin Z. Convex sparse spectral clustering: single-view to multi-view. IEEE Transactions on Image Processing, 2016, 25(6): 2833–2843

    Article  MathSciNet  MATH  Google Scholar 

  15. Khan A A, Mohanty S K. A fast spectral clustering technique using MST based proximity graph for diversified datasets. Information Sciences, 2022, 609: 1113–1131

    Article  Google Scholar 

  16. Lu C, Feng J, Lin Z, Mei T, Yan S. Subspace clustering by block diagonal representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 41(2): 487–501

    Article  Google Scholar 

  17. Gao H, Nie F, Li X, Huang H. Multi-view subspace clustering. In: Proceedings of 2015 IEEE International Conference on Computer Vision. 2015, 4238–4246

  18. Dong W, Wu X J, Kittler J. Subspace clustering via joint 11,2 and 12,1 norms. Information Sciences, 2022, 612: 675–686

    Article  Google Scholar 

  19. Parmar M, Wang D, Zhang X, Tan A H, Miao C, Jiang J, Zhou Y. Redpc: a residual error-based density peak clustering algorithm. Neurocomputing, 2019, 348: 82–96

    Article  Google Scholar 

  20. Parmar M D, Pang W, Hao D, Jiang J, Liupu W, Wang L, Zhou Y. FREDPC: a feasible residual error-based density peak clustering algorithm with the fragment merging strategy. IEEE Access, 2019, 7: 89789–89804

    Article  Google Scholar 

  21. Liu G, Lin Z, Yan S, Sun J, Yu Y, Ma Y. Robust recovery of subspace structures by low-rank representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35(1): 171–184

    Article  Google Scholar 

  22. Tao Z, Liu H, Li S, Ding Z, Fu Y. Robust spectral ensemble clustering via rank minimization. ACM Transactions on Knowledge Discovery from Data, 2019, 13(1): 1–25

    Article  Google Scholar 

  23. Zhu Y, Kwok J T, Zhou Z H. Multi-label learning with global and local label correlation. IEEE Transactions on Knowledge and Data Engineering, 2018, 30(6): 1081–1094

    Article  Google Scholar 

  24. Saha S, Alok A K, Ekbal A. Brain image segmentation using semisupervised clustering. Expert Systems with Applications, 2016, 52: 50–63

    Article  Google Scholar 

  25. Śmieja M, Myronov O, Tabor J. Semi-supervised discriminative clustering with graph regularization. Knowledge-Based Systems, 2018, 151: 24–36

    Article  Google Scholar 

  26. Nie F, Zhang H, Wang R, Li X. Semi-supervised clustering via pairwise constrained optimal graph. In: Proceedings of the 29th International Joint Conference on Artificial Intelligence. 2021, 437

  27. Zhu Z, Gao Q. Semi-supervised clustering via cannot link relationship for multiview data. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 32(12): 8744–8755

    Article  Google Scholar 

  28. Wang X, Qian B, Davidson I. On constrained spectral clustering and its applications. Data Mining and Knowledge Discovery, 2014, 28(1): 1–30

    Article  MathSciNet  MATH  Google Scholar 

  29. Chen L M, Xiu B X, Ding Z Y. Multiple weak supervision for short text classification. Applied Intelligence, 2022, 52(8): 9101–9116

    Article  Google Scholar 

  30. Rasti R, Teshnehlab M, Phung S L. Breast cancer diagnosis in DCE-MRI using mixture ensemble of convolutional neural networks. Pattern Recognition, 2017, 72: 381–390

    Article  Google Scholar 

  31. Lai K T, Yu F X, Chen M S, Chang S F. Video event detection by inferring temporal instance labels. In: Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition. 2014, 2243–2250

  32. Poyiadzi R, Santos-Rodriguez R, Twomey N. Label propagation for learning with label proportions. In: Proceedings of the 28th IEEE International Workshop on Machine Learning for Signal Processing. 2018, 1–6

  33. Sun N, Luo T, Zhuge W, Tao H, Hou C, Hu D. Semi-supervised learning with label proportion. IEEE Transactions on Knowledge and Data Engineering, 2023, 35(1): 877–890

    Article  Google Scholar 

  34. Ng A, Jordan M, Weiss Y. On spectral clustering: Analysis and an algorithm. In: Proceedings of the 14th International Conference on Neural Information Processing Systems. 2011

  35. Zhang X, You Q. An improved spectral clustering algorithm based on random walk. Frontiers of Computer Science in China, 2011, 5(3): 268–278

    Article  MathSciNet  MATH  Google Scholar 

  36. Yu S X, Shi J. Multiclass spectral clustering. In: Proceedings of the 9th IEEE International Conference on Computer Vision. 2003, 313–313

  37. Liu W, He J, Chang S. Large graph construction for scalable semi-supervised learning. In: Proceedings of the 27th International Conference on International Conference on Machine Learning. 2010, 679–686

  38. Huang J, Nie F, Huang H. Spectral rotation versus k-means in spectral clustering. In: Proceedings of the 27th AAAI Conference on Artificial Intelligence. 2013, 431–437

  39. Nie F P, Wang X Q, Jordan M, Huang H. The constrained laplacian rank algorithm for graph-based clustering. In: Proceedings of the 37th AAAI Conference on Artificial Intelligence. 2016, 1969–1976

  40. Pang Y, Xie J, Nie F, Li X. Spectral clustering by joint spectral embedding and spectral rotation. IEEE Transactions on Cybernetics, 2020, 50(1): 247–258

    Article  Google Scholar 

  41. Nie F, Zhang R, Li X. A generalized power iteration method for solving quadratic problem on the stiefel manifold. Science China Information Sciences, 2017, 60(11): 112101

    Article  MathSciNet  Google Scholar 

  42. Zhu X, Ghahramani Z. Learning from labeled and unlabeled data with label propagation. Pittsburgh: Carnegie Mellon University, 2002

    Google Scholar 

  43. Hagen L, Kahng A B. New spectral methods for ratio cut partitioning and clustering. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 1992, 11(9): 1074–1085

    Article  Google Scholar 

  44. MacQueen J. Some methods for classification and analysis of multivariate observations. In: Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability. 1967, 281–297

  45. Van Craenendonck T, Dumancic S, Blockeel H. COBRA: a fast and simple method for active clustering with pairwise constraints. In: Proceedings of the 26th International Joint Conference on Artificial Intelligence. 2018

  46. Nie F, Zhu W, Li X. Unsupervised large graph embedding based on balanced and hierarchical k-means. IEEE Transactions on Knowledge and Data Engineering, 2020, 34(4): 2008–2019

    Google Scholar 

  47. Zhu J, Tao L, Yang H, Nie F. Unsupervised optimized bipartite graph embedding. IEEE Transactions on Knowledge and Data Engineering, 2023, 35(3): 3224–3238

    Google Scholar 

  48. Wang J, Ma Z, Nie F, Li X. Efficient discrete clustering with anchor graph. IEEE Transactions on Neural Networks and Learning Systems, 2023, 1–9, doi: https://doi.org/10.1109/TNNLS.2023.3279380

    Google Scholar 

  49. Zhou D, Bousquet O, Lal T N, Weston J, Schölkopf B. Learning with local and global consistency. In: Proceedings of the 16th International Conference on Neural Information Processing Systems. 2003, 321–328

Download references

Acknowledgements

This work was partially supported by the National Key R&D Program (No. 2022ZD0114803), and the National Natural Science Foundation of China (Grant Nos. 62136005, 61922087).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Chenping Hou.

Ethics declarations

Competing interests The authors declare that they have no competing interests or financial conflicts to disclose.

Additional information

Jing Zhang is a PhD candidate at the National University of Defense Technology, China. She received the BS degree from Sichuan Normal University, China in 2017 and the MS degree from the National University of Defense Technology, China in 2019. Her research interests include machine learning, system science, and data mining.

Ruidong Fan is a PhD candidate at the National University of Defense Technology, China. He received the BS degree from Lanzhou University, China in 2018 and the MS degree from the National University of Defense Technology, China in 2020. His research interests include data mining, optimization, and transfer learning.

Hong Tao received the PhD degree from the National University of Defense Technology, China in 2019. She is currently a Lecturer with the College of Liberal Arts and Science of the same university. Her research interests include machine learning, system science, and data mining.

Jiacheng Jiang received the MS degree from the National University of Defense Technology in 2020, China. He received the BS degree with the same university in 2018. His research interests include data mining and machine learning.

Chenping Hou received the PhD degree from the National University of Defense Technology, China in 2009. He is currently a full Professor with the Department of Systems Science of the same university. He has authored 80+ peer-reviewed papers in journals and conferences, such as the IEEE TPAMI, TNNLS/TNN, IEEE TSMCB/TCB, IEEE TIP, the IJCAI, and AAAI. His current research interests include machine learning, data mining, and computer vision.

Electronic supplementary material

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Zhang, J., Fan, R., Tao, H. et al. Constrained clustering with weak label prior. Front. Comput. Sci. 18, 183338 (2024). https://doi.org/10.1007/s11704-023-3355-7

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1007/s11704-023-3355-7

Keywords

Navigation