Abstract
Deep clustering uses a deep neural network (DNN) to learn a feature representation and a clustering simultaneously. Existing constrained deep clustering methods improve on it by exploiting prior knowledge. However, most of these methods select the prior knowledge (pairwise constraints) at random and fail to use it appropriately during deep clustering. The present study addresses this limitation by proposing a new scheme that integrates active learning from a dual source into constrained deep clustering. The scheme consists of a DNN that initializes the nonlinear transformation of the original feature space, a clustering layer, and a constrained clustering layer that runs parallel to the clustering layer and represents prior knowledge as a set of neighborhoods. Active learning uses these two layers simultaneously as its sources for selecting informative and diverse data. The proposed method simultaneously performs constrained clustering, learns the latent feature space under the guidance of the constraint set, and indirectly draws the data in each neighborhood closer to its center (i.e., farther from the centers of other neighborhoods). Experiments on several datasets indicate the efficiency and robustness of the proposed method.
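To make the idea of "prior knowledge as a set of neighborhoods" concrete, the following is a minimal sketch and not the authors' implementation. It assumes that neighborhoods are the connected components of the must-link constraints, a common construction in constrained clustering, and that a neighborhood center is the mean of its members' embeddings. The function names `build_neighborhoods` and `neighborhood_centers`, and the toy data, are hypothetical.

```python
from collections import defaultdict


def build_neighborhoods(n_points, must_link):
    """Group point indices into neighborhoods.

    Assumption: a neighborhood is a connected component of the
    must-link graph (points known to share a cluster).
    """
    parent = list(range(n_points))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in must_link:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    groups = defaultdict(list)
    for i in range(n_points):
        groups[find(i)].append(i)
    # Keep only groups that carry constraint information (size > 1).
    return sorted(g for g in groups.values() if len(g) > 1)


def neighborhood_centers(embeddings, neighborhoods):
    """Return the mean embedding of each neighborhood (assumed center)."""
    centers = []
    for members in neighborhoods:
        dim = len(embeddings[members[0]])
        centers.append(
            [sum(embeddings[i][d] for i in members) / len(members)
             for d in range(dim)]
        )
    return centers


# Toy example: six points, three must-link pairs (hypothetical data).
must_link = [(0, 1), (1, 2), (4, 5)]
neighborhoods = build_neighborhoods(6, must_link)
embeddings = [[0.0, 0.0], [2.0, 0.0], [1.0, 3.0],
              [9.0, 9.0], [4.0, 4.0], [6.0, 8.0]]
centers = neighborhood_centers(embeddings, neighborhoods)
print(neighborhoods)
print(centers)
```

In a scheme like the one described, the embeddings would come from the DNN's latent space, and a constrained clustering layer could use such centers to pull each neighborhood's members together; how the paper does this in detail is not stated in the abstract.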
Acknowledgments
The authors received no financial support for the research and/or authorship of this article.
Cite this article
Hazratgholizadeh, R., Balafar, M.A. & Derakhshi, M.R.F. Active constrained deep embedded clustering with dual source. Appl Intell 53, 5337–5367 (2023). https://doi.org/10.1007/s10489-022-03752-5