Computer Science, 2021, Vol. 48, Issue (11A): 81-87. doi: 10.11896/jsjkx.210300036
KANG Yan, KOU Yong-qi, XIE Si-yu, WANG Fei, ZHANG Lan, WU Zhi-wei, LI Hao
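Abstract: Clustering is one of the most fundamental tasks in data mining and machine learning and has been widely applied to real-world problems. With the development of deep learning, deep clustering has become an active research topic. Existing deep clustering algorithms mainly learn either node representations or structural representations, and rarely fuse both kinds of information during representation learning. This paper proposes FVGTAEDC (Deep Clustering Model Based on Fusion Variational Graph Attention Self-encoder), a deep clustering model that combines an autoencoder with a variational graph attention autoencoder. In this model, the autoencoder integrates the (low-order and high-order) structural representations that the variational graph attention autoencoder learns from the network, and then learns feature representations from the raw data. While the two modules are trained, the autoencoder's representation, which fuses node and structural information, is also used for self-supervised clustering training so that it adapts to the clustering task. By combining four losses, namely the clustering loss, the autoencoder's data-reconstruction loss, the variational graph attention autoencoder's adjacency-matrix reconstruction loss, and the relative entropy (KL divergence) between the posterior and prior distributions, the model effectively aggregates node attributes and network structure, and simultaneously optimizes cluster-label assignment and learns representations suited to clustering. Comprehensive experiments show that the method achieves better clustering performance than current state-of-the-art deep clustering methods on all five real-world datasets.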
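The abstract names the four loss terms but gives no formulas. The following is a minimal sketch of such a joint objective under stated assumptions: the self-supervised clustering loss is taken to be the DEC-style KL divergence between a Student's t soft assignment and a sharpened target distribution; feature reconstruction uses mean squared error; adjacency reconstruction uses a VGAE-style inner-product decoder with binary cross-entropy; and the prior term is the closed-form KL divergence from a diagonal Gaussian to N(0, I). The encoders (including the graph attention layers) are not shown; their outputs are passed in as plain lists. All function names, the weights `w_clu` and `w_kl`, and the choice of `mu` as the graph embedding are hypothetical, not the paper's.

```python
import math

# Sketch of a four-term deep-clustering objective. Every function name and
# weight here is hypothetical; the loss forms are the standard DEC / VGAE ones,
# assumed (not confirmed by the abstract) to match the paper.

def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def soft_assignment(h, centers, alpha=1.0):
    """DEC-style Student's t soft assignment q_ij of embedding h_i to center j."""
    q = []
    for hi in h:
        kernel = [(1.0 + sum((a - b) ** 2 for a, b in zip(hi, c)) / alpha) ** (-(alpha + 1.0) / 2.0)
                  for c in centers]
        s = sum(kernel)
        q.append([k / s for k in kernel])
    return q

def target_distribution(q):
    """Sharpened self-supervision target: p_ij proportional to q_ij^2 / f_j, f_j = sum_i q_ij."""
    f = [sum(row[j] for row in q) for j in range(len(q[0]))]
    p = []
    for row in q:
        w = [row[j] ** 2 / f[j] for j in range(len(row))]
        s = sum(w)
        p.append([v / s for v in w])
    return p

def clustering_loss(p, q):
    """KL(P || Q), averaged over nodes."""
    total = sum(pj * math.log(pj / qj)
                for prow, qrow in zip(p, q) for pj, qj in zip(prow, qrow) if pj > 0)
    return total / len(p)

def feature_reconstruction_loss(x, x_hat):
    """Mean squared error between raw features X and the autoencoder output."""
    n_entries = len(x) * len(x[0])
    return sum((a - b) ** 2 for row, row_hat in zip(x, x_hat)
               for a, b in zip(row, row_hat)) / n_entries

def adjacency_reconstruction_loss(adj, z, eps=1e-9):
    """Binary cross-entropy between A and an inner-product decoder sigmoid(z_i . z_j)."""
    n = len(adj)
    total = 0.0
    for i in range(n):
        for j in range(n):
            a_hat = sigmoid(sum(a * b for a, b in zip(z[i], z[j])))
            total -= (adj[i][j] * math.log(max(a_hat, eps))
                      + (1 - adj[i][j]) * math.log(max(1.0 - a_hat, eps)))
    return total / (n * n)

def gaussian_kl(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)) per node, averaged over nodes."""
    total = sum(-0.5 * sum(1.0 + lv - m ** 2 - math.exp(lv) for m, lv in zip(m_row, lv_row))
                for m_row, lv_row in zip(mu, logvar))
    return total / len(mu)

def total_loss(x, x_hat, adj, mu, logvar, h, centers, w_clu=0.1, w_kl=1.0):
    """Weighted sum of the four terms named in the abstract (weights are hypothetical)."""
    q = soft_assignment(h, centers)
    p = target_distribution(q)  # in a real trainer p is held fixed (no gradient)
    return (w_clu * clustering_loss(p, q)
            + feature_reconstruction_loss(x, x_hat)
            + adjacency_reconstruction_loss(adj, mu)  # mean of q(z) used as the graph embedding
            + w_kl * gaussian_kl(mu, logvar))

# Toy inputs: 3 nodes, 2-D features and embeddings, 2 clusters (all made up).
x = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
x_hat = [[0.8, 0.1], [0.9, 0.2], [0.1, 0.9]]
adj = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
mu = [[1.0, 0.2], [0.9, 0.1], [-0.8, 0.5]]
logvar = [[0.0, 0.0] for _ in range(3)]
h = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]  # fused autoencoder embedding
centers = [[1.0, 0.0], [0.0, 1.0]]
print(total_loss(x, x_hat, adj, mu, logvar, h, centers))
```

Squaring q and dividing by the cluster frequency f_j makes confident assignments more confident while discouraging very large clusters. This is why DEC-style models can use their own predictions as a self-supervision signal. In an actual trainer, p would be recomputed only periodically and treated as a constant target.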