
Computer Science, 2021, Vol. 48, Issue (11A): 81-87. doi: 10.11896/jsjkx.210300036

康雁, 寇勇奇, 谢思宇, 王飞, 张兰, 吴志伟, 李浩   

  1. School of Software, Yunnan University, Kunming 650504, China
  • Publication date: 2021-11-10  Release date: 2021-11-12
  • Corresponding author: LI Hao (lihao707@ynu.edu.cn)
  562530855@qq.com

Deep Clustering Model Based on Fusion Variational Graph Attention Self-encoder

KANG Yan, KOU Yong-qi, XIE Si-yu, WANG Fei, ZHANG Lan, WU Zhi-wei, LI Hao   

  1. School of Software, Yunnan University, Kunming 650504, China
  • Online: 2021-11-10  Published: 2021-11-12
  • About authors: KANG Yan, born in 1972, Ph.D., associate professor. Her main research interests include transfer learning, deep learning and integrated learning.
    LI Hao, born in 1970, Ph.D., professor. His main research interests include distributed computing, grid computing and cloud computing.
  • Supported by:
    National Natural Science Foundation (61762092), Key Laboratory of Software Engineering (2020SE303) and Major Scientific Research Plan of Yunnan Province (202002AB080001).

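Abstract: Clustering, one of the most fundamental tasks in data mining and machine learning, has been widely applied to a variety of real-world tasks. With the development of deep learning, deep clustering has become a research hotspot. Existing deep clustering algorithms mainly work from one of two directions, node representation learning or structural representation learning, and few of them consider fusing both kinds of information to accomplish representation learning. This paper proposes FVGTAEDC (Deep Clustering Model Based on Fusion Variational Graph Attention Self-encoder), a deep clustering model that performs clustering by combining an autoencoder with a variational graph attention autoencoder. In the model, the autoencoder integrates the (low-order and high-order) structural representations that the variational graph attention autoencoder learns from the network, and then learns feature representations from the raw data. While the two modules are being trained, the autoencoder module's representation features, which fuse node and structure information, are used for self-supervised clustering training so that the model adapts to the clustering task. By combining the clustering loss, the autoencoder's data reconstruction loss, the variational graph attention autoencoder's adjacency matrix reconstruction loss, and the relative entropy loss between the posterior and prior probability distributions, the model can effectively aggregate node attributes and network structure while jointly optimizing cluster label assignment and learning representation features suited to clustering. Comprehensive experiments show that the method achieves better clustering results than current state-of-the-art deep clustering methods on all five real-world datasets.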
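The abstract does not spell out how the self-supervised clustering training works. A common formulation in related deep clustering work is the DEC-style (Deep Embedded Clustering) scheme, sketched below under that assumption: a Student's t soft assignment Q between node embeddings and cluster centers, a sharpened target distribution P derived from Q, and KL(P || Q) as the clustering loss. The function names, the toy data, and the choice `alpha = 1.0` are illustrative assumptions, not the authors' code.

```python
import math
from typing import List

Matrix = List[List[float]]


def soft_assignment(z: Matrix, centers: Matrix, alpha: float = 1.0) -> Matrix:
    """q_ij: Student's t similarity between embedding z_i and center mu_j, normalized per row."""
    q = []
    for zi in z:
        row = []
        for mu in centers:
            d2 = sum((a - b) ** 2 for a, b in zip(zi, mu))
            row.append((1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0))
        total = sum(row)
        q.append([v / total for v in row])
    return q


def target_distribution(q: Matrix) -> Matrix:
    """p_ij = (q_ij^2 / f_j) / sum_k (q_ik^2 / f_k), where f_j = sum_i q_ij is the soft cluster size."""
    k = len(q[0])
    f = [sum(row[j] for row in q) for j in range(k)]
    p = []
    for row in q:
        w = [row[j] ** 2 / f[j] for j in range(k)]
        total = sum(w)
        p.append([v / total for v in w])
    return p


def clustering_loss(p: Matrix, q: Matrix) -> float:
    """KL(P || Q) averaged over nodes; P acts as a fixed self-supervision target."""
    kl = 0.0
    for prow, qrow in zip(p, q):
        for pij, qij in zip(prow, qrow):
            if pij > 0.0:
                kl += pij * math.log(pij / qij)
    return kl / len(p)


if __name__ == "__main__":
    # Hypothetical toy embeddings forming two well-separated groups.
    z = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [4.9, 5.0]]
    centers = [[0.0, 0.0], [5.0, 5.0]]
    q = soft_assignment(z, centers)
    p = target_distribution(q)
    print([max(range(2), key=lambda j: row[j]) for row in q])  # [0, 0, 1, 1]
    print(clustering_loss(p, q))
```

Squaring Q and dividing by the soft cluster size makes P sharper than Q and discourages one cluster from absorbing all nodes, which is why minimizing KL(P || Q) can refine the assignments without ground-truth labels.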

Key words: Variational graph attention autoencoder, Representation learning, Deep clustering, Autoencoder, Self-supervised clustering

Abstract: As one of the most basic tasks in data mining and machine learning, clustering is widely used in various real-world tasks. With the development of deep learning, deep clustering has become a research hotspot. Existing deep clustering algorithms mainly start from one of two aspects, node representation learning or structural representation learning, and few works consider fusing these two kinds of information at the same time to complete representation learning. This paper proposes a deep clustering model, FVGTAEDC (Deep Clustering Model Based on Fusion Variational Graph Attention Self-encoder), which jointly uses an autoencoder and a variational graph attention autoencoder for clustering. In the model, the autoencoder integrates the (low-order and high-order) structural representations that the variational graph attention autoencoder learns from the network, and then learns feature representations from the original data. While the two modules are trained, and in order to adapt to the clustering task, the representation features of the autoencoder module, which fuse node and structure information, are used for self-supervised clustering training. By combining the clustering loss, the autoencoder's data reconstruction loss, the variational graph attention autoencoder's adjacency matrix reconstruction loss, and the relative entropy loss between the posterior and prior probability distributions, the method can effectively aggregate node attributes and network structure while optimizing the assignment of cluster labels and learning representation features suitable for clustering. Comprehensive experiments show that the method outperforms current state-of-the-art deep clustering methods on five real-world datasets.
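The abstract lists four loss terms but gives neither their formulas nor their weights. The pure-Python sketch below shows one plausible reading, assuming the standard VGAE formulation for the graph branch (an inner-product decoder sigmoid(Z Z^T) scored with binary cross-entropy against the adjacency matrix, and the KL divergence from a diagonal Gaussian posterior to a standard normal prior), mean squared error for the autoencoder's data reconstruction, and a simple weighted sum. The clustering loss is passed in as a precomputed scalar. The weights `w_clu`, `w_adj`, `w_kl` and all function names are hypothetical; a real model would compute these terms on tensors with automatic differentiation.

```python
import math
from typing import List

Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def ae_reconstruction_loss(x: Matrix, x_hat: Matrix) -> float:
    """Mean squared error between raw features X and the autoencoder output X_hat."""
    n = sum(len(row) for row in x)
    return sum((a - b) ** 2 for xr, hr in zip(x, x_hat) for a, b in zip(xr, hr)) / n


def adjacency_reconstruction_loss(z: Matrix, adj: Matrix, eps: float = 1e-9) -> float:
    """Binary cross-entropy between A and A_hat = sigmoid(Z Z^T), averaged over all node pairs."""
    n = len(z)
    total = 0.0
    for i in range(n):
        for j in range(n):
            a_hat = sigmoid(sum(u * v for u, v in zip(z[i], z[j])))
            a = adj[i][j]
            total -= a * math.log(a_hat + eps) + (1 - a) * math.log(1 - a_hat + eps)
    return total / (n * n)


def gaussian_kl_to_prior(mu: Matrix, logvar: Matrix) -> float:
    """KL(N(mu, diag(exp(logvar))) || N(0, I)) per node, averaged over nodes."""
    total = 0.0
    for m_row, lv_row in zip(mu, logvar):
        total += -0.5 * sum(1 + lv - m ** 2 - math.exp(lv) for m, lv in zip(m_row, lv_row))
    return total / len(mu)


def total_loss(l_rec_x: float, l_clu: float, l_rec_a: float, l_kl: float,
               w_clu: float = 0.1, w_adj: float = 1.0, w_kl: float = 1.0) -> float:
    # Hypothetical weighting; the abstract does not state how the terms are balanced.
    return l_rec_x + w_clu * l_clu + w_adj * l_rec_a + w_kl * l_kl


if __name__ == "__main__":
    adj = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]  # toy graph with self-loops
    z = [[3.0], [3.0], [-3.0]]               # toy embeddings that match adj
    l_adj = adjacency_reconstruction_loss(z, adj)
    l_kl = gaussian_kl_to_prior([[0.5], [0.5], [-0.5]], [[0.0], [0.0], [0.0]])
    l_x = ae_reconstruction_loss([[1.0, 0.0]], [[0.9, 0.1]])
    print(total_loss(l_x, 0.05, l_adj, l_kl))
```

Keeping each term as a separate function makes it easy to vary the weights or drop a term when examining how much each part of the objective contributes.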

Key words: Deep clustering, Representation learning, Self-encoder, Self-supervised clustering, Variational graph attention self-encoder


  • CLC number: TP181
