【论文解读】Finding community structure in networks using the eigenvectors of matrices

原创

wx62cea850b9e28 2022-08-25 11:12:53 ©著作权

文章标签 算法矩阵论邻接矩阵结点权重 文章分类 代码人生

©著作权归作者所有：来自51CTO博客作者wx62cea850b9e28的原创作品，请联系作者获取转载授权，否则将追究法律责任

文章目录

一、论文主体
二、复现代码

2.1 算法大致流程
2.2 Python代码实现

三、匈牙利算法
Reference

一、论文主体

论文题目：《Finding community structure in networks using the eigenvectors of matrices》
Leading eigenvector
这是Newman（一个学物理的大佬） (2006)提出的一种自上而下的分层社区发现算法。该算法的核心是定义了一个模块度矩阵（modularity matrix）。最大化模块度的过程可以体现在模块度矩阵的特征值分解中，模块度矩阵在社区发现中的作用类似于由图拉普拉斯算子在图划分中发挥的作用。

（1）首先计算模块度矩阵的最大正特征值所对应的特征向量，
（2）然后根据特征向量中元素的正负符号将节点分成两个社区。
如果特征向量中的所有元素都具有相同的符号，则说明该网络没有底层的社区结构。该算法的复杂度为O((m+n)n)。

基础知识回顾：
定义一（图的邻接矩阵）：

给定一个图 $【论文解读】Finding community structure in networks using the eigenvectors of matrices_矩阵论$ ，其对应的邻接矩阵被记为 $【论文解读】Finding community structure in networks using the eigenvectors of matrices_权重_02$ 。 $【论文解读】Finding community structure in networks using the eigenvectors of matrices_权重_03$ 表示存在从结点 $【论文解读】Finding community structure in networks using the eigenvectors of matrices_结点_04$ 到 $【论文解读】Finding community structure in networks using the eigenvectors of matrices_矩阵论_05$ 的边，反之表示不存在从结点 $【论文解读】Finding community structure in networks using the eigenvectors of matrices_结点_04$ 到 $【论文解读】Finding community structure in networks using the eigenvectors of matrices_矩阵论_05$ 的边。
在无向图中，从结点 $【论文解读】Finding community structure in networks using the eigenvectors of matrices_结点_04$ 到 $【论文解读】Finding community structure in networks using the eigenvectors of matrices_矩阵论_05$ 的边存在，意味着从结点 $【论文解读】Finding community structure in networks using the eigenvectors of matrices_矩阵论_05$ 到 $【论文解读】Finding community structure in networks using the eigenvectors of matrices_结点_04$ 的边也存在。因而无向图的邻接矩阵是对称的。
在无权图中，各条边的权重被认为是等价的，即认为各条边的权重为 $【论文解读】Finding community structure in networks using the eigenvectors of matrices_算法_12$ 。
对于有权图，其对应的邻接矩阵通常被记为 $【论文解读】Finding community structure in networks using the eigenvectors of matrices_算法_13$ ，其中 $【论文解读】Finding community structure in networks using the eigenvectors of matrices_结点_14$ 表示从结点 $【论文解读】Finding community structure in networks using the eigenvectors of matrices_结点_04$ 到 $【论文解读】Finding community structure in networks using the eigenvectors of matrices_矩阵论_05$ 的边的权重。若边不存在时，边的权重为 $【论文解读】Finding community structure in networks using the eigenvectors of matrices_矩阵论_17$ 。
一个无向无权图的例子：

【论文解读】Finding community structure in networks using the eigenvectors of matrices_算法_18

其邻接矩阵为：
$【论文解读】Finding community structure in networks using the eigenvectors of matrices_邻接矩阵_19$

定义二（拉普拉斯矩阵，Laplacian Matrix）：

给定一个图 $【论文解读】Finding community structure in networks using the eigenvectors of matrices_权重_20$ ，其邻接矩阵为 $【论文解读】Finding community structure in networks using the eigenvectors of matrices_结点_21$ ，其拉普拉斯矩阵定义为 $【论文解读】Finding community structure in networks using the eigenvectors of matrices_结点_22$ ，其中度矩阵 $【论文解读】Finding community structure in networks using the eigenvectors of matrices_矩阵论_23$ 。

定义三（对称归一化的拉普拉斯矩阵，Symmetric normalized Laplacian）：

给定一个图 $【论文解读】Finding community structure in networks using the eigenvectors of matrices_权重_20$ ，其邻接矩阵为 $【论文解读】Finding community structure in networks using the eigenvectors of matrices_结点_21$ ，其规范化的拉普拉斯矩阵定义为

$【论文解读】Finding community structure in networks using the eigenvectors of matrices_权重_26$

二、复现代码

举个小栗子，给出如下的矩阵：

【论文解读】Finding community structure in networks using the eigenvectors of matrices_结点_27

2.1 算法大致流程

（0）算法开头先输入对应的邻接矩阵：

Matrix = [[0,1,1,0,0],
          [1,0,1,0,0],
          [1,1,0,0,0],
          [0,0,0,0,1],
          [0,0,0,1,0]]

（1）通过一个for循环求第i个结点对应的 $【论文解读】Finding community structure in networks using the eigenvectors of matrices_权重_28$ 值：
（2）再求出P矩阵：
$【论文解读】Finding community structure in networks using the eigenvectors of matrices_结点_29$ （3） $【论文解读】Finding community structure in networks using the eigenvectors of matrices_算法_30$
（4）求出B矩阵的特征值： $【论文解读】Finding community structure in networks using the eigenvectors of matrices_算法_31$
（5）对B矩阵对应的特征值进行大到小排序，找到最大的特征值 $【论文解读】Finding community structure in networks using the eigenvectors of matrices_矩阵论_32$
（6）求出（5）的 $【论文解读】Finding community structure in networks using the eigenvectors of matrices_矩阵论_32$ 对应的特征向量
（7）利用匈牙利算法将大于0和小于0的数进行处理（好像不用匈牙利也行）

2.2 Python代码实现

# -*- coding: utf-8 -*-
"""
Created on Mon Sep 20 09:52:03 2021

@author: 86493
"""
import numpy as np
# 0.计算邻接矩阵
Matrix = [[0,1,1,0,0],
          [1,0,1,0,0],
          [1,1,0,0,0],
          [0,0,0,0,1],
          [0,0,0,1,0]]
VertexNum = len(Matrix[0])
A = np.array(Matrix)
# 1.计算第i个结点对应的度数ki
K = [0] * 100 
for i in range(VertexNum):
    # print(i)
    K[i] = A[i].sum()
    print("第{0}个结点的度数为：".format(i),K[i]) 
    # 打印每行的累加结果,尽量用format(比%好一丢丢)
TwoM = sum(K) # 计算所有结点的度数之和，即2m
print(TwoM) 
# 2.计算P矩阵,记得先初始化
PMatrix = [[0 for i in range(VertexNum)] for i in range(VertexNum)] 
for i in range(0, VertexNum):
    for j in range(0, VertexNum):
        PMatrix[i][j] = K[i] * K[j] / TwoM 
# 3.B=A-P
B = A - PMatrix
# 4.计算B矩阵的特征值
EigValue, EigVector = np.linalg.eig(B)
# 5.对所有特征值进行排序
np.set_printoptions(suppress=True)
print("排序前的特征值为：\n", EigValue)
EigValue1 = EigValue.copy()
EigValue.sort()
# SortEigValue = EigValue.sort()
np.set_printoptions(suppress=True)
print("排序后的特征值为：\n", EigValue)
# 6.求得对应的特征向量
EndEigValue = EigValue[VertexNum - 1]
# 找到最大特征值所在的位置,并求对应的特征向量
for i in range(VertexNum):
    if(EigValue1[i] == EndEigValue):
        EndEigVector = EigVector[:, i]
        break
print("最大的特征值为：\n", EndEigValue)
np.set_printoptions(suppress=True)
print("对应的特征向量为：\n", EndEigVector)
# 7.刚才特征向量的元素大于0归为1，小于0归为-1
lst1 = []
lst2 = []
for i in range(VertexNum):
    if(EndEigVector[i] > 0):
        EndEigVector[i] = 1
        lst1.append(i)
    elif(EndEigVector[i] < 0):
        EndEigVector[i] = -1
        lst2.append(i)
print("处理后的特征向量为：\n", EndEigVector)
print("第一个社区的元素：",lst1)
print("第二个社区的元素：",lst2)

用我们上面的栗子，可以得到如下的社区分解的结果如下，两个社区也是符合事实的。注意在打印数值时默认是科学记数法的，可以使用np.set_printoptions(suppress=True)去掉。
具体方法：控制文件中小数位数
np.set_printoptions(precision=3, suppress=True) 参数：
precision: 保留几位小数，后面不会补0
supress: 对很大/小的数不使用科学计数法 (true)
formatter: 强制格式化，后面会补0

第0个结点的度数为： 2
第1个结点的度数为： 2
第2个结点的度数为： 2
第3个结点的度数为： 1
第4个结点的度数为： 1
8
排序前的特征值为：
 [ 1.25 -1.   -0.   -1.   -1.  ]
排序后的特征值为：
 [-1.   -1.   -1.   -0.    1.25]
最大的特征值为：
 1.25
对应的特征向量为：
 [-0.36514837 -0.36514837 -0.36514837  0.54772256  0.54772256]
处理后的特征向量为：
 [-1. -1. -1.  1.  1.]
第一个社区的元素： [3, 4]
第二个社区的元素： [0, 1, 2]

三、匈牙利算法

# -*- coding: utf-8 -*-
"""
Created on Mon Sep 20 16:02:41 2021

@author: 86493
"""
import numpy as np
from scipy.optimize import linear_sum_assignment
cost =np.array([[4,1,3],[2,0,5],[3,2,2]])
print(cost,'\n')
row_ind,col_ind=linear_sum_assignment(cost)
print(row_ind)
#开销矩阵对应的行索引
print(col_ind)
#对应行索引的最优指派的列索引
print(cost[row_ind,col_ind])
#提取每个行索引的最优指派列索引所在的元素，形成数组
print(cost[row_ind,col_ind].sum())
#数组求和　　输出：[0 1 2][1 0 2] [1 2 2] 5

从下面的初始矩阵（花费矩阵or利益矩阵），根据上面代码结果的[1 2 2]，即选第一行的1，第二行的2，第三行的2。

[[4,1,3],
[2,0,5],
[3,2,2]]

Reference

（1）网络库：https://networkrepository.com/ca_netscience.php （2）关于社区发现的论文综述：《Community Detection in Networks: A Multidisciplinary Review》Muhammad Aqib Javeda,∗, Muhammad Shahzad Younisa, Siddique Latifa,b, Junaid Qadirb,Adeel Baigc
（3）《Finding community structure in networks using the eigenvectors of matrices》M. E. J. Newman
（4）http://matei.org/ithink/2009/09/12/finding-community-structure-in-networks-using-the-eigenvectors-of-matrices/ （5）社区发现算法的综述推文介绍：https://www.sohu.com/a/477299731_455817
（6）敲黑板！！github上很棒的基于深度学习的社区发现的论文汇总：https://github.com/fanzhenliu/awesome-deep-community-detection，也有用到近几年很火的GCN、GAT算法
（7）社区发现算法-百度百科介绍（8）匈牙利算法流程以及Python程序实现（9）基于匈牙利算法的任务分配问题的python实现