Contents
- Minimum Spanning Tree
- Kruskal
- Prim
Minimum Spanning Tree
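Consider a connected, undirected graph together with a weight function that maps each edge to a real-valued weight. The minimum spanning tree (MST) problem asks for an acyclic subset of the edges that connects all of the nodes and has the smallest possible total weight.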
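Two classic algorithms solve the minimum spanning tree problem: Kruskal's algorithm and Prim's algorithm. Both are greedy. A greedy algorithm faces several possible choices at each step and always takes the one that looks best at that moment. In general, this strategy does not guarantee a globally optimal solution. For the minimum spanning tree problem, however, it can be proved that the greedy strategies used by Kruskal's and Prim's algorithms do yield a spanning tree of minimum weight.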
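The optimality claim can be sanity-checked on a small graph by exhaustive search. The sketch below is not part of the original article: it enumerates every subset of |V| - 1 edges of the nine-node example graph from Introduction to Algorithms (the graph this article uses in its examples), keeps the acyclic ones, and reports the lightest. The names `EDGES`, `is_spanning_tree`, and `brute_force_mst_weight` are illustrative assumptions. It is only feasible for tiny graphs.

```python
from itertools import combinations

# Undirected edges of the example graph, each listed once with its weight.
EDGES = {
    ("a", "b"): 4, ("a", "h"): 8, ("b", "c"): 8, ("b", "h"): 11,
    ("c", "d"): 7, ("c", "f"): 4, ("c", "i"): 2, ("d", "e"): 9,
    ("d", "f"): 14, ("e", "f"): 10, ("f", "g"): 2, ("g", "h"): 1,
    ("g", "i"): 6, ("h", "i"): 7,
}
NODES = sorted({node for edge in EDGES for node in edge})


def is_spanning_tree(edge_subset):
    """Return True if the given |V| - 1 edges contain no cycle.

    An acyclic set of |V| - 1 edges on |V| nodes is necessarily a spanning tree.
    """
    parent = {node: node for node in NODES}

    def find(node):
        while parent[node] != node:
            node = parent[node]
        return node

    for u, v in edge_subset:
        root_u, root_v = find(u), find(v)
        if root_u == root_v:  # u and v are already connected: the edge closes a cycle
            return False
        parent[root_u] = root_v
    return True


def brute_force_mst_weight():
    """Try every subset of |V| - 1 edges and return the lightest spanning tree weight."""
    best = None
    for subset in combinations(EDGES, len(NODES) - 1):
        if is_spanning_tree(subset):
            weight = sum(EDGES[edge] for edge in subset)
            if best is None or weight < best:
                best = weight
    return best


print(brute_force_mst_weight())  # 37
```

The result, 37, is the weight that a correct greedy algorithm should reach on this graph.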
Kruskal
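Given a connected, weighted, undirected graph, Kruskal's algorithm starts by treating every node as a tree of its own, so the nodes together form a forest. It then considers the edges in order of increasing weight. Whenever an edge connects two different trees, it is added to the forest, which merges those two trees into one.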
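The implementation relies on a data structure called a disjoint-set (union-find) structure, which maintains a collection of disjoint sets of elements. In this algorithm, each set represents one tree in the current forest.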
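To make the data structure concrete, here is a minimal standalone sketch of a disjoint-set forest. It is an illustration rather than the code used in this article: the class name `DisjointSet` and its methods are assumptions, and it adds union by rank on top of path compression.

```python
class DisjointSet:
    """Disjoint-set forest with path compression and union by rank."""

    def __init__(self, items):
        self.parent = {item: item for item in items}  # every item starts as its own root
        self.rank = {item: 0 for item in items}       # upper bound on each root's height

    def find(self, item):
        """Return the representative (root) of the set containing item."""
        root = item
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path directly at the root.
        while item != root:
            next_item = self.parent[item]
            self.parent[item] = root
            item = next_item
        return root

    def union(self, a, b):
        """Merge the sets containing a and b; return False if they were already one set."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a == root_b:
            return False
        # Union by rank: attach the shallower tree under the deeper one.
        if self.rank[root_a] < self.rank[root_b]:
            root_a, root_b = root_b, root_a
        self.parent[root_b] = root_a
        if self.rank[root_a] == self.rank[root_b]:
            self.rank[root_a] += 1
        return True


forest = DisjointSet("abcd")
print(forest.union("a", "b"))                # True
print(forest.union("c", "d"))                # True
print(forest.find("a") == forest.find("c"))  # False
forest.union("b", "c")
print(forest.find("a") == forest.find("d"))  # True
print(forest.union("a", "d"))                # False
```

In Kruskal's algorithm, a `union` that returns False corresponds to an edge whose endpoints are already in the same tree, so the edge would close a cycle and is skipped.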
For a connected, weighted, undirected graph stored as adjacency lists, Kruskal's algorithm can be implemented as follows:

    def mst_kruskal(graph, weights):
        # Each undirected edge appears twice in weights; keep one copy of each.
        edges = []
        for edge, weight in weights.items():
            if edge[0] < edge[1]:
                edges.append((edge, weight))
        edges.sort(key=lambda x: x[1])  # consider edges in order of increasing weight

        parents = {node: node for node in graph}  # union-find: each node starts as its own parent

        def find_parent(node):
            # Find the root of the node's tree, compressing the path along the way.
            if node != parents[node]:
                parents[node] = find_parent(parents[node])
            return parents[node]

        minimum_cost = 0
        minimum_spanning_tree = []
        for edge in edges:
            parent_from_node = find_parent(edge[0][0])
            parent_to_node = find_parent(edge[0][1])
            if parent_from_node != parent_to_node:  # the edge joins two different trees
                minimum_cost += edge[1]
                minimum_spanning_tree.append(edge)
                parents[parent_from_node] = parent_to_node  # merge the two trees
        return minimum_spanning_tree, minimum_cost


    if __name__ == "__main__":
        # Figure 23.4 in Introduction to Algorithms
        graph = {
            "a": ["b", "h"],
            "b": ["a", "c", "h"],
            "c": ["b", "d", "f", "i"],
            "d": ["c", "e", "f"],
            "e": ["d", "f"],
            "f": ["c", "d", "e", "g"],
            "g": ["f", "h", "i"],
            "h": ["a", "b", "g", "i"],
            "i": ["c", "g", "h"],
        }
        weights = {
            ("a", "b"): 4, ("a", "h"): 8,
            ("b", "a"): 4, ("b", "c"): 8, ("b", "h"): 11,
            ("c", "b"): 8, ("c", "d"): 7, ("c", "f"): 4, ("c", "i"): 2,
            ("d", "c"): 7, ("d", "e"): 9, ("d", "f"): 14,
            ("e", "d"): 9, ("e", "f"): 10,
            ("f", "c"): 4, ("f", "d"): 14, ("f", "e"): 10, ("f", "g"): 2,
            ("g", "f"): 2, ("g", "h"): 1, ("g", "i"): 6,
            ("h", "a"): 8, ("h", "b"): 11, ("h", "g"): 1, ("h", "i"): 7,
            ("i", "c"): 2, ("i", "g"): 6, ("i", "h"): 7,
        }
        minimum_spanning_tree, minimum_cost = mst_kruskal(graph, weights)
        print(minimum_spanning_tree)
        print(minimum_cost)
        # [(('g', 'h'), 1), (('c', 'i'), 2), (('f', 'g'), 2), (('a', 'b'), 4), (('c', 'f'), 4), (('c', 'd'), 7), (('a', 'h'), 8), (('d', 'e'), 9)]
        # 37

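The running time of Kruskal's algorithm depends on how the disjoint-set data structure is implemented. With a disjoint-set forest (union-find), the total running time is $O(E \lg V)$.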
Prim
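Given a connected, weighted, undirected graph, Prim's algorithm grows a minimum spanning tree from an arbitrary starting node until the tree covers every node in the graph. Unlike Kruskal's algorithm, it maintains a single tree throughout. At each step it picks the lightest edge that joins the current tree to a node outside it (in other words, it picks the node closest to the tree) and adds that edge to the tree. When the algorithm terminates, the selected edges form a minimum spanning tree. This is also a greedy strategy, because each step adds the edge that increases the total weight of the tree by the smallest possible amount.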
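The implementation needs a min-priority queue to pick the next edge quickly. For that purpose, the algorithm keeps, for every node not yet in the tree, the weight of the lightest edge connecting it to any node in the tree.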
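The bookkeeping described above can be seen in isolation in a simplified variant that replaces the priority queue with a linear scan. This is a sketch for illustration only: the function `prim_simple`, the dict-of-dicts graph format, and the four-node `example` graph are assumptions, and the graph is assumed to be connected.

```python
import math


def prim_simple(graph, start):
    """Prim's algorithm with a linear scan instead of a priority queue.

    graph maps each node to a dict {neighbor: weight}.
    """
    best = {node: math.inf for node in graph}  # lightest known edge from the tree to each node
    parent = {node: None for node in graph}    # tree endpoint of that lightest edge
    best[start] = 0
    outside = set(graph)                       # nodes not yet in the tree
    tree_edges = []
    total = 0

    while outside:
        # Pick the node outside the tree that is closest to it.
        node = min(outside, key=best.get)
        outside.remove(node)
        if parent[node] is not None:
            tree_edges.append((parent[node], node))
            total += best[node]
        # The new tree node may offer lighter edges to its neighbors.
        for neighbor, weight in graph[node].items():
            if neighbor in outside and weight < best[neighbor]:
                best[neighbor] = weight
                parent[neighbor] = node
    return tree_edges, total


example = {
    "a": {"b": 1, "c": 4},
    "b": {"a": 1, "c": 2, "d": 5},
    "c": {"a": 4, "b": 2, "d": 3},
    "d": {"b": 5, "c": 3},
}
print(prim_simple(example, "a"))  # ([('a', 'b'), ('b', 'c'), ('c', 'd')], 6)
```

The linear scan costs O(V) per step. A priority queue keyed on the same per-node values does the same selection faster on sparse graphs.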
For a connected, weighted, undirected graph stored as adjacency lists, Prim's algorithm can be implemented as follows:

    class MinHeap:
        def __init__(self, nodes, keys):
            """
            :param nodes: list of node elements
            :param keys: key of each node
            item_pos: index of each node in the heap
            """
            self.heap = nodes
            self.size = len(nodes)
            self.keys = keys
            self.item_pos = {item: i for i, item in enumerate(self.heap)}
            self._heapify()

        def __len__(self):
            return self.size

        def _siftup(self, pos):
            """Sift the element at pos up."""
            old_item = self.heap[pos]
            while pos > 0:
                parent_pos = (pos - 1) >> 1
                parent_item = self.heap[parent_pos]
                if self.keys[old_item] < self.keys[parent_item]:
                    self.heap[pos] = parent_item
                    self.item_pos[parent_item] = pos
                    pos = parent_pos
                else:
                    break
            self.heap[pos] = old_item
            self.item_pos[old_item] = pos

        def _siftdown(self, pos):
            """Sift the element at pos down."""
            old_item = self.heap[pos]
            child_pos = 2 * pos + 1  # left child position
            while child_pos < self.size:
                child_item = self.heap[child_pos]
                right_child_pos = child_pos + 1
                # Check the bound before reading the right child to avoid an IndexError.
                if right_child_pos < self.size and \
                        self.keys[child_item] > self.keys[self.heap[right_child_pos]]:
                    child_pos = right_child_pos
                    child_item = self.heap[child_pos]
                if self.keys[old_item] > self.keys[child_item]:
                    self.heap[pos] = child_item
                    self.item_pos[child_item] = pos
                    pos = child_pos
                    child_pos = 2 * pos + 1  # update the loop condition
                else:
                    break
            self.heap[pos] = old_item
            self.item_pos[old_item] = pos

        def _heapify(self):
            for i in reversed(range(self.size // 2)):
                self._siftdown(i)

        def extract_min(self):
            # Swap the minimum with the last element, then shrink the heap by one.
            old_item = self.heap[0]
            self.heap[0] = self.heap[self.size - 1]
            self.item_pos[self.heap[0]] = 0
            self.heap[self.size - 1] = old_item
            self.item_pos[old_item] = self.size - 1
            self.size -= 1
            self._siftdown(0)
            return old_item

        def decrease_key(self, item):
            # The caller has already lowered keys[item]; restore the heap order.
            pos = self.item_pos[item]
            self._siftup(pos)

        def exist(self, item):
            # Extracted items are parked at positions >= size.
            return self.item_pos[item] < self.size

    def mst_prim(graph, weights, start):
        keys = {}  # key of each node (its minimum distance to the tree)
        predecessors = {}  # parent of each node in the minimum spanning tree
        for node in graph.keys():
            keys[node] = float("inf")
            predecessors[node] = None
        keys[start] = 0
        priority_queue = MinHeap(list(graph.keys()), keys)

        minimum_spanning_tree = []
        minimum_cost = 0
        while len(priority_queue) > 0:
            node = priority_queue.extract_min()
            minimum_spanning_tree.append((node, predecessors[node]))
            edge = (node, predecessors[node])
            if edge in weights:  # false only for the start node, whose predecessor is None
                minimum_cost += weights[edge]
            for adj_node in graph[node]:
                if priority_queue.exist(adj_node) and weights[(node, adj_node)] < keys[adj_node]:
                    predecessors[adj_node] = node
                    keys[adj_node] = weights[(node, adj_node)]
                    priority_queue.decrease_key(adj_node)
        return minimum_spanning_tree, minimum_cost


    if __name__ == "__main__":
        # Figure 23.5 in Introduction to Algorithms
        graph = {
            "a": ["b", "h"],
            "b": ["a", "c", "h"],
            "c": ["b", "d", "f", "i"],
            "d": ["c", "e", "f"],
            "e": ["d", "f"],
            "f": ["c", "d", "e", "g"],
            "g": ["f", "h", "i"],
            "h": ["a", "b", "g", "i"],
            "i": ["c", "g", "h"],
        }
        weights = {
            ("a", "b"): 4, ("a", "h"): 8,
            ("b", "a"): 4, ("b", "c"): 8, ("b", "h"): 11,
            ("c", "b"): 8, ("c", "d"): 7, ("c", "f"): 4, ("c", "i"): 2,
            ("d", "c"): 7, ("d", "e"): 9, ("d", "f"): 14,
            ("e", "d"): 9, ("e", "f"): 10,
            ("f", "c"): 4, ("f", "d"): 14, ("f", "e"): 10, ("f", "g"): 2,
            ("g", "f"): 2, ("g", "h"): 1, ("g", "i"): 6,
            ("h", "a"): 8, ("h", "b"): 11, ("h", "g"): 1, ("h", "i"): 7,
            ("i", "c"): 2, ("i", "g"): 6, ("i", "h"): 7,
        }
        minimum_spanning_tree, minimum_cost = mst_prim(graph, weights, "a")
        print(minimum_spanning_tree)
        print(minimum_cost)
        # [('a', None), ('b', 'a'), ('h', 'a'), ('g', 'h'), ('f', 'g'), ('c', 'f'), ('i', 'c'), ('d', 'c'), ('e', 'd')]
        # 37

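The running time of Prim's algorithm depends on how the min-priority queue is implemented. With a binary min-heap, the algorithm runs in $O(E \lg V)$ time, which is asymptotically the same as Kruskal's algorithm. With a Fibonacci heap, the running time improves to $O(E + V \lg V)$.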
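Since the choice of priority queue drives the running time, it may help to see a variant built on Python's standard `heapq` module. `heapq` has no decrease-key operation, so this sketch swaps in a different technique, lazy deletion: a node may be pushed several times, and stale entries are skipped when popped. The heap can then hold up to O(E) entries, which still gives O(E lg V) overall. The names `prim_heapq` and `EDGES` are illustrative assumptions, and the graph is the same nine-node example from Introduction to Algorithms.

```python
import heapq


def prim_heapq(graph, weights, start):
    """Prim's algorithm using heapq with lazy deletion instead of decrease-key."""
    in_tree = set()
    tree_edges = []
    total = 0
    # Heap entries are (weight, node, predecessor); the start node has no predecessor.
    heap = [(0, start, None)]
    while heap and len(in_tree) < len(graph):
        weight, node, pred = heapq.heappop(heap)
        if node in in_tree:
            continue  # stale entry: the node was already reached by a lighter edge
        in_tree.add(node)
        if pred is not None:
            tree_edges.append((pred, node))
            total += weight
        for adj in graph[node]:
            if adj not in in_tree:
                heapq.heappush(heap, (weights[(node, adj)], adj, node))
    return tree_edges, total


# Undirected edges of the example graph, each listed once.
EDGES = {
    ("a", "b"): 4, ("a", "h"): 8, ("b", "c"): 8, ("b", "h"): 11,
    ("c", "d"): 7, ("c", "f"): 4, ("c", "i"): 2, ("d", "e"): 9,
    ("d", "f"): 14, ("e", "f"): 10, ("f", "g"): 2, ("g", "h"): 1,
    ("g", "i"): 6, ("h", "i"): 7,
}

# Build the adjacency lists and the symmetric weight table used in this article.
graph = {}
weights = {}
for (u, v), w in EDGES.items():
    graph.setdefault(u, []).append(v)
    graph.setdefault(v, []).append(u)
    weights[(u, v)] = weights[(v, u)] = w

tree_edges, total = prim_heapq(graph, weights, "a")
print(total)  # 37
```

The trade-off is extra memory for duplicate heap entries in exchange for not having to track each node's position in the heap.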