哈夫曼树是一种特殊的树,结合前面做书上动态规划题的了解,哈夫曼树就是最优二叉树。
建立一颗哈夫曼树前需要明确条件,比如一颗词典树(节点值为单词),我们希望能通过我们的查找习惯建立一颗更快、更合适的二叉树,那么,这里的条件就是树中每个单词的搜索频率,显然,搜索频率越高的单词越靠近树根,查找效率会更好,通过搜索频率(权值)与节点离根节点的路径距离计算出WPL(带权路径长),当词典树的形态为某种情况的时候(哈夫曼树总是一颗满二叉树 — 除叶节点外,内部节点都是儿孙满堂的),WPL最小,那么这样的一颗二叉树就是最优二叉树,也就是我们想要的树的形态了。
可通过动态规划算法证明,上面描述的二叉树的各个节点是否与最优二叉树的各节点相等。当然书上还有更严谨的算法数学证明。
WPL计算很简单,公式:WPL = ∑ Li × Pi (其中L是路径长度,P是权值)。
建立哈夫曼树很简单:初始化节点数据,维护一个最小优先队列,将节点按权值大小加入到优先队列中,然后将队列中的节点弹出,由下而上建立哈夫曼树。
算法伪python代码:
'''
class node:
int f; //权值
type var; //其他数据类型
node left;
ndoe right;
'''
def build_Huffman_tree(nodes):
"""
nodes是一组node类型的节点
"""
priority_queue<node> que = nodes; //加入到优先队列
while que.size > 1:
left = que.top;
right = que.top;
p = new node; // 请求一个新节点
p.f = left.f + right.f;
que.add = p;
return que.top;
哈夫曼编码是一种变长编码的方式,变长编码一般比定长编码压缩率高,所以这里不考虑定长编码,但定长编码也很简单,自己制定一个编码表,通过查表的方式编码,效率高。解码也是查表即可。
制定哈夫曼编码规则:左路径编码为0,右路径编码为1。这样就可以通过遍历二叉树进行编码了。如图:
图片来自百度图片
解码也很简单,只需要根据制定的规则,再进行树的遍历,然后通过查表即可解码。
完整代码:
#include <iostream>
#include <string.h>
#define MAXSIZE 0xffff
#define QUE_LEFT(i) (2*(i) + 1)
class node {
public:
char var;
size_t freq;
node * left;
node * right;
node() {}
node(char c, size_t f) : var(c), freq(f) {}
node(node * l, node * r) : var(0), freq(l->freq + r->freq), left(l), right(r) {}
virtual ~node() {}
};
class queue : public node{
public:
size_t size_s;
node * priority_queue[MAXSIZE];
queue() : size_s(0) {}
~queue() {
while(!empty())
size_s--;
}
bool empty() const {
if (size_s == 0)
return true;
return false;
}
bool full() const {
if (size_s == MAXSIZE)
return true;
return false;
}
size_t size() const {
return size_s;
}
void insert(node * n);
node * pop();
};
void queue::insert(node * n) {
if (full())
exit(1);
int i = size_s++;
for (; i > 0 && priority_queue[i / 2]->freq >= n->freq; i /= 2)
priority_queue = priority_queue[i / 2];
priority_queue = n;
}
node * queue::pop() {
if (empty())
exit(1);
size_s--;
node * root = priority_queue[0];
int i = 0;
for (int l; QUE_LEFT(i) < (int)size_s; i = l) {
l = QUE_LEFT(i);
if (l + 1 < (int)size_s && priority_queue[l + 1]->freq < priority_queue[l]->freq)
l++;
priority_queue = priority_queue[l];
}
priority_queue = priority_queue[size_s];
return root;
}
class HuffmanTree {
public:
node * build_Huffman_tree(std::string str, int * & freq);
void coding(node * root, char * write, char ** code, int len);
std::string encode(node * root, std::string str, char ** code);
void decode(node * root, std::string codes);
void destory(node * root);
};
node * HuffmanTree::build_Huffman_tree(std::string str, int * & freq) {
queue que;
for (auto v : str)
++freq[(int)v];
for (int i = 0; i < 128 + 1; i++)
if (freq) {
node * n = new node(i, freq);
que.insert(n);
}
while (que.size() > 1) {
node * left = que.pop();
node * right = que.pop();
node * parent = new node(left, right);
que.insert(parent);
}
return que.pop();
}
void HuffmanTree::coding(node * pr, char * write, char ** code, int len) {
static char buf[MAXSIZE >> 1], *out = buf;
if (pr->var) {
write[len] = 0;
strcpy(out, write);
code[(int)pr->var] = out;
out += len + 1;
return;
}
write[len] = '0'; coding(pr->left, write, code, len + 1);
write[len] = '1'; coding(pr->right, write, code, len + 1);
}
std::string HuffmanTree::encode(node * root, std::string str, char ** code) {
char * write = new char;
coding(root, write, code, 0);
delete write;
std::string read = "";
for (auto v : str)
read += code[(int)v];
return read;
}
void HuffmanTree::decode(node * root, std::string codes) {
node * n = root;
int i = 0;
while (codes) {
if (codes[i++] == '0')
n = n->left;
else
n = n->right;
if (n->var) putchar(n->var), n = root;
}
}
void HuffmanTree::destory(node * root) {
if (root) {
destory(root->left);
destory(root->right);
delete root;
}
}
void TravelTree(node * root) {
if (root) {
std::cout << root->freq;
if (root->var)
std::cout << ':' << root->var;
std::cout << std::endl;
TravelTree(root->left);
TravelTree(root->right);
}
}
int main()
{
int freq[128 + 1] = { 0 }, *f = freq;
char *code[MAXSIZE + 1];
node * root;
HuffmanTree tree;
std::string str = "hello world!";
root = tree.build_Huffman_tree(str, f);
str = tree.encode(root, str, code);
for (int i = 0; i < 128 + 1; i++)
if (code) {
std::cout << (char)i << ':' << freq <<
" --- " << code << std::endl;
}
std::cout << "编码结果:" << str << std::endl;
std::cout << "解码:";
tree.decode(root, str);
return 0;
}
运行结果: