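A Huffman tree is a special kind of tree. Going by what I learned from the dynamic programming exercises in the book, a Huffman tree is simply the optimal binary tree.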

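  Before building a Huffman tree, you need to pin down the criterion. Take a dictionary tree (each node holds a word): we want to use our lookup habits to build a faster, better-suited binary tree, so the criterion here is how often each word is searched for. Clearly, the more often a word is searched for, the closer it should sit to the root for lookups to be efficient. From the search frequency (the weight) and the length of the path from the root to the node we compute the WPL (weighted path length). The tree shape that minimizes the WPL is the optimal binary tree, which is the shape we want. In a Huffman tree proper, the weighted items are the leaves and each internal node only carries the sum of its children's weights. A Huffman tree is also always a full binary tree: every internal node has exactly two children.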

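  A dynamic programming algorithm can be used to check whether the nodes of the tree described above match those of the optimal binary tree. The book, of course, also gives a more rigorous mathematical proof.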

  WPL is easy to compute: WPL = ∑ Li × Pi, where Li is the path length from the root to node i and Pi is that node's weight.
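  To make the formula concrete, here is a small sketch that computes the WPL of two tree shapes over the same four weights. The weights (7, 5, 2, 4) and the depth assignments are made-up values for illustration, not taken from the book.

```python
def wpl(leaves):
    """Weighted path length: sum of depth * weight over all leaves."""
    return sum(depth * weight for depth, weight in leaves)

# (depth, weight) pairs; the weights 7, 5, 2, 4 are illustrative only.
balanced = [(2, 7), (2, 5), (2, 2), (2, 4)]  # every leaf at depth 2
skewed = [(1, 7), (2, 5), (3, 2), (3, 4)]    # heavier leaves closer to the root

print(wpl(balanced))  # 36
print(wpl(skewed))    # 35
```

  The second shape, which pulls the heaviest leaf up next to the root, has the smaller WPL.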

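  Building a Huffman tree is straightforward: initialize the nodes and put them all into a min-priority queue keyed on weight. Then repeatedly pop the two lowest-weight nodes, join them under a new parent whose weight is their sum, and push the parent back. The tree grows bottom-up until a single node, the root, is left in the queue.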

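  Python code for the algorithm:

import heapq
from itertools import count

class node:
    def __init__(self, f, var=None, left=None, right=None):
        self.f = f          # weight
        self.var = var      # any other payload; None for internal nodes
        self.left = left
        self.right = right

def build_Huffman_tree(nodes):
    """
    nodes is a non-empty list of node objects.
    """
    tie = count()  # tie-breaker so heapq never has to compare node objects
    que = [(n.f, next(tie), n) for n in nodes]
    heapq.heapify(que)  # min-priority queue keyed on weight
    while len(que) > 1:
        left = heapq.heappop(que)[2]    # lowest weight
        right = heapq.heappop(que)[2]   # second lowest weight
        p = node(left.f + right.f, left=left, right=right)  # new parent node
        heapq.heappush(que, (p.f, next(tie), p))
    return que[0][2]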

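  Huffman coding is a variable-length code. Variable-length codes generally compress better than fixed-length ones, so fixed-length coding is not considered here. Fixed-length coding is simple, though: define a code table yourself and encode by table lookup, which is fast. Decoding is a table lookup as well.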
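  For comparison, a fixed-length code driven by a lookup table might look like the sketch below. The helper names (`make_fixed_table`, `fixed_encode`, `fixed_decode`) are hypothetical, and giving every symbol the narrowest width that fits the alphabet is just one reasonable choice.

```python
def make_fixed_table(alphabet):
    """Give every distinct symbol a binary code of the same width."""
    symbols = sorted(set(alphabet))
    width = max(1, (len(symbols) - 1).bit_length())  # bits needed per symbol
    table = {s: format(i, "0{}b".format(width)) for i, s in enumerate(symbols)}
    return table, width

def fixed_encode(text, table):
    # Encoding is one table lookup per character.
    return "".join(table[ch] for ch in text)

def fixed_decode(bits, table, width):
    # Decoding cuts the bit string into width-sized chunks and looks each up.
    reverse = {code: s for s, code in table.items()}
    return "".join(reverse[bits[i:i + width]] for i in range(0, len(bits), width))

table, width = make_fixed_table("hello world!")
bits = fixed_encode("hello world!", table)
print(width, len(bits))  # 4 48
print(fixed_decode(bits, table, width))  # hello world!
```

  "hello world!" has 9 distinct characters, so each one needs 4 bits and the 12-character string takes 48 bits.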

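  The Huffman coding rule: label every left branch 0 and every right branch 1. A symbol's code is then the sequence of labels on the path from the root to its leaf, so all codes can be produced by traversing the binary tree. As shown in the figure:

               Image from Baidu Images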
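  The rule can be sketched in a few lines of Python. This is a minimal illustration, not the full program: the function name `huffman_codes` is hypothetical, leaves are plain characters, internal nodes are `(left, right)` tuples, and the input is assumed to be non-empty.

```python
import heapq
from collections import Counter
from itertools import count

def huffman_codes(text):
    """Return {symbol: bit string}, with left = '0' and right = '1'."""
    tie = count()  # tie-breaker so tuples never compare tree nodes
    # A tree is either a symbol (leaf) or a (left, right) pair (internal node).
    heap = [(f, next(tie), sym) for sym, f in Counter(text).items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tie), (left, right)))
    root = heap[0][2]

    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):      # internal node: recurse both ways
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:                            # leaf: the path so far is the code
            codes[tree] = prefix or "0"  # a lone symbol still gets one bit
    walk(root, "")
    return codes

codes = huffman_codes("hello world!")
encoded = "".join(codes[ch] for ch in "hello world!")
print(len(encoded))  # 37
```

  The individual codes depend on how ties between equal weights are broken, but the total encoded length does not: for "hello world!" it is 37 bits.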

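  Decoding is just as simple. Following the same rule, walk the tree from the root one bit at a time (0 goes left, 1 goes right). Whenever a leaf is reached, output the character stored there and go back to the root.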
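  A minimal decoding sketch follows. To keep it self-contained it first rebuilds the tree from a code table as nested dicts. The table and the names `build_decode_tree` and `decode` are made up for this example and are not part of the full program.

```python
def build_decode_tree(codes):
    """Rebuild a binary tree (nested dicts) from a {symbol: bit string} table."""
    root = {}
    for sym, bits in codes.items():
        node = root
        for b in bits[:-1]:
            node = node.setdefault(b, {})
        node[bits[-1]] = sym  # the leaf holds the symbol itself
    return root

def decode(bits, root):
    out, node = [], root
    for b in bits:
        node = node[b]             # '0' goes left, '1' goes right
        if isinstance(node, str):  # reached a leaf
            out.append(node)
            node = root            # restart from the root
    return "".join(out)

# Hypothetical prefix-free table, made up for this example.
codes = {"a": "0", "b": "10", "c": "110", "d": "111"}
root = build_decode_tree(codes)
print(decode("0101100111", root))  # abcad
```

  Because no code is a prefix of another, the walk never has to guess where one character ends and the next begins.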

  Full code:

#include <cstdio>    // putchar
#include <cstdlib>   // exit
#include <cstring>   // strcpy
#include <iostream>
#include <string>

#define MAXSIZE 0xffff

#define QUE_LEFT(i) (2*(i) + 1)

class node {
public:
        char var;
        size_t freq;
        node * left;
        node * right;

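        // Always null the child pointers so leaves can be detected and freed safely.
        node() : var(0), freq(0), left(nullptr), right(nullptr) {}
        node(char c, size_t f) : var(c), freq(f), left(nullptr), right(nullptr) {}
        node(node * l, node * r) : var(0), freq(l->freq + r->freq), left(l), right(r) {}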
        virtual ~node() {}
};

class queue {  // min-heap of node pointers; it holds nodes, it is not a node itself
public:
    size_t size_s;
    node * priority_queue[MAXSIZE];

    queue() : size_s(0) {}
    ~queue() {
        while(!empty())
            size_s--;
    }
    bool empty() const {
        if (size_s == 0)
            return true;
        return false;
    }
    bool full() const {
        if (size_s == MAXSIZE)
            return true;
        return false;
    }
    size_t size() const {
        return size_s;
    }
    void insert(node * n);
    node * pop();
};

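void queue::insert(node * n) {
    if (full())
        exit(1);
    int i = size_s++;
    // Sift up: with 0-based slots the parent of i is (i - 1) / 2, matching QUE_LEFT.
    for (; i > 0 && priority_queue[(i - 1) / 2]->freq >= n->freq; i = (i - 1) / 2)
        priority_queue[i] = priority_queue[(i - 1) / 2];
    priority_queue[i] = n;
}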

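node * queue::pop() {
    if (empty())
        exit(1);
    node * root = priority_queue[0];
    node * last = priority_queue[--size_s];  // element to re-insert from the top
    int i = 0;
    // Sift down: move the smaller child up until `last` fits in slot i.
    for (int l; QUE_LEFT(i) < (int)size_s; i = l) {
        l = QUE_LEFT(i);
        if (l + 1 < (int)size_s && priority_queue[l + 1]->freq < priority_queue[l]->freq)
            l++;
        if (last->freq <= priority_queue[l]->freq)
            break;
        priority_queue[i] = priority_queue[l];
    }
    priority_queue[i] = last;

    return root;
}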

class HuffmanTree {
public:
    node * build_Huffman_tree(std::string str, int * & freq);
    void coding(node * root, char * write, char ** code, int len);
    std::string encode(node * root, std::string str, char ** code);
    void decode(node * root, std::string codes);
    void destory(node * root);
};

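node * HuffmanTree::build_Huffman_tree(std::string str, int * & freq) {
    queue que;
    // Count how often each character occurs (assumes 7-bit ASCII input).
    for (auto v : str)
        ++freq[(int)v];

    // One leaf per character that actually appears.
    for (int i = 0; i < 128 + 1; i++)
        if (freq[i]) {
            node * n = new node((char)i, freq[i]);
            que.insert(n);
        }

    // Merge the two lightest nodes until only the root is left.
    while (que.size() > 1) {
        node * left = que.pop();
        node * right = que.pop();
        node * parent = new node(left, right);
        que.insert(parent);
    }
    return que.pop();
}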

void HuffmanTree::coding(node * pr, char * write, char ** code, int len) {
    static char buf[MAXSIZE >> 1], *out = buf;  // storage for all finished codes
    if (pr->var) {  // leaf: internal nodes have var == 0
        write[len] = 0;
        strcpy(out, write);
        code[(int)pr->var] = out;
        out += len + 1;
        return;
    }
    write[len] = '0'; coding(pr->left, write, code, len + 1);
    write[len] = '1'; coding(pr->right, write, code, len + 1);
}

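std::string HuffmanTree::encode(node * root, std::string str, char ** code) {
    // Scratch buffer for the path being built: with at most 128 leaves,
    // no code is longer than 127 bits, plus the terminating '\0'.
    char * write = new char[128 + 1];
    coding(root, write, code, 0);
    delete[] write;

    std::string read = "";
    for (auto v : str)
        read += code[(int)v];
    return read;
}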

void HuffmanTree::decode(node * root, std::string codes) {
    node * n = root;
    size_t i = 0;
    while (i < codes.size()) {
        if (codes[i++] == '0')
            n = n->left;
        else
            n = n->right;

        // Reached a leaf: print its character and restart from the root.
        if (n->var) putchar(n->var), n = root;
    }
}

void HuffmanTree::destory(node * root) {
    if (root) {
        destory(root->left);
        destory(root->right);
        delete root;
    }
}

void TravelTree(node * root) {
    if (root) {
        std::cout << root->freq;
        if (root->var)
            std::cout << ':' << root->var;
        std::cout << std::endl;
        TravelTree(root->left);
        TravelTree(root->right);
    }
}

int main()
{
    int freq[128 + 1] = { 0 }, *f = freq;
    char *code[128 + 1] = { 0 };  // one slot per character; null means "no code"

    node * root;
    HuffmanTree tree;

    std::string str = "hello world!";

    root = tree.build_Huffman_tree(str, f);

    str = tree.encode(root, str, code);
    for (int i = 0; i < 128 + 1; i++)
        if (code[i]) {
            std::cout << (char)i << ':' << freq[i] <<
            " --- " << code[i] << std::endl;
        }

    std::cout << "Encoded: " << str << std::endl;

    std::cout << "Decoded: ";
    tree.decode(root, str);
    std::cout << std::endl;

    tree.destory(root);  // release the tree

    return 0;
}

  Output: