HanLP Word Frequency Statistics: A Word Frequency Program

Reposted

冷月星 2023-08-10 12:51:41


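Software Engineering Assignment: Word Frequency Statistics

Phase 1

Requirements

Output the frequency of each of the 26 letters and each Chinese character in a text file, sorted from highest to lowest, and display each frequency as a percentage with two decimal places.

The command-line arguments are:

wf.exe -c <file name>
Letter frequency = number of occurrences of this letter / (total number of occurrences of all letters A-Z and a-z and of all Chinese characters)

If two tokens have the same frequency, order them lexicographically. For example, if S and T both have a frequency of 10.21%, S must come before T.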
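The frequency formula and the tie-breaking rule above can be illustrated with a minimal sketch. The function name frequency_table is hypothetical, the input is assumed to contain only letters and Chinese characters already, and "lexicographic order" is taken here to mean Python's default code-point order, which places S before T.

```python
from collections import Counter


def frequency_table(text):
    """Return (token, frequency) pairs, highest frequency first, ties in code-point order."""
    counts = Counter(text)
    total = sum(counts.values())
    # Negating the count sorts high-to-low; the token itself breaks ties
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(token, count / total) for token, count in ordered]


for token, freq in frequency_table("SSTTa"):
    print(f"{token}: {freq:.2%}")
# S: 40.00%
# T: 40.00%
# a: 20.00%
```

Under code-point order, uppercase letters sort before lowercase letters, and both sort before Chinese characters.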

If you had to process a long novel (for example, Gone With The Wind), how efficient would your program be? Is there anything that could be optimized?

PSP

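PSP2.1	Personal Software Process Stage	Estimated time (minutes)	Actual time (minutes)
Planning	Planning	15
Estimate	Estimate how much time this task needs	10
Development	Development	180
Analysis	Requirements analysis (including learning new technologies)	90
Design Spec	Write the design document	30
Design Review	Design review	15
Coding standard	Coding standard	10
Design	Detailed design	50
Coding	Coding	60
Code Review	Code review	30
Test	Testing (self-testing, fixing code, submitting tests)	120
Reporting	Reporting	120
Test Report	Test report	30
Size Measurement	Measure the workload	5
Postmortem & Process Improvement Plan	Postmortem and process improvement plan	60
	Total	825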

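Approach

1) An initial reading of the requirements: to count character frequencies, the input text must first be cleaned by removing extraneous symbols, digits, and so on, keeping only Chinese characters and letters. Because the statistic is per individual letter or Chinese character, counting element by element is enough and no word segmentation is needed. Finally, the results are sorted and printed in the required format.

2) Use an object-oriented approach, implemented in Python.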

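Requirements analysis

I. New technologies to learn:

1) Packaging a .py program into an .exe in PyCharm

2) Using pylint for code quality analysis

3) Using profile for performance analysis

4) Markdown syntax

5) Managing a repository with git

II. Functional requirement: character frequency statistics

III. Modeling

1) Static model

Class: WordFrequency

Class name	WordFrequency
Attributes	File name, character table, frequency-count table, total character count (letters and Chinese characters)
Methods	Character frequency statistics

2) Functional model

Use case diagram

[Figure: use case diagram]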

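Design

High-level design

Function module design

1) Initialization: init

2) Character frequency statistics: ele_frequency

3) Formatted output: output

4) Main function: main

Function	Purpose	Parameters	Return value
init	Initialize; store the file name	filename (file name)	-
ele_frequency	Open the file and process it line by line	none	-
output	Print in the required format (percentage with two decimal places)	none	-
cut_count	Split the line, remove extraneous symbols, and count character frequencies	line	-
word_sort	Sort by frequency from high to low	none	-
main	Instantiate the object, call each module, and control the flow	argv	-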

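Detailed design

1) init

Receives and stores the filename argument passed from main, so that the file can later be located and opened by name.

Declares the character table word_list, the frequency-count table ele_num, and the total character count sum.

class WordFrequency():
    # Initialization
    def __init__(self, filename):
        # Character table: maps each character to its count
        self.word_list = {}
        # Frequency-count table after sorting
        self.ele_num = []
        # File name
        self.filename = filename
        # Total number of characters
        self.sum = 0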

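2) ele_frequency

Opens the file, reads its contents, and processes them line by line, keeping only letters and Chinese characters. (Because the Phase 1 requirements do not specify the text encoding and provide no parameter for it, UTF-8 is assumed for now.)

Counts the occurrences of each letter and Chinese character and stores them in word_list; uppercase and lowercase letters are treated as different letters.

Sorts by count from high to low, breaking ties lexicographically, and stores the result in ele_num.

    # Character frequency statistics
    def ele_frequency(self):
        # Open the file, read it all at once, and process it line by line
        with open(self.filename, 'r', encoding='utf8') as txt:
            for line in txt.readlines():
                # Split the line and count character frequencies
                self.cut_count(line)
        # Sort
        self.word_sort()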

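3) output

Computes the total character count sum.

Computes the frequency of each character: frequency = count / sum.

Prints each frequency as a percentage with two decimal places (.2%).

    # Formatted output
    def output(self):
        # Total number of characters (letters + Chinese characters)
        self.sum = sum(self.word_list.values())
        # Print as percentages with two decimal places
        for i in self.ele_num:
            ch, num = i
            # Convert the count to a frequency
            num = num / self.sum
            print("{:<3}:{:.2%}".format(ch, num))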
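To make the format specification concrete, here is a small sketch of what "{:<3}:{:.2%}" produces. The helper name format_entry is hypothetical and the sample values are made up for illustration.

```python
def format_entry(ch, frequency):
    """Left-align the character in a 3-wide field, then show the frequency as a percentage."""
    # {:<3}  pads the character with spaces on the right to a width of 3
    # {:.2%} multiplies by 100, keeps two decimal places, and appends a percent sign
    return "{:<3}:{:.2%}".format(ch, frequency)


print(format_entry("S", 0.1021))  # S  :10.21%
print(format_entry("中", 1 / 3))   # 中  :33.33%
```

The field width counts characters rather than display columns, so rows with Chinese characters may not line up exactly with rows of letters in a terminal.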

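4) cut_count

Strips extraneous symbols from the incoming line, keeping only Chinese characters and letters, then walks through the remaining characters one by one and counts them.

# Split the line and count character frequencies (requires "import re" at the top of the module)
def cut_count(self, line):
    # Keep only letters and Chinese characters
    line = re.sub("[^a-zA-Z\u4e00-\u9fa5]", '', line)
    # To treat uppercase and lowercase as the same letter, convert to lowercase first
    # line = line.lower()
    for ch in line:
        self.word_list[ch] = self.word_list.get(ch, 0) + 1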
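As an illustration of what the filtering step keeps and drops, the following sketch applies the same character class to a few sample strings. The names NOT_LETTER_OR_HANZI and keep_letters_and_hanzi are hypothetical. Note that the range U+4E00 to U+9FA5 covers the commonly used CJK ideographs but not the CJK extension blocks, so rare characters outside it would be dropped.

```python
import re

# Anything that is not an ASCII letter or a character in U+4E00..U+9FA5 is removed
NOT_LETTER_OR_HANZI = re.compile(r"[^a-zA-Z\u4e00-\u9fa5]")


def keep_letters_and_hanzi(line):
    """Return the line with digits, punctuation, and whitespace stripped out."""
    return NOT_LETTER_OR_HANZI.sub("", line)


print(keep_letters_and_hanzi("She is very cute."))  # Sheisverycute
print(keep_letters_and_hanzi("我的圣诞,oirte5节"))    # 我的圣诞oirte节
```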

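5) word_sort

A two-level sort: by count from high to low, with ties broken lexicographically. Because Python's sorted is stable, the code sorts by the secondary key (the character) first and then by the primary key (the count).

# Sort
def word_sort(self):
    # First sort lexicographically by character (the tie-breaker)
    self.ele_num = sorted(self.word_list.items(), key=lambda x: x[0])
    # Then sort by count from high to low; the stable sort keeps ties in lexicographic order
    self.ele_num = sorted(self.ele_num, key=lambda y: y[1], reverse=True)
    self.output()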
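The two-pass sort above can also be written as a single pass with a tuple key. The sketch below compares the two; the function names and the sample counts are hypothetical, and this is an alternative formulation rather than the code used in the post.

```python
def two_pass_sort(counts):
    """Sort by character first, then stably by count, as word_sort does."""
    by_char = sorted(counts.items(), key=lambda kv: kv[0])
    return sorted(by_char, key=lambda kv: kv[1], reverse=True)


def one_pass_sort(counts):
    """Same ordering in one call: negative count first, character as tie-breaker."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


sample = {"T": 3, "a": 5, "中": 1, "S": 3}
print(one_pass_sort(sample))  # [('a', 5), ('S', 3), ('T', 3), ('中', 1)]
```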

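6) Main function main

Receives the file name argument from the command line.

Instantiates a WordFrequency object wf.

Calls the character frequency method.

def main():
    # Default file, used when the arguments are not as expected
    fn = "test.txt"
    # The arguments are correct when there are exactly 3 of them (wf.exe -c <file name>)
    if len(sys.argv) == 3:
        fn = sys.argv[2]
    else:
        print("Please enter the name of the file to process correctly")
    wf = WordFrequency(fn)
    # Call the character frequency method
    wf.ele_frequency()
    # input("Enter any character to finish: ")
    # Test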
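The main function above reads sys.argv[2] without checking that the second argument is actually -c. As one possible alternative, the documented call wf.exe -c <file name> could be parsed with argparse. This is a sketch, not the post's implementation; build_parser and the dest name filename are assumptions.

```python
import argparse


def build_parser():
    """Build a parser for the documented call: wf.exe -c <file name>."""
    parser = argparse.ArgumentParser(prog="wf", description="Character frequency statistics")
    # -c takes the file to analyze; required=True makes argparse exit with a usage message if it is missing
    parser.add_argument("-c", dest="filename", metavar="FILE", required=True,
                        help="text file to analyze")
    return parser


args = build_parser().parse_args(["-c", "HarryPotter.txt"])
print(args.filename)  # HarryPotter.txt
```

In a real entry point, parse_args() would be called without arguments so that it reads sys.argv.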

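Testing

1) English text: the complete Harry Potter series, books 1-7 (HarryPotter.txt), 78451 words in total, 448 KB

1. The output is as follows (partial):

[Figure: partial program output for HarryPotter.txt]

2. Write the results as counts to the text file test_result.txt and compare them with the results from an online tool:

Online tool results:

[Figure: online tool results for HarryPotter.txt]

Test results:

[Figure: contents of test_result.txt for HarryPotter.txt]

The comparison shows that the output is correct.

2) Chinese text: the People's Daily corpus (rmrb.txt), 1822596 characters in total, 7548 KB

1. The output is as follows (partial):

[Figure: partial program output for rmrb.txt]

2. Write the results as counts to the text file test_result.txt and compare them with the results from an online tool:

Online tool results:

[Figure: online tool results for rmrb.txt]

Test results:

[Figure: contents of test_result.txt for rmrb.txt]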

3) Unit tests

The following are unit tests for the cut_count method, 11 test cases in total.

import unittest
from wf import WordFrequency


class MyTestCase(unittest.TestCase):
    # Runs before every test; the method name is fixed by unittest
    def setUp(self):
        # cut_count never opens the file, so any file name will do here
        self.testwf = WordFrequency("test.txt")

    def test_something0(self):
        self.testwf.cut_count("")
        for i in self.testwf.word_list:
            print(i, self.testwf.word_list[i])

    def test_something1(self):
        self.testwf.cut_count("我")
        for i in self.testwf.word_list:
            print(i, self.testwf.word_list[i])

    # Renamed from a second test_something0, which would have replaced the first one
    def test_something10(self):
        self.testwf.cut_count("s")
        for i in self.testwf.word_list:
            print(i, self.testwf.word_list[i])

    def test_something2(self):
        self.testwf.cut_count("S")
        for i in self.testwf.word_list:
            print(i, self.testwf.word_list[i])

    def test_something3(self):
        self.testwf.cut_count("，。、 \n  -=")
        for i in self.testwf.word_list:
            print(i, self.testwf.word_list[i])

    def test_something4(self):
        self.testwf.cut_count("我的圣诞,oirte5节妇\n女24324日。、。、121easfsefs")
        for i in self.testwf.word_list:
            print(i, self.testwf.word_list[i])

    def test_something5(self):
        self.testwf.cut_count("123243534132423")
        for i in self.testwf.word_list:
            print(i, self.testwf.word_list[i])

    def test_something6(self):
        self.testwf.cut_count("AWSFDssfdg")
        for i in self.testwf.word_list:
            print(i, self.testwf.word_list[i])

    def test_something7(self):
        self.testwf.cut_count("ss我喜欢哈哈哈哈哈哈哈\n")
        for i in self.testwf.word_list:
            print(i, self.testwf.word_list[i])

    def test_something8(self):
        self.testwf.cut_count("我今天吃了三碗饭——早上吃了一碗，中午吃了一碗，晚上又吃了一碗。")
        for i in self.testwf.word_list:
            print(i, self.testwf.word_list[i])

    def test_something9(self):
        self.testwf.cut_count("She is very cute.")
        for i in self.testwf.word_list:
            print(i, self.testwf.word_list[i])


if __name__ == '__main__':
    unittest.main()

Verification shows that all test results are correct. (The results are omitted here for reasons of space; see the file unit_test_result.txt.)
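The tests above print the counts for manual inspection. As a sketch of how the same checks could verify themselves, the example below uses assertEqual. To keep it self-contained it defines a stand-in WordFrequency with the cut_count logic from the design section; the test names and the run_tests helper are hypothetical.

```python
import re
import unittest


class WordFrequency:
    """Stand-in with the cut_count logic from the design section, so the sketch runs on its own."""

    def __init__(self, filename):
        self.filename = filename
        self.word_list = {}

    def cut_count(self, line):
        line = re.sub("[^a-zA-Z\u4e00-\u9fa5]", "", line)
        for ch in line:
            self.word_list[ch] = self.word_list.get(ch, 0) + 1


class CutCountTest(unittest.TestCase):
    def setUp(self):
        # cut_count does not open the file, so the name is never used
        self.wf = WordFrequency("unused.txt")

    def test_empty_line_counts_nothing(self):
        self.wf.cut_count("")
        self.assertEqual(self.wf.word_list, {})

    def test_symbols_and_digits_are_dropped(self):
        self.wf.cut_count("，。、 \n  -=123")
        self.assertEqual(self.wf.word_list, {})

    def test_case_is_preserved(self):
        self.wf.cut_count("sS")
        self.assertEqual(self.wf.word_list, {"s": 1, "S": 1})

    def test_mixed_text(self):
        self.wf.cut_count("ss我喜欢哈哈哈哈哈哈哈\n")
        self.assertEqual(self.wf.word_list, {"s": 2, "我": 1, "喜": 1, "欢": 1, "哈": 7})


def run_tests():
    """Run the test case programmatically and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(CutCountTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

With assertions in place, a failing case is reported by the test runner instead of having to be spotted in printed output.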

Code quality analysis

1) Analyzing wf.py with pylint

The source code of wf.py is as follows:

# This is a sample Python script.

# Press Shift+F10 to execute it or replace it with your code.
# Press Double Shift to search everywhere for classes, files, tool windows, actions, and settings.

import time
import sys
import re
def main():
    filename = sys.argv[2]
    wf = WordFrequency(filename)
    wf.ele_frequency()
class WordFrequency():
    def __init__(self,filename):
        # Character table
        self.word_list = {}
        # Frequency table
        self.ele_num = []
        # File name
        self.filename=filename
        self.sum=0

    # Formatted output
    def output(self):
        self.sum = sum(self.word_list.values())
        print("Character frequency results:\n")
        for i in self.ele_num:
            ch, num = i
            num=num/self.sum

            print("{:}:{:.2%}".format(ch, num))


    # Character frequency statistics
    def ele_frequency(self):
        # Open the file, read it all at once, and process it line by line
        with open(self.filename,'r',encoding='utf8') as txt:
         for line in txt.readlines():
             # Keep only letters and Chinese characters
             line = re.sub("[^a-zA-Z\u4e00-\u9fa5]", '', line)
            # To treat upper and lower case as one letter, lowercase the line first
            # line=line.lower()
             for ch in line:
                  self.word_list[ch] = self.word_list.get(ch, 0) + 1
        # Sort
        self.ele_num = sorted(sorted(self.word_list.items(), key=lambda x: x[0]), key=lambda y: y[1], reverse=True)
        #print("time:",etime-stime)
        self.output()

    # Write the test results to a text file
    def test_result(self):
        with open("test_result.txt", "w") as f:

                for i in self.ele_num:
                    ch, num = i
                    f.write("{:<3}:{:}\n".format(ch, num))  # with closes the file, no f.close()


if __name__ == '__main__':
    # Receive the command-line argument
    filename = sys.argv[2]
    # Instantiate
    wf=WordFrequency(filename)
    # Call the character frequency method
    wf.ele_frequency()
    input("Enter any character to finish: ")
    # Test
    #wf.test_result()

The analysis results are as follows:

*********** Module wf
E:\anaconda\envs\python39\week1\word-frequency\wf.py:38: [W0311(bad-indentation), ] Bad indentation. Found 9 spaces, expected 12
E:\anaconda\envs\python39\week1\word-frequency\wf.py:40: [W0311(bad-indentation), ] Bad indentation. Found 13 spaces, expected 16
E:\anaconda\envs\python39\week1\word-frequency\wf.py:43: [W0311(bad-indentation), ] Bad indentation. Found 13 spaces, expected 16
E:\anaconda\envs\python39\week1\word-frequency\wf.py:44: [W0311(bad-indentation), ] Bad indentation. Found 18 spaces, expected 20
E:\anaconda\envs\python39\week1\word-frequency\wf.py:46: [C0301(line-too-long), ] Line too long (115/100)
E:\anaconda\envs\python39\week1\word-frequency\wf.py:54: [W0311(bad-indentation), ] Bad indentation. Found 16 spaces, expected 12
E:\anaconda\envs\python39\week1\word-frequency\wf.py:55: [W0311(bad-indentation), ] Bad indentation. Found 20 spaces, expected 16
E:\anaconda\envs\python39\week1\word-frequency\wf.py:56: [W0311(bad-indentation), ] Bad indentation. Found 20 spaces, expected 16
E:\anaconda\envs\python39\week1\word-frequency\wf.py:71: [C0305(trailing-newlines), ] Trailing newlines
E:\anaconda\envs\python39\week1\word-frequency\wf.py:1: [C0114(missing-module-docstring), ] Missing module docstring
E:\anaconda\envs\python39\week1\word-frequency\wf.py:9: [C0116(missing-function-docstring), main] Missing function or method docstring
E:\anaconda\envs\python39\week1\word-frequency\wf.py:10: [W0621(redefined-outer-name), main] Redefining name 'filename' from outer scope (line 61)
E:\anaconda\envs\python39\week1\word-frequency\wf.py:11: [W0621(redefined-outer-name), main] Redefining name 'wf' from outer scope (line 63)
E:\anaconda\envs\python39\week1\word-frequency\wf.py:11: [C0103(invalid-name), main] Variable name "wf" doesn't conform to snake_case naming style
E:\anaconda\envs\python39\week1\word-frequency\wf.py:13: [C0115(missing-class-docstring), WordFrequency] Missing class docstring
E:\anaconda\envs\python39\week1\word-frequency\wf.py:14: [W0621(redefined-outer-name), WordFrequency.__init__] Redefining name 'filename' from outer scope (line 61)
E:\anaconda\envs\python39\week1\word-frequency\wf.py:24: [C0116(missing-function-docstring), WordFrequency.output] Missing function or method docstring
E:\anaconda\envs\python39\week1\word-frequency\wf.py:28: [C0103(invalid-name), WordFrequency.output] Variable name "ch" doesn't conform to snake_case naming style
E:\anaconda\envs\python39\week1\word-frequency\wf.py:35: [C0116(missing-function-docstring), WordFrequency.ele_frequency] Missing function or method docstring
E:\anaconda\envs\python39\week1\word-frequency\wf.py:43: [C0103(invalid-name), WordFrequency.ele_frequency] Variable name "ch" doesn't conform to snake_case naming style
E:\anaconda\envs\python39\week1\word-frequency\wf.py:51: [C0116(missing-function-docstring), WordFrequency.test_result] Missing function or method docstring
E:\anaconda\envs\python39\week1\word-frequency\wf.py:52: [C0103(invalid-name), WordFrequency.test_result] Variable name "f" doesn't conform to snake_case naming style
E:\anaconda\envs\python39\week1\word-frequency\wf.py:55: [C0103(invalid-name), WordFrequency.test_result] Variable name "ch" doesn't conform to snake_case naming style
E:\anaconda\envs\python39\week1\word-frequency\wf.py:6: [W0611(unused-import), ] Unused import time

------------------------------------------------------------------

Your code has been rated at 3.68/10 (previous run: 3.68/10, +0.00)

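The analysis shows that the code scores only 3.68. The main problems are:

1. Convention (C): violations of the coding style standard, mainly missing docstrings, an overly long line, and variable names flagged by the naming check

2. Warning (W): non-standard indentation, names inside functions that redefine names from the outer scope, and an unused import

2) Modifications

1. Move the sorting into a separate function, word_sort()

2. Remove the def main() function, which was redundant

3. Rename the duplicated variable names

The modified code is as follows: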

# Software engineering project, phase 1: wf.py
import sys
import re
# Class definition
class WordFrequency():
    def __init__(self,filename):
        # Character table
        self.word_list = {}
        # Frequency table
        self.ele_num = []
        # File name
        self.filename=filename
        self.sum=0

    # Formatted output
    def output(self):
        self.sum = sum(self.word_list.values())
        print("Character frequency results:\n")
        for i in self.ele_num:
            ch, num = i
            num=num/self.sum

            print("{:}:{:.2%}".format(ch, num))


    # Character frequency statistics
    def ele_frequency(self):
        # Open the file, read it all at once, and process it line by line
        with open(self.filename,'r',encoding='utf8') as txt:
            for line in txt.readlines():
                # Keep only letters and Chinese characters
                line = re.sub("[^a-zA-Z\u4e00-\u9fa5]", '', line)
                # To treat upper and lower case as one letter, lowercase the line first
                # line=line.lower()
                for ch in line:
                    self.word_list[ch] = self.word_list.get(ch, 0) + 1
        self.word_sort()
    # Sort
    def word_sort(self):
        self.ele_num = sorted(self.word_list.items(), key=lambda x: x[0])
        self.ele_num = sorted(self.ele_num, key=lambda y: y[1], reverse=True)
        self.output()

    # Write the test results to a text file
    def test_result(self):
        with open("test_result.txt", "w") as f:
            for i in self.ele_num:
                ch, num = i
                f.write("{:<3}:{:}\n".format(ch, num))  # with closes the file, no f.close()


if __name__ == '__main__':
    # Receive the command-line argument and instantiate
    wf=WordFrequency(sys.argv[2])
    # Call the character frequency method
    wf.ele_frequency()
    input("Enter any character to finish: ")
    # Test
    #wf.test_result()

Running pylint again:

************* Module wf
E:\anaconda\envs\python39\week1\word-frequency\wf.py:1: [C0114(missing-module-docstring), ] Missing module docstring
E:\anaconda\envs\python39\week1\word-frequency\wf.py:5: [C0115(missing-class-docstring), WordFrequency] Missing class docstring
E:\anaconda\envs\python39\week1\word-frequency\wf.py:16: [C0116(missing-function-docstring), WordFrequency.output] Missing function or method docstring
E:\anaconda\envs\python39\week1\word-frequency\wf.py:20: [C0103(invalid-name), WordFrequency.output] Variable name "ch" doesn't conform to snake_case naming style
E:\anaconda\envs\python39\week1\word-frequency\wf.py:27: [C0116(missing-function-docstring), WordFrequency.ele_frequency] Missing function or method docstring
E:\anaconda\envs\python39\week1\word-frequency\wf.py:35: [C0103(invalid-name), WordFrequency.ele_frequency] Variable name "ch" doesn't conform to snake_case naming style
E:\anaconda\envs\python39\week1\word-frequency\wf.py:38: [C0116(missing-function-docstring), WordFrequency.word_sort] Missing function or method docstring
E:\anaconda\envs\python39\week1\word-frequency\wf.py:44: [C0116(missing-function-docstring), WordFrequency.test_result] Missing function or method docstring
E:\anaconda\envs\python39\week1\word-frequency\wf.py:45: [C0103(invalid-name), WordFrequency.test_result] Variable name "f" doesn't conform to snake_case naming style
E:\anaconda\envs\python39\week1\word-frequency\wf.py:47: [C0103(invalid-name), WordFrequency.test_result] Variable name "ch" doesn't conform to snake_case naming style

------------------------------------------------------------------
Your code has been rated at 7.14/10 (previous run: 7.14/10, +0.00)

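The results show that the changes eliminated all warnings, although convention issues remain; the overall score rose to 7.14.

Visualizing the performance analysis is still being explored and will be covered in a later update.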

Code performance analysis

1. Processing HarryPotter.txt

Analysis results (partial):

Wed Jan 19 10:55:22 2022    tem_result.txt

   347659 function calls (347658 primitive calls) in 0.906 seconds

   Random listing order was used

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        0    0.000             0.000          profile:0(profiler)
        1    0.000    0.000    0.906    0.906 profile:0(wf.ele_frequency())
        1    0.016    0.016    0.906    0.906 :0(exec)
        1    0.000    0.000    0.891    0.891 <string>:1(<module>)
        1    0.453    0.453    0.891    0.891 E:\anaconda\envs\python39\week1\word-frequency\wf.py:27(ele_frequency)
       E:\anaconda\lib\encodings\__init__.py:70(search_function)
   335009    0.375    0.000    0.375    0.000 :0(get)
        1    0.000    0.000    0.000    0.000 E:\anaconda\lib\encodings\__init__.py:43(normalize_encoding)
     3042    0.000    0.000    0.000    0.000 :0(isinstance)
        4    0.000    0.000    0.000    0.000 :0(isalnum)
       31    0.000    0.000    0.000    0.000 :0(append)
        1    0.000    0.000    0.000    0.000 :0(readlines)
       57    0.000    0.000    0.000    0.000 E:\anaconda\lib\codecs.py:319(decode)
       57    0.000    0.000    0.000    0.000 :0(utf_8_decode)
     3033    0.031    0.000    0.062    0.000 E:\anaconda\lib\re.py:203(sub)
     3033    0.016    0.000    0.031    0.000 E:\anaconda\lib\re.py:289(_compile)
        2    0.000    0.000    0.000    0.000 E:\anaconda\lib\sre_parse.py:435(_parse_sub)
        2    0.000    0.000    0.000    0.000 E:\anaconda\lib\sre_parse.py:286(tell)
    29/28    0.000    0.000    0.000    0.000 :0(len)     
        1    0.000    0.000    0.000    0.000 E:\anaconda\lib\sre_compile.py:492(_get_charset_prefix)
        2    0.016    0.008    0.016    0.008 E:\anaconda\lib\sre_compile.py:276(_optimize_charset)
       10    0.000    0.000    0.000    0.000 :0(find)
        2    0.000    0.000    0.000    0.000 E:\anaconda\lib\sre_compile.py:411(_mk_bitmap)
        2    0.000    0.000    0.000    0.000 :0(translate)
        2    0.000    0.000    0.000    0.000 E:\anaconda\lib\sre_compile.py:413(<listcomp>)
        2    0.000    0.000    0.000    0.000 
     3033    0.000    0.000    0.000    0.000 :0(sub)
        1    0.000    0.000    0.000    0.000 E:\anaconda\envs\python39\week1\word-frequency\wf.py:39(word_sort)
        2    0.000    0.000    0.000    0.000 :0(sorted)
       52    0.000    0.000    0.000    0.000 E:\anaconda\envs\python39\week1\word-frequency\wf.py:40(<lambda>)
       52    0.000    0.000    0.000    0.000 E:\anaconda\envs\python39\week1\word-frequency\wf.py:41(<lambda>)

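This version makes 347659 function calls in 0.906 s. For each function, the listing also shows the number of calls (ncalls), the time spent in the function itself (tottime), the time per call (percall), the cumulative time including sub-calls (cumtime), and the cumulative time per primitive call (the second percall). Most of the time is spent in ele_frequency: 0.453 s in its own body, which is the per-character loop, and 0.375 s in the 335009 calls to dict.get. Removing unwanted characters with re.sub takes 0.062 s cumulatively, while readlines and sorted both register as 0.000 s.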
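A report of this kind can be produced programmatically. The sketch below uses cProfile, the C implementation that offers the same interface as the pure-Python profile module seen in the listing above, and usually adds less overhead per call. The count_chars workload and the profile_report helper are hypothetical stand-ins, not the code that produced the listing.

```python
import cProfile
import io
import pstats


def count_chars(text):
    """Toy workload: count every character with dict.get, like the loop in ele_frequency."""
    counts = {}
    for ch in text:
        counts[ch] = counts.get(ch, 0) + 1
    return counts


def profile_report(func, *args, top=5):
    """Run func(*args) under cProfile and return the formatted statistics as a string."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time and keep only the top entries
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


print(profile_report(count_chars, "Gone With The Wind" * 1000))
```

Absolute timings depend on the machine and on profiler overhead, so they are best compared only between runs made with the same profiler.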

2. Processing the People's Daily corpus rmrb.txt

Analysis results (partial):

Wed Jan 19 11:07:25 2022    tem_result.txt

   1686343 function calls (1686342 primitive calls) in 4.219 seconds

   Random listing order was used

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        0    0.000             0.000          profile:0(profiler)
        1    0.000    0.000    4.219    4.219 profile:0(wf.ele_frequency())
        1    0.000    0.000    4.219    4.219 :0(exec)
        1    0.000    0.000    4.219    4.219 <string>:1(<module>)
        1    1.984    1.984    4.219    4.219 E:\anaconda\envs\python39\week1\word-frequency\wf.py:27(ele_frequency)
E:\anaconda\lib\encodings\__init__.py:70(search_function)
  1589737    1.594    0.000    1.594    0.000 :0(get)
        1    0.000    0.000    0.000    0.000       E:\anaconda\lib\encodings\__init__.py:43(normalize_encoding)
    19065    0.016    0.000    0.016    0.000 :0(isinstance)
        4    0.000    0.000    0.000    0.000 :0(isalnum)
       31    0.000    0.000    0.000    0.000 :0(append)
E:\anaconda\lib\encodings\utf_8.py:33(getregentry)
      945    0.000    0.000    0.016    0.000 E:\anaconda\lib\codecs.py:319(decode)
      945    0.016    0.000    0.016    0.000 :0(utf_8_decode)
    19056    0.109    0.000    0.531    0.000 E:\anaconda\lib\re.py:203(sub)
    19056    0.078    0.000    0.094    0.000 E:\anaconda\lib\re.py:289(_compile)
        2    0.000    0.000    0.000    0.000 E:\anaconda\lib\sre_parse.py:435(_parse_sub)
    29/28    0.000    0.000    0.000    0.000 :0(len)   
E:\anaconda\lib\sre_compile.py:249(_compile_charset)
     19056    0.328    0.000    0.328    0.000 :0(sub)
     4574    0.000    0.000    0.000    0.000 E:\anaconda\envs\python39\week1\word-frequency\wf.py:40(<lambda>)
     4574    0.000    0.000    0.000    0.000 E:\anaconda\envs\python39\week1\word-frequency\wf.py:41(<lambda>)
     4575    0.016    0.000    0.016    0.000 :0(print)
     4574    0.000    0.000    0.000    0.000 :0(format)

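This run makes 1686343 function calls in 4.219 s. As with the English text, most of the time goes to the per-character loop in ele_frequency (1.984 s in its own body) and to the 1589737 calls to dict.get (1.594 s). The calls to re.sub account for 0.531 s cumulatively, and the sort key functions register as 0.000 s.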
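Since the profile points at the per-character Python loop and dict.get, one possible optimization is to let collections.Counter do the counting and to compile the pattern once. The sketch below is an assumption about how that could look, not a measured improvement; count_file and KEEP are hypothetical names, and any speedup should be confirmed by profiling on the actual files.

```python
import re
from collections import Counter

# Compiled once; matches exactly the characters the post keeps
KEEP = re.compile(r"[a-zA-Z\u4e00-\u9fa5]")


def count_file(path, encoding="utf8"):
    """Count letters and Chinese characters in a file, reading it lazily line by line."""
    counts = Counter()
    with open(path, "r", encoding=encoding) as txt:
        # Iterating over the file object avoids loading every line at once with readlines()
        for line in txt:
            # findall returns the kept characters; Counter.update tallies them in one call
            counts.update(KEEP.findall(line))
    return counts


def sort_counts(counts):
    """Order by count from high to low, breaking ties by code point."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
```

A call such as sort_counts(count_file("rmrb.txt")) would then yield the same kind of list that ele_num holds in the post.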

PSP

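PSP2.1	Personal Software Process Stage	Estimated time (minutes)	Actual time (minutes)
Planning	Planning	15	20
Estimate	Estimate how much time this task needs	10	10
Development	Development	180	180
Analysis	Requirements analysis (including learning new technologies)	90	100
Design Spec	Write the design document	30	20
Design Review	Design review	15	30
Coding standard	Coding standard	10	20
Design	Detailed design	50	30
Coding	Coding	60	60
Code Review	Code review	30	45
Test	Testing (self-testing, fixing code, submitting tests)	120	70
Reporting	Reporting	120	100
Test Report	Test report	30	50
Size Measurement	Measure the workload	5	10
Postmortem & Process Improvement Plan	Postmortem and process improvement plan	60	30
	Total	825	775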