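Contents
- I. Problem
- II. Approach
- III. Implementation
- IV. Output
- V. Problems and Solutions
  - 1. Installing packages in PyCharm fails
  - 2. Chinese text is garbled
  - 3. Showing all curves on the same axes
- VI. Evaluation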
I. Problem
- Draw the ROC curve for each row of labels in the following table.
| Sample | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| Labels, case 1 | + | + | + | - | + | - | - | - | - |
| Labels, case 2 | - | + | - | - | + | - | + | + | - |
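II. Approach
- Take one row of true labels as input (1 = positive, 0 = negative).
- Set the threshold: for i from 1 to n - 1, predict the first i samples as positive and the remaining n - i as negative. This treats the order of the samples in the table as the classifier's ranking, with sample 1 scored highest.
- For each cut, count TP, TN, FN and FP, then compute TPR = TP / (TP + FN) and FPR = FP / (FP + TN).
- Plot the ROC curve with the FPR of each cut on the x axis and the TPR on the y axis. The curve always starts at (0, 0) (nothing predicted positive) and ends at (1, 1) (everything predicted positive).
- Compute the AUC, the area under the ROC curve, with the trapezoidal rule.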
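Written out, the quantities in these steps are as follows, with the ROC points (FPR_k, TPR_k), k = 0, 1, ..., m, taken in cut order from (FPR_0, TPR_0) = (0, 0) to (FPR_m, TPR_m) = (1, 1). The last line works the AUC through for the first label row as an illustration: only the horizontal steps of the staircase-shaped curve contribute area.

```latex
\mathrm{TPR} = \frac{TP}{TP + FN}, \qquad \mathrm{FPR} = \frac{FP}{FP + TN}

\mathrm{AUC} = \frac{1}{2} \sum_{k=0}^{m-1} \left(\mathrm{FPR}_{k+1} - \mathrm{FPR}_{k}\right)\left(\mathrm{TPR}_{k} + \mathrm{TPR}_{k+1}\right)

\mathrm{AUC}_{1} = \tfrac{1}{2}(0.2)(0.75 + 0.75) + 4 \cdot \tfrac{1}{2}(0.2)(1 + 1) = 0.15 + 0.8 = 0.95
```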
III. Implementation
# coding=utf-8
import matplotlib.pyplot as plt
import matplotlib as mpl
import numpy


class TestData:
    def __init__(self, x):
        self.x = x  # true labels (1 = positive, 0 = negative), in ranked order
        # One ROC point per cut, plus the end points (0, 0) and (1, 1)
        self.TPR = numpy.zeros(len(self.x) + 1)
        self.FPR = numpy.zeros(len(self.x) + 1)

    def roc(self):
        n = len(self.x)
        y = numpy.zeros(n)  # predicted labels
        tp_counts = [0.0]   # TP at each ROC point, starting from (0, 0)
        fp_counts = [0.0]   # FP at each ROC point
        for i in range(n - 1):
            # Threshold: the first i + 1 samples are predicted positive, the rest negative
            for j in range(n):
                y[j] = 1 if j <= i else 0
            print('x =', self.x)
            print('y =', y)
            tp = fn = fp = tn = 0.0
            for k in range(n):
                if self.x[k] == 1:
                    if y[k] == 1:
                        tp += 1
                    else:
                        fn += 1
                else:
                    if y[k] == 1:
                        fp += 1
                    else:
                        tn += 1
            self.TPR[i + 1] = tp / (tp + fn)  # TPR = TP / (TP + FN)
            self.FPR[i + 1] = fp / (fp + tn)  # FPR = FP / (FP + TN)
            tp_counts.append(tp)
            fp_counts.append(fp)
            print('TP =', tp, ' FN =', fn, ' TPR =', self.TPR[i + 1])
            print('FP =', fp, ' TN =', tn, ' FPR =', self.FPR[i + 1])
        # End point (1, 1): every sample predicted positive
        self.TPR[n] = 1.0
        self.FPR[n] = 1.0
        tp_counts.append(float(sum(self.x)))
        fp_counts.append(float(n - sum(self.x)))
        print('TPR =', self.TPR)
        print('FPR =', self.FPR)
        # AUC by the trapezoidal rule with FPR on the x axis:
        # sum of (FPR[k+1] - FPR[k]) * (TPR[k] + TPR[k+1]) / 2, evaluated on raw
        # counts and normalised once at the end to avoid rounding noise
        area = 0.0
        for k in range(len(tp_counts) - 1):
            area += (fp_counts[k + 1] - fp_counts[k]) * (tp_counts[k] + tp_counts[k + 1])
        auc = area / (2 * tp_counts[-1] * fp_counts[-1])
        print('AUC =', auc, '\n')


if __name__ == "__main__":
    x1 = [1, 1, 1, 0, 1, 0, 0, 0, 0]  # true labels, case 1
    x2 = [0, 1, 0, 0, 1, 0, 1, 1, 0]  # true labels, case 2
    xx1 = TestData(x1)
    xx1.roc()
    xx2 = TestData(x2)
    xx2.roc()
    # Draw the ROC curves: FPR on the x axis, TPR on the y axis
    plt.figure(1)
    plt.xlabel('FPR')
    plt.ylabel('TPR')
    plt.xlim(0, 1)
    plt.ylim(0, 1)
    # Load a font file that has Chinese glyphs so the title renders (raw string keeps the backslashes)
    choose_font = mpl.font_manager.FontProperties(fname=r'C:\Windows\Fonts\simfang.ttf')
    plt.title('ROC曲线', fontproperties=choose_font)  # title text means "ROC curve"
    plt.plot(xx1.FPR, xx1.TPR, linestyle='-', marker='o', color='r', label='x1')
    plt.plot(xx2.FPR, xx2.TPR, linestyle='--', marker='^', color='b', label='x2')
    plt.legend(loc='best')  # both curves are on the same axes; the legend tells them apart
    plt.show()
IV. Output
x = [1, 1, 1, 0, 1, 0, 0, 0, 0]
y = [1. 0. 0. 0. 0. 0. 0. 0. 0.]
TP = 1.0 FN = 3.0 TPR = 0.25
FP = 0.0 TN = 5.0 FPR = 0.0
x = [1, 1, 1, 0, 1, 0, 0, 0, 0]
y = [1. 1. 0. 0. 0. 0. 0. 0. 0.]
TP = 2.0 FN = 2.0 TPR = 0.5
FP = 0.0 TN = 5.0 FPR = 0.0
x = [1, 1, 1, 0, 1, 0, 0, 0, 0]
y = [1. 1. 1. 0. 0. 0. 0. 0. 0.]
TP = 3.0 FN = 1.0 TPR = 0.75
FP = 0.0 TN = 5.0 FPR = 0.0
x = [1, 1, 1, 0, 1, 0, 0, 0, 0]
y = [1. 1. 1. 1. 0. 0. 0. 0. 0.]
TP = 3.0 FN = 1.0 TPR = 0.75
FP = 1.0 TN = 4.0 FPR = 0.2
x = [1, 1, 1, 0, 1, 0, 0, 0, 0]
y = [1. 1. 1. 1. 1. 0. 0. 0. 0.]
TP = 4.0 FN = 0.0 TPR = 1.0
FP = 1.0 TN = 4.0 FPR = 0.2
x = [1, 1, 1, 0, 1, 0, 0, 0, 0]
y = [1. 1. 1. 1. 1. 1. 0. 0. 0.]
TP = 4.0 FN = 0.0 TPR = 1.0
FP = 2.0 TN = 3.0 FPR = 0.4
x = [1, 1, 1, 0, 1, 0, 0, 0, 0]
y = [1. 1. 1. 1. 1. 1. 1. 0. 0.]
TP = 4.0 FN = 0.0 TPR = 1.0
FP = 3.0 TN = 2.0 FPR = 0.6
x = [1, 1, 1, 0, 1, 0, 0, 0, 0]
y = [1. 1. 1. 1. 1. 1. 1. 1. 0.]
TP = 4.0 FN = 0.0 TPR = 1.0
FP = 4.0 TN = 1.0 FPR = 0.8
TPR = [0. 0.25 0.5 0.75 0.75 1. 1. 1. 1. 1. ]
FPR = [0. 0. 0. 0. 0.2 0.2 0.4 0.6 0.8 1. ]
AUC = 0.95
x = [0, 1, 0, 0, 1, 0, 1, 1, 0]
y = [1. 0. 0. 0. 0. 0. 0. 0. 0.]
TP = 0.0 FN = 4.0 TPR = 0.0
FP = 1.0 TN = 4.0 FPR = 0.2
x = [0, 1, 0, 0, 1, 0, 1, 1, 0]
y = [1. 1. 0. 0. 0. 0. 0. 0. 0.]
TP = 1.0 FN = 3.0 TPR = 0.25
FP = 1.0 TN = 4.0 FPR = 0.2
x = [0, 1, 0, 0, 1, 0, 1, 1, 0]
y = [1. 1. 1. 0. 0. 0. 0. 0. 0.]
TP = 1.0 FN = 3.0 TPR = 0.25
FP = 2.0 TN = 3.0 FPR = 0.4
x = [0, 1, 0, 0, 1, 0, 1, 1, 0]
y = [1. 1. 1. 1. 0. 0. 0. 0. 0.]
TP = 1.0 FN = 3.0 TPR = 0.25
FP = 3.0 TN = 2.0 FPR = 0.6
x = [0, 1, 0, 0, 1, 0, 1, 1, 0]
y = [1. 1. 1. 1. 1. 0. 0. 0. 0.]
TP = 2.0 FN = 2.0 TPR = 0.5
FP = 3.0 TN = 2.0 FPR = 0.6
x = [0, 1, 0, 0, 1, 0, 1, 1, 0]
y = [1. 1. 1. 1. 1. 1. 0. 0. 0.]
TP = 2.0 FN = 2.0 TPR = 0.5
FP = 4.0 TN = 1.0 FPR = 0.8
x = [0, 1, 0, 0, 1, 0, 1, 1, 0]
y = [1. 1. 1. 1. 1. 1. 1. 0. 0.]
TP = 3.0 FN = 1.0 TPR = 0.75
FP = 4.0 TN = 1.0 FPR = 0.8
x = [0, 1, 0, 0, 1, 0, 1, 1, 0]
y = [1. 1. 1. 1. 1. 1. 1. 1. 0.]
TP = 4.0 FN = 0.0 TPR = 1.0
FP = 4.0 TN = 1.0 FPR = 0.8
TPR = [0. 0. 0.25 0.25 0.25 0.5 0.5 0.75 1. 1. ]
FPR = [0. 0.2 0.2 0.4 0.6 0.6 0.8 0.8 0.8 1. ]
AUC = 0.4
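As an independent check on AUC values like these, the sketch below (not the author's code; `roc_points` and `auc_by_pairs` are hypothetical helpers) builds the ROC points with cumulative sums and computes the AUC two ways: by the trapezoidal rule, and by counting the fraction of (positive, negative) pairs in which the positive sample is ranked higher. With no tied scores the two agree; the pair count is the Mann-Whitney U statistic divided by P·N. Like the main code, it assumes the labels are listed in descending score order.

```python
from itertools import product

import numpy as np


def roc_points(labels):
    """ROC points and AUC for labels ordered by descending score (hypothetical helper).

    Cutting after the first i samples, i = 0..n, gives every threshold,
    so the points run from (0, 0) to (1, 1).
    """
    y = np.asarray(labels, dtype=int)
    pos = int(y.sum())            # P: number of positives
    neg = len(y) - pos            # N: number of negatives
    # Cumulative TP / FP counts when the first i samples are predicted positive
    tp = np.concatenate(([0], np.cumsum(y)))
    fp = np.concatenate(([0], np.cumsum(1 - y)))
    tpr = tp / pos                # TPR = TP / P
    fpr = fp / neg                # FPR = FP / N
    # Trapezoidal rule with FPR on the x axis, summed on integer counts
    area2 = int(np.sum(np.diff(fp) * (tp[1:] + tp[:-1])))
    return fpr, tpr, area2 / (2 * pos * neg)


def auc_by_pairs(labels):
    """Fraction of (positive, negative) pairs in which the positive is ranked first.

    Assumes no tied scores; a smaller index means a higher score.
    """
    pos_idx = [i for i, v in enumerate(labels) if v == 1]
    neg_idx = [i for i, v in enumerate(labels) if v == 0]
    correct = sum(1 for p, q in product(pos_idx, neg_idx) if p < q)
    return correct / (len(pos_idx) * len(neg_idx))


for labels in ([1, 1, 1, 0, 1, 0, 0, 0, 0], [0, 1, 0, 0, 1, 0, 1, 1, 0]):
    _, _, auc = roc_points(labels)
    print(labels, "trapezoid:", auc, "pairs:", auc_by_pairs(labels))
```

For the two label rows both methods give 0.95 (case 1, 19 of 20 pairs ranked correctly) and 0.4 (case 2, 8 of 20 pairs).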
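V. Problems and Solutions
1. Installing packages in PyCharm fails
- Switch the package index to a mirror. Several mirrors are available; I used the Tsinghua University mirror: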
https://pypi.tuna.tsinghua.edu.cn/simple/
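The same mirror can also be passed to pip directly through its `-i` (`--index-url`) option, for example from PyCharm's terminal. A minimal sketch with hypothetical helper names: it only builds and prints the command, and `install`, which is not called here, would run it with the interpreter executing the script (`sys.executable`).

```python
import subprocess
import sys

# Tsinghua University PyPI mirror used in this post
TSINGHUA_INDEX = "https://pypi.tuna.tsinghua.edu.cn/simple/"


def pip_install_command(packages, index_url=TSINGHUA_INDEX):
    """Return the pip command that installs `packages` from `index_url`."""
    return [sys.executable, "-m", "pip", "install", "-i", index_url, *packages]


def install(packages, index_url=TSINGHUA_INDEX):
    """Run the command; raises subprocess.CalledProcessError if pip fails."""
    subprocess.check_call(pip_install_command(packages, index_url))


print(" ".join(pip_install_command(["numpy", "matplotlib"])))
```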
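2. Chinese text is garbled
- Several approaches failed; what finally worked was loading a font file that contains Chinese glyphs and passing it to the title: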
choose_font = mpl.font_manager.FontProperties(fname=r'C:\Windows\Fonts\simfang.ttf')  # font file with Chinese glyphs; raw string keeps the backslashes
plt.title('ROC曲线', fontproperties=choose_font)  # title text means "ROC curve"
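A possibly more portable variant, not what was used above: instead of hard-coding a Windows font path, look up an installed font family that has Chinese glyphs and set it globally through `rcParams`, so the title, axis labels and legend can all use it. The candidate family names are assumptions (which ones exist depends on the machine), and `pick_cjk_font` is a hypothetical helper.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib import font_manager

# Candidate CJK font families; assumed names, availability varies by system
CANDIDATES = ["SimHei", "FangSong", "Microsoft YaHei", "Noto Sans CJK SC", "WenQuanYi Zen Hei"]


def pick_cjk_font(candidates=CANDIDATES):
    """Return the first candidate family known to matplotlib's font manager, or None."""
    installed = {entry.name for entry in font_manager.fontManager.ttflist}
    for name in candidates:
        if name in installed:
            return name
    return None


font = pick_cjk_font()
if font is not None:
    # Put the CJK family first in the sans-serif fallback list and use sans-serif text
    plt.rcParams["font.sans-serif"] = [font] + plt.rcParams["font.sans-serif"]
    plt.rcParams["font.family"] = "sans-serif"
plt.rcParams["axes.unicode_minus"] = False  # use an ASCII hyphen for minus signs, which CJK fonts render
print("CJK font:", font)
```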
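3. Showing all curves on the same axes
- At first I tried to wrap everything in one function that takes a label row and produces both the results and the ROC plot, intending all curves to share one set of axes. Instead, each call produced a separate figure with a single curve. The likely reason is that the following lines must all draw into the same figure; if each function call creates its own figure or calls plt.show() (after which the next plotting call starts a new figure), every curve ends up in a figure of its own.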
plt.plot(xx1.FPR, xx1.TPR, linestyle='-', marker='o', color='r', label='x1')
plt.plot(xx2.FPR, xx2.TPR, linestyle='--', marker='^', color='b', label='x2')
plt.legend(loc='best')  # labels the curves drawn on the current axes
- After repeated attempts I found no simple way to do this without complicating the code, so I moved the plotting into the main block instead.
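One way to move the plotting into a function, sketched under the assumption that the caller creates the figure once: pass a matplotlib `Axes` object in and draw on it, instead of relying on pyplot's implicit current figure. The helper name `plot_roc` is hypothetical; the coordinates are the ROC points of the two label rows, end points (0, 0) and (1, 1) included.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; remove this line to get a window
import matplotlib.pyplot as plt


def plot_roc(ax, fpr, tpr, label, **style):
    """Draw one ROC curve onto an existing Axes (hypothetical helper).

    Every call targets the same `ax`, so any number of curves share one
    coordinate system; nothing here creates a figure or calls plt.show().
    """
    ax.plot(fpr, tpr, label=label, **style)
    ax.set_xlabel("FPR")
    ax.set_ylabel("TPR")
    ax.set_xlim(0, 1)
    ax.set_ylim(0, 1)
    ax.legend(loc="best")


fig, ax = plt.subplots()
plot_roc(ax, [0, 0, 0, 0, 0.2, 0.2, 0.4, 0.6, 0.8, 1],
         [0, 0.25, 0.5, 0.75, 0.75, 1, 1, 1, 1, 1],
         label="x1", linestyle="-", marker="o", color="r")
plot_roc(ax, [0, 0.2, 0.2, 0.4, 0.6, 0.6, 0.8, 0.8, 0.8, 1],
         [0, 0, 0.25, 0.25, 0.25, 0.5, 0.5, 0.75, 1, 1],
         label="x2", linestyle="--", marker="^", color="b")
# plt.show()  # call once, after every curve has been drawn
```

Because the function never opens or shows a figure itself, it can be called in a loop over any number of label rows, with a single `plt.show()` at the end.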
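VI. Evaluation
- Strength: the computation lives in a class method, so each label row only needs a new object and one method call.
- Weakness: plotting the ROC curve is not wrapped in a function, which is less convenient and makes the code less reusable.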