A few days ago I heard that torchvision 0.3 had been released, with support for segmentation and detection models. For work I happened to be looking for a fairly easy-to-use image segmentation tool that doesn't require elaborate image-processing steps, configuration, or training code, so naturally I gave torchvision 0.3 a try.
Below is a record of how I trained an image segmentation and object detection model with torchvision 0.3.
First, of course, install torchvision 0.3. Version 0.3 does not yet support Windows, so Windows users may have to wait a while. Since my machine already had torchvision 0.2, I upgraded with:
pip install --upgrade torchvision
If you have never installed it before, use:
pip install torchvision
Or install it with conda:
conda install torchvision -c pytorch
Remember to also install pycocotools, since it will be used to compute the evaluation metrics. On Linux and macOS it installs directly via pip; on Windows the steps are more involved, and you need the Microsoft Visual C++ Build Tools installed first.
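On Linux or macOS that is simply:

pip install pycocotools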
The dataset used is PennFudanPed, with the following directory structure:
PennFudanPed/
  Annotation/
    FudanPed00001.txt
    FudanPed00002.txt
    FudanPed00003.txt
    FudanPed00004.txt
    ...
  PedMasks/
    FudanPed00001_mask.png
    FudanPed00002_mask.png
    FudanPed00003_mask.png
    FudanPed00004_mask.png
    ...
  PNGImages/
    FudanPed00001.png
    FudanPed00002.png
    FudanPed00003.png
    FudanPed00004.png
Here is what a sample image looks like:
Next, let's look at an annotation file:
# Compatible with PASCAL Annotation Version 1.00
Image filename : "PennFudanPed/PNGImages/FudanPed00001.png"
Image size (X x Y x C) : 559 x 536 x 3
Database : "The Penn-Fudan-Pedestrian Database"
Objects with ground truth : 2 { "PASpersonWalking" "PASpersonWalking" }
# Note there may be some objects not included in the ground truth list for they are severe-occluded
# or have very small size.
# Top left pixel co-ordinates : (1, 1)
# Details for pedestrian 1 ("PASpersonWalking")
Original label for object 1 "PASpersonWalking" : "PennFudanPed"
Bounding box for object 1 "PASpersonWalking" (Xmin, Ymin) - (Xmax, Ymax) : (160, 182) - (302, 431)
Pixel mask for object 1 "PASpersonWalking" : "PennFudanPed/PedMasks/FudanPed00001_mask.png"
# Details for pedestrian 2 ("PASpersonWalking")
Original label for object 2 "PASpersonWalking" : "PennFudanPed"
Bounding box for object 2 "PASpersonWalking" (Xmin, Ymin) - (Xmax, Ymax) : (420, 171) - (535, 486)
Pixel mask for object 2 "PASpersonWalking" : "PennFudanPed/PedMasks/FudanPed00001_mask.png"
As you can see, the annotation file records the image size, the locations of the pedestrians, and the path to the mask image.
Now let's look at a mask image:
What is this pitch-black thing? Why does it look empty? If you load the image into an array and print the values, you will find that the pedestrian regions have values 1, 2, ..., while the background is 0, so the whole image looks black. But if you attach a color palette to the mask, the pedestrians show up:
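A minimal sketch of this with PIL, assuming the mask for FudanPed00001 contains two pedestrians (ids 1 and 2):

import numpy as np
from PIL import Image

mask = Image.open("PennFudanPed/PedMasks/FudanPed00001_mask.png")
print(np.unique(np.array(mask)))   # e.g. [0 1 2]: background plus two pedestrians

# A palette is a flat list of RGB triples, one per pixel value
palette = [0, 0, 0,      # 0: background -> black
           255, 0, 0,    # 1: first pedestrian -> red
           0, 255, 0]    # 2: second pedestrian -> green
mask.putpalette(palette + [0, 0, 0] * (256 - 3))  # pad the remaining entries
mask.show()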
Next, write a class that inherits from torch.utils.data.Dataset and implements the __getitem__ and __len__ methods to read the images, bounding boxes, masks, and so on.
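A sketch of such a dataset, along the lines of the official torchvision tutorial; the transforms argument is assumed to follow the references/detection convention of transforming image and target together:

import os
import numpy as np
import torch
from PIL import Image

class PennFudanDataset(torch.utils.data.Dataset):
    def __init__(self, root, transforms=None):
        self.root = root
        self.transforms = transforms
        # Sort so that images and masks stay aligned
        self.imgs = list(sorted(os.listdir(os.path.join(root, "PNGImages"))))
        self.masks = list(sorted(os.listdir(os.path.join(root, "PedMasks"))))

    def __getitem__(self, idx):
        img_path = os.path.join(self.root, "PNGImages", self.imgs[idx])
        mask_path = os.path.join(self.root, "PedMasks", self.masks[idx])
        img = Image.open(img_path).convert("RGB")

        # Each instance is encoded as a distinct pixel value in the mask
        mask = np.array(Image.open(mask_path))
        obj_ids = np.unique(mask)[1:]           # drop 0, the background
        masks = mask == obj_ids[:, None, None]  # one binary mask per instance

        # Derive a bounding box from each binary mask
        boxes = []
        for m in masks:
            pos = np.where(m)
            boxes.append([np.min(pos[1]), np.min(pos[0]),
                          np.max(pos[1]), np.max(pos[0])])

        num_objs = len(obj_ids)
        boxes = torch.as_tensor(boxes, dtype=torch.float32)
        target = {
            "boxes": boxes,
            "labels": torch.ones((num_objs,), dtype=torch.int64),  # one class: pedestrian
            "masks": torch.as_tensor(masks, dtype=torch.uint8),
            "image_id": torch.tensor([idx]),
            "area": (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0]),
            "iscrowd": torch.zeros((num_objs,), dtype=torch.int64),
        }

        if self.transforms is not None:
            img, target = self.transforms(img, target)
        return img, target

    def __len__(self):
        return len(self.imgs)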
Before training, you also need to grab a few utilities from the official GitHub repository:
git clone https://github.com/pytorch/vision.git
Copy the files under references/detection into the same directory as your training code, then start training (see the sketch below).
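A minimal training sketch, reusing the PennFudanDataset class from above; engine, utils, and transforms are the modules copied from references/detection, and the 120/50 train/test split, batch size, and optimizer settings are assumptions matching the tutorial setup:

import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

import transforms as T
import utils
from engine import train_one_epoch, evaluate

def get_model(num_classes):
    # Start from a Mask R-CNN pre-trained on COCO and replace both heads
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
    in_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
    model.roi_heads.mask_predictor = MaskRCNNPredictor(in_mask, 256, num_classes)
    return model

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# 120 images for training, the remaining 50 for evaluation
dataset = PennFudanDataset("PennFudanPed", T.Compose([T.ToTensor()]))
dataset_test = PennFudanDataset("PennFudanPed", T.Compose([T.ToTensor()]))
indices = torch.randperm(len(dataset)).tolist()
dataset = torch.utils.data.Subset(dataset, indices[:-50])
dataset_test = torch.utils.data.Subset(dataset_test, indices[-50:])

loader = torch.utils.data.DataLoader(dataset, batch_size=2, shuffle=True,
                                     collate_fn=utils.collate_fn)
loader_test = torch.utils.data.DataLoader(dataset_test, batch_size=1, shuffle=False,
                                          collate_fn=utils.collate_fn)

model = get_model(num_classes=2).to(device)   # background + pedestrian
params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(params, lr=0.005, momentum=0.9, weight_decay=0.0005)
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=3, gamma=0.1)

for epoch in range(10):
    train_one_epoch(model, optimizer, loader, device, epoch, print_freq=10)
    lr_scheduler.step()
    coco_evaluator = evaluate(model, loader_test, device=device)

Training then prints output similar to the following: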
Epoch: [9] [ 0/60] eta: 0:00:31 lr: 0.000005 loss: 0.1171 (0.1171) loss_classifier: 0.0151 (0.0151) loss_box_reg: 0.0050 (0.0050) loss_mask: 0.0912 (0.0912) loss_objectness: 0.0006 (0.0006) loss_rpn_box_reg: 0.0052 (0.0052) time: 0.5295 data: 0.2165 max mem: 5575
Epoch: [9] [10/60] eta: 0:00:18 lr: 0.000005 loss: 0.1534 (0.1607) loss_classifier: 0.0247 (0.0240) loss_box_reg: 0.0099 (0.0103) loss_mask: 0.1088 (0.1181) loss_objectness: 0.0007 (0.0020) loss_rpn_box_reg: 0.0051 (0.0063) time: 0.3638 data: 0.0247 max mem: 5575
Epoch: [9] [20/60] eta: 0:00:14 lr: 0.000005 loss: 0.1449 (0.1537) loss_classifier: 0.0245 (0.0232) loss_box_reg: 0.0085 (0.0093) loss_mask: 0.1083 (0.1131) loss_objectness: 0.0004 (0.0017) loss_rpn_box_reg: 0.0048 (0.0064) time: 0.3564 data: 0.0059 max mem: 5575
Epoch: [9] [30/60] eta: 0:00:11 lr: 0.000005 loss: 0.1351 (0.1559) loss_classifier: 0.0220 (0.0238) loss_box_reg: 0.0072 (0.0102) loss_mask: 0.1083 (0.1139) loss_objectness: 0.0004 (0.0014) loss_rpn_box_reg: 0.0055 (0.0067) time: 0.3733 data: 0.0061 max mem: 5575
Epoch: [9] [40/60] eta: 0:00:07 lr: 0.000005 loss: 0.1344 (0.1545) loss_classifier: 0.0218 (0.0240) loss_box_reg: 0.0091 (0.0099) loss_mask: 0.1062 (0.1128) loss_objectness: 0.0004 (0.0012) loss_rpn_box_reg: 0.0063 (0.0067) time: 0.3693 data: 0.0059 max mem: 5575
Epoch: [9] [50/60] eta: 0:00:03 lr: 0.000005 loss: 0.1496 (0.1587) loss_classifier: 0.0236 (0.0246) loss_box_reg: 0.0096 (0.0111) loss_mask: 0.1082 (0.1145) loss_objectness: 0.0003 (0.0011) loss_rpn_box_reg: 0.0065 (0.0073) time: 0.3696 data: 0.0059 max mem: 5575
Epoch: [9] [59/60] eta: 0:00:00 lr: 0.000005 loss: 0.1555 (0.1591) loss_classifier: 0.0246 (0.0245) loss_box_reg: 0.0094 (0.0112) loss_mask: 0.1099 (0.1150) loss_objectness: 0.0003 (0.0011) loss_rpn_box_reg: 0.0070 (0.0072) time: 0.3700 data: 0.0059 max mem: 5575
Epoch: [9] Total time: 0:00:22 (0.3682 s / it)
creating index...
index created!
Test: [ 0/50] eta: 0:00:12 model_time: 0.0721 (0.0721) evaluator_time: 0.0131 (0.0131) time: 0.2515 data: 0.1637 max mem: 5575
Test: [49/50] eta: 0:00:00 model_time: 0.0600 (0.0606) evaluator_time: 0.0029 (0.0043) time: 0.0700 data: 0.0032 max mem: 5575
Test: Total time: 0:00:03 (0.0733 s / it)
Averaged stats: model_time: 0.0600 (0.0606) evaluator_time: 0.0029 (0.0043)
Accumulating evaluation results...
DONE (t=0.01s).
Accumulating evaluation results...
DONE (t=0.01s).
Pick an image and see how the segmentation turns out:
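Inference is straightforward. A sketch, assuming model and device from the training script above (the file name here is just an example):

import torchvision.transforms.functional as F
from PIL import Image

model.eval()
img = Image.open("PennFudanPed/PNGImages/FudanPed00046.png").convert("RGB")
with torch.no_grad():
    prediction = model([F.to_tensor(img).to(device)])

# prediction is a list with one dict per input image; masks come back as
# [N, 1, H, W] float probabilities, boxes as [N, 4], scores as [N]
masks = prediction[0]["masks"]
first_mask = Image.fromarray(masks[0, 0].mul(255).byte().cpu().numpy())
first_mask.show()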
As the last few result images show, even the tiny pedestrians in the distance are segmented. The model was trained on just 120 images for 10 epochs, and the results are already quite good.
Now that the pedestrians are segmented, how about also drawing bounding boxes on them? A sketch follows the images below.
Original image:
With bounding boxes drawn (class labels not yet mapped to names):
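One way to draw the boxes with PIL's ImageDraw, reusing prediction and img from the inference snippet above; the 0.5 score threshold is an arbitrary choice:

from PIL import ImageDraw

boxes = prediction[0]["boxes"].cpu()
scores = prediction[0]["scores"].cpu()
labels = prediction[0]["labels"].cpu()

drawn = img.copy()
draw = ImageDraw.Draw(drawn)
for box, score, label in zip(boxes, scores, labels):
    if score < 0.5:   # drop low-confidence detections
        continue
    draw.rectangle(box.tolist(), outline="red", width=3)
    # The label is still the raw class index (1 = pedestrian); no name mapping yet
    draw.text((box[0], box[1]), "%d: %.2f" % (label.item(), score.item()), fill="red")
drawn.show()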
That wraps up image segmentation and object detection with torchvision 0.3. Still, I found the segmentation output a bit plain in black and white, so I went one step further: I prettified the mask and overlaid it on the original image. Interested readers can try implementing this themselves.
Original image:
With the beautified mask overlaid:
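One possible overlay, blending each predicted mask into the original image at 50% opacity; the colors and the 0.5 mask threshold are arbitrary choices, and prediction and img again come from the snippets above:

import numpy as np
from PIL import Image

masks = prediction[0]["masks"].cpu().numpy()   # [N, 1, H, W], floats in [0, 1]
overlay = np.array(img).copy()
colors = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 0)]
for i, m in enumerate(masks[:, 0]):
    region = m > 0.5
    color = np.array(colors[i % len(colors)], dtype=np.uint8)
    # Blend the mask color with the original pixels at 50% opacity
    overlay[region] = (0.5 * overlay[region] + 0.5 * color).astype(np.uint8)
Image.fromarray(overlay).show()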
A side note
How is the training actually evaluated? Reading the source shows that this part calls into pycocotools, which produces evaluation results like these:
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.838
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.988
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.955
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.342
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.721
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.857
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.394
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.877
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.877
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.500
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.825
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.892
IoU metric: segm
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.787
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.988
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.961
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.377
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.587
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.800
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.359
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.826
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.826
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.633
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.775
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.836
But this wall of numbers is hard on the eyes, and I wanted a Precision-Recall plot instead. So after reading the evaluation-related parts of the pycocotools source, I wrote my own plotting code, with the results below:
PR curve for the bounding boxes
PR curve for the segmentation
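For reference, a minimal sketch of how such a curve can be extracted, assuming coco_evaluator is the object returned by the evaluate() helper from references/detection; in pycocotools, COCOeval.eval["precision"] is indexed as [IoU threshold, recall threshold, class, area range, maxDets]:

import matplotlib.pyplot as plt

coco_eval = coco_evaluator.coco_eval["bbox"]   # or "segm" for the mask curve
precision = coco_eval.eval["precision"]        # shape [T, R, K, A, M]
recall_thrs = coco_eval.params.recThrs         # 101 recall points from 0 to 1

# PR curve for class 0, all areas (index 0), maxDets=100 (index 2);
# entries are -1 where no detections exist at that recall level
for t, iou in [(0, 0.5), (5, 0.75)]:           # IoU thresholds 0.50 and 0.75
    plt.plot(recall_thrs, precision[t, :, 0, 0, 2], label="IoU=%.2f" % iou)
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.legend()
plt.show()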