A few days ago I heard that torchvision 0.3 had been released, with support for segmentation and detection models. For work I happened to be looking for a fairly easy-to-use image segmentation tool that wouldn't require complicated image-processing steps, configuration, or training code, so naturally I had to give torchvision 0.3 a try.

Below is a record of how I trained an image segmentation / object detection model with torchvision 0.3.
The first step is, of course, installing torchvision 0.3. Note that 0.3 does not yet support Windows, so Windows users may have to wait a while. My machine already had torchvision 0.2, so I upgraded with:

pip install --upgrade torchvision

If you have never installed it before, install it with:

pip install torchvision

Or install it with conda:

conda install torchvision -c pytorch

Remember to install pycocotools as well, since it is used to compute the evaluation metrics. On Linux and macOS it installs directly via pip; on Windows the process is more involved, because you need Microsoft Visual C++ Build Tools installed first.

The dataset used here is PennFudanPed, with the following directory structure:

PennFudanPed/
  Annotation/
    FudanPed00001.txt
    FudanPed00002.txt
    FudanPed00003.txt
    FudanPed00004.txt
    ...
  PedMasks/
    FudanPed00001_mask.png
    FudanPed00002_mask.png
    FudanPed00003_mask.png
    FudanPed00004_mask.png
    ...
  PNGImages/
    FudanPed00001.png
    FudanPed00002.png
    FudanPed00003.png
    FudanPed00004.png
    ...

Here's what one of the images looks like:

[image: a sample pedestrian photo from PNGImages]

And here is what an annotation file looks like:

# Compatible with PASCAL Annotation Version 1.00
Image filename : "PennFudanPed/PNGImages/FudanPed00001.png"
Image size (X x Y x C) : 559 x 536 x 3
Database : "The Penn-Fudan-Pedestrian Database"
Objects with ground truth : 2 { "PASpersonWalking" "PASpersonWalking" }
# Note there may be some objects not included in the ground truth list for they are severe-occluded
# or have very small size.
# Top left pixel co-ordinates : (1, 1)
# Details for pedestrian 1 ("PASpersonWalking")
Original label for object 1 "PASpersonWalking" : "PennFudanPed"
Bounding box for object 1 "PASpersonWalking" (Xmin, Ymin) - (Xmax, Ymax) : (160, 182) - (302, 431)
Pixel mask for object 1 "PASpersonWalking" : "PennFudanPed/PedMasks/FudanPed00001_mask.png"

# Details for pedestrian 2 ("PASpersonWalking")
Original label for object 2 "PASpersonWalking" : "PennFudanPed"
Bounding box for object 2 "PASpersonWalking" (Xmin, Ymin) - (Xmax, Ymax) : (420, 171) - (535, 486)
Pixel mask for object 2 "PASpersonWalking" : "PennFudanPed/PedMasks/FudanPed00001_mask.png"

As you can see, the annotation file records the image size, the location of each person, and the path of the corresponding mask image.

Now take a look at a mask image:

[image: the raw mask image, which appears completely black]

Why is it pitch black? Where is everything? If you read the image into an array and print the values, you'll find that pixels inside each person's mask region have values 1, 2, ..., while the background is 0, which is why the whole picture looks black. But attach a palette to the mask, and the people show up:

[image: the same mask with a color palette applied, people now visible]
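The colorizing itself takes only a few lines with PIL's palette support. A minimal sketch (the palette colors here are my own choice, not anything defined by the dataset):

```python
import numpy as np
from PIL import Image

# Map label 0 (background) to black and labels 1, 2, 3 to distinct colors.
# Pad to 768 entries (256 RGB triples), which older Pillow versions require.
PALETTE = [0, 0, 0,  255, 0, 0,  0, 255, 0,  0, 0, 255]

def colorize_mask(mask: np.ndarray) -> Image.Image:
    """Turn a label mask (values 0, 1, 2, ...) into an RGB image."""
    img = Image.fromarray(mask.astype(np.uint8), mode="P")
    img.putpalette(PALETTE + [0] * (768 - len(PALETTE)))
    return img.convert("RGB")
```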

Next, write a class that inherits from torch.utils.data.Dataset and implements the __getitem__ and __len__ methods to read the images, bounding boxes, masks, and so on.
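A sketch of such a class, closely following the structure of the official torchvision finetuning tutorial. The mask_to_targets helper is my own factoring; the target dict fields are the standard Mask R-CNN input format:

```python
import os
import numpy as np
import torch
from PIL import Image

def mask_to_targets(mask):
    """Split a label mask (0 = background, 1..N = instance ids) into
    per-instance binary masks and [xmin, ymin, xmax, ymax] boxes."""
    obj_ids = np.unique(mask)
    obj_ids = obj_ids[obj_ids != 0]               # drop the background id
    masks = mask[None, :, :] == obj_ids[:, None, None]
    boxes = []
    for m in masks:
        ys, xs = np.where(m)
        boxes.append([int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())])
    return masks, boxes

class PennFudanDataset(torch.utils.data.Dataset):
    def __init__(self, root, transforms=None):
        self.root = root
        self.transforms = transforms
        # Sorting keeps images and masks aligned by file name.
        self.imgs = sorted(os.listdir(os.path.join(root, "PNGImages")))
        self.mask_files = sorted(os.listdir(os.path.join(root, "PedMasks")))

    def __getitem__(self, idx):
        img = Image.open(
            os.path.join(self.root, "PNGImages", self.imgs[idx])).convert("RGB")
        mask = np.array(Image.open(
            os.path.join(self.root, "PedMasks", self.mask_files[idx])))
        masks, boxes = mask_to_targets(mask)
        num_objs = len(boxes)
        boxes_t = torch.as_tensor(boxes, dtype=torch.float32)
        target = {
            "boxes": boxes_t,
            "labels": torch.ones((num_objs,), dtype=torch.int64),  # one class: pedestrian
            "masks": torch.as_tensor(masks, dtype=torch.uint8),
            "image_id": torch.tensor([idx]),
            "area": (boxes_t[:, 2] - boxes_t[:, 0]) * (boxes_t[:, 3] - boxes_t[:, 1]),
            "iscrowd": torch.zeros((num_objs,), dtype=torch.int64),
        }
        if self.transforms is not None:
            img, target = self.transforms(img, target)
        return img, target

    def __len__(self):
        return len(self.imgs)
```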

Before training we also need some utility code from the official GitHub repository:

git clone https://github.com/pytorch/vision.git

Copy the files under references/detection to the same directory as the training code, then start training; you should see output like the following.

Epoch: [9]  [ 0/60]  eta: 0:00:31  lr: 0.000005  loss: 0.1171 (0.1171)  loss_classifier: 0.0151 (0.0151)  loss_box_reg: 0.0050 (0.0050)  loss_mask: 0.0912 (0.0912)  loss_objectness: 0.0006 (0.0006)  loss_rpn_box_reg: 0.0052 (0.0052)  time: 0.5295  data: 0.2165  max mem: 5575
Epoch: [9]  [10/60]  eta: 0:00:18  lr: 0.000005  loss: 0.1534 (0.1607)  loss_classifier: 0.0247 (0.0240)  loss_box_reg: 0.0099 (0.0103)  loss_mask: 0.1088 (0.1181)  loss_objectness: 0.0007 (0.0020)  loss_rpn_box_reg: 0.0051 (0.0063)  time: 0.3638  data: 0.0247  max mem: 5575
Epoch: [9]  [20/60]  eta: 0:00:14  lr: 0.000005  loss: 0.1449 (0.1537)  loss_classifier: 0.0245 (0.0232)  loss_box_reg: 0.0085 (0.0093)  loss_mask: 0.1083 (0.1131)  loss_objectness: 0.0004 (0.0017)  loss_rpn_box_reg: 0.0048 (0.0064)  time: 0.3564  data: 0.0059  max mem: 5575
Epoch: [9]  [30/60]  eta: 0:00:11  lr: 0.000005  loss: 0.1351 (0.1559)  loss_classifier: 0.0220 (0.0238)  loss_box_reg: 0.0072 (0.0102)  loss_mask: 0.1083 (0.1139)  loss_objectness: 0.0004 (0.0014)  loss_rpn_box_reg: 0.0055 (0.0067)  time: 0.3733  data: 0.0061  max mem: 5575
Epoch: [9]  [40/60]  eta: 0:00:07  lr: 0.000005  loss: 0.1344 (0.1545)  loss_classifier: 0.0218 (0.0240)  loss_box_reg: 0.0091 (0.0099)  loss_mask: 0.1062 (0.1128)  loss_objectness: 0.0004 (0.0012)  loss_rpn_box_reg: 0.0063 (0.0067)  time: 0.3693  data: 0.0059  max mem: 5575
Epoch: [9]  [50/60]  eta: 0:00:03  lr: 0.000005  loss: 0.1496 (0.1587)  loss_classifier: 0.0236 (0.0246)  loss_box_reg: 0.0096 (0.0111)  loss_mask: 0.1082 (0.1145)  loss_objectness: 0.0003 (0.0011)  loss_rpn_box_reg: 0.0065 (0.0073)  time: 0.3696  data: 0.0059  max mem: 5575
Epoch: [9]  [59/60]  eta: 0:00:00  lr: 0.000005  loss: 0.1555 (0.1591)  loss_classifier: 0.0246 (0.0245)  loss_box_reg: 0.0094 (0.0112)  loss_mask: 0.1099 (0.1150)  loss_objectness: 0.0003 (0.0011)  loss_rpn_box_reg: 0.0070 (0.0072)  time: 0.3700  data: 0.0059  max mem: 5575
Epoch: [9] Total time: 0:00:22 (0.3682 s / it)
creating index...
index created!
Test:  [ 0/50]  eta: 0:00:12  model_time: 0.0721 (0.0721)  evaluator_time: 0.0131 (0.0131)  time: 0.2515  data: 0.1637  max mem: 5575
Test:  [49/50]  eta: 0:00:00  model_time: 0.0600 (0.0606)  evaluator_time: 0.0029 (0.0043)  time: 0.0700  data: 0.0032  max mem: 5575
Test: Total time: 0:00:03 (0.0733 s / it)
Averaged stats: model_time: 0.0600 (0.0606)  evaluator_time: 0.0029 (0.0043)
Accumulating evaluation results...
DONE (t=0.01s).
Accumulating evaluation results...
DONE (t=0.01s).

Pick an image and see how the segmentation turns out.

[image: a test image]

Segmentation results on this image and several others:

[images: predicted masks on a number of test images]


As the last few images show, even tiny people far in the distance are segmented out. The model was trained for only 10 epochs on 120 images, and the results are already quite good.

Now that the people are segmented out, what if we also want to draw bounding boxes around them?

Original image:

[image: the original test photo]

With bounding boxes drawn (the class labels have not been mapped to names yet):

[image: the image with predicted boxes and raw label ids]
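Drawing the boxes is straightforward with PIL's ImageDraw. A sketch, assuming the boxes come from the eval-mode output dict as [xmin, ymin, xmax, ymax] (the red color and line width are my own choices):

```python
from PIL import Image, ImageDraw

def draw_boxes(img, boxes, scores=None, score_thresh=0.5, color=(255, 0, 0)):
    """Draw [xmin, ymin, xmax, ymax] boxes on a copy of a PIL image."""
    out = img.copy()
    draw = ImageDraw.Draw(out)
    for i, box in enumerate(boxes):
        # Optionally skip low-confidence detections.
        if scores is not None and scores[i] < score_thresh:
            continue
        draw.rectangle(list(box), outline=color, width=2)
    return out
```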

That wraps up image segmentation and object detection with torchvision 0.3. Still, I found the plain black-and-white mask images a bit ugly, so I went one step further: I prettified the mask and overlaid it on the original image. Interested readers can try implementing this themselves.
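My "prettified" version simply alpha-blends a colored mask onto the original image. One way to sketch it (the color and alpha value are my own choices):

```python
import numpy as np
from PIL import Image

def overlay_mask(img, mask, color=(255, 0, 0), alpha=0.5):
    """Blend a colored version of a binary mask onto a PIL image."""
    img = img.convert("RGB")
    overlay = np.array(img, dtype=np.float32)
    m = mask.astype(bool)
    # Mix the original pixels with the chosen color inside the mask region.
    overlay[m] = (1 - alpha) * overlay[m] + alpha * np.array(color, dtype=np.float32)
    return Image.fromarray(overlay.astype(np.uint8))
```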

Original image:

[image: the original photo]

With the prettified mask overlaid:

[image: the original photo with a colored, semi-transparent mask on top]

A side note

How do we evaluate the trained model? Reading the source shows that this part calls into pycocotools, which produces evaluation results like the following:

IoU metric: bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.838
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.988
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.955
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.342
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.721
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.857
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.394
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.877
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.877
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.500
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.825
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.892
IoU metric: segm
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.787
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.988
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.961
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.377
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.587
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.800
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.359
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.826
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.826
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.633
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.775
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.836

But that wall of numbers is hard to take in at a glance, and I wanted a precision-recall plot. So after reading the metric-computation code in pycocotools, I wrote my own plotting code. The results:
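The curve data lives in the evaluator's eval['precision'] array after accumulate() has run; its shape is [num_iou_thresholds, 101, num_categories, num_area_ranges, num_max_dets], with precision sampled at 101 recall thresholds and -1 stored where no data exists. A small extraction helper (the default index choices below pick IoU=0.50, category 0, area 'all', and the largest maxDets setting, following COCOeval's conventions as I understand them):

```python
import numpy as np

# COCOeval samples precision at 101 recall thresholds: 0.00, 0.01, ..., 1.00
RECALL_THRS = np.linspace(0.0, 1.0, 101)

def pr_curve(precision, iou_idx=0, cat_idx=0, area_idx=0, maxdet_idx=-1):
    """Extract one precision-recall curve from COCOeval.eval['precision']."""
    p = precision[iou_idx, :, cat_idx, area_idx, maxdet_idx]
    valid = p > -1            # COCOeval stores -1 where no data is available
    return RECALL_THRS[valid], p[valid]
```

Feeding the returned recall/precision arrays to matplotlib's plot() produces curves like the ones below.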

PR curve for the bounding boxes

[image: bounding-box precision-recall curve]

PR curve for the segmentation masks

[image: segmentation precision-recall curve]