ResNet50 Usage Requirements: ResNet50 FPN

Reposted

mob64ca1400bfa8 2024-03-27 15:53:31


Overall Architecture¹

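The ROI stage takes the 1000 Proposal Boxes selected by the RPN, together with the multi-level feature maps output by the FPN, and applies ROI Pool to them. It classifies the object in each box and regresses the Proposal Box offsets (offset/delta) once more, producing new scores, refined boxes, and labels. Finally, it applies non-maximum suppression (NMS) again:

[Figure: overall architecture of the ROI stage]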

FPN-based ROI processing involves a few more steps than the original Faster R-CNN and is somewhat more complex.

It consists of the following steps:

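Box ROI Pool: based on the area of each of the 1000 Proposal Boxes, decides which feature-map level ROI Pool runs on.
Box Head: two fully connected layers that further process the 7x7 feature map that ROI Align extracts for each bounding box.
Box Predictor: takes the Box Head output and performs classification plus numeric regression of the box position offsets.
Postprocess Detection: applies softmax for the final classification, combines the regressed box offsets with the Proposal Boxes to obtain the adjusted detection boxes, and finally uses non-maximum suppression (NMS) to filter out the valid detection results (scores, boxes, and labels).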

Box ROI Pool

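The model can process several images at once and detect the objects in each of them. For that reason, convert_to_roi_format first concatenates the proposal boxes of all images into a single tensor, tagging each box with the index of its image, so that ROI Align can be applied to everything in one pass.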
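As a rough illustration of that merging step, the sketch below flattens per-image box lists into rows of (image index, x1, y1, x2, y2). It is a plain-Python approximation written for this article, not torchvision's tensor code, and the sample proposals are made up.

```python
def convert_to_roi_format(boxes_per_image):
    """Flatten per-image box lists into rows of (image_index, x1, y1, x2, y2)."""
    rois = []
    for image_index, boxes in enumerate(boxes_per_image):
        for x1, y1, x2, y2 in boxes:
            # The leading column records which image the box belongs to.
            rois.append((float(image_index), x1, y1, x2, y2))
    return rois


# Hypothetical proposals for a batch of two images.
proposals = [
    [(10.0, 20.0, 110.0, 140.0), (0.0, 0.0, 50.0, 60.0)],  # image 0
    [(30.0, 40.0, 200.0, 300.0)],                          # image 1
]
rois = convert_to_roi_format(proposals)
for roi in rois:
    print(roi)
```

Keeping the image index in the first column is what lets a single ROI Align call pick the right image inside the batched feature map.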

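setup_scales then configures the mapper object from the first four output feature maps (the smallest one, the pool level, is not used). The mapper is a concept introduced by FPN: it computes the area of each Proposal Box and uses that area to decide which feature-map level ROI Align runs on. The FPN paper gives the rule as Eq. (1):

$k = \left\lfloor k_0 + \log_2\left(\sqrt{wh}/224\right) \right\rfloor$

where $w$ and $h$ are the width and height of the box and $k_0 = 4$ is the canonical level.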

The corresponding implementation is:

import torch
from torchvision.ops.boxes import box_area


class LevelMapper(object):
    """Determine which FPN level each RoI in a set of RoIs should map to based
    on the heuristic in the FPN paper.

    Arguments:
        k_min (int)
        k_max (int)
        canonical_scale (int)
        canonical_level (int)
        eps (float)
    """

    def __init__(self, k_min, k_max, canonical_scale=224, canonical_level=4, eps=1e-6):
        # type: (int, int, int, int, float) -> None
        self.k_min = k_min
        self.k_max = k_max
        self.s0 = canonical_scale
        self.lvl0 = canonical_level
        self.eps = eps

    def __call__(self, boxlists):
        # type: (List[Tensor]) -> Tensor
        """
        Arguments:
            boxlists (list[BoxList])
        """
        # Compute level ids
        s = torch.sqrt(torch.cat([box_area(boxlist) for boxlist in boxlists]))

        # Eqn.(1) in FPN paper
        target_lvls = torch.floor(self.lvl0 + torch.log2(s / self.s0) + torch.tensor(self.eps, dtype=s.dtype))
        target_lvls = torch.clamp(target_lvls, min=self.k_min, max=self.k_max)
        return (target_lvls.to(torch.int64) - self.k_min).to(torch.int64)
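To see where k_min and k_max might come from, here is a minimal sketch of the scale inference that setup_scales performs, assuming an 800-pixel-high input and feature maps with strides 4, 8, 16 and 32. Those sizes, and the helper name infer_scale, are assumptions for illustration; the code uses only the standard library.

```python
import math


def infer_scale(feature_size, image_size):
    """Snap the feature/image size ratio to the nearest power of two."""
    approx_scale = feature_size / image_size
    return 2.0 ** round(math.log2(approx_scale))


image_height = 800                    # assumed input height
feature_heights = [200, 100, 50, 25]  # assumed heights of the four feature maps

scales = [infer_scale(h, image_height) for h in feature_heights]
k_min = int(-math.log2(scales[0]))    # level of the largest feature map
k_max = int(-math.log2(scales[-1]))   # level of the smallest feature map used
print(scales, k_min, k_max)
```

Under these assumptions the four maps correspond to levels 2 through 5, which is the range the mapper clamps to.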

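For example, for a Proposal Box with width 100 and height 120, the formula above gives:
$k = \left\lfloor 4 + \log_2\left(\sqrt{100 \times 120}/224\right) \right\rfloor = \lfloor 4 + \log_2(0.489) \rfloor = \lfloor 4 - 1.032 \rfloor = \lfloor 2.968 \rfloor = 2$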
The table below shows how the ResNet50 layers correspond to the FPN levels; see "libtorch Study Notes (17): ResNet50 FPN and How It Applies to Faster-RCNN".

ResNet Layer Name	ResNet Level(k)	FPN Level	Minimum Area()
conv1	1	n/a	n/a
conv2_x	2	0
conv3_x	3	1
conv4_x	4	2
conv5_x	5	3
n/a	n/a	pool

So this proposal box takes its features from Feature Map Level #0 (2 - 2 = 0) for RoI Align² processing.
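The same arithmetic can be checked with a scalar version of the mapper. This is a sketch using the standard library only; the defaults k_min=2 and k_max=5 are assumed from the level table above, and map_level is a name introduced here for illustration.

```python
import math


def map_level(width, height, k_min=2, k_max=5,
              canonical_scale=224, canonical_level=4, eps=1e-6):
    """Return (FPN level k, feature-map index) for a single box."""
    s = math.sqrt(width * height)
    k = math.floor(canonical_level + math.log2(s / canonical_scale) + eps)
    k = min(max(k, k_min), k_max)  # clamp to the available levels
    return k, k - k_min


level, feature_index = map_level(100, 120)
print(level, feature_index)

# A few more sizes, to show the clamping at both ends.
for w, h in [(20, 20), (224, 224), (500, 500), (1000, 1000)]:
    print((w, h), map_level(w, h))
```

Very small boxes are clamped to the finest level and very large ones to the coarsest, so every proposal is always assigned to one of the four maps.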

Box Head

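This part consists of two fully connected layers. Its output feeds the prediction module that follows, which performs classification and bounding-box delta prediction: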

import torch.nn.functional as F
from torch import nn


class TwoMLPHead(nn.Module):
    """
    Standard heads for FPN-based models

    Arguments:
        in_channels (int): number of input channels
        representation_size (int): size of the intermediate representation
    """

    def __init__(self, in_channels, representation_size):
        super(TwoMLPHead, self).__init__()

        self.fc6 = nn.Linear(in_channels, representation_size)
        self.fc7 = nn.Linear(representation_size, representation_size)

    def forward(self, x):
        x = x.flatten(start_dim=1)

        x = F.relu(self.fc6(x))
        x = F.relu(self.fc7(x))

        return x
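To get a feel for the sizes involved, the sketch below works out the flattened input width and the parameter counts of the two layers. The 7x7 pooled size comes from the text above; the 256 feature channels and the representation size of 1024 are assumptions (they are the values commonly used with this model), so adjust them to your configuration.

```python
def linear_param_count(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


channels, pooled_h, pooled_w = 256, 7, 7  # 256 channels is an assumption
representation_size = 1024                # assumed representation size

# x.flatten(start_dim=1) turns (N, 256, 7, 7) into (N, 12544).
in_channels = channels * pooled_h * pooled_w
fc6_params = linear_param_count(in_channels, representation_size)
fc7_params = linear_param_count(representation_size, representation_size)
print(in_channels, fc6_params, fc7_params)
```

With these numbers almost all of the head's parameters sit in fc6, because it has to absorb the full flattened 7x7 feature map.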

Box Predictor

This part classifies the 1000 proposal boxes and adjusts them once more to obtain more accurate boxes.

class FastRCNNPredictor(nn.Module):
    """
    Standard classification + bounding box regression layers
    for Fast R-CNN.

    Arguments:
        in_channels (int): number of input channels
        num_classes (int): number of output classes (including background)
    """

    def __init__(self, in_channels, num_classes):
        super(FastRCNNPredictor, self).__init__()
        self.cls_score = nn.Linear(in_channels, num_classes)
        self.bbox_pred = nn.Linear(in_channels, num_classes * 4)

    def forward(self, x):
        if x.dim() == 4:
            assert list(x.shape[2:]) == [1, 1]
        x = x.flatten(start_dim=1)
        scores = self.cls_score(x)
        bbox_deltas = self.bbox_pred(x)

        return scores, bbox_deltas
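Because bbox_pred outputs num_classes * 4 values per proposal, each class gets its own group of four deltas. The sketch below splits one flat output row into per-class groups; the three-class setup and the numbers are hypothetical, and split_per_class is a helper made up for this illustration.

```python
def split_per_class(flat_deltas, num_classes):
    """Split one row of num_classes * 4 values into per-class (dx, dy, dw, dh)."""
    if len(flat_deltas) != num_classes * 4:
        raise ValueError("expected num_classes * 4 values")
    return [flat_deltas[i * 4:(i + 1) * 4] for i in range(num_classes)]


num_classes = 3  # hypothetical: background plus two object classes
flat = [0.0, 0.0, 0.0, 0.0,      # class 0 (background)
        0.1, 0.2, 0.3, 0.4,      # class 1
        -0.1, -0.2, -0.3, -0.4]  # class 2
per_class = split_per_class(flat, num_classes)
print(per_class[1])
```

This per-class layout is why the post-processing step later slices off index 0 along the class dimension to drop the background predictions.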

Postprocess Detection

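First, the bbox delta values predicted by the Box Predictor are combined with the proposal boxes to obtain the top-left and bottom-right coordinates of each box. The algorithm is similar to the one used in the RPN; see Box-Coder.Decode, which describes it in detail.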

def postprocess_detections(self,
                           class_logits,    # type: Tensor
                           box_regression,  # type: Tensor
                           proposals,       # type: List[Tensor]
                           image_shapes     # type: List[Tuple[int, int]]
                           ):
    # type: (...) -> Tuple[List[Tensor], List[Tensor], List[Tensor]]
    device = class_logits.device
    num_classes = class_logits.shape[-1]

    boxes_per_image = [boxes_in_image.shape[0] for boxes_in_image in proposals]
    pred_boxes = self.box_coder.decode(box_regression, proposals)
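The decoding step can be sketched for a single box as follows. This is a plain-Python approximation of what a box coder does, not the library code; the weights (10, 10, 5, 5) and the clamp on the size deltas are assumptions based on common defaults for the ROI stage.

```python
import math


def decode_box(delta, proposal, weights=(10.0, 10.0, 5.0, 5.0),
               clip=math.log(1000.0 / 16)):
    """Apply (dx, dy, dw, dh) deltas to a proposal given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = proposal
    width, height = x2 - x1, y2 - y1
    ctr_x, ctr_y = x1 + 0.5 * width, y1 + 0.5 * height

    dx, dy = delta[0] / weights[0], delta[1] / weights[1]
    dw = min(delta[2] / weights[2], clip)  # clamp to avoid exp() overflow
    dh = min(delta[3] / weights[3], clip)

    # Shift the center by a fraction of the box size, scale the size by exp().
    pred_ctr_x, pred_ctr_y = dx * width + ctr_x, dy * height + ctr_y
    pred_w, pred_h = math.exp(dw) * width, math.exp(dh) * height
    return (pred_ctr_x - 0.5 * pred_w, pred_ctr_y - 0.5 * pred_h,
            pred_ctr_x + 0.5 * pred_w, pred_ctr_y + 0.5 * pred_h)


proposal = (10.0, 20.0, 110.0, 140.0)
unchanged = decode_box((0.0, 0.0, 0.0, 0.0), proposal)
shifted = decode_box((1.0, 0.0, 0.0, 0.0), proposal)
print(unchanged, shifted)
```

A zero delta leaves the proposal untouched, while a dx of 1.0 with weight 10 moves the box right by one tenth of its width.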

Next, softmax is applied to the class logits to turn them into per-class scores:

    pred_scores = F.softmax(class_logits, -1)
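For reference, here is what softmax does to one row of logits, written with the standard library only. The logits are made-up values for a three-class case.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one row of class logits."""
    peak = max(logits)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])  # hypothetical logits
print(probs, sum(probs))
```

The outputs are positive and sum to 1, so they can be compared against a score threshold in the steps that follow.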

Then the boxes, scores, and image_shape of each image are retrieved in turn:

    pred_boxes_list = pred_boxes.split(boxes_per_image, 0)
    pred_scores_list = pred_scores.split(boxes_per_image, 0)

    all_boxes = []
    all_scores = []
    all_labels = []
    for boxes, scores, image_shape in zip(pred_boxes_list, pred_scores_list, image_shapes):

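Then the box coordinates are clipped from the 800x1216 range to the 800x1202 image; for details see "torchvision Faster-RCNN ResNet-50 FPN Code Walkthrough (Image Transforms and Coordinates)".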

        boxes = box_ops.clip_boxes_to_image(boxes, image_shape)
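Clipping simply clamps each coordinate into the image extent. The sketch below does this for a single box in plain Python, assuming image_shape is given as (height, width); the sample box is made up.

```python
def clip_box_to_image(box, image_shape):
    """Clamp (x1, y1, x2, y2) into an image of shape (height, width)."""
    height, width = image_shape

    def clamp(value, upper):
        return min(max(value, 0.0), float(upper))

    x1, y1, x2, y2 = box
    return (clamp(x1, width), clamp(y1, height),
            clamp(x2, width), clamp(y2, height))


# Hypothetical box that sticks out of an 800x1202 (height x width) image.
clipped = clip_box_to_image((-5.0, 10.0, 1210.0, 805.0), (800, 1202))
print(clipped)
```

Note that x coordinates are bounded by the width and y coordinates by the height, which is easy to mix up because the shape is stored height first.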

A labels tensor is created to hold the label index of each detection bbox that survives filtering:

        # create labels for each prediction
        labels = torch.arange(num_classes, device=device)
        labels = labels.view(1, -1).expand_as(scores)

The background labels, scores, and boxes are removed:

        # remove predictions with the background label
        boxes = boxes[:, 1:]
        scores = scores[:, 1:]
        labels = labels[:, 1:]

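Every per-class prediction is then flattened into a separate instance and low-scoring detections are removed; here score_thresh is 0.05: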

        # batch everything, by making every class prediction be a separate instance
        boxes = boxes.reshape(-1, 4)
        scores = scores.reshape(-1)
        labels = labels.reshape(-1)

        # remove low scoring boxes
        inds = torch.nonzero(scores > self.score_thresh).squeeze(1)
        boxes, scores, labels = boxes[inds], scores[inds], labels[inds]
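The last three steps (dropping the background column, flattening, and thresholding) can be sketched together in plain Python. The nested lists stand in for the tensors, the data is made up, and flatten_and_filter is a name introduced for this illustration.

```python
def flatten_and_filter(scores, boxes, score_thresh=0.05):
    """scores[n][c] and boxes[n][c] are per-proposal, per-class; class 0 is background."""
    kept = []
    for row_scores, row_boxes in zip(scores, boxes):
        for label in range(1, len(row_scores)):      # skip the background class
            if row_scores[label] > score_thresh:     # strict comparison, as in the code above
                kept.append((row_boxes[label], row_scores[label], label))
    return kept


# Hypothetical data: two proposals, three classes (0 is background).
scores = [[0.10, 0.85, 0.05],
          [0.96, 0.03, 0.01]]
boxes = [[(0, 0, 0, 0), (10, 10, 50, 60), (12, 11, 48, 58)],
         [(0, 0, 0, 0), (70, 80, 90, 99), (71, 82, 88, 97)]]
kept = flatten_and_filter(scores, boxes)
print(kept)
```

Only one (box, score, label) triple survives here: the second proposal is mostly background, and a score exactly equal to the threshold is not kept.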

Empty boxes are removed:

        # remove empty boxes
        keep = box_ops.remove_small_boxes(boxes, min_size=1e-2)
        boxes, scores, labels = boxes[keep], scores[keep], labels[keep]
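A minimal plain-Python sketch of this filter, assuming a box counts as empty when either side is shorter than min_size. The sample boxes are made up.

```python
def remove_small_boxes(boxes, min_size):
    """Return the indices of boxes whose width and height are both >= min_size."""
    keep = []
    for index, (x1, y1, x2, y2) in enumerate(boxes):
        if (x2 - x1) >= min_size and (y2 - y1) >= min_size:
            keep.append(index)
    return keep


boxes = [(0.0, 0.0, 10.0, 10.0),   # normal box
         (5.0, 5.0, 5.0, 20.0),    # zero width
         (1.0, 1.0, 1.005, 3.0)]   # width below 1e-2
keep = remove_small_boxes(boxes, min_size=1e-2)
print(keep)
```

Degenerate boxes like these can appear after clipping, when a proposal lies almost entirely outside the image.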

After these steps, the boxes and labels look roughly like this:

[Figure: detection boxes and labels before NMS]

Finally, non-maximum suppression (NMS³) removes the overlapping boxes:

        # non-maximum suppression, independently done per class
        keep = box_ops.batched_nms(boxes, scores, labels, self.nms_thresh)
        # keep only topk scoring predictions
        keep = keep[:self.detections_per_img]
        boxes, scores, labels = boxes[keep], scores[keep], labels[keep]
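The idea behind per-class NMS can be sketched with a simple greedy loop. This is an illustrative plain-Python version, not the library's optimized implementation; the IoU threshold of 0.5 and the sample data are assumptions.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def batched_nms(boxes, scores, labels, iou_thresh):
    """Greedy NMS in which boxes only suppress others with the same label."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(labels[i] != labels[j] or iou(boxes[i], boxes[j]) <= iou_thresh
               for j in keep):
            keep.append(i)
    return keep  # indices sorted by descending score


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (0, 0, 10, 10), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7, 0.6]
labels = [1, 1, 2, 1]
keep = batched_nms(boxes, scores, labels, iou_thresh=0.5)
print(keep)
```

Box 1 overlaps box 0 heavily and shares its label, so it is dropped. Box 2 covers the same area but has a different label, so it survives, which is the point of running NMS independently per class.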

The resulting boxes and labels are:

[Figure: detection boxes and labels after NMS]

Conclusion

After ROI processing, the detected objects are already fairly accurate, and each detection also carries a score, for example:

[
	0.9996865, 0.999302, 0.9909377, 
	0.964582, 0.8458481, 0.79095364, 
	0.3160024, 0.16850659, 0.16231589, 
	0.106609166, 0.07780073, 0.07285354, 0.06343418
]

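Low-scoring detections can be filtered out further at this point, for example by setting a threshold of 0.5 and treating only the detections that score above it as the final detected objects.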
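Applied to the scores listed above, such a final filter could look like the sketch below. The 0.5 threshold is the example value from the text, and the variable names are made up.

```python
scores = [
    0.9996865, 0.999302, 0.9909377,
    0.964582, 0.8458481, 0.79095364,
    0.3160024, 0.16850659, 0.16231589,
    0.106609166, 0.07780073, 0.07285354, 0.06343418,
]
final_thresh = 0.5  # example threshold from the text

# Indices of the detections that pass the final threshold.
final_indices = [i for i, score in enumerate(scores) if score > final_thresh]
print(final_indices, len(final_indices))
```

With this threshold the first six detections are kept and the remaining seven are discarded.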