Adding an SE module to ResNet

Reposted

mob64ca14031c97 2024-10-04 11:11:20


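The anchors are translation-invariant anchor boxes, similar to those of the RPN (region proposal network). From pyramid level P3 to P7 the anchor area grows level by level from 32*32 to 512*512 (why, and how is it computed? The base side length simply doubles at each level: 32, 64, 128, 256, 512). Each level uses the aspect ratios {1:2, 1:1, 2:1} and, in addition, the size multipliers $2^0, 2^{\frac{1}{3}}, 2^{\frac{2}{3}}$, so each level has 9 anchors per location, ....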
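The count of 9 anchors per location and the resulting side lengths can be checked with a few lines of NumPy. This is only a sketch: the `sizes`, `ratios` and `scales` values below are assumed defaults taken from the description above, not read from any configuration file.

```python
import numpy as np

# Assumed values, one base size per pyramid level P3..P7.
sizes = [32, 64, 128, 256, 512]
ratios = np.array([0.5, 1.0, 2.0])
scales = np.array([2 ** 0, 2 ** (1 / 3), 2 ** (2 / 3)])

# Every (ratio, scale) pair gives one anchor shape per location.
anchors_per_location = len(ratios) * len(scales)

# Side length of the square (ratio 1) anchors: one row per level, one column per scale.
side_lengths = np.outer(sizes, scales)

print(anchors_per_location)
print(np.round(side_lengths, 1))
```

The ratios then reshape each of these squares while keeping its area, which is what generate_anchors() does further below.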

anchors.py

anchor_targets_bbox() generates the anchor targets for box detection.

def anchor_targets_bbox(
    anchors,
    image_group,
    annotations_group,  # ground-truth annotations (x1, y1, x2, y2, label); note that this is annotations_group, one entry per image
    num_classes,
    negative_overlap=0.4,
    positive_overlap=0.5
):
    """ Generate anchor targets for bbox detection.

    Args
        anchors: np.array of annotations of shape (N, 4) for (x1, y1, x2, y2).
        image_group: List of BGR images.
        annotations_group: List of annotations (np.array of shape (N, 5) for (x1, y1, x2, y2, label)).
        num_classes: Number of classes to predict.
        mask_shape: If the image is padded with zeros, mask_shape can be used to mark the relevant part of the image.
        negative_overlap: IoU overlap for negative anchors (all anchors with overlap < negative_overlap are negative).
        positive_overlap: IoU overlap for positive anchors (all anchors with overlap > positive_overlap are positive).

    Returns
        labels_batch: batch that contains labels & anchor states (np.array of shape (batch_size, N, num_classes + 1),
                      where N is the number of anchors for an image and the last column defines the anchor state (-1 for ignore, 0 for bg, 1 for fg).
        regression_batch: batch that contains bounding-box regression targets for an image & anchor states (np.array of shape (batch_size, N, 4 + 1),
                      where N is the number of anchors for an image, the first 4 columns define regression targets for (x1, y1, x2, y2) and the
                      last column defines anchor states (-1 for ignore, 0 for bg, 1 for fg).
    """

    assert(len(image_group) == len(annotations_group)), "The length of the images and annotations need to be equal."
    assert(len(annotations_group) > 0), "No data received to compute anchor targets for."
    for annotations in annotations_group:
        assert('bboxes' in annotations), "Annotations should contain bboxes."
        assert('labels' in annotations), "Annotations should contain labels."

    batch_size = len(image_group)  # number of images in the batch

    # 3-D array of shape (batch_size, anchors.shape[0], 5).
    # anchors.shape[0] is large, e.g. 43803 or 39492, and differs from batch to batch. The 43803 comes from
    # anchors_for_shape(): for a 416*560 image, 43803 anchors are constructed.
    regression_batch  = np.zeros((batch_size, anchors.shape[0], 4 + 1), dtype=keras.backend.floatx())
    # Classification targets, i.e. the class of each anchor: a 3-D array of shape (batch_size, anchors.shape[0], num_classes + 1).
    # labels_batch holds batch_size elements. With batch_size=8 and 3 target classes (person, car, plane),
    # each element has shape [43803, 4], where the 4 columns are [person, car, plane, positive/negative state].
    labels_batch      = np.zeros((batch_size, anchors.shape[0], num_classes + 1), dtype=keras.backend.floatx())

    # compute labels and regression targets
    # Loop over every image in the batch. An image may contain several targets,
    # so annotations['bboxes'].shape[0] >= 1 for an annotated image.
    for index, (image, annotations) in enumerate(zip(image_group, annotations_group)):
        # Example value of annotations:
        # {'labels': array([ 0.,  0.]), 'bboxes': array([[  67.97791573,  103.88162763,  448.83239265,  367.84012947],
        #                                                [ 439.76378026,  195.41451562,  569.55807188,  263.01028949]])}
        if annotations['bboxes'].shape[0]:
            # obtain indices of gt annotations with the greatest overlap
            # Compute the IoU between the 43803 anchors and the ground-truth boxes of this image
            # (compute_gt_annotations is analysed in detail below).
            positive_indices, ignore_indices, argmax_overlaps_inds = compute_gt_annotations(anchors, annotations['bboxes'], negative_overlap, positive_overlap)

            # Classification targets: anchors with 0.4 < IoU < 0.5 are ignored (state -1).
            labels_batch[index, ignore_indices, -1]       = -1
            # Classification targets: anchors with IoU >= 0.5 get state 1. index selects the image: when the loop
            # reaches the 3rd image of the batch, the 3rd element of labels_batch is modified. That element is a
            # [43803, 4] matrix (assuming 3 target classes: person, car, plane), and its last column is set to
            # -1 or 1 depending on the IoU between the anchor and the ground-truth boxes.
            labels_batch[index, positive_indices, -1]     = 1

            # Box regression targets: anchors with 0.4 < IoU < 0.5 are ignored (state -1).
            regression_batch[index, ignore_indices, -1]   = -1
            # Box regression targets: anchors with IoU >= 0.5 get state 1.
            regression_batch[index, positive_indices, -1] = 1

            # compute target class labels
            # Each anchor has 4 values [c_person, c_car, c_plane, state]. The state was set above; now the class is set.
            # For each positive anchor, find the target in this image with the largest IoU, then look up that
            # target's class in annotations['labels']. If the anchor matches a car, its row becomes [0, 1, 0, 1].
            labels_batch[index, positive_indices, annotations['labels'][argmax_overlaps_inds[positive_indices]].astype(int)] = 1

            # Compute the offsets between each anchor and its matched ground-truth box. Note that the image may
            # contain only 2 targets, i.e. annotations['bboxes'].shape == [2, 4], but indexing with
            # annotations['bboxes'][argmax_overlaps_inds, :] expands the argument to the same number of rows as
            # anchors (43803). The IoU of every anchor with every target was computed earlier; if an anchor
            # overlaps target 3 the most, box 3 is passed in for that anchor. See the indexing example that follows.
            regression_batch[index, :, :-1] = bbox_transform(anchors, annotations['bboxes'][argmax_overlaps_inds, :])

            # Indexing example:
            #   import numpy as np
            #   boxes = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])  # two targets in one image
            #   argmax_overlaps_inds = np.array([0, 1, 1, 1])   # 4 anchors; each entry is the index of the target with the largest IoU
            #   print(boxes[argmax_overlaps_inds, :])           # picks boxes[0], boxes[1], boxes[1], boxes[1]
            #   # [[1 2 3 4]
            #   #  [5 6 7 8]
            #   #  [5 6 7 8]
            #   #  [5 6 7 8]]

        # ignore annotations outside of image
        if image.shape:
            # centre of each rectangular anchor
            anchors_centers = np.vstack([(anchors[:, 0] + anchors[:, 2]) / 2, (anchors[:, 1] + anchors[:, 3]) / 2]).T
            # True where the anchor centre falls outside the image
            indices = np.logical_or(anchors_centers[:, 0] >= image.shape[1], anchors_centers[:, 1] >= image.shape[0])

            # anchors whose centre lies outside the image are marked -1, i.e. ignored
            labels_batch[index, indices, -1]     = -1
            regression_batch[index, indices, -1] = -1

    return regression_batch, labels_batch
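The class assignment `labels_batch[index, positive_indices, ...] = 1` combines a boolean mask with an integer index array, which is easy to misread. The toy example below is a sketch with made-up values (4 anchors, 2 ground-truth boxes, 3 classes); none of the numbers come from the original post.

```python
import numpy as np

num_classes = 3                       # e.g. person (0), car (1), plane (2)
gt_labels = np.array([1.0, 2.0])      # hypothetical: box 0 is a car, box 1 is a plane
argmax_overlaps_inds = np.array([0, 1, 1, 0])            # best-matching box per anchor
positive_indices = np.array([True, False, True, False])  # anchors 0 and 2 are positive

# One row per anchor: [person, car, plane, state]
labels = np.zeros((4, num_classes + 1))
labels[positive_indices, -1] = 1
# For each positive anchor, set the column of the class of its matched box.
labels[positive_indices, gt_labels[argmax_overlaps_inds[positive_indices]].astype(int)] = 1

print(labels)
```

Anchor 0 is matched to box 0 (car) and anchor 2 to box 1 (plane); the other two rows stay all zero, which marks them as background.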

As the code above shows, an anchor has 3 states:

1: positive sample, IoU >= 0.5;

-1: ignored sample, 0.4 < IoU < 0.5;

0: negative sample, IoU <= 0.4.
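The three states can be reproduced from the thresholds alone. The helper `anchor_states` below is hypothetical (it does not exist in anchors.py); it only restates the threshold logic used in compute_gt_annotations.

```python
import numpy as np

def anchor_states(max_iou, negative_overlap=0.4, positive_overlap=0.5):
    """Hypothetical helper: map each anchor's best IoU to a state (-1, 0 or 1)."""
    max_iou = np.asarray(max_iou, dtype=float)
    states = np.zeros(max_iou.shape, dtype=int)         # default 0: negative (background)
    positive = max_iou >= positive_overlap              # 1: positive
    ignore = (max_iou > negative_overlap) & ~positive   # -1: ignored
    states[ignore] = -1
    states[positive] = 1
    return states

states = anchor_states([0.0, 0.3, 0.45, 0.5, 0.8])
print(states.tolist())  # [0, 0, -1, 1, 1]
```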

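Note:

If an anchor has an IoU of 0 with every target, np.argmax returns index 0, so the anchor is matched to box 1 by default. I feel this has a drawback: if box 1 sits in the top-left corner of the image and the anchor sits in the bottom-right corner, asking that anchor to regress to box 1 is clearly difficult. That said, such an anchor keeps state 0, so if the regression loss only uses anchors with state 1 (as I believe keras-retinanet does), this target should not take part in training.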

The matching between anchors and ground-truth boxes described above can be summarised in a figure:

[Figure: matching anchors to ground-truth boxes]

For every annotation, the following steps are carried out:

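step 1. Call compute_gt_annotations(anchors, annotations['bboxes'], negative_overlap=0.4, positive_overlap=0.5) to find, for each anchor, the ground-truth annotation with the largest overlap. Based on the IoU between the anchor and that box: IoU <= 0.4 is negative, 0.4 < IoU < 0.5 is ignored, IoU >= 0.5 is positive.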

def compute_gt_annotations(
    anchors,
    annotations,
    negative_overlap=0.4,
    positive_overlap=0.5
):  # anchors and annotations differ in the number of rows, but both have 4 columns
    """ Obtain indices of gt annotations with the greatest overlap.

    Args
        anchors: np.array of annotations of shape (N, 4) for (x1, y1, x2, y2).
        annotations: np.array of shape (M, 4) for (x1, y1, x2, y2), the ground-truth annotations.
        negative_overlap: IoU overlap for negative anchors (all anchors with overlap < negative_overlap are negative).
        positive_overlap: IoU overlap for positive anchors (all anchors with overlap > positive_overlap are positive).

    Returns
        positive_indices: indices of positive anchors
        ignore_indices: indices of ignored anchors
        argmax_overlaps_inds: ordered overlaps indices
    """
   
    # anchors has many rows, e.g. 43803; annotations holds the ground-truth boxes, usually only as many as there
    # are targets in the image. overlaps contains the IoU of every anchor with every ground-truth box,
    # e.g. with 8 targets in the image overlaps has shape [43803, 8].
    overlaps = compute_overlap(anchors.astype(np.float64), annotations.astype(np.float64))

    # Column index of the largest value in each row of overlaps. Each row is one anchor, and we pick the
    # ground-truth box it overlaps most, so argmax_overlaps_inds has shape [43803]. Indices are 0-based:
    # a value of 3 means the anchor best matches the box with index 3 (so the image has at least 4 targets),
    # e.g. argmax_overlaps_inds=[0 0 0 ... 3 3 3].
    argmax_overlaps_inds = np.argmax(overlaps, axis=1)
    # Pick the actual IoU values out of overlaps: for each row, the entry in the selected column.
    max_overlaps = overlaps[np.arange(overlaps.shape[0]), argmax_overlaps_inds]

    # assign "dont care" labels
    # max_overlaps is a 1-D vector of shape (43803,). positive_indices is True where the IoU reaches the
    # threshold 0.5 and False otherwise, e.g. [False ..., True, False]. It only records whether the anchor
    # covers some target; it does not say whether that is target 1 or target 2.
    positive_indices = max_overlaps >= positive_overlap
    # True where 0.4 < IoU < 0.5, False otherwise.
    ignore_indices = (max_overlaps > negative_overlap) & ~positive_indices
    # All three return values are vectors of length 43803. positive_indices and ignore_indices are boolean and
    # say whether the anchor is positive or ignored (anchors that are neither are negative). argmax_overlaps_inds
    # holds integers: entry 9 equal to 5 means anchor 9 has its largest IoU with target 5.
    return positive_indices, ignore_indices, argmax_overlaps_inds
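To see what the three return values look like, the same operations can be run on a tiny hand-made IoU matrix. The values below are invented for illustration (4 anchors, 2 ground-truth boxes), and the thresholds 0.4 and 0.5 are the defaults discussed above.

```python
import numpy as np

# Toy IoU matrix: 4 anchors (rows) x 2 ground-truth boxes (columns).
overlaps = np.array([
    [0.10, 0.00],
    [0.45, 0.20],
    [0.30, 0.70],
    [0.00, 0.00],   # overlaps nothing: argmax falls back to index 0
])

argmax_overlaps_inds = np.argmax(overlaps, axis=1)
max_overlaps = overlaps[np.arange(overlaps.shape[0]), argmax_overlaps_inds]

positive_indices = max_overlaps >= 0.5
ignore_indices = (max_overlaps > 0.4) & ~positive_indices

print(argmax_overlaps_inds.tolist())  # [0, 0, 1, 0]
print(positive_indices.tolist())      # [False, False, True, False]
print(ignore_indices.tolist())        # [False, True, False, False]
```

The last row shows the fallback behaviour: an anchor with no overlap at all is still assigned box index 0, although it is neither positive nor ignored.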

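step1.1: Call compute_overlap(anchors, annotations['bboxes']) to compute the overlap between the anchors and the ground-truth boxes. The function returns the IoU of every anchor with every ground-truth box. For example, with 100 anchors and 3 targets (persons) in the image, the return value has shape [100, 3]: 100 is the number of anchors and 3 is the number of boxes.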

$overlap=\begin{bmatrix} IOU_{an_1,box_1} & IOU_{an_1,box_2} &IOU_{an_1,box_3} \\ IOU_{an_2,box_1} & IOU_{an_2,box_2} &IOU_{an_2,box_3} \\ IOU_{an_3,box_1} & IOU_{an_3,box_2} &IOU_{an_3,box_3} \\ ... & ...&... \\ IOU_{an_{100},box_1} & IOU_{an_{100},box_2} &IOU_{an_{100},box_3} \end{bmatrix}$

def compute_overlap(a, b):
    # a: [N, 4]
    # b: [M, 4]
    area = (b[:, 2] - b[:, 0] + 1) * (b[:, 3] - b[:, 1] + 1)  # area of each rectangle in b
    iw = np.minimum(np.expand_dims(a[:, 2], axis=1), b[:, 2]) - np.maximum(np.expand_dims(a[:, 0], axis=1), b[:, 0]) + 1
    ih = np.minimum(np.expand_dims(a[:, 3], axis=1), b[:, 3]) - np.maximum(np.expand_dims(a[:, 1], axis=1), b[:, 1]) + 1
    # Let a contain N boxes and b contain M boxes.
    # np.expand_dims(..., axis=1) turns an array of shape (N,) into (N, 1).
    # np.minimum of shapes (N, 1) and (M,) broadcasts to an (N, M) matrix that compares every a with every b.
    # Taking the smaller x2 / y2 and the larger x1 / y1 gives the extent of the intersection.
    # iw and ih are the width and height of the intersection; both have shape (N, M), one entry per anchor / ground-truth pair.
    iw = np.maximum(iw, 0)
    ih = np.maximum(ih, 0)  # iw and ih must not be negative

    # union area: S_a + S_b - intersection_ab
    ua = np.expand_dims((a[:, 2] - a[:, 0] + 1) *(a[:, 3] - a[:, 1] + 1), axis=1) + area - iw * ih
    ua = np.maximum(ua, np.finfo(float).eps)

    intersection = iw * ih  # intersection area
    return intersection / ua  # (N, M) intersection over union
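A small self-contained check makes the broadcasting and the `+ 1` pixel convention concrete. `iou_matrix` below is a hypothetical re-statement of the same computation written for this example, and the boxes are made up.

```python
import numpy as np

def iou_matrix(a, b):
    """Sketch: IoU of every box in a (N, 4) with every box in b (M, 4), using the +1 pixel convention."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    # Broadcast (N, 1) against (1, M) to get (N, M) intersection widths and heights.
    iw = np.minimum(a[:, None, 2], b[None, :, 2]) - np.maximum(a[:, None, 0], b[None, :, 0]) + 1
    ih = np.minimum(a[:, None, 3], b[None, :, 3]) - np.maximum(a[:, None, 1], b[None, :, 1]) + 1
    inter = np.clip(iw, 0, None) * np.clip(ih, 0, None)
    area_a = (a[:, 2] - a[:, 0] + 1) * (a[:, 3] - a[:, 1] + 1)
    area_b = (b[:, 2] - b[:, 0] + 1) * (b[:, 3] - b[:, 1] + 1)
    union = area_a[:, None] + area_b[None, :] - inter
    return inter / np.maximum(union, np.finfo(float).eps)

anchors = np.array([[0, 0, 9, 9], [20, 20, 29, 29]])
gt_boxes = np.array([[5, 5, 14, 14], [0, 0, 9, 9]])
iou = iou_matrix(anchors, gt_boxes)
print(iou)
```

For the first pair, both boxes cover 10*10 = 100 pixels and share a 5*5 = 25 pixel patch, so the IoU is 25 / (100 + 100 - 25) = 1/7; the second anchor overlaps neither box and gets 0 in both columns.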

step 2. Call bbox_transform(anchors, annotations['bboxes'][argmax_overlaps_inds, :]) to compute the offsets between each anchor and its matched ground-truth annotation.

def bbox_transform(anchors, gt_boxes, mean=None, std=None):
    """Compute bounding-box regression targets for an image."""

    if mean is None:
        mean = np.array([0, 0, 0, 0])
    if std is None:
        std = np.array([0.2, 0.2, 0.2, 0.2])

    if isinstance(mean, (list, tuple)):
        mean = np.array(mean)
    elif not isinstance(mean, np.ndarray):
        raise ValueError('Expected mean to be a np.ndarray, list or tuple. Received: {}'.format(type(mean)))

    if isinstance(std, (list, tuple)):
        std = np.array(std)
    elif not isinstance(std, np.ndarray):
        raise ValueError('Expected std to be a np.ndarray, list or tuple. Received: {}'.format(type(std)))

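    anchor_widths  = anchors[:, 2] - anchors[:, 0]  # anchor width
    anchor_heights = anchors[:, 3] - anchors[:, 1]  # anchor height

    targets_dx1 = (gt_boxes[:, 0] - anchors[:, 0]) / anchor_widths   # offset between ground-truth x1 and anchor x1, divided by the anchor width
    targets_dy1 = (gt_boxes[:, 1] - anchors[:, 1]) / anchor_heights  # offset between ground-truth y1 and anchor y1, divided by the anchor height
    targets_dx2 = (gt_boxes[:, 2] - anchors[:, 2]) / anchor_widths   # offset between ground-truth x2 and anchor x2, divided by the anchor width
    targets_dy2 = (gt_boxes[:, 3] - anchors[:, 3]) / anchor_heights  # offset between ground-truth y2 and anchor y2, divided by the anchor height

    targets = np.stack((targets_dx1, targets_dy1, targets_dx2, targets_dy2))
    targets = targets.T
    # mean is [0 0 0 0], so this only divides the relative offsets by std. Why divide by std? Presumably to
    # normalise the targets: with std=0.2 the small relative offsets are scaled up by a factor of 5.
    targets = (targets - mean) / std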

    return targets
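A worked example with one anchor shows what the targets look like and that the transform can be inverted. `encode` and `decode` are hypothetical names for this sketch; `encode` follows the same formula as bbox_transform with the default mean 0 and std 0.2, and `decode` is the assumed inverse that a prediction step would apply.

```python
import numpy as np

def encode(anchors, gt_boxes, mean=0.0, std=0.2):
    """Sketch of the target computation: offsets relative to anchor width/height, then normalised."""
    wh = np.tile(anchors[:, 2:] - anchors[:, :2], 2)   # columns: w, h, w, h
    return ((gt_boxes - anchors) / wh - mean) / std

def decode(anchors, targets, mean=0.0, std=0.2):
    """Assumed inverse of encode: turn targets back into (x1, y1, x2, y2) boxes."""
    wh = np.tile(anchors[:, 2:] - anchors[:, :2], 2)
    return anchors + (targets * std + mean) * wh

anchors = np.array([[0.0, 0.0, 10.0, 10.0]])
gt_boxes = np.array([[1.0, 2.0, 12.0, 11.0]])

targets = encode(anchors, gt_boxes)
recovered = decode(anchors, targets)
print(targets)
print(recovered)
```

Here the raw offsets are (1, 2, 2, 1) pixels on a 10*10 anchor, i.e. (0.1, 0.2, 0.2, 0.1) relative to the anchor size, and dividing by 0.2 gives (0.5, 1.0, 1.0, 0.5).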

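The generate_anchors() function

generate_anchors(base_size,ratios,scales)

Purpose: generate anchor (reference) windows by enumerating ratios * scales. The function is called from _misc.py during model conversion; at first I did not know whether it is also used during training.

Answer: it is used during training, through train.py --> callbacks

callbacks = create_callbacks(model, prediction_model, validation_generator, args)

Here the prediction model prediction_model contains the four layers anchors_0, anchors_1, anchors_2, anchors_3 produced by generate_anchors().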

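Parameters:

base_size: this value comes from sizes=[32,64,128,256] in the configuration; each call to generate_anchors() receives one size. It sets the size of the initial region, which is similar to a receptive field: after many convolution and pooling layers, one point on the feature map corresponds to a region of the original image. With a value of 32, one point on the feature map corresponds to a 32*32 region of the original image.

ratios=[0.5,1,2]: keep the area of the 32*32 region unchanged and reshape it so that height:width (h/w, as in the code below) follows the given ratios [0.5,1,2]=[1:2, 1:1, 2:1]. The ratios should be chosen according to the task: to detect heads and safety helmets, which are roughly square, ratios=[1] makes every anchor square.

scales=[1,1.25,1.58]: enlarge the width and height of the input region by the three factors 1, 1.25 and 1.58. For example, the 32*32 region becomes (32*1.25)*(32*1.25)=40*40 and (32*1.58)*(32*1.58)=50.56*50.56, roughly 51*51.

num_anchors: the number of anchors generated, = len(ratios)*len(scales) = 9 (with Dr. Song's configuration file it is 3).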

step1: Create a 9*4 zero matrix, anchors=np.zeros((num_anchors, 4))

step2: Overwrite columns 2 and 3 of anchors (0-based), i.e. the (x_max, y_max) of each box, with base_size*scales. For example, with base_size=32 the x_max of the first three rows of anchors becomes

$x_{max}=base_{size}*scales=32*\begin{bmatrix} 1\\ 1.25\\ 1.58 \end{bmatrix}= \begin{bmatrix} 32\\ 40\\ 50.56 \end{bmatrix}$

y_max takes the same values as x_max, and the middle three rows and the last three rows of anchors repeat the first three, so anchors becomes

$anchors=\begin{bmatrix} 0 & 0& 32& 32\\ 0 & 0& 40 & 40\\ 0 & 0 & 50.56 &50.56 \\ 0 & 0& 32& 32\\ 0 & 0& 40 & 40\\ 0 & 0 & 50.56 &50.56 \\ 0 & 0& 32& 32\\ 0 & 0& 40 & 40\\ 0 & 0 & 50.56 &50.56 \end{bmatrix}$

The area of each anchor (box) is then x_max*y_max, which gives three sizes: 32*32=1024, 40*40=1600, 50.56*50.56=2556

step3: Replace the third column of anchors, x_max, dividing elementwise by the square root of the ratio. After this the x_max values are all different:

$x'_{max}=x_{max}./\sqrt{ratios}=(32,40,50.56,32,40,50.56,32,40,50.56)./\sqrt{(0.5,0.5,0.5,1,1,1,2,2,2)}=(45.25,56.57,71.50,32,40,50.56,22.63,28.28,35.75)$

step4: Replace the fourth column of anchors, y_max, using the given ratio: y_max=x_max*ratios

$y_{max}=x'_{max}.*ratios=(45.25,56.57,71.50,32,40,50.56,22.63,28.28,35.75).*(0.5,0.5,0.5,1,1,1,2,2,2)=(22.63,28.28,35.75,32,40,50.56,45.25,56.57,71.50)$

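step5: The anchors obtained so far have box coordinates (0,0,x_max,y_max), which can be read as the (x_ctr, y_ctr, w, h) form. Now convert them to the usual (x1, y1, x2, y2) form by shifting each box so that its centre lies at the origin. The width and height of the anchor are unchanged by the shift.

Summary: base_size first fixes the size of a basic square box, e.g. 32*32.

scales rescales the base_size box by the given factors; after scaling the box is still square.

ratios controls the height/width ratio of the box and turns the squares above into rectangles.

Implementation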

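    # Tile scales like floor tiles, twice along the rows and len(ratios)=3 times along the columns, multiply the
    # whole matrix by base_size (32, 64, 128 or 256), then transpose to get a 9x2 matrix.
    # This overwrites columns 2 and 3 of anchors, i.e. the (x2, y2) values.
    anchors[:, 2:] = base_size * np.tile(scales, (2, len(ratios))).T

    # compute areas of anchors
    areas = anchors[:, 2] * anchors[:, 3]  # area of each anchor (box)

    # correct for ratios; np.repeat(ratios, len(scales)) = [ 0.5  0.5  0.5  1.   1.   1.   2.   2.   2. ]
    # Each ratio is repeated 3 times. Since area = w * h and h = ratio * w, the new width is
    # w = np.sqrt(area / ratio); for the square boxes here this equals x_max / np.sqrt(ratio).
    anchors[:, 2] = np.sqrt(areas / np.repeat(ratios, len(scales)))  # new width x2

    # ratios is the height/width ratio of a box: the height y2 follows from the width x2 and the ratio
    anchors[:, 3] = anchors[:, 2] * np.repeat(ratios, len(scales))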

After this code has run, one base_size has produced 9 anchors, and all 9 share the same centre point.
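Steps 1 to 5 can be put together into one runnable function. This is a sketch reconstructed from the steps described in this post, with the scales 1, 1.25, 1.58 used in the worked numbers; the name `generate_anchors_sketch` is made up, and the centring lines for step 5 are my assumption of how the shift is implemented.

```python
import numpy as np

def generate_anchors_sketch(base_size=32, ratios=(0.5, 1.0, 2.0), scales=(1.0, 1.25, 1.58)):
    """Sketch: build len(ratios) * len(scales) anchors (x1, y1, x2, y2) centred on the origin."""
    ratios = np.asarray(ratios, dtype=float)
    scales = np.asarray(scales, dtype=float)
    num_anchors = len(ratios) * len(scales)

    # step 1: zero matrix
    anchors = np.zeros((num_anchors, 4))
    # step 2: square boxes (0, 0, base_size * scale, base_size * scale)
    anchors[:, 2:] = base_size * np.tile(scales, (2, len(ratios))).T
    # steps 3 and 4: keep the area, reshape to the requested height/width ratio
    areas = anchors[:, 2] * anchors[:, 3]
    anchors[:, 2] = np.sqrt(areas / np.repeat(ratios, len(scales)))
    anchors[:, 3] = anchors[:, 2] * np.repeat(ratios, len(scales))
    # step 5 (assumed implementation): (0, 0, w, h) -> (-w/2, -h/2, w/2, h/2)
    anchors[:, 0::2] -= np.tile(anchors[:, 2] * 0.5, (2, 1)).T
    anchors[:, 1::2] -= np.tile(anchors[:, 3] * 0.5, (2, 1)).T
    return anchors

anchors = generate_anchors_sketch()
widths = anchors[:, 2] - anchors[:, 0]
heights = anchors[:, 3] - anchors[:, 1]
print(np.round(widths, 2))
print(np.round(heights, 2))
```

The printed widths and heights should match the x'_max and y_max vectors worked out in step 3 and step 4, and every row sums to zero in x and in y, which is the shared centre mentioned above.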

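[Figure: the 9 anchors generated from one base_size]

Summary: for a 416*560 image, placing the anchors from generate_anchors() at every position of every pyramid level (this is what anchors_for_shape() does) produces 43803 anchors in total.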

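# First loop:
# base_size=32: first generate 9 anchors with an area of about 32*32, all centred on the top-left corner. Then, from feature map 1 = [52,70], generate 52*70=3640 anchor positions.
# These 3640 positions have zero area but are spread evenly over the original image, 8 pixels apart. Each position is then used as a centre for 9 anchors,
# so this loop produces 3640*9=32760 anchors.
# Second loop:
# base_size=64: first generate 9 anchors with an area of about 64*64, all centred on the top-left corner. Then, from feature map 2 = [26,35], generate 26*35=910 anchor positions.
# These 910 positions have zero area but are spread evenly over the original image, 16 pixels apart. This loop produces 9*910=8190 anchors.
# Third loop:
# base_size=128: first generate 9 anchors with an area of about 128*128, all centred on the top-left corner. Then, from feature map 3 = [13,18], generate 13*18=234 anchor positions.
# These 234 positions have zero area but are spread evenly over the original image, 32 pixels apart. This loop produces 9*234=2106 anchors.
# Fourth loop: produces 9*7*9=567 anchors.
# Fifth loop: produces 9*4*5=180 anchors.
# After 5 loops the total is 32760+8190+2106+567+180=43803 anchors.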
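The total of 43803 can be reproduced with a short calculation. The sketch below assumes pyramid levels 3 to 7 (strides 8 to 128), feature-map sizes obtained by ceiling division of the image size by the stride, and 9 anchors per position; these assumptions match the feature-map shapes listed in the loops above.

```python
import numpy as np

image_shape = np.array([416, 560])    # (height, width) of the example image
pyramid_levels = [3, 4, 5, 6, 7]      # assumed levels; stride = 2 ** level
anchors_per_location = 9

# Ceiling division of the image size by the stride gives each feature-map shape.
feature_shapes = [(image_shape + 2 ** level - 1) // (2 ** level) for level in pyramid_levels]
counts = [int(h * w * anchors_per_location) for h, w in feature_shapes]
total = sum(counts)

print([tuple(int(v) for v in shape) for shape in feature_shapes])
print(counts)
print(total)  # 43803
```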

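The correspondence between the feature map and the original image is shown in the figure below. After the convolution and pooling layers, one base_size generates 9 anchors. With 4 base sizes, basesize=[32,64,128,256], that is 4*9=36 anchors, so if all four base sizes were applied at one point of the feature map, that point would map back to 36 anchors in the original image. For a feature map of size 60*40 this would give 60*40*36=86400 anchors in the original image, which is a striking number. Note that this is an upper bound: in the per-level scheme counted above, each base size is applied on its own, progressively smaller feature map, so each feature-map point contributes only 9 anchors.

[Figure: anchors mapped from a feature-map point back to the original image]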