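Like the RPN (region proposal network), RetinaNet uses translation-invariant anchor boxes. From level P3 to P7 the anchor areas increase successively from 32*32 to 512*512. Why, and how is this computed? The side length doubles from one level to the next (32, 64, 128, 256, 512), just as the feature-map stride doubles from 8 on P3 to 128 on P7. At each level the anchors use three aspect ratios {1:2, 1:1, 2:1}, and each level also adds three sizes {2^0, 2^(1/3), 2^(2/3)} (about 1, 1.26, 1.59).
This gives 3*3 = 9 anchors per level, ....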
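The sketch below shows the 9 anchor shapes of each level under three assumptions. Base sizes are 32·2^k for P3..P7. Scales are 2^(i/3). The ratio is read as height/width, which is the convention of the generate_anchors() code later in this post (y_max = x_max*ratio). `level_anchor_shapes` is a hypothetical helper written for illustration; it is not a keras-retinanet function.

```python
import math


def level_anchor_shapes(base_size, ratios=(0.5, 1.0, 2.0),
                        scales=(2 ** 0, 2 ** (1 / 3), 2 ** (2 / 3))):
    """Return the (width, height) of the 9 anchors of one pyramid level.

    Hypothetical helper: each anchor has area (base_size * scale) ** 2,
    and the ratio is interpreted as height / width.
    """
    shapes = []
    for ratio in ratios:
        for scale in scales:
            area = (base_size * scale) ** 2
            width = math.sqrt(area / ratio)  # keep the area, change the shape
            height = width * ratio
            shapes.append((width, height))
    return shapes


# P3..P7: the base size doubles from level to level
for level, base_size in zip(range(3, 8), (32, 64, 128, 256, 512)):
    shapes = level_anchor_shapes(base_size)
    print(f"P{level}: base {base_size}x{base_size}, {len(shapes)} anchors")
```

Every shape keeps the area (base_size*scale)^2 and only changes its aspect ratio. That is why 3 ratios times 3 scales gives 9 distinct anchors per level.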
anchors.py
anchor_targets_bbox(): generates the anchor targets for box detection
import numpy as np
import keras


def anchor_targets_bbox(
    anchors,
    image_group,
    annotations_group,  # ground truth; each element is a dict with 'bboxes' (x1, y1, x2, y2) and 'labels'
    num_classes,
    negative_overlap=0.4,
    positive_overlap=0.5
):
    """ Generate anchor targets for bbox detection.

    Args
        anchors: np.array of annotations of shape (N, 4) for (x1, y1, x2, y2).
        image_group: List of BGR images.
        annotations_group: List of annotation dicts, each with 'bboxes' (np.array of shape (M, 4) for (x1, y1, x2, y2)) and 'labels' (np.array of shape (M,)).
        num_classes: Number of classes to predict.
        negative_overlap: IoU overlap for negative anchors (all anchors with overlap <= negative_overlap are negative).
        positive_overlap: IoU overlap for positive anchors (all anchors with overlap >= positive_overlap are positive).

    Returns
        regression_batch: batch that contains bounding-box regression targets for an image & anchor states (np.array of shape (batch_size, N, 4 + 1),
                          where N is the number of anchors for an image, the first 4 columns define regression targets for (x1, y1, x2, y2) and the
                          last column defines anchor states (-1 for ignore, 0 for bg, 1 for fg).
        labels_batch: batch that contains labels & anchor states (np.array of shape (batch_size, N, num_classes + 1),
                      where N is the number of anchors for an image and the last column defines the anchor state (-1 for ignore, 0 for bg, 1 for fg).
    """
    assert len(image_group) == len(annotations_group), "The length of the images and annotations need to be equal."
    assert len(annotations_group) > 0, "No data received to compute anchor targets for."

    for annotations in annotations_group:
        assert 'bboxes' in annotations, "Annotations should contain bboxes."
        assert 'labels' in annotations, "Annotations should contain labels."

    batch_size = len(image_group)  # number of images in the batch

    # 3-D array of shape (batch_size, anchors.shape[0], 4 + 1).
    # anchors.shape[0] is large, e.g. 43803 or 39492, and can differ from batch to batch. 43803 comes from
    # anchors_for_shape() earlier: for a 416*560 image, 43803 anchors are built.
    regression_batch = np.zeros((batch_size, anchors.shape[0], 4 + 1), dtype=keras.backend.floatx())

    # 3-D array of shape (batch_size, anchors.shape[0], num_classes + 1) holding the class of each anchor.
    # With batch_size=8 and 3 target classes (person, car, airplane), each of the 8 elements has
    # shape [43803, 4], where the 4 columns are [person, car, airplane, anchor state].
    labels_batch = np.zeros((batch_size, anchors.shape[0], num_classes + 1), dtype=keras.backend.floatx())

    # compute labels and regression targets
    # Loop over the images of the batch. An image may contain several objects,
    # so in general annotations['bboxes'].shape[0] >= 1.
    for index, (image, annotations) in enumerate(zip(image_group, annotations_group)):
        # Example: annotations = {'labels': array([0., 0.]),
        #                         'bboxes': array([[ 67.97791573, 103.88162763, 448.83239265, 367.84012947],
        #                                          [439.76378026, 195.41451562, 569.55807188, 263.01028949]])}
        if annotations['bboxes'].shape[0]:
            # obtain indices of gt annotations with the greatest overlap:
            # IoU of all 43803 anchors with the boxes of this image (analysed in detail below)
            positive_indices, ignore_indices, argmax_overlaps_inds = compute_gt_annotations(anchors, annotations['bboxes'], negative_overlap, positive_overlap)

            # Classification: anchors with 0.4 < IoU < 0.5 are ignored (-1), anchors with IoU >= 0.5 are positive (1).
            # index selects the image. For the 3rd image of the batch only labels_batch[2] is changed,
            # a [43803, 4] matrix when there are 3 classes (person, car, airplane). Its last column becomes -1 or 1
            # depending on the IoU between the anchor and the ground-truth objects.
            labels_batch[index, ignore_indices, -1] = -1
            labels_batch[index, positive_indices, -1] = 1

            # Box regression uses the same states: ignored for 0.4 < IoU < 0.5, positive for IoU >= 0.5
            regression_batch[index, ignore_indices, -1] = -1
            regression_batch[index, positive_indices, -1] = 1

            # compute target class labels
            # Each anchor row is [c_person, c_car, c_airplane, state]. The state is set above; now find which class
            # a positive anchor belongs to: take the object with which the anchor has the highest IoU and look up its
            # class in annotations['labels']. E.g. an anchor matched to a car becomes [0, 1, 0, 1].
            labels_batch[index, positive_indices, annotations['labels'][argmax_overlaps_inds[positive_indices]].astype(int)] = 1

            # Regression targets between each anchor and its ground-truth box. The image may hold only 2 objects
            # (annotations['bboxes'].shape = [2, 4]), but annotations['bboxes'][argmax_overlaps_inds, :] has one row
            # per anchor (43803 rows): if an anchor overlaps most with box 3, box 3 is passed for it. Example:
            #   boxes = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])  # two objects in one image
            #   argmax_overlaps_inds = np.array([0, 1, 1, 1])   # 4 anchors; index of the box each overlaps most
            #   print(boxes[argmax_overlaps_inds, :])            # rows boxes[0], boxes[1], boxes[1], boxes[1]
            #   # [[1 2 3 4]
            #   #  [5 6 7 8]
            #   #  [5 6 7 8]
            #   #  [5 6 7 8]]
            regression_batch[index, :, :-1] = bbox_transform(anchors, annotations['bboxes'][argmax_overlaps_inds, :])

        # ignore annotations outside of image
        if image.shape:
            # centers of the rectangular anchors
            anchors_centers = np.vstack([(anchors[:, 0] + anchors[:, 2]) / 2, (anchors[:, 1] + anchors[:, 3]) / 2]).T
            # anchors whose center falls outside the image
            indices = np.logical_or(anchors_centers[:, 0] >= image.shape[1], anchors_centers[:, 1] >= image.shape[0])
            # these anchors are marked -1, i.e. ignored
            labels_batch[index, indices, -1] = -1
            regression_batch[index, indices, -1] = -1

    return regression_batch, labels_batch
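As the code above shows, each anchor is in one of 3 states:
1: positive sample: IoU >= 0.5
-1: ignored sample: 0.4 < IoU < 0.5 (anchors whose center lies outside the image are also set to -1)
0: negative sample: IoU <= 0.4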
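The toy run below shows how the three states and the one-hot class columns get filled. It uses a hand-made IoU matrix with 5 anchors, 2 ground-truth boxes and 3 classes. The IoU values and labels are invented for illustration; the indexing follows compute_gt_annotations() and anchor_targets_bbox() in this post.

```python
import numpy as np

# Hypothetical IoU matrix: 5 anchors (rows) x 2 gt boxes (columns)
overlaps = np.array([
    [0.70, 0.10],  # best match box 0, IoU >= 0.5 -> positive
    [0.20, 0.55],  # best match box 1, IoU >= 0.5 -> positive
    [0.45, 0.30],  # best IoU in (0.4, 0.5) -> ignored
    [0.00, 0.00],  # no overlap at all -> negative
    [0.40, 0.35],  # best IoU exactly 0.4 -> negative (ignoring needs > 0.4)
])
gt_labels = np.array([2, 1])  # box 0 is class 2 (airplane), box 1 is class 1 (car)
num_classes = 3

argmax_overlaps_inds = np.argmax(overlaps, axis=1)
max_overlaps = overlaps[np.arange(overlaps.shape[0]), argmax_overlaps_inds]
positive_indices = max_overlaps >= 0.5
ignore_indices = (max_overlaps > 0.4) & ~positive_indices

labels = np.zeros((overlaps.shape[0], num_classes + 1))
labels[ignore_indices, -1] = -1
labels[positive_indices, -1] = 1
# one-hot class of the best-matching box, for positive anchors only
labels[positive_indices, gt_labels[argmax_overlaps_inds[positive_indices]]] = 1
print(labels)
```

Row 0 becomes [0, 0, 1, 1] (airplane, positive) and row 1 becomes [0, 1, 0, 1] (car, positive). Row 2 only gets -1 in the state column. Rows 3 and 4 stay all zeros (negative).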
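Note:
If an anchor has IoU 0 with every object, np.argmax returns index 0, so the anchor is paired with box 1 (the first box) by default. I feel this is somewhat flawed. For example, if box 1 sits in the top-left corner of the image and the anchor is in the bottom-right corner, asking that anchor to regress box 1 is clearly hard. That said, such an anchor has IoU 0 <= 0.4, so its state is 0 (negative). Its regression target only matters if the regression loss also trains on negative anchors, and in keras-retinanet the smooth L1 regression loss keeps only anchors whose state is 1.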
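The default pairing with box 1 comes from np.argmax: when all values in a row are equal, it returns the first index. Here is a minimal check with an invented IoU row for one anchor and three boxes.

```python
import numpy as np

# IoU of one anchor with three gt boxes: it overlaps none of them
overlaps = np.array([[0.0, 0.0, 0.0]])
best_box = np.argmax(overlaps, axis=1)
print(best_box)  # [0] -> the first box, "box 1"

# its best IoU is 0 <= 0.4, so the anchor's state is 0 (negative)
max_overlap = overlaps[np.arange(1), best_box][0]
state = 1 if max_overlap >= 0.5 else (-1 if max_overlap > 0.4 else 0)
print(state)  # 0
```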
The matching between anchors and ground-truth boxes described above can be summarized in a figure.
For each set of annotations, the steps are:
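step 1. Call compute_gt_annotations(anchors, annotations['bboxes'], negative_overlap=0.4, positive_overlap=0.5). For each anchor it finds the gt annotation with the greatest overlap and classifies the anchor by that IoU: IoU <= 0.4 is negative, 0.4 < IoU < 0.5 is ignored, and IoU >= 0.5 is positive.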
def compute_gt_annotations(
    anchors,
    annotations,
    negative_overlap=0.4,
    positive_overlap=0.5
):  # anchors and annotations have different numbers of rows, but both have 4 columns
    """ Obtain indices of gt annotations with the greatest overlap.

    Args
        anchors: np.array of annotations of shape (N, 4) for (x1, y1, x2, y2).
        annotations: np.array of shape (M, 4) for (x1, y1, x2, y2), the ground-truth annotations.
        negative_overlap: IoU overlap for negative anchors (all anchors with overlap <= negative_overlap are negative).
        positive_overlap: IoU overlap for positive anchors (all anchors with overlap >= positive_overlap are positive).

    Returns
        positive_indices: boolean mask (N,) of positive anchors
        ignore_indices: boolean mask (N,) of ignored anchors
        argmax_overlaps_inds: for each anchor, the index of the gt annotation with the greatest overlap
    """
    # anchors has many rows (e.g. 43803). annotations is the ground truth and has few rows, usually one per
    # object in the image. overlaps is the IoU of every anchor with every gt box: with 8 objects in the
    # image, overlaps has shape [43803, 8].
    overlaps = compute_overlap(anchors.astype(np.float64), annotations.astype(np.float64))

    # For each row (one anchor), the column index of the largest IoU, i.e. the gt box this anchor represents best.
    # With 3 objects, each of the 43803 anchors is compared with the 3 boxes, and argmax_overlaps_inds has shape [43803],
    # e.g. [0 0 0 ... 3 3 3]. A value of 3 at position 1 means anchor 1 overlaps most with gt box 3 (0-based).
    argmax_overlaps_inds = np.argmax(overlaps, axis=1)
    # pick those IoU values out of overlaps: row i, column argmax_overlaps_inds[i]
    max_overlaps = overlaps[np.arange(overlaps.shape[0]), argmax_overlaps_inds]

    # assign "dont care" labels
    # max_overlaps is a vector of shape (43803,). An anchor is True if its best IoU >= 0.5 and False otherwise,
    # e.g. positive_indices = [False ..., True, False]. This only says whether the anchor contains an object,
    # not whether it overlaps object 1 or object 2 more.
    positive_indices = max_overlaps >= positive_overlap
    # True if 0.4 < IoU < 0.5, otherwise False
    ignore_indices = (max_overlaps > negative_overlap) & ~positive_indices

    # positive_indices and ignore_indices are boolean vectors of shape (43803,) saying whether each anchor is positive
    # or ignored (anchors that are neither are negative). argmax_overlaps_inds holds integers telling which object each
    # anchor is closest to, e.g. a value of 5 at position 9 means anchor 9 has the highest IoU with object 5.
    return positive_indices, ignore_indices, argmax_overlaps_inds
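The line max_overlaps = overlaps[np.arange(...), argmax_overlaps_inds] pairs each row index with one column index, so it picks exactly one element per row. The sketch below uses random IoU values (illustration only) to check that the result equals the plain row maximum.

```python
import numpy as np

rng = np.random.default_rng(0)
overlaps = rng.random((6, 3))  # hypothetical IoUs: 6 anchors x 3 gt boxes

argmax_overlaps_inds = np.argmax(overlaps, axis=1)  # shape (6,), integers in [0, 3)
# row i is paired with column argmax_overlaps_inds[i]
max_overlaps = overlaps[np.arange(overlaps.shape[0]), argmax_overlaps_inds]

print(argmax_overlaps_inds.dtype, max_overlaps.shape)
print(np.array_equal(max_overlaps, overlaps.max(axis=1)))
```

The original code keeps argmax_overlaps_inds rather than only the maximum because the indices are needed later to fetch each anchor's box and class label.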
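step 1.1: Call compute_overlap(anchors, annotations['bboxes']) to measure how much the anchors overlap the ground-truth boxes. It returns the IoU of every anchor with every ground-truth box. For example, with 100 anchors and 3 objects (persons) in the image, the result has shape [100, 3]: 100 is the number of anchors and 3 the number of boxes.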
import numpy as np


def compute_overlap(a, b):
    # a: [N, 4], b: [M, 4], boxes as (x1, y1, x2, y2) in inclusive pixel coordinates (hence the +1)
    area = (b[:, 2] - b[:, 0] + 1) * (b[:, 3] - b[:, 1] + 1)  # area of each rectangle in b, shape (M,)

    # With N boxes in a and M boxes in b:
    # np.expand_dims((N,), axis=1) turns (N,) into (N, 1);
    # np.minimum((N, 1), (M,)) gives an (N, M) matrix comparing every a with every b.
    # The smaller right/bottom edge minus the larger left/top edge gives the intersection.
    # iw and ih are the intersection width and height, both of shape (N, M): one value per anchor/groundTruth pair.
    iw = np.minimum(np.expand_dims(a[:, 2], axis=1), b[:, 2]) - np.maximum(np.expand_dims(a[:, 0], axis=1), b[:, 0]) + 1
    ih = np.minimum(np.expand_dims(a[:, 3], axis=1), b[:, 3]) - np.maximum(np.expand_dims(a[:, 1], axis=1), b[:, 1]) + 1

    iw = np.maximum(iw, 0)
    ih = np.maximum(ih, 0)  # iw and ih may not be negative (no overlap -> 0)

    # area of the union: S_a + S_b - intersection_ab
    ua = np.expand_dims((a[:, 2] - a[:, 0] + 1) * (a[:, 3] - a[:, 1] + 1), axis=1) + area - iw * ih
    ua = np.maximum(ua, np.finfo(float).eps)  # avoid division by zero

    intersection = iw * ih  # area of the intersection
    return intersection / ua  # (N, M) intersection over union
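Here is a worked IoU example with concrete boxes. It uses a compact re-implementation of the same computation; `iou_matrix` is a hypothetical name, and broadcasting is done with `a[:, None, :]` instead of np.expand_dims. The +1 pixel convention is kept, so a box from 0 to 9 is 10 pixels wide.

```python
import numpy as np


def iou_matrix(a, b):
    """IoU between every box in a (N, 4) and every box in b (M, 4).

    Same convention as compute_overlap(): (x1, y1, x2, y2) with inclusive
    pixel coordinates, hence the +1 on widths and heights.
    """
    a = a[:, None, :]  # (N, 1, 4) so it broadcasts against b (M, 4)
    iw = np.minimum(a[..., 2], b[:, 2]) - np.maximum(a[..., 0], b[:, 0]) + 1
    ih = np.minimum(a[..., 3], b[:, 3]) - np.maximum(a[..., 1], b[:, 1]) + 1
    inter = np.maximum(iw, 0) * np.maximum(ih, 0)                        # (N, M)
    area_a = (a[..., 2] - a[..., 0] + 1) * (a[..., 3] - a[..., 1] + 1)  # (N, 1)
    area_b = (b[:, 2] - b[:, 0] + 1) * (b[:, 3] - b[:, 1] + 1)          # (M,)
    union = np.maximum(area_a + area_b - inter, np.finfo(float).eps)
    return inter / union


anchors = np.array([[0, 0, 9, 9], [20, 20, 29, 29]], dtype=np.float64)
gt = np.array([[5, 5, 14, 14]], dtype=np.float64)
overlaps = iou_matrix(anchors, gt)
# anchor 0: intersection 5*5 = 25, union 100 + 100 - 25 = 175 -> IoU = 25/175 = 1/7
# anchor 1: no overlap -> IoU = 0
print(overlaps)
```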
step 2. Call bbox_transform(anchors, annotations['bboxes'][argmax_overlaps_inds, :]) to compute the regression targets, i.e. the offsets between each anchor and its matched ground-truth box.
def bbox_transform(anchors, gt_boxes, mean=None, std=None):
    """Compute bounding-box regression targets for an image."""

    if mean is None:
        mean = np.array([0, 0, 0, 0])
    if std is None:
        std = np.array([0.2, 0.2, 0.2, 0.2])

    if isinstance(mean, (list, tuple)):
        mean = np.array(mean)
    elif not isinstance(mean, np.ndarray):
        raise ValueError('Expected mean to be a np.ndarray, list or tuple. Received: {}'.format(type(mean)))

    if isinstance(std, (list, tuple)):
        std = np.array(std)
    elif not isinstance(std, np.ndarray):
        raise ValueError('Expected std to be a np.ndarray, list or tuple. Received: {}'.format(type(std)))

    anchor_widths = anchors[:, 2] - anchors[:, 0]   # anchor width
    anchor_heights = anchors[:, 3] - anchors[:, 1]  # anchor height

    targets_dx1 = (gt_boxes[:, 0] - anchors[:, 0]) / anchor_widths   # offset between gt x1 and anchor x1, divided by the anchor width
    targets_dy1 = (gt_boxes[:, 1] - anchors[:, 1]) / anchor_heights  # offset between gt y1 and anchor y1, divided by the anchor height
    targets_dx2 = (gt_boxes[:, 2] - anchors[:, 2]) / anchor_widths   # offset between gt x2 and anchor x2, divided by the anchor width
    targets_dy2 = (gt_boxes[:, 3] - anchors[:, 3]) / anchor_heights  # offset between gt y2 and anchor y2, divided by the anchor height

    targets = np.stack((targets_dx1, targets_dy1, targets_dx2, targets_dy2))
    targets = targets.T  # shape (N, 4)

    # mean is [0 0 0 0]. Why divide by std as well? Most likely a normalization step (as in Faster R-CNN's
    # bbox target normalization): dividing by 0.2 multiplies the usually small offsets by 5, giving the
    # regression loss targets of a larger, more uniform magnitude.
    targets = (targets - mean) / std

    return targets
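The worked example below computes regression targets with a compact re-implementation of the same formula (mean 0, std 0.2). `encode_targets` and `decode_targets` are hypothetical helpers written for this sketch, not code from this post or from keras-retinanet. The decode step shows that dividing by std is a reversible rescaling, so no information is lost.

```python
import numpy as np

MEAN = np.zeros(4)
STD = np.full(4, 0.2)


def encode_targets(anchors, gt_boxes, mean=MEAN, std=STD):
    """Same formula as bbox_transform(): corner offsets / anchor size, then normalized."""
    w = anchors[:, 2] - anchors[:, 0]
    h = anchors[:, 3] - anchors[:, 1]
    sizes = np.stack([w, h, w, h], axis=1)  # x offsets use the width, y offsets the height
    targets = (gt_boxes - anchors) / sizes
    return (targets - mean) / std


def decode_targets(anchors, targets, mean=MEAN, std=STD):
    """Hypothetical inverse: undo the normalization, then the division by the anchor size."""
    w = anchors[:, 2] - anchors[:, 0]
    h = anchors[:, 3] - anchors[:, 1]
    sizes = np.stack([w, h, w, h], axis=1)
    return anchors + (targets * std + mean) * sizes


anchors = np.array([[0.0, 0.0, 10.0, 10.0]])
gt_boxes = np.array([[1.0, 2.0, 12.0, 8.0]])
targets = encode_targets(anchors, gt_boxes)
print(targets)  # raw offsets (0.1, 0.2, 0.2, -0.2) divided by 0.2, i.e. about (0.5, 1, 1, -1)
print(decode_targets(anchors, targets))  # recovers gt_boxes
```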
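generate_anchors() function
generate_anchors(base_size, ratios, scales)
Purpose: generate the anchor (reference) windows by enumerating ratios*scales. The function is called from _misc.py when the model is converted; at first I did not know whether it is also used during training.
Answer: it is used during training, through train.py --> callbacks: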
callbacks = create_callbacks(model, prediction_model, validation_generator, args)
The prediction model prediction_model here contains the four levels anchors_0, anchors_1, anchors_2 and anchors_3 generated by generate_anchors().
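Parameters:
base_size: taken from sizes=[32,64,128,256] in the config; each call to generate_anchors() passes in one size. It sets the initial region size, similar to a receptive field: after many convolution and pooling layers, one point of the feature map corresponds to a region of the original image. With 32, one feature-map point corresponds to a 32*32 region of the original image.
ratios=[0.5,1,2]: the 32*32 region is reshaped into rectangles with the same area and these aspect ratios. In the code below the ratio is height/width (y_max = x_max*ratio), so [0.5,1,2] means h:w = [1:2, 1:1, 2:1]; because the set is symmetric, the same three shapes appear either way. The ratios should be set according to the task. For example, heads and safety helmets are roughly square, so for detecting them ratios=[1], i.e. every region is square.
scales=[1,1.25,1.58]: the width and height of the region are enlarged by factors of 1, 1.25 and 1.58. The 32*32 region becomes (32*1.25)*(32*1.25)=40*40 and (32*1.58)*(32*1.58)≈51*51.
num_anchors: the number of anchors generated, len(ratios)*len(scales)=9 (3 in Song Bo's config file).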
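step1: create a 9*4 zero matrix, anchors=np.zeros((num_anchors, 4)).
step2: overwrite columns 2 onward, i.e. the (x_max, y_max) of each box, with base_size*scales. For base_size=32, the x_max of the first three rows becomes 32, 40 and 50.56.
y_max equals x_max, and the middle three and last three rows are the same as the first three, so anchors becomes
[[0, 0, 32, 32], [0, 0, 40, 40], [0, 0, 50.56, 50.56],
 [0, 0, 32, 32], [0, 0, 40, 40], [0, 0, 50.56, 50.56],
 [0, 0, 32, 32], [0, 0, 40, 40], [0, 0, 50.56, 50.56]]
The area of each anchor (box) is then x_max*y_max, in three sizes: 32*32=1024, 40*40=1600, 50.56*50.56≈2556.
step3: replace the 3rd column, x_max, with sqrt(area/ratio), so that the x_max values now differ by aspect ratio.
step4: replace the 4th column, y_max, using the given aspect ratio: y_max=x_max*ratios.
step5: the boxes above are (0, 0, x_max, y_max), which can be read as (x_ctr, y_ctr, w, h) with the center at the origin. Convert them to the usual (x1, y1, x2, y2) form by shifting each box by half its width and height so it is centered at (0, 0): (-w/2, -h/2, w/2, h/2). The width and height of each anchor are unchanged by this shift.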
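Summary: base_size first gives the size of a basic square box, e.g. 32*32.
scales resize the base_size box by the given factors, but the resized boxes are still square.
ratios control the aspect ratio, turning the square boxes above into rectangles of the same area.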
Implementation
anchors[:, 2:] = base_size * np.tile(scales, (2, len(ratios))).T
# np.tile copies scales like tiles: 2 times along the rows and len(ratios)=3 times along the columns (a 2x9 matrix).
# It is then transposed into a 9x2 matrix and multiplied by base_size (one of 32, 64, 128, 256).
# This overwrites columns 2 onward of anchors, i.e. the (x2, y2) values.

# compute areas of anchors
areas = anchors[:, 2] * anchors[:, 3]  # area of each anchor (box)

# correct for ratios: np.repeat(ratios, len(scales)) = [ 0.5 0.5 0.5 1. 1. 1. 2. 2. 2. ]
# With ratio r = h / w and the area A = w * h kept fixed: h = r * w, so A = r * w^2 and
# w = np.sqrt(A / r) = x_max / np.sqrt(r) (since A = x_max * x_max for the square box).
anchors[:, 2] = np.sqrt(areas / np.repeat(ratios, len(scales)))  # each ratio repeated 3 times; the new width is x2
anchors[:, 3] = anchors[:, 2] * np.repeat(ratios, len(scales))   # ratio is h / w here, so y2 = x2 * ratio
After this code, one base_size yields 9 anchors, all sharing the same center point.
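Putting steps 1 to 5 together gives the runnable sketch below. The post's code stops after step 4, so the two lines used for step 5 (moving the center to the origin) are an assumption based on the step 5 description. The defaults are the rounded scales quoted in this post, and ratio is treated as height/width.

```python
import numpy as np


def generate_anchors(base_size=32, ratios=(0.5, 1, 2), scales=(1, 1.25, 1.58)):
    """Sketch: 9 reference boxes (x1, y1, x2, y2) centered at (0, 0)."""
    ratios = np.asarray(ratios, dtype=float)
    scales = np.asarray(scales, dtype=float)
    num_anchors = len(ratios) * len(scales)

    # step 1: num_anchors x 4 zero matrix
    anchors = np.zeros((num_anchors, 4))

    # step 2: (x2, y2) = base_size * scale -> squares of side 32, 40, 50.56 for base_size=32
    anchors[:, 2:] = base_size * np.tile(scales, (2, len(ratios))).T

    # steps 3 and 4: keep each area, set h / w = ratio
    areas = anchors[:, 2] * anchors[:, 3]
    anchors[:, 2] = np.sqrt(areas / np.repeat(ratios, len(scales)))
    anchors[:, 3] = anchors[:, 2] * np.repeat(ratios, len(scales))

    # step 5 (assumed implementation): (0, 0, w, h) -> (-w/2, -h/2, w/2, h/2)
    anchors[:, 0::2] -= np.tile(anchors[:, 2] * 0.5, (2, 1)).T
    anchors[:, 1::2] -= np.tile(anchors[:, 3] * 0.5, (2, 1)).T
    return anchors


anchors = generate_anchors(32)
widths = anchors[:, 2] - anchors[:, 0]
heights = anchors[:, 3] - anchors[:, 1]
print(anchors.round(2))
print((widths * heights).round(1))  # areas depend only on the scale: 1024, 1600, ~2556, repeated per ratio
```

The right-hand side of each step-5 line is evaluated before the in-place subtraction. That is why the widths and heights survive the shift unchanged.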
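Summary: for a 416*560 image, generate_anchors() plus the shifting over every feature-map position (anchors_for_shape()) produces 43803 anchors:
# First loop:
# base_size=32: first generate 9 anchors of base size 32*32, all centered at the top-left corner. Then, from feature map 1 = [52, 70], generate 52*70=3640 points.
# These 3640 points have zero area but are spread evenly over the original image, 8 pixels apart. Each of them is used as a center for 9 anchors,
# so this loop produces 3640*9=32760 anchors.
# Second loop:
# base_size=64: first generate 9 anchors of base size 64*64, all centered at the top-left corner. Then, from feature map 2 = [26, 35], generate 26*35=910 points.
# These 910 points have zero area but are spread evenly over the original image, 16 pixels apart; this loop produces 9*910=8190 anchors.
# Third loop:
# base_size=128: first generate 9 anchors of base size 128*128, all centered at the top-left corner. Then, from feature map 3 = [13, 18], generate 13*18=234 points.
# These 234 points have zero area but are spread evenly over the original image, 32 pixels apart; this loop produces 9*234=2106 anchors.
# Fourth loop: feature map 4 = [7, 9], this loop produces 9*7*9=567 anchors.
# Fifth loop: feature map 5 = [4, 5], this loop produces 9*4*5=180 anchors.
# After the 5 loops there are 32760+8190+2106+567+180=43803 anchors in total.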
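The counts above can be reproduced with a few lines of code. The sketch makes three assumptions. The strides are 8, 16, 32, 64 and 128 for the five levels. Feature-map sizes are rounded up (ceil division), which matches the sizes quoted above. The hypothetical helper `shift` puts anchor centers at (i + 0.5) * stride; the post only says that neighbouring centers are one stride apart.

```python
import numpy as np


def feature_shape(image_shape, stride):
    """Feature-map size of one level, rounded up: (ceil(H / stride), ceil(W / stride))."""
    return tuple(-(-dim // stride) for dim in image_shape)


def shift(shape, stride, anchors):
    """Place the (K, 4) base anchors at every feature-map position.

    Assumption: position (row, col) maps to image point ((col + 0.5) * stride, (row + 0.5) * stride).
    """
    shift_x = (np.arange(shape[1]) + 0.5) * stride
    shift_y = (np.arange(shape[0]) + 0.5) * stride
    shift_x, shift_y = np.meshgrid(shift_x, shift_y)
    shifts = np.stack([shift_x.ravel(), shift_y.ravel(),
                       shift_x.ravel(), shift_y.ravel()], axis=1)  # (P, 4)
    # every position gets every base anchor: (P, 1, 4) + (1, K, 4) -> (P * K, 4)
    return (shifts[:, None, :] + anchors[None, :, :]).reshape(-1, 4)


image_shape = (416, 560)
total = 0
for stride in (8, 16, 32, 64, 128):
    shape = feature_shape(image_shape, stride)
    count = shape[0] * shape[1] * 9  # 9 anchors per feature-map point
    total += count
    print(stride, shape, count)
print(total)  # 43803, matching the count above

# a single 32x32 base anchor centered at the origin, spread over the first level
base = np.array([[-16.0, -16.0, 16.0, 16.0]])
level0 = shift(feature_shape(image_shape, 8), 8, base)
```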
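The correspondence between the feature map and the original image is shown in the figure below: after the convolutions and pooling, each base_size yields 9 anchors. If all 4 base sizes [32,64,128,256] were placed on the same feature map, there would be 4*9=36 anchors per point. Each feature-map point would then map back to 36 anchors in the original image, so a 60*40 feature map would give 60*40*36=86400 anchors, a striking number. In the pyramid setup above, however, each base size is used on its own feature-map level, so each point of a given level gets only 9 anchors.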