3.1 Existing Feature Pyramid Methods

To detect objects of varying sizes, feature-pyramid-based detectors make detection decisions on feature maps drawn from several different feature levels. As shown in panel (a) of the architecture figure, the baseline detector applies the detection subnetwork directly to the pyramid feature maps {c_l} produced by the backbone:

    c_l = f_l(c_{l-1})

    ŷ_l = D_l(c_l)

where l = k, ..., K. Here c_k is the feature map produced by the backbone, and the subsequent feature maps c_l are obtained bottom-up from the later convolutional layers. f_l(·) denotes the operation performed by the l-th convolutional layer, and D_l denotes the detection subnetwork, usually a single 3×3 convolutional layer that produces the classification and box-regression outputs. Because the pyramid levels are taken from different depths, the shallower bottom-level features lack semantic information.
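As a concrete illustration of the bottom-up pass c_l = f_l(c_{l-1}) and the per-level detection heads D_l, here is a minimal NumPy sketch; the average-pooling stage and the linear head are hypothetical stand-ins for the learned convolutional layers:

```python
import numpy as np

rng = np.random.default_rng(0)

def f_l(x):
    # Hypothetical stand-in for the l-th backbone stage: a stride-2
    # 2x2 average pool halves the spatial size (channels kept fixed).
    h, w, c = x.shape
    return x[:h // 2 * 2, :w // 2 * 2].reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def detect_head(x, w):
    # Hypothetical D_l: one linear map per spatial position, standing
    # in for the usual 3x3 conv producing class/box outputs.
    return x @ w

# Bottom-up pass: c_l = f_l(c_{l-1}) for l = k..K
c = rng.standard_normal((32, 32, 8))           # c_k from the backbone
pyramid = [c]
for _ in range(3):                             # three more levels
    c = f_l(c)
    pyramid.append(c)

w_head = rng.standard_normal((8, 5))           # 5 outputs per location
outputs = [detect_head(c_l, w_head) for c_l in pyramid]
print([o.shape for o in outputs])
```

Each level keeps its own spatial resolution, which is why the shallow levels see less network depth and hence weaker semantics.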

 

To reduce the semantic gap between pyramid levels, several works have proposed a top-down structure with lateral connections, as shown in panel (c). This structure propagates high-level semantics from the top level down to the bottom levels through the lateral connections while maintaining the high spatial resolution of the lower levels. The feature map p_l at the l-th level is produced as

    p_l = L_l(c_l) ⊕ T_l(p_{l+1})

where l = k, ..., K; L_l(c_l) is the lateral connection at the l-th level and T_l(p_{l+1}) is the top-down connection into the l-th level. The operator ⊕ denotes a combination of two feature maps, for example channel-wise concatenation or elementwise addition. Different methods differ only in their choices of T_l and L_l.
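The top-down merge p_l = L_l(c_l) ⊕ T_l(p_{l+1}) can be sketched as follows; nearest-neighbor upsampling stands in for T_l, the identity for L_l, and elementwise addition for ⊕ (all hypothetical simplifications of the learned layers):

```python
import numpy as np

def upsample2x(x):
    # Nearest-neighbor 2x upsampling, standing in for the top-down
    # connection T_l that matches p_{l+1} to layer l's spatial size.
    return x.repeat(2, axis=0).repeat(2, axis=1)

def lateral(x):
    # Hypothetical lateral connection L_l: identity here; in practice
    # a 1x1 conv that aligns channel dimensions.
    return x

def top_down_merge(pyramid):
    # p_K = lateral(c_K); p_l = lateral(c_l) + upsample(p_{l+1}),
    # with elementwise addition as the combination operator.
    p = [lateral(pyramid[-1])]
    for c_l in reversed(pyramid[:-1]):
        p.append(lateral(c_l) + upsample2x(p[-1]))
    return p[::-1]                     # finest-to-coarsest order

rng = np.random.default_rng(0)
pyr = [rng.standard_normal((s, s, 8)) for s in (32, 16, 8, 4)]
merged = top_down_merge(pyr)
print([m.shape for m in merged])
```

Note the one-way information flow: semantics only travel downward, which is exactly the limitation discussed next.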

Although these feature pyramid methods help, they still have limitations. First, because the top-down connections propagate semantics in only one direction, the semantics remain unevenly distributed over the levels; as a result, the semantic gap between the pyramid levels persists. Second, this single-direction processing of the features is limited in its ability to generate the rich contextual information needed to raise the semantic level at all scales. To address this, we develop a method that uses a biLSTM to produce deeply fused semantics across all feature levels through bidirectional lateral connections. The following sections present the details of the proposed method.

3.2 ScarfNet: Overall Architecture

ScarfNet resolves the semantic mismatch in two steps: (1) it fuses the scattered semantic information across levels using a biLSTM, and (2) it redistributes the fused features to each pyramid level using channel-wise attention modules. The overall architecture is shown in the figure below:

                                     [Figure: overall architecture of ScarfNet]

Taking the pyramid features c_k, ..., c_K as input, ScarfNet produces the new l-th feature map p_l as

    p_l = ScarfNet_l(c_k, ..., c_K)                    (6)

where l = k, ..., K. As shown in Eq. (6), ScarfNet consists of two parts, the semantics combining network (ScNet) and the attentive redistribution network (ArNet):

  1. ScNet fuses the pyramid features c_k, ..., c_K with a biLSTM and produces output features carrying the fused semantics.
  2. ArNet gathers the output features from the biLSTM and applies channel-wise attention to generate highly semantic multi-scale features, which are concatenated onto the original feature pyramid. Finally, each resulting feature map is processed separately by the detection subnetwork D_l to produce the final detection results.
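The two-step pipeline above can be sketched end to end; `scnet_stub` and `arnet_stub` below are deliberately crude stand-ins (mean fusion instead of the learned biLSTM, plain nearest-neighbor resizing instead of the learned matching modules) that only illustrate how shapes flow through the two stages:

```python
import numpy as np

def nearest_resize(x, size):
    # Nearest-neighbor resize to (h, w); a cheap stand-in for the
    # bilinear interpolation used by the matching modules.
    h, w = size
    rows = np.arange(h) * x.shape[0] // h
    cols = np.arange(w) * x.shape[1] // w
    return x[rows][:, cols]

def scnet_stub(pyramid):
    # Stand-in for ScNet: bring all levels to a common size and fuse
    # them (here by averaging; the paper fuses with a biLSTM).
    target = pyramid[0].shape[:2]
    fused = np.mean([nearest_resize(c, target) for c in pyramid], axis=0)
    return [fused] * len(pyramid)      # one fused map per level

def arnet_stub(pyramid, fused):
    # Stand-in for ArNet: redistribute the fused semantics to each
    # level and concatenate with the original features channel-wise.
    return [np.concatenate([c, nearest_resize(s, c.shape[:2])], axis=-1)
            for c, s in zip(pyramid, fused)]

rng = np.random.default_rng(0)
pyr = [rng.standard_normal((s, s, 8)) for s in (32, 16, 8, 4)]
out = arnet_stub(pyr, scnet_stub(pyr))
print([o.shape for o in out])          # each level keeps its size, channels double
```

The key structural point the stub preserves: every level ends up carrying both its original features and a copy of the globally fused semantics.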

3.3 Semantics Combining Network (ScNet)

The feature maps produced by ScNet are

    (s_k, ..., s_K) = ScNet(c_k, ..., c_K)

where s_l is the output feature map at the l-th level. The figure below depicts the details of ScNet.

ScNet uses a biLSTM to fuse the scattered semantics evenly across the pyramid levels. Through its gating functions, the biLSTM selectively fuses the semantic information over the multi-scale levels. ScNet consists of matching modules and the biLSTM. A matching module first resizes the pyramid features c_k, ..., c_K so that their spatial sizes become identical, and then applies a 1×1 convolutional layer to adjust the channel dimension. As a result, the matching modules produce feature maps with identical sizes and channel counts. The resizing is performed by bilinear interpolation.
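A matching module, as described above, can be sketched as bilinear resizing followed by a 1×1 convolution; the per-pixel linear map below is a hypothetical stand-in for that convolution:

```python
import numpy as np

def bilinear_resize(x, size):
    # Bilinear interpolation of an (H, W, C) feature map to (h, w),
    # as used by the matching module to equalize spatial sizes.
    H, W, _ = x.shape
    h, w = size
    # Sample positions in the source grid (align-corners style).
    ys = np.linspace(0, H - 1, h)
    xs = np.linspace(0, W - 1, w)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, H - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, W - 1)
    wy = (ys - y0)[:, None, None]
    wx = (xs - x0)[None, :, None]
    top = x[y0][:, x0] * (1 - wx) + x[y0][:, x1] * wx
    bot = x[y1][:, x0] * (1 - wx) + x[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy

def match_channels(x, w):
    # A 1x1 convolution is a per-pixel linear map over channels.
    return x @ w

rng = np.random.default_rng(0)
c_l = rng.standard_normal((8, 8, 16))            # a coarse pyramid level
w_1x1 = rng.standard_normal((16, 32)) / 4.0      # hypothetical 1x1 conv weights
out = match_channels(bilinear_resize(c_l, (32, 32)), w_1x1)
print(out.shape)
```

After this step every level has the same spatial size and channel count, so the biLSTM can treat the levels as a sequence.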

The biLSTM is the same as that of [23]. The input-to-state and gate parameters are computed with convolutional layers applied to globally pooled features, which saves computation significantly.

                                      [Figure: structure of ScNet (matching modules followed by the biLSTM)]

Specifically, the biLSTM operation can be simplified as:

    i_l = σ(W_i ∗ [x_l, h_{l-1}])
    f_l = σ(W_f ∗ [x_l, h_{l-1}])
    o_l = σ(W_o ∗ [x_l, h_{l-1}])
    m_l = f_l ⊙ m_{l-1} + i_l ⊙ tanh(W_m ∗ [x_l, h_{l-1}])
    h_l = o_l ⊙ tanh(m_l)

where ⊙ denotes the Hadamard product. The biLSTM state is updated in both the forward and backward directions; the equations above give the forward update, and the backward update takes an analogous form.
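A minimal NumPy sketch of one forward LSTM step over the pyramid levels, assuming a single fused weight matrix `W` that computes all four gate pre-activations at once (plain vectors stand in for the pooled feature maps; a second pass in the reverse direction with its own weights would complete the biLSTM):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    # One forward LSTM update: the input, forget, and output gates
    # decide how much semantics to pass along the scale axis; `*`
    # on arrays below is the Hadamard (elementwise) product.
    z = np.concatenate([x, h_prev]) @ W + b   # all gate pre-activations
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    c = f * c_prev + i * np.tanh(g)           # cell (memory) state
    h = o * np.tanh(c)                        # hidden state
    return h, c

rng = np.random.default_rng(0)
d = 8                                         # feature dimension
W = rng.standard_normal((2 * d, 4 * d)) / np.sqrt(2 * d)
b = np.zeros(4 * d)

# Forward pass over pyramid levels k..K.
h, c = np.zeros(d), np.zeros(d)
for x_l in rng.standard_normal((4, d)):       # 4 pyramid levels
    h, c = lstm_step(x_l, h, c, W, b)
print(h.shape)
```

Because the gates are bounded in (0, 1), each level can admit or suppress semantics from its neighbors rather than inheriting them unconditionally as in a fixed top-down pathway.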

3.4 Attentive Redistribution Network (ArNet)

ArNet produces the highly semantic feature maps that are concatenated onto the original pyramid features {c_l}:

    p_l = c_l ⊕ ArNet_l(s_k, ..., s_K)

where the operator ⊕ denotes channel-wise concatenation. The detailed structure of ArNet is shown in the figure below. ArNet concatenates the biLSTM outputs s_k, ..., s_K and applies a channel-wise attention mechanism to them. The attention weights form a 1×1×C vector obtained by applying global average pooling, passing the result through two fully connected layers, and finally applying a sigmoid function. Note that these channel-wise attention modules let the network select which semantics are propagated to each pyramid level. Once the attention weights have been applied, a matching module downsamples the resulting feature map and applies a 1×1 convolution to match the channel dimension of the original pyramid feature. Finally, the output of the matching module is concatenated onto the original feature map c_l to produce the highly semantic feature p_l.

                              [Figure: structure of ArNet]
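The channel-wise attention described above (global average pooling, two fully connected layers, sigmoid) can be sketched in squeeze-and-excitation style; the bottleneck width and the weight matrices below are hypothetical:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, w1, w2):
    # Squeeze-and-excitation style channel attention: global average
    # pooling to a 1x1xC vector, two fully connected layers with a
    # ReLU in between, then a sigmoid; the resulting weights rescale
    # each channel of x.
    v = x.mean(axis=(0, 1))                      # global average pool -> (C,)
    a = sigmoid(np.maximum(v @ w1, 0.0) @ w2)    # FC -> ReLU -> FC -> sigmoid
    return x * a                                 # per-channel reweighting

rng = np.random.default_rng(0)
s = rng.standard_normal((16, 16, 32))    # concatenated biLSTM outputs
w1 = rng.standard_normal((32, 8)) / 4.0  # hypothetical bottleneck FC
w2 = rng.standard_normal((8, 32)) / 2.0
out = channel_attention(s, w1, w2)
print(out.shape)
```

Since the sigmoid keeps each weight in (0, 1), the module can only attenuate channels, which is how each pyramid level selects the subset of fused semantics it needs.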