Abstract
Convolutional Neural Networks (CNNs) underpin many computer vision tasks. To depict the complex boundaries that arise in visual tasks, it is essential to fully exploit the feature distribution and thereby realize the potential of CNNs. However, most current research focuses on designing deeper architectures and rarely explores high-level feature statistics. To address this problem, we propose a simple and effective plug-in attention module for neural networks, named Attention Module using First and Second order information fusion (AFS). Our method combines first-order pooling and second-order pooling, applied to the two independent dimensions of space and channel respectively, and we verify the effectiveness of AFS under two connection schemes. Given the feature map output by an intermediate convolutional layer, the module infers attention maps sequentially along the channel and spatial dimensions and multiplies them with the input feature map for adaptive feature refinement. AFS can be integrated into any feedforward CNN and trained end-to-end with negligible overhead. Extensive experiments on CIFAR-10 and CIFAR-100 show that the AFS module significantly improves classification and detection performance across different models.
Supported in part by the Natural Science Foundation of Chongqing under Grant cstc2020jcyj-msxmX0057.
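As a rough illustration of the scheme the abstract outlines (second-order statistics for channel attention, first-order pooling for spatial attention, applied sequentially to refine an intermediate feature map), the following PyTorch sketch shows one way such a block could be wired up. The module and parameter names are hypothetical and the exact pooling formulation is assumed; this is not the authors' released implementation.

```python
# Minimal sketch of a sequential channel/spatial attention block, assuming
# channel attention from second-order (covariance) pooling and spatial
# attention from first-order (mean/max) pooling. Not the authors' code.
import torch
import torch.nn as nn


class SecondOrderChannelAttention(nn.Module):
    """Channel attention derived from the channel covariance (second-order pooling)."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        feat = x.view(b, c, h * w)
        feat = feat - feat.mean(dim=2, keepdim=True)            # center each channel
        cov = torch.bmm(feat, feat.transpose(1, 2)) / (h * w)   # (b, c, c) covariance
        stats = cov.mean(dim=2)                                  # summarize each channel's row
        weights = self.fc(stats).view(b, c, 1, 1)
        return x * weights


class FirstOrderSpatialAttention(nn.Module):
    """Spatial attention from first-order pooling (channel-wise mean and max)."""

    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg_map = x.mean(dim=1, keepdim=True)
        max_map, _ = x.max(dim=1, keepdim=True)
        attn = self.sigmoid(self.conv(torch.cat([avg_map, max_map], dim=1)))
        return x * attn


class AFSBlock(nn.Module):
    """Channel-then-spatial refinement of an intermediate feature map."""

    def __init__(self, channels: int):
        super().__init__()
        self.channel_attn = SecondOrderChannelAttention(channels)
        self.spatial_attn = FirstOrderSpatialAttention()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.spatial_attn(self.channel_attn(x))


if __name__ == "__main__":
    block = AFSBlock(channels=64)
    out = block(torch.randn(2, 64, 32, 32))
    print(out.shape)  # torch.Size([2, 64, 32, 32])
```

Because the block preserves the input shape, it can be dropped after any convolutional stage of a feedforward CNN and trained end-to-end, which is the integration style the abstract describes.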
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Zuo, Y., Lv, J., Wang, H. (2022). AFS: Attention Using First and Second Order Information to Enrich Features. In: Pimenidis, E., Angelov, P., Jayne, C., Papaleonidas, A., Aydin, M. (eds) Artificial Neural Networks and Machine Learning – ICANN 2022. ICANN 2022. Lecture Notes in Computer Science, vol 13532. Springer, Cham. https://doi.org/10.1007/978-3-031-15937-4_45
DOI: https://doi.org/10.1007/978-3-031-15937-4_45
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-15936-7
Online ISBN: 978-3-031-15937-4
eBook Packages: Computer Science, Computer Science (R0)