Normalized Attention Inter-Channel Pooling (NAIP) for Deep Convolutional Neural Network Regularization

Setiawan, Feri; Yahya, Bernardo Nugroho; Lee, Seok-Lyong

doi:10.1007/s11063-023-11203-6

Published: 06 March 2023

Volume 55, pages 9315–9333, (2023)

Neural Processing Letters

Feri Setiawan¹,
Bernardo Nugroho Yahya² &
Seok-Lyong Lee²

Abstract

Deep learning methods have recently attracted attention for solving various tasks, owing to their ability to learn and select powerful features. Removing the pooling layer from a deep learning architecture is known to degrade network performance significantly. Existing pooling methods assume that each value in a small region of the local feature map contributes equally to the pooled feature map. However, some values in the pooling region may contribute only partially. In addition, existing methods usually focus on the spatial dimension and ignore the inter-channel relationship during pooling. Moreover, most pooling methods use a static strategy that is designed from expert knowledge rather than learned from data. This study proposes a pooling method, called Normalized Attention Inter-Channel Pooling (NAIP), that learns from the training data through a learnable attention mechanism. The pooling mechanism focuses mainly on the most important feature region when generating the pooled feature map. The proposed method is compared with state-of-the-art works that use graph convolutional neural network and convolutional neural network architectures for skeleton-based human action recognition and image classification tasks. The experiments show that NAIP outperforms the existing methods under the same conditions.
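The abstract does not state the NAIP formulation, so the following is only a minimal sketch of the general idea it describes: replacing equal-contribution pooling with weights that are normalized over the pooling window and computed from all channels. It is not the authors' method. The function name attention_pool, the channel-mixing matrix mix, and the use of a softmax for normalization are all assumptions made for illustration; in a real network mix would be learned during training.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(x, mix, size=2):
    """Pool a [C][H][W] feature map over non-overlapping size x size windows.

    mix is a hypothetical C x C weight matrix (an assumption, not from the
    paper); it lets every channel's values influence the attention scores.
    """
    channels, height, width = len(x), len(x[0]), len(x[0][0])
    out = [[[0.0] * (width // size) for _ in range(height // size)]
           for _ in range(channels)]
    for i in range(height // size):
        for j in range(width // size):
            # Flatten each channel's window into a list of size*size values.
            win = [[x[c][i * size + di][j * size + dj]
                    for di in range(size) for dj in range(size)]
                   for c in range(channels)]
            for c in range(channels):
                # Score every window position using all channels.
                scores = [sum(mix[c][k] * win[k][p] for k in range(channels))
                          for p in range(size * size)]
                weights = softmax(scores)  # normalized: weights sum to 1
                out[c][i][j] = sum(w * v for w, v in zip(weights, win[c]))
    return out

x = [[[1.0, 2.0], [3.0, 6.0]],   # channel 0
     [[4.0, 0.0], [0.0, 0.0]]]   # channel 1
zeros = [[0.0, 0.0], [0.0, 0.0]]     # uniform weights: average pooling
sharp = [[50.0, 0.0], [0.0, 50.0]]   # peaked weights: close to max pooling
cross = [[0.0, 50.0], [0.0, 0.0]]    # channel 0 attends where channel 1 is large

avg_like = attention_pool(x, zeros)
max_like = attention_pool(x, sharp)
cross_like = attention_pool(x, cross)
```

With an all-zero mix the weights are uniform and the sketch reduces to average pooling; with a large diagonal mix it approaches max pooling; with an off-diagonal mix the position chosen in one channel is driven by the values of another. This is the sense in which learned, normalized weights can sit between, or depart from, the usual static pooling strategies.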

Acknowledgements

This research was supported by the Hankuk University of Foreign Studies Research Fund and by the Ministry of Science and ICT of the Republic of Korea and the National Research Foundation of Korea (NRF-2021R1F1A1047577).

Author information

Authors and Affiliations

Department of Statistics and Data Science, Yonsei University, Seoul, 120-749, Republic of Korea
Feri Setiawan
Department of Industrial and Management Engineering, Hankuk University of Foreign Studies, Yongin-si, 17035, Republic of Korea
Bernardo Nugroho Yahya & Seok-Lyong Lee

Corresponding authors

Correspondence to Bernardo Nugroho Yahya or Seok-Lyong Lee.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Setiawan, F., Yahya, B.N. & Lee, SL. Normalized Attention Inter-Channel Pooling (NAIP) for Deep Convolutional Neural Network Regularization. Neural Process Lett 55, 9315–9333 (2023). https://doi.org/10.1007/s11063-023-11203-6

Accepted: 24 February 2023
Published: 06 March 2023
Issue Date: December 2023
DOI: https://doi.org/10.1007/s11063-023-11203-6
