Abstract
Deep learning methods have attracted attention across a wide range of tasks because of their ability to learn and select powerful features. Removing the pooling layer from a deep learning architecture is known to degrade network performance significantly. Existing pooling methods assume that every value in a small region of the local feature map contributes equally to the pooled feature map. However, some values in the pooling region may contribute only partially. In addition, existing methods usually focus on the spatial dimension and ignore inter-channel relationships during pooling. Moreover, most pooling methods use a static strategy designed from expert knowledge rather than learned from the data. This study proposes Normalized Attention Inter-Channel Pooling (NAIP), a pooling method that learns from the training data through a learnable attention mechanism. The pooling process focuses mainly on the most important feature regions when generating the pooled feature map. The proposed method is compared with state-of-the-art methods that use graph convolutional neural network and convolutional neural network architectures for skeleton-based human action recognition and image classification. The experiments show that NAIP outperforms the existing methods under the same conditions.
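The contrast the abstract draws between equal-contribution pooling and attention-weighted pooling can be illustrated with a small sketch. This is not the NAIP formulation from the paper, which the abstract does not specify. It is a generic, hypothetical example in which each position of a pooling window is scored from all channels, the scores are normalized with a softmax, and the resulting weights replace the uniform weights of average pooling. The names `attention_pool` and `channel_weights` are assumptions made for illustration, and the weights are fixed by hand here rather than learned by gradient descent.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def average_pool(window):
    """Average pooling: every value in the window gets the same weight."""
    return sum(window) / len(window)


def attention_pool(windows, channel_weights):
    """Pool one spatial window across channels with shared attention weights.

    windows: one flattened pooling window per channel, all of equal length.
    channel_weights: hypothetical learnable parameters, one per channel
        (an assumption for this sketch, not the paper's parameterization).
    Returns one pooled value per channel.
    """
    n_pos = len(windows[0])
    # Score each position using the values of every channel at that position.
    scores = [
        sum(w * ch[p] for w, ch in zip(channel_weights, windows))
        for p in range(n_pos)
    ]
    attn = softmax(scores)  # non-negative weights that sum to 1
    return [sum(a * v for a, v in zip(attn, ch)) for ch in windows]


# Two channels, one flattened 2x2 window each.
windows = [[1.0, 2.0, 3.0, 4.0], [0.5, 0.1, 0.2, 0.9]]

uniform = attention_pool(windows, [0.0, 0.0])   # zero weights: uniform attention
peaked = attention_pool(windows, [50.0, 0.0])   # large weight: near one-hot attention
```

With all-zero weights the attention is uniform and the result matches average pooling. With a large weight on the first channel the attention concentrates on the position where that channel is largest, so the output approaches the values at that single position. A trained layer would sit somewhere between these two extremes.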







Acknowledgements
This research was supported by the Hankuk University of Foreign Studies Research Fund, and by the Ministry of Science and ICT of the Republic of Korea and the National Research Foundation of Korea (NRF-2021R1F1A1047577).
About this article
Cite this article
Setiawan, F., Yahya, B.N. & Lee, SL. Normalized Attention Inter-Channel Pooling (NAIP) for Deep Convolutional Neural Network Regularization. Neural Process Lett 55, 9315–9333 (2023). https://doi.org/10.1007/s11063-023-11203-6
DOI: https://doi.org/10.1007/s11063-023-11203-6