Abstract
Scene recognition is a computer vision task that categorizes scenes from photographs. In this paper, we introduce the Adaptive Local Recalibration Network (ALR-Net), a novel scene recognition method based on convolutional neural networks (CNNs). Compared with object classification, the information in scene images is more dispersed. To address this, we propose an attention mechanism that locates the discriminative regions for scene recognition. In addition to standard data augmentation, we use these regions to guide two further augmentation approaches, adaptive cropping and adaptive hiding, which capture local information more efficiently and specifically. The attention maps are also used to adaptively recalibrate the scene feature maps so that discriminative regions receive more attention than other regions. Furthermore, we introduce a scene distribution label for each image, which assists the training of the attention maps. Extensive experiments on two scene recognition benchmarks, MIT67 (88.37%) and SUN397 (74.24%), verify the effectiveness of the proposed model.
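The abstract does not give the exact formulation, so the following is only a minimal NumPy sketch of how an attention map could guide cropping, hiding, and feature recalibration. The function names, the 0.5 threshold, the min-max normalisation, and the `1 + attention` reweighting are all illustrative assumptions, not the ALR-Net implementation. The attention map is assumed to have already been resized to the spatial size of the image or feature map.

```python
import numpy as np

def normalise(attn):
    """Min-max normalise an attention map to [0, 1) (assumed preprocessing)."""
    return (attn - attn.min()) / (attn.max() - attn.min() + 1e-8)

def attention_bbox(attn, thresh=0.5):
    """Bounding box (top, left, bottom, right) of cells with attention >= thresh."""
    rows, cols = np.where(normalise(attn) >= thresh)
    if rows.size == 0:  # flat map: fall back to the whole image
        return 0, 0, attn.shape[0], attn.shape[1]
    return rows.min(), cols.min(), rows.max() + 1, cols.max() + 1

def adaptive_crop(image, attn, thresh=0.5):
    """Keep only the box around the high-attention region (hypothetical variant)."""
    t, l, b, r = attention_bbox(attn, thresh)
    return image[t:b, l:r]

def adaptive_hide(image, attn, thresh=0.5):
    """Zero out the high-attention pixels so other regions must be used."""
    out = image.copy()
    out[normalise(attn) >= thresh] = 0
    return out

def recalibrate(features, attn):
    """Reweight a (C, H, W) feature map so attended locations are amplified."""
    return features * (1.0 + normalise(attn))[None, :, :]

# Toy example: a 4x4 image whose attention peaks in a 2x2 block.
image = np.arange(1, 17).reshape(4, 4)
attn = np.zeros((4, 4))
attn[1:3, 2:4] = 1.0

cropped = adaptive_crop(image, attn)
hidden = adaptive_hide(image, attn)
recal = recalibrate(np.ones((3, 4, 4)), attn)
```

In this toy setting the crop keeps the 2x2 attended block, hiding zeroes that same block, and recalibration roughly doubles the feature values there while leaving the remaining locations unchanged.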
Funding
The numerical calculations in this paper were performed on the supercomputing system of the Supercomputing Center of Wuhan University. This work was supported by the National Natural Science Foundation of China Enterprise Innovation and Development Joint Fund (Project No. U19B2004).
Ethics declarations
Conflicts of interest
The authors declare no conflict of interest.
About this article
Cite this article
Wang, J., Zou, L., Fan, C. et al. Adaptive local recalibration network for scene recognition. Appl Intell 53, 27935–27950 (2023). https://doi.org/10.1007/s10489-023-04963-0
DOI: https://doi.org/10.1007/s10489-023-04963-0