Adaptive local recalibration network for scene recognition

Wang, Jiale; Zou, Lian; Fan, Cien; Jiang, Hao; Chen, Liqiong; Cheng, Mofan; Yu, Hu; Liu, Yifeng

doi:10.1007/s10489-023-04963-0

Published: 19 September 2023

Volume 53, pages 27935–27950, (2023)

Applied Intelligence

Jiale Wang¹,
Lian Zou¹,
Cien Fan¹,
Hao Jiang¹,
Liqiong Chen¹,
Mofan Cheng¹,
Hu Yu¹ &
…
Yifeng Liu²

Abstract

Scene recognition is a computer vision task that categorizes scenes from photographs. In this paper, we introduce the Adaptive Local Recalibration Network (ALR-Net), a novel scene recognition method based on convolutional neural networks (CNNs). Compared with images used for object classification, scene images have a more dispersed distribution of information. To address this issue, we propose an attention mechanism that locates the discriminative regions for scene recognition. Alongside standard data augmentation, we use these regions to guide two additional data augmentation approaches, adaptive cropping and adaptive hiding, so that local information is captured more efficiently and specifically. Attention maps are also used to adaptively recalibrate scene feature maps so that discriminative regions receive more attention than other regions. In addition, we introduce a scene distribution label for each image, which assists the training of the attention maps. Extensive experiments on two scene recognition benchmarks, MIT67 (88.37%) and SUN397 (74.24%), verify the effectiveness of the proposed model.
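The abstract names three attention-driven operations (adaptive cropping, adaptive hiding, and feature recalibration) without giving their details. The following is a minimal NumPy sketch of what such operations could look like, not the authors' implementation. It assumes the attention map has already been resized to the spatial size of the array it is applied to. The thresholding rule, the `ratio` parameter, and all function names are hypothetical choices made for illustration.

```python
import numpy as np


def attention_mask(att, ratio=0.5):
    """Boolean (H, W) mask of cells whose attention is >= ratio * max (assumed rule)."""
    return att >= ratio * att.max()


def bounding_box(mask):
    """Smallest box covering all True cells: (top, bottom, left, right), ends exclusive."""
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    return rows[0], rows[-1] + 1, cols[0], cols[-1] + 1


def adaptive_crop(image, att, ratio=0.5):
    """Crop an (H, W, C) image to the box around the high-attention region."""
    top, bottom, left, right = bounding_box(attention_mask(att, ratio))
    return image[top:bottom, left:right]


def adaptive_hide(image, att, ratio=0.5):
    """Return a copy of the image with the high-attention region zeroed out."""
    hidden = image.copy()
    hidden[attention_mask(att, ratio)] = 0
    return hidden


def recalibrate(features, att):
    """Reweight a (C, H, W) feature map by an (H, W) attention map via broadcasting."""
    return features * att[None, :, :]


# Toy example: attention concentrated on the central 2x2 block of a 4x4 image.
att = np.zeros((4, 4))
att[1:3, 1:3] = 1.0
image = np.ones((4, 4, 3))

crop = adaptive_crop(image, att)      # keeps only the attended block
hidden = adaptive_hide(image, att)    # removes the attended block
weighted = recalibrate(np.ones((2, 4, 4)), att)
print(crop.shape, hidden.sum(), weighted.sum())
```

In this sketch, cropping forces a model to look closely at the most discriminative region, while hiding removes that region so that the remaining, less salient areas must carry the prediction. Whether ALR-Net uses a threshold, a bounding box, or a different selection rule is not stated in the abstract.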

Data Availability

All data generated or analysed during this study are included in these published articles [32, 48, 49].

References

1. Zhou B, Lapedriza A, Xiao J, Torralba A, Oliva A (2014) Learning deep features for scene recognition using places database. Advances in Neural Information Processing Systems (NIPS) 27
2. Liu T, Wang J, Yang B, Wang X (2021) Ngdnet: Nonuniform gaussian-label distribution learning for infrared head pose estimation and ontask behavior understanding in the classroom. Neurocomputing 436:210–220
3. Li Z, Liu H, Zhang Z, Liu T, Xiong NN (2021) Learning knowledge graph embedding with heterogeneous relation attention networks. IEEE Trans Neural Netw Learn Syst 33(8):3961–3973
4. Liu H, Zheng C, Li D, Shen X, Lin K, Wang J, Zhang Z, Zhang Z, Xiong NN (2021) Edmf: Efficient deep matrix factorization with review feature learning for industrial recommender system. IEEE Transactions on Industrial Informatics 18(7):4361–4371
5. Wang Z, Wang L, Wang Y, Zhang B, Qiao Y (2017) Weakly supervised patchnets: Describing and aggregating local patches for scene recognition. IEEE Trans Image Process 26(4):2028–2041
6. Wu R, Wang B, Wang W, Yu Y (2015) Harvesting discriminative meta objects with deep cnn features for scene classification. In Proceedings of the IEEE International Conference on Computer Vision, pages 1287–1295
7. Cheng X, Lu J, Feng J, Yuan B, Zhou J (2018) Scene recognition with objectness. Pattern Recognition 74:474–487
8. Zhao Z, Larson M (2018) From volcano to toyshop: Adaptive discriminative region discovery for scene recognition. In Proceedings of the 26th ACM international conference on Multimedia, pages 1760–1768
9. Zhou B, Khosla A, Lapedriza A, Oliva A, Torralba A (2016) Learning deep features for discriminative localization. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2921–2929
10. Simon M, Rodner E (2015) Neural activation constellations: Unsupervised part model discovery with convolutional networks. In Proceedings of the IEEE international conference on computer vision, pages 1143–1151
11. Song X, Jiang S, Herranz L (2017) Multi-scale multi-feature context modeling for scene recognition in the semantic manifold. IEEE Transactions on Image Processing 26(6):2721–2735
12. Zeng H, Song X, Chen G, Jiang S (2019) Learning scene attribute for scene recognition. IEEE Transactions on Multimedia 22(6):1519–1530
13. Yu L, Jin M, Zhou K (2020) Multi-channel biomimetic visual transformation for object feature extraction and recognition of complex scenes. Applied Intelligence 50(3):792–811
14. Patterson G, Hays J (2012) Sun attribute database: Discovering, annotating, and recognizing scene attributes. In 2012 IEEE Conference on Computer Vision and Pattern Recognition, pages 2751–2758. IEEE
15. Patterson G, Xu C, Su H, Hays J (2014) The sun attribute database: Beyond categories for deeper scene understanding. International Journal of Computer Vision 108(1-2):59–81
16. Wang L, Guo S, Huang W, Xiong Y, Qiao Y (2017) Knowledge guided disambiguation for large-scale scene classification with multiresolution cnns. IEEE Transactions on Image Processing 26(4):2055–2068
17. Gao BB, Xing C, Xie CW, Wu J, Geng X (2017) Deep label distribution learning with label ambiguity. IEEE Transactions on Image Processing 26(6):2825–2838
18. Tanaka D, Ikami D, Yamasaki T, Aizawa K (2018) Joint optimization framework for learning with noisy labels. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5552–5560
19. Yi K, Wu J (2019) Probabilistic end-to-end noise correction for learning with noisy labels. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7017–7025
20. Liu JB, Huang YP, Zou Q, Wang SC (2019) Learning representative features via constrictive annular loss for image classification. Applied Intelligence 49(8):3082–3092
21. Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems 25:1097–1105
22. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556
23. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1–9
24. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778
25. Yuan C, Wu Y, Qin X, Qiao S, Pan Y, Huang P, Liu D, Han N (2019) An effective image classification method for shallow densely connected convolution networks through squeezing and splitting techniques. Applied Intelligence 49(10):3570–3586
26. Hu J, Shen L, Sun G (2018) Squeeze-and-excitation networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 7132–7141
27. Park J, Woo S, Lee JY, Kweon IS (2018) Bam: Bottleneck attention module. arXiv:1807.06514
28. Woo S, Park J, Lee JY, Kweon IS (2018) Cbam: Convolutional block attention module. In Proceedings of the European conference on computer vision (ECCV), pages 3–19
29. Liu H, Nie H, Zhang Z, Li YF (2021) Anisotropic angle distribution learning for head pose estimation and attention understanding in human-computer interaction. Neurocomputing 433:310–322
30. Liu H, Fang S, Zhang Z, Li D, Lin K, Wang J (2021) Mfdnet: Collaborative poses perception and matrix fisher distribution for head pose estimation. IEEE Trans Multimedia 24:2449–2460
31. Deng Y, Chen H, Chen H, Li Y (2021) Learning from images: A distillation learning framework for event cameras. IEEE Trans Image Process 30:4919–4931
32. Zhou B, Lapedriza A, Khosla A, Oliva A, Torralba A (2017) Places: A 10 million image database for scene recognition. IEEE Trans Pattern Anal Mach Intell 40(6):1452–1464
33. Lowe DG (2004) Distinctive image features from scale-invariant keypoints. International journal of computer vision 60(2):91–110
34. Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. In 2005 IEEE computer society conference on computer vision and pattern recognition (CVPR’05), volume 1, pages 886–893. IEEE
35. Oliva A, Torralba A (2001) Modeling the shape of the scene: A holistic representation of the spatial envelope. International journal of computer vision 42(3):145–175
36. Jégou H, Perronnin F, Douze M, Sánchez J, Pérez P, Schmid C (2011) Aggregating local image descriptors into compact codes. IEEE transactions on pattern analysis and machine intelligence 34(9):1704–1716
37. Perronnin F, Sánchez J, Mensink T (2010) Improving the fisher kernel for large-scale image classification. In European conference on computer vision, pages 143–156. Springer
38. Liu H, Wang X, Zhang W, Zhang Z, Li YF (2020) Infrared head pose estimation with multi-scales feature fusion on the irhp database for human attention recognition. Neurocomputing 411:510–520
39. Deng Y, Chen H, Li Y (2021) Mvf-net: A multi-view fusion network for event-based object classification. IEEE Transactions on Circuits and Systems for Video Technology 32(12):8275–8284
40. Zheng H, Fu J, Mei T, Luo J (2017) Learning multi-attention convolutional neural network for fine-grained image recognition. In Proceedings of the IEEE international conference on computer vision, pages 5209–5217
41. Yang Z, Luo T, Wang D, Hu Z, Gao J, Wang L (2018) Learning to navigate for fine-grained classification. In Proceedings of the European Conference on Computer Vision (ECCV), pages 420–435
42. Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958
43. Ioffe S, Szegedy C (2015) Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International conference on machine learning, pages 448–456. PMLR
44. Singh KK, Lee YJ (2017) Hide-and-seek: Forcing a network to be meticulous for weakly-supervised object and action localization. In 2017 IEEE international conference on computer vision (ICCV), pages 3544–3553. IEEE
45. Zhong Z, Zheng L, Kang G, Li S, Yang Y (2020) Random erasing data augmentation. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 13001–13008
46. DeVries T, Taylor GW (2017) Improved regularization of convolutional neural networks with cutout. arXiv:1708.04552
47. Ren S, He K, Girshick R, Sun J (2015) Faster r-cnn: Towards real-time object detection with region proposal networks. arXiv:1506.01497
48. Quattoni A, Torralba A (2009) Recognizing indoor scenes. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 413–420. IEEE
49. Xiao J, Hays J, Ehinger KA, Oliva A, Torralba A (2010) Sun database: Large-scale scene recognition from abbey to zoo. In 2010 IEEE computer society conference on computer vision and pattern recognition, pages 3485–3492. IEEE
50. Goyal P, Dollár P, Girshick R, Noordhuis P, Wesolowski L, Kyrola A, Tulloch A, Jia Y, He K (2017) Accurate, large minibatch sgd: Training imagenet in 1 hour. arXiv:1706.02677
51. Sitaula C, Xiang Y, Aryal S, Lu X (2021) Scene image representation by foreground, background and hybrid features. Expert Systems with Applications, page 115285
52. Guo S, Huang W, Wang L, Qiao Y (2016) Locally supervised deep hybrid model for scene recognition. IEEE transactions on image processing 26(2):808–820
53. Xie GS, Zhang XY, Yan S, Liu CL (2015) Hybrid cnn and dictionary-based models for scene recognition and domain adaptation. IEEE Transactions on Circuits and Systems for Video Technology 27(6):1263–1274
54. Herranz L, Jiang S, Li X (2016) Scene recognition with cnns: objects, scales and dataset bias. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 571–579
55. López-Cifuentes A, Escudero-Viñolo M, Bescós J, García-Martín Á (2020) Semantic-aware scene recognition. Pattern Recognition 102:107256
56. Chen G, Song X, Zeng H, Jiang S (2020) Scene recognition with prototype-agnostic scene layout. IEEE Transactions on Image Processing 29:5877–5888

Funding

The numerical calculations in this paper were performed on the supercomputing system of the Supercomputing Center of Wuhan University. This work was supported by the National Natural Science Foundation of China Enterprise Innovation and Development Joint Fund (Project No. U19B2004).

Author information

Authors and Affiliations

School of Electronic Information, Wuhan University, Wuhan, 430072, China
Jiale Wang, Lian Zou, Cien Fan, Hao Jiang, Liqiong Chen, Mofan Cheng & Hu Yu
National Engineering Laboratory for Risk Perception and Prevention (NEL-RPP), Beijing, 100041, China
Yifeng Liu

Corresponding author

Correspondence to Lian Zou.

Ethics declarations

Conflicts of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, J., Zou, L., Fan, C. et al. Adaptive local recalibration network for scene recognition. Appl Intell 53, 27935–27950 (2023). https://doi.org/10.1007/s10489-023-04963-0

Accepted: 11 August 2023
Published: 19 September 2023
Issue Date: December 2023
DOI: https://doi.org/10.1007/s10489-023-04963-0
