Computer Science ›› 2022, Vol. 49 ›› Issue (4): 209-214. doi: 10.11896/jsjkx.210100135
XU Hua-jie1,2, QIN Yuan-zhuo1, YANG Yang1
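Abstract: A scene image usually consists of background information and foreground objects. Convolutional neural networks (CNNs) used for scene recognition typically need to identify the scene category from the features of key objects in the scene, and sometimes also from the positional relationships among those objects. Features of small key objects in scene images gradually vanish as the network deepens, which leads to scene recognition errors. To address this problem, a scene recognition method based on multi-level feature fusion and an attention module is proposed. First, the feature extraction part of the deep neural network ResNet-18 is divided into five branches. Then, the multi-level features output by the five branches are fused, and the fused features are used for scene recognition and classification, compensating for the lost object information. Finally, an improved attention module is added to the network so that it focuses on learning the key objects in scene images, further improving recognition performance. Comparative experiments on several scene datasets show that the proposed method achieves recognition accuracies of 88.2%, 79.9% and 97.7% on the MIT-67, SUN-397 and UIUC-Sports datasets, respectively, higher than those of current mainstream scene recognition methods.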
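The fusion step described in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation. The abstract does not say where the five ResNet-18 branches are taken, how the features are fused, or how the attention module is improved, so everything below is an assumption: the branch shapes are scaled-down stand-ins for five backbone stages, fusion is global average pooling followed by concatenation, and a squeeze-and-excitation style gate with random weights stands in for the attention module. All function names (`global_avg_pool`, `fuse`, `channel_gate`, `random_map`) are hypothetical.

```python
import math
import random


def global_avg_pool(fmap):
    """Reduce a C x H x W feature map (nested lists) to a length-C vector."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]


def fuse(branch_maps):
    """Concatenate the pooled descriptors of all branches into one vector."""
    fused = []
    for fmap in branch_maps:
        fused.extend(global_avg_pool(fmap))
    return fused


def channel_gate(vec, w1, w2):
    """SE-style gate (assumed stand-in): vec -> ReLU(W1 vec) -> sigmoid(W2 h) -> rescale."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, vec))) for row in w1]
    scale = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    return [s * v for s, v in zip(scale, vec)]


def random_map(rng, c, h, w):
    """Hypothetical helper: a random C x H x W map standing in for a branch output."""
    return [[[rng.random() for _ in range(w)] for _ in range(h)] for _ in range(c)]


rng = random.Random(0)

# Assumed shapes: channels grow and resolution shrinks with depth,
# loosely mimicking five backbone stages at a reduced scale.
shapes = [(4, 16, 16), (4, 8, 8), (8, 4, 4), (16, 2, 2), (32, 1, 1)]
branch_maps = [random_map(rng, *s) for s in shapes]

fused = fuse(branch_maps)      # multi-level descriptor, one entry per channel
dim = len(fused)
hidden_dim = dim // 4          # bottleneck ratio of 4 is an arbitrary choice

# Random weights in place of learned ones (assumption for illustration only).
w1 = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(hidden_dim)]
w2 = [[rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)] for _ in range(dim)]

attended = channel_gate(fused, w1, w2)
print(len(fused), len(attended))
```

In this sketch the shallow branches keep their own slots in the fused vector, so information from high-resolution layers still reaches the classifier even if it has faded by the last stage. The gate then reweights each slot with a factor between 0 and 1.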