Fine-grained Semantic Reasoning Based Cross-media Dual-way Adversarial Hashing Learning Model

Computer Science, 2022, Vol. 49, Issue (9): 123-131. doi: 10.11896/jsjkx.220600011

• Computer Graphics & Multimedia •

CAO Xiao-wen, LIANG Mei-yu, LU Kang-kang   

  1. Beijing Key Laboratory of Intelligent Communication Software and Multimedia, School of Computer (National Pilot Software Engineering School), Beijing University of Posts and Telecommunications, Beijing 100876, China
  • Received: 2022-06-02  Revised: 2022-07-05  Online: 2022-09-15  Published: 2022-09-09
  • About author: CAO Xiao-wen, born in 1998, master. Her main research interests include deep learning and cross-modal retrieval.
    LIANG Mei-yu, born in 1985, associate professor, master supervisor. Her main research interests include artificial intelligence, data mining, multimedia information processing and computer vision.
  • Supported by:
    National Natural Science Foundation of China (61877006, 62192784) and CAAI-Huawei MindSpore Open Fund (CAAIXSJLJJ-2021-007B).

Abstract: Cross-media hashing has received extensive attention in cross-media search tasks because of its high search efficiency and low storage cost. However, existing methods cannot adequately preserve the high-level semantic relevance and multi-label information of multimedia data. To address these problems, this paper proposes a fine-grained semantic reasoning based cross-media dual-way adversarial hashing learning model (SDAH), which generates compact and consistent unified cross-media hash representations by maximizing the fine-grained semantic associations between different media. First, a fine-grained cross-media semantic association learning and inference method based on a cross-media collaborative attention mechanism is proposed. The cross-media attention mechanism collaboratively learns the fine-grained implicit semantic associations between images and texts and obtains their salient semantic inference features. Then, a cross-media dual-way adversarial hashing network is established to jointly learn the intra-modality and inter-modality semantic similarity constraints and to better align the semantic distributions of the hash codes of different media through a two-way adversarial learning mechanism. This yields higher-quality, more discriminative unified cross-media hash representations, facilitates cross-media semantic fusion, and improves cross-media search performance. Experimental comparisons with existing methods on two public datasets verify the superior performance of the proposed method in various cross-media search scenarios.
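
The search efficiency and low storage cost mentioned in the abstract come from comparing short binary codes instead of real-valued features. The following is a minimal sketch of that general idea only, not of the SDAH model itself: the function names, the 4-bit code length and the feature values are all hypothetical, and sign binarization with Hamming ranking is assumed as the usual hashing retrieval scheme rather than taken from the paper.

```python
from typing import List, Sequence


def binarize(features: Sequence[float]) -> List[int]:
    """Map real-valued features to a {-1, +1} hash code with the sign function."""
    return [1 if x >= 0 else -1 for x in features]


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions at which two equal-length codes differ."""
    if len(a) != len(b):
        raise ValueError("hash codes must have the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def rank_by_hamming(query: Sequence[int],
                    database: Sequence[Sequence[int]]) -> List[int]:
    """Return database indices sorted by increasing Hamming distance to the query."""
    return sorted(range(len(database)),
                  key=lambda i: hamming_distance(query, database[i]))


# Hypothetical example: one text query code searched against three image codes.
# The feature values are made up; in a real system they would come from the
# text and image branches of a trained hashing network.
text_query = binarize([0.7, -0.2, 0.1, -0.9])
image_codes = [
    binarize([-0.5, 0.3, -0.4, 0.8]),
    binarize([0.6, -0.1, 0.2, -0.3]),
    binarize([0.4, 0.9, 0.5, -0.2]),
]
ranking = rank_by_hamming(text_query, image_codes)
print(ranking)
```

Because both media are mapped into the same code space, a text query can rank images (or the reverse) using only bit comparisons, which is why aligning the hash code distributions of different media matters for retrieval quality.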

Key words: Semantic reasoning, Hash learning, Cross-media search, Adversarial learning, Cross-media semantic fusion

CLC Number: 

  • TP391