{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,9,20]],"date-time":"2024-09-20T16:49:25Z","timestamp":1726850965978},"publisher-location":"New York, NY, USA","reference-count":45,"publisher":"ACM","funder":[{"name":"National Natural Science Foundation of China","award":["No. 61802405"]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,8,24]]},"DOI":"10.1145\/3460426.3463615","type":"proceedings-article","created":{"date-parts":[[2021,9,1]],"date-time":"2021-09-01T22:50:29Z","timestamp":1630536629000},"update-policy":"http:\/\/dx.doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":12,"title":["Global Relation-Aware Attention Network for Image-Text Retrieval"],"prefix":"10.1145","author":[{"given":"Jie","family":"Cao","sequence":"first","affiliation":[{"name":"School of Artificial Intelligence, Chinese Academy of Sciences & University of Chinese Academy of Sciences, Beijing, China"}]},{"given":"Shengsheng","family":"Qian","sequence":"additional","affiliation":[{"name":"School of Artificial Intelligence, Chinese Academy of Sciences & University of Chinese Academy of Sciences, Beijing, China"}]},{"given":"Huaiwen","family":"Zhang","sequence":"additional","affiliation":[{"name":"School of Artificial Intelligence, Chinese Academy of Sciences & University of Chinese Academy of Sciences, Beijing, China"}]},{"given":"Quan","family":"Fang","sequence":"additional","affiliation":[{"name":"School of Artificial Intelligence, Chinese Academy of Sciences & University of Chinese Academy of Sciences, Beijing, China"}]},{"given":"Changsheng","family":"Xu","sequence":"additional","affiliation":[{"name":"School of Artificial Intelligence, Chinese Academy of Sciences; University of Chinese Academy of Sciences; & Peng Cheng Laboratory, Beijing, 
China"}]}],"member":"320","published-online":{"date-parts":[[2021,9]]},"reference":[{"key":"e_1_3_2_1_1_1","volume-title":"Multimodal Categorization of Crisis Events in Social Media. In 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020","author":"Abavisani Mahdi","year":"2020","unstructured":"Mahdi Abavisani, Liwei Wu, Shengli Hu, Joel R. Tetreault, and Alejandro Jaimes. 2020. Multimodal Categorization of Crisis Events in Social Media. In 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020, Seattle, WA, USA, June 13--19, 2020. 14667--14677."},{"key":"e_1_3_2_1_2_1","volume-title":"Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering. In 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018","author":"Anderson Peter","year":"2018","unstructured":"Peter Anderson, Xiaodong He, Chris Buehler, Damien Teney, Mark Johnson, Stephen Gould, and Lei Zhang. 2018. Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering. In 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18--22, 2018. 
6077--6086."},{"key":"e_1_3_2_1_3_1","volume-title":"Proceedings of the 30th International Conference on Machine Learning, ICML 2013","author":"Andrew Galen","year":"2013","unstructured":"Galen Andrew, Raman Arora, Jeff A. Bilmes, and Karen Livescu. 2013. Deep Canonical Correlation Analysis. In Proceedings of the 30th International Conference on Machine Learning, ICML 2013, Atlanta, GA, USA, 16--21 June 2013. 1247--1255."},{"key":"e_1_3_2_1_4_1","volume-title":"GCNet: Non-Local Networks Meet Squeeze-Excitation Networks and Beyond. In 2019 IEEE\/CVF International Conference on Computer Vision Workshops, ICCV Workshops 2019","author":"Cao Yue","year":"2019","unstructured":"Yue Cao, Jiarui Xu, Stephen Lin, Fangyun Wei, and Han Hu. 2019. GCNet: Non-Local Networks Meet Squeeze-Excitation Networks and Beyond. In 2019 IEEE\/CVF International Conference on Computer Vision Workshops, ICCV Workshops 2019, Seoul, Korea (South), October 27--28, 2019. 1971--1980."},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"crossref","unstructured":"Hui Chen Guiguang Ding Zijia Lin Sicheng Zhao Xiaopeng Gu Wenyuan Xu and Jungong Han. 2020 b. 
ACMNet: Adaptive Confidence Matching Network for Human Behavior Analysis via Cross-Modal Retrieval. ACM Trans. Multimedia Comput. Commun. Appl. (2020) 21 pages.","DOI":"10.1145\/3362065"},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/3343031.3351055"},{"key":"e_1_3_2_1_7_1","volume-title":"IMRAM: Iterative Matching With Recurrent Attention Memory for Cross-Modal Image-Text Retrieval. In 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020","author":"Chen Hui","year":"2020","unstructured":"Hui Chen, Guiguang Ding, Xudong Liu, Zijia Lin, Ji Liu, and Jungong Han. 2020 c. IMRAM: Iterative Matching With Recurrent Attention Memory for Cross-Modal Image-Text Retrieval. In 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020, Seattle, WA, USA, June 13--19, 2020. 12652--12660."},{"key":"e_1_3_2_1_8_1","volume-title":"Proceedings, Part XIII. 549--565","author":"Chen Tianlang","year":"2020","unstructured":"Tianlang Chen, Jiajun Deng, and Jiebo Luo. 2020 a. Adaptive Offline Quintuplet Loss for Image-Text Matching. In Computer Vision - ECCV 2020 - 16th European Conference, Glasgow, UK, August 23--28, 2020, Proceedings, Part XIII. 
549--565."},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i07.6631"},{"key":"e_1_3_2_1_10_1","volume-title":"BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Association for Computational Linguistics, 4171--4186."},{"key":"e_1_3_2_1_11_1","volume-title":"Imagine and Match: Improving Textual-Visual Cross-Modal Retrieval With Generative Models. In 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018","author":"Gu Jiuxiang","year":"2018","unstructured":"Jiuxiang Gu, Jianfei Cai, Shafiq R. Joty, Li Niu, and Gang Wang. 2018. Look, Imagine and Match: Improving Textual-Visual Cross-Modal Retrieval With Generative Models. In 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18--22, 2018. 7181--7189."},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/3372278.3390709"},{"key":"e_1_3_2_1_13_1","volume-title":"Efficient Graph Deep Learning in TensorFlow with tf_geometric. CoRR","author":"Hu Jun","year":"2021","unstructured":"Jun Hu, Shengsheng Qian, Quan Fang, Youze Wang, Quan Zhao, Huaiwen Zhang, and Changsheng Xu. 2021. Efficient Graph Deep Learning in TensorFlow with tf_geometric. 
CoRR (2021), 1--6."},{"key":"e_1_3_2_1_14_1","volume-title":"ACMM: Aligned Cross-Modal Memory for Few-Shot Image and Sentence Matching. In 2019 IEEE\/CVF International Conference on Computer Vision, ICCV 2019","author":"Huang Yan","year":"2019","unstructured":"Yan Huang and Liang Wang. 2019. ACMM: Aligned Cross-Modal Memory for Few-Shot Image and Sentence Matching. In 2019 IEEE\/CVF International Conference on Computer Vision, ICCV 2019, Seoul, Korea (South), October 27 - November 2, 2019. 5773--5782."},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2020.2975594"},{"key":"e_1_3_2_1_16_1","volume-title":"SMAN: Stacked Multimodal Attention Network for Cross-Modal Image-Text Retrieval","author":"Ji Z.","year":"2020","unstructured":"Z. Ji, H. Wang, J. Han, and Y. Pang. 2020. SMAN: Stacked Multimodal Attention Network for Cross-Modal Image-Text Retrieval. IEEE Transactions on Cybernetics (2020), 1--12."},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2016.2598339"},{"key":"e_1_3_2_1_18_1","volume-title":"Kipf and Max Welling","author":"Thomas","year":"2017","unstructured":"Thomas N. Kipf and Max Welling. 2017. Semi-Supervised Classification with Graph Convolutional Networks. 
In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24--26, 2017, Conference Track Proceedings. 1--14."},{"key":"e_1_3_2_1_19_1","volume-title":"Proceedings, Part IV. 212--228","author":"Lee Kuang-Huei","year":"2018","unstructured":"Kuang-Huei Lee, Xi Chen, Gang Hua, Houdong Hu, and Xiaodong He. 2018. Stacked Cross Attention for Image-Text Matching. In Computer Vision - ECCV 2018 - 15th European Conference, Munich, Germany, September 8--14, 2018, Proceedings, Part IV. 212--228."},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.emnlp-main.733"},{"key":"e_1_3_2_1_21_1","volume-title":"The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI","author":"Li Gen","year":"2020","unstructured":"Gen Li, Nan Duan, Yuejian Fang, Ming Gong, and Daxin Jiang. 2020 a. Unicoder-VL: A Universal Encoder for Vision and Language by Cross-Modal Pre-Training. 
In The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, New York, NY, USA, February 7--12, 2020. 11336--11344."},{"key":"e_1_3_2_1_22_1","volume-title":"Proceedings, Part V. 740--755","author":"Lin Tsung-Yi","unstructured":"Tsung-Yi Lin, Michael Maire, Serge J. Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Doll\u00e1r, and C. Lawrence Zitnick. 2014. Microsoft COCO: Common Objects in Context. In Computer Vision - ECCV 2014 - 13th European Conference, Zurich, Switzerland, September 6--12, 2014, Proceedings, Part V. 740--755."},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/3397271.3401086"},{"key":"e_1_3_2_1_24_1","volume-title":"2020 a. Fine-grained Visual Textual Alignment for Cross-Modal Retrieval using Transformer Encoders. CoRR","author":"Messina Nicola","year":"2020","unstructured":"Nicola Messina, Giuseppe Amato, Andrea Esuli, Fabrizio Falchi, Claudio Gennaro, and St\u00e9phane Marchand-Maillet. 2020 a. Fine-grained Visual Textual Alignment for Cross-Modal Retrieval using Transformer Encoders. CoRR (2020), 1--12."},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"crossref","unstructured":"Nicola Messina Fabrizio Falchi Andrea Esuli and Giuseppe Amato. 2020 b. 
Transformer Reasoning Network for Image-Text Matching and Retrieval. (2020) 5222--5229.","DOI":"10.1109\/ICPR48806.2021.9413172"},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"crossref","unstructured":"Kai Niu Yan Huang and Liang Wang. 2020. Re-ranking image-text matching by adaptive metric fusion. Pattern Recognit. (2020) 107351.","DOI":"10.1016\/j.patcog.2020.107351"},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/2964284.2964294"},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2015.2510329"},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/1873951.1873987"},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1410"},{"key":"e_1_3_2_1_31_1","volume-title":"Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks","author":"Ren Shaoqing","year":"2017","unstructured":"Shaoqing Ren, Kaiming He, Ross B. Girshick, and Jian Sun. 2017. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. (2017), 1137--1149."},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2012.6247923"},{"key":"e_1_3_2_1_33_1","unstructured":"Nitish Srivastava and Ruslan Salakhutdinov. 2014. Multimodal learning with deep Boltzmann machines. J. Mach. Learn. Res. 
(2014) 2949--2980."},{"key":"e_1_3_2_1_34_1","volume-title":"Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is All you Need. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4--9, 2017, Long Beach, CA, USA. 5998--6008."},{"key":"e_1_3_2_1_35_1","volume-title":"Cross-modal Scene Graph Matching for Relationship-aware Image-Text Retrieval. In IEEE Winter Conference on Applications of Computer Vision, WACV 2020","author":"Wang Sijin","year":"2020","unstructured":"Sijin Wang, Ruiping Wang, Ziwei Yao, Shiguang Shan, and Xilin Chen. 2020. Cross-modal Scene Graph Matching for Relationship-aware Image-Text Retrieval. In IEEE Winter Conference on Applications of Computer Vision, WACV 2020, Snowmass Village, CO, USA, March 1--5, 2020. 1497--1506."},{"key":"e_1_3_2_1_36_1","volume-title":"CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval. 
In 2019 IEEE\/CVF International Conference on Computer Vision, ICCV 2019","author":"Wang Zihao","year":"2019","unstructured":"Zihao Wang, Xihui Liu, Hongsheng Li, Lu Sheng, Junjie Yan, Xiaogang Wang, and Jing Shao. 2019. CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval. In 2019 IEEE\/CVF International Conference on Computer Vision, ICCV 2019, Seoul, Korea (South), October 27 - November 2, 2019. 5763--5772."},{"key":"e_1_3_2_1_37_1","volume-title":"Universal Weighting Metric Learning for Cross-Modal Matching. In 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020","author":"Wei Jiwei","year":"2020","unstructured":"Jiwei Wei, Xing Xu, Yang Yang, Yanli Ji, Zheng Wang, and Heng Tao Shen. 2020 a. Universal Weighting Metric Learning for Cross-Modal Matching. In 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020, Seattle, WA, USA, June 13--19, 2020. 13002--13011."},{"key":"e_1_3_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2020.2996407"},{"key":"e_1_3_2_1_39_1","volume-title":"Multi-Modality Cross Attention Network for Image and Sentence Matching. In 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020","author":"Wei Xi","year":"2020","unstructured":"Xi Wei, Tianzhu Zhang, Yan Li, Yongdong Zhang, and Feng Wu. 2020 b. 
Multi-Modality Cross Attention Network for Image and Sentence Matching. In 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020, Seattle, WA, USA, June 13--19, 2020. 10938--10947."},{"key":"e_1_3_2_1_40_1","volume-title":"Cross-Modal Retrieval With CNN Visual Features: A New Baseline","author":"Wei Yunchao","year":"2017","unstructured":"Yunchao Wei, Yao Zhao, Canyi Lu, Shikui Wei, Luoqi Liu, Zhenfeng Zhu, and Shuicheng Yan. 2017. Cross-Modal Retrieval With CNN Visual Features: A New Baseline. IEEE Trans. Cybern. (2017), 449--460."},{"key":"e_1_3_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/3343031.3350940"},{"key":"e_1_3_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298966"},{"key":"e_1_3_2_1_43_1","volume-title":"From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions. Trans. Assoc. Comput. Linguistics","author":"Young Peter","year":"2014","unstructured":"Peter Young, Alice Lai, Micah Hodosh, and Julia Hockenmaier. 2014. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions. Trans. Assoc. Comput. 
Linguistics (2014), 67--78."},{"key":"e_1_3_2_1_44_1","volume-title":"Context-Aware Attention Network for Image-Text Retrieval. In 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020","author":"Zhang Qi","year":"2020","unstructured":"Qi Zhang, Zhen Lei, Zhaoxiang Zhang, and Stan Z. Li. 2020. Context-Aware Attention Network for Image-Text Retrieval. In 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020, Seattle, WA, USA, June 13--19, 2020. 3533--3542."},{"key":"e_1_3_2_1_45_1","volume-title":"Deep Supervised Cross-Modal Retrieval. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019","author":"Zhen Liangli","year":"2019","unstructured":"Liangli Zhen, Peng Hu, Xu Wang, and Dezhong Peng. 2019. Deep Supervised Cross-Modal Retrieval. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, June 16--20, 2019. 
10394--10403."}],"event":{"name":"ICMR '21: International Conference on Multimedia Retrieval","location":"Taipei Taiwan","acronym":"ICMR '21","sponsor":["SIGMM ACM Special Interest Group on Multimedia"]},"container-title":["Proceedings of the 2021 International Conference on Multimedia Retrieval"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3460426.3463615","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,6,9]],"date-time":"2023-06-09T20:52:35Z","timestamp":1686343955000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3460426.3463615"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,8,24]]},"references-count":45,"alternative-id":["10.1145\/3460426.3463615","10.1145\/3460426"],"URL":"https:\/\/doi.org\/10.1145\/3460426.3463615","relation":{},"subject":[],"published":{"date-parts":[[2021,8,24]]},"assertion":[{"value":"2021-09-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}