{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,10,30]],"date-time":"2024-10-30T21:31:12Z","timestamp":1730323872302,"version":"3.28.0"},"publisher-location":"New York, NY, USA","reference-count":56,"publisher":"ACM","funder":[{"DOI":"10.13039\/501100012226","name":"Fundamental Research Funds for the Central Universities","doi-asserted-by":"publisher","award":["ZYGX2019Z015"],"id":[{"id":"10.13039\/501100012226","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61976049, 62072080, 61632007 and U20B2063"],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Sichuan Science and Technology Program","award":["2018GZDZX0032, 2019ZDZX0008, 2019YFG0003, 2019YFG0533 and 2020YFS0057"]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,8,24]]},"DOI":"10.1145\/3460426.3463618","type":"proceedings-article","created":{"date-parts":[[2021,9,1]],"date-time":"2021-09-01T18:50:28Z","timestamp":1630522228000},"page":"173-182","update-policy":"http:\/\/dx.doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":10,"title":["Cross-Modal Image-Recipe Retrieval via Intra- and Inter-Modality Hybrid Fusion"],"prefix":"10.1145","author":[{"given":"Jiao","family":"Li","sequence":"first","affiliation":[{"name":"University of Electronic Science and Technology of China, Chengdu, China"}]},{"given":"Jialiang","family":"Sun","sequence":"additional","affiliation":[{"name":"University of Electronic Science and Technology of China, Chengdu, China"}]},{"given":"Xing","family":"Xu","sequence":"additional","affiliation":[{"name":"University of Electronic Science and\u00a0Technology of China, Chengdu, 
China"}]},{"given":"Wei","family":"Yu","sequence":"additional","affiliation":[{"name":"Glasgow College & University of Electronic Science and\u00a0Technology of China, Chengdu, China"}]},{"given":"Fumin","family":"Shen","sequence":"additional","affiliation":[{"name":"University of Electronic Science and Technology of China, Chengdu, China"}]}],"member":"320","published-online":{"date-parts":[[2021,9]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/3194658.3194663"},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/3209978.3210036"},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/2964284.2964315"},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/3123266.3123428"},{"key":"e_1_3_2_1_5_1","volume-title":"Deep Understanding of Cooking Procedure for Cross-modal Recipe Retrieval. In 2018 ACM Multimedia Conference on Multimedia Conference, MM. 1020--1028","author":"Chen Jingjing","year":"2018","unstructured":"Jingjing Chen , Chong-Wah Ngo , Fuli Feng , and Tat-Seng Chua . 2018 . Deep Understanding of Cooking Procedure for Cross-modal Recipe Retrieval. In 2018 ACM Multimedia Conference on Multimedia Conference, MM. 1020--1028 . Jingjing Chen, Chong-Wah Ngo, Fuli Feng, and Tat-Seng Chua. 2018. Deep Understanding of Cooking Procedure for Cross-modal Recipe Retrieval. In 2018 ACM Multimedia Conference on Multimedia Conference, MM. 1020--1028."},{"volume-title":"MultiMedia Modeling - 23rd International Conference, MMM. 588--600.","author":"Chen Jingjing","key":"e_1_3_2_1_6_1","unstructured":"Jingjing Chen , Lei Pang , and Chong-Wah Ngo . 2017b. Cross-Modal Recipe Retrieval: How to Cook this Dish? . In MultiMedia Modeling - 23rd International Conference, MMM. 588--600. Jingjing Chen, Lei Pang, and Chong-Wah Ngo. 2017b. Cross-Modal Recipe Retrieval: How to Cook this Dish?. In MultiMedia Modeling - 23rd International Conference, MMM. 
588--600."},{"key":"e_1_3_2_1_7_1","volume-title":"Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. CoRR","author":"Chung Junyoung","year":"2014","unstructured":"Junyoung Chung , Caglar G\u00fclcehre , KyungHyun Cho , and Yoshua Bengio . 2014. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. CoRR , Vol. abs\/ 1412 .3555 ( 2014 ). Junyoung Chung, Caglar G\u00fclcehre, KyungHyun Cho, and Yoshua Bengio. 2014. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. CoRR, Vol. abs\/1412.3555 (2014)."},{"key":"e_1_3_2_1_8_1","volume-title":"Pre-Training with Whole Word Masking for Chinese BERT. CoRR","author":"Cui Yiming","year":"2019","unstructured":"Yiming Cui , Wanxiang Che , Ting Liu , Bing Qin , Ziqing Yang , Shijin Wang , and Guoping Hu. 2019. Pre-Training with Whole Word Masking for Chinese BERT. CoRR , Vol. abs\/ 1906 .08101 ( 2019 ). arxiv: 1906.08101 Yiming Cui, Wanxiang Che, Ting Liu, Bing Qin, Ziqing Yang, Shijin Wang, and Guoping Hu. 2019. Pre-Training with Whole Word Masking for Chinese BERT. CoRR, Vol. abs\/1906.08101 (2019). arxiv: 1906.08101"},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"e_1_3_2_1_10_1","volume-title":"BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. CoRR","author":"Devlin Jacob","year":"2018","unstructured":"Jacob Devlin , Ming-Wei Chang , Kenton Lee , and Kristina Toutanova . 2018 . BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. CoRR , Vol. abs\/ 1810 .04805 (2018). arxiv: 1810.04805 Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. CoRR, Vol. abs\/1810.04805 (2018). arxiv: 1810.04805"},{"key":"e_1_3_2_1_11_1","volume-title":"An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. 
CoRR","author":"Dosovitskiy Alexey","year":"2020","unstructured":"Alexey Dosovitskiy , Lucas Beyer , Alexander Kolesnikov , Dirk Weissenborn , Xiaohua Zhai , Thomas Unterthiner , Mostafa Dehghani , Matthias Minderer , Georg Heigold , Sylvain Gelly , Jakob Uszkoreit , and Neil Houlsby . 2020. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. CoRR , Vol. abs\/ 2010 .11929 (2020). arxiv: 2010.11929 Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. 2020. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. CoRR, Vol. abs\/2010.11929 (2020). arxiv: 2010.11929"},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/3077136.3080826"},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/2647868.2654902"},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/1719970.1720021"},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01458"},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/3397271.3401430"},{"key":"e_1_3_2_1_17_1","volume-title":"Generative Adversarial Nets. In Annual Conference on Neural Information Processing Systems","author":"Goodfellow Ian J.","year":"2014","unstructured":"Ian J. Goodfellow , Jean Pouget-Abadie , Mehdi Mirza , Bing Xu , David Warde-Farley , Sherjil Ozair , Aaron C. Courville , and Yoshua Bengio . 2014 . Generative Adversarial Nets. In Annual Conference on Neural Information Processing Systems 2014. 2672--2680. Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron C. Courville, and Yoshua Bengio. 2014. Generative Adversarial Nets. In Annual Conference on Neural Information Processing Systems 2014. 
2672--2680."},{"key":"e_1_3_2_1_18_1","volume-title":"Deep Residual Learning for Image Recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR. 770--778","author":"He Kaiming","year":"2016","unstructured":"Kaiming He , Xiangyu Zhang , Shaoqing Ren , and Jian Sun . 2016 . Deep Residual Learning for Image Recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR. 770--778 . Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep Residual Learning for Image Recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR. 770--778."},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1093\/biomet\/28.3-4.321"},{"key":"e_1_3_2_1_21_1","volume-title":"ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision. CoRR","author":"Kim Wonjae","year":"2021","unstructured":"Wonjae Kim , Bokyung Son , and Ildoo Kim . 2021. ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision. CoRR , Vol. abs\/ 2102 .03334 ( 2021 ). arxiv: 2102.03334 Wonjae Kim, Bokyung Son, and Ildoo Kim. 2021. ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision. CoRR, Vol. abs\/2102.03334 (2021). arxiv: 2102.03334"},{"key":"e_1_3_2_1_22_1","volume-title":"Skip-Thought Vectors. In Annual Conference on Neural Information Processing Systems","author":"Kiros Ryan","year":"2015","unstructured":"Ryan Kiros , Yukun Zhu , Ruslan Salakhutdinov , Richard S. Zemel , Raquel Urtasun , Antonio Torralba , and Sanja Fidler . 2015 . Skip-Thought Vectors. In Annual Conference on Neural Information Processing Systems 2015. 3294--3302. Ryan Kiros, Yukun Zhu, Ruslan Salakhutdinov, Richard S. Zemel, Raquel Urtasun, Antonio Torralba, and Sanja Fidler. 2015. Skip-Thought Vectors. In Annual Conference on Neural Information Processing Systems 2015. 
3294--3302."},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/2983323.2983897"},{"key":"e_1_3_2_1_24_1","volume-title":"Stacked Cross Attention for Image-Text Matching. In 15th European Conference on Computer Vision (ECCV). 212--228","author":"Lee Kuang-Huei","year":"2018","unstructured":"Kuang-Huei Lee , Xi Chen , Gang Hua , Houdong Hu , and Xiaodong He . 2018 a. Stacked Cross Attention for Image-Text Matching. In 15th European Conference on Computer Vision (ECCV). 212--228 . Kuang-Huei Lee, Xi Chen, Gang Hua, Houdong Hu, and Xiaodong He. 2018a. Stacked Cross Attention for Image-Text Matching. In 15th European Conference on Computer Vision (ECCV). 212--228."},{"key":"e_1_3_2_1_25_1","volume-title":"CleanNet: Transfer Learning for Scalable Image Classifier Training With Label Noise. In 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR. 5447--5456","author":"Lee Kuang-Huei","year":"2018","unstructured":"Kuang-Huei Lee , Xiaodong He , Lei Zhang , and Linjun Yang . 2018 b. CleanNet: Transfer Learning for Scalable Image Classifier Training With Label Noise. In 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR. 5447--5456 . Kuang-Huei Lee, Xiaodong He, Lei Zhang, and Linjun Yang. 2018b. CleanNet: Transfer Learning for Scalable Image Classifier Training With Label Noise. In 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR. 5447--5456."},{"key":"e_1_3_2_1_26_1","volume-title":"DeepFood: Deep Learning-Based Food Image Recognition for Computer-Aided Dietary Assessment. In 14th International Conference on Smart Homes and Health Telematics, ICOST","author":"Liu Chang","year":"2016","unstructured":"Chang Liu , Yu Cao , Yan Luo , Guanling Chen , Vinod Vokkarane , and Yunsheng Ma . 2016 . DeepFood: Deep Learning-Based Food Image Recognition for Computer-Aided Dietary Assessment. In 14th International Conference on Smart Homes and Health Telematics, ICOST 2016. 37--48. 
Chang Liu, Yu Cao, Yan Luo, Guanling Chen, Vinod Vokkarane, and Yunsheng Ma. 2016. DeepFood: Deep Learning-Based Food Image Recognition for Computer-Aided Dietary Assessment. In 14th International Conference on Smart Homes and Health Telematics, ICOST 2016. 37--48."},{"key":"e_1_3_2_1_27_1","volume-title":"ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks. In Annual Conference on Neural Information Processing Systems","author":"Lu Jiasen","year":"2019","unstructured":"Jiasen Lu , Dhruv Batra , Devi Parikh , and Stefan Lee . 2019 . ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks. In Annual Conference on Neural Information Processing Systems 2019. 13--23. Jiasen Lu, Dhruv Batra, Devi Parikh, and Stefan Lee. 2019. ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks. In Annual Conference on Neural Information Processing Systems 2019. 13--23."},{"key":"e_1_3_2_1_28_1","volume-title":"Wide-Slice Residual Networks for Food Recognition. In 2018 IEEE Winter Conference on Applications of Computer Vision, WACV. 567--576","author":"Martinel Niki","year":"2018","unstructured":"Niki Martinel , Gian Luca Foresti , and Christian Micheloni . 2018 . Wide-Slice Residual Networks for Food Recognition. In 2018 IEEE Winter Conference on Applications of Computer Vision, WACV. 567--576 . Niki Martinel, Gian Luca Foresti, and Christian Micheloni. 2018. Wide-Slice Residual Networks for Food Recognition. In 2018 IEEE Winter Conference on Applications of Computer Vision, WACV. 567--576."},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCVW.2015.70"},{"key":"e_1_3_2_1_30_1","volume-title":"Fine-grained Visual Textual Alignment for Cross-Modal Retrieval using Transformer Encoders. 
CoRR","author":"Messina Nicola","year":"2020","unstructured":"Nicola Messina , Giuseppe Amato , Andrea Esuli , Fabrizio Falchi , Claudio Gennaro , and St\u00e9phane Marchand-Maillet . 2020. Fine-grained Visual Textual Alignment for Cross-Modal Retrieval using Transformer Encoders. CoRR , Vol. abs\/ 2008 .05231 ( 2020 ). arxiv: 2008.05231 Nicola Messina, Giuseppe Amato, Andrea Esuli, Fabrizio Falchi, Claudio Gennaro, and St\u00e9phane Marchand-Maillet. 2020. Fine-grained Visual Textual Alignment for Cross-Modal Retrieval using Transformer Encoders. CoRR, Vol. abs\/2008.05231 (2020). arxiv: 2008.05231"},{"key":"e_1_3_2_1_31_1","volume-title":"Annual Conference on Neural Information Processing Systems","author":"Mikolov Tom\u00e1\u0161","year":"2013","unstructured":"Tom\u00e1\u0161 Mikolov , Ilya Sutskever , Kai Chen , Gregory S. Corrado , and Jeffrey Dean . 2013 . Distributed Representations of Words and Phrases and their Compositionality . In Annual Conference on Neural Information Processing Systems 2013. 3111--3119. Tom\u00e1\u0161 Mikolov, Ilya Sutskever, Kai Chen, Gregory S. Corrado, and Jeffrey Dean. 2013. Distributed Representations of Words and Phrases and their Compositionality. In Annual Conference on Neural Information Processing Systems 2013. 3111--3119."},{"key":"e_1_3_2_1_32_1","first-page":"950","article-title":"You Are What You Eat","volume":"20","author":"Min Weiqing","year":"2018","unstructured":"Weiqing Min , Bing-Kun Bao , Shuhuan Mei , Yaohui Zhu , Yong Rui , and Shuqiang Jiang . 2018 . You Are What You Eat : Exploring Rich Recipe Information for Cross-Region Food Analysis. IEEE Trans. Multim. , Vol. 20 , 4 (2018), 950 -- 964 . Weiqing Min, Bing-Kun Bao, Shuhuan Mei, Yaohui Zhu, Yong Rui, and Shuqiang Jiang. 2018. You Are What You Eat: Exploring Rich Recipe Information for Cross-Region Food Analysis. IEEE Trans. Multim., Vol. 20, 4 (2018), 950--964.","journal-title":"Exploring Rich Recipe Information for Cross-Region Food Analysis. IEEE Trans. 
Multim."},{"key":"e_1_3_2_1_33_1","volume-title":"Jain","author":"Min Weiqing","year":"2019","unstructured":"Weiqing Min , Shuqiang Jiang , Linhu Liu , Yong Rui , and Ramesh C . Jain . 2019 . A Survey on Food Computing. ACM Comput. Surv ., Vol. 52 , 5 (2019), 92:1--92:36. Weiqing Min, Shuqiang Jiang, Linhu Liu, Yong Rui, and Ramesh C. Jain. 2019. A Survey on Food Computing. ACM Comput. Surv., Vol. 52, 5 (2019), 92:1--92:36."},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2016.2639382"},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.146"},{"volume-title":"Proceedings of the 28th International Conference on Machine Learning, ICML. 689--696","author":"Ngiam Jiquan","key":"e_1_3_2_1_36_1","unstructured":"Jiquan Ngiam , Aditya Khosla , Mingyu Kim , Juhan Nam , Honglak Lee , and Andrew Y. Ng . 2011. Multimodal Deep Learning . In Proceedings of the 28th International Conference on Machine Learning, ICML. 689--696 . Jiquan Ngiam, Aditya Khosla, Mingyu Kim, Juhan Nam, Honglak Lee, and Andrew Y. Ng. 2011. Multimodal Deep Learning. In Proceedings of the 28th International Conference on Machine Learning, ICML. 689--696."},{"key":"e_1_3_2_1_37_1","volume-title":"High-Performance Deep Learning Library. In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019","author":"Paszke Adam","year":"2019","unstructured":"Adam Paszke , Sam Gross , Francisco Massa , Adam Lerer , James Bradbury , Gregory Chanan , Trevor Killeen , Zeming Lin , Natalia Gimelshein , Luca Antiga , Alban Desmaison , Andreas K\u00f6pf , Edward Yang , Zachary DeVito , Martin Raison , Alykhan Tejani , Sasank Chilamkurthy , Benoit Steiner , Lu Fang , Junjie Bai , and Soumith Chintala . 2019 . PyTorch: An Imperative Style , High-Performance Deep Learning Library. 
In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019 , NeurIPS 2019, December 8--14, 2019, Vancouver, BC, Canada. 8024--8035. Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas K\u00f6pf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. 2019. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8--14, 2019, Vancouver, BC, Canada. 8024--8035."},{"key":"e_1_3_2_1_38_1","volume-title":"CHEF: Cross-modal Hierarchical Embeddings for Food Domain Retrieval. CoRR","author":"Pham Hai Xuan","year":"2021","unstructured":"Hai Xuan Pham , Ricardo Guerrero , Jiatong Li , and Vladimir Pavlovic . 2021 . CHEF: Cross-modal Hierarchical Embeddings for Food Domain Retrieval. CoRR , Vol. abs\/ 2102 .02547 (2021). arxiv: 2102.02547 Hai Xuan Pham, Ricardo Guerrero, Jiatong Li, and Vladimir Pavlovic. 2021. CHEF: Cross-modal Hierarchical Embeddings for Food Domain Retrieval. CoRR, Vol. abs\/2102.02547 (2021). arxiv: 2102.02547"},{"key":"e_1_3_2_1_39_1","volume-title":"Cross-Modal Hierarchical Modelling for Fine-Grained Sketch Based Image Retrieval. In 31st British Machine Vision Conference","author":"Sain Aneeshan","year":"2020","unstructured":"Aneeshan Sain , Ayan Kumar Bhunia , Yongxin Yang , Tao Xiang , and Yi-Zhe Song . 2020 . Cross-Modal Hierarchical Modelling for Fine-Grained Sketch Based Image Retrieval. In 31st British Machine Vision Conference 2020, BMVC. Aneeshan Sain, Ayan Kumar Bhunia, Yongxin Yang, Tao Xiang, and Yi-Zhe Song. 2020. Cross-Modal Hierarchical Modelling for Fine-Grained Sketch Based Image Retrieval. 
In 31st British Machine Vision Conference 2020, BMVC."},{"key":"e_1_3_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/3041021.3055137"},{"key":"e_1_3_2_1_41_1","volume-title":"Learning Cross-Modal Embeddings for Cooking Recipes and Food Images. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR. 3068--3076","author":"Salvador Amaia","year":"2017","unstructured":"Amaia Salvador , Nicholas Hynes , Yusuf Aytar , Javier Mar\u00edn , Ferda Ofli , Ingmar Weber , and Antonio Torralba . 2017 . Learning Cross-Modal Embeddings for Cooking Recipes and Food Images. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR. 3068--3076 . Amaia Salvador, Nicholas Hynes, Yusuf Aytar, Javier Mar\u00edn, Ferda Ofli, Ingmar Weber, and Antonio Torralba. 2017. Learning Cross-Modal Embeddings for Cooking Recipes and Food Images. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR. 3068--3076."},{"key":"e_1_3_2_1_42_1","volume-title":"Manning","author":"Tai Kai Sheng","year":"2015","unstructured":"Kai Sheng Tai , Richard Socher , and Christopher D . Manning . 2015 . Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, ACL 2015, July 26--31, 2015, Beijing, China, Volume 1: Long Papers. 1556--1566. Kai Sheng Tai, Richard Socher, and Christopher D. Manning. 2015. Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, ACL 2015, July 26--31, 2015, Beijing, China, Volume 1: Long Papers. 
1556--1566."},{"key":"e_1_3_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1514"},{"key":"e_1_3_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/3038912.3052573"},{"key":"e_1_3_2_1_45_1","volume-title":"Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani , Noam Shazeer , Niki Parmar , Jakob Uszkoreit , Llion Jones , Aidan N. Gomez , Lukasz Kaiser , and Illia Polosukhin . 2017 . Attention is All you Need . In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017 , December 4 --9 , 2017, Long Beach, CA, USA. 5998--6008. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is All you Need. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4--9, 2017, Long Beach, CA, USA. 5998--6008."},{"key":"e_1_3_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/3123266.3123326"},{"volume-title":"Learning Cross-Modal Embeddings With Adversarial Networks for Cooking Recipes and Food Images. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR. 11572--11581","author":"Wang Hao","key":"e_1_3_2_1_47_1","unstructured":"Hao Wang , Doyen Sahoo , Chenghao Liu , Ee-Peng Lim , and Steven C. H. Hoi . 2019 b . Learning Cross-Modal Embeddings With Adversarial Networks for Cooking Recipes and Food Images. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR. 11572--11581 . Hao Wang, Doyen Sahoo, Chenghao Liu, Ee-Peng Lim, and Steven C. H. Hoi. 2019 b. Learning Cross-Modal Embeddings With Adversarial Networks for Cooking Recipes and Food Images. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR. 
11572--11581."},{"key":"e_1_3_2_1_48_1","volume-title":"CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval. In 2019 IEEE\/CVF International Conference on Computer Vision, ICCV. 5763--5772","author":"Wang Zihao","year":"2019","unstructured":"Zihao Wang , Xihui Liu , Hongsheng Li , Lu Sheng , Junjie Yan , Xiaogang Wang , and Jing Shao . 2019 a . CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval. In 2019 IEEE\/CVF International Conference on Computer Vision, ICCV. 5763--5772 . Zihao Wang, Xihui Liu, Hongsheng Li, Lu Sheng, Junjie Yan, Xiaogang Wang, and Jing Shao. 2019 a. CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval. In 2019 IEEE\/CVF International Conference on Computer Vision, ICCV. 5763--5772."},{"key":"e_1_3_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11280-018-0541-x"},{"key":"e_1_3_2_1_50_1","volume-title":"2020 a. Joint Feature Synthesis and Embedding: Adversarial Cross-modal Retrieval Revisited","author":"Xu X.","year":"2020","unstructured":"X. Xu , K. Lin , Y. Yang , A. Hanjalic , and H. Shen . 2020 a. Joint Feature Synthesis and Embedding: Adversarial Cross-modal Retrieval Revisited . IEEE Transactions on Pattern Analysis & Machine Intelligence ( 2020 ), 1--18. X. Xu, K. Lin, Y. Yang, A. Hanjalic, and H. Shen. 2020 a. Joint Feature Synthesis and Embedding: Adversarial Cross-modal Retrieval Revisited. IEEE Transactions on Pattern Analysis & Machine Intelligence (2020), 1--18."},{"key":"e_1_3_2_1_51_1","unstructured":"X. Xu H. Lu J. Song Y. Yang H. T. Shen and X. Li. 2019. Ternary Adversarial Networks With Self-Supervision for Zero-Shot Cross-Modal Retrieval. IEEE Transactions on Cybernetics (2019) 1--14. X. Xu H. Lu J. Song Y. Yang H. T. Shen and X. Li. 2019. Ternary Adversarial Networks With Self-Supervision for Zero-Shot Cross-Modal Retrieval. 
IEEE Transactions on Cybernetics (2019) 1--14."},{"key":"e_1_3_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2020.2967597"},{"key":"e_1_3_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/3372278.3390681"},{"key":"e_1_3_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1145\/2872427.2882995"},{"key":"e_1_3_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.127"},{"key":"e_1_3_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.01174"}],"event":{"name":"ICMR '21: International Conference on Multimedia Retrieval","sponsor":["SIGMM ACM Special Interest Group on Multimedia"],"location":"Taipei Taiwan","acronym":"ICMR '21"},"container-title":["Proceedings of the 2021 International Conference on Multimedia Retrieval"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3460426.3463618","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,6,9]],"date-time":"2023-06-09T16:52:07Z","timestamp":1686329527000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3460426.3463618"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,8,24]]},"references-count":56,"alternative-id":["10.1145\/3460426.3463618","10.1145\/3460426"],"URL":"https:\/\/doi.org\/10.1145\/3460426.3463618","relation":{},"subject":[],"published":{"date-parts":[[2021,8,24]]},"assertion":[{"value":"2021-09-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}