{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,5,3]],"date-time":"2025-05-03T06:46:01Z","timestamp":1746254761581},"reference-count":116,"publisher":"Wiley","issue":"2","license":[{"start":{"date-parts":[[2023,1,20]],"date-time":"2023-01-20T00:00:00Z","timestamp":1674172800000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/onlinelibrary.wiley.com\/termsAndConditions#vor"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62106150"],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["wires.onlinelibrary.wiley.com"],"crossmark-restriction":true},"short-container-title":["WIREs Data Min & Knowl"],"published-print":{"date-parts":[[2023,3]]},"abstract":"Multimodal learning provides a path to fully utilize all types of information related to the modeling target to provide the model with a global vision. Zero\u2010shot learning (ZSL) is a general solution for incorporating prior knowledge into data\u2010driven models and achieving accurate class identification. The combination of the two, known as multimodal ZSL (MZSL), can fully exploit the advantages of both technologies and is expected to produce models with greater generalization ability. However, the MZSL algorithms and applications have not yet been thoroughly investigated and summarized. This study fills this gap by providing an objective overview of MZSL's definition, typical algorithms, representative applications, and critical issues. 
This article will not only provide researchers in this field with a comprehensive perspective, but it will also highlight several promising research directions.\nThis article is categorized under:\nAlgorithmic Development > Multimedia\nTechnologies > Classification\nTechnologies > Machine Learning","DOI":"10.1002\/widm.1488","type":"journal-article","created":{"date-parts":[[2023,1,20]],"date-time":"2023-01-20T10:44:20Z","timestamp":1674211460000},"update-policy":"http:\/\/dx.doi.org\/10.1002\/crossmark_policy","source":"Crossref","is-referenced-by-count":24,"title":["A review on multimodal zero\u2010shot learning"],"prefix":"10.1002","volume":"13","author":[{"ORCID":"http:\/\/orcid.org\/0000-0003-2414-6066","authenticated-orcid":false,"given":"Weipeng","family":"Cao","sequence":"first","affiliation":[{"name":"Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ) Shenzhen China"}]},{"given":"Yuhao","family":"Wu","sequence":"additional","affiliation":[{"name":"Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ) Shenzhen China"}]},{"given":"Yixuan","family":"Sun","sequence":"additional","affiliation":[{"name":"Anhui University New York Stony Brook College Anhui University Hefei China"}]},{"given":"Haigang","family":"Zhang","sequence":"additional","affiliation":[{"name":"Institute of Applied Artificial Intelligence of the Guangdong\u2010Hong Kong\u2010Macao Greater Bay Area Shenzhen Polytechnic Shenzhen China"}]},{"ORCID":"http:\/\/orcid.org\/0000-0002-2511-8318","authenticated-orcid":false,"given":"Jin","family":"Ren","sequence":"additional","affiliation":[{"name":"Institute of Applied Artificial Intelligence of the Guangdong\u2010Hong Kong\u2010Macao Greater Bay Area Shenzhen Polytechnic Shenzhen China"}]},{"given":"Dujuan","family":"Gu","sequence":"additional","affiliation":[{"name":"NSFOCUS Technologies Group 
Co., Ltd Beijing China"}]},{"given":"Xingkai","family":"Wang","sequence":"additional","affiliation":[{"name":"NSFOCUS Technologies Group Co., Ltd Beijing China"}]}],"member":"311","published-online":{"date-parts":[[2023,1,20]]},"reference":[{"key":"e_1_2_10_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2015.2487986"},{"key":"e_1_2_10_3_1","first-page":"24206","article-title":"Vatt: Transformers for multimodal self\u2010supervised learning from raw video, audio and text","volume":"34","author":"Akbari H.","year":"2021","journal-title":"Proceedings of the Advances in Neural Information Processing Systems"},{"key":"e_1_2_10_4_1","doi-asserted-by":"crossref","unstructured":"Annadani Y. &Biswas S.(2018).Preserving semantic relations for zero\u2010shot learning. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 7603\u20137612).","DOI":"10.1109\/CVPR.2018.00793"},{"key":"e_1_2_10_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2018.2798607"},{"key":"e_1_2_10_6_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00371-021-02166-7"},{"key":"e_1_2_10_7_1","doi-asserted-by":"crossref","unstructured":"Bendre N. Desai K. &Najafirad P.(2021).Generalized zero\u2010shot learning using multimodal variational auto\u2010encoder with semantic concepts. In Proceedings of the IEEE international conference on image processing (pp. 1284\u20131288).","DOI":"10.1109\/ICIP42928.2021.9506108"},{"key":"e_1_2_10_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMI.2021.3131245"},{"key":"e_1_2_10_9_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2022.09.070"},{"key":"e_1_2_10_10_1","doi-asserted-by":"crossref","unstructured":"Cao W. Zhou C. Wu Y. Ming Z. Xu Z. &Zhang J.(2020).Research progress of zero\u2010shot learning beyond computer vision. In International conference on algorithms and architectures for parallel processing (pp. 
538\u2013551).","DOI":"10.1007\/978-3-030-60239-0_36"},{"key":"e_1_2_10_11_1","doi-asserted-by":"crossref","unstructured":"Chen S. Wang W. Xia B. Peng Q. You X. Zheng F. &Shao L.(2021).Free: Feature refinement for generalized zero\u2010shot learning. In Proceedings of the IEEE\/CVF international conference on computer vision (pp. 122\u2013131).","DOI":"10.1109\/ICCV48922.2021.00019"},{"key":"e_1_2_10_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2020.3047546"},{"key":"e_1_2_10_13_1","doi-asserted-by":"crossref","unstructured":"Chen Z. Chen J. Geng Y. Pan J. Z. Yuan Z. &Chen H.(2021).Zero\u2010shot visual question answering using knowledge graph. In International semantic web conference (pp. 146\u2013162).","DOI":"10.1007\/978-3-030-88361-4_9"},{"key":"e_1_2_10_14_1","doi-asserted-by":"crossref","unstructured":"Chen Z. Li J. Luo Y. Huang Z. &Yang Y.(2020).Canzsl: Cycle\u2010consistent adversarial networks for zero\u2010shot learning from natural language. In Proceedings of the IEEE\/CVF winter conference on applications of computer vision (pp. 874\u2013883).","DOI":"10.1109\/WACV45572.2020.9093610"},{"key":"e_1_2_10_15_1","doi-asserted-by":"crossref","unstructured":"Chi J. &Peng Y.(2018).Dual adversarial networks for zero\u2010shot cross\u2010media retrieval. In Proceedings of the international joint conference on artificial intelligence (pp. 663\u2013669).","DOI":"10.24963\/ijcai.2018\/92"},{"key":"e_1_2_10_16_1","doi-asserted-by":"crossref","unstructured":"Chua T.\u2010S. Tang J. Hong R. Li H. Luo Z. &Zheng Y.(2009).Nus\u2010wide: A real\u2010world web image database from national university of Singapore. In Proceedings of the acm international conference on image and video retrieval (pp. 1\u20139).","DOI":"10.1145\/1646396.1646452"},{"key":"e_1_2_10_17_1","unstructured":"Dai W. Liu Z. Yu T. &Fung P.(2020).Modality\u2010transferable emotion embeddings for low\u2010resource multimodal emotion recognition. 
In Proceedings of the 1st conference of the Asia\u2010Pacific chapter of the association for computational linguistics and the 10th international joint conference on natural language processing (pp. 269\u2013280)."},{"key":"e_1_2_10_18_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10489-022-03869-7"},{"key":"e_1_2_10_19_1","doi-asserted-by":"crossref","unstructured":"Elhoseiny M. Liu J. Cheng H. Sawhney H. &Elgammal A.(2016).Zero\u2010shot event detection by multimodal distributional semantic embedding of videos. In Proceedings of the AAAI conference on artificial intelligence.","DOI":"10.1609\/aaai.v30i1.10458"},{"key":"e_1_2_10_20_1","doi-asserted-by":"crossref","unstructured":"Farhadi A. Endres I. Hoiem D. &Forsyth D.(2009).Describing objects by their attributes. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1778\u20131785).","DOI":"10.1109\/CVPR.2009.5206772"},{"key":"e_1_2_10_21_1","doi-asserted-by":"crossref","unstructured":"Felix R. Vijay Kumar B. G. Reid I. &Carneiro G.(2018).Multi\u2010modal cycle\u2010consistent generalized zero\u2010shot learning. In Proceedings of the European conference on computer vision (pp. 21\u201337).","DOI":"10.1007\/978-3-030-01231-1_2"},{"key":"e_1_2_10_22_1","unstructured":"Frome A. Corrado G. S. Shlens J. Bengio S. Dean J. Ranzato M. &Mikolov T.(2013).Devise: A deep visual\u2010semantic embedding model. In Proceedings of the advances in neural information processing systems."},{"issue":"2","key":"e_1_2_10_23_1","first-page":"303","article-title":"Learning multimodal latent attributes","volume":"36","author":"Fu Y.","year":"2013","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"e_1_2_10_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2015.2408354"},{"key":"e_1_2_10_25_1","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00473"},{"key":"e_1_2_10_26_1","unstructured":"Goodfellow I. Pouget\u2010Abadie J. Mirza M. Xu B. Warde\u2010Farley D. 
Ozair S. Courville A. &Bengio Y.(2014).Generative adversarial nets. In Proceedings of the advances in neural information processing systems."},{"key":"e_1_2_10_27_1","doi-asserted-by":"crossref","unstructured":"Guo D. Lu S. Duan N. Wang Y. Zhou M. &Yin J.(2022).Unixcoder: Unified cross\u2010modal pre\u2010training for code representation. In Proceedings of the 60th annual meeting of the association for computational linguistics (pp. 7212\u20137225).","DOI":"10.18653\/v1\/2022.acl-long.499"},{"key":"e_1_2_10_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/TAES.2022.3192804"},{"key":"e_1_2_10_29_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2022.118237"},{"key":"e_1_2_10_30_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.future.2021.09.044"},{"key":"e_1_2_10_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/TGCN.2021.3062972"},{"key":"e_1_2_10_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2019.2916887"},{"key":"e_1_2_10_33_1","unstructured":"Hayat N. Lashen H. &Shamout F. E.(2021).Multi\u2010label generalized zero shot learning for the classification of disease in chest radiographs. In Machine learning for healthcare conference (pp. 461\u2013477)."},{"key":"e_1_2_10_34_1","doi-asserted-by":"crossref","unstructured":"He K. Zhang X. Ren S. &Sun J.(2016).Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770\u2013778).","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_2_10_35_1","doi-asserted-by":"crossref","unstructured":"Huang H. Wang C. Yu P. S. &Wang C.\u2010D.(2019).Generative dual adversarial network for generalized zero\u2010shot learning. In Proceedings of the ieee\/cvf conference on computer vision and pattern recognition (pp. 801\u2013810).","DOI":"10.1109\/CVPR.2019.00089"},{"key":"e_1_2_10_36_1","doi-asserted-by":"crossref","unstructured":"Huang P.\u2010Y. Patrick M. Hu J. Neubig G. Metze F. &Hauptmann A. 
G.(2021).Multilingual multimodal pre\u2010training for zero\u2010shot cross\u2010lingual transfer of vision\u2010language models. In Proceedings of the 2021 conference of the north American chapter of the association for computational linguistics: Human language technologies (pp. 2443\u20132459).","DOI":"10.18653\/v1\/2021.naacl-main.195"},{"key":"e_1_2_10_37_1","first-page":"10944","article-title":"What makes multi\u2010modal learning better than single (provably)","volume":"34","author":"Huang Y.","year":"2021","journal-title":"Proceedings of the Advances in Neural Information Processing Systems"},{"key":"e_1_2_10_38_1","doi-asserted-by":"crossref","unstructured":"Hubert Tsai Y.\u2010H. Huang L.\u2010K. &Salakhutdinov R.(2017).Learning robust visual\u2010semantic embeddings. In Proceedings of the IEEE international conference on computer vision (pp. 3571\u20133580).","DOI":"10.1109\/ICCV.2017.386"},{"key":"e_1_2_10_39_1","doi-asserted-by":"crossref","unstructured":"Huynh D. &Elhamifar E.(2020).A shared multi\u2010attention framework for multi\u2010label zero\u2010shot learning. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (pp. 8776\u20138786).","DOI":"10.1109\/CVPR42600.2020.00880"},{"key":"e_1_2_10_40_1","doi-asserted-by":"crossref","unstructured":"Jain A. Mildenhall B. Barron J. T. Abbeel P. &Poole B.(2022).Zero\u2010shot text\u2010guided object generation with dream fields. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (pp. 867\u2013876).","DOI":"10.1109\/CVPR52688.2022.00094"},{"key":"e_1_2_10_41_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2020.105847"},{"key":"e_1_2_10_42_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2018.08.014"},{"key":"e_1_2_10_43_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ins.2016.10.025"},{"key":"e_1_2_10_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2022.3173815"},{"key":"e_1_2_10_45_1","unstructured":"Kingma D. P. 
&Welling M.(2014).Auto\u2010encoding variational Bayes. In Proceedings of the International Conference on Learning Representations pp. 1\u201314."},{"key":"e_1_2_10_46_1","doi-asserted-by":"crossref","unstructured":"Kolouri S. Rostami M. Owechko Y. &Kim K.(2018).Joint dictionaries for zero\u2010shot learning. In Proceedings of the AAAI conference on artificial intelligence (Vol. 32).","DOI":"10.1609\/aaai.v32i1.11649"},{"key":"e_1_2_10_47_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2013.140"},{"key":"e_1_2_10_48_1","doi-asserted-by":"crossref","unstructured":"Lee C.\u2010W. Fang W. Yeh C.\u2010K. &Wang Y.\u2010C. F.(2018).Multi\u2010label zero\u2010shot learning with structured knowledge graphs. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1576\u20131585).","DOI":"10.1109\/CVPR.2018.00170"},{"key":"e_1_2_10_49_1","doi-asserted-by":"crossref","unstructured":"Lee S. H. Roh W. Byeon W. Yoon S. H. Kim C. Kim J. &Kim S.(2022).Sound\u2010guided semantic image manipulation. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (pp. 3377\u20133386).","DOI":"10.1109\/CVPR52688.2022.00337"},{"key":"e_1_2_10_50_1","doi-asserted-by":"crossref","unstructured":"Lei Ba J. Swersky K. Fidler S. &Salakhutdinov R.(2015).Predicting deep zero\u2010shot convolutional neural networks using textual descriptions. In Proceedings of the IEEE international conference on computer vision (pp. 4247\u20134255).","DOI":"10.1109\/ICCV.2015.483"},{"key":"e_1_2_10_51_1","doi-asserted-by":"crossref","unstructured":"Li H. Ding W. Kang Y. Liu T. Wu Z. &Liu Z.(2021).Ctal: Pre\u2010training cross\u2010modal transformer for audio\u2010and\u2010language representations. In Proceedings of the 2021 conference on empirical methods in natural language processing (pp. 3966\u20133977).","DOI":"10.18653\/v1\/2021.emnlp-main.323"},{"key":"e_1_2_10_52_1","doi-asserted-by":"crossref","unstructured":"Li J. Jing M. Lu K. Ding Z. Zhu L. 
&Huang Z.(2019).Leveraging the invariant side of generative zero\u2010shot learning. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (pp. 7402\u20137411).","DOI":"10.1109\/CVPR.2019.00758"},{"key":"e_1_2_10_53_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCYB.2021.3050803"},{"key":"e_1_2_10_54_1","doi-asserted-by":"crossref","unstructured":"Li J. Jing M. Zhu L. Ding Z. Lu K. &Yang Y.(2020).Learning modality\u2010invariant latent representations for generalized zero\u2010shot learning. In Proceedings of the 28th acm international conference on multimedia (pp. 1348\u20131356).","DOI":"10.1145\/3394171.3413503"},{"key":"e_1_2_10_55_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2020.3029288"},{"key":"e_1_2_10_56_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2021.3139211"},{"key":"e_1_2_10_57_1","doi-asserted-by":"crossref","unstructured":"Lin K. Xu X. Gao L. Wang Z. &Shen H. T.(2020).Learning cross\u2010aligned latent embeddings for zero\u2010shot cross\u2010modal retrieval. In Proceedings of the AAAI conference on artificial intelligence (Vol. 34 pp. 11515\u201311522).","DOI":"10.1609\/aaai.v34i07.6817"},{"key":"e_1_2_10_58_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSMC.2018.2818184"},{"key":"e_1_2_10_59_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCYB.2022.3164142"},{"key":"e_1_2_10_60_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2020.2975980"},{"key":"e_1_2_10_61_1","doi-asserted-by":"crossref","unstructured":"Liu Y. Xie D.\u2010Y. Gao Q. Han J. Wang S. &Gao X.(2019).Graph and autoencoder based feature extraction for zero\u2010shot learning. In Proceedings of the international joint conference on artificial intelligence (Vol. 1 p. 6).","DOI":"10.24963\/ijcai.2019\/421"},{"key":"e_1_2_10_62_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.patrec.2022.02.002"},{"key":"e_1_2_10_63_1","doi-asserted-by":"crossref","unstructured":"Madapana N.(2020).Zero\u2010shot learning for gesture recognition. 
In Proceedings of the 2020 international conference on multimodal interaction (pp. 754\u2013757).","DOI":"10.1145\/3382507.3421161"},{"key":"e_1_2_10_64_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-87722-4_5"},{"key":"e_1_2_10_65_1","doi-asserted-by":"crossref","unstructured":"Mandal D. Narayan S. Dwivedi S. K. Gupta V. Ahmed S. Khan F. S. &Shao L.(2019).Out\u2010of\u2010distribution detection for generalized zero\u2010shot action recognition. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (pp. 9985\u20139993).","DOI":"10.1109\/CVPR.2019.01022"},{"key":"e_1_2_10_66_1","doi-asserted-by":"crossref","unstructured":"Mazumder P. Singh P. Parida K. K. &Namboodiri V. P.(2021).Avgzslnet: Audio\u2010visual generalized zero\u2010shot learning by reconstructing label features from multi\u2010modal embeddings. In Proceedings of the IEEE\/CVF winter conference on applications of computer vision (pp. 3090\u20133099).","DOI":"10.1109\/WACV48630.2021.00313"},{"issue":"108","key":"e_1_2_10_67_1","first-page":"556","article-title":"A zero\u2010shot deep metric learning approach to brain\u2013computer interfaces for image retrieval","volume":"246","author":"McCartney B.","year":"2022","journal-title":"Knowledge\u2010Based Systems"},{"key":"e_1_2_10_68_1","doi-asserted-by":"crossref","unstructured":"Mercea O.\u2010B. Riesch L. Koepke A. &Akata Z.(2022).Audio\u2010visual generalized zero\u2010shot learning with cross\u2010modal attention and language. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (pp. 10553\u201310563).","DOI":"10.1109\/CVPR52688.2022.01030"},{"key":"e_1_2_10_69_1","doi-asserted-by":"crossref","unstructured":"Mishra A. Krishna Reddy S. Mittal A. &Murthy H. A.(2018).A generative model for zero shot learning using conditional variational autoencoders. In Proceedings of the IEEE conference on computer vision and pattern recognition workshops (pp. 
2188\u20132196).","DOI":"10.1109\/CVPRW.2018.00294"},{"key":"e_1_2_10_70_1","doi-asserted-by":"crossref","unstructured":"Narayan S. Gupta A. Khan F. S. Snoek C. G. &Shao L.(2020).Latent embedding feedback and discriminative features for zero\u2010shot classification. In Proceedings of the European conference on computer vision (pp. 479\u2013495).","DOI":"10.1007\/978-3-030-58542-6_29"},{"key":"e_1_2_10_71_1","doi-asserted-by":"crossref","unstructured":"Narayan S. Gupta A. Khan S. Khan F. S. Shao L. &Shah M.(2021).Discriminative region\u2010based multi\u2010label zero\u2010shot learning. In Proceedings of the IEEE\/CVF international conference on computer vision (pp. 8731\u20138740).","DOI":"10.1109\/ICCV48922.2021.00861"},{"key":"e_1_2_10_72_1","doi-asserted-by":"crossref","unstructured":"Nilsback M.\u2010E. &Zisserman A.(2008).Automated flower classification over a large number of classes. In Proceedings of the indian conference on computer vision graphics and image processing (pp. 722\u2013729).","DOI":"10.1109\/ICVGIP.2008.47"},{"key":"e_1_2_10_73_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.neunet.2020.09.010"},{"key":"e_1_2_10_74_1","doi-asserted-by":"crossref","unstructured":"Parida K. Matiyali N. Guha T. &Sharma G.(2020).Coordinated joint multimodal embeddings for generalized audio\u2010visual zero\u2010shot classification and retrieval of videos. In Proceedings of the IEEE\/CVF winter conference on applications of computer vision (pp. 3251\u20133260).","DOI":"10.1109\/WACV45572.2020.9093438"},{"key":"e_1_2_10_75_1","doi-asserted-by":"crossref","unstructured":"Patterson G. &Hays J.(2012).Sun attribute database: Discovering annotating and recognizing scene attributes. In 2012 IEEE conference on computer vision and pattern recognition (pp. 
2751\u20132758).","DOI":"10.1109\/CVPR.2012.6247998"},{"key":"e_1_2_10_76_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2022.3191696"},{"key":"e_1_2_10_77_1","doi-asserted-by":"publisher","DOI":"10.1145\/3514247"},{"key":"e_1_2_10_78_1","doi-asserted-by":"crossref","unstructured":"Rei\u00df S. Roitberg A. Haurilet M. &Stiefelhagen R.(2020).Activity\u2010aware attributes for zero\u2010shot driver behavior recognition. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition workshops (pp. 902\u2013903).","DOI":"10.1109\/CVPRW50498.2020.00459"},{"key":"e_1_2_10_79_1","doi-asserted-by":"crossref","unstructured":"Schonfeld E. Ebrahimi S. Sinha S. Darrell T. &Akata Z.(2019).Generalized zero\u2010and few\u2010shot learning via aligned variational autoencoders. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (pp. 8247\u20138255).","DOI":"10.1109\/CVPR.2019.00844"},{"key":"e_1_2_10_80_1","doi-asserted-by":"crossref","unstructured":"Sener F. &Yao A.(2019).Zero\u2010shot anticipation for instructional activities. In Proceedings of the IEEE\/CVF international conference on computer vision (pp. 862\u2013871).","DOI":"10.1109\/ICCV.2019.00095"},{"key":"e_1_2_10_81_1","doi-asserted-by":"crossref","unstructured":"Shigeto Y. Suzuki I. Hara K. Shimbo M. &Matsumoto Y.(2015).Ridge regression hubness and zero\u2010shot learning. In Joint European conference on machine learning and knowledge discovery in databases (pp. 135\u2013151).","DOI":"10.1007\/978-3-319-23528-8_9"},{"key":"e_1_2_10_82_1","doi-asserted-by":"crossref","unstructured":"Shvetsova N. Chen B. Rouditchenko A. Thomas S. Kingsbury B. Feris R. S. Harwarth D. Glass J. &Kuehne H.(2022).Everything at once\u2010multi\u2010modal fusion transformer for video retrieval. In Proceedings of the ieee\/cvf conference on computer vision and pattern recognition (pp. 
20020\u201320029).","DOI":"10.1109\/CVPR52688.2022.01939"},{"key":"e_1_2_10_83_1","doi-asserted-by":"crossref","unstructured":"Sinha A. Akilesh B. Sarkar M. &Krishnamurthy B.(2019).Attention based natural language grounding by navigating virtual environment. In Proceedings of the ieee winter conference on applications of computer vision (pp. 236\u2013244).","DOI":"10.1109\/WACV.2019.00031"},{"key":"e_1_2_10_84_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10489-020-02075-7"},{"key":"e_1_2_10_85_1","doi-asserted-by":"crossref","unstructured":"Sung F. Yang Y. Zhang L. Xiang T. Torr P. H. &Hospedales T. M.(2018).Learning to compare: Relation network for few\u2010shot learning. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1199\u20131208).","DOI":"10.1109\/CVPR.2018.00131"},{"key":"e_1_2_10_86_1","doi-asserted-by":"crossref","unstructured":"Tziafas G. &Kasaei H.(2021).Few\u2010shot visual grounding for natural human\u2013robot interaction. In Proceedings of the IEEE international conference on autonomous robot systems and competitions (pp. 50\u201355).","DOI":"10.1109\/ICARSC52212.2021.9429801"},{"issue":"11","key":"e_1_2_10_87_1","first-page":"2579","article-title":"Visualizing data using t\u2010sne","volume":"9","author":"Maaten L.","year":"2008","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_2_10_88_1","doi-asserted-by":"crossref","unstructured":"Verma V. K. Arora G. Mishra A. &Rai P.(2018).Generalized zero\u2010shot learning via synthesized examples. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 4281\u20134289).","DOI":"10.1109\/CVPR.2018.00450"},{"key":"e_1_2_10_89_1","doi-asserted-by":"crossref","unstructured":"Vyas M. R. Venkateswara H. &Panchanathan S.(2020).Leveraging seen and unseen semantic relationships for generative zero\u2010shot learning. In Proceedings of the European conference on computer vision (pp. 
70\u201386).","DOI":"10.1007\/978-3-030-58577-8_5"},{"key":"e_1_2_10_90_1","unstructured":"Wah C. Branson S. Welinder P. Perona P. &Belongie S.(2011).The caltech\u2010ucsd birds\u2010200\u20102011 dataset. Computation & Neural Systems Technical Report 2010\u2010001. California Institute of Technology Pasadena."},{"key":"e_1_2_10_91_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2022.02.056"},{"key":"e_1_2_10_92_1","doi-asserted-by":"crossref","unstructured":"Wang W. Tran D. &Feiszli M.(2020).What makes training multi\u2010modal classification networks hard? In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (pp. 12695\u201312705).","DOI":"10.1109\/CVPR42600.2020.01271"},{"issue":"2","key":"e_1_2_10_93_1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3293318","article-title":"A survey of zero\u2010shot learning: Settings, methods, and applications","volume":"10","author":"Wang W.","year":"2019","journal-title":"ACM Transactions on Intelligent Systems and Technology"},{"key":"e_1_2_10_94_1","doi-asserted-by":"crossref","unstructured":"Wray M. Larlus D. Csurka G. &Damen D.(2019).Fine\u2010grained action retrieval through multiple parts\u2010of\u2010speech embeddings. In Proceedings of the IEEE\/CVF international conference on computer vision (pp. 450\u2013459).","DOI":"10.1109\/ICCV.2019.00054"},{"key":"e_1_2_10_95_1","unstructured":"Wu H. H. Fuentes M. &Bello J. P.(2021).Exploring modality\u2010agnostic representations for music classification. In Proceedings of the sound and music computing conference (pp. 191\u2013198)."},{"key":"e_1_2_10_96_1","doi-asserted-by":"crossref","unstructured":"Wu J. Zhang T. Zha Z.\u2010J. Luo J. Zhang Y. &Wu F.(2020).Self\u2010supervised domain\u2010aware generative network for generalized zero\u2010shot learning. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (pp. 
12767\u201312776).","DOI":"10.1109\/CVPR42600.2020.01278"},{"key":"e_1_2_10_97_1","doi-asserted-by":"crossref","unstructured":"Wu S. Bondugula S. Luisier F. Zhuang X. &Natarajan P.(2014).Zero\u2010shot event detection using multi\u2010modal fusion of weakly supervised concepts. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2665\u20132672).","DOI":"10.1109\/CVPR.2014.341"},{"key":"e_1_2_10_98_1","doi-asserted-by":"crossref","unstructured":"Wu Y. Cao W. Liu Y. Ming Z. Li J. &Lu B.(2021).Semantic autoencoder with l2\u2010norm constraint for zero\u2010shot learning. In Proceedings of the international conference on machine learning and computing (pp. 101\u2013105).","DOI":"10.1145\/3457682.3457699"},{"key":"e_1_2_10_99_1","doi-asserted-by":"crossref","unstructured":"Xian Y. Akata Z. Sharma G. Nguyen Q. Hein M. &Schiele B.(2016).Latent embeddings for zero\u2010shot classification. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 69\u201377).","DOI":"10.1109\/CVPR.2016.15"},{"key":"e_1_2_10_100_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2018.2857768"},{"key":"e_1_2_10_101_1","doi-asserted-by":"crossref","unstructured":"Xian Y. Lorenz T. Schiele B. &Akata Z.(2018).Feature generating networks for zero\u2010shot learning. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 5542\u20135551).","DOI":"10.1109\/CVPR.2018.00581"},{"key":"e_1_2_10_102_1","doi-asserted-by":"crossref","unstructured":"Xian Y. Sharma S. Schiele B. &Akata Z.(2019).f\u2010vaegan\u2010d2: A feature generating framework for any\u2010shot learning. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (pp. 10275\u201310284).","DOI":"10.1109\/CVPR.2019.01052"},{"key":"e_1_2_10_103_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.neunet.2021.02.009"},{"key":"e_1_2_10_104_1","doi-asserted-by":"crossref","unstructured":"Xu H. Ghosh G. Huang P.\u2010Y. 
Okhonko D. Aghajanyan A. Metze F. Zettlemoyer L. &Feichtenhofer C.(2021).Videoclip: Contrastive pre\u2010training for zero\u2010shot video\u2010text understanding. In Proceedings of the 2021 conference on empirical methods in natural language processing (pp. 6787\u20136800).","DOI":"10.18653\/v1\/2021.emnlp-main.544"},{"key":"e_1_2_10_105_1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1109\/TPAMI.2022.3173208","article-title":"Learning to answer visual questions from web videos","volume":"1","author":"Yang A.","year":"2022","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"e_1_2_10_106_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2021.3089017"},{"key":"e_1_2_10_107_1","unstructured":"Yu H. &Lee B.(2019).Zero\u2010shot learning via simultaneous generating and learning. In Proceedings of the advances in neural information processing systems 32."},{"key":"e_1_2_10_108_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCYB.2018.2850750"},{"key":"e_1_2_10_109_1","doi-asserted-by":"crossref","unstructured":"Yu Y. Ji Z. Han J. &Zhang Z.(2020).Episode\u2010based prototype generating network for zero\u2010shot learning. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (pp. 14035\u201314044).","DOI":"10.1109\/CVPR42600.2020.01405"},{"key":"e_1_2_10_110_1","doi-asserted-by":"crossref","unstructured":"Yue Z. Wang T. Sun Q. Hua X.\u2010S. &Zhang H.(2021).Counterfactual zero\u2010shot and open\u2010set visual recognition. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (pp. 15404\u201315414).","DOI":"10.1109\/CVPR46437.2021.01515"},{"key":"e_1_2_10_111_1","doi-asserted-by":"publisher","DOI":"10.1109\/JSTSP.2020.2987728"},{"key":"e_1_2_10_112_1","doi-asserted-by":"crossref","unstructured":"Zhang H. &Koniusz P.(2018).Zero\u2010shot kernel learning. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 
7670\u20137679).","DOI":"10.1109\/CVPR.2018.00800"},{"key":"e_1_2_10_113_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ins.2018.08.048"},{"key":"e_1_2_10_114_1","doi-asserted-by":"crossref","unstructured":"Zhang L. Xiang T. &Gong S.(2017).Learning a deep embedding model for zero\u2010shot learning. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2021\u20132030).","DOI":"10.1109\/CVPR.2017.321"},{"key":"e_1_2_10_115_1","doi-asserted-by":"publisher","DOI":"10.1186\/s40649-019-0069-y"},{"key":"e_1_2_10_116_1","doi-asserted-by":"crossref","unstructured":"Zhao X. Pang Y. Yang J. Zhang L. &Lu H.(2021).Multi\u2010source fusion and automatic predictor selection for zero\u2010shot video object segmentation. In Proceedings of the 29th ACM international conference on multimedia (pp. 2645\u20132653).","DOI":"10.1145\/3474085.3475192"},{"key":"e_1_2_10_117_1","doi-asserted-by":"crossref","unstructured":"Zhu Y. Elhoseiny M. Liu B. Peng X. &Elgammal A.(2018).A generative adversarial approach for zero\u2010shot learning from noisy texts. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 
1004\u20131013).","DOI":"10.1109\/CVPR.2018.00111"}],"container-title":["WIREs Data Mining and Knowledge Discovery"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/pdf\/10.1002\/widm.1488","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/full-xml\/10.1002\/widm.1488","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/wires.onlinelibrary.wiley.com\/doi\/pdf\/10.1002\/widm.1488","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,8,22]],"date-time":"2023-08-22T03:07:05Z","timestamp":1692673625000},"score":1,"resource":{"primary":{"URL":"https:\/\/wires.onlinelibrary.wiley.com\/doi\/10.1002\/widm.1488"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,1,20]]},"references-count":116,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2023,3]]}},"alternative-id":["10.1002\/widm.1488"],"URL":"https:\/\/doi.org\/10.1002\/widm.1488","archive":["Portico"],"relation":{},"ISSN":["1942-4787","1942-4795"],"issn-type":[{"value":"1942-4787","type":"print"},{"value":"1942-4795","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,1,20]]},"assertion":[{"value":"2022-08-17","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-12-23","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-01-20","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}