{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,8,16]],"date-time":"2024-08-16T17:07:10Z","timestamp":1723828030805},"reference-count":50,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2017,10,4]],"date-time":"2017-10-04T00:00:00Z","timestamp":1507075200000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Research Grants Council of the Hong Kong Special Administrative Region, China","award":["CityU 120213"]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2017,11,30]]},"abstract":"Searching in digital video data for high-level events, such as a parade or a car accident, is challenging when the query is textual and lacks visual example images or videos. Current research in deep neural networks is highly beneficial for the retrieval of high-level events using visual examples, but without examples it is still hard to (1) determine which concepts are useful to pre-train (Vocabulary challenge) and (2) which pre-trained concept detectors are relevant for a certain unseen high-level event (Concept Selection challenge). In our article, we present our Semantic Event Retrieval System which (1) shows the importance of high-level concepts in a vocabulary for the retrieval of complex and generic high-level events and (2) uses a novel concept selection method (i-w2v) based on semantic embeddings. 
Our experiments on the international TRECVID Multimedia Event Detection benchmark show that a diverse vocabulary including high-level concepts improves performance on the retrieval of high-level events in videos and that our novel method outperforms a knowledge-based concept selection method.\n <\/jats:p>","DOI":"10.1145\/3131288","type":"journal-article","created":{"date-parts":[[2017,10,4]],"date-time":"2017-10-04T12:17:29Z","timestamp":1507119449000},"page":"1-17","update-policy":"http:\/\/dx.doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":14,"title":["Semantic Reasoning in Zero Example Video Event Retrieval"],"prefix":"10.1145","volume":"13","author":[{"given":"Maaike H. T. De","family":"Boer","sequence":"first","affiliation":[{"name":"TNO and Radboud University, The Netherlands"}]},{"given":"Yi-Jie","family":"Lu","sequence":"additional","affiliation":[{"name":"City University of Hong Kong, Hong Kong"}]},{"given":"Hao","family":"Zhang","sequence":"additional","affiliation":[{"name":"City University of Hong Kong, Hong Kong"}]},{"given":"Klamer","family":"Schutte","sequence":"additional","affiliation":[{"name":"TNO Netherlands"}]},{"given":"Chong-Wah","family":"Ngo","sequence":"additional","affiliation":[{"name":"City University of Hong Kong, Hong Kong"}]},{"given":"Wessel","family":"Kraaij","sequence":"additional","affiliation":[{"name":"TNO and Leiden University, The Netherlands"}]}],"member":"320","published-online":{"date-parts":[[2017,10,4]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11042-011-0818-x"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11042-010-0643-7"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/2071389.2071390"},{"key":"e_1_2_1_4_1","volume-title":"Proceedings of the 24th International Conference on Artificial Intelligence. 
AAAI Press, 2234--2240","author":"Chang Xiaojun","year":"2015","unstructured":"Xiaojun Chang , Yi Yang , Alexander G. Hauptmann , Eric P. Xing , and Yao-Liang Yu . 2015 . Semantic concept discovery for large-scale zero-shot event detection . In Proceedings of the 24th International Conference on Artificial Intelligence. AAAI Press, 2234--2240 . Xiaojun Chang, Yi Yang, Alexander G. Hauptmann, Eric P. Xing, and Yao-Liang Yu. 2015. Semantic concept discovery for large-scale zero-shot event detection. In Proceedings of the 24th International Conference on Artificial Intelligence. AAAI Press, 2234--2240."},{"key":"e_1_2_1_5_1","volume-title":"Hauptmann","author":"Chang Xiaojun","year":"2016","unstructured":"Xiaojun Chang , Yi Yang , Guodong Long , Chengqi Zhang , and Alexander G . Hauptmann . 2016 . Dynamic concept composition for zero-example event detection. In AAAI. 3464--3470. Xiaojun Chang, Yi Yang, Guodong Long, Chengqi Zhang, and Alexander G. Hauptmann. 2016. Dynamic concept composition for zero-example event detection. In AAAI. 3464--3470."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/2578726.2578729"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/2505515.2507880"},{"key":"e_1_2_1_8_1","unstructured":"Maaike de Boer Klamer Schutte and Wessel Kraaij. 2015. Knowledge based query expansion in complex multimedia event detection. Multimed. Tools Appl. (2015) 1--19. Maaike de Boer Klamer Schutte and Wessel Kraaij. 2015. Knowledge based query expansion in complex multimedia event detection. Multimed. Tools Appl. 
(2015) 1--19."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/2578726.2578746"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/2647868.2654913"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/2461466.2461482"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/1282280.1282369"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2007.900150"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/1460096.1460170"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.521"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/2647868.2654918"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/2578726.2578764"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/2671188.2749399"},{"key":"e_1_2_1_20_1","volume-title":"Inf. Retriev.","author":"Jiang Yu-Gang","year":"2012","unstructured":"Yu-Gang Jiang , Subhabrata Bhattacharya , Shih-Fu Chang , and Mubarak Shah . 2012. High-level event recognition in unconstrained videos . Int. J. Multimed. Inf. Retriev. ( 2012 ), 1--29. Yu-Gang Jiang, Subhabrata Bhattacharya, Shih-Fu Chang, and Mubarak Shah. 2012. High-level event recognition in unconstrained videos. Int. J. Multimed. Inf. Retriev. (2012), 1--29."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2017.2670560"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.223"},{"key":"e_1_2_1_23_1","unstructured":"Lyndon Kennedy and Alexander Hauptmann. 2006. LSCOM lexicon definitions and annotations (version 1.0). (2006). Lyndon Kennedy and Alexander Hauptmann. 2006. LSCOM lexicon definitions and annotations (version 1.0). 
(2006)."},{"key":"e_1_2_1_24_1","volume-title":"Hinton","author":"Krizhevsky Alex","year":"2012","unstructured":"Alex Krizhevsky , Ilya Sutskever , and Geoffrey E . Hinton . 2012 . Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems . 1097--1105. Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems. 1097--1105."},{"key":"e_1_2_1_25_1","unstructured":"Omer Levy and Yoav Goldberg. 2014. Neural word embedding as implicit matrix factorization. In Advances in Neural Information Processing Systems. 2177--2185. Omer Levy and Yoav Goldberg. 2014. Neural word embedding as implicit matrix factorization. In Advances in Neural Information Processing Systems. 2177--2185."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00134"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2006.04.045"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/2911996.2912015"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/2461466.2461507"},{"key":"e_1_2_1_30_1","volume-title":"Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR\u201914)","author":"Mensink Thomas","unstructured":"Thomas Mensink , Efstratios Gavves , and Cees G. M. Snoek . 2014. COSTA: Co-occurrence statistics for zero-shot classification . In Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR\u201914) . IEEE, 2441--2448. Thomas Mensink, Efstratios Gavves, and Cees G. M. Snoek. 2014. COSTA: Co-occurrence statistics for zero-shot classification. In Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR\u201914). IEEE, 2441--2448."},{"key":"e_1_2_1_31_1","unstructured":"Tomas Mikolov Ilya Sutskever Kai Chen Greg S. Corrado and Jeff Dean. 2013. 
Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems. 3111--3119. Tomas Mikolov Ilya Sutskever Kai Chen Greg S. Corrado and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems. 3111--3119."},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/219717.219748"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.artint.2012.06.007"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/1291233.1291448"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1007\/11788034_15"},{"key":"e_1_2_1_36_1","volume-title":"TRECVID 2014 -- An overview of the goals, tasks, data, evaluation mechanisms and metrics. In Proceedings of the Annual TREC Video Retrieval Evaluation (TRECVID\u201914)","author":"Over Paul","year":"2014","unstructured":"Paul Over , George Awad , Martial Michel , Jonathan Fiscus , Greg Sanders , Wessel Kraaij , Alan F. Smeaton , and Georges Quenot . 2014 . TRECVID 2014 -- An overview of the goals, tasks, data, evaluation mechanisms and metrics. In Proceedings of the Annual TREC Video Retrieval Evaluation (TRECVID\u201914) . NIST, USA. Paul Over, George Awad, Martial Michel, Jonathan Fiscus, Greg Sanders, Wessel Kraaij, Alan F. Smeaton, and Georges Quenot. 2014. TRECVID 2014 -- An overview of the goals, tasks, data, evaluation mechanisms and metrics. In Proceedings of the Annual TREC Video Retrieval Evaluation (TRECVID\u201914). NIST, USA."},{"key":"e_1_2_1_37_1","volume-title":"TRECVID 2015\u2014An overview of the goals, tasks, data, evaluation mechanisms and metrics. In Proceedings of the Annual TREC Video Retrieval Evaluation (TRECVID\u201915)","author":"Over Paul","year":"2015","unstructured":"Paul Over , George Awad , Martial Michel , Jonathan Fiscus , Greg Sanders , Wessel Kraaij , Alan F. 
Smeaton , Georges Quenot , and Roeland Ordelman . 2015 . TRECVID 2015\u2014An overview of the goals, tasks, data, evaluation mechanisms and metrics. In Proceedings of the Annual TREC Video Retrieval Evaluation (TRECVID\u201915) . NIST. Paul Over, George Awad, Martial Michel, Jonathan Fiscus, Greg Sanders, Wessel Kraaij, Alan F. Smeaton, Georges Quenot, and Roeland Ordelman. 2015. TRECVID 2015\u2014An overview of the goals, tasks, data, evaluation mechanisms and metrics. In Proceedings of the Annual TREC Video Retrieval Evaluation (TRECVID\u201915). NIST."},{"key":"e_1_2_1_38_1","first-page":"40","article-title":"Relevance feedback in content based image retrieval: A review","volume":"10","author":"Patil Pushpa B.","year":"2011","unstructured":"Pushpa B. Patil and Manesh B. Kokare . 2011 . Relevance feedback in content based image retrieval: A review . J. Appl. Comput. Sci. Math. 10 , 10 (2011), pp. 40 -- 47 . Pushpa B. Patil and Manesh B. Kokare. 2011. Relevance feedback in content based image retrieval: A review.J. Appl. Comput. Sci. Math. 10, 10 (2011), pp. 40--47.","journal-title":"J. Appl. Comput. Sci. Math."},{"key":"e_1_2_1_39_1","volume-title":"Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP\u201914)","volume":"14","author":"Pennington Jeffrey","unstructured":"Jeffrey Pennington , Richard Socher , and Christopher D. Manning . 2014. Glove: Global vectors for word representation . In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP\u201914) , Vol. 14 . 1532--1543. Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. Glove: Global vectors for word representation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP\u201914), Vol. 14. 
1532--1543."},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/1178677.1178722"},{"key":"e_1_2_1_41_1","volume-title":"Proceedings of the 9th International Conference on Computational Semantics. Association for Computational Linguistics, 385--389","author":"Spagnola Steve","year":"2011","unstructured":"Steve Spagnola and Carl Lagoze . 2011 . Edge dependent pathway scoring for calculating semantic similarity in ConceptNet . In Proceedings of the 9th International Conference on Computational Semantics. Association for Computational Linguistics, 385--389 . Steve Spagnola and Carl Lagoze. 2011. Edge dependent pathway scoring for calculating semantic similarity in ConceptNet. In Proceedings of the 9th International Conference on Computational Semantics. Association for Computational Linguistics, 385--389."},{"key":"e_1_2_1_42_1","volume-title":"The new data and new challenges in multimedia research. arXiv preprint arXiv:1503.01817","author":"Thomee Bart","year":"2015","unstructured":"Bart Thomee , David A. Shamma , Gerald Friedland , Benjamin Elizalde , Karl Ni , Douglas Poland , Damian Borth , and Li-Jia Li. 2015. The new data and new challenges in multimedia research. arXiv preprint arXiv:1503.01817 ( 2015 ). Bart Thomee, David A. Shamma, Gerald Friedland, Benjamin Elizalde, Karl Ni, Douglas Poland, Damian Borth, and Li-Jia Li. 2015. The new data and new challenges in multimedia research. 
arXiv preprint arXiv:1503.01817 (2015)."},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.510"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.imavis.2015.09.005"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.341"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/2671188.2749413"},{"key":"e_1_2_1_47_1","volume-title":"Proceedings of the 29th AAAI Conference on Artificial Intelligence.","author":"Yan Yan","year":"2015","unstructured":"Yan Yan , Yi Yang , Haoquan Shen , Deyu Meng , Gaowen Liu , Alex Hauptmann , and Nicu Sebe . 2015 . Complex event detection via event oriented dictionary learning . In Proceedings of the 29th AAAI Conference on Artificial Intelligence. Yan Yan, Yi Yang, Haoquan Shen, Deyu Meng, Gaowen Liu, Alex Hauptmann, and Nicu Sebe. 2015. Complex event detection via event oriented dictionary learning. In Proceedings of the 29th AAAI Conference on Artificial Intelligence."},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/2733373.2806221"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/2647868.2654997"},{"key":"e_1_2_1_50_1","unstructured":"Bolei Zhou Agata Lapedriza Jianxiong Xiao Antonio Torralba and Aude Oliva. 2014. Learning deep features for scene recognition using places database. In Advances in Neural Information Processing Systems. 487--495. Bolei Zhou Agata Lapedriza Jianxiong Xiao Antonio Torralba and Aude Oliva. 2014. Learning deep features for scene recognition using places database. In Advances in Neural Information Processing Systems. 
487--495."}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3131288","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,31]],"date-time":"2022-12-31T21:46:48Z","timestamp":1672523208000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3131288"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2017,10,4]]},"references-count":50,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2017,11,30]]}},"alternative-id":["10.1145\/3131288"],"URL":"https:\/\/doi.org\/10.1145\/3131288","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"value":"1551-6857","type":"print"},{"value":"1551-6865","type":"electronic"}],"subject":[],"published":{"date-parts":[[2017,10,4]]},"assertion":[{"value":"2016-07-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2017-07-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2017-10-04","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}