{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2023,3,11]],"date-time":"2023-03-11T05:44:59Z","timestamp":1678513499684},"reference-count":22,"publisher":"Association for Computing Machinery (ACM)","issue":"3","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Asian Low-Resour. Lang. Inf. Process."],"published-print":{"date-parts":[[2023,3,31]]},"abstract":"Combining different input modalities beyond text is a key challenge for natural language processing. Previous work has been inconclusive as to the true utility of images as a supplementary information source for text classification tasks, motivating this large-scale human study of labelling performance given text-only, images-only, or both text and images. To this end, we create a new dataset accompanied with a novel annotation method\u2014Japanese Entity Labeling with Dynamic Annotation\u2014to deepen our understanding of the effectiveness of images for multi-modal text classification. 
By performing careful comparative analysis of human performance and the performance of state-of-the-art multi-modal text classification models, we gain valuable insights into differences between human and model performance, and the conditions under which images are beneficial for text classification.","DOI":"10.1145\/3565572","type":"journal-article","created":{"date-parts":[[2022,10,7]],"date-time":"2022-10-07T13:17:02Z","timestamp":1665148622000},"page":"1-19","update-policy":"http:\/\/dx.doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["On the Effectiveness of Images in Multi-modal Text Classification: An Annotation Study"],"prefix":"10.1145","volume":"22","author":[{"ORCID":"http:\/\/orcid.org\/0000-0002-5454-739X","authenticated-orcid":false,"given":"Chunpeng","family":"Ma","sequence":"first","affiliation":[{"name":"Fujitsu Limited, Kawasaki, Kanagawa, Japan"}]},{"ORCID":"http:\/\/orcid.org\/0000-0002-5816-3465","authenticated-orcid":false,"given":"Aili","family":"Shen","sequence":"additional","affiliation":[{"name":"Amazon, Australia"}]},{"ORCID":"http:\/\/orcid.org\/0000-0002-0733-6960","authenticated-orcid":false,"given":"Hiyori","family":"Yoshikawa","sequence":"additional","affiliation":[{"name":"Fujitsu Limited, Kawasaki, Kanagawa, Japan"}]},{"ORCID":"http:\/\/orcid.org\/0000-0002-6558-4911","authenticated-orcid":false,"given":"Tomoya","family":"Iwakura","sequence":"additional","affiliation":[{"name":"Fujitsu Limited, Kawasaki, Kanagawa, Japan"}]},{"ORCID":"http:\/\/orcid.org\/0000-0002-8529-7792","authenticated-orcid":false,"given":"Daniel","family":"Beck","sequence":"additional","affiliation":[{"name":"The University of Melbourne, Australia"}]},{"ORCID":"http:\/\/orcid.org\/0000-0002-4445-1386","authenticated-orcid":false,"given":"Timothy","family":"Baldwin","sequence":"additional","affiliation":[{"name":"The University of Melbourne, Australia and Mohamed bin Zayed University of Artificial 
Intelligence, United Arab Emirates"}]}],"member":"320","published-online":{"date-parts":[[2023,3,10]]},"reference":[{"key":"e_1_3_3_2_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00636"},{"key":"e_1_3_3_3_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N19-1422"},{"key":"e_1_3_3_4_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58577-8_7"},{"key":"e_1_3_3_5_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N19-1423"},{"key":"e_1_3_3_6_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W16-3210"},{"key":"e_1_3_3_7_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1107"},{"key":"e_1_3_3_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.670"},{"key":"e_1_3_3_9_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_3_10_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-016-0981-7"},{"key":"e_1_3_3_11_2","doi-asserted-by":"publisher","DOI":"10.1145\/3474085.3475190"},{"key":"e_1_3_3_12_2","first-page":"2181","article-title":"Learning entity and relation embeddings for knowledge graph completion","author":"Lin Yankai","year":"2015","unstructured":"Yankai Lin, Zhiyuan Liu, Maosong Sun, Yang Liu, and Xuan Zhu. 2015. Learning entity and relation embeddings for knowledge graph completion. In Proceedings of the 29th AAAI Conference on Artificial Intelligence. 2181\u20132187.","journal-title":"Proceedings of the 29th AAAI Conference on Artificial Intelligence"},{"key":"e_1_3_3_13_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i03.5681"},{"key":"e_1_3_3_14_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.eacl-main.4"},{"key":"e_1_3_3_15_2","first-page":"91","volume-title":"Advances in Neural Information Processing Systems","author":"Ren Shaoqing","year":"2015","unstructured":"Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. 2015. Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems, Vol. 
28. 91\u201399."},{"key":"e_1_3_3_16_2","volume-title":"Proceedings of Automated Knowledge Base Construction (AKBC\u201918)","author":"Sekine Satoshi","year":"2018","unstructured":"Satoshi Sekine, Akio Kobayashi, and Kouta Nakayama. 2018. Shinra: Structuring wikipedia by collaborative contribution. In Proceedings of Automated Knowledge Base Construction (AKBC\u201918)."},{"key":"e_1_3_3_17_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P18-1238"},{"key":"e_1_3_3_18_2","doi-asserted-by":"publisher","DOI":"10.1613\/jair.1.11647"},{"key":"e_1_3_3_19_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Su Weijie","year":"2020","unstructured":"Weijie Su, Xizhou Zhu, Yue Cao, Bin Li, Lewei Lu, Furu Wei, and Jifeng Dai. 2020. VL-BERT: Pre-training of generic visual-linguistic representations. In Proceedings of the International Conference on Learning Representations."},{"key":"e_1_3_3_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.308"},{"key":"e_1_3_3_21_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1514"},{"key":"e_1_3_3_22_2","first-page":"201","volume-title":"Proceedings of the 15th NTCIR Conference on Evaluation of Information Access Technologies","author":"Yoshikawa Hiyori","year":"2020","unstructured":"Hiyori Yoshikawa, Chunpeng Ma, Aili Shen, Qian Sun, Chenbang Huang, Guillaume Pelat, Akiva Miura, Daniel Beck, Timothy Baldwin, and Tomoya Iwakura. 2020. UOM-FJ at the NTCIR-15 SHINRA2020-ML task. In Proceedings of the 15th NTCIR Conference on Evaluation of Information Access Technologies. 
201\u2013207."},{"key":"e_1_3_3_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00688"}],"container-title":["ACM Transactions on Asian and Low-Resource Language Information Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3565572","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,3,10]],"date-time":"2023-03-10T13:33:41Z","timestamp":1678455221000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3565572"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,3,10]]},"references-count":22,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2023,3,31]]}},"alternative-id":["10.1145\/3565572"],"URL":"https:\/\/doi.org\/10.1145\/3565572","relation":{},"ISSN":["2375-4699","2375-4702"],"issn-type":[{"value":"2375-4699","type":"print"},{"value":"2375-4702","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,3,10]]},"assertion":[{"value":"2021-12-10","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-09-18","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-03-10","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}