{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,10,30]],"date-time":"2024-10-30T21:43:25Z","timestamp":1730324605392,"version":"3.28.0"},"publisher-location":"New York, NY, USA","reference-count":48,"publisher":"ACM","funder":[{"name":"NSFC","award":["No. 62072399 No. U19B2042 No. 61402403"]},{"name":"Chinese Knowledge Center for Engineering Sciences and Technology"},{"name":"Fundamental Research Funds for the Central Universities","award":["No. 226-2022-00070"]},{"name":"National Key R&D Program of China","award":["No. 2018AAA0101900"]},{"name":"MoE Engineering Research Center of Digital Library"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,10,10]]},"DOI":"10.1145\/3503161.3548406","type":"proceedings-article","created":{"date-parts":[[2022,10,10]],"date-time":"2022-10-10T11:43:01Z","timestamp":1665402181000},"page":"4877-4886","update-policy":"http:\/\/dx.doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["mmLayout: Multi-grained MultiModal Transformer for Document Understanding"],"prefix":"10.1145","author":[{"given":"Wenjin","family":"Wang","sequence":"first","affiliation":[{"name":"Zhejiang University, Hangzhou, China"}]},{"given":"Zhengjie","family":"Huang","sequence":"additional","affiliation":[{"name":"Baidu Inc., Shenzhen, China"}]},{"given":"Bin","family":"Luo","sequence":"additional","affiliation":[{"name":"Baidu Inc., Shenzhen, China"}]},{"given":"Qianglong","family":"Chen","sequence":"additional","affiliation":[{"name":"Zhejiang University, Hangzhou, China"}]},{"given":"Qiming","family":"Peng","sequence":"additional","affiliation":[{"name":"Baidu Inc., Shenzhen, China"}]},{"given":"Yinxu","family":"Pan","sequence":"additional","affiliation":[{"name":"Baidu Inc., Beijing, 
China"}]},{"given":"Weichong","family":"Yin","sequence":"additional","affiliation":[{"name":"Baidu Inc., Beijing, China"}]},{"given":"Shikun","family":"Feng","sequence":"additional","affiliation":[{"name":"Baidu Inc., Shenzhen, China"}]},{"given":"Yu","family":"Sun","sequence":"additional","affiliation":[{"name":"Baidu Inc., Beijing, China"}]},{"given":"Dianhai","family":"Yu","sequence":"additional","affiliation":[{"name":"Baidu Inc., Beijing, China"}]},{"given":"Yin","family":"Zhang","sequence":"additional","affiliation":[{"name":"Zhejiang University, Hangzhou, China"}]}],"member":"320","published-online":{"date-parts":[[2022,10,10]]},"reference":[{"key":"e_1_3_2_2_1_1","volume-title":"Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering. In CVPR 2018","author":"Anderson Peter","year":"2018","unstructured":"Peter Anderson , Xiaodong He , Chris Buehler , Damien Teney , Mark Johnson , Stephen Gould , and Lei Zhang . 2018 . Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering. In CVPR 2018 . IEEE, Salt Lake City, UT, 6077--6086. https:\/\/doi.org\/10.1109\/CVPR. 2018.00636 Peter Anderson, Xiaodong He, Chris Buehler, Damien Teney, Mark Johnson, Stephen Gould, and Lei Zhang. 2018. Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering. In CVPR 2018. IEEE, Salt Lake City, UT, 6077--6086. https:\/\/doi.org\/10.1109\/CVPR.2018.00636"},{"key":"e_1_3_2_2_2_1","first-page":"993","article-title":"DocFormer","volume":"2021","author":"Appalaraju Srikar","year":"2021","unstructured":"Srikar Appalaraju , Bhavan Jasani , Bhargava Urala Kota , Yusheng Xie , and R. Manmatha . 2021 . DocFormer : End-to-End Transformer for Document Understanding. In ICCV 2021. 993 -- 1003 . Srikar Appalaraju, Bhavan Jasani, Bhargava Urala Kota, Yusheng Xie, and R. Manmatha. 2021. DocFormer: End-to-End Transformer for Document Understanding. In ICCV 2021. 
993--1003.","journal-title":"End-to-End Transformer for Document Understanding. In ICCV"},{"key":"e_1_3_2_2_3_1","volume-title":"Proceedings of the 37th International Conference on Machine Learning. PMLR, 642--652","author":"Bao Hangbo","year":"2020","unstructured":"Hangbo Bao , Li Dong , Furu Wei , Wenhui Wang , Nan Yang , Xiaodong Liu , Yu Wang , Jianfeng Gao , Songhao Piao , Ming Zhou , and Hsiao-Wuen Hon . 2020 . UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training . In Proceedings of the 37th International Conference on Machine Learning. PMLR, 642--652 . Hangbo Bao, Li Dong, Furu Wei, Wenhui Wang, Nan Yang, Xiaodong Liu, Yu Wang, Jianfeng Gao, Songhao Piao, Ming Zhou, and Hsiao-Wuen Hon. 2020. UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training. In Proceedings of the 37th International Conference on Machine Learning. PMLR, 642--652."},{"key":"e_1_3_2_2_4_1","volume-title":"Graph Transformer for Graph-to-Sequence Learning. In AAAI","author":"Cai Deng","year":"2019","unstructured":"Deng Cai and Wai Lam . 2019 . Graph Transformer for Graph-to-Sequence Learning. In AAAI 2020. Deng Cai and Wai Lam. 2019. Graph Transformer for Graph-to-Sequence Learning. In AAAI 2020."},{"key":"e_1_3_2_2_5_1","volume-title":"Named Entity Recognition and Relation Extraction with Graph Neural Networks in Semi Structured Documents. In ICPR 2020","author":"Carbonell Manuel","year":"2021","unstructured":"Manuel Carbonell , Pau Riba , Mauricio Villegas , Alicia Fornes , and Josep Llados . 2021 . Named Entity Recognition and Relation Extraction with Graph Neural Networks in Semi Structured Documents. In ICPR 2020 . IEEE, Milan, Italy, 9622-- 9627. https:\/\/doi.org\/10.1109\/ICPR48806. 2021.9412669 Manuel Carbonell, Pau Riba, Mauricio Villegas, Alicia Fornes, and Josep Llados. 2021. Named Entity Recognition and Relation Extraction with Graph Neural Networks in Semi Structured Documents. In ICPR 2020. IEEE, Milan, Italy, 9622-- 9627. 
https:\/\/doi.org\/10.1109\/ICPR48806.2021.9412669"},{"key":"e_1_3_2_2_6_1","volume-title":"Document AI: Benchmarks, Models and Applications. arXiv:2111.08609 [cs] (Nov.","author":"Cui Lei","year":"2021","unstructured":"Lei Cui , Yiheng Xu , Tengchao Lv , and FuruWei. 2021 . Document AI: Benchmarks, Models and Applications. arXiv:2111.08609 [cs] (Nov. 2021). arXiv:2111.08609 [cs] Lei Cui, Yiheng Xu, Tengchao Lv, and FuruWei. 2021. Document AI: Benchmarks, Models and Applications. arXiv:2111.08609 [cs] (Nov. 2021). arXiv:2111.08609 [cs]"},{"volume-title":"NeurIPS 2019 Workshop.","author":"Timo","key":"e_1_3_2_2_7_1","unstructured":"Timo I. Denk and Christian Reisswig. 2019. BERTgrid: Contextualized Embedding for 2D Document Representation and Understanding . In NeurIPS 2019 Workshop. Timo I. Denk and Christian Reisswig. 2019. BERTgrid: Contextualized Embedding for 2D Document Representation and Understanding. In NeurIPS 2019 Workshop."},{"key":"e_1_3_2_2_8_1","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","volume":"1","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin , Ming-Wei Chang , Kenton Lee , and Kristina Toutanova . 2019 . BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding . In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies , Volume 1 (Long and Short Papers). Association for Computational Linguistics, Minneapolis, Minnesota, 4171--4186. https:\/\/doi.org\/10. 18653\/v1\/N19--1423 Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 
Association for Computational Linguistics, Minneapolis, Minnesota, 4171--4186. https:\/\/doi.org\/10.18653\/v1\/N19--1423"},{"key":"e_1_3_2_2_9_1","volume-title":"Int. Conf. Knowledge Discovery and Data Mining","volume":"240","author":"Ester Martin","year":"1996","unstructured":"Martin Ester , Hans-Peter Kriegel , J\u00f6rg Sander , and Xiaowei Xu . 1996 . Density based spatial clustering of applications with noise . In Int. Conf. Knowledge Discovery and Data Mining , Vol. 240 . 6. Martin Ester, Hans-Peter Kriegel, J\u00f6rg Sander, and Xiaowei Xu. 1996. Density based spatial clustering of applications with noise. In Int. Conf. Knowledge Discovery and Data Mining, Vol. 240. 6."},{"key":"e_1_3_2_2_10_1","volume-title":"LAMBERT: Layout-Aware Language Modeling for Information Extraction. In ICDAR 2021 (Lecture Notes in Computer Science), Josep Llad\u00f3s","author":"Garncarek Lukasz","year":"2021","unstructured":"Lukasz Garncarek , Rafa Powalski , Tomasz Stanisawek , Bartosz Topolski , Piotr Halama , Micha Turski , and Filip Graliski . 2021 . LAMBERT: Layout-Aware Language Modeling for Information Extraction. In ICDAR 2021 (Lecture Notes in Computer Science), Josep Llad\u00f3s , Daniel Lopresti, and Seiichi Uchida (Eds.). Springer International Publishing , Cham , 532--547. https:\/\/doi.org\/10.1007\/978-3-030-86549-8_34 Lukasz Garncarek, Rafa Powalski, Tomasz Stanisawek, Bartosz Topolski, Piotr Halama, Micha Turski, and Filip Graliski. 2021. LAMBERT: Layout-Aware Language Modeling for Information Extraction. In ICDAR 2021 (Lecture Notes in Computer Science), Josep Llad\u00f3s, Daniel Lopresti, and Seiichi Uchida (Eds.). Springer International Publishing, Cham, 532--547. https:\/\/doi.org\/10.1007\/978-3-030-86549-8_34"},{"key":"e_1_3_2_2_11_1","first-page":"10","article-title":"XYLayoutLM","volume":"2022","author":"Gu Zhangxuan","year":"2022","unstructured":"Zhangxuan Gu , Changhua Meng , Ke Wang , Jun Lan , Weiqiang Wang , Ming Gu , and Liqing Zhang . 2022 . 
XYLayoutLM : Towards Layout-Aware Multimodal Networks For Visually-Rich Document Understanding. In CVPR 2022. 10 . Zhangxuan Gu, Changhua Meng, Ke Wang, Jun Lan, Weiqiang Wang, Ming Gu, and Liqing Zhang. 2022. XYLayoutLM: Towards Layout-Aware Multimodal Networks For Visually-Rich Document Understanding. In CVPR 2022. 10.","journal-title":"Towards Layout-Aware Multimodal Networks For Visually-Rich Document Understanding. In CVPR"},{"key":"e_1_3_2_2_12_1","volume-title":"LICHEE: Improving Language Model Pre-training with Multi-grained Tokenization. In ACL 2021 Findings. Association for Computational Linguistics, Online, 1383--1392","author":"Guo Weidong","year":"2021","unstructured":"Weidong Guo , Mingjun Zhao , Lusheng Zhang , Di Niu , Jinwen Luo , Zhenhua Liu , Zhenyang Li , and Jianbo Tang . 2021 . LICHEE: Improving Language Model Pre-training with Multi-grained Tokenization. In ACL 2021 Findings. Association for Computational Linguistics, Online, 1383--1392 . https:\/\/doi.org\/10.18653\/v1\/2021.findings-acl.119 Weidong Guo, Mingjun Zhao, Lusheng Zhang, Di Niu, Jinwen Luo, Zhenhua Liu, Zhenyang Li, and Jianbo Tang. 2021. LICHEE: Improving Language Model Pre-training with Multi-grained Tokenization. In ACL 2021 Findings. Association for Computational Linguistics, Online, 1383--1392. https:\/\/doi.org\/10.18653\/v1\/2021.findings-acl.119"},{"key":"e_1_3_2_2_13_1","volume-title":"NeurIPS","author":"Han Kai","year":"2021","unstructured":"Kai Han , An Xiao , Enhua Wu , Jianyuan Guo , Chunjing Xu , and Yunhe Wang . 2021. Transformer in Transformer . In NeurIPS 2021 . Kai Han, An Xiao, Enhua Wu, Jianyuan Guo, Chunjing Xu, and Yunhe Wang. 2021. Transformer in Transformer. In NeurIPS 2021."},{"key":"e_1_3_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v36i10.21322"},{"key":"e_1_3_2_2_15_1","volume-title":"ICDAR 2019 Competition on Scanned Receipt OCR and Information Extraction. In ICDAR 2019. 
1516--1520","author":"Huang Zheng","year":"2019","unstructured":"Zheng Huang , Kai Chen , Jianhua He , Xiang Bai , Dimosthenis Karatzas , Shijian Lu , and C. V. Jawahar . 2019 . ICDAR 2019 Competition on Scanned Receipt OCR and Information Extraction. In ICDAR 2019. 1516--1520 . https:\/\/doi.org\/10.1109\/ICDAR. 2019 .00244 Zheng Huang, Kai Chen, Jianhua He, Xiang Bai, Dimosthenis Karatzas, Shijian Lu, and C. V. Jawahar. 2019. ICDAR 2019 Competition on Scanned Receipt OCR and Information Extraction. In ICDAR 2019. 1516--1520. https:\/\/doi.org\/10.1109\/ICDAR.2019.00244"},{"key":"e_1_3_2_2_16_1","volume-title":"Spatial Dependency Parsing for Semi-Structured Document Information Extraction. In ACL 2021 Findings.","author":"Hwang Wonseok","year":"2021","unstructured":"Wonseok Hwang , Jinyeong Yim , Seunghyun Park , Sohee Yang , and Minjoon Seo . 2021 . Spatial Dependency Parsing for Semi-Structured Document Information Extraction. In ACL 2021 Findings. Wonseok Hwang, Jinyeong Yim, Seunghyun Park, Sohee Yang, and Minjoon Seo. 2021. Spatial Dependency Parsing for Semi-Structured Document Information Extraction. In ACL 2021 Findings."},{"key":"e_1_3_2_2_17_1","volume-title":"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents. In ICDAR 19 OST Workshop.","author":"Jaume Guillaume","year":"2019","unstructured":"Guillaume Jaume , Hazim Kemal Ekenel , and Jean-Philippe Thiran . 2019 . FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents. In ICDAR 19 OST Workshop. Guillaume Jaume, Hazim Kemal Ekenel, and Jean-Philippe Thiran. 2019. FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents. In ICDAR 19 OST Workshop."},{"key":"e_1_3_2_2_18_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D18-1476"},{"key":"e_1_3_2_2_19_1","volume-title":"Present and Future. 
In The Fifth International Conference on the Applications of Digital Information and Web Technologies (ICADIWT 2014","author":"Khan Kamran","year":"2014","unstructured":"Kamran Khan , Saif Ur Rehman , Kamran Aziz , Simon Fong , and S. Sarasvady . 2014. DBSCAN: Past , Present and Future. In The Fifth International Conference on the Applications of Digital Information and Web Technologies (ICADIWT 2014 ). 232--238. https:\/\/doi.org\/10.1109\/ICADIWT. 2014 .6814687 Kamran Khan, Saif Ur Rehman, Kamran Aziz, Simon Fong, and S. Sarasvady. 2014. DBSCAN: Past, Present and Future. In The Fifth International Conference on the Applications of Digital Information and Web Technologies (ICADIWT 2014). 232--238. https:\/\/doi.org\/10.1109\/ICADIWT.2014.6814687"},{"key":"e_1_3_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.acl-long.260"},{"key":"e_1_3_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01225-0_13"},{"key":"e_1_3_2_2_22_1","volume-title":"StructuralLM: Structural Pre-training for Form Understanding. In ACL","author":"Li Chenliang","year":"2021","unstructured":"Chenliang Li , Bin Bi , Ming Yan , Wei Wang , Songfang Huang , Fei Huang , and Luo Si . 2021 . StructuralLM: Structural Pre-training for Form Understanding. In ACL 2021. Chenliang Li, Bin Bi, Ming Yan, Wei Wang, Songfang Huang, Fei Huang, and Luo Si. 2021. StructuralLM: Structural Pre-training for Form Understanding. In ACL 2021."},{"key":"e_1_3_2_2_23_1","first-page":"5652","article-title":"SelfDoc","volume":"2021","author":"Li Peizhao","year":"2021","unstructured":"Peizhao Li , Jiuxiang Gu , Jason Kuen , Vlad I. Morariu , Handong Zhao , Rajiv Jain , Varun Manjunatha , and Hongfu Liu . 2021 . SelfDoc : Self-Supervised Document Representation Learning. In CVPR 2021. 5652 -- 5660 . Peizhao Li, Jiuxiang Gu, Jason Kuen, Vlad I. Morariu, Handong Zhao, Rajiv Jain, Varun Manjunatha, and Hongfu Liu. 2021. SelfDoc: Self-Supervised Document Representation Learning. In CVPR 2021. 
5652--5660.","journal-title":"Self-Supervised Document Representation Learning. In CVPR"},{"key":"e_1_3_2_2_24_1","volume-title":"StrucTexT: Structured Text Understanding with Multi-Modal Transformers. In ACM MM","author":"Li Yulin","year":"2021","unstructured":"Yulin Li , Yuxi Qian , Yuchen Yu , Xiameng Qin , Chengquan Zhang , Yan Liu , Kun Yao , Junyu Han , Jingtuo Liu , and Errui Ding . 2021 . StrucTexT: Structured Text Understanding with Multi-Modal Transformers. In ACM MM 2021. Yulin Li, Yuxi Qian, Yuchen Yu, Xiameng Qin, Chengquan Zhang, Yan Liu, Kun Yao, Junyu Han, Jingtuo Liu, and Errui Ding. 2021. StrucTexT: Structured Text Understanding with Multi-Modal Transformers. In ACM MM 2021."},{"key":"e_1_3_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.106"},{"key":"e_1_3_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-86549-8_35"},{"key":"e_1_3_2_2_27_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N19-2005"},{"key":"e_1_3_2_2_28_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.580"},{"key":"e_1_3_2_2_29_1","volume-title":"DocVQA: A Dataset for VQA on Document Images. In WACV 2021","author":"Mathew Minesh","year":"2021","unstructured":"Minesh Mathew , Dimosthenis Karatzas , and C. V. Jawahar . 2021 . DocVQA: A Dataset for VQA on Document Images. In WACV 2021 . IEEE, Waikoloa, HI, USA, 2199--2208. https:\/\/doi.org\/10.1109\/WACV48630. 2021 .00225 Minesh Mathew, Dimosthenis Karatzas, and C. V. Jawahar. 2021. DocVQA: A Dataset for VQA on Document Images. In WACV 2021. IEEE, Waikoloa, HI, USA, 2199--2208. https:\/\/doi.org\/10.1109\/WACV48630.2021.00225"},{"key":"e_1_3_2_2_30_1","first-page":"4","article-title":"CORD","volume":"2019","author":"Park Seunghyun","year":"2019","unstructured":"Seunghyun Park , Seung Shin , Bado Lee , Junyeop Lee , Jaeheung Surh , Minjoon Seo , and Hwalsuk Lee . 2019 . CORD : A Consolidated Receipt Dataset for Post-OCR Parsing. In NeurIPS 2019. 4 . 
Seunghyun Park, Seung Shin, Bado Lee, Junyeop Lee, Jaeheung Surh, Minjoon Seo, and Hwalsuk Lee. 2019. CORD: A Consolidated Receipt Dataset for Post-OCR Parsing. In NeurIPS 2019. 4.","journal-title":"A Consolidated Receipt Dataset for Post-OCR Parsing. In NeurIPS"},{"key":"e_1_3_2_2_31_1","volume-title":"Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer. In ICDAR","author":"Powalski Rafa","year":"2021","unstructured":"Rafa Powalski , Lukasz Borchmann , Dawid Jurkiewicz , Tomasz Dwojak , Micha Pietruszka , and Gabriela Paka . 2021 . Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer. In ICDAR 2021. Rafa Powalski, Lukasz Borchmann, Dawid Jurkiewicz, Tomasz Dwojak, Micha Pietruszka, and Gabriela Paka. 2021. Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer. In ICDAR 2021."},{"key":"e_1_3_2_2_32_1","volume-title":"GraphIE: A Graph-Based Framework for Information Extraction. In NAACLHLT 2019","author":"Qian Yujie","year":"2019","unstructured":"Yujie Qian , Enrico Santus , Zhijing Jin , Jiang Guo , and Regina Barzilay . 2019 . GraphIE: A Graph-Based Framework for Information Extraction. In NAACLHLT 2019 . Association for Computational Linguistics, Minneapolis, Minnesota, 751--761. https:\/\/doi.org\/10. 18653\/v1\/N19--1082 Yujie Qian, Enrico Santus, Zhijing Jin, Jiang Guo, and Regina Barzilay. 2019. GraphIE: A Graph-Based Framework for Information Extraction. In NAACLHLT 2019. Association for Computational Linguistics, Minneapolis, Minnesota, 751--761. https:\/\/doi.org\/10.18653\/v1\/N19--1082"},{"key":"e_1_3_2_2_33_1","volume-title":"Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)","author":"Reimers Nils","year":"2019","unstructured":"Nils Reimers and Iryna Gurevych . 2019. Sentence-BERT: Sentence Embeddings Using Siamese BERT-Networks . 
In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) . Association for Computational Linguistics , Hong Kong , China, 3982--3992. https:\/\/doi.org\/10. 1865 3\/v1\/D19-1410 Nils Reimers and Iryna Gurevych. 2019. Sentence-BERT: Sentence Embeddings Using Siamese BERT-Networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Association for Computational Linguistics, Hong Kong, China, 3982--3992. https:\/\/doi.org\/10.18653\/v1\/D19-1410"},{"key":"e_1_3_2_2_34_1","doi-asserted-by":"publisher","DOI":"10.24963\/ijcai.2019\/466"},{"key":"e_1_3_2_2_35_1","volume-title":"Towards Robust Visual Information Extraction in Real World: New Dataset and Novel Solution. In AAAI","volume":"35","author":"Wang Jiapeng","year":"2021","unstructured":"Jiapeng Wang , Chongyu Liu , Lianwen Jin , Guozhi Tang , Jiaxin Zhang , Shuaitao Zhang , Qianying Wang , Yaqiang Wu , and Mingxiang Cai . 2021 . Towards Robust Visual Information Extraction in Real World: New Dataset and Novel Solution. In AAAI 2021, Vol. 35 . 2738--2745. Jiapeng Wang, Chongyu Liu, Lianwen Jin, Guozhi Tang, Jiaxin Zhang, Shuaitao Zhang, Qianying Wang, Yaqiang Wu, and Mingxiang Cai. 2021. Towards Robust Visual Information Extraction in Real World: New Dataset and Novel Solution. In AAAI 2021, Vol. 35. 2738--2745."},{"key":"e_1_3_2_2_36_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.findings-emnlp.80"},{"key":"e_1_3_2_2_37_1","volume-title":"Robust Layout-aware IE for Visually Rich Documents with Pre-trained Language Models. In ACM SIGIR 2020 (SIGIR '20)","author":"Wei Mengxi","year":"2020","unstructured":"Mengxi Wei , Y Ifan He , and Qiong Zhang . 2020 . Robust Layout-aware IE for Visually Rich Documents with Pre-trained Language Models. 
In ACM SIGIR 2020 (SIGIR '20) . Association for Computing Machinery, New York, NY, USA, 2367--2376. https:\/\/doi.org\/10.1145\/3397271.3401442 Mengxi Wei, YIfan He, and Qiong Zhang. 2020. Robust Layout-aware IE for Visually Rich Documents with Pre-trained Language Models. In ACM SIGIR 2020 (SIGIR '20). Association for Computing Machinery, New York, NY, USA, 2367--2376. https:\/\/doi.org\/10.1145\/3397271.3401442"},{"key":"e_1_3_2_2_38_1","volume-title":"Aggregated Residual Transformations for Deep Neural Networks. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 5987--5995","author":"Xie Saining","year":"2017","unstructured":"Saining Xie , Ross Girshick , Piotr Doll\u00e1r , Zhuowen Tu , and Kaiming He . 2017 . Aggregated Residual Transformations for Deep Neural Networks. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 5987--5995 . https:\/\/doi.org\/10.1109\/CVPR.2017.634 Saining Xie, Ross Girshick, Piotr Doll\u00e1r, Zhuowen Tu, and Kaiming He. 2017. Aggregated Residual Transformations for Deep Neural Networks. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 5987--5995. https:\/\/doi.org\/10.1109\/CVPR.2017.634"},{"key":"e_1_3_2_2_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/3394486.3403172"},{"key":"e_1_3_2_2_40_1","volume-title":"LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding. arXiv:2104.08836 [cs] (April","author":"Xu Yiheng","year":"2021","unstructured":"Yiheng Xu , Tengchao Lv , Lei Cui , Guoxin Wang , Yijuan Lu , Dinei Florencio , Cha Zhang , and Furu Wei . 2021. LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding. arXiv:2104.08836 [cs] (April 2021 ). arXiv:2104.08836 [cs] Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, and Furu Wei. 2021. LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding. 
arXiv:2104.08836 [cs] (April 2021). arXiv:2104.08836 [cs]"},{"key":"e_1_3_2_2_41_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.acl-long.201"},{"key":"e_1_3_2_2_42_1","volume-title":"CVPR 2017","author":"Yang Xiao","year":"2017","unstructured":"Xiao Yang , Ersin Yumer , Paul Asente , Mike Kraley , Daniel Kifer , and C. Lee Giles . 2017. Learning to Extract Semantic Structure from Documents Using Multimodal Fully Convolutional Neural Networks . In CVPR 2017 . 4342--4351. https:\/\/doi.org\/10.1109\/CVPR. 2017 .462 Xiao Yang, Ersin Yumer, Paul Asente, Mike Kraley, Daniel Kifer, and C. Lee Giles. 2017. Learning to Extract Semantic Structure from Documents Using Multimodal Fully Convolutional Neural Networks. In CVPR 2017. 4342--4351. https:\/\/doi.org\/10.1109\/CVPR.2017.462"},{"key":"e_1_3_2_2_43_1","volume-title":"NeurIPS","author":"Ying Chengxuan","year":"2021","unstructured":"Chengxuan Ying , Tianle Cai , Shengjie Luo , Shuxin Zheng , Guolin Ke , Di He , Yanming Shen , and Tie-Yan Liu . 2021. Do Transformers Really Perform Bad for Graph Representation . In NeurIPS 2021 . Chengxuan Ying, Tianle Cai, Shengjie Luo, Shuxin Zheng, Guolin Ke, Di He, Yanming Shen, and Tie-Yan Liu. 2021. Do Transformers Really Perform Bad for Graph Representation. In NeurIPS 2021."},{"key":"e_1_3_2_2_44_1","volume-title":"ICPR","author":"Yu Wenwen","year":"2020","unstructured":"Wenwen Yu , Ning Lu , Xianbiao Qi , Ping Gong , and Rong Xiao . 2020 . PICK: Processing Key Information Extraction from Documents Using Improved Graph Learning-Convolutional Networks . In ICPR 2020. Wenwen Yu, Ning Lu, Xianbiao Qi, Ping Gong, and Rong Xiao. 2020. PICK: Processing Key Information Extraction from Documents Using Improved Graph Learning-Convolutional Networks. In ICPR 2020."},{"key":"e_1_3_2_2_45_1","volume-title":"TRIE: End-to-End Text Reading and Information Extraction for Document Understanding. 
In ACM MM","author":"Zhang Peng","year":"2020","unstructured":"Peng Zhang , Yunlu Xu , Zhanzhan Cheng , Shiliang Pu , Jing Lu , Liang Qiao , Yi Niu , and FeiWu. 2020 . TRIE: End-to-End Text Reading and Information Extraction for Document Understanding. In ACM MM 2020. ACM, Seattle WA USA, 1413--1422. https:\/\/doi.org\/10.1145\/3394171.3413900 Peng Zhang, Yunlu Xu, Zhanzhan Cheng, Shiliang Pu, Jing Lu, Liang Qiao, Yi Niu, and FeiWu. 2020. TRIE: End-to-End Text Reading and Information Extraction for Document Understanding. In ACM MM 2020. ACM, Seattle WA USA, 1413--1422. https:\/\/doi.org\/10.1145\/3394171.3413900"},{"key":"e_1_3_2_2_46_1","volume-title":"AMBERT: A Pre-trained Language Model with Multi-Grained Tokenization. In ACL 2021 Findings. Association for Computational Linguistics, Online, 421--435","author":"Zhang Xinsong","year":"2021","unstructured":"Xinsong Zhang , Pengshuai Li , and Hang Li . 2021 . AMBERT: A Pre-trained Language Model with Multi-Grained Tokenization. In ACL 2021 Findings. Association for Computational Linguistics, Online, 421--435 . https:\/\/doi.org\/10.18653\/v1\/2021.findings-acl.37 Xinsong Zhang, Pengshuai Li, and Hang Li. 2021. AMBERT: A Pre-trained Language Model with Multi-Grained Tokenization. In ACL 2021 Findings. Association for Computational Linguistics, Online, 421--435. https:\/\/doi.org\/10.18653\/v1\/2021.findings-acl.37"},{"key":"e_1_3_2_2_47_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.emnlp-main.218"},{"key":"e_1_3_2_2_48_1","volume-title":"CUTIE: Learning to Understand Documents with Convolutional Universal Text Information Extractor. arXiv:1903.12363 [cs] (June","author":"Zhao Xiaohui","year":"2019","unstructured":"Xiaohui Zhao , Endi Niu , Zhuo Wu , and Xiaoguang Wang . 2019 . CUTIE: Learning to Understand Documents with Convolutional Universal Text Information Extractor. arXiv:1903.12363 [cs] (June 2019). arXiv:1903.12363 (cs). Xiaohui Zhao, Endi Niu, Zhuo Wu, and Xiaoguang Wang. 2019. 
CUTIE: Learning to Understand Documents with Convolutional Universal Text Information Extractor. arXiv:1903.12363 [cs] (June 2019). arXiv:1903.12363 (cs)."}],"event":{"name":"MM '22: The 30th ACM International Conference on Multimedia","sponsor":["SIGMM ACM Special Interest Group on Multimedia"],"location":"Lisboa Portugal","acronym":"MM '22"},"container-title":["Proceedings of the 30th ACM International Conference on Multimedia"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3503161.3548406","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,12]],"date-time":"2023-01-12T02:38:42Z","timestamp":1673491122000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3503161.3548406"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,10,10]]},"references-count":48,"alternative-id":["10.1145\/3503161.3548406","10.1145\/3503161"],"URL":"https:\/\/doi.org\/10.1145\/3503161.3548406","relation":{},"subject":[],"published":{"date-parts":[[2022,10,10]]},"assertion":[{"value":"2022-10-10","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}