{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,9,6]],"date-time":"2024-09-06T12:01:23Z","timestamp":1725624083563},"publisher-location":"New York, NY, USA","reference-count":27,"publisher":"ACM","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,1,7]]},"DOI":"10.1145\/3512388.3512421","type":"proceedings-article","created":{"date-parts":[[2022,3,29]],"date-time":"2022-03-29T02:28:13Z","timestamp":1648520893000},"page":"220-225","update-policy":"http:\/\/dx.doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["RPViT: Vision Transformer Based on Region Proposal"],"prefix":"10.1145","author":[{"given":"Jing","family":"Ge","sequence":"first","affiliation":[{"name":"School of Computer Science and Technology, Beijing Institute of Technology, China"}]},{"given":"Qianxiang","family":"Wang","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Beijing Institute of Technology, China"}]},{"given":"Jiahui","family":"Tong","sequence":"additional","affiliation":[{"name":"Science and Technology on Complex System Control and Intelligent Agent Cooperation Laboratory, China and Beijing Electro-Mechanical Engineering Institute, China"}]},{"given":"Guangyu","family":"Gao","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Beijing Institute of Technology, China"}]}],"member":"320","published-online":{"date-parts":[[2022,3,28]]},"reference":[{"key":"e_1_3_2_2_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.49"},{"key":"e_1_3_2_2_2_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF01934122"},{"key":"e_1_3_2_2_3_1","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btz259"},{"key":"e_1_3_2_2_4_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58452-8_13"},{"key":"e_1_3_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.414"},{"key":"e_1_3_2_2_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00020"},{"key":"e_1_3_2_2_7_1","unstructured":"Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, and Sylvain Gelly. 2020. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929."},{"key":"e_1_3_2_2_8_1","first-page":"167","volume-title":"International journal of computer vision. 59, 2","author":"Felzenszwalb Pedro F","unstructured":"Pedro F Felzenszwalb and Daniel P Huttenlocher. 2004. Efficient graph-based image segmentation. International journal of computer vision. 59, 2, 167-181."},{"key":"e_1_3_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.169"},{"key":"e_1_3_2_2_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.81"},{"key":"e_1_3_2_2_11_1","volume-title":"Spatial pyramid pooling in deep convolutional networks for visual recognition","author":"He Kaiming","year":"2015","unstructured":"Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE transactions on pattern analysis and machine intelligence. 37, 9, 1904-1916."},{"key":"e_1_3_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_2_13_1","unstructured":"Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. 2012. Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems. 25, 1097-1105."},{"key":"e_1_3_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00199"},{"key":"e_1_3_2_2_15_1","doi-asserted-by":"crossref","unstructured":"Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. 2021. Swin transformer: Hierarchical vision transformer using shifted windows. arXiv preprint arXiv:2103.14030.","DOI":"10.1109\/ICCV48922.2021.00986"},{"key":"e_1_3_2_2_16_1","unstructured":"Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. 2018. Improving language understanding by generative pre-training."},{"key":"e_1_3_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.308"},{"key":"e_1_3_2_2_18_1","volume-title":"International Conference on Machine Learning. PMLR, 10347-10357","author":"Touvron Hugo","year":"2021","unstructured":"Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, and Herv\u00e9 J\u00e9gou. 2021. Training data-efficient image transformers & distillation through attention. In International Conference on Machine Learning. PMLR, 10347-10357."},{"key":"e_1_3_2_2_19_1","volume-title":"International journal of computer vision. 104, 2","author":"Uijlings Jasper RR","year":"2013","unstructured":"Jasper RR Uijlings, Koen EA Van De Sande, Theo Gevers, and Arnold WM Smeulders. 2013. Selective search for object recognition. International journal of computer vision. 104, 2, 154-171."},{"key":"e_1_3_2_2_20_1","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in neural information processing systems. 5998-6008."},{"key":"e_1_3_2_2_21_1","doi-asserted-by":"crossref","unstructured":"Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, and Ling Shao. 2021. Pyramid vision transformer: A versatile backbone for dense prediction without convolutions. arXiv preprint arXiv:2102.12122.","DOI":"10.1109\/ICCV48922.2021.00061"},{"key":"e_1_3_2_2_22_1","volume-title":"Cvt: Introducing convolutions to vision transformers. arXiv preprint arXiv:2103.15808.","author":"Wu Haiping","year":"2021","unstructured":"Haiping Wu, Bin Xiao, Noel Codella, Mengchen Liu, Xiyang Dai, Lu Yuan, and Lei Zhang. 2021. Cvt: Introducing convolutions to vision transformers. arXiv preprint arXiv:2103.15808."},{"key":"e_1_3_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00583"},{"key":"e_1_3_2_2_24_1","unstructured":"Fisher Yu and Vladlen Koltun. 2015. Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122."},{"key":"e_1_3_2_2_25_1","volume-title":"Tokens-to-token vit: Training vision transformers from scratch on imagenet. arXiv preprint arXiv:2101.11986.","author":"Yuan Li","year":"2021","unstructured":"Li Yuan, Yunpeng Chen, Tao Wang, Weihao Yu, Yujun Shi, Zihang Jiang, Francis EH Tay, Jiashi Feng, and Shuicheng Yan. 2021. Tokens-to-token vit: Training vision transformers from scratch on imagenet. arXiv preprint arXiv:2101.11986."},{"key":"e_1_3_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00612"},{"key":"e_1_3_2_2_27_1","volume-title":"European conference on computer vision. Springer, 391-405","author":"Lawrence Zitnick C","year":"2014","unstructured":"C Lawrence Zitnick and Piotr Doll\u00e1r. 2014. Edge boxes: Locating object proposals from edges. In European conference on computer vision. Springer, 391-405."}],"event":{"name":"ICIGP 2022: 2022 the 5th International Conference on Image and Graphics Processing","acronym":"ICIGP 2022","location":"Beijing China"},"container-title":["2022 the 5th International Conference on Image and Graphics Processing (ICIGP)"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3512388.3512421","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,14]],"date-time":"2023-01-14T20:58:22Z","timestamp":1673729902000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3512388.3512421"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,1,7]]},"references-count":27,"alternative-id":["10.1145\/3512388.3512421","10.1145\/3512388"],"URL":"https:\/\/doi.org\/10.1145\/3512388.3512421","relation":{},"subject":[],"published":{"date-parts":[[2022,1,7]]},"assertion":[{"value":"2022-03-28","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}