{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,7,25]],"date-time":"2024-07-25T11:12:02Z","timestamp":1721905922025},"reference-count":55,"publisher":"Association for Computing Machinery (ACM)","issue":"1","funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["61972293"],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Reconfigurable Technol. Syst."],"published-print":{"date-parts":[[2024,3,31]]},"abstract":"In computer vision, the joint development of the algorithm and computing dimensions cannot be separated. Models and algorithms are constantly evolving, while hardware designs must adapt to new or updated algorithms. Reconfigurable devices are recognized as important platforms for computer vision applications because of their reconfigurability. There are two typical design approaches: customized and overlay design. However, existing work is unable to achieve both efficient performance and scalability to adapt to a wide range of models. To address both considerations, we propose a design framework based on reconfigurable devices to provide unified support for computer vision models. It provides software-programmable modules while leaving unit design space for problem-specific algorithms. Based on the proposed framework, we design a model mapping method and a hardware architecture with two processor arrays to enable dynamic and static reconfiguration, thereby relieving redesign pressure. In addition, resource consumption and efficiency can be balanced by adjusting the hyperparameter. In experiments on CNN, vision Transformer, and vision MLP models, our work\u2019s throughput is improved by 18.8x\u201333.6x and 1.4x\u20132.0x compared to CPU and GPU. 
Compared to others on the same platform, accelerators based on our framework can better balance resource consumption and efficiency.<\/jats:p>","DOI":"10.1145\/3635157","type":"journal-article","created":{"date-parts":[[2023,12,5]],"date-time":"2023-12-05T12:03:46Z","timestamp":1701777826000},"page":"1-31","update-policy":"http:\/\/dx.doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["A Hardware Design Framework for Computer Vision Models Based on Reconfigurable Devices"],"prefix":"10.1145","volume":"17","author":[{"ORCID":"http:\/\/orcid.org\/0000-0002-8281-0458","authenticated-orcid":false,"given":"Zimeng","family":"Fan","sequence":"first","affiliation":[{"name":"Wuhan University of Science and Technology, China"}]},{"ORCID":"http:\/\/orcid.org\/0000-0003-1204-8880","authenticated-orcid":false,"given":"Wei","family":"Hu","sequence":"additional","affiliation":[{"name":"Wuhan University of Science and Technology, China"}]},{"ORCID":"http:\/\/orcid.org\/0000-0001-7211-6991","authenticated-orcid":false,"given":"Fang","family":"Liu","sequence":"additional","affiliation":[{"name":"Wuhan University and Wuhan Institute of City, China"}]},{"ORCID":"http:\/\/orcid.org\/0000-0001-7302-2655","authenticated-orcid":false,"given":"Dian","family":"Xu","sequence":"additional","affiliation":[{"name":"Wuhan University of Science and Technology, China"}]},{"ORCID":"http:\/\/orcid.org\/0000-0002-4513-8498","authenticated-orcid":false,"given":"Hong","family":"Guo","sequence":"additional","affiliation":[{"name":"Wuhan University of Science and Technology, China"}]},{"ORCID":"http:\/\/orcid.org\/0000-0002-8648-993X","authenticated-orcid":false,"given":"Yanxiang","family":"He","sequence":"additional","affiliation":[{"name":"Wuhan University, China"}]},{"ORCID":"http:\/\/orcid.org\/0000-0002-8766-1105","authenticated-orcid":false,"given":"Min","family":"Peng","sequence":"additional","affiliation":[{"name":"Wuhan University, 
China"}]}],"member":"320","published-online":{"date-parts":[[2024,1,15]]},"reference":[{"key":"e_1_3_2_2_2","first-page":"411","volume-title":"2018 28th International Conference on Field Programmable Logic and Applications (FPL\u201918)","author":"Abdelfattah Mohamed S.","year":"2018","unstructured":"Mohamed S. Abdelfattah, David Han, Andrew Bitar, Roberto DiCecco, Shane O\u2019Connell, Nitika Shanker, Joseph Chu, Ian Prins, Joshua Fender, Andrew C. Ling, and Gordon R. Chiu. 2018. DLA: Compiler and FPGA overlay for neural network inference acceleration. In 2018 28th International Conference on Field Programmable Logic and Applications (FPL\u201918). IEEE, 411\u20134117."},{"issue":"2","key":"e_1_3_2_3_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3380548","article-title":"FFConv: An FPGA-based accelerator for fast convolution layers in convolutional neural networks","volume":"19","author":"Ahmad Afzal","year":"2020","unstructured":"Afzal Ahmad and Muhammad Adeel Pasha. 2020. FFConv: An FPGA-based accelerator for fast convolution layers in convolutional neural networks. ACM Transactions on Embedded Computing Systems (TECS) 19, 2 (2020), 1\u201324.","journal-title":"ACM Transactions on Embedded Computing Systems (TECS)"},{"key":"e_1_3_2_4_2","first-page":"53","volume-title":"2020 IEEE 31st International Conference on Application-specific Systems, Architectures and Processors (ASAP\u201920)","author":"Arora Aman","year":"2020","unstructured":"Aman Arora, Zhigang Wei, and Lizy K. John. 2020. Hamamu: Specializing FPGAs for ML applications by adding hard matrix multiplier blocks. In 2020 IEEE 31st International Conference on Application-specific Systems, Architectures and Processors (ASAP\u201920). IEEE, 53\u201360."},{"key":"e_1_3_2_5_2","article-title":"Layer normalization","author":"Ba Jimmy Lei","year":"2016","unstructured":"Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E. Hinton. 2016. Layer normalization. 
arXiv preprint arXiv:1607.06450 (2016).","journal-title":"arXiv preprint arXiv:1607.06450"},{"key":"e_1_3_2_6_2","doi-asserted-by":"crossref","first-page":"426","DOI":"10.1109\/ICECS.2011.6122304","volume-title":"2011 18th IEEE International Conference on Electronics, Circuits, and Systems","author":"Bahoura Mohammed","year":"2011","unstructured":"Mohammed Bahoura and Chan-Wang Park. 2011. FPGA-implementation of high-speed MLP neural network. In 2011 18th IEEE International Conference on Electronics, Circuits, and Systems. IEEE, 426\u2013429."},{"key":"e_1_3_2_7_2","first-page":"213","volume-title":"European Conference on Computer Vision","author":"Carion Nicolas","year":"2020","unstructured":"Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, and Sergey Zagoruyko. 2020. End-to-end object detection with transformers. In European Conference on Computer Vision. Springer, 213\u2013229."},{"key":"e_1_3_2_8_2","first-page":"1691","volume-title":"International Conference on Machine Learning","author":"Chen Mark","year":"2020","unstructured":"Mark Chen, Alec Radford, Rewon Child, Jeffrey Wu, Heewoo Jun, David Luan, and Ilya Sutskever. 2020. Generative pretraining from pixels. In International Conference on Machine Learning. PMLR, 1691\u20131703."},{"key":"e_1_3_2_9_2","volume-title":"The 10th International Conference on Learning Representations (ICLR\u201922)","author":"Chen Shoufa","year":"2022","unstructured":"Shoufa Chen, Enze Xie, Chongjian Ge, Runjian Chen, Ding Liang, and Ping Luo. 2022. CycleMLP: A MLP-like architecture for dense prediction. In The 10th International Conference on Learning Representations (ICLR\u201922). OpenReview.net. https:\/\/openreview.net\/forum?id=NMEceG4v69Y"},{"key":"e_1_3_2_10_2","first-page":"1251","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Chollet Fran\u00e7ois","year":"2017","unstructured":"Fran\u00e7ois Chollet. 2017. 
Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1251\u20131258."},{"issue":"2","key":"e_1_3_2_11_2","doi-asserted-by":"crossref","first-page":"8","DOI":"10.1109\/MM.2018.022071131","article-title":"Serving DNNs in real time at datacenter scale with Project Brainwave","volume":"38","year":"2018","unstructured":"Eric S. Chung, Jeremy Fowers, Kalin Ovtcharov, Michael Papamichael, Adrian M. Caulfield, Todd Massengill, Ming Liu, Daniel Lo, Shlomi Alkalay, Michael Haselman, Maleen Abeydeera, Logan Adams, Hari Angepat, Christian Boehn, Derek Chiou, Oren Firestein, Alessandro Forin, Kang Su Gatlin, Mahdi Ghandi, Stephen Heil, Kyle Holohan, Ahmad El Husseini, Tam\u00e1s Juh\u00e1sz, Kara Kagi, Ratna Kovvuri, Sitaram Lanka, Friedel van Megen, Dima Mukhortov, Prerak Patel, Brandon Perez, Amanda Rapsang, Steven K. Reinhardt, Bita Rouhani, Adam Sapek, Raja Seera, Sangeetha Shekar, Balaji Sridharan, Gabriel Weisz, Lisa Woods, Phillip Yi Xiao, Dan Zhang, Ritchie Zhao, and Doug Burger. 2018. Serving DNNs in real time at datacenter scale with Project Brainwave. IEEE Micro 38, 2 (2018), 8\u201320.","journal-title":"IEEE Micro"},{"key":"e_1_3_2_12_2","doi-asserted-by":"crossref","first-page":"248","DOI":"10.1109\/CVPR.2009.5206848","volume-title":"2009 IEEE Conference on Computer Vision and Pattern Recognition","author":"Deng Jia","year":"2009","unstructured":"Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. 2009. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition. 
IEEE, 248\u2013255."},{"key":"e_1_3_2_13_2","doi-asserted-by":"crossref","first-page":"265","DOI":"10.1109\/FPT.2016.7929549","volume-title":"2016 International Conference on Field-programmable Technology (FPT\u201916)","author":"DiCecco Roberto","year":"2016","unstructured":"Roberto DiCecco, Griffin Lacey, Jasmina Vasiljevic, Paul Chow, Graham Taylor, and Shawki Areibi. 2016. Caffeinated FPGAs: FPGA framework for convolutional neural networks. In 2016 International Conference on Field-programmable Technology (FPT\u201916). IEEE, 265\u2013268."},{"key":"e_1_3_2_14_2","article-title":"RepMLP: Re-parameterizing convolutions into fully-connected layers for image recognition","author":"Ding Xiaohan","year":"2021","unstructured":"Xiaohan Ding, Chunlong Xia, Xiangyu Zhang, Xiaojie Chu, Jungong Han, and Guiguang Ding. 2021. RepMLP: Re-parameterizing convolutions into fully-connected layers for image recognition. arXiv preprint arXiv:2105.01883 (2021).","journal-title":"arXiv preprint arXiv:2105.01883"},{"key":"e_1_3_2_15_2","volume-title":"9th International Conference on Learning Representations (ICLR\u201921)","author":"Dosovitskiy Alexey","year":"2021","unstructured":"Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. 2021. An image is worth 16x16 words: Transformers for image recognition at scale. In 9th International Conference on Learning Representations (ICLR\u201921). OpenReview.net. https:\/\/openreview.net\/forum?id=YicbFdNTTy"},{"key":"e_1_3_2_16_2","first-page":"250","volume-title":"2022 27th Asia and South Pacific Design Automation Conference (ASP-DAC\u201922)","author":"Fan Hongxiang","year":"2022","unstructured":"Hongxiang Fan, Martin Ferianc, Zhiqiang Que, He Li, Shuanglong Liu, Xinyu Niu, and Wayne Luk. 2022. Algorithm and hardware co-design for reconfigurable CNN accelerator. 
In 2022 27th Asia and South Pacific Design Automation Conference (ASP-DAC\u201922). IEEE, 250\u2013255."},{"key":"e_1_3_2_17_2","first-page":"629","volume-title":"2017 22nd Asia and South Pacific Design Automation Conference (ASP-DAC\u201917)","author":"Guan Yijin","year":"2017","unstructured":"Yijin Guan, Zhihang Yuan, Guangyu Sun, and Jason Cong. 2017. FPGA-based accelerator for long short-term memory recurrent neural networks. In 2017 22nd Asia and South Pacific Design Automation Conference (ASP-DAC\u201917). IEEE, 629\u2013634."},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA47549.2020.00035"},{"key":"e_1_3_2_19_2","first-page":"770","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"He Kaiming","year":"2016","unstructured":"Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 770\u2013778."},{"key":"e_1_3_2_20_2","first-page":"1","volume-title":"2017 27th International Conference on Field Programmable Logic and Applications (FPL\u201917)","author":"Jiao Li","year":"2017","unstructured":"Li Jiao, Cheng Luo, Wei Cao, Xuegong Zhou, and Lingli Wang. 2017. Accelerating low bit-width convolutional neural networks with embedded FPGA. In 2017 27th International Conference on Field Programmable Logic and Applications (FPL\u201917). IEEE, 1\u20134."},{"key":"e_1_3_2_21_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3079856.3080246","volume-title":"Proceedings of the 44th Annual International Symposium on Computer Architecture","year":"2017","unstructured":"Norman P. Jouppi, Cliff Young, Nishant Patil, David A. 
Patterson, Gaurav Agrawal, Raminder Bajwa, Sarah Bates, Suresh Bhatia, Nan Boden, Al Borchers, Rick Boyle, Pierre-luc Cantin, Clifford Chao, Chris Clark, Jeremy Coriell, Mike Daley, Matt Dau, Jeffrey Dean, Ben Gelb, Tara Vazir Ghaemmaghami, Rajendra Gottipati, William Gulland, Robert Hagmann, C. Richard Ho, Doug Hogberg, John Hu, Robert Hundt, Dan Hurt, Julian Ibarz, Aaron Jaffey, Alek Jaworski, Alexander Kaplan, Harshit Khaitan, Daniel Killebrew, Andy Koch, Naveen Kumar, Steve Lacy, James Laudon, James Law, Diemthu Le, Chris Leary, Zhuyuan Liu, Kyle Lucke, Alan Lundin, Gordon MacKean, Adriana Maggiore, Maire Mahony, Kieran Miller, Rahul Nagarajan, Ravi Narayanaswami, Ray Ni, Kathy Nix, Thomas Norrie, Mark Omernick, Narayana Penukonda, Andy Phelps, Jonathan Ross, Matt Ross, Amir Salek, Emad Samadiani, Chris Severn, Gregory Sizikov, Matthew Snelham, Jed Souter, Dan Steinberg, Andy Swing, Mercedes Tan, Gregory Thorson, Bo Tian, Horia Toma, Erick Tuttle, Vijay Vasudevan, Richard Walter, Walter Wang, Eric Wilcox, and Doe Hyun Yoon. 2017. In-datacenter performance analysis of a tensor processing unit. In Proceedings of the 44th Annual International Symposium on Computer Architecture. 1\u201312."},{"key":"e_1_3_2_22_2","doi-asserted-by":"crossref","first-page":"154","DOI":"10.1109\/IPDPSW.2018.00031","volume-title":"2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW\u201918)","author":"K\u00e4stner Florian","year":"2018","unstructured":"Florian K\u00e4stner, Benedikt Jan\u00dfen, Frederik Kautz, Michael H\u00fcbner, and Giulio Corradi. 2018. Hardware\/software codesign for convolutional neural networks exploiting dynamic partial reconfiguration on PYNQ. In 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW\u201918). 
IEEE, 154\u2013161."},{"key":"e_1_3_2_23_2","volume-title":"The 2021 ACM\/SIGDA International Symposium on Field Programmable Gate Arrays (FPGA\u201921)","author":"Khan Hamza","year":"2021","unstructured":"Hamza Khan, Asma Khan, Zainab Khan, Lun Bin Huang, Kun Wang, and Lei He. 2021. NPE: An FPGA-based overlay processor for natural language processing. In The 2021 ACM\/SIGDA International Symposium on Field Programmable Gate Arrays (FPGA\u201921), Lesley Shannon and Michael Adler (Eds.). ACM, 227. 10.1145\/3431920.3439477"},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1145\/3370748.3406567"},{"key":"e_1_3_2_25_2","first-page":"612","volume-title":"2021 IEEE International Symposium on High-performance Computer Architecture (HPCA\u201921)","author":"Li Jiajun","year":"2021","unstructured":"Jiajun Li, Ahmed Louri, Avinash Karanth, and Razvan Bunescu. 2021. CSCNN: Algorithm-hardware co-design for CNN accelerators using centrosymmetric filters. In 2021 IEEE International Symposium on High-performance Computer Architecture (HPCA\u201921). IEEE, 612\u2013625."},{"key":"e_1_3_2_26_2","first-page":"9204","volume-title":"Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021 (NeurIPS\u201921)","author":"Liu Hanxiao","year":"2021","unstructured":"Hanxiao Liu, Zihang Dai, David R. So, and Quoc V. Le. 2021. Pay attention to MLPs. In Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021 (NeurIPS\u201921), Marc\u2019Aurelio Ranzato, Alina Beygelzimer, Yann N. Dauphin, Percy Liang, and Jennifer Wortman Vaughan (Eds.). 9204\u20139215. 
https:\/\/proceedings.neurips.cc\/paper\/2021\/hash\/4cc05b35c2f937c5bd9e7d41d3686fff-Abstract.html"},{"key":"e_1_3_2_27_2","first-page":"513","volume-title":"Design, Automation & Test in Europe Conference & Exhibition (DATE\u201921)","author":"Liu Zejian","year":"2021","unstructured":"Zejian Liu, Gang Li, and Jian Cheng. 2021. Hardware acceleration of fully quantized BERT for efficient natural language processing. In Design, Automation & Test in Europe Conference & Exhibition (DATE\u201921). IEEE, 513\u2013516. 10.23919\/DATE51398.2021.9474043"},{"key":"e_1_3_2_28_2","article-title":"Swin Transformer: Hierarchical vision transformer using shifted windows","author":"Liu Ze","year":"2021","unstructured":"Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. 2021. Swin Transformer: Hierarchical vision transformer using shifted windows. In International Conference on Computer Vision (ICCV\u201921).","journal-title":"International Conference on Computer Vision (ICCV\u201921)"},{"key":"e_1_3_2_29_2","first-page":"84","volume-title":"33rd IEEE International System-on-Chip Conference (SoCC\u201920)","author":"Lu Siyuan","year":"2020","unstructured":"Siyuan Lu, Meiqi Wang, Shuang Liang, Jun Lin, and Zhongfeng Wang. 2020. Hardware accelerator for multi-head attention and position-wise feed-forward in the transformer. In 33rd IEEE International System-on-Chip Conference (SoCC\u201920). IEEE, 84\u201389. 10.1109\/SOCC49529.2020.9524802"},{"key":"e_1_3_2_30_2","first-page":"116","volume-title":"Proceedings of the European Conference on Computer Vision (ECCV\u201918)","author":"Ma Ningning","year":"2018","unstructured":"Ningning Ma, Xiangyu Zhang, Hai-Tao Zheng, and Jian Sun. 2018. ShuffleNet V2: Practical guidelines for efficient CNN architecture design. In Proceedings of the European Conference on Computer Vision (ECCV\u201918). 
116\u2013131."},{"key":"e_1_3_2_31_2","first-page":"1520","volume-title":"Proceedings of the IEEE International Conference on Computer Vision","author":"Noh Hyeonwoo","year":"2015","unstructured":"Hyeonwoo Noh, Seunghoon Hong, and Bohyung Han. 2015. Learning deconvolution network for semantic segmentation. In Proceedings of the IEEE International Conference on Computer Vision. 1520\u20131528."},{"key":"e_1_3_2_32_2","first-page":"24","volume-title":"2021 31st International Conference on Field-Programmable Logic and Applications (FPL\u201921)","author":"Panahi Atiyehsadat","year":"2021","unstructured":"Atiyehsadat Panahi, Suhail Balsalama, Ange-Thierry Ishimwe, Joel Mandebi Mbongue, and David Andrews. 2021. A customizable domain-specific memory-centric FPGA overlay for machine learning applications. In 2021 31st International Conference on Field-Programmable Logic and Applications (FPL\u201921). IEEE, 24\u201327."},{"key":"e_1_3_2_33_2","volume-title":"Proceedings of Machine Learning and Systems 2020 (MLSys\u201920)","author":"Park Junki","year":"2020","unstructured":"Junki Park, Hyunsung Yoon, Daehyun Ahn, Jungwook Choi, and Jae-Joon Kim. 2020. OPTIMUS: OPTImized matrix multiplication structure for transformer neural network accelerator. In Proceedings of Machine Learning and Systems 2020 (MLSys\u201920), Inderjit S. Dhillon, Dimitris S. Papailiopoulos, and Vivienne Sze (Eds.). mlsys.org. https:\/\/proceedings.mlsys.org\/book\/311.pdf"},{"key":"e_1_3_2_34_2","doi-asserted-by":"crossref","first-page":"142","DOI":"10.1109\/ISQED51717.2021.9424344","volume-title":"2021 22nd International Symposium on Quality Electronic Design (ISQED\u201921)","author":"Peng Hongwu","year":"2021","unstructured":"Hongwu Peng, Shaoyi Huang, Tong Geng, Ang Li, Weiwen Jiang, Hang Liu, Shusen Wang, and Caiwen Ding. 2021. Accelerating transformer-based deep learning models on FPGAs using column balanced block pruning. 
In 2021 22nd International Symposium on Quality Electronic Design (ISQED\u201921). IEEE, 142\u2013148."},{"key":"e_1_3_2_35_2","first-page":"26","volume-title":"Proceedings of the 2016 ACM\/SIGDA International Symposium on Field-programmable Gate Arrays","author":"Qiu Jiantao","year":"2016","unstructured":"Jiantao Qiu, Jie Wang, Song Yao, Kaiyuan Guo, Boxun Li, Erjin Zhou, Jincheng Yu, Tianqi Tang, Ningyi Xu, Sen Song, Yu Wang, and Huazhong Yang. 2016. Going deeper with embedded FPGA platform for convolutional neural network. In Proceedings of the 2016 ACM\/SIGDA International Symposium on Field-programmable Gate Arrays. 26\u201335."},{"key":"e_1_3_2_36_2","first-page":"4510","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Sandler Mark","year":"2018","unstructured":"Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. 2018. MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 4510\u20134520."},{"key":"e_1_3_2_37_2","first-page":"1394","volume-title":"Proceedings of the 59th ACM\/IEEE Design Automation Conference","author":"Sun Mengshu","year":"2022","unstructured":"Mengshu Sun, Zhengang Li, Alec Lu, Haoyu Ma, Geng Yuan, Yanyue Xie, Hao Tang, Yanyu Li, Miriam Leeser, Zhangyang Wang, Xue Lin, and Zhenman Fang. 2022. FPGA-aware automatic acceleration framework for vision transformer with mixed-scheme quantization: Late breaking results. In Proceedings of the 59th ACM\/IEEE Design Automation Conference. 1394\u20131395."},{"key":"e_1_3_2_38_2","article-title":"VAQF: Fully automatic software-hardware co-design framework for low-bit vision transformer","author":"Sun Mengshu","year":"2022","unstructured":"Mengshu Sun, Haoyu Ma, Guoliang Kang, Yifan Jiang, Tianlong Chen, Xiaolong Ma, Zhangyang Wang, and Yanzhi Wang. 2022. 
VAQF: Fully automatic software-hardware co-design framework for low-bit vision transformer. arXiv preprint arXiv:2201.06618 (2022).","journal-title":"arXiv preprint arXiv:2201.06618"},{"key":"e_1_3_2_39_2","first-page":"24261","volume-title":"Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021 (NeurIPS\u201921)","author":"Tolstikhin Ilya O.","year":"2021","unstructured":"Ilya O. Tolstikhin, Neil Houlsby, Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Thomas Unterthiner, Jessica Yung, Andreas Steiner, Daniel Keysers, Jakob Uszkoreit, Mario Lucic, and Alexey Dosovitskiy. 2021. MLP-Mixer: An all-MLP architecture for vision. In Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021 (NeurIPS\u201921), Marc\u2019Aurelio Ranzato, Alina Beygelzimer, Yann N. Dauphin, Percy Liang, and Jennifer Wortman Vaughan (Eds.). 24261\u201324272. https:\/\/proceedings.neurips.cc\/paper\/2021\/hash\/cba0a4ee5ccd02fda0fe3f9a3e7b89fe-Abstract.html"},{"key":"e_1_3_2_40_2","article-title":"ResMLP: Feedforward networks for image classification with data-efficient training","author":"Touvron Hugo","year":"2021","unstructured":"Hugo Touvron, Piotr Bojanowski, Mathilde Caron, Matthieu Cord, Alaaeldin El-Nouby, Edouard Grave, Gautier Izacard, Armand Joulin, Gabriel Synnaeve, Jakob Verbeek, and Herv\u00e9 J\u00e9gou. 2021. ResMLP: Feedforward networks for image classification with data-efficient training. arXiv preprint arXiv:2105.03404 (2021).","journal-title":"arXiv preprint arXiv:2105.03404"},{"key":"e_1_3_2_41_2","doi-asserted-by":"crossref","first-page":"65","DOI":"10.1145\/3020078.3021744","volume-title":"Proceedings of the 2017 ACM\/SIGDA International Symposium on Field-programmable Gate Arrays","author":"Umuroglu Yaman","year":"2017","unstructured":"Yaman Umuroglu, Nicholas J. 
Fraser, Giulio Gambardella, Michaela Blott, Philip Leong, Magnus Jahre, and Kees Vissers. 2017. FINN: A framework for fast, scalable binarized neural network inference. In Proceedings of the 2017 ACM\/SIGDA International Symposium on Field-programmable Gate Arrays. 65\u201374."},{"key":"e_1_3_2_42_2","first-page":"5998","volume-title":"Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, Isabelle Guyon, Ulrike von Luxburg, Samy Bengio, Hanna M. Wallach, Rob Fergus, S. V. N. Vishwanathan, and Roman Garnett (Eds.). 5998\u20136008. https:\/\/proceedings.neurips.cc\/paper\/2017\/hash\/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html"},{"key":"e_1_3_2_43_2","unstructured":"Ross Wightman. 2019. PyTorch Image Models. https:\/\/github.com\/rwightman\/pytorch-image-models. 10.5281\/zenodo.4414861"},{"key":"e_1_3_2_44_2","first-page":"1","volume-title":"2018 IEEE 29th International Conference on Application-specific Systems, Architectures and Processors (ASAP\u201918)","author":"Wijeratne Sasindu","year":"2018","unstructured":"Sasindu Wijeratne, Sandaruwan Jayaweera, Mahesh Dananjaya, and Ajith Pasqual. 2018. Reconfigurable co-processor architecture with limited numerical precision to accelerate deep convolutional neural networks. In 2018 IEEE 29th International Conference on Application-specific Systems, Architectures and Processors (ASAP\u201918). 
IEEE, 1\u20137."},{"key":"e_1_3_2_45_2","first-page":"33","volume-title":"2021 31st International Conference on Field-programmable Logic and Applications (FPL\u201921)","author":"Wu Chen","year":"2021","unstructured":"Chen Wu, Jinming Zhuang, Kun Wang, and Lei He. 2021. MP-OPU: A mixed precision FPGA-based overlay processor for convolutional neural networks. In 2021 31st International Conference on Field-programmable Logic and Applications (FPL\u201921). IEEE, 33\u201337."},{"key":"e_1_3_2_46_2","doi-asserted-by":"crossref","first-page":"23","DOI":"10.1145\/3289602.3293902","volume-title":"Proceedings of the 2019 ACM\/SIGDA International Symposium on Field-programmable Gate Arrays","author":"Yang Yifan","year":"2019","unstructured":"Yifan Yang, Qijing Huang, Bichen Wu, Tianjun Zhang, Liang Ma, Giulio Gambardella, Michaela Blott, Luciano Lavagno, Kees Vissers, John Wawrzynek, and Kurt Keutzer. 2019. Synetgy: Algorithm-hardware co-design for convnet accelerators on embedded FPGAs. In Proceedings of the 2019 ACM\/SIGDA International Symposium on Field-programmable Gate Arrays. 23\u201332."},{"key":"e_1_3_2_47_2","first-page":"348","volume-title":"2022 25th Euromicro Conference on Digital System Design (DSD\u201922)","author":"Yi Changjae","year":"2022","unstructured":"Changjae Yi, Donghyun Kang, and Soonhoi Ha. 2022. Hardware-software codesign of a CNN accelerator. In 2022 25th Euromicro Conference on Digital System Design (DSD\u201922). IEEE, 348\u2013356."},{"key":"e_1_3_2_48_2","first-page":"472","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Yu Fisher","year":"2017","unstructured":"Fisher Yu, Vladlen Koltun, and Thomas Funkhouser. 2017. Dilated residual networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 
472\u2013480."},{"key":"e_1_3_2_49_2","doi-asserted-by":"publisher","DOI":"10.1109\/TVLSI.2019.2939726"},{"key":"e_1_3_2_50_2","first-page":"122","volume-title":"Proceedings of the 2020 ACM\/SIGDA International Symposium on Field-programmable Gate Arrays (FPGA\u201920)","author":"Yu Yunxuan","year":"2020","unstructured":"Yunxuan Yu, Tiandong Zhao, Kun Wang, and Lei He. 2020. Light-OPU: An FPGA-based overlay processor for lightweight convolutional neural networks. In Proceedings of the 2020 ACM\/SIGDA International Symposium on Field-programmable Gate Arrays (FPGA\u201920). Association for Computing Machinery, New York, NY, 122\u2013132. 10.1145\/3373087.3375311"},{"key":"e_1_3_2_51_2","doi-asserted-by":"publisher","DOI":"10.1109\/TVLSI.2020.2995741"},{"key":"e_1_3_2_52_2","doi-asserted-by":"crossref","first-page":"323","DOI":"10.1109\/SOCC.2016.7905501","volume-title":"2016 29th IEEE International System-on-chip Conference (SOCC\u201916)","author":"Yuan Bo","year":"2016","unstructured":"Bo Yuan. 2016. Efficient hardware architecture of softmax layer in deep neural network. In 2016 29th IEEE International System-on-chip Conference (SOCC\u201916). IEEE, 323\u2013326."},{"key":"e_1_3_2_53_2","doi-asserted-by":"crossref","first-page":"161","DOI":"10.1145\/2684746.2689060","volume-title":"Proceedings of the 2015 ACM\/SIGDA International Symposium on Field-programmable Gate Arrays","author":"Zhang Chen","year":"2015","unstructured":"Chen Zhang, Peng Li, Guangyu Sun, Yijin Guan, Bingjun Xiao, and Jason Cong. 2015. Optimizing FPGA-based accelerator design for deep convolutional neural networks. In Proceedings of the 2015 ACM\/SIGDA International Symposium on Field-programmable Gate Arrays. 
161\u2013170."},{"issue":"5","key":"e_1_3_2_54_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3477002","article-title":"Algorithm-hardware co-design of attention mechanism on FPGA devices","volume":"20","author":"Zhang Xinyi","year":"2021","unstructured":"Xinyi Zhang, Yawen Wu, Peipei Zhou, Xulong Tang, and Jingtong Hu. 2021. Algorithm-hardware co-design of attention mechanism on FPGA devices. ACM Transactions on Embedded Computing Systems (TECS) 20, 5s (2021), 1\u201324.","journal-title":"ACM Transactions on Embedded Computing Systems (TECS)"},{"key":"e_1_3_2_55_2","first-page":"6848","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Zhang Xiangyu","year":"2018","unstructured":"Xiangyu Zhang, Xinyu Zhou, Mengxiao Lin, and Jian Sun. 2018. ShuffleNet: An extremely efficient convolutional neural network for mobile devices. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 6848\u20136856."},{"key":"e_1_3_2_56_2","article-title":"A battle of network structures: An empirical study of CNN, Transformer, and MLP","author":"Zhao Yucheng","year":"2021","unstructured":"Yucheng Zhao, Guangting Wang, Chuanxin Tang, Chong Luo, Wenjun Zeng, and Zheng-Jun Zha. 2021. A battle of network structures: An empirical study of CNN, Transformer, and MLP. 
arXiv preprint arXiv:2108.13002 (2021).","journal-title":"arXiv preprint arXiv:2108.13002"}],"container-title":["ACM Transactions on Reconfigurable Technology and Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3635157","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,1,15]],"date-time":"2024-01-15T12:01:07Z","timestamp":1705320067000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3635157"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,1,15]]},"references-count":55,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2024,3,31]]}},"alternative-id":["10.1145\/3635157"],"URL":"https:\/\/doi.org\/10.1145\/3635157","relation":{},"ISSN":["1936-7406","1936-7414"],"issn-type":[{"value":"1936-7406","type":"print"},{"value":"1936-7414","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,1,15]]},"assertion":[{"value":"2023-03-19","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-11-22","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-01-15","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}