{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,11,21]],"date-time":"2024-11-21T05:28:43Z","timestamp":1732166923715,"version":"3.28.0"},"reference-count":63,"publisher":"Association for Computing Machinery (ACM)","issue":"4","funder":[{"DOI":"10.13039\/501100012166","name":"National Key R&D Program of China","doi-asserted-by":"crossref","award":["2022YFB4501702"],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["62122053"],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2024,12,31]]},"abstract":"Autonomous micromobility systems (AMS) such as low-speed minicabs and robots are thriving. In AMS, multiple Deep Neural Networks execute in parallel on heterogeneous AI accelerators. An emerging paradigm called Accelerator Level Parallelism (ALP) suggests managing accelerators holistically. However, there lacks a specialized and practical solution populating ALP for an AMS, where the varying real-time requirements under different working scenarios bring an opportunity to dynamically tradeoff between latency and efficiency. Furthermore, accelerator heterogeneity introduces enormous configuration space, and the shared-memory architecture results in dynamic bandwidth interference.\n\nIn this article, we propose A\u00b2, a novel AMS resource manager optimizing energy and memory space efficiency under variable latency constraints. We gain insight from prior Learn&Control scheme to design an Analyze&Adapt scheme specialized for heterogeneous AI accelerators under shared-memory architecture. 
It features analyzing the system thoroughly offline to support two-step adaptation online. We build a prototype of A\u00b2 and evaluate it on a commercial edge platform. We show that A\u00b2 achieves 32.8% improvements in power and 13.8% in memory compared with control-based methods. As for timeliness enhancement, A\u00b2 reduces the deadline violation rate by 9.2 percentage points (12.8% \u2192 3.6%) on average compared to directly porting Learn&Control methods.","DOI":"10.1145\/3688611","type":"journal-article","created":{"date-parts":[[2024,8,21]],"date-time":"2024-08-21T23:31:39Z","timestamp":1724283099000},"page":"1-20","update-policy":"http:\/\/dx.doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["A\u00b2: Towards Accelerator Level Parallelism for Autonomous Micromobility Systems"],"prefix":"10.1145","volume":"21","author":[{"ORCID":"http:\/\/orcid.org\/0000-0002-1326-2654","authenticated-orcid":false,"given":"Lingyu","family":"Sun","sequence":"first","affiliation":[{"name":"Shanghai Jiao Tong University, Shanghai, China"}]},{"ORCID":"http:\/\/orcid.org\/0000-0003-4372-7851","authenticated-orcid":false,"given":"Xiaofeng","family":"Hou","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University, Shanghai, China"}]},{"ORCID":"http:\/\/orcid.org\/0000-0001-6218-4659","authenticated-orcid":false,"given":"Chao","family":"Li","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University, Shanghai, China"}]},{"ORCID":"http:\/\/orcid.org\/0000-0003-0378-2311","authenticated-orcid":false,"given":"Jiacheng","family":"Liu","sequence":"additional","affiliation":[{"name":"The Chinese University of Hong Kong, Hong Kong, Hong 
Kong"}]},{"ORCID":"http:\/\/orcid.org\/0000-0003-3764-8065","authenticated-orcid":false,"given":"Xinkai","family":"Wang","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University, Shanghai, China"}]},{"ORCID":"http:\/\/orcid.org\/0000-0001-5832-0347","authenticated-orcid":false,"given":"Quan","family":"Chen","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University, Shanghai, China"}]},{"ORCID":"http:\/\/orcid.org\/0000-0003-0034-2302","authenticated-orcid":false,"given":"Minyi","family":"Guo","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University, Shanghai, China"}]}],"member":"320","published-online":{"date-parts":[[2024,11,20]]},"reference":[{"key":"e_1_3_2_2_2","unstructured":"2019. NVIDIA Tegra Xavier. Retrieved from https:\/\/en.wikichip.org\/wiki\/nvidia\/tegra\/xavier"},{"key":"e_1_3_2_3_2","unstructured":"2022. Retrieved from https:\/\/www.nvidia.cn\/self-driving-cars\/drive-platform\/hardware\/"},{"key":"e_1_3_2_4_2","unstructured":"2022. Stress-ng Test Tool. Retrieved from https:\/\/github.com\/ColinIanKing\/stress-ng\/tree\/a71aad422b1b934981d10e7b3be866b2744f19c7"},{"key":"e_1_3_2_5_2","unstructured":"2023. Europe\u2019s 6-wheeled Delivery Robots Begin Invasion of US Campuses. Retrieved from https:\/\/sifted.eu\/articles\/starship-robot-delivery\/"},{"key":"e_1_3_2_6_2","unstructured":"2023. NDT Localization. Retrieved from https:\/\/github.com\/koide3\/hdl_localization"},{"key":"e_1_3_2_7_2","unstructured":"2024. Mobileye EyeQ. Retrieved from https:\/\/www.mobileye.com\/technology\/eyeq-chip\/"},{"key":"e_1_3_2_8_2","unstructured":"2024. VPI Dense Optical Flow. Retrieved from https:\/\/docs.nvidia.com\/vpi\/sample_optflow_dense.html"},{"key":"e_1_3_2_9_2","unstructured":"2024. VPI Temporal Noise Reduction. 
Retrieved from https:\/\/docs.nvidia.com\/vpi\/sample_tnr.html"},{"key":"e_1_3_2_10_2","article-title":"HetSched: Quality-of-mission aware scheduling for autonomous vehicle SoCs","author":"Amarnath Aporva","year":"2022","unstructured":"Aporva Amarnath, Subhankar Pal, Hiwot Kassa, Augusto Vega, Alper Buyuktosunoglu, Hubertus Franke, John-David Wellman, Ronald Dreslinski, and Pradip Bose. 2022. HetSched: Quality-of-mission aware scheduling for autonomous vehicle SoCs. arXiv:2203.13396. Retrieved from https:\/\/arxiv.org\/abs\/2203.13396","journal-title":"arXiv:2203.13396"},{"key":"e_1_3_2_11_2","first-page":"499","volume-title":"Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation","author":"Bai Zhihao","year":"2020","unstructured":"Zhihao Bai, Zhen Zhang, Yibo Zhu, and Xin Jin. 2020. PipeSwitch: Fast pipelined context switching for deep learning applications. In Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation. 499\u2013514."},{"key":"e_1_3_2_12_2","first-page":"371","volume-title":"Proceedings of the 2020 USENIX Annual Technical Conference","author":"Bateni Soroush","year":"2020","unstructured":"Soroush Bateni and Cong Liu. 2020. NeuOS: A latency-predictable multi-dimensional optimization framework for DNN-driven autonomous systems. In Proceedings of the 2020 USENIX Annual Technical Conference. 
371\u2013385."},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/RTAS48715.2020.00007"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/RTSS.2018.00020"},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2018.00077"},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIE.2020.3009585"},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1145\/3489517.3530572"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/4235.996017"},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-70928-2_60"},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.1145\/3307650.3326633"},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1145\/3352460.3358312"},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/RTSS.2018.00019"},{"key":"e_1_3_2_23_2","first-page":"421","volume-title":"Proceedings of the 2016 USENIX Annual Technical Conference","author":"Farrell Anne","year":"2016","unstructured":"Anne Farrell and Henry Hoffmann. 2016. MEANTIME: Achieving both minimal energy and timeliness with approximate computing. In Proceedings of the 2016 USENIX Annual Technical Conference. 421\u2013435."},{"key":"e_1_3_2_24_2","first-page":"154","volume-title":"Proceedings of an International Conference on Genetic Algorithms and Their Applications","volume":"154","author":"Goldberg David E.","year":"1985","unstructured":"David E. Goldberg and Robert Lingle. 1985. Alleles, loci, and the traveling salesman problem. In Proceedings of an International Conference on Genetic Algorithms and Their Applications, Vol. 154. 
Lawrence Erlbaum Hillsdale, NJ, 154\u2013159."},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA48891.2023.10160831"},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/RTAS48715.2020.000-8"},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1145\/3460970"},{"key":"e_1_3_2_28_2","unstructured":"Junjie Huang Guan Huang Zheng Zhu Yun Ye and Dalong Du. 2022. Bevdet: High-performance multi-camera 3d object detection in bird-eye-view. arXiv:2112.11790. Retrieved from https:\/\/arxiv.org\/abs\/2112.11790"},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/RTAS.2015.7108419"},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1109\/RTSS49844.2020.00027"},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/RTSS55097.2022.00033"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1109\/RTSS52674.2021.00038"},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA53966.2022.00065"},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1109\/CPSNA.2015.23"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO56248.2022.00033"},{"key":"e_1_3_2_36_2","unstructured":"Hyoukjun Kwon Liangzhen Lai Michael Pellauer Tushar Krishna Yu-Hsin Chen and Vikas Chandra. 2020. Heterogeneous dataflow accelerators for Multi-DNN Workloads. arXiv:1909.07437. Retrieved from https:\/\/arxiv.org\/abs\/1909.07437"},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA51647.2021.00016"},{"key":"e_1_3_2_38_2","doi-asserted-by":"publisher","DOI":"10.1109\/VLSI-DAT.2018.8373244"},{"key":"e_1_3_2_39_2","unstructured":"Tingting Liang Hongwei Xie Kaicheng Yu Zhongyu Xia Zhiwei Lin Yongtao Wang Tao Tang Bing Wang and Zhi Tang. 2022. Bevfusion: A simple and robust lidar-camera fusion framework. In Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems. 
10421\u201310434."},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","DOI":"10.1145\/3173162.3173191"},{"key":"e_1_3_2_41_2","doi-asserted-by":"publisher","DOI":"10.1109\/RTSS55097.2022.00034"},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/MC.2017.3001256"},{"key":"e_1_3_2_43_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA48891.2023.10160968"},{"key":"e_1_3_2_44_2","doi-asserted-by":"crossref","unstructured":"Raju Machupalli Masum Hossain and Mrinal Mandal. 2022. Review of ASIC accelerators for deep neural network. Microprocessors and Microsystems 89 1 (2022) 104441.","DOI":"10.1016\/j.micpro.2022.104441"},{"key":"e_1_3_2_45_2","article-title":"POAS: A high-performance scheduling framework for exploiting accelerator level parallelism","author":"Mart\u00ednez Pablo Antonio","year":"2022","unstructured":"Pablo Antonio Mart\u00ednez, Gregorio Bernab\u00e9, and Jose Manuel Garc\u00eda. 2022. POAS: A high-performance scheduling framework for exploiting accelerator level parallelism. arXiv:2209.10245. Retrieved from https:\/\/arxiv.org\/abs\/2209.10245","journal-title":"arXiv:2209.10245"},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.1145\/3296957.3173184"},{"key":"e_1_3_2_47_2","doi-asserted-by":"publisher","DOI":"10.1145\/3410463.3414671"},{"key":"e_1_3_2_48_2","doi-asserted-by":"publisher","DOI":"10.1109\/FPT.2016.7929192"},{"key":"e_1_3_2_49_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58568-6_12"},{"key":"e_1_3_2_50_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00700"},{"key":"e_1_3_2_51_2","article-title":"Tackling variabilities in autonomous driving","author":"Qi Yuqiong","year":"2021","unstructured":"Yuqiong Qi, Yang Hu, Haibin Wu, Shen Li, Haiyu Mao, Xiaochun Ye, Dongrui Fan, and Ninghui Sun. 2021. Tackling variabilities in autonomous driving. arXiv:2104.10415. 
Retrieved from https:\/\/arxiv.org\/abs\/2104.10415","journal-title":"arXiv:2104.10415"},{"key":"e_1_3_2_52_2","doi-asserted-by":"publisher","DOI":"10.1016\/B978-0-08-050684-5.50022-7"},{"key":"e_1_3_2_53_2","doi-asserted-by":"publisher","DOI":"10.5555\/827268.828976"},{"key":"e_1_3_2_54_2","first-page":"353","volume-title":"Proceedings of the 2020 USENIX Annual Technical Conference","author":"Wan Chengcheng","year":"2020","unstructured":"Chengcheng Wan, Muhammad Santriaji, Eri Rogers, Henry Hoffmann, Michael Maire, and Shan Lu. 2020. ALERT: Accurate learning for energy and timeliness. In Proceedings of the 2020 USENIX Annual Technical Conference. 353\u2013369."},{"key":"e_1_3_2_55_2","doi-asserted-by":"publisher","DOI":"10.1109\/RTSS52674.2021.00021"},{"key":"e_1_3_2_56_2","doi-asserted-by":"publisher","DOI":"10.1109\/RTSS46320.2019.00042"},{"key":"e_1_3_2_57_2","doi-asserted-by":"publisher","DOI":"10.1109\/IISWC59245.2023.00014"},{"key":"e_1_3_2_58_2","doi-asserted-by":"publisher","DOI":"10.1145\/3466752.3480101"},{"key":"e_1_3_2_59_2","doi-asserted-by":"publisher","DOI":"10.1109\/RTAS.2019.00033"},{"key":"e_1_3_2_60_2","doi-asserted-by":"publisher","DOI":"10.1007\/s40903-015-0032-7"},{"key":"e_1_3_2_61_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO50266.2020.00089"},{"key":"e_1_3_2_62_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCD50377.2020.00031"},{"key":"e_1_3_2_63_2","doi-asserted-by":"publisher","DOI":"10.1109\/IV47402.2020.9304602"},{"key":"e_1_3_2_64_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO50266.2020.00059"}],"container-title":["ACM Transactions on Architecture and Code 
Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3688611","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,11,20]],"date-time":"2024-11-20T12:59:03Z","timestamp":1732107543000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3688611"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,11,20]]},"references-count":63,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2024,12,31]]}},"alternative-id":["10.1145\/3688611"],"URL":"https:\/\/doi.org\/10.1145\/3688611","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"type":"print","value":"1544-3566"},{"type":"electronic","value":"1544-3973"}],"subject":[],"published":{"date-parts":[[2024,11,20]]},"assertion":[{"value":"2024-02-07","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-07-25","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-11-20","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}