{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,9,2]],"date-time":"2024-09-02T07:46:33Z","timestamp":1725263193493},"reference-count":25,"publisher":"IOP Publishing","issue":"4","license":[{"start":{"date-parts":[[2022,11,4]],"date-time":"2022-11-04T00:00:00Z","timestamp":1667520000000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,11,4]],"date-time":"2022-11-04T00:00:00Z","timestamp":1667520000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/iopscience.iop.org\/info\/page\/text-and-data-mining"}],"funder":[{"DOI":"10.13039\/100010661","name":"Horizon 2020 Framework Programme","doi-asserted-by":"publisher","award":["772369"],"id":[{"id":"10.13039\/100010661","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["iopscience.iop.org"],"crossmark-restriction":false},"short-container-title":["Mach. Learn.: Sci. Technol."],"published-print":{"date-parts":[[2022,12,1]]},"abstract":"In this paper, we investigate how field programmable gate arrays can serve as hardware accelerators for real-time semantic segmentation tasks relevant for autonomous driving. Considering compressed versions of the ENet convolutional neural network architecture, we demonstrate a fully-on-chip deployment with a latency of 4.9 ms per image, using less than 30% of the available resources on a Xilinx ZCU102 evaluation board. The latency is reduced to 3 ms per image when increasing the batch size to ten, corresponding to the use case where the autonomous vehicle receives inputs from multiple cameras simultaneously. 
We show that, through aggressive filter reduction, heterogeneous quantization-aware training, and an optimized implementation of convolutional layers, the power consumption and resource utilization can be significantly reduced while maintaining accuracy on the Cityscapes dataset.","DOI":"10.1088\/2632-2153\/ac9cb5","type":"journal-article","created":{"date-parts":[[2022,10,21]],"date-time":"2022-10-21T23:25:55Z","timestamp":1666394755000},"page":"045011","update-policy":"http:\/\/dx.doi.org\/10.1088\/crossmark-policy","source":"Crossref","is-referenced-by-count":8,"title":["Real-time semantic segmentation on FPGAs for autonomous vehicles with hls4ml"],"prefix":"10.1088","volume":"3","author":[{"ORCID":"http:\/\/orcid.org\/0000-0002-4660-9757","authenticated-orcid":true,"given":"Nicol\u00f2","family":"Ghielmetti","sequence":"first","affiliation":[]},{"ORCID":"http:\/\/orcid.org\/0000-0003-3651-0232","authenticated-orcid":false,"given":"Vladimir","family":"Loncar","sequence":"additional","affiliation":[]},{"ORCID":"http:\/\/orcid.org\/0000-0003-1939-4268","authenticated-orcid":true,"given":"Maurizio","family":"Pierini","sequence":"additional","affiliation":[]},{"given":"Marcel","family":"Roed","sequence":"additional","affiliation":[]},{"given":"Sioni","family":"Summers","sequence":"additional","affiliation":[]},{"ORCID":"http:\/\/orcid.org\/0000-0002-7671-243X","authenticated-orcid":true,"given":"Thea","family":"Aarrestad","sequence":"additional","affiliation":[]},{"given":"Christoffer","family":"Petersson","sequence":"additional","affiliation":[]},{"given":"Hampus","family":"Linander","sequence":"additional","affiliation":[]},{"ORCID":"http:\/\/orcid.org\/0000-0002-0055-2935","authenticated-orcid":true,"given":"Jennifer","family":"Ngadiuba","sequence":"additional","affiliation":[]},{"given":"Kelvin","family":"Lin","sequence":"additional","affiliation":[]},{"ORCID":"http:\/\/orcid.org\/0000-0001-8189-3741","authenticated-orcid":true,"given":"Philip","fami
ly":"Harris","sequence":"additional","affiliation":[]}],"member":"266","published-online":{"date-parts":[[2022,11,4]]},"reference":[{"key":"mlstac9cb5bib1","article-title":"CERN yellow reports: monographs","author":"Apollinari","year":"2017"},{"key":"mlstac9cb5bib2","author":"Garrett","year":"2010"},{"key":"mlstac9cb5bib3","article-title":"Benchmarking TinyML systems: challenges and direction","author":"Banbury","year":"2021"},{"key":"mlstac9cb5bib4","first-page":"pp 873","article-title":"Large-scale deep unsupervised learning using graphics processors","author":"Raina","year":"2009"},{"key":"mlstac9cb5bib5","article-title":"On efficient real-time semantic segmentation: a survey","author":"Holder","year":"2022"},{"key":"mlstac9cb5bib6","doi-asserted-by":"publisher","DOI":"10.1088\/1748-0221\/13\/07\/P07027","article-title":"Fast inference of deep neural networks in FPGAs for particle physics","volume":"13","author":"Duarte","year":"2018","journal-title":"J. Instrum."},{"key":"mlstac9cb5bib7","doi-asserted-by":"publisher","DOI":"10.1088\/1748-0221\/15\/05\/P05026","article-title":"Fast inference of boosted decision trees in FPGAs for particle physics","volume":"15","author":"Summers","year":"2020","journal-title":"J. Instrum."},{"key":"mlstac9cb5bib8","doi-asserted-by":"publisher","DOI":"10.1088\/2632-2153\/aba042","article-title":"Compressing deep neural networks on FPGAs to binary and ternary precision with hls4ml","volume":"2","author":"Loncar","year":"2021","journal-title":"Mach. Learn.: Sci. Technol."},{"key":"mlstac9cb5bib9","doi-asserted-by":"publisher","DOI":"10.3389\/fdata.2020.598927","article-title":"Distance-weighted graph neural networks on FPGAs for real-time particle reconstruction in high energy physics","volume":"3","author":"Iiyama","year":"2021","journal-title":"Front. 
Big Data"},{"key":"mlstac9cb5bib10","article-title":"Accelerated charged particle tracking with graph neural networks on FPGAs","volume":"vol 12","author":"Heintz","year":"2020"},{"key":"mlstac9cb5bib11","doi-asserted-by":"publisher","first-page":"969","DOI":"10.1140\/epjc\/s10052-021-09770-w","article-title":"Model compression and simplification pipelines for fast deep neural network inference in FPGAs in HEP","volume":"81","author":"Francescato","year":"2021","journal-title":"Eur. Phys. J. C"},{"key":"mlstac9cb5bib11","doi-asserted-by":"publisher","first-page":"1064","DOI":"10.1140\/epjc\/s10052-021-09875-2","volume":"81","author":"Francescato","year":"2021","journal-title":"Eur. Phys. J. C"},{"key":"mlstac9cb5bib12","article-title":"Fast muon tracking with machine learning implemented in FPGA","author":"Sun","year":"2022"},{"key":"mlstac9cb5bib13","doi-asserted-by":"publisher","first-page":"675","DOI":"10.1038\/s42256-021-00356-5","article-title":"Automatic heterogeneous quantization of deep neural networks for low-latency inference on the edge for particle detectors","volume":"3","author":"Coelho","year":"2021","journal-title":"Nat. Mach. Intell."},{"key":"mlstac9cb5bib14","doi-asserted-by":"publisher","DOI":"10.1088\/2632-2153\/ac0ea1","article-title":"Fast convolutional neural networks on FPGAs with hls4ml","volume":"2","author":"Aarrestad","year":"2021","journal-title":"Mach. Learn.: Sci. 
Technol."},{"key":"mlstac9cb5bib15","article-title":"hls4ml: an open-source codesign workflow to empower scientific low-power machine learning devices","volume":"vol 3","author":"Fahim","year":"2021"},{"key":"mlstac9cb5bib16","article-title":"QKeras","author":"Coelho","year":"2019"},{"key":"mlstac9cb5bib17","article-title":"ENet: a deep neural network architecture for real-time semantic segmentation","author":"Paszke","year":"2016"},{"key":"mlstac9cb5bib18","article-title":"Xilinx ZCU102 evaluation board"},{"key":"mlstac9cb5bib19","doi-asserted-by":"crossref","article-title":"The Cityscapes dataset for semantic urban scene understanding","author":"Cordts","year":"2016","DOI":"10.1109\/CVPR.2016.350"},{"key":"mlstac9cb5bib20","doi-asserted-by":"crossref","article-title":"Rich feature hierarchies for accurate object detection and semantic segmentation","author":"Girshick","year":"2014","DOI":"10.1109\/CVPR.2014.81"},{"key":"mlstac9cb5bib21","article-title":"Keras surgeon","author":"Whetton","year":"2016"},{"key":"mlstac9cb5bib22","article-title":"A survey of quantization methods for efficient neural network inference","author":"Gholami","year":"2021"},{"key":"mlstac9cb5bib23","doi-asserted-by":"publisher","first-page":"675","DOI":"10.1038\/s42256-021-00356-5","article-title":"Automatic heterogeneous quantization of deep neural networks for low-latency inference on the edge for particle detectors","volume":"3","author":"Coelho","year":"2021","journal-title":"Nat. Mach. 
Intell."},{"key":"mlstac9cb5bib24","first-page":"pp 321","article-title":"Design and implementation of real-time semantic segmentation network based on FPGA","author":"Jia","year":"2021"}],"container-title":["Machine Learning: Science and Technology"],"original-title":[],"link":[{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ac9cb5","content-type":"text\/html","content-version":"am","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ac9cb5\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ac9cb5","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ac9cb5\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ac9cb5\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ac9cb5\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ac9cb5\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"similarity-checking"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ac9cb5\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,3,9]],"date-time":"2023-03-09T14:22:14Z","timestamp":1678371734000},"score":1,"resource":{"primary":{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ac9cb5"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,11,4]]},"references-count":25,"journal-issue":{"is
sue":"4","published-online":{"date-parts":[[2022,11,4]]},"published-print":{"date-parts":[[2022,12,1]]}},"URL":"https:\/\/doi.org\/10.1088\/2632-2153\/ac9cb5","relation":{},"ISSN":["2632-2153"],"issn-type":[{"value":"2632-2153","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,11,4]]},"assertion":[{"value":"Real-time semantic segmentation on FPGAs for autonomous vehicles with hls4ml","name":"article_title","label":"Article Title"},{"value":"Machine Learning: Science and Technology","name":"journal_title","label":"Journal Title"},{"value":"paper","name":"article_type","label":"Article Type"},{"value":"\u00a9 2022 The Author(s). Published by IOP Publishing Ltd","name":"copyright_information","label":"Copyright Information"},{"value":"2022-05-25","name":"date_received","label":"Date Received","group":{"name":"publication_dates","label":"Publication dates"}},{"value":"2022-10-21","name":"date_accepted","label":"Date Accepted","group":{"name":"publication_dates","label":"Publication dates"}},{"value":"2022-11-04","name":"date_epub","label":"Online publication date","group":{"name":"publication_dates","label":"Publication dates"}}]}}