{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,9,15]],"date-time":"2024-09-15T22:31:20Z","timestamp":1726439480934},"reference-count":24,"publisher":"World Scientific Pub Co Pte Ltd","issue":"03n04","funder":[{"DOI":"10.13039\/501100003816","name":"Huawei Technologies","doi-asserted-by":"publisher","award":["HO2018085418"],"id":[{"id":"10.13039\/501100003816","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Parallel Process. Lett."],"published-print":{"date-parts":[[2022,9]]},"abstract":" The size of deep neural networks (DNNs) grows rapidly as the complexity of the machine learning algorithm increases. Distributed deep learning based on model parallelism has been widely used to satisfy the requirements of DNN training related to computation and memory. In this paper, we propose a training framework for pipeline parallelism called BaPipe (Balanced Pipeline) that can automatically explore methods to schedule pipeline parallelism and balanced partition strategies for DNN training on heterogeneous accelerator clusters. In BaPipe, each accelerator calculates the forward and backward propagation for the assigned partition of networks to implement an intra-batch pipeline parallelism strategy. By considering the parameters of DNN models as well as the computation, memory, and communication resources of each accelerator, BaPipe automatically selects the most suitable method of pipeline scheduling from among multiple proposed scheduling modes. It also uses a novel strategy to automatically investigate load balancing in the context of inter-layer partition, intra-layer partition, and coarse-grained partition. We trained such DNNs as VGG-16, ResNet-50, and Google\u2019s Neural Machine Translation (GNMT) on GPU clusters, and simulated the training-related performance of FPGA clusters. 
Compared with the state-of-the-art frameworks for data parallelism (DP) and pipeline parallelism, BaPipe provides a speedup of [Formula: see text] and a memory reduction of [Formula: see text] on various homogeneous and heterogeneous platforms.","DOI":"10.1142\/s0129626422500050","type":"journal-article","created":{"date-parts":[[2022,8,24]],"date-time":"2022-08-24T10:17:16Z","timestamp":1661336236000},"source":"Crossref","is-referenced-by-count":3,"title":["BaPipe: Balanced Pipeline Parallelism for DNN Training"],"prefix":"10.1142","volume":"32","author":[{"given":"Letian","family":"Zhao","sequence":"first","affiliation":[{"name":"State Key Laboratory of Particle Detection and Electronics, University of Science and Technology of China, Hefei 230026, China"},{"name":"Institute of Microelectronics, Department of Physics, University of Science and Technology of China, Hefei 230026, China"}]},{"given":"Rui","family":"Xu","sequence":"additional","affiliation":[{"name":"State Key Laboratory of Particle Detection and Electronics, University of Science and Technology of China, Hefei 230026, China"},{"name":"Institute of Microelectronics, Department of Physics, University of Science and Technology of China, Hefei 230026, China"}]},{"given":"Tianqi","family":"Wang","sequence":"additional","affiliation":[{"name":"State Key Laboratory of Particle Detection and Electronics, University of Science and Technology of China, Hefei 230026, China"},{"name":"Institute of Microelectronics, Department of Physics, University of Science and Technology of China, Hefei 230026, China"}]},{"given":"Teng","family":"Tian","sequence":"additional","affiliation":[{"name":"State Key Laboratory of Particle Detection and Electronics, University of Science and Technology of China, Hefei 230026, China"},{"name":"Institute of Microelectronics, Department of Physics, University of Science and Technology of China, Hefei 230026, 
China"}]},{"given":"Xiaotian","family":"Wang","sequence":"additional","affiliation":[{"name":"State Key Laboratory of Particle Detection and Electronics, University of Science and Technology of China, Hefei 230026, China"},{"name":"Institute of Microelectronics, Department of Physics, University of Science and Technology of China, Hefei 230026, China"}]},{"given":"Wei","family":"Wu","sequence":"additional","affiliation":[{"name":"State Key Laboratory of Particle Detection and Electronics, University of Science and Technology of China, Hefei 230026, China"},{"name":"Institute of Microelectronics, Department of Physics, University of Science and Technology of China, Hefei 230026, China"}]},{"given":"Chio-In","family":"Ieong","sequence":"additional","affiliation":[{"name":"Huawei Technologies, Shenzhen 518129, China"}]},{"ORCID":"http:\/\/orcid.org\/0000-0002-4159-2925","authenticated-orcid":false,"given":"Xi","family":"Jin","sequence":"additional","affiliation":[{"name":"State Key Laboratory of Particle Detection and Electronics, University of Science and Technology of China, Hefei 230026, China"},{"name":"Institute of Microelectronics, Department of Physics, University of Science and Technology of China, Hefei 230026, China"}]}],"member":"219","published-online":{"date-parts":[[2022,8,19]]},"reference":[{"key":"S0129626422500050BIB001","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"S0129626422500050BIB002","doi-asserted-by":"publisher","DOI":"10.1145\/3065386"},{"key":"S0129626422500050BIB003","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46493-0_38"},{"key":"S0129626422500050BIB006","first-page":"5998","volume-title":"Advances in Neural Information Processing Systems","author":"Vaswani A.","year":"2017"},{"key":"S0129626422500050BIB007","first-page":"4171","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume\u00a01 (Long 
and Short Papers)","author":"Devlin J.","year":"2019"},{"key":"S0129626422500050BIB008","doi-asserted-by":"publisher","DOI":"10.1145\/3341301.3359646"},{"key":"S0129626422500050BIB009","first-page":"103","volume-title":"Advances in Neural Information Processing Systems","author":"Huang Y.","year":"2019"},{"key":"S0129626422500050BIB010","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2019.2935967"},{"key":"S0129626422500050BIB011","doi-asserted-by":"publisher","DOI":"10.1109\/ICAC.2019.00024"},{"key":"S0129626422500050BIB013","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM.2018.00021"},{"key":"S0129626422500050BIB014","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2008.29"},{"key":"S0129626422500050BIB015","doi-asserted-by":"publisher","DOI":"10.1145\/3079856.3080244"},{"key":"S0129626422500050BIB016","doi-asserted-by":"publisher","DOI":"10.1109\/FPL.2018.00074"},{"key":"S0129626422500050BIB018","first-page":"307","volume-title":"2020 USENIX Annual Technical Conference (USENIX ATC\u00a020)","author":"Park J. 
H.","year":"2020"},{"key":"S0129626422500050BIB020","doi-asserted-by":"publisher","DOI":"10.1109\/ICDCS.2019.00028"},{"key":"S0129626422500050BIB021","doi-asserted-by":"publisher","DOI":"10.1145\/3320060"},{"key":"S0129626422500050BIB022","doi-asserted-by":"publisher","DOI":"10.1145\/3363554"},{"key":"S0129626422500050BIB023","doi-asserted-by":"publisher","DOI":"10.1109\/ICDCS.2018.00020"},{"key":"S0129626422500050BIB025","first-page":"10414","volume-title":"Advances in Neural Information Processing Systems","author":"Shazeer N.","year":"2018"},{"key":"S0129626422500050BIB027","doi-asserted-by":"publisher","DOI":"10.1145\/3322795.3331461"},{"volume-title":"SysML\u00a02019","year":"2019","author":"Jia Z.","key":"S0129626422500050BIB028"},{"key":"S0129626422500050BIB030","first-page":"8026","volume-title":"Advances in Neural Information Processing Systems","author":"Paszke A.","year":"2019"},{"key":"S0129626422500050BIB031","first-page":"265","volume-title":"12th USENIX Symposium on Operating Systems Design and Implementation (OSDI\u00a016)","author":"Abadi M.","year":"2016"},{"key":"S0129626422500050BIB032","doi-asserted-by":"publisher","DOI":"10.1145\/2647868.2654889"}],"container-title":["Parallel Processing 
Letters"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.worldscientific.com\/doi\/pdf\/10.1142\/S0129626422500050","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,10,10]],"date-time":"2022-10-10T04:39:21Z","timestamp":1665376761000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.worldscientific.com\/doi\/10.1142\/S0129626422500050"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,8,19]]},"references-count":24,"journal-issue":{"issue":"03n04","published-print":{"date-parts":[[2022,9]]}},"alternative-id":["10.1142\/S0129626422500050"],"URL":"https:\/\/doi.org\/10.1142\/s0129626422500050","relation":{},"ISSN":["0129-6264","1793-642X"],"issn-type":[{"type":"print","value":"0129-6264"},{"type":"electronic","value":"1793-642X"}],"subject":[],"published":{"date-parts":[[2022,8,19]]}}}