{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,7,25]],"date-time":"2024-07-25T07:19:43Z","timestamp":1721891983535},"reference-count":42,"publisher":"Association for Computing Machinery (ACM)","issue":"1","funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["61634004 and 61934002"],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Natural Science Foundation of Shaanxi Province for Distinguished Young Scholars","award":["2020JC-26"]},{"name":"National Key R8D Program of China","award":["2018YFE0202800"]},{"name":"The Youth Innovation Team of Shaanxi Universities"},{"name":"Open Project Program of the State Key Laboratory of Mathematical Engineering and Advanced Computing","award":["2019A01"]},{"DOI":"10.13039\/501100012226","name":"Fundamental Research Funds for the Central Universities","doi-asserted-by":"crossref","award":["JB190105 and XJS200119"],"id":[{"id":"10.13039\/501100012226","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["J. Emerg. Technol. Comput. Syst."],"published-print":{"date-parts":[[2021,1,31]]},"abstract":"Machine learning is at the heart of many services provided by data centers. To improve the performance of machine learning, several parameter (gradient) synchronization methods have been proposed in the literature. These synchronization algorithms have different communication characteristics and accordingly place different demands on the network architecture. However, traditional data-center networks cannot easily meet these demands. Therefore, we analyze the communication profiles associated with several common synchronization algorithms and propose a machine learning--oriented network architecture to match their characteristics. The proposed design, named Lotus, because it looks like a lotus flower, is a hybrid optical\/electrical architecture based on arrayed waveguide grating routers (AWGRs). In Lotus, a complete bipartite graph is used within the group to improve bisection bandwidth and scalability. Each pair of groups is connected by an optical link, and AWGRs between adjacent groups enhance path diversity and network reliability. We also present an efficient routing algorithm to make full use of the path diversity of Lotus, which leads to a further increase in network performance. 
Simulation results show that the network performance of Lotus is better than that of Dragonfly and 3D-Torus under realistic traffic patterns for different synchronization algorithms.","DOI":"10.1145\/3415749","type":"journal-article","created":{"date-parts":[[2020,9,17]],"date-time":"2020-09-17T10:18:31Z","timestamp":1600337911000},"page":"1-21","update-policy":"http:\/\/dx.doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["Lotus"],"prefix":"10.1145","volume":"17","author":[{"given":"Yunfeng","family":"Lu","sequence":"first","affiliation":[{"name":"The State Key Laboratory of Integrated Service Networks, Xidian University, Xi\u2019an, Shaanxi"}]},{"given":"Huaxi","family":"Gu","sequence":"additional","affiliation":[{"name":"The State Key Laboratory of Integrated Service Networks, Xidian University, Xi\u2019an, Shaanxi"}]},{"given":"Xiaoshan","family":"Yu","sequence":"additional","affiliation":[{"name":"The State Key Laboratory of Integrated Service Networks, Xidian University, Xi\u2019an, Shaanxi"}]},{"given":"Krishnendu","family":"Chakrabarty","sequence":"additional","affiliation":[{"name":"The Department of Electrical and Computer Engineering, Duke University, Durham, NC"}]}],"member":"320","published-online":{"date-parts":[[2020,9,17]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"[Online]. Retrieved from https:\/\/images.nvidia.com\/content\/pdf\/dgx1-v100-system-architecture-whitepaper.pdf."},{"key":"e_1_2_1_2_1","unstructured":"[Online]. Retrieved from https:\/\/item.jd.com\/10448410875.html."},{"key":"e_1_2_1_3_1","unstructured":"[Online]. Retrieved from http:\/\/www.lusterinc.com\/."},{"key":"e_1_2_1_4_1","unstructured":"[Online]. Retrieved from https:\/\/www.finisar.com\/."},{"key":"e_1_2_1_5_1","volume-title":"Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI\u201916)","author":"Abadi Mart\u00edn","year":"2016"},{"key":"e_1_2_1_6_1","volume-title":"Proceedings of the 25th European MPI Users\u2019 Group Meeting (EuroMPI\u201918)","author":"Awan Ammar Ahmad"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/MCOM.2018.1600804"},{"key":"e_1_2_1_8_1","volume-title":"A neural probabilistic language model. J. Mach. Learn. Res. 3 (Mar. 2003)","author":"Bengio Yoshua","year":"2003"},{"key":"e_1_2_1_9_1","author":"Cheng Qixiang, Madeleine Glick, and Keren Bergman","year":"2020"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASL.2011.2134090"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/3229543.3229544"},{"key":"e_1_2_1_13_1","unstructured":"Andrew Gibiansky. 2017. Bringing HPC techniques to deep learning. Retrieved from http:\/\/research.baidu.com\/bringing-hpc-techniques-deep-learning."},{"key":"e_1_2_1_14_1","volume-title":"Deep Learning","author":"Goodfellow Ian"},{"key":"e_1_2_1_15_1","unstructured":"Priya Goyal, Piotr Doll\u00e1r, Ross B. Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, and Kaiming He. 2017. Accurate, large minibatch SGD: Training ImageNet in 1 hour. Retrieved from http:\/\/arxiv.org\/abs\/1706.02677."}
,{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2017.17"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/JLT.2015.2510656"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/1851275.1851207"},{"key":"e_1_2_1_19_1","volume-title":"Proceedings of the IEEE International Symposium on High Performance Computer Architecture (HPCA\u201918)","author":"Hazelwood K.","year":"2018"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7299173"},{"key":"e_1_2_1_21_1","volume-title":"Proceedings of the International Symposium on Parallel Architectures, Algorithms, and Networks (I-SPAN\u201900)","author":"Horiguchi S.","year":"2000"},{"key":"e_1_2_1_22_1","unstructured":"Xianyan Jia, Shutao Song, Wei He, Yangzihao Wang, Haidong Rong, Feihu Zhou, Liqiang Xie, Zhenyu Guo, Yuanzhou Yang, Liwei Yu, Tiegang Chen, Guangxiao Hu, Shaohuai Shi, and Xiaowen Chu. 2018. Highly scalable deep learning training system with mixed-precision: Training ImageNet in four minutes. Retrieved from http:\/\/arxiv.org\/abs\/1807.11205."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/TADVP.2008.2011138"},{"key":"e_1_2_1_24_1","volume-title":"Proceedings of the International Symposium on Computer Architecture. 77--88","author":"Kim J.","year":"2008"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/2741948.2741965"},{"key":"e_1_2_1_26_1","author":"Li Mu, Alexander J. Smola, Amr Ahmed, Vanja Josifovski, James Long, Eugene J. Shekita, and Bor-Yiing Su","year":"2014"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/TETC.2016.2593643"},{"key":"e_1_2_1_28_1","unstructured":"Hiroaki Mikami, Hisahiro Suganuma, Pongsakorn U.-Chupala, Yoshiki Tanaka, and Yuichi Kageyama. 2018. ImageNet\/ResNet-50 training in 224 seconds. Retrieved from http:\/\/arxiv.org\/abs\/1811.05233."},{"key":"e_1_2_1_29_1","unstructured":"OPNET Modeler. 2009. OPNET Technologies Inc. Retrieved from https:\/\/opnetprojects.com\/opnet-network-simulator\/."},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/JLT.2015.2395352"},{"key":"e_1_2_1_31_1","unstructured":"Baidu Research. 2017. baidu-allreduce. Retrieved from https:\/\/github.com\/baidu-research\/baidu-allreduce."}
,{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/3229591.3229594"},{"key":"e_1_2_1_33_1","unstructured":"Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. Retrieved from https:\/\/arxiv.org\/abs\/1409.1556."},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/TETC.2014.2310455"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/JLT.2011.2172989"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/3219617.3219656"},{"key":"e_1_2_1_37_1","volume-title":"Proceedings of the IEEE Conference on Computer Communications (INFOCOM\u201919)","author":"Wang S.","year":"2019"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/2987550.2987586"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/TBDATA.2015.2472014"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/JSTQE.2012.2209174"},{"key":"e_1_2_1_41_1","unstructured":"Chris Ying, Sameer Kumar, Dehao Chen, Tao Wang, and Youlong Cheng. 2018. Image classification at supercomputer scale. Retrieved from http:\/\/arxiv.org\/abs\/1811.06992."},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2017.2732987"}],"container-title":["ACM Journal on Emerging Technologies in Computing Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3415749","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,1]],"date-time":"2023-01-01T21:29:37Z","timestamp":1672608577000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3415749"}},"subtitle":["A New Topology for Large-scale Distributed Machine Learning"],"short-title":[],"issued":{"date-parts":[[2020,9,17]]},"references-count":42,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2021,1,31]]}},"alternative-id":["10.1145\/3415749"],"URL":"https:\/\/doi.org\/10.1145\/3415749","relation":{},"ISSN":["1550-4832","1550-4840"],"issn-type":[{"value":"1550-4832","type":"print"},{"value":"1550-4840","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,9,17]]},"assertion":[{"value":"2020-03-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-08-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-09-17","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}
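
The abstract above argues that different gradient synchronization algorithms place different demands on the network. As a rough illustration of that point only (this is not code from the paper; the worker count and gradient size below are assumed values), the following Python sketch compares the per-node traffic of a parameter-server scheme, whose server endures an N-to-1 incast, with ring all-reduce, which spreads the same reduction over neighbor-only transfers, as described in the cited parameter-server and baidu-allreduce references.

# Illustrative sketch only: per-node traffic for two common gradient
# synchronization schemes. Worker count and gradient size are assumptions.

def parameter_server_traffic(num_workers: int, grad_bytes: float):
    """Each worker pushes its gradient to the server and pulls the
    aggregated result back; the server sees an N-to-1 incast."""
    per_worker = 2 * grad_bytes                 # push + pull
    at_server = 2 * num_workers * grad_bytes    # aggregate in, broadcast out
    return per_worker, at_server

def ring_allreduce_traffic(num_workers: int, grad_bytes: float):
    """Each worker exchanges 1/N-sized chunks with its ring neighbors for
    2*(N-1) steps (reduce-scatter followed by all-gather)."""
    per_worker = 2 * (num_workers - 1) / num_workers * grad_bytes
    steps = 2 * (num_workers - 1)
    return per_worker, steps

if __name__ == "__main__":
    n, g = 16, 100e6  # assumed: 16 workers, 100 MB of gradients
    ps_worker, ps_server = parameter_server_traffic(n, g)
    ring_worker, ring_steps = ring_allreduce_traffic(n, g)
    print(f"parameter server: {ps_worker/1e6:.0f} MB per worker, "
          f"{ps_server/1e6:.0f} MB at the server (hotspot)")
    print(f"ring all-reduce:  {ring_worker/1e6:.0f} MB per worker "
          f"over {ring_steps} neighbor-only steps")

Under these assumptions the server-centric scheme concentrates roughly N times more traffic at a single node than ring all-reduce places on any one worker, which is the kind of contrast in communication profile that the abstract says a topology must accommodate.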